How we communicate what we do as a company, both externally and internally, is a vital component of who we are. How we have grown as a team has been greatly affected by the way we communicate data science and software development.
When I first started at Lofty Labs, I quickly realized that there was so much to learn and understand in this industry. I started by writing out a list of terms that I heard on a daily basis, along with definitions that I gathered by searching Google and by asking our development staff questions.
One of the first things I learned when joining Lofty Labs was that data scientists, software engineers, and architects have their own language, which they use to communicate. Sometimes the terminology has only subtle differences and nuances, so it is vital to understand the terms completely. That way you can ensure that each member of your business and software development team fully understands what is necessary to complete a project. In hopes of helping you and your team communicate about the rapidly growing field of data science, I’ll share with you a few things that I have learned:
Definitions For Technical Engineering Terms
“Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data-processing applications.” Source: Wikipedia
Big data is often described in terms of volume, variety, velocity, variability, and veracity (accuracy). The biggest challenge with big data is the difficulty of collecting, storing, managing, and analyzing it, and otherwise putting it to work to create usable value.
Cloud computing refers to the practice of using a network of remote servers to store, manage and process data (rather than an on-premise server or a personal computer) with access to such data provided through the Internet (the cloud). Programs, applications and other services may also be hosted in the cloud, which frees companies from the task and expense of building and maintaining data centers and other infrastructure.
Data architecture is a set of rules, policies, standards and models that govern and define the type of data collected and how it is used, stored, managed and integrated within an organization and its database systems.
ETL (Extract, Transform, Load)
ETL refers to the standard structured process for moving data from the source database to the data warehouse.
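To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample CSV stands in for a source database, an in-memory SQLite database stands in for the data warehouse, and the `customers` table and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be read from a source database.
SOURCE_CSV = """id,name,signup_date
1, Ada Lovelace ,2019-01-05
2,grace hopper,2019-02-11
"""

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim stray whitespace and normalize name capitalization."""
    return [
        (int(row["id"]), row["name"].strip().title(), row["signup_date"])
        for row in rows
    ]

def load(records, conn):
    """Load: write the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl():
    """Run all three steps and return what ended up in the warehouse."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT id, name, signup_date FROM customers ORDER BY id"
    ).fetchall()

if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Real ETL pipelines usually add scheduling, error handling and logging around these steps, but the extract, transform, load order stays the same.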
Data cleansing, also called “data scrubbing”, is the process of detecting and correcting inaccurate data or records in a database. It may also involve correcting or removing improperly formatted or duplicate data or records. The data removed in this process is often referred to as “dirty data.” Data cleansing is an essential task for preserving data quality.
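As a rough illustration of detecting and correcting records, here is a small Python sketch. The field names (`name`, `email`), the very loose email pattern and the `cleanse` function are all assumptions made for this example, not a description of any particular tool.

```python
import re

# A deliberately loose check: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def cleanse(records):
    """Split records into corrected ("clean") and rejected ("dirty") lists."""
    clean, dirty = [], []
    for record in records:
        # Correct formatting problems: stray spaces and inconsistent case.
        email = record.get("email", "").strip().lower()
        name = " ".join(record.get("name", "").split())
        if name and EMAIL_PATTERN.match(email):
            clean.append({"name": name, "email": email})
        else:
            # Records that cannot be repaired are set aside as dirty data.
            dirty.append(record)
    return clean, dirty

if __name__ == "__main__":
    sample = [
        {"name": "  Ada   Lovelace ", "email": " ADA@Example.com "},
        {"name": "Grace", "email": "not-an-email"},
    ]
    clean, dirty = cleanse(sample)
    print("clean:", clean)
    print("dirty:", dirty)
```

Keeping the rejected records in their own list, rather than silently dropping them, makes it possible to review what was removed and why.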
Behavioral Analytics or Sentiment Analysis
As a fairly new part of data science, behavioral analytics has played a vital role in the online growth of one of the world’s largest retailers. It brings understanding of what potential and current consumers do, and it shows why they act in certain ways. It is particularly prevalent in the realm of eCommerce as well as brick-and-mortar retailing. In practice, behavioral analytics seeks to connect seemingly unrelated data points and explain or predict outcomes, future trends or the likelihood of certain events. At the heart of behavioral analytics is data such as online navigation paths, click-throughs, social media interactions, purchases and shopping cart abandonment decisions.
Agile project management focuses on collaboration and open communication between group members. The agile software development process is more functional and less static. At Lofty, we combine Scrum, a software development methodology built on the agile approach, with the internal team methods that work best for us and our clients.
Data silos exist when data is isolated, typically lacking a key to link across data systems. With recent clients, we have worked to break down data silos. Siloed data poses special challenges when consolidating information across an organization.
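To show why a linking key matters, here is a small Python sketch that joins records from two separate systems. The two lists, the `email` key and the `link_on_key` function are hypothetical examples, assuming both systems happen to store the same identifier.

```python
# Two hypothetical systems that each hold part of the customer picture.
crm = [
    {"email": "ada@example.com", "name": "Ada Lovelace"},
    {"email": "grace@example.com", "name": "Grace Hopper"},
]
billing = [
    {"email": "ada@example.com", "total_spent": 120.0},
    {"email": "alan@example.com", "total_spent": 45.5},
]

def link_on_key(left, right, key):
    """Combine rows that share a key; report left rows with no match."""
    right_by_key = {row[key]: row for row in right}
    linked, unlinked = [], []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            unlinked.append(row)  # this record stays stuck in its silo
        else:
            linked.append({**row, **match})
    return linked, unlinked

if __name__ == "__main__":
    linked, unlinked = link_on_key(crm, billing, "email")
    print("linked:", linked)
    print("unlinked:", unlinked)
```

When no shared key exists at all, nothing can be linked this way, which is a large part of what makes consolidating siloed data hard.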
Deduplication (also known as “The Dedup Process”)
The identification of duplicates within a set of data. Deduplication plays a vital role in cleaning up data and making sure that the insights from that data are accurate. Because data is usually entered by people, there’s always a chance that someone will forget the data they originally entered, or even forget their password and create a new account. Deduplication has played a major role in many of our projects.
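A very simple way to identify duplicates can be sketched in Python as follows. This is only one possible approach: it assumes that two records describe the same person when their normalized name and email match, and the `dedup_key` and `deduplicate` functions are hypothetical names chosen for this example.

```python
def dedup_key(record):
    """Build a comparison key that ignores case and extra whitespace."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def deduplicate(records):
    """Keep the first record seen for each key and drop later duplicates."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

if __name__ == "__main__":
    accounts = [
        {"name": "Ada Lovelace", "email": "ada@example.com"},
        {"name": " ada  lovelace ", "email": "ADA@example.com "},
        {"name": "Grace Hopper", "email": "grace@example.com"},
    ]
    for account in deduplicate(accounts):
        print(account)
```

Exact matching like this will miss duplicates where a person signs up again with a different email address, so real projects often need fuzzier matching rules on top of it.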
AWS (Amazon Web Services)
AWS provides on-demand cloud computing services. Its services span a wide range, including compute, storage, networking, database, analytics, application services, deployment, management and much more. We here at Lofty Labs prefer Amazon Web Services as a great cloud alternative to locally warehousing all of our own and our clients’ data.
I highly recommend taking the Amazon Web Services Business Professional Accreditation Course. It is a great way to break into this industry if you are a newcomer, or to retrain yourself on the most current technologies.
If you're an engineer, consider validating your knowledge with an AWS architect, developer or operations certification. Learn how to prepare for your AWS Certification exams with these tips from our Director of Engineering.
In conclusion, as data continues to grow and the need to resolve large data issues increases, it is important to understand how data science, and the greater field of engineering, can play a role within your organization. Whether you have customer or personal data spread across multiple systems or need reporting on that data, we are here for you! Data analytics can help to close gaps, break down data silos and transform your data into actionable insights.
This post is just the tip of the iceberg of what there is to know about data science. As I continue to learn and as technology advances in this industry, I’ll keep you up-to-date.