Data Engineering

The key to understanding what data engineering is lies in the “engineering” part. Engineers design and build things. “Data” engineers design and build pipelines that transform and transport data so that, by the time it reaches data scientists or other end users, it is in a highly usable state. These pipelines must take data from many disparate sources and collect it into a single warehouse that represents the data uniformly, as a single source of truth.
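
To make "representing the data uniformly" concrete, here is a minimal sketch of the unification step. It assumes two hypothetical sources (a CRM export and a billing system) that describe customers with different field names and date formats; the `Customer` schema and every function name are illustrative, not a standard API. Each source gets its own small adapter that maps its rows onto one shared schema.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical unified schema: the "single source of truth" for customers.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str
    signup_date: date


# Hypothetical source A (CRM export): ISO dates, "id"/"mail" keys.
crm_rows = [
    {"id": "C1", "mail": "Ana@Example.com", "created": "2023-04-01"},
]

# Hypothetical source B (billing system): US-style dates, different keys.
billing_rows = [
    {"cust": "C2", "email_address": "bo@example.com", "since": "05/17/2023"},
]


def from_crm(row: dict) -> Customer:
    """Map a CRM row onto the unified schema."""
    return Customer(
        customer_id=row["id"],
        email=row["mail"].strip().lower(),
        signup_date=date.fromisoformat(row["created"]),
    )


def from_billing(row: dict) -> Customer:
    """Map a billing row onto the unified schema (MM/DD/YYYY dates assumed)."""
    month, day, year = (int(p) for p in row["since"].split("/"))
    return Customer(
        customer_id=row["cust"],
        email=row["email_address"].strip().lower(),
        signup_date=date(year, month, day),
    )


def build_unified_table() -> list[Customer]:
    """Collect both sources into one uniformly typed table."""
    unified = [from_crm(r) for r in crm_rows]
    unified += [from_billing(r) for r in billing_rows]
    return unified


if __name__ == "__main__":
    for customer in build_unified_table():
        print(customer)
```

Keeping one adapter per source means a new source only needs a new mapping function; downstream users only ever see `Customer` records.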

It sounds simple enough, but the role demands a wide range of data literacy skills. This is one reason data engineers are in such short supply and why there is confusion around the role. The figure below is one example of the activities involved in data engineering.

What Do Data Engineers Do?

Data engineering is a skill in increasing demand. Data engineers design the systems that unify an organization's data, and they can help you navigate those systems. Their tasks include:

  1. Acquisition: Finding all the different data sets around the business

  2. Cleansing: Finding and cleaning any errors in the data

  3. Conversion: Giving all the data a common format

  4. Disambiguation: Interpreting data that could be interpreted in multiple ways

  5. Deduplication: Removing duplicate copies of data
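
The cleansing, conversion, disambiguation, and deduplication steps can be sketched as a small chain of functions. This is a minimal sketch under stated assumptions: the records, the source names `us_shop` and `eu_shop`, and the per-source date conventions are all hypothetical, and a production pipeline would more likely use a dataframe library or SQL engine than plain dictionaries. Acquisition is left out because it is about locating data rather than transforming it.

```python
from datetime import date

# Hypothetical raw records gathered from two systems during acquisition.
RAW = [
    {"source": "us_shop", "email": " Ana@Example.com ", "amount": "19.99", "day": "03/04/2023"},
    {"source": "eu_shop", "email": "bo@example.com", "amount": "5,00", "day": "03/04/2023"},
    {"source": "us_shop", "email": "ana@example.com", "amount": "19.99", "day": "03/04/2023"},
    {"source": "eu_shop", "email": "", "amount": "7,50", "day": "10/04/2023"},
]

# Assumed date order per source; this is what resolves an ambiguous "03/04".
DATE_ORDER = {"us_shop": "MDY", "eu_shop": "DMY"}


def cleanse(rows):
    """Drop rows missing a required field and normalize stray whitespace/case."""
    for row in rows:
        email = row["email"].strip().lower()
        if email:
            yield {**row, "email": email}


def convert(row):
    """Give amounts a common numeric format (decimal comma becomes a point)."""
    return {**row, "amount": float(row["amount"].replace(",", "."))}


def disambiguate(row):
    """Interpret "AA/BB/YYYY" using the source's known date order."""
    a, b, year = (int(p) for p in row["day"].split("/"))
    month, day = (a, b) if DATE_ORDER[row["source"]] == "MDY" else (b, a)
    return {**row, "day": date(year, month, day)}


def deduplicate(rows):
    """Keep only the first copy of each (email, amount, day) combination."""
    seen = set()
    for row in rows:
        key = (row["email"], row["amount"], row["day"])
        if key not in seen:
            seen.add(key)
            yield row


def run(rows):
    """Apply the steps in order and return the cleaned records."""
    return list(deduplicate(disambiguate(convert(r)) for r in cleanse(rows)))


if __name__ == "__main__":
    for record in run(RAW):
        print(record)
```

Note that the same string "03/04/2023" becomes 4 March for one source and 3 April for the other; without the per-source convention there is no way to tell which reading is correct. The conversion step here is deliberately naive and would mis-handle values with thousands separators such as "1.234,56".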

Once this is done, data may be stored in a central repository such as a data lake or data lakehouse. Data engineers may also copy and move subsets of data into a data warehouse.
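
As a rough illustration of copying a subset of data into a warehouse, the sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a real warehouse. The `LAKE` records, the `orders` table, and `load_subset` are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical "lake": raw records kept as-is, including fields
# that analysts rarely need (raw_payload).
LAKE = [
    {"order_id": 1, "region": "EU", "amount": 5.0, "raw_payload": '{"clicks": 3}'},
    {"order_id": 2, "region": "US", "amount": 19.99, "raw_payload": '{"clicks": 1}'},
    {"order_id": 3, "region": "EU", "amount": 7.5, "raw_payload": '{"clicks": 8}'},
]


def load_subset(conn: sqlite3.Connection, region: str) -> int:
    """Copy only the needed columns and rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    subset = [
        (r["order_id"], r["region"], r["amount"])
        for r in LAKE
        if r["region"] == region
    ]
    # INSERT OR REPLACE keeps re-runs idempotent on the primary key.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", subset)
    conn.commit()
    return len(subset)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(load_subset(warehouse, "EU"))
    print(warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0])
```

Dropping columns such as `raw_payload` and filtering rows before loading keeps the warehouse copy smaller and more focused than the lake it came from, while the full raw data stays available in the lake.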

