Data Scientist VS Data Engineer 2021 | What's the difference and what should you be?

Let's talk about the difference between a data scientist and a data engineer. Having spent five years in the data science industry, I've worked in both professions.

Let's start with how the backgrounds of the two professions differ.

Education

Data engineers tend to come from an engineering background, predominantly software engineering or computer science. Data scientists, on the other hand, come from a broader range of backgrounds, mainly engineering or science disciplines. It's not uncommon to see data scientists who have degrees in maths, physics, astrophysics or telecommunications engineering.

Data Science Cycle

A typical data science cycle consists of defining the business use case, data gathering, extract, transform and load (ETL), exploratory data analysis, modelling, model evaluation and model deployment, followed by QA and monitoring.

In organisations with a small data science team, usually fewer than five people, a data scientist can be expected to work across the whole data science cycle. Things are a little different once you move to data science teams of more than 10 people, where the work is split across specialised roles.

A data scientist will first define the business use case that a data product will solve, which involves engaging with internal and external stakeholders. They'll then work with the data engineer to figure out a plan for data gathering. The data engineer implements that plan, consolidating data sets from various sources into a single storage location or data lake. This might also involve any third-party data integration needed to fetch all of the required data sets.

The data engineer will then push the data through an ETL process. Next, the data scientist will carry out exploratory data analysis to get a better understanding of the data sets, followed by any algorithmic modelling needed to build a data product driven by AI. They will then conduct the model evaluation. From this point onwards, the machine learning engineer will carry out the production deployment and monitoring. Sometimes you'll also get hybrid data engineers who are also proficient machine learning engineers.
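
To make the ETL step a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample data, the column names, the customers table and the function names are hypothetical, and a real pipeline would usually read from a data lake and run on a dedicated processing framework rather than in a single script.

```python
import csv
import io
import sqlite3

# Hypothetical raw export. In practice this would come from files,
# APIs or a data lake rather than an inline string.
RAW_CSV = """customer_id,signup_date,spend
1,2021-01-05, 120.50
2,2021-02-11,
3,2021-03-02,80
"""


def extract(text):
    """Extract: read raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix types and drop rows that have no spend value."""
    cleaned = []
    for row in rows:
        spend = row["spend"].strip()
        if not spend:
            continue  # skip incomplete records
        cleaned.append((int(row["customer_id"]), row["signup_date"], float(spend)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, signup_date TEXT, spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform and load, then return (row count, total spend)."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(spend) FROM customers").fetchone()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping the three stages as separate functions means each one can be tested and swapped out on its own, which is roughly the kind of design decision a data engineer makes at a much larger scale.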

If you'd like to know more about what a machine learning engineer does, be sure to check out this video, where I compare the job of a data scientist with that of a machine learning engineer.

Data Engineer Role Summary

In summary, a data engineer is more focused on the intricacies of how data is stored, processed and managed. They'll be more grounded in sound software engineering practices and well versed in data structures. They'll be the ones who design the data processing pipelines and the data models that dictate how data sets are stored. So this role is perfect for you if you're more concerned with the details of the data and enjoy building large-scale data processing pipelines. It's truly a rewarding job.

Data Scientist Role Summary

A data scientist, on the other hand, is more concerned with making a data product that solves a specific business problem. This involves extracting insights from data, training machine learning models to build the data product, running A/B tests to validate the product and then iterating on the whole process in collaboration with data engineers and machine learning engineers. You'll likely spend more time in meetings with stakeholders than the average data engineer does. As you can see, the whole data science cycle is very much a collaborative exercise.

Which would you prefer to be: a data engineer or a data scientist?