5 Things I wish I knew before becoming a Data Scientist
In 2017 I took the leap from aerospace engineering into data science. Along the way I learnt a few things that’ll be extremely helpful to you if you’re thinking of diving into data science.
1. Data Science is not JUST training models.
Yep, that’s the first thing I wish I knew. Before I started working as a data scientist I thought that data science was just training models. My naive notion of data science was the following: You collect data from somewhere (not really sure how) -> You train your model -> You predict a random bunch of outputs -> You do something with outputs. I thought that the majority of the time in that data science cycle was solely devoted to models. It’s safe to say that I was really wrong.
I found out that training models is less than 10% of the work. You’ll be working on a lot more than just models, and this can include data collection, labelling, data verification, process management and monitoring. This definitely does not sound as exciting as models, but that’s the reality if you’re trying to build big scalable systems.
There are definitely roles out there if you want to get more experience with models, and these are at consulting companies like McKinsey, BCG, Deloitte, etc. At those companies you’ll be exposed to a diverse range of industries and projects. The downside is that it’s unlikely that you will witness your project being deployed at a large scale. The reality is that building fully fledged AI systems takes a long time, and a company might have to dedicate many years of technical effort to get to a point where they have a fully functioning autonomous ML pipeline.
2. Coding is very important in Data Science
A good coding foundation is important for being successful in data science. If you’re already interested in data science I’m sure you’ve checked out the online data science courses such as Machine Learning by Stanford. The main aim of these online courses is to give you a rundown of all of the ML algorithms used in the industry. They even give you code snippets that you can copy and paste to get an algorithm up and running.
This is all well and good if you’re getting started with data science. But if you really want to level up I’d definitely recommend becoming proficient in a programming language, and a good one to get started with would be Python. If you have some Java experience or good experience in another object-oriented programming language, it’s not going to be too hard for you to pick up Python. I’d also recommend learning Bash, just so you know your way around the terminal, as well as version control using Git.
Learning to code allows you to build skills that will be important regardless of whether or not you choose to get into data science, as those skills are equally important in other software engineering disciplines.
3. Data Science Job Ads can be VERY misleading
Data science job ads can be very misleading as it’s a relatively new field. Given the high amount of publicity around this domain, a lot of companies want to implement AI in their workplace. But they aren’t sure exactly what skills they need or whether they’ve even got enough data to do anything with.
They just google the kind of skills a data scientist often needs and then vomit them out on a job ad. You will often find data science job ads that want the candidate to have skills in SQL, Python, Scala, Hive, Hadoop, Spark and every other technology under the sun. That’s an alarm bell, because it just means that they’ve exhaustively listed all the technologies that a data scientist might use and they’re not too sure about the direction of data science in their own company.
I would focus more on job ads that list a couple of technical skills but also mention the soft skills you might need, such as communication or presentation skills. I’d also recommend looking at the overall reputation and culture of the employer who posted the job ad. You can find out more about both on Glassdoor reviews, which I found very useful on my hunt for data science jobs. Once you get past the resume review stage, definitely make sure to grill the technology teams at the companies you’re interviewing at in order to gauge exactly how serious they are about data science and machine learning. This helps you make sure that you end up at a company where you’re going to be surrounded by lots of smart people to learn from.
4. Communication and attention to detail are key to being a good data scientist
Communication is important in any field including technical roles. It’s important in making sure that you can work well with your team. It ensures that you can communicate problems and propose solutions accordingly. But it’s especially important in the field of data science. DS teams tend to sit between the engineering team and the sales team.
Consequently, they need to be well versed in converting technical details into a big-picture summary, because as a technical person it’s very easy to get bogged down in the details. But you have to remember that you’re working on a product that you’re gonna be serving to a client who might not be as technical as you are. Sometimes in other technical professions you can get away with not interacting with the client as much, but this is definitely not the case in data science.
Attention to detail is important for a couple of reasons. Good attention to detail can help you figure out the shortcomings of your training data, which helps you down the line in making your model perform better. It’ll also help you highlight the edge cases that your model fails on. It can’t be overstated how important attention to detail is in data science, especially if you’re at the cutting edge and you’re trying to squeeze out every single percent to level up your model accuracy.
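To make the edge case point a bit more concrete, here is a minimal sketch in Python of one way you might hunt for them: break your evaluation results down by segment and look at accuracy per segment rather than one overall number. The function name `accuracy_by_segment`, the segment names and the records are all made up for illustration; they are not from any real project.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """Return {segment: accuracy} for (segment, label, prediction) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, label, prediction in records:
        total[segment] += 1
        if label == prediction:
            correct[segment] += 1
    return {segment: correct[segment] / total[segment] for segment in total}


# Hypothetical evaluation results: (segment, true label, predicted label).
records = [
    ("daylight", 1, 1),
    ("daylight", 0, 0),
    ("daylight", 1, 1),
    ("night", 1, 0),
    ("night", 0, 0),
]

per_segment = accuracy_by_segment(records)

# The segment with the lowest accuracy is a good place to start digging.
worst = min(per_segment, key=per_segment.get)
print(per_segment, worst)
```

An overall accuracy figure can hide a segment where the model does badly, so a breakdown like this is often a cheap first check before you go looking at individual examples.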
5. Types of Data Science roles out there
There are three categories of data science roles out there:
1. Product Enhancement with Data Science
This is improving an existing product at a company with ML techniques. It could be ML working behind the scenes of the product, like in Google Search, or it could be fraud detection on credit cards using anomaly detection.
2. ML as a product
In this case, ML serves as the sole product for a particular application. A good example of this is autonomous driving, which relies heavily on techniques like image segmentation running on the fly as the car is driving. The only product is machine learning, as autonomous driving would not be possible without it.
3. Decision making using Data Science
This could be the role of an insights analyst. This is a type of data scientist that’ll be working inside a company to figure out how best to optimise its different processes in order to cut costs and increase revenue. An example of this could be analysing how your clients use the product to figure out the best way to lay out the features in order to improve usage. Another example is using machine learning to figure out whether any cost in the finance department is blowing out of proportion compared to five years ago, which could be a simple form of forecasting.
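As a rough illustration of the finance example, here is a minimal sketch in Python that compares costs per category against five years ago and flags the ones that have grown out of proportion. It is a plain ratio check, not a forecasting model. The function name `flag_cost_blowouts`, the `max_growth` threshold of 1.5 and all of the figures are assumptions made up for this example.

```python
def flag_cost_blowouts(costs_then, costs_now, max_growth=1.5):
    """Return {category: growth ratio} where cost grew by more than max_growth times."""
    flagged = {}
    for category, then in costs_then.items():
        now = costs_now.get(category)
        # Skip categories with no current figure or no usable baseline.
        if now is None or then <= 0:
            continue
        growth = now / then
        if growth > max_growth:
            flagged[category] = round(growth, 2)
    return flagged


# Hypothetical yearly costs per department category.
costs_five_years_ago = {"travel": 100_000, "software": 50_000, "office": 80_000}
costs_today = {"travel": 110_000, "software": 150_000, "office": 90_000}

flagged = flag_cost_blowouts(costs_five_years_ago, costs_today)
print(flagged)
```

In practice you would probably want to account for things like inflation and company growth before calling something a blowout, but even a simple check like this can tell you where to look first.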
Hope this was some helpful insight into the world of data science. If you’re more of the visual type, here is a video I made summarising the above points: