So You Want to Be a Data Scientist?

A career in data science is hot right now. What is a data scientist, and how can you become one?

Very few of us said “I want to grow up to be a data scientist” when we were kids. But now, in the age of Big Data and economic uncertainty, a career in data science is looking mighty attractive. If you like the idea of working with information and earning a good paycheck, read on. If you’re interested in cutting-edge technology, read on. And if you’re just curious about “the sexiest job of the 21st century” – read on. You’ll find out the basics about data science jobs.

What Is a Data Scientist?

A data scientist is a person who specializes in analyzing data and producing actionable insights. They understand the location, format, and dimensions of the data they have. Furthermore, they must know how to use and understand algorithms to uncover patterns hidden within large amounts of data.

Data scientists also need to know the businesses or industries they work in. This domain knowledge helps them a) effectively communicate their findings, and b) solve the following data problems:

  • Prediction
  • Anomaly Detection
  • Personalization
  • Optimization
  • Processing unstructured data

These five problem types are found in many industries, including manufacturing, agriculture, and healthcare. However, the technological, financial, and organizational constraints of each company are different. A data scientist has to know what outcome is expected, and they also must know how to use their tools to produce the best, most accurate results for their company.

And by the way, data scientists make pretty good money: Glassdoor reports that the average US-based data scientist gets paid over $120,000 (100,000€) per year. Plus, there’s room for newcomers: IBM predicts that data science jobs will grow by 28%, with 700,000 new positions created by 2020.

What Does a Data Scientist Need to Know?

The first things you need to be a successful data scientist are curiosity and a willingness to learn. Technical and math skills are also very important. Data scientists must be fluent in database usage and statistics. Database proficiency allows them to understand data storage, integrity, and data types, while SQL skills enable them to solve basic data query issues. Completing a course (or several) on relational databases is a good way to start building your data science foundation.
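To give a feel for what a "basic data query" looks like, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and its rows are all invented for illustration; a real company database will look different.

```python
import sqlite3

# Hypothetical table and rows, made up purely for this example.
conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# A basic query: total spend per customer, largest first.
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in totals:
    print(customer, total)
```

The column types and NOT NULL constraints are small examples of the data types and integrity rules mentioned above.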

In addition, programming skills are essential for those breaking into data science. Many data scientists work with algorithms, which are simply sets of steps that are developed (or programmed) to solve a specific problem. Data science job postings particularly look for programming skills in R and Python. Both of these languages have a rich set of algorithm libraries. You can use them to learn what each algorithm does and what outcome it can produce, which may save you from having to write your own algorithm from scratch.

Tell Me More About Algorithms.

Data science students start off learning about basic classification algorithms. This allows them to write code that teaches a machine to recognize various types of data. For example, imagine that you are teaching someone about shapes. You show them a circle, and you give them 25 cards with images of different shapes and tell the person to put all the circle cards into one pile. You’re teaching the person to classify. This concept applies to classification algorithms: they are just code designed to teach the machine to recognize and categorize something. Data scientists use classification algorithms to differentiate data types, such as texts and images.
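The card-sorting idea can be sketched in a few lines of Python. This is only an illustration, not how any particular product does it: each "card" is described by two made-up features (number of corners and width-to-height ratio), and the classify function is a hypothetical helper that labels a new card with the label of the most similar example it has already seen (a nearest-neighbor approach, one of many classification techniques).

```python
import math

# Hypothetical labelled "cards": (corners, width/height ratio) -> shape name.
training_cards = [
    ((0, 1.0), "circle"),
    ((0, 1.1), "circle"),
    ((3, 1.0), "triangle"),
    ((3, 1.2), "triangle"),
    ((4, 1.0), "square"),
    ((4, 1.1), "square"),
]

def classify(card, examples=training_cards):
    """Return the label of the labelled example closest to `card`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], card)
    )
    return nearest_label

print(classify((0, 1.05)))  # a card with no corners
print(classify((4, 0.95)))  # a card with four corners
```

The labelled examples play the role of the circle you showed the person at the start: the machine can only recognize what it has been shown.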

Once data science students understand classification algorithms, they can begin to work with clustering algorithms. These are used to spot patterns in large amounts of data. They find things that are similar rather than identical (e.g. an oval, as compared to a circle).
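Here is a rough sketch of clustering, using k-means as one common example of the technique (the article does not single out a specific algorithm). Unlike classification, no labels are given: the code only sees made-up measurements (corners, width-to-height ratio) and groups similar shapes together. The k_means function and its simplistic starting guess are assumptions made for this illustration; real libraries choose starting points more carefully.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns a list of k lists of points."""
    centroids = list(points[:k])  # simplistic start: the first k points
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (unchanged if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters

# Hypothetical unlabelled shapes: circles and ovals have 0 corners,
# squares and rectangles have 4; the ratio says how stretched they are.
shapes = [(0, 1.0), (4, 1.0), (0, 1.6), (4, 1.8), (0, 1.3), (4, 1.4)]
round_shapes, cornered_shapes = k_means(shapes, 2)
print(round_shapes)
print(cornered_shapes)
```

With this data, circles and ovals end up in one group and squares and rectangles in the other, even though nobody told the code what a circle is.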

Classification and clustering algorithms are already being used by companies like Google, Amazon, and Facebook. These algorithms help us find information related to a search topic and recommend similar items when we are shopping online.

What’s Driving the Need for Data Scientists?

A recent article in The Economist compared data to “a new oil”, meaning that the economy will soon be driven by data. It could be said that we are on the brink of another industrial revolution. The first industrial revolution was all about machinery; the second was about factories; and the third was about information technology. Artificial intelligence (and therefore data science) looks like it will be the fourth.

What will this mean for the average worker? There might be a few million data jobs worldwide by 2020, but we’re not sure what the impact will be on other jobs. Focusing on data science will probably help you find employment fairly easily.

So if you want a career in data science, it’s high time to start learning about its foundations: databases, programming, math, and statistics. Check out Vertabelo Academy’s courses to learn if data science is for you!

Ramkumar Balasubramanian

Fellow @ Wipro Ltd.