Looking for some advice to build a data science portfolio that will put you ahead of other aspiring data scientists? Don’t miss these useful tips.
Why Have a Portfolio at All?
Even though the demand for data scientists is high, the competition for entry-level positions in this field is tough. It should come as no surprise that companies prefer to hire people with at least some real-world experience in data science. But how do you get this experience before you even get hired for your first data science job?
Well, you don’t actually need to be hired to do data science, and building a data science portfolio is an excellent place to start. Data is all around you. All you need to do is define a problem and demonstrate how good you are at solving it using your data science toolkit.
Creating a Data Science Portfolio That Rocks
So you’ve learned the basics of Python for data science and are looking for a place to start your data science portfolio. But how do you build a really strong portfolio?
Here are my essential tips for building a data science portfolio that will distinguish you from other aspiring data scientists. Let’s dive right in!
1. Build a portfolio around your interests
What are you interested in? Trump’s policies, crime rates across different locales, or maybe the South Park TV show? You can create a data science project for (almost) anything that interests you. Just identify the problem you want to solve (e.g., determining the price of your house for sale) or the question you want to answer (e.g., Who is the most popular character in Game of Thrones?).
Remember: the topic must genuinely interest you. This will motivate you to work hard and go beyond generic analytical tools to find the answers to your burning data questions. And of course, it always shows when people are really passionate about what they’re doing.
2. Pick projects that others will understand
Make sure that the projects in your portfolio aren’t so specific that only experts in the area will be able to follow the story. For instance, you might be very good at chemistry and may decide to analyze how different shampoo ingredients affect a product’s price and reviews. But other people might not like the idea of sifting through esoteric text about sodium laureth sulfates, parabens, and zinc pyrithione.
Of course, if you’re looking for a data science position in a specific niche industry (e.g., chemistry), it would be great to have some specialized projects in your portfolio. But otherwise, you should also consider topics that may interest a broader audience.
3. Avoid common datasets
Commonly available datasets provide a great opportunity to practice newly acquired skills and concepts, so feel free to use them as an exercise. But beyond that, they’re dead horses that have already been thoroughly beaten into their data science graves. So unless you want to get lost in a crowd of job seekers, keep them out of your portfolio.
Besides, when you work with unique datasets and endeavor to solve non-trivial problems, your potential employers can be more confident that each project represents your own work and is not just a copy of somebody else’s code that’s widely available online.
Web scraping is a great way to get a unique dataset. Luckily, Python has a number of libraries that can assist you in getting the most out of the web in a format that’s suitable for analysis. Consider these libraries:
- requests will help you get HTML content.
- BeautifulSoup is great for extracting data from HTML files.
- pandas is a great choice for further data wrangling and analysis.
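To make this concrete, here is a minimal sketch of how the three libraries might fit together. The HTML structure, the class names (item, name, price), and the helper names fetch_html and parse_listings are made up for illustration; a real page will need its own tags and selectors.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML (raises on HTTP errors)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_listings(html: str) -> pd.DataFrame:
    """Extract name/price pairs from hypothetical <li class="item"> elements."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="item"):
        rows.append(
            {
                "name": item.find("span", class_="name").get_text(strip=True),
                "price": float(item.find("span", class_="price").get_text(strip=True)),
            }
        )
    return pd.DataFrame(rows)


# Inline sample (made up) so the sketch runs without network access.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">9.50</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">12.00</span></li>
</ul>
"""

listings = parse_listings(SAMPLE_HTML)
print(listings)
```

In a real project, you would pass the result of fetch_html(url) to parse_listings and adapt the tag and class names to the page you are scraping.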
4. Balance your portfolio with different projects
Employers are looking for a specific set of skills when searching for a data scientist. Use your portfolio to showcase your skills in Python for data science by including different types of projects:
- A data cleaning project will demonstrate how you’re able to use the pandas library for preparing your data for analysis.
- A data visualization project will show your skills in creating appealing yet meaningful visualizations using available Python libraries (matplotlib, seaborn, plotly, cufflinks, bokeh).
- A machine-learning project is needed to demonstrate your skills in supervised and unsupervised learning using the scikit-learn library.
- A story-telling project will verify your ability to derive non-trivial insights from data.
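As an illustration of the first project type, a data cleaning step with pandas might look roughly like the sketch below. The data and the clean helper are invented for this example; the point is only to show the kind of problems (stray whitespace, numbers stored as text, missing values, duplicates) such a project typically handles.

```python
import pandas as pd

# Made-up raw data with typical problems: stray whitespace, inconsistent
# capitalization, numbers stored as text, missing values, and a duplicate row.
raw = pd.DataFrame(
    {
        "city": [" Boston", "boston ", "Denver", "Denver", None],
        "price": ["250000", "250000", "310,000", "not listed", "180000"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize text: trim whitespace and use consistent capitalization.
    out["city"] = out["city"].str.strip().str.title()
    # Remove thousands separators, then convert to numbers; entries that
    # cannot be parsed become NaN instead of raising an error.
    out["price"] = pd.to_numeric(
        out["price"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Drop rows missing either field, then drop exact duplicates.
    out = out.dropna(subset=["city", "price"]).drop_duplicates()
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether to drop or fill missing values depends on the question you are answering, so a portfolio project should explain that choice rather than just apply it.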
Feeling a bit rusty with pandas and matplotlib? Check out our Introduction to Python for Data Science online course to brush up on these essential Python libraries.
5. Participate in competitions
Competitions are quite popular in the data science community. Companies, governments, and researchers often provide datasets to the public that data scientists can then analyze to produce the best models for describing the data and bringing value to the data owners.
By participating in different data science competitions, you’ll be able to:
- Practice your coding and data science skills.
- Assess where you stand compared to other data scientists.
- Demonstrate your achievements to potential employers.
Don’t be afraid to strengthen your portfolio by including links to the leaderboards or mentioning percentile ranks for competitions you did particularly well in.
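If a leaderboard only shows your rank, one simple way to turn it into a percentile-style figure is sketched below. The top_percent helper and the example numbers are hypothetical, and platforms may compute their own percentiles differently.

```python
def top_percent(rank: int, total: int) -> float:
    """Return the share of participants at or above `rank`, as a percentage."""
    if total <= 0 or not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * rank / total


# Hypothetical result: 120th place out of 2,400 teams.
print(f"Top {top_percent(120, 2400):.0f}%")  # prints "Top 5%"
```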
Data science competition platforms are a good place to find such competitions if you’re interested.
6. Check out portfolios of other successful data scientists
It’s always easier to create something when you see good examples. Even after you read tons of write-ups on how to build a perfect data science portfolio, you may still have lots of unanswered questions. How do I put this together? What should the final portfolio look like?
If you feel lost, be sure to check out the portfolios of successful data scientists to get a better idea of what direction to head in. You may be inspired by Sajal Sharma, Donne Martin, or Andrey Lukyanenko.
7. Consider using Jupyter Notebook
Jupyter Notebook lets you easily mix Python code, text, and images in a single document. This environment provides great opportunities for creating visually appealing documents that seamlessly combine your code, visualizations, tables, and explanations. However, depending on your personal preferences, you may choose to work in a Python IDE instead. In the end, find something that you’re comfortable with.
8. Post your code on GitHub
GitHub is a popular place where programmers share their code and project results, and it’s common practice among data scientists to make their personal projects publicly available there. While business projects are usually not open source due to competition considerations, big tech companies like Facebook and Google open-source many of their projects. So, when you make your work public on GitHub, you demonstrate that you belong to the community of data scientists contributing to open-source work.
9. Tell stories with your data
Data science is all about telling stories with data. It’s important to show that you feel comfortable using Python and major data science libraries, but you don’t create plots just to have a pretty picture, and you don’t run machine learning algorithms just to get accurate models. As a data scientist, you should be able to add meaning to your findings, differentiate between what’s important and what isn’t, and elaborate on any interesting insights that you get from your data. Thus, it’s essential that your data science portfolio include a detailed interpretation of each project’s results.
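As a small illustration of moving from a model to an interpretation, the sketch below fits a scikit-learn linear regression and turns the result into a plain-language sentence. The apartment data is made up and deliberately perfectly linear so the numbers are easy to follow; real data will be noisier, and the interpretation should say so.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: apartment size in square meters and monthly rent.
size_m2 = np.array([[30], [45], [60], [75], [90]])
rent = np.array([700, 1000, 1300, 1600, 1900])

model = LinearRegression().fit(size_m2, rent)
slope = model.coef_[0]
r_squared = model.score(size_m2, rent)

# Turn the fitted numbers into a statement a non-specialist can follow.
summary = (
    f"In this sample, each additional square meter is associated with "
    f"about {slope:.0f} more in monthly rent (R^2 = {r_squared:.2f})."
)
print(summary)
```

Note the wording "is associated with": a regression on observational data like this describes a pattern, not a cause.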
10. Start a blog
Beyond a proficiency in Python for data science, hiring managers have another set of very important skills they look for when searching for data scientists: written and oral communication. In fact, your ability to communicate complex machine learning concepts in simple terms predicts how well you’re going to communicate with your teammates and managers. Are you able to explain the results of your machine learning model so that it makes sense to a non-IT person?
Writing a blog is a great way to demonstrate that you really understand what the data is “telling” you and can explain the results to somebody who’s maybe not as familiar with data science. You can use Medium or other blogging platforms to start your data science blog.
11. Update your portfolio
Building a portfolio is an iterative process. As you acquire new skills, discover new tools, or read about another interesting technique, your portfolio should also be updated to reflect your newfound knowledge. Don’t think that you can’t edit your project after you make it public—it’s absolutely acceptable (and common practice) to iterate and improve upon your projects after they’ve been published, especially on GitHub.
Discovered how to create interactive visualizations? Consider enhancing some of your projects with these plots. Learned about another trick that can boost the performance of your machine learning model? Make sure to update the projects in your portfolio accordingly.
Follow these tips, and your data science portfolio will help you land your first data science job much faster. But of course, you first need to become very comfortable with Python for data science and master other essential data science skills.