12 Best Data Science Resources on the Internet


Data science is hot right now. If you want to learn more about it, where should you go? Online, of course! Check out our favorite data science sites. Whether you’re a beginner or a pro, these are the sites you should know.

Not so long ago, if you wanted information on a topic like data science, you had to look for it—either at your local library or at a university. Information was golden, and like gold, it was guarded.

Now though, we have almost too much information. The skill isn’t in finding material. It’s in separating the valuable and useful material from the unimportant and useless. This is especially true with a hot topic like data science. There are plenty of “sources” that aren’t worth the time it takes to read them.

With that in mind, we’ve compiled a list of trusted online repositories of data science knowledge. They can be divided into the following groups:

  1. Machine Learning Competitions
  2. Aggregate Sites
  3. Blogs
  4. Q&A Sites
  5. R & Python
  6. MOOC Courses That Are Worth Attending

Let’s jump right in!

Machine Learning Competitions

You might think that data science competition sites are only for experts, but these sites can also offer a lot of good information for beginners.


Kaggle

Kaggle calls itself “your home of data science and machine learning.” It’s best known for its competitions, but the site also has a lot of other information, including a job board, Kaggle Kernels, publicly available datasets from past competitions, and a discussion forum. Newcomers have plenty of ways to hone their craft. The most interesting, in my opinion, are the completed competitions that have been turned into learning opportunities.

The competitions in Kaggle are based on the principle of crowdsourcing. Once a problem is announced by a company, Kaggle members can start trying to solve it, either alone or in groups. The competition rewards are extremely high, so there is usually an enormous response. Kaggle has more than a million registered members, and several thousand teams are usually involved in each competition! It is one of the largest data communities on the internet.

As mentioned earlier, some closed competitions have been turned into learning opportunities for beginners. There’s no prize, but these are a great way to learn data science. Check out these three:

CrowdANALYTIX

Over the last few years, CrowdANALYTIX has evolved from a crowdsourcing platform into a company specializing in artificial intelligence and deep learning. However, it still has an active community and still opens new challenges from time to time. Currently, CrowdANALYTIX has three contests created for non-experts:

CrowdANALYTIX has nice tutorials for someone with a business background who wants to know a little bit more about statistical modeling and descriptive statistics.

Analytics Vidhya

Analytics Vidhya is an Indian platform that counts more than 60,000 data scientists from 200+ countries as members. Among many other things, this site offers the following channels:

  • Learn – This is great for those getting started with data science. There are suggested learning paths, links to training materials, and blog articles related to data science and Big Data. I especially like the Infographics section, which presents various topics using interesting diagrams and other visual aids.
  • Engage – The main part of this section (for me, anyway) is the Q&A discussion forum. Anyone can join in with a question or an answer. There are also some quality blog articles regularly published on this site.
  • Compete – Hackathons and interactive workshops are the main reason for this platform’s popularity. Analytics Vidhya focuses on predictive modeling competitions. These are based on real-life problems, which makes them a great way to learn the skills data scientists use the most.

Data Science Challenge

Data Science Challenge is a relatively new resource with only a few challenges published so far. However, the challenges are interesting, which is why they deserve to be on this list. The site’s official sponsors are the UK’s Government Office for Science, the Defence Science and Technology Laboratory, and MI5, the Security Service, so all challenges and competitions relate to real-world problems these institutions face. We hope that they won’t stop here and that we will see new projects in the near future. Take a look at their past challenges:

Topcoder

Another interesting resource is Topcoder, a site not only for data science topics and problems but also for high-quality software solutions in general. Topcoder has a wide network made up of developers, designers, testers, and data scientists.

“Topcoder helps companies of all sizes use crowdsourcing to uncover innovative ideas and produce digital solutions — from apps and dashboards to algorithms that help in the fight against cancer. We believe that crowdsourcing democratizes work, because the best ideas and solutions don’t always come from the person with the highest level of education or the most industry experience” (https://www.topcoder.com/about/crowdsourcing/)

Topcoder has more than 1 million members, all open to crowdsourced work. There are approximately 7,000 challenges per year, and 80 million US dollars have been paid to Topcoder community members so far.

Current competitions related to data science can be seen at https://www.topcoder.com/challenges?filter[text]=data%20science.
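The link above carries its search term in the query string: `filter[text]` is the parameter name and `data%20science` is the percent-encoded phrase "data science". As a minimal sketch, the same kind of link can be assembled for other search terms with Python's standard library. The helper names `build_search_url` and `read_search_text` are hypothetical, and whether the site still honors this parameter is an assumption based only on the URL quoted in this article.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base address taken from the URL quoted in the article.
BASE_URL = "https://www.topcoder.com/challenges"


def build_search_url(text: str) -> str:
    """Build a challenge-search link in the same shape as the article's URL.

    Assumption: the site reads the search phrase from `filter[text]`.
    """
    # quote_via=quote encodes spaces as %20 (not '+');
    # safe="[]" keeps the square brackets in the parameter name readable.
    query = urlencode({"filter[text]": text}, safe="[]", quote_via=quote)
    return f"{BASE_URL}?{query}"


def read_search_text(url: str) -> str:
    """Recover the decoded search phrase from such a link."""
    params = parse_qs(urlsplit(url).query)
    return params["filter[text]"][0]


if __name__ == "__main__":
    url = build_search_url("data science")
    print(url)
    print(read_search_text(url))
```

Swapping in another phrase, such as "machine learning", gives a link of the same form for that topic.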

Aggregate Sites

These sites host an incredible amount of information for both the aspiring and professional data scientist.


Data Science Central

Data Science Central is a platform designed for data scientists and Big Data practitioners. It also hosts a lot of information in the form of blog posts. These are organized according to the following channels:

There is also a channel dedicated to data science jobs, and an assortment of webinars are available on the site.

Data Science Central also offers recommendations about books, courses, and other learning methods. The industry’s latest trends are covered in a very understandable and interesting way. Here is just a sample of the content you can find on this site:


KDnuggets

KDnuggets is similar to Data Science Central—it’s a place where you can find a lot of information about data science. However, KDnuggets is organized a bit differently, and it focuses on industry news, opinions and interviews, publicly available datasets, and data science software. There are also pages dedicated to education on this site, including tutorials and courses.

Like Data Science Central, KDnuggets has a job board. They also offer a nice summary of companies according to their area of expertise. Note: Currently, this is limited to US companies and jobs.

Check out the content on KDnuggets. It’s a well-known blog aggregator. Below are some recent top posts:

Blogs

Simply Statistics

Simply Statistics was founded by three biostatistics professors—Jeff Leek, Roger Peng, and Rafa Irizarry. These three are quite famous because of their free Coursera online courses dealing with statistics, data science, and machine learning. They created the Simply Statistics platform to share their ideas and advice on data science and focus on interesting topics they use in their daily work.

This blog is not updated as frequently as some others, but it contains worthwhile posts, articles, and tips. There are also interviews with data scientists that discuss what it’s like to be a data scientist and what their daily work routine looks like.

No Free Hunch

We’ve already mentioned Kaggle. Now meet No Free Hunch, the official Kaggle blog. Most of the articles on this platform are related to Kaggle competitions, but they also cover the following areas:

  • Data Science News
  • Kaggle News
  • Kernels
  • Tutorials
  • Winner’s Interviews

The winner’s interviews section is great. Through these posts, you can get to know experienced Kagglers, their backgrounds, their experiences, and how they go about winning competitions.

Facebook, Airbnb, and Oracle

Companies like Facebook, Airbnb, and Oracle have their own official blog sites related to data science, deep learning, and artificial intelligence. On those sites, you can find cool topics like advanced analytics, computer vision, and neural nets. Here are some topics related to artificial intelligence, deep learning, and cloud architecture:

For more interesting and helpful articles, explore these sites.

Q&A Sites

CrossValidated and Stack Overflow

CrossValidated and Stack Overflow are very similar. In fact, CrossValidated is the sister site to Stack Overflow. Both are question-and-answer sites, and Stack Overflow is visited by more than 50 million developers each month.

Why two Q&A sites? CrossValidated centers around statistics, machine learning, data analysis, data mining, and data visualization. I like to call it Stack Overflow for data scientists. Questions related to R and Python programming can be found on Stack Overflow pages, and questions related to statistical analysis, machine learning, and probability theory will likely be found on CrossValidated.

I cannot imagine doing my daily work without these resources. They’re pulled up on my web browser most of the time.

Quora

Quora is another question-and-answer site where questions are asked, answered, edited, and organized by a community of users. Quora answers questions of all kinds, from cooking to career advice. You can choose specific channels, like technology, or you can search for topics.

Besides questions related to programming or machine learning problems, you can find interesting “general data science” questions on Quora. Here are a few:

These are good topics for someone who is a beginner in this field. Each person who submits an answer must also list their credentials, which is helpful in finding good information.

R & Python

Python.org

Data scientists are generally divided between two languages—some prefer R, others prefer Python. Python.org is for Python developers. This is the official home of the Python programming language. On this site, you can find nearly anything to do with programming in Python—tutorials, documentation, jobs, information about workshops and conferences, the latest news, and upcoming events for Python developers. This is the most important website for data scientists who use Python as their primary programming language.

Python.org is divided into several parts:

  • About – Learn more about Python, including how to get started programming.
  • Downloads – Download the latest Python release and install it on your computer. This section covers all Python releases.
  • Documentation – A detailed and clear introduction to the language, syntax, and semantics of Python, plus documentation related to the standard library.
  • Community – Information for the Python user community.
  • Events – Announcements for upcoming conferences and other events.
  • Success Stories – 41 stories about Python implementations and Python software.
  • News – Includes interviews with those in the Python community.


R-bloggers

R-bloggers offers insightful posts, daily news, and tutorials all about the R programming language. It has more than 750 contributors and over 50,000 followers and is well-known among R developers.

It is handy to have all articles, advice, and best practices related to R in one place, especially when you need help in the middle of the development process.

Although R-bloggers is known primarily for its posts, the site also provides a nice learning path for R beginners. This path is divided into several sections (R Basics, Data Manipulation, Data Visualization, Machine Learning, etc.). Each section points you to relevant resources that range from documentation and online courses to books and other methods. It’s a great way to begin learning and stay engaged.

The Comprehensive R Archive Network (CRAN)

CRAN is a collection of sites that carry identical material (so-called web mirrors) consisting of distribution(s), extensions, and documentation for the R programming language. It’s where you can download the latest official release of R, daily snapshots of R (copies of the current source trees), and a wealth of additional contributed code. Without CRAN’s documentation, it would be nearly impossible to program in R.

It is worth mentioning that the R Development Core team has published some very useful manuals. Beginners and professionals alike can benefit from reviewing these:

MOOC Courses That Are Worth Attending

MOOCs (massive open online courses) also deserve a mention, since they serve both beginners and experts in data science. Today, there are many online courses to choose from, especially courses related to data science and data analysis.

Where do you start, and which courses do you choose? Below are my favorites for beginners and for more advanced users.

For People in Business Who Want to Try out Data Science

Do you want to learn how to enhance your career through data science? Do you want to learn more about SQL, R, and/or Python? Do you want to dive into descriptive statistics and exploratory analysis? I recommend the following:

  1. SQL basics – A Vertabelo Academy course for learning SQL from scratch
  2. Introduction to Python for Data Science – A Vertabelo Academy course for learning how to use Python as a tool in data analysis and data science
  3. Python for data science – An IBM course on Coursera for programming in Python for those who have no programming experience at all
  4. R Level 1 – A Udemy course for beginners who are already in IT, which covers all the important aspects of statistical programming (handling different data types, loops, functions, and visualizations)

For People Who Possess Some SQL, R, and/or Python Skills

If you have already taken the basic courses mentioned above, or if you are familiar with programming in SQL, R, and/or Python, you can continue your learning path with the following:

  1. Data science specialization – Johns Hopkins University courses on Coursera (coding is covered in R)
  2. CS109 Data Science – A Harvard course (data science topics covered in Python)
  3. Statistics and data science – An MIT course on edX (great material where data science topics are covered in Python)
  4. HarvardX's Data Science Professional Certificate – A Harvard course on edX (another data science specialization in R)
  5. Data science - Deep learning with Python – A Lazy Programmer Inc. course on Udemy (the neural net concept is covered in Python)
  6. Machine Learning – A Stanford University course on Coursera (covers concepts like logistic regression and neural nets, and coding is in the Octave programming language)
  7. Become a Data Scientist – A Udacity course (nanodegree program/specialization in Python)

Math Essentials

After completing the courses mentioned above, you will be ready to dive into deep learning and artificial intelligence. First, you will need a good understanding of mathematics. Cover some basic math concepts with the following:

  1. Mathematics for Machine Learning Specialization – An Imperial College London course on Coursera
  2. Mathematical Foundation For Machine Learning and AI – An Eduonix Learning Solutions, Eduonix-Tech course on Udemy

Advanced, Specialized Learning Paths

After mastering the basics through the courses listed above, you will be ready to move on to more complex subjects. Below are some great courses related to artificial intelligence and deep learning:

  1. Deep Learning Specialization – A course on Coursera, created by Coursera co-founder Andrew Ng
  2. Deep Learning: Convolutional Neural Networks in Python – A course on Udemy
  3. Fundamentals of deep learning for computer vision – An Nvidia course
  4. Artificial Intelligence – A Columbia University course on edX
  5. Deep learning – An IBM course on edX

Why Use These Data Science Resources?

There are many web pages, resources, and communities that are devoted to data science. We’ve discussed the most popular sites—the sites with the highest quality content that every data scientist should know. They are an excellent starting point for those just beginning their data science journey. You can get sound advice, find pointers to the best courses, and learn about what materials and books will further your development and growth. Who knows? Maybe you’ll contribute to the development of these communities and inspire someone else to learn about data science!

Marija Ilic

Marija works as a data scientist in the banking industry. She specializes in big data platforms (Cloudera and Hadoop) with software and technologies such as Hive/Impala, Python and PySpark, Kafka, and R. Marija has an extensive background in DWH/ETL development in the banking industry. Her main interests are predictive modeling, real-time decision-making, and social network analysis. Outside of work, Marija enjoys listening to her favorite LPs on her old gramophone—and never grows tired of its soothing crackle.