12 Best Data Science Resources on the Internet


Data science is hot right now. If you want to learn more about it, where should you go? Online, of course! Check out our favorite data science sites. Whether you’re a beginner or a pro, these are sites you should know.

Not so long ago, if you wanted information on a topic like data science, you had to look for it – either at your local library or at a university. Information was golden, and like gold it was guarded.

Now, though, we almost have too much information. The skill isn’t in finding material; it’s in separating the valuable and useful from the unimportant and useless. This is especially true with a hot topic like data science. There are plenty of “sources” that aren’t worth the time it takes to read them.

With that in mind, we’ve compiled a list of trusted online repositories of data science knowledge. They can be divided into the following groups:

  1. Machine Learning Competitions
  2. Aggregate Sites
  3. Blogs
  4. Q&A Sites
  5. R & Python

Let’s jump right in!

Machine Learning Competitions

You might think that data science competitions are just for experts, but these sites can also offer a lot of good information for beginners.


Kaggle

Kaggle calls itself “your home of data science and machine learning”. It’s best known for its competitions, but the site also has a lot of other information, including a job board, Kaggle Kernels, publicly available datasets from past competitions, and a discussion forum. Newcomers have plenty of ways to hone their craft; the most interesting, in my opinion, are the completed competitions that have been turned into learning opportunities.

The competitions on Kaggle are based on the principle of crowdsourcing. Once a company announces a problem, Kaggle members can start trying to solve it, either alone or in teams. The prize money is often substantial, so the response is usually enormous. Kaggle has more than a million registered members, and several thousand teams are often involved in a single competition! It is one of the largest data communities on the Internet.

As we said earlier, some closed competitions have been turned into learning opportunities for beginners. There’s no prize (except knowledge), but these are a great way to learn data science. Check out these three:

CrowdANALYTIX

CrowdANALYTIX may not be as famous as Kaggle – to which it is very similar – but it's near and dear to my heart. Some of my first data visualization and predictive modeling projects won awards on this site, so this community will always be very special to me.

This site features community-held competitions related to data modeling, research, and visualization. However, its approach is slightly different from Kaggle's. Kaggle will usually provide you with a well-prepared dataset, on which you try different Machine Learning (ML) algorithms and then optimize them. The emphasis is on model development and ML algorithms. On CrowdANALYTIX, you cover the entire process of model development: you find the data, do the web scraping and data cleaning, explain your business approach, and (finally) apply the predictive algorithm.
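To make the contrast concrete, here is a toy Python sketch of the two extra stages CrowdANALYTIX asks you to own: cleaning messy, scraped records and then fitting a predictive model. Everything in it is hypothetical and for illustration only. That includes the records, the field names `ad_spend` and `sales`, and the helpers `parse_number`, `clean`, and `fit_line`. A simple least-squares line stands in for whatever algorithm a real contest would call for. On Kaggle, you would typically start at the `fit_line` step with data that is already clean.

```python
from statistics import mean
from typing import Optional

# Hypothetical raw records, as they might look right after web scraping:
# stray whitespace, currency symbols, and missing values.
RAW_RECORDS = [
    {"ad_spend": " 10 ", "sales": "$25"},
    {"ad_spend": "20", "sales": "$45"},
    {"ad_spend": "", "sales": "$30"},    # missing feature -> row dropped
    {"ad_spend": "30", "sales": "n/a"},  # missing target -> row dropped
    {"ad_spend": "40", "sales": "$85"},
]


def parse_number(text: str) -> Optional[float]:
    """Strip whitespace and a leading '$'; return None if not numeric."""
    cleaned = text.strip().lstrip("$")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean(records: list[dict[str, str]]) -> list[tuple[float, float]]:
    """Data-cleaning step: keep only rows where both fields parse."""
    rows = []
    for rec in records:
        x = parse_number(rec["ad_spend"])
        y = parse_number(rec["sales"])
        if x is not None and y is not None:
            rows.append((x, y))
    return rows


def fit_line(rows: list[tuple[float, float]]) -> tuple[float, float]:
    """Modeling step: ordinary least squares for sales = a + b * ad_spend."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


if __name__ == "__main__":
    rows = clean(RAW_RECORDS)
    a, b = fit_line(rows)
    print(f"sales ~ {a:.1f} + {b:.1f} * ad_spend")  # prints: sales ~ 5.0 + 2.0 * ad_spend
```

In a real contest, most of the effort goes into the cleaning stage, and the three lines of modeling would be replaced by a proper ML library.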

Currently, CrowdANALYTIX has three contests geared to non-experts:

Analytics Vidhya

Analytics Vidhya is an Indian platform that counts more than 60,000 data scientists from 200+ countries as members. Among many other things, this site offers the following channels:

  • Learn – Great for those getting started with data science. There are some nice suggested learning paths, links to training materials, and blog articles related to data science and Big Data. I especially like the Infographics section, which presents various topics using interesting diagrams and other visual aids.
  • Engage – The main part of this section (for me, anyway) is the Q&A discussion forum. Anyone can join in with a question or an answer. There are also some quality blog articles regularly published on this site.
  • Compete – Hackathons and interactive workshops are the main reason for this platform's popularity. Analytics Vidhya focuses on predictive modeling competitions. These are based on real-life problems, which makes them a great way to learn the skills we use the most.

Aggregate Sites

These sites host a huge amount of information for the aspiring and professional data scientist.

Data Science Central

Data Science Central is a platform designed for data scientists and Big Data practitioners. It also hosts a lot of information in the form of blog posts. These are organized into the following channels:

There is also a channel dedicated to data science jobs, and an assortment of webinars is available on the site.

Data Science Central also offers recommendations about books, courses, and other learning methods. The industry’s latest trends are covered in a very understandable and interesting way. Here is just a sample of the content you can find on this site:


KDnuggets

KDnuggets is similar to Data Science Central – it's a place where you can find a lot of information about data science. However, KDnuggets is organized a bit differently, and it focuses on industry news, opinions and interviews, publicly available datasets, and data science software. There are also pages and pages dedicated to education on this site, including tutorials and courses.

Like Data Science Central, KDnuggets has a job board. They also offer a nice summary of companies according to their area of expertise. Note: Currently, this is limited to US companies and jobs.

Check out the content on KDnuggets – it’s a well-known blog aggregator. Below are some recent top posts:

Blogs

Simply Statistics

Simply Statistics was founded by three biostatistics professors – Jeff Leek, Roger Peng, and Rafa Irizarry. These three are well known for their free Coursera online courses on statistics, data science, and machine learning. They created the Simply Statistics platform to share their ideas and advice on data science, focusing on interesting topics that they deal with in their daily work.

This blog is not updated as frequently as some others, but it contains worthwhile posts, articles, and tips. There are also interviews with data scientists that discuss what it’s like to be a data scientist and what their daily work routine looks like.

No Free Hunch

We’ve already mentioned Kaggle; now meet No Free Hunch, the official Kaggle blog. Most of the articles on this platform are related to Kaggle competitions, but they also cover the following areas:

  • Data Science News
  • Kaggle News
  • Kernels
  • Tutorials
  • Winner’s Interviews

The winner’s interviews section is great. Through these posts, you can get to know experienced Kagglers – their background, their experience, and how they go about winning competitions.

Q&A Sites

CrossValidated and Stack Overflow

CrossValidated and Stack Overflow are very similar; in fact, CrossValidated is the sister site to Stack Overflow. Both are question-and-answer sites; Stack Overflow is visited by more than 50 million developers each month.

Why two Q&A sites? CrossValidated centers on statistics, machine learning, data analysis, data mining, and data visualization. I like to call it Stack Overflow for data scientists. Questions about R and Python programming belong on Stack Overflow, while questions about statistical analysis, Machine Learning, or probability theory are more likely to be found on CrossValidated.

I cannot imagine my daily work without these resources. They’re pulled up on my web browser most of the time.

Quora

Quora is another question-and-answer site where questions are asked, answered, edited, and organized by a community of users. Quora answers questions of all kinds, from cooking to career advice. You can choose specific channels, like technology, or you can search for topics.

Besides questions related to programming issues or Machine Learning problems, you can find interesting “general data science” questions on Quora. Here are a few:

These are good topics for someone who is a beginner in this field. Many people who answer also list their credentials, so if you look, you can find some really good info here.

R & Python

Python.org

Data scientists are generally divided between two languages – some prefer R, others prefer Python. Python.org, the official home of the Python programming language, is the site for Python developers. Here you can find nearly anything to do with programming in Python – tutorials, documentation, jobs, information about workshops and conferences, the latest news, and upcoming events for Python developers. This is the most important website for data scientists who use Python as their primary programming language.

Python.org is divided into several parts:

  • About – Learn more about Python, including how to get started programming.
  • Downloads – Download the latest Python release and install it on your computer; this section covers all Python releases.
  • Documentation – A detailed and clear introduction to the language, syntax, and semantics of Python, plus documentation related to the standard library.
  • Community – Information for the Python user community.
  • Events – Announces upcoming conferences and other events.
  • Success Stories – 41 stories about Python implementations and Python software.
  • News – Also includes interviews with those in the Python community.


R-bloggers

R-bloggers offers insightful posts, daily news, and tutorials all about the R programming language. It has more than 750 contributors and over 50,000 followers; it’s quite famous among R developers.

It is handy to have all articles, advice, and best practices related to R in one place – especially when you need help and you're in the middle of the development process.

Although R-bloggers is known primarily for its posts, the site also provides a nice learning path for R beginners. This path is divided into several sections (R Basics, Data Manipulation, Data Visualization, Machine Learning, etc.). Each section points you to relevant resources that range from documentation and online courses to books and other methods. It’s a great way to begin learning and stay engaged.

The Comprehensive R Archive Network (CRAN)

CRAN is a collection of sites which carry identical material (so-called web mirrors) consisting of distribution(s), extensions, and documentation for the R programming language. It’s where you can download the latest official release of R, daily snapshots of R (copies of the current source trees), and a wealth of additional contributed code. Without CRAN’s documentation, it would be nearly impossible to program in R.

It is worth mentioning that the R Development Core team has published some very useful manuals. Beginners and pros alike can benefit from reviewing these:

Why Use These Data Science Resources?

There are many webpages, resources, and communities devoted to data science. This post listed the sites with the highest-quality content, the most popular sites that every data scientist should know. They are an excellent starting point for those just beginning their data science journey. You can pick up sound advice, find pointers to the best courses, and learn which materials and books will further your development and growth. Who knows? Maybe you'll contribute to the development of these communities and inspire someone else to learn about data science.

Marija works as a data scientist in the banking industry. She specializes in big data platforms (Cloudera and Hadoop) with software and technologies such as Hive/Impala, Python and PySpark, Kafka, and R. Marija has an extensive background in DWH/ETL development in the banking industry. Her main interests are predictive modeling, real-time decision-making, and social network analysis. Outside of work, Marija enjoys listening to her favorite LPs on her old gramophone—and never grows tired of its soothing crackle.
