12 Best Data Science Resources on the Internet
Data science is hot right now. If you want to learn more about it, where should you go? Online, of course! Check out our favorite data science sites. Whether you’re a beginner or a pro, these are the sites you should know.
Not so long ago, if you wanted information on a topic like data science, you had to look for it—either at your local library or at a university. Information was golden, and like gold, it was guarded.
Now though, we have almost too much information. The skill isn’t in finding material. It’s in separating the valuable and useful material from the unimportant and useless. This is especially true with a hot topic like data science. There are plenty of “sources” that aren’t worth the time it takes to read them.
With that in mind, we’ve compiled a list of trusted online repositories of data science knowledge. They can be divided into the following groups:
- Machine Learning Competitions
- Aggregate Sites
- Blogs
- Q&A Sites
- R & Python
- MOOC Courses That Are Worth Attending
Let’s jump right in!
Machine Learning Competitions
You might think that data science competition sites are only for experts, but these sites can also offer a lot of good information for beginners.
Kaggle
Kaggle calls itself “your home of data science and machine learning.” It’s best known for its competitions, but the site also has a lot of other information, including a job board, Kaggle Kernels, publicly available datasets from past competitions, and a discussion forum. Newcomers have plenty of ways to hone their craft. The most interesting, in my opinion, are the completed competitions that have been turned into learning opportunities.
The competitions in Kaggle are based on the principle of crowdsourcing. Once a problem is announced by a company, Kaggle members can start trying to solve it, either alone or in groups. The competition rewards are extremely high, so there is usually an enormous response. Kaggle has more than a million registered members, and several thousand teams are usually involved in each competition! It is one of the largest data communities on the internet.
As mentioned earlier, some closed competitions have been turned into learning opportunities for beginners. There’s no prize, but these are a great way to learn data science. Check out these three:
- Titanic: Machine Learning from Disaster
- House Prices: Advanced Regression Techniques
- Digit Recognizer
CrowdANALYTIX
Over the last few years, CrowdANALYTIX has evolved from a crowdsourcing platform into a company specializing in artificial intelligence and deep learning. However, it still has an active community and still opens new challenges from time to time. Currently, CrowdANALYTIX has three contests created for non-experts:
- Business Analytics for Beginners Using R–Part I
- Business Analytics for Beginners Using R–Part II
- Business Analytics for Beginners Using R–Part III
CrowdANALYTIX has nice tutorials for someone with a business background who wants to know a little bit more about statistical modeling and descriptive statistics.
Analytics Vidhya
Analytics Vidhya is an Indian platform that counts more than 60,000 data scientists from 200+ countries as members. Among many other things, this site offers the following channels:
- Learn – This is great for those getting started with data science. It offers suggested learning paths, links to training materials, and blog articles related to data science and Big Data. I especially like the Infographics section, which presents various topics using interesting diagrams and other visual aids.
- Engage – The main part of this section (for me, anyway) is the Q&A discussion forum. Anyone can join in with a question or an answer. There are also some quality blog articles regularly published on this site.
- Compete – Hackathons and interactive workshops are the main reason for this platform’s popularity. Analytics Vidhya focuses on predictive modeling competitions. These are based on real-life problems, which makes them a great way to learn the skills data scientists use the most.
Data Science Challenge
Data Science Challenge is a relatively new resource with only a few challenges published so far. However, the challenges are interesting, which is why they deserve to be on this list. The official sponsors of this site are the UK’s Government Office for Science, Defence Science and Technology, and MI5, The Security Service. As a result, all challenges and competitions are related to the real-world problems these institutions face. We hope that they won’t stop here and that we will see new projects in the near future. Take a look at their past challenges:
- https://www.datasciencechallenge.org/challenges/2/growing-instability
- https://www.datasciencechallenge.org/challenges/1/safe-passage
Topcoder
Another interesting resource is Topcoder, a site not only for data science topics and problems but also for high-quality software solutions in general. Topcoder has a wide network made up of developers, designers, testers, and data scientists.
“Topcoder helps companies of all sizes use crowdsourcing to uncover innovative ideas and produce digital solutions — from apps and dashboards to algorithms that help in the fight against cancer. We believe that crowdsourcing democratizes work, because the best ideas and solutions don’t always come from the person with the highest level of education or the most industry experience” (https://www.topcoder.com/about/crowdsourcing/)
Topcoder has more than 1 million members, all available for crowdsourced work. There are approximately 7,000 challenges per year, and 80 million US dollars have been paid to Topcoder community members so far.
Current competitions related to data science can be seen at https://www.topcoder.com/challenges?filter[text]=data%20science.
Aggregate Sites
These sites host an incredible amount of information for both the aspiring and professional data scientist.
Data Science Central
Data Science Central is a platform designed for data scientists and Big Data practitioners. It also hosts a lot of information in the form of blog posts. These are organized according to the following channels:
- Hadoop
- Big Data
- AnalyticBridge (for data analysts and Business Intelligence experts)
- Deep Learning
- AI
- Data Visualization
There is also a channel dedicated to data science jobs, and an assortment of webinars are available on the site.
Data Science Central also offers recommendations about books, courses, and other learning methods. The industry’s latest trends are covered in a very understandable and interesting way. Here is just a sample of the content you can find on this site:
- Some Thoughts on Mid-Career Switching Into Data Science
- Statistics is Dead?–Long Live Data Science
- Time Series Classification with Tensorflow
KDnuggets
KDnuggets is similar to Data Science Central—it’s a place where you can find a lot of information about data science. However, KDnuggets is organized a bit differently, and it focuses on industry news, opinions and interviews, publicly available datasets, and data science software. There are also pages dedicated to education on this site, including tutorials and courses.
Like Data Science Central, KDnuggets has a job board. They also offer a nice summary of companies according to their area of expertise. Note: Currently, this is limited to US companies and jobs.
Check out the content on KDnuggets. It’s a well-known blog aggregator. Below are some recent top posts:
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets
- Introduction to Blockchains & What It Means to Big Data
- Understanding Machine Learning Algorithms
Blogs
Simply Statistics
Simply Statistics was founded by three biostatistics professors—Jeff Leek, Roger Peng, and Rafa Irizarry. These three are quite famous because of their free Coursera online courses dealing with statistics, data science, and machine learning. They created the Simply Statistics platform to share their ideas and advice on data science and focus on interesting topics they use in their daily work.
This blog is not updated as frequently as some others, but it contains worthwhile posts, articles, and tips. There are also interviews with data scientists that discuss what it’s like to be a data scientist and what their daily work routine looks like.
No Free Hunch
We’ve already mentioned Kaggle. Now meet No Free Hunch, the official Kaggle blog. Most of the articles on this platform are related to Kaggle competitions, but they also cover the following areas:
- Data Science News
- Kaggle News
- Kernels
- Tutorials
- Winner’s Interviews
The winner’s interviews section is great. Through these posts, you can get to know experienced Kagglers, their backgrounds, their experiences, and how they go about winning competitions.
Facebook, Airbnb, and Oracle
Companies like Facebook, Airbnb, and Oracle have their own official blog sites related to data science, deep learning, and artificial intelligence. On those sites, you can find cool topics like advanced analytics, computer vision, and neural nets. Here are some topics related to artificial intelligence, deep learning, and cloud architecture:
- Open-sourcing PyRobot to accelerate AI robotics research (Facebook)
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb (Airbnb)
- How Did an IT Services Provider Save Their Bacon by Switching to Oracle Cloud Infrastructure? (Oracle)
For more interesting and helpful articles, explore these sites.
Q&A Sites
CrossValidated and Stack Overflow
CrossValidated and Stack Overflow are very similar. In fact, CrossValidated is the sister site to Stack Overflow. Both are question-and-answer sites, and Stack Overflow is visited by more than 50 million developers each month.
Why two Q&A sites? CrossValidated centers around statistics, machine learning, data analysis, data mining, and data visualization. I like to call it Stack Overflow for data scientists. Questions related to R and Python programming can be found on Stack Overflow pages, and questions related to statistical analysis, machine learning, and probability theory will likely be found on CrossValidated.
I cannot imagine doing my daily work without these resources. They’re pulled up on my web browser most of the time.
Quora
Quora is another question-and-answer site where questions are asked, answered, edited, and organized by a community of users. Quora answers questions of all kinds, from cooking to career advice. You can choose specific channels, like technology, or you can search for topics.
Besides questions related to programming or machine learning problems, you can find interesting “general data science” questions on Quora. Here are a few:
- How Can I Become a Data Scientist?
- Why Is Python a Language of Choice for Data Scientists?
- Machine Learning: Is Machine Learning a Field Best Suited for Geniuses? Should I Bother Trying to Pursue It?
- What Programming Language Is Best for Machine Learning and Statistical Analysis? Is it R or Python?
These are good topics for someone who is a beginner in this field. Each person who submits an answer must also list their credentials, which is helpful in finding good information.
R & Python
Python.org
Data scientists are generally divided between two languages—some prefer R, others prefer Python. Python.org is for Python developers. This is the official home of the Python programming language. On this site, you can find nearly anything to do with programming in Python—tutorials, documentation, jobs, information about workshops and conferences, the latest news, and upcoming events for Python developers. This is the most important website for data scientists who use Python as their primary programming language.
Python.org is divided into several parts:
- About – Learn more about Python, including how to get started programming.
- Downloads – Download the latest Python release and install it on your computer. This section covers all Python releases.
- Documentation – A detailed and clear introduction to the language, syntax, and semantics of Python, plus documentation related to the standard library.
- Community – Information for the Python user community.
- Events – Announcements for upcoming conferences and other events.
- Success Stories – 41 stories about Python implementations and Python software.
- News – Includes interviews with those in the Python community.
R-bloggers
R-bloggers offers insightful posts, daily news, and tutorials all about the R programming language. It has more than 750 contributors and over 50,000 followers and is well-known among R developers.
It is handy to have all articles, advice, and best practices related to R in one place, especially when you need help in the middle of the development process.
Although R-bloggers is known primarily for its posts, the site also provides a nice learning path for R beginners. This path is divided into several sections (R Basics, Data Manipulation, Data Visualization, Machine Learning, etc.). Each section points you to relevant resources that range from documentation and online courses to books and other methods. It’s a great way to begin learning and stay engaged.
The Comprehensive R Archive Network (CRAN)
CRAN is a collection of sites that carry identical material (so-called web mirrors) consisting of distribution(s), extensions, and documentation for the R programming language. It’s where you can download the latest official release of R, daily snapshots of R (copies of the current source trees), and a wealth of additional contributed code. Without CRAN’s documentation, it would be nearly impossible to program in R.
It is worth mentioning that the R Development Core team has published some very useful manuals. Beginners and professionals alike can benefit from reviewing these:
- An Introduction to R – An introduction to the language and to using R for statistical analysis and graphics.
- R Data Import/Export – Describes the import and export facilities available either in R itself or via packages available on CRAN.
- R Installation and Administration – How to install R.
- The R Language Definition – Details of the expression evaluation process, which is useful to know when you’re programming R functions.
MOOC Courses That Are Worth Attending
MOOCs (massive open online courses) are also worth mentioning. They serve both beginners and experts in data science. Today, there are many online courses to choose from, especially in data science and data analysis.
Where do you start, and which courses do you choose? Below are my favorites for beginners and for more advanced users.
For People in Business Who Want to Try out Data Science
Do you want to learn how to enhance your career through data science? Do you want to learn more about SQL, R, and/or Python? Do you want to dive into descriptive statistics and exploratory analysis? I recommend the following:
- SQL basics – A Vertabelo Academy course for learning SQL from scratch
- Introduction to Python for Data Science – A Vertabelo Academy course for learning how to use Python as a tool in data analysis and data science
- Python for data science – An IBM course on Coursera for programming in Python for those who have no programming experience at all
- R Level 1 – A Udemy course for beginners who are already in IT, which covers all the important aspects of statistical programming (handling different data types, loops, functions, and visualizations)
For People Who Possess Some SQL, R, and/or Python Skills
If you have already taken the basic courses mentioned above, or if you are familiar with programming in SQL, R, and/or Python, you can continue your learning path with the following:
- Data science specialization – Johns Hopkins University courses on Coursera (coding is covered in R)
- CS109 Data Science – A Harvard course (data science topics covered in Python)
- Statistics and data science – An MIT course on edX (great material where data science topics are covered in Python)
- HarvardX's Data Science Professional Certificate – A Harvard course on edX (another data science specialization in R)
- Data science - Deep learning with Python – A Lazy Programmer Inc. course on Udemy (the neural net concept is covered in Python)
- Machine Learning – A Stanford University course on Coursera (covers concepts like logistic regression and neural nets, and coding is in the Octave programming language)
- Become a Data Scientist – A Udacity course (nanodegree program/specialization in Python)
Math Essentials
After completing the courses mentioned above, you will be ready to dive into deep learning and artificial intelligence. First, you will need a good understanding of mathematics. Cover some basic math concepts with the following:
- Mathematics for Machine Learning Specialization – An Imperial College London course on Coursera
- Mathematical Foundation For Machine Learning and AI – An Eduonix Learning Solutions, Eduonix-Tech course on Udemy
Advanced, Specialized Learning Paths
After mastering the basics through the courses listed above, you will be ready to move on to more complex subjects. Below are some great courses related to artificial intelligence and deep learning:
- Deep Learning Specialization – A course on Coursera, created by Coursera co-founder Andrew Ng
- Deep Learning: Convolutional Neural Networks in Python – A course on Udemy
- Fundamentals of deep learning for computer vision – An Nvidia course
- Artificial Intelligence – A Columbia University course on edX
- Deep learning – An IBM course on edX
Why Use These Data Science Resources?
There are many web pages, resources, and communities that are devoted to data science. We’ve discussed the most popular sites—the sites with the highest quality content that every data scientist should know. They are an excellent starting point for those just beginning their data science journey. You can get sound advice, find pointers to the best courses, and learn about what materials and books will further your development and growth. Who knows? Maybe you’ll contribute to the development of these communities and inspire someone else to learn about data science!