14. Causation or correlation?


So the correlation coefficient for our data is fairly high: it equals 0.6. We applied a hypothesis test for the correlation coefficient, and the result turned out to be significant: there is some positive correlation between a country's wealth and its alcohol consumption. The details of this test aren't important for this course, but you can find them in any statistics handbook.
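For readers who want to peek at those details anyway, here is a minimal sketch of one common version of the test. It assumes Pearson's correlation coefficient (the lesson doesn't say which coefficient was used) and computes the usual statistic t = r * sqrt((n - 2) / (1 - r^2)), which follows Student's t distribution with n - 2 degrees of freedom when there is no real correlation. Because Python's standard library has no t distribution, the p-value here comes from a permutation test instead, which is a swapped-in alternative; `scipy.stats.pearsonr` would give the classical p-value. The data are hypothetical toy values, not the course's country dataset, and the function names are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """Statistic for H0: no correlation; Student's t with n - 2 df under H0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles of ys whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # small tolerance so floating-point ties count as "at least as extreme"
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data, NOT the wealth/alcohol data from the course.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
p = permutation_p_value(xs, ys)
print(f"r = {r:.3f}, t = {t:.3f}, permutation p = {p:.3f}")
```

The permutation test asks a plain question: if the pairing between the two variables were random, how often would a correlation at least this strong appear by chance?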

Based on this high and significant correlation, can we assume that when a country gets richer, it will automatically consume more alcohol? Does this correlation indicate a causal connection between the two variables (i.e., that a change in one variable causes a change in the other)?

No, we can't. A correlation can give us clues about a possible relationship in the data, but it doesn't guarantee that the relationship is causal. There are usually unmeasured or unknown factors that influence both variables. In this case, alcohol consumption may also be influenced by climate, religion, and culture. You cannot claim a causal connection without first analyzing such factors. What you see may be the combined effect of these factors on both variables rather than a direct relationship between them.
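A small simulation makes the "unknown factor" argument concrete. The sketch below assumes a hypothetical hidden factor `z` (think of it as a stand-in for something like culture or climate) that drives both `x` and `y`, while `x` and `y` never affect each other. They still correlate (about 0.5 in expectation), and the correlation vanishes once `z` is controlled for by correlating the residuals of each variable after regressing it on `z`, which is the partial correlation. All numbers are simulated; nothing here comes from the course data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(values, predictor):
    """Residuals of a simple least-squares regression of values on predictor."""
    n = len(values)
    mean_v = sum(values) / n
    mean_p = sum(predictor) / n
    slope = sum((p - mean_p) * (v - mean_v) for p, v in zip(predictor, values)) / sum(
        (p - mean_p) ** 2 for p in predictor
    )
    intercept = mean_v - slope * mean_p
    return [v - (intercept + slope * p) for v, p in zip(values, predictor)]


rng = random.Random(42)
n = 5000
# Hypothetical hidden factor z; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw_r = pearson_r(x, y)  # clearly positive, purely because of z
partial_r = pearson_r(residuals(x, z), residuals(y, z))  # close to 0 once z is removed
print(f"corr(x, y) = {raw_r:.2f}, corr(x, y | z) = {partial_r:.2f}")
```

In real data the catch is that `z` is often unmeasured or unknown, so it cannot simply be regressed out.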

So remember:

Correlation doesn't imply causation.

Such high but non-causal correlations are sometimes called "spurious correlations". Tyler Vigen has collected some examples of absurd correlations that illustrate this problem.
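One common way such correlations arise is a shared trend: two quantities that both grow over time will correlate strongly even if they have nothing to do with each other. The sketch below illustrates that one mechanism with two made-up, independently generated series (it does not reproduce any of Vigen's examples, and not every spurious correlation comes from a trend). Taking first differences, i.e. year-over-year changes, removes the shared trend, and the correlation typically collapses.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(7)
years = list(range(20))
# Two hypothetical series that both rise over time but use independent noise.
a = [t + rng.gauss(0, 1) for t in years]
b = [2 * t + rng.gauss(0, 2) for t in years]

trend_r = pearson_r(a, b)  # high, driven by the common upward trend

# First differences (change from one year to the next) strip out the trend.
diff_a = [a[i + 1] - a[i] for i in range(len(a) - 1)]
diff_b = [b[i + 1] - b[i] for i in range(len(b) - 1)]
diff_r = pearson_r(diff_a, diff_b)  # no longer driven by the trend

print(f"levels: r = {trend_r:.2f}, year-over-year changes: r = {diff_r:.2f}")
```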

The exercise on the right contains some real data with very strong correlations. See if you can recognize which relationships may be causal and which are probably coincidental.