This is a good place to talk about comparing percentages between groups. We'll discuss this in detail in another course, but there's a very common problem related to this that we want to point out now.
Until now, we've visualized 'part of a whole' problems for one group at a time (a single country, France). Suppose we want to compare alcohol consumption by beverage category for two countries: France and the Russian Federation. To do this, we append the Russian Federation's rows to the dataset we have for France. The new dataset looks like this:
             country beverage consumption   percent
1             France     Beer        2.20 18.803419
2             France     Wine        6.60 56.410256
3             France  Spirits        2.70 23.076923
4             France    Other        0.20  1.709402
5 Russian Federation     Beer        4.11 37.194570
6 Russian Federation     Wine        1.28 11.583710
7 Russian Federation  Spirits        5.53 50.045249
8 Russian Federation    Other        0.13  1.176471
As you can see, the first four rows are exactly the same as in our france_beverages dataset. The last four contain similar data for the Russian Federation. (The percent column was already calculated separately for each country.)
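To make "calculated separately for each country" concrete, here is a minimal sketch of that calculation in plain Python. The consumption figures are copied from the table above; the variable names (`rows`, `totals`, `percent_rows`) are illustrative assumptions and not part of the course's own code.

```python
from collections import defaultdict

# Consumption figures copied from the table above.
rows = [
    ("France", "Beer", 2.20),
    ("France", "Wine", 6.60),
    ("France", "Spirits", 2.70),
    ("France", "Other", 0.20),
    ("Russian Federation", "Beer", 4.11),
    ("Russian Federation", "Wine", 1.28),
    ("Russian Federation", "Spirits", 5.53),
    ("Russian Federation", "Other", 0.13),
]

# First, total the consumption within each country.
totals = defaultdict(float)
for country, _beverage, consumption in rows:
    totals[country] += consumption

# Then express each beverage as a share of its own country's total,
# not of the combined total for both countries.
percent_rows = [
    (country, beverage, consumption, 100 * consumption / totals[country])
    for country, beverage, consumption in rows
]

for country, beverage, consumption, percent in percent_rows:
    print(f"{country:<20}{beverage:<9}{consumption:>5.2f}{percent:>11.6f}")
```

Dividing by the per-country total is what makes each country's four percentages add up to 100, which is the property every 'part of a whole' chart relies on.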
The most straightforward solution for visualizing these data for two groups (here, the Russian Federation and France) is to use two instances of the same chart. Since we've been discussing the pie chart, we might use two pie charts, side by side. Or we could use the 100% stacked bar chart, with two bars plotted on the graph instead of just one. Or perhaps we could just add more bars to a classic bar chart. At first glance, all of these make sense. But have a look at the extended charts in the accompanying exercise before you make a decision.
Unfortunately, not all of these extended versions of the charts work equally well for comparing groups. You'll understand why after completing the exercise. Now, though, we want to be very clear on one thing: using multiple pie charts to compare groups is the worst solution.
Never use pie charts to compare two or more groups.
Comparing percentages for two or more groups using multiple pie charts means you must estimate the size of related wedges without a common reference. You must find the related wedges on each chart (e.g. the 'Wine' category) and try to determine which is larger, which is nearly impossible because each wedge starts in a different place and the charts share no baseline. What's worse, you can't create one: there is no common axis or legend you could add to the charts to fix this problem.
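A short calculation shows how far apart the related wedges end up. The sketch below assumes each pie is drawn in table order (Beer, Wine, Spirits, Other) starting from the same zero angle; real charting tools may use a different start angle or direction, but the mismatch between the two pies stays the same. The helper `wedge_angles` is a hypothetical name introduced for illustration.

```python
# Percent values copied from the table above, in the order the wedges are drawn.
percents = {
    "France": {
        "Beer": 18.803419, "Wine": 56.410256,
        "Spirits": 23.076923, "Other": 1.709402,
    },
    "Russian Federation": {
        "Beer": 37.194570, "Wine": 11.583710,
        "Spirits": 50.045249, "Other": 1.176471,
    },
}


def wedge_angles(shares):
    """Return {category: (start_angle, end_angle)} in degrees for one pie."""
    angles = {}
    start = 0.0
    for category, percent in shares.items():
        sweep = percent * 3.6  # 1% of a full circle is 3.6 degrees
        angles[category] = (start, start + sweep)
        start += sweep
    return angles


for country, shares in percents.items():
    start, end = wedge_angles(shares)["Wine"]
    print(f"{country}: Wine wedge runs from {start:.1f} to {end:.1f} degrees")
```

Because the 'Wine' wedge begins wherever the 'Beer' wedge ends, its starting angle depends on a different category's value in each country, so the two wedges are never aligned for comparison.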
It's much better to use another chart, such as the grouped bar chart, to visualize 'part of a whole' data for more than one group. Every bar starts from the same baseline, so it is much easier to draw accurate conclusions.
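As a rough illustration, a grouped bar chart for these two countries could be sketched with matplotlib as follows. This is one possible approach under stated assumptions, not the course's own plotting code: the choice of matplotlib, the bar width, and the axis label are all illustrative, and the percent values are copied from the table above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

beverages = ["Beer", "Wine", "Spirits", "Other"]
percent = {
    "France": [18.803419, 56.410256, 23.076923, 1.709402],
    "Russian Federation": [37.194570, 11.583710, 50.045249, 1.176471],
}

bar_width = 0.4  # two bars per category, so each takes a bit less than half a slot
fig, ax = plt.subplots()

for i, (country, values) in enumerate(percent.items()):
    # Shift each country's bars sideways so they sit next to each other
    # around the category's tick position.
    positions = [x + (i - 0.5) * bar_width for x in range(len(beverages))]
    ax.bar(positions, values, width=bar_width, label=country)

ax.set_xticks(list(range(len(beverages))))
ax.set_xticklabels(beverages)
ax.set_ylabel("Share of total consumption (%)")
ax.set_ylim(0, 100)  # one shared percentage axis for both countries
ax.legend()
```

Calling `fig.savefig(...)` with a file name of your choice, or `plt.show()` in an interactive session, displays the result. All eight bars rise from zero on a single shared axis, so the two 'Wine' bars can be compared directly by height.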