Now that we know what problem we'll be dealing with in this part of the course, let's look at the data we'll work with.
In this course, we will work with real data from the World Health Organization (WHO), the United Nations agency responsible for international public health. Our data comes from the WHO's 2010 statistics on alcohol consumption by country.¹
Why 2010? It is the most recent year with the most complete data. Some countries collect and report their statistics every year; others do so only every five or ten years. More countries reported data for 2010 than for any other year, so that is the year we'll use.
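The reasoning behind choosing a year can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes a hypothetical `reports` mapping from each country to the years it reported data (the countries and years are made-up placeholders, not WHO figures), counts how many countries reported in each year, and picks the year with the highest count, preferring the most recent year when counts tie.

```python
from collections import Counter

# Hypothetical reporting records: country -> years for which it reported data.
# These are made-up placeholders, not real WHO figures.
reports = {
    "Country A": [2005, 2010, 2015],
    "Country B": [2010],
    "Country C": [2000, 2010],
    "Country D": [2005, 2015],
}

def best_year(reports):
    """Return the year reported by the most countries; ties go to the most recent year."""
    # set() makes sure a country is counted at most once per year
    counts = Counter(year for years in reports.values() for year in set(years))
    # Compare by (number of reporting countries, year) so later years win ties
    return max(counts, key=lambda year: (counts[year], year))

print(best_year(reports))
```

With these placeholder records, three countries reported for 2010 and at most two for any other year, so the sketch selects 2010.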
Our goal in this chapter is to use data to answer the following questions:
- How many patterns of alcohol consumption are defined in the data?
- How many countries fit into each pattern?
- Is the distribution of patterns dominated by one category, or are countries spread evenly across all the categories?
- If the distribution is dominated by one category, which one is the most frequent?
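To make the four questions concrete, here is a rough Python sketch of how they could be answered once each country has a drinking-pattern category. It is only an illustration under stated assumptions: the `patterns` list is hypothetical placeholder data (not WHO values), each country is assumed to appear once, and the rule that a category "dominates" when it holds more than half of the countries is an arbitrary choice exposed as the hypothetical `dominance_threshold` parameter.

```python
from collections import Counter

# Hypothetical (country, pattern category) pairs; placeholders, not real WHO values.
# Each country is assumed to appear exactly once.
patterns = [
    ("Country A", 2), ("Country B", 3), ("Country C", 2), ("Country D", 4),
    ("Country E", 2), ("Country F", 1), ("Country G", 2),
]

def summarize(pairs, dominance_threshold=0.5):
    """Answer the four questions for a non-empty list of (country, pattern) pairs."""
    counts = Counter(pattern for _, pattern in pairs)  # countries per pattern
    total = sum(counts.values())
    top_pattern, top_count = counts.most_common(1)[0]
    return {
        # How many patterns appear in the data?
        "n_patterns": len(counts),
        # How many countries fit into each pattern?
        "countries_per_pattern": dict(sorted(counts.items())),
        # Does one category dominate? (assumed rule: share above the threshold)
        "dominated": top_count / total > dominance_threshold,
        # Are countries spread evenly, i.e. does every pattern have the same count?
        "evenly_spread": len(set(counts.values())) == 1,
        # Which pattern is the most frequent?
        "most_frequent": top_pattern,
    }

print(summarize(patterns))
```

In this toy data, category 2 holds four of the seven countries, so it counts as dominant under the assumed threshold; with real data, the threshold is a judgment call worth stating explicitly.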
In this part, we'll create a chart to show the answers. Along the way, we'll learn how to identify the best variables for our purposes, how to define the problem we want to solve, and how to prepare the data and choose a chart.
1. Source: World Health Organization. Recorded alcohol per capita consumption. Patterns of drinking score by country. The data was retrieved on July 1, 2017.