8. Step 1 - recognize the problem

Instruction

We'll start by learning how to visualize the distribution of a categorical variable. In this case, that's the pattern variable.

Do you remember what a distribution is? For a categorical variable, the distribution describes how often each value occurs in the dataset. All values may occur roughly the same number of times, in which case they are “equally distributed”. Alternatively, a few values may occur very frequently while the rest occur less often. We would say that these very frequent values “dominate the distribution”.
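
To make the two cases concrete, here is a minimal Python sketch that compares an equally distributed variable with one dominated by a single value. The lists `even` and `skewed` and the helper `shares` are made up for illustration; they are not taken from the WHO dataset.

```python
from collections import Counter

def shares(values):
    """Return each distinct value's share of all observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical toy data, not taken from the WHO dataset.
even = ["a", "b", "c", "d"] * 5        # every value occurs 5 times
skewed = ["a"] * 14 + ["b", "c"] * 3   # "a" occurs far more often than "b" or "c"

even_shares = shares(even)
skewed_shares = shares(skewed)

print(even_shares)    # all four shares are equal
print(skewed_shares)  # "a" has by far the largest share
```

In the first list every value has the same share, so no value stands out. In the second list a single value accounts for most observations, which is what "dominating the distribution" means here.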

To describe the distribution of the pattern variable, we first have to ask the right questions. Let's take the general questions about distributions that we asked earlier and make them specific to the pattern variable:

  • How many patterns of alcohol consumption are defined by the WHO?
  • How many countries fit into each pattern?
  • Is the distribution of patterns dominated by one category? Or are all pattern categories equally distributed?
  • If the distribution is dominated by one category, which one is the most frequent?
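
Before drawing any chart, the questions above can be answered with a simple frequency count. The sketch below assumes the data is available as (country, pattern) pairs. The `records` list, the country names, and the pattern labels are hypothetical placeholders, and the "more than half" threshold for dominance is an assumed rule of thumb, not a definition from the lesson.

```python
from collections import Counter

# Hypothetical (country, pattern) records; names and labels are placeholders,
# not the real WHO data.
records = [
    ("Country A", "pattern 1"),
    ("Country B", "pattern 1"),
    ("Country C", "pattern 2"),
    ("Country D", "pattern 1"),
    ("Country E", "pattern 3"),
    ("Country F", "pattern 1"),
    ("Country G", "pattern 2"),
    ("Country H", "pattern 1"),
]

# How many countries fit into each pattern?
pattern_counts = Counter(pattern for _, pattern in records)

# How many distinct patterns appear in the data?
n_patterns = len(pattern_counts)

# Which pattern is the most frequent, and how large is its share?
top_pattern, top_count = pattern_counts.most_common(1)[0]
top_share = top_count / len(records)

# Assumption: treat a category as dominant if it covers more than half
# of all countries. Other thresholds are equally defensible.
is_dominated = top_share > 0.5

print(n_patterns, dict(pattern_counts))
print(top_pattern, top_share, is_dominated)
```

A bar chart of `pattern_counts` would show the same information visually: one bar per pattern, with the bar height giving the number of countries.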