We'll start out by learning how to visualize the distribution of the categorical variable. In this case, that's the pattern variable.
Do you remember what a distribution is? Here, it refers to how often each value of the variable occurs across its range of possible values. All values may occur roughly the same number of times, in which case they are “equally distributed”. Alternatively, a few values may occur very frequently in the dataset while the rest occur less often. We would say that these very frequent values “dominate the distribution”.
To describe the distribution of the pattern variable, we first have to ask the right questions. Let's reformulate the general questions about distributions that we asked earlier. We'll make them specific to the pattern variable:
- How many patterns of alcohol consumption are defined by the WHO?
- How many countries fit into each pattern?
- Is the distribution of patterns dominated by one category? Or are all pattern categories equally distributed?
- If the distribution is dominated by one category, which one is the most frequent?
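
Each of these questions comes down to counting how often every category occurs. Here is a minimal sketch in Python of what that counting could look like. The list `patterns` and its labels are hypothetical, invented purely for illustration; they are not WHO data, and the real dataset may be structured differently.

```python
from collections import Counter

# Hypothetical pattern labels, one per country.
# Invented for illustration only; these are NOT WHO data.
patterns = ["low", "high", "low", "medium", "low", "high", "low"]

# Count how many times each category occurs.
counts = Counter(patterns)

# How many distinct categories are there?
n_categories = len(counts)

# Which category is the most frequent, and what share of all values does it cover?
top_label, top_count = counts.most_common(1)[0]
top_share = top_count / len(patterns)

print("Number of categories:", n_categories)
print("Most frequent category:", top_label, f"({top_share:.0%} of values)")

# A crude text bar chart: one '#' per occurrence, most frequent category first.
for label, n in counts.most_common():
    print(f"{label:<8}{'#' * n} ({n})")
```

If one bar is much longer than the others, that category dominates the distribution; if the bars are of similar length, the categories are roughly equally distributed.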