Now we have to prepare some more complicated data: for each of the four wealth categories, we have to calculate the percentage of each level of alcohol consumption. To do that, we first calculate how often each combination of these two variables occurs in the dataset, using the count() function, but this time with two variable arguments, one for each variable:
tab <- count(dataset, variable_x, variable_y)
where variable_x and variable_y are the variables we want to analyze.
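As a minimal sketch of what this call produces, the example below uses a small made-up data frame. The names survey, wealth and alcohol are hypothetical stand-ins for the real dataset and its columns, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data; the real dataset and column names will differ
survey <- data.frame(
  wealth  = c("low", "low", "low", "high", "high"),
  alcohol = c("none", "none", "daily", "none", "daily")
)

# One row per combination of wealth and alcohol that occurs in the data;
# the frequency of each combination is stored in a new column called n
tab <- count(survey, wealth, alcohol)
print(tab)
```

Combinations that never occur in the data do not get a row, so the table can have fewer rows than the number of possible combinations.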
Then we do something very similar to what we did for the first variable: we calculate a percentage for each combination. Again, we use the mutate() function. However, this time we have two variables, so we must decide which one defines the groups and which one is described within each group. Earlier, we put wealth category on the x-axis, so it is the grouping variable: the percentages of the alcohol consumption levels should add up to 100 within each wealth category, not across the whole table.
To set a variable as a grouping variable, we use the group_by() function:
tab <- group_by(tab, variable_x)
and then we use the mutate() function to calculate percentages within each group determined by variable_x:
tab <- mutate(tab, percent = n / sum(n) * 100)
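The whole sequence can be sketched end to end as follows. The data frame survey and its columns wealth and alcohol are hypothetical, chosen only for illustration; the final ungroup() call is an optional addition, not part of the steps above, that removes the grouping so later operations are not silently applied per group.

```r
library(dplyr)

# Hypothetical toy data; the real dataset and column names will differ
survey <- data.frame(
  wealth  = c("low", "low", "low", "high", "high"),
  alcohol = c("none", "none", "daily", "none", "daily")
)

# Step 1: frequency of each wealth/alcohol combination (column n)
tab <- count(survey, wealth, alcohol)

# Step 2: make wealth the grouping variable
tab <- group_by(tab, wealth)

# Step 3: sum(n) is now evaluated separately for each wealth category,
# so percent is the share of each alcohol level within its category
tab <- mutate(tab, percent = n / sum(n) * 100)

# Optional (an addition to the steps above): drop the grouping
tab <- ungroup(tab)
print(tab)
```

A quick sanity check is that the percent values add up to 100 within every wealth category. If they add up to 100 only over the whole table, the group_by() step was skipped.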