
Instruction

We already know one chart that can be used to visualize a distribution: the bar chart. Our new chart is very similar to the bar chart; it even uses bars to encode data. Maybe instead of introducing another chart, we should use the one we already know. How will that work?

These are the first 30 values for the consumption variable:

5.28 0.45 10.60 7.80 7.84 8.15 4.23 10.52 12.10 1.98 9.19 0.01 8.41 14.44 10.22
6.76 1.33 3.95 4.54 5.99 7.52 10.80 4.55 4.16 4.75 2.20 6.15 8.40 1.67 0.50   

In contrast to a categorical variable, a numerical variable like consumption gives us lots of different values. Some values are very close together, but none of the ones shown above appears more than once!
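If you'd like to verify that claim yourself, here is a small sketch in base R. It uses only the 30 values listed above; the vector name first_values is just a name chosen for this illustration and is not part of the course dataset.

```r
# The 30 values listed above, typed in by hand
first_values <- c(5.28, 0.45, 10.60, 7.80, 7.84, 8.15, 4.23, 10.52, 12.10, 1.98,
                  9.19, 0.01, 8.41, 14.44, 10.22, 6.76, 1.33, 3.95, 4.54, 5.99,
                  7.52, 10.80, 4.55, 4.16, 4.75, 2.20, 6.15, 8.40, 1.67, 0.50)

# anyDuplicated() returns 0 when no value is repeated
n_values   <- length(first_values)
first_dup  <- anyDuplicated(first_values)

# table() counts each distinct value; the largest count tells us
# how often the most frequent value appears
max_count  <- max(table(first_values))

n_values
first_dup
max_count
```

A result of 0 from anyDuplicated() together with a largest count of 1 means every one of the 30 values is distinct.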

What does this mean for a bar chart? Every distinct value gets its own bar, so we can expect lots of very short bars, many of them probably just one observation tall. Let's go ahead and create a bar chart for this data. Then we can look at it and assess whether it is useful for analyzing the distribution of a numerical variable.

The first step in creating a bar chart is to count how many times each value of the variable appears in the dataset. We can use the count() function again for that:

tab <- count(alcohol_consumption, consumption)

You'll get something like this (only the first two rows are shown):

  consumption n
1        0.01 1
2        0.08 1
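To make the meaning of the n column concrete, here is a minimal sketch on a made-up data frame. It assumes that count() comes from the dplyr package; toy_data and its three values are invented for this illustration and are not part of alcohol_consumption.

```r
library(dplyr)

# Hypothetical toy data: the value 0.5 appears twice, 1.2 appears once
toy_data <- data.frame(consumption = c(0.5, 1.2, 0.5))

# One row per distinct value; n holds the number of occurrences
toy_tab <- count(toy_data, consumption)
toy_tab
```

Here the repeated value gets n equal to 2. In the real frequency table above, the rows shown all have n equal to 1, because those values each occur only once.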

Because consumption is a numeric variable, we next have to convert it to a categorical variable, so that each distinct value is treated as its own category. We can do that using factor():

tab$consumption <- factor(tab$consumption)
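To see what factor() does to a numeric vector, consider this small base R sketch. The vector x is made up for the illustration.

```r
# A hypothetical numeric vector with one repeated value
x <- c(5.28, 0.45, 10.60, 0.45)

f <- factor(x)

# Each distinct value becomes a level, sorted in increasing order
# and stored as text
lv <- levels(f)
lv          # "0.45" "5.28" "10.6"
nlevels(f)  # 3

# The result is no longer numeric, so it is treated as categorical
is.numeric(f)  # FALSE
```

Note that the levels are labels, not numbers: 10.60 becomes the label "10.6".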

Now we can finally draw a bar chart for this data:

ggplot(data = tab, aes(x = consumption, y = n)) + geom_col()
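If you want to try the three steps outside the course environment, the following is a sketch under some assumptions: count() is taken from dplyr and ggplot() from ggplot2, and because the full alcohol_consumption dataset is not reproduced here, a stand-in data frame named sample_data (a hypothetical name) is built from the 30 values listed earlier.

```r
library(dplyr)
library(ggplot2)

# Stand-in for alcohol_consumption: only the first 30 values from the lesson
sample_data <- data.frame(consumption = c(
  5.28, 0.45, 10.60, 7.80, 7.84, 8.15, 4.23, 10.52, 12.10, 1.98,
  9.19, 0.01, 8.41, 14.44, 10.22, 6.76, 1.33, 3.95, 4.54, 5.99,
  7.52, 10.80, 4.55, 4.16, 4.75, 2.20, 6.15, 8.40, 1.67, 0.50
))

# Step 1: frequency table (one row per distinct value)
tab <- count(sample_data, consumption)

# Step 2: treat each distinct value as a category
tab$consumption <- factor(tab$consumption)

# Step 3: one bar per category, bar height taken from n
p <- ggplot(data = tab, aes(x = consumption, y = n)) + geom_col()

# Smallest and largest bar height in this sample
range(tab$n)
```

Printing p draws the chart. For this sample, the frequency table has 30 rows and every bar has the same height of 1, which is exactly the "lots of very short bars" picture we expected.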

Exercise

Create a frequency table for the consumption variable and visualize it on a bar chart. Follow the steps described above. When you're done, click the Run and Check Code button.

Stuck? Here's a hint!

You should write:

tab <- count(alcohol_consumption, consumption)
tab$consumption <- factor(tab$consumption)
ggplot(data = tab, aes(x = consumption, y = n)) + geom_col()