Visualize your data
5. Prepare the data: interval length

Instruction

We did indeed get the ten intervals we wanted. Now, let's try the other method and set the length of intervals instead of their number. We will use the same commands as in the previous example, but we will change the breaks argument.

The new command looks like this:

split <- cut(vector, breaks = seq(start, end, by = length), include.lowest = TRUE)

This time, we will use a vector for the breaks argument. When breaks is a vector, the cut function uses its values as the interval boundaries and then assigns each value of the input vector to one of these intervals. To create a range of intervals with a fixed length, we need an evenly spaced sequence of numbers. We'll use the seq function for this. We showed it above, but here it is again:

seq(start, end, by = length)

This function creates a vector of numbers that begins at the start value, increases by the given length at each step, and stops at the last number that does not exceed the end value. For example, seq(0, 10, by = 2) creates a vector with the numbers 0, 2, 4, 6, 8, 10. Passing this vector to the cut function as the breaks argument creates the intervals [0,2], (2,4], (4,6], (6,8], (8,10]. Each interval includes its upper limit; include.lowest = TRUE also makes the first interval include its lower limit, which is why it starts with a square bracket.
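To see the break points and the resulting intervals for yourself, here is a small sketch in base R. The numbers in values are made up for illustration. The last part shows one detail of seq(): when the end value is not a whole number of steps away from the start, the sequence stops short of it, so a value above the last break falls outside every interval and becomes NA.

```r
# Break points every 2 units from 0 to 10
breaks <- seq(0, 10, by = 2)
print(breaks)  # 0 2 4 6 8 10

# Hypothetical example values, sorted into the intervals defined by breaks
values <- c(0, 1.5, 2, 3.7, 9.9, 10)
intervals <- cut(values, breaks = breaks, include.lowest = TRUE)
print(levels(intervals))      # [0,2] (2,4] (4,6] (6,8] (8,10]
print(as.character(intervals))

# seq() never goes past the end value: with by = 3 the last break is 9, not 10,
# so the value 10 is not inside any interval and cut() returns NA for it
short_breaks <- seq(0, 10, by = 3)
print(short_breaks)  # 0 3 6 9
print(cut(10, breaks = short_breaks, include.lowest = TRUE))
```

If your data can reach the end value, pick start, end, and length so that the end value is itself one of the break points, as seq(0, 15, by = 3) does in the exercise below.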

To count the frequencies of values for each interval, we use the same code as before. First, we create a data frame:

split_df <- data.frame(interval=split)

Then we count interval frequencies:

count(split_df, interval)
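We are assuming count() here is the one from the dplyr package, so it only works once that package is loaded. If it isn't available, base R's table() is one possible substitute; note that this swaps in a different function from the one used in this lesson. The sketch below uses hypothetical values. table() lists every interval, including one that contains no values, and as.data.frame() turns the result into a data frame with one row per interval.

```r
# Hypothetical values, binned with the same kind of cut() call as above
values <- c(0.5, 1, 2.5, 3, 3.5, 7.9, 9.1, 10)
split <- cut(values, breaks = seq(0, 10, by = 2), include.lowest = TRUE)

# table() tallies how many values fall into each interval (factor level);
# the interval (4,6] is empty here and is still listed, with a count of 0
freq <- table(split)
print(freq)

# Convert to a data frame with an interval column ("split") and a count column ("n")
freq_df <- as.data.frame(freq, responseName = "n")
print(freq_df)
```

The counts here are 2, 3, 0, 1 and 2 for the five intervals, and they add up to the eight input values.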

Exercise

Let's divide our consumption variable into intervals with a length of three and count the value frequencies for each interval.

Use the three commands discussed above and set the breaks argument to seq(0, 15, by = 3). (The consumption variable's rounded minimum and maximum values are 0 and 15 liters.)

Remember to specify the consumption column from the alcohol_consumption dataset as the vector in the cut function (alcohol_consumption$consumption).

When you're done, click the Run and Check Code button. Check the length of the created intervals. Do they have a length of three?

Stuck? Here's a hint!

You should write:

split <- cut(alcohol_consumption$consumption, breaks = seq(0, 15, by = 3), include.lowest = TRUE)
split_df <- data.frame(interval = split)
count(split_df, interval)
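The alcohol_consumption dataset isn't reproduced here, so the sketch below uses a short hypothetical vector with the same range (0 to 15) in place of alcohol_consumption$consumption. diff() on the break points gives the length of each interval, which answers the exercise question directly, and anyNA() confirms that every value landed in some interval.

```r
# Hypothetical stand-in for alcohol_consumption$consumption (liters)
consumption <- c(0, 0.8, 2.9, 3.0, 4.6, 7.2, 8.4, 11.5, 12.3, 14.9)

breaks <- seq(0, 15, by = 3)
split <- cut(consumption, breaks = breaks, include.lowest = TRUE)

# The length of each interval is the gap between consecutive break points
widths <- diff(breaks)
print(widths)  # 3 3 3 3 3

# Interval labels: [0,3] (3,6] (6,9] (9,12] (12,15]
print(levels(split))

# Because 15 is itself a break point, no value is left outside the intervals
print(anyNA(split))  # FALSE
```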