To create a histogram, we have to divide our numerical variable's range into **intervals** called **bins**. How do we do this?

We can divide our numerical variable in one of two ways:

- Setting the **number of intervals**, and therefore the **number of bars**.
- Setting the **interval length**, and therefore the **bars' width**.

Both methods are connected: when you set the number of intervals for your variable's range, you automatically set how wide the bars representing those bins should be. Likewise, in setting interval length, you also set the number of intervals (bars) on your histogram.
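
The link between the two choices is simple arithmetic. As a minimal sketch, assuming equal-width intervals and a made-up vector `x` (not taken from the lesson's dataset):

```r
# Hypothetical example values; any numeric vector works the same way
x <- c(1, 3, 4, 5, 6, 10)

value_range <- diff(range(x))   # max(x) - min(x)

# Choice 1: fix the number of intervals, which determines their width
n_intervals <- 3
interval_width <- value_range / n_intervals

# Choice 2: fix the interval width, which determines how many are needed
chosen_width <- 4.5
n_needed <- ceiling(value_range / chosen_width)

print(interval_width)
print(n_needed)
```

The names `n_intervals` and `chosen_width` are illustrative only; they are not arguments of any plotting function.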

When you use `ggplot` to create a histogram, it **automatically calculates intervals** for the dataset, sorts values into these intervals, and counts the **interval frequencies**. You don't have to do the calculations manually every time you create a histogram. However, by doing the entire process yourself now, you'll have a better understanding of what's going on "under the hood" of `ggplot`. You'll also get a better understanding of your dataset before you start plotting it.

Let's start by seeing how we can split a numerical variable into a determined number of intervals in R. We'll split the numerical variable using the `cut()` function:

split <- cut(vector, breaks = no_intervals, include.lowest=TRUE)

- The **first argument** is the **vector** of numerical values that the `cut()` function will sort into the proper intervals.
- The `breaks` argument sets how this division should be done. We want to set the number of intervals, so we supply a single **numeric** value here.
- The last argument, `include.lowest = TRUE`, is **technical**; it ensures that the **lowest value** in the vector is **included** in the division.
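
To see what `include.lowest` changes, here is a small sketch that passes explicit break points (a vector rather than a single number) to `cut()`. The values and break points are made up for illustration.

```r
x <- c(1, 3, 4, 5, 6, 10)

# By default intervals are open on the left, so with breaks at 1, 5, 10
# the value 1 falls outside (1,5] and becomes NA
without_lowest <- cut(x, breaks = c(1, 5, 10))

# include.lowest = TRUE closes the first interval on the left: [1,5]
with_lowest <- cut(x, breaks = c(1, 5, 10), include.lowest = TRUE)

print(without_lowest)
print(with_lowest)
```

Without the argument, the smallest value would silently drop out of the counts, which is why it is usually worth setting.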

When executed, this function returns a **factor of the same length** as the input vector, with an **interval label** in place of each value. Suppose we have a vector with the values **1, 3, 4, 5, 6, 10** and we use this command to divide it into two intervals, say **[1,5]** and **(5,10]** (the square bracket on the left of the first interval reflects `include.lowest = TRUE`). It will return a factor with the following labels: **[1,5]**, **[1,5]**, **[1,5]**, **[1,5]**, **(5,10]**, **(5,10]**. **Four** values fell in the **1-5** range, so we have **four** [1,5] labels. Likewise, **two** values fell in the **5-10** range, so we get **two** (5,10] labels. (This is a simplified illustration: when you pass only a number of intervals, `cut()` computes the boundaries itself, so the labels R prints may differ slightly from these.)
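
The example can be run directly. This is a sketch using the six values from the text; the labels R prints may not match the ones quoted above exactly, since `cut()` works out the boundaries itself, but the counts per interval should be four and two.

```r
values <- c(1, 3, 4, 5, 6, 10)

# Ask cut() for two equal-width intervals
split <- cut(values, breaks = 2, include.lowest = TRUE)

print(split)          # a factor with one interval label per value
print(levels(split))  # the two interval labels
print(table(split))   # how many values fall in each interval
```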

We can make a dataset from this new vector using this command:

split_df <- data.frame(interval = split)

Now we can use the `count()` function again to calculate how many values from each interval are in the `split_df` dataset. (The `count()` function requires you to specify a **dataset** as the **first argument**.)
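
Putting these steps together, a minimal sketch might look like this. It assumes `count()` comes from the dplyr package (the text here does not name the package) and falls back to base R's `table()` if dplyr is not installed; the input values are made up.

```r
values <- c(1, 3, 4, 5, 6, 10)   # hypothetical data
split <- cut(values, breaks = 2, include.lowest = TRUE)
split_df <- data.frame(interval = split)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Assumption: count() is dplyr::count(); the dataset is the first argument
  interval_counts <- dplyr::count(split_df, interval)
} else {
  # Base R fallback producing the same two columns: interval and n
  interval_counts <- as.data.frame(table(interval = split_df$interval),
                                   responseName = "n")
}

print(interval_counts)
```

Either way, the result has one row per interval and a column `n` holding the interval frequency, which is the quantity a histogram bar represents.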

By the way, you don't have to use `split` as the vector name, `split_df` for the dataset name, and `interval` for the column name. These names are arbitrary; you can choose your own if you like.