As you can see, this automatic histogram is not perfect. Although it answered our questions, the answers are not very precise. A histogram does not show each exact value (for example, the minimum or maximum) on the chart; instead, it shows the overall shape and spread of the data.
That being said, we can still work with this chart and increase the amount of information it shows.
Set a meaningful range of values for your histogram.
First, let's correct the range of the values. If we know that our variable can't take values above or below a certain point, we can adjust the histogram so that the first bar starts, and/or the last bar ends, in the correct place. Of course, in some situations our variable's values have no limit, and then we simply skip this adjustment. When limits do exist, though, it's better to define them, and the natural time to do so is right after you create the first draft of the histogram.
If there are mistakes in your data (e.g. a pumpkin with a size of -10 cm), you can often spot them in this first, unadjusted histogram.
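Alongside the first-draft histogram, a quick numeric check can confirm a suspicious value. The sketch below is only an illustration: the pumpkin_size vector is hypothetical data invented for this example, and 0 cm is assumed to be the lowest physically possible size.

```r
# Hypothetical pumpkin sizes in cm; -10 is an impossible value (a data error)
pumpkin_size <- c(32, 45, 51, -10, 38, 60)

# The overall range already reveals a value below the physical limit of 0
print(range(pumpkin_size))

# Positions of the values that fall below the assumed lower limit of 0
print(which(pumpkin_size < 0))
```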
Alcohol consumption can't be negative, so we don't need to allow for values below zero, and our first bar can start at zero. To specify how the intervals in our histogram should work, we'll use the breaks argument of the geom_histogram() function. It defines how the range of our numerical variable is divided into intervals.
Because we want to set particular limits for our bars, we pass breaks a vector of numbers giving the starting and ending points of each bar. To build that vector, we use the familiar seq() function, which takes a start point, an end point, and the interval length (its by argument).
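To see what breaks actually receives, it can help to print the vector on its own. In this small sketch the values 0, 10 and 2 are arbitrary choices made only for illustration: the resulting vector holds six boundaries, which define five bars of width 2.

```r
# Bar boundaries from 0 to 10 in steps of 2 (arbitrary illustrative values)
bar_limits <- seq(0, 10, by = 2)
print(bar_limits)

# n boundaries define n - 1 bars
print(length(bar_limits) - 1)
```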
Now, have a look at the code:
ggplot(your_data, aes(x = variable_x)) +
  geom_histogram(breaks = seq(start, end, by = length))
In this code, the most important element is start. It sets the first value of the bar-limits vector, so we can be sure that the first bar begins at the proper place on the axis.
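Putting the pieces together, here is a minimal runnable sketch. It assumes ggplot2 is installed, and the drinks data frame and its alcohol column are hypothetical, simulated only so the example runs on its own. The breaks start at 0 because consumption can't be negative, and ceiling(max(...)) makes sure the last bar covers the largest value.

```r
library(ggplot2)

# Hypothetical data: simulated alcohol consumption values (never negative)
set.seed(42)
drinks <- data.frame(alcohol = round(rexp(200, rate = 0.5), 1))

# Bar limits: start at 0 and end at or above the largest observed value
bar_limits <- seq(0, ceiling(max(drinks$alcohol)), by = 1)

p <- ggplot(drinks, aes(x = alcohol)) +
  geom_histogram(breaks = bar_limits, colour = "white")

print(p)
```

Changing the first argument of seq() moves the left edge of the first bar, while the by value controls the bar width.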