The problems() function gave us some nice information about the issues in our data: some values in the street_number column contain a letter (check out line 1001, for instance)! So we can conclude that some streets have a different numbering/address scheme than expected. What can we do to adapt to this? Obviously, the city is not going to change street addresses for the convenience of our dataset!
We can see that the first such value occurs in row 1,001. Remember, the data type of the column is guessed from the first 1,000 rows. These contain only numbers, so the column was set to an integer type. When row 1,001 included a letter, the value could not be parsed as an integer, and we got a warning.
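To see this guessing behavior in isolation, here is a minimal sketch on made-up data. The toy data frame, the temporary file, and the value "12A" are assumptions for illustration, not part of the companies data. The exact guessed type and the wording of the warning may differ between readr versions.

```r
library(readr)

# Hypothetical toy data: 1,000 purely numeric street numbers,
# followed by a single value that contains a letter.
toy <- data.frame(
  street_number = c(as.character(1:1000), "12A"),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# Guess column types from at most 1,000 rows (stated explicitly here).
guessed <- read_csv(toy_path, guess_max = 1000)

# Inspect the column specification readr settled on,
# and any values that failed to parse under it.
spec(guessed)
problems(guessed)
```

If the guess is made from the numeric rows only, problems() should point at the row holding "12A".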
To fix this situation, we can add the guess_max option to the read_csv() function. The guess_max option tells R the maximum number of rows to use when guessing column data types. If we wanted to use the first 1,001 rows of data in the students dataset as our criteria, we'd write:
students <- read_csv("data/students.csv", guess_max = 1001)
In the case of our companies data, setting guess_max to 1001 would resolve our problem: row 1,001 would be included in the guess, so street_number would be read as a character column.
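As a rough check of this fix, the sketch below rebuilds a small stand-in file and reads it with guess_max = 1001. The toy data and file path are hypothetical. Only read_csv(), write_csv(), and problems() come from readr.

```r
library(readr)

# Hypothetical stand-in for the companies file:
# the first value containing a letter sits in row 1,001.
toy <- data.frame(
  street_number = c(as.character(1:1000), "12A"),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# Let readr look at 1,001 rows before choosing column types.
fixed <- read_csv(toy_path, guess_max = 1001)

# The column should now be character, with no parsing problems reported.
class(fixed$street_number)
nrow(problems(fixed))
```

Because the letter is seen during guessing, the column is read as text and "12A" is kept rather than replaced by NA.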