problems() function gave us some nice information about the issues in our data: some values in the street_number column contain a letter (check out line 1001, for instance)! So we can conclude that some streets have a different numbering/address scheme than expected. What can we do to adapt to this? Obviously, the city is not going to change street addresses for the convenience of our dataset!
We can see that this change first occurs in row 1,001. Remember, by default the data type of a column is guessed from the first 1,000 rows. These contain only numbers, so the column was set to a numeric type. When row 1,001 included a letter, we got a warning.
To fix this situation, we can add the guess_max option to the read_csv() function. The guess_max option lets us tell R the maximum number of rows to use when guessing column data types. If we wanted to base the guess on the first 1001 rows of data in the students dataset, we'd write:
students <- read_csv("data/students.csv", guess_max = 1001)
In the case of our companies data, setting guess_max to 1001 would resolve our problem.
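To see the effect end to end, here is a minimal sketch that does not depend on the lesson's files. It assumes the readr package is installed and builds a small, made-up CSV in a temporary file (the id column and the value 1001B are hypothetical stand-ins for the companies data): 1,000 purely numeric street numbers followed by one that contains a letter.

```r
library(readr)

# Hypothetical stand-in for the companies file:
# rows 1 to 1,000 hold plain numbers, row 1,001 holds a value with a letter.
street_numbers <- c(as.character(1:1000), "1001B")
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,street_number", paste0(1:1001, ",", street_numbers)), tmp)

# Default behaviour: the type is guessed from at most 1,000 rows,
# so this read is expected to raise a parsing warning for row 1,001.
default_read <- read_csv(tmp)
class(default_read$street_number)
problems(default_read)

# Letting readr inspect 1,001 rows means the guess also sees the value
# with the letter, so the column should be read as character instead.
wider_read <- read_csv(tmp, guess_max = 1001)
class(wider_read$street_number)
tail(wider_read$street_number, 1)
```

With the default settings, the last street number should come back as NA, because it cannot be parsed as a number. With guess_max = 1001, the original text of every value is kept. The trade-off is that the whole column becomes character, so any arithmetic on street numbers would need an explicit conversion later.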