Readr
Summary

Instruction

The problems() function gave us some useful information about the issues in our data: some values in the street_number column contain a letter (check out row 1,001, for instance). So we can conclude that some streets use a different numbering scheme than we expected. What can we do to adapt to this? The city is not going to change street addresses for the convenience of our dataset!

We can see that the first such value occurs in row 1,001. Remember, by default read_csv() guesses the data type of each column from the first 1,000 rows. In street_number, these rows contain only numbers, so the column was set to an integer type. When row 1,001 turned out to include a letter, read_csv() could not parse the value as a number, and we got a warning.

To fix this situation, we can add the guess_max option to the read_csv() function. The guess_max option lets us tell R the maximum number of rows to use when guessing column data types. If we wanted to base the guess on the first 1,001 rows of the students dataset, we'd write:

students <- read_csv("data/students.csv", guess_max = 1001)

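In the case of our companies data, setting guess_max to 1001 would resolve our problem, because row 1,001, which contains a letter, would then be included when the column type is guessed.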
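
To see the effect of guess_max without touching the real file, here is a minimal sketch that builds a small stand-in CSV and reads it twice. The temporary file, its contents (including the value "12B"), and the variable names default_read and guessed_more are assumptions made up for illustration; they are not part of the companies data.

```r
library(readr)

# Hypothetical stand-in for data/companies.csv: 1,000 rows with purely
# numeric street numbers, then a row 1,001 whose value contains a letter.
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "name,street_number",
    paste0("Company ", 1:1000, ",", 1:1000),
    "Company 1001,12B"
  ),
  path
)

# Default read: the type guess does not take row 1,001 into account, so the
# value "12B" fails to parse. The warning and the column-spec message are
# silenced here only to keep the output short.
default_read <- suppressWarnings(suppressMessages(read_csv(path)))
class(default_read$street_number)
problems(default_read)

# Same file, but the guess may now use up to 1,001 rows.
guessed_more <- suppressMessages(read_csv(path, guess_max = 1001))
class(guessed_more$street_number)
```

Comparing the two class() results, and checking whether problems() still reports anything for the second read, shows what the extra row changes.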

Exercise

Read the data from data/companies.csv into the companies variable again. This time, use read_csv() with the guess_max option set to 1001. What type of column will street_number be now?

Stuck? Here's a hint!

Type:

companies <- read_csv("data/companies.csv", guess_max = 1001)