Summary
24. NAs and factors - summary

Instruction

Awesome! We're almost done with Part 5! Our emphasis in this part of the course was on missing values and factors. Let's summarize what we learned.

We discussed the concept of a missing value, which R represents as NA. Most operations involving NA return NA, because the result depends on a value that is unknown. Certain functions, such as min(), max(), and mean(), offer an optional na.rm argument; setting it to TRUE removes NAs before the calculation:

mean(houses$price, na.rm = TRUE)
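To see the difference na.rm makes, here is a minimal sketch. The prices vector is made up for illustration and is not the course's houses data.

```r
# Hypothetical prices; the second value is missing
prices <- c(250000, NA, 310000)

prices[2] + 1000            # NA: arithmetic on a missing value stays missing
mean(prices)                # NA: one unknown value makes the mean unknown
mean(prices, na.rm = TRUE)  # 280000: NAs are dropped before averaging
max(prices, na.rm = TRUE)   # 310000
```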

You can obtain a logical vector of TRUE/FALSE values indicating which values are missing from a vector by using the is.na() function. If you use the sum() function on this logical vector, you'll know how many values are missing from the original vector.

sum(is.na(houses$price))
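Counting works because sum() treats TRUE as 1 and FALSE as 0. The sketch below uses a small, made-up houses data frame (an assumption for illustration) to show the intermediate logical vector as well.

```r
# Hypothetical data frame with two missing prices
houses <- data.frame(price = c(250000, NA, 310000, NA))

missing <- is.na(houses$price)  # FALSE TRUE FALSE TRUE
sum(missing)                    # 2: number of missing prices
which(missing)                  # 2 4: positions of the missing prices
mean(missing)                   # 0.5: share of prices that are missing
```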

Since R doesn't know how to perform calculations with NA, we discussed another important topic: imputation methods. To impute a missing value means to replace it with a substitute value.
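As one possible illustration, the sketch below replaces missing prices with the mean of the observed prices. Mean imputation is only one common choice, picked here as an assumption, and the data frame is made up.

```r
# Hypothetical data frame with one missing price
houses <- data.frame(price = c(250000, NA, 310000))

# Compute the mean of the observed (non-missing) prices
mean_price <- mean(houses$price, na.rm = TRUE)

# Overwrite only the missing entries with that mean
houses$price[is.na(houses$price)] <- mean_price

houses$price  # 250000 280000 310000
```

Whether the mean, the median, or some other value is appropriate depends on the data and on the analysis you plan to run.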

Finally, we looked at factors, which represent categorical data: a factor column can only hold values from a fixed set of categories. You create a factor using the factor() function, like this:

houses$district_factor <- factor(houses$district)
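When no levels are given, factor() derives them from the distinct values in the data, in sorted order. A small sketch with made-up district names (an assumption for illustration):

```r
# Hypothetical district names
district <- c("North", "South", "North", "East")

district_factor <- factor(district)

levels(district_factor)  # "East" "North" "South"
table(district_factor)   # counts per category: East 1, North 2, South 1
```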

We also discussed factor "levels", which are just the categories of acceptable values that the variable can store. You can specify your own levels by using the optional levels argument in the factor() function.

houses$price_category <- factor(houses$price_category, levels = c("HIGH", "MEDIUM", "LOW"))
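One consequence of fixing the levels yourself is that any value not listed in levels becomes NA. The sketch below shows this with a made-up vector in which one entry is misspelled in lowercase.

```r
# Hypothetical categories; "low" does not match the level "LOW"
price_category <- c("LOW", "HIGH", "MEDIUM", "low")

f <- factor(price_category, levels = c("HIGH", "MEDIUM", "LOW"))

levels(f)      # "HIGH" "MEDIUM" "LOW": kept in the order you specified
f              # LOW HIGH MEDIUM <NA>
sum(is.na(f))  # 1: the unmatched value was turned into NA
```

Checking for NAs right after creating a factor is a quick way to catch values that did not match any level.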

Exercise

Let's practice everything you learned. Click Next Exercise to begin.