Introduction To R

24/32 NAs and factors - summary

Awesome! We're almost done with Part 5! This part of the course focused on missing values and factors. Let's summarize what we learned.

We discussed the concept of a missing value, which is represented by NA in R. Most operations involving NA return NA, because the result depends on a value R doesn't know. Certain functions, such as min(), max(), and mean(), offer an optional na.rm argument to remove NAs from the calculations:

mean(houses$price, na.rm = TRUE)
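To see the difference na.rm makes, here is a small sketch. The houses data frame below is a made-up stand-in for the course data, and its prices are invented for illustration only.

```r
# Hypothetical data: these prices are invented for illustration.
houses <- data.frame(price = c(250000, NA, 310000, 180000))

# Without na.rm, a single missing price makes the whole result NA.
mean_with_na <- mean(houses$price)

# With na.rm = TRUE, the NA is dropped before the calculation.
mean_known <- mean(houses$price, na.rm = TRUE)  # mean of the three known prices
max_known  <- max(houses$price, na.rm = TRUE)   # 310000

print(mean_with_na)
print(mean_known)
print(max_known)
```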

You can obtain a logical vector of TRUE/FALSE values indicating which values are missing from a vector by using the is.na() function. If you use the sum() function on this logical vector, you'll know how many values are missing from the original vector.

sum(is.na(houses$price))
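Here is a short sketch of why this works: sum() counts TRUE as 1 and FALSE as 0. The data is again made up, and which() is an extra base R function (not covered in the text above) that tells you where the missing values are.

```r
# Hypothetical data: two of the four prices are missing.
houses <- data.frame(price = c(250000, NA, 310000, NA))

missing <- is.na(houses$price)    # FALSE TRUE FALSE TRUE

n_missing   <- sum(missing)       # TRUE counts as 1, so this is 2
missing_pos <- which(missing)     # positions of the missing values: 2 4

print(n_missing)
print(missing_pos)
```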

Since R doesn't know how to perform calculations with NA, we discussed another important topic: imputation methods. To impute a missing value means to replace it with a substitute value.
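As one possible illustration, the sketch below replaces each missing price with the mean of the known prices. Mean imputation is only one of several methods and is chosen here as an assumption. The data and the price_imputed column name are hypothetical.

```r
# Hypothetical data: one price is missing.
houses <- data.frame(price = c(200000, NA, 300000, 400000))

# Mean of the known prices: (200000 + 300000 + 400000) / 3 = 300000
avg_price <- mean(houses$price, na.rm = TRUE)

# Keep known prices as they are; fill each NA with the mean.
# price_imputed is a hypothetical column name, so the original column stays intact.
houses$price_imputed <- ifelse(is.na(houses$price), avg_price, houses$price)

print(houses)
```

Writing the result to a new column makes it easy to compare the imputed values with the original ones.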

Finally, we looked at factors, which store categorical data. A factor limits the values in a column to a fixed set of acceptable categories. You create a factor using the factor() function, like this:

houses$district_factor <- factor(houses$district)
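The following sketch shows what factor() produces when you don't specify levels yourself. The district names are invented for illustration.

```r
# Hypothetical data: district names are made up.
houses <- data.frame(district = c("North", "South", "North", "East"),
                     stringsAsFactors = FALSE)

houses$district_factor <- factor(houses$district)

# By default, the levels are the unique values in sorted order.
district_levels <- levels(houses$district_factor)   # "East" "North" "South"

# table() counts how many rows fall into each level.
district_counts <- table(houses$district_factor)

print(district_levels)
print(district_counts)
```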

We also discussed factor "levels", which are just the categories of acceptable values that the variable can store. You can specify your own levels by using the optional levels argument in the factor() function.

houses$price_category <- factor(houses$price_category, levels = c("HIGH", "MEDIUM", "LOW"))
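Here is a small sketch of what custom levels do in practice. The data is made up, and the "CHEAP" value is included on purpose to show what happens to a value that is not among the levels.

```r
# Hypothetical data: "CHEAP" is deliberately not one of the allowed levels.
houses <- data.frame(price_category = c("LOW", "HIGH", "MEDIUM", "CHEAP"),
                     stringsAsFactors = FALSE)

houses$price_category <- factor(houses$price_category,
                                levels = c("HIGH", "MEDIUM", "LOW"))

# The levels keep the order you specified instead of being sorted.
category_levels <- levels(houses$price_category)    # "HIGH" "MEDIUM" "LOW"

# Any value that is not among the levels becomes NA.
n_invalid <- sum(is.na(houses$price_category))      # 1 (the "CHEAP" row)

print(category_levels)
print(houses$price_category)
print(n_invalid)
```

Note that factor() turns unlisted values into NA without a warning, so it may be worth checking for new NAs with is.na() after the conversion.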

Let's practice everything you learned. Click Next Exercise to begin.
