

Well done! In the previous exercise, the phone variable contained the following categories: "N", "Y", and "Yes". However, "Yes" represents the same information as "Y". Having two categories with the same meaning causes unnecessary confusion. For example, if you're interested in properties that have a phone, you have to check for both values with the following condition:

houses$phone == "Y" | houses$phone == "Yes"

We need to somehow ensure that new phone values entering the table can only belong to one of two possible categories: "Y" or "N". Is there some way we can ensure the value "Yes" doesn't find its way into the phone variable?

Certainly! We can do so by using factors. A factor is a special data type in R that is used to represent categorical variables. You can create a factor by passing any vector to the factor() function, like this:

factor(houses$phone)

The only required argument for this function is the vector that you would like to convert to a factor. You can store the resulting factor in a new column of the houses data frame named phone_factor, like this:

houses$phone_factor <- factor(houses$phone)
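To see what the conversion does, here is a minimal sketch that uses a small made-up stand-in for the houses data frame (the values below are illustrative assumptions, not the course data). It shows that factor() collects the distinct values of the vector as levels, and that a value outside those levels cannot be stored in the factor afterwards.

```r
# Toy stand-in for the houses data frame; the values are invented for illustration.
houses <- data.frame(
  phone = c("N", "Y", "Yes", "Y"),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor and store it as a new column.
houses$phone_factor <- factor(houses$phone)

# The levels are the distinct values found in the vector.
print(levels(houses$phone_factor))

# A value that is not among the levels is not accepted:
# R stores NA instead and emits an "invalid factor level" warning.
houses$phone_factor[1] <- "Maybe"
print(houses$phone_factor)
```

Note that "Yes" is still one of the levels here, because factor() simply takes whatever distinct values are present in the vector it is given.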


Convert the phone column to a factor and add it to the houses data frame as a new column named phone_factor. Observe the result.