Well done! In the previous exercise, the phone variable consisted of the following categories: "N", "Y", and "Yes". However, "Yes" represents the same information as "Y". Having two categories with the same meaning causes unnecessary confusion. For example, if you're interested in properties that have a phone, you'll use the following condition:
houses$phone == "Y" | houses$phone == "Yes"
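As a minimal sketch, here is that condition applied to a small made-up houses data frame (the prices and phone values are invented for illustration and are not the real dataset). It also shows %in%, which expresses the same check more compactly.

```r
# Hypothetical toy data standing in for the real houses table
houses <- data.frame(
  price = c(100, 150, 200, 250),
  phone = c("Y", "N", "Yes", "N")
)

# Both spellings must be checked to find every property with a phone
has_phone <- houses$phone == "Y" | houses$phone == "Yes"
houses[has_phone, ]

# The same condition written with %in%
has_phone_in <- houses$phone %in% c("Y", "Yes")
identical(has_phone, has_phone_in)
```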
We need to somehow ensure that new phone values entering the table can only belong to one of two possible categories: "Y" or "N". Is there some way we can ensure the value "Yes" doesn't find its way into the phone variable?
Certainly! We can do so by using factors. A factor is a data type in R for representing categorical variables, that is, variables whose values come from a fixed set of categories. You can create a factor by passing any vector to the factor() function, like this:
factor(houses$phone)
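To see what factor() produces, the sketch below uses a short made-up character vector in place of the real phone column (the values are assumptions for illustration). By default, the categories, called levels, are the distinct values found in the vector.

```r
# Hypothetical stand-in for the phone column
phone <- c("Y", "N", "Yes", "N", "Y")

phone_factor <- factor(phone)

is.factor(phone_factor)   # confirms the result is a factor
levels(phone_factor)      # the distinct categories found in the vector
table(phone_factor)       # how many values fall into each category
```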
The only required argument for this function is the vector that you would like to use to create the factor. You can store this factor in a new column of houses called phone_factor, like this:
houses$phone_factor <- factor(houses$phone)
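Looking slightly ahead, one possible way to keep "Yes" out of the column is to recode it and then pass the allowed categories through the levels argument of factor(). The sketch below assumes a made-up houses data frame and is one approach, not necessarily the one this lesson goes on to use.

```r
# Hypothetical toy data standing in for the real houses table
houses <- data.frame(phone = c("Y", "N", "Yes", "N"), stringsAsFactors = FALSE)

# Recode the duplicate spelling, then fix the allowed categories
houses$phone[houses$phone == "Yes"] <- "Y"
houses$phone_factor <- factor(houses$phone, levels = c("N", "Y"))

levels(houses$phone_factor)

# A value outside the levels is not stored: R emits a warning and
# records NA instead
houses$phone_factor[1] <- "Yes"
is.na(houses$phone_factor[1])
```

Because the invalid value becomes NA rather than a third category, a stray "Yes" shows up as a missing value that can be caught, instead of silently splitting the data.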