16. Factors

Instruction

R also has a special data type called factor. A factor in R is similar to an enum type in other programming languages: it holds values drawn from a predefined set of named values, such as gender (F or M) or education level (for instance: No high school, High school, College graduate, Other).

Factors usually contain categorical values. Have a look at the factor function below:

factor(vector, levels)

So, to create a new factor variable, we call the factor function. The first argument is a vector containing numeric, character, or logical values. The second argument, levels, is another vector that defines which values are allowed in the factor. If any value from the first vector is not listed in levels, that value will be changed to NA - a missing value.

For example, suppose you have a gender vector and want to be sure that no values other than "F" or "M" will be in it. You would write:

factor( c("F", "M", "F", "F", "child"), c("F", "M"))

This will return the result:

[1] F    M    F    F    <NA>
Levels: F M

Because the "child" value was not in the levels vector, it was changed to NA.
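To confirm what happened to each value, you can inspect the factor after creating it. The short sketch below stores the same factor in a variable (the name gender is only chosen for this illustration) and uses the base R functions levels and is.na to look at the allowed values and the missing ones.

```r
# "gender" is a hypothetical variable name used only for this illustration
gender <- factor(c("F", "M", "F", "F", "child"), levels = c("F", "M"))

# The allowed values of the factor
levels(gender)        # "F" "M"

# TRUE marks the values that were not listed in levels
is.na(gender)         # FALSE FALSE FALSE FALSE  TRUE

# How many values were changed to NA
sum(is.na(gender))    # 1
```

Counting the NA values this way is a quick check that only the values you expected were dropped.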

So the first step in building a factor is to provide a vector of values. Then you define the levels vector, which contains all the allowed values for that specific variable.

We will mainly use factors to set a specific order of values in our vectors. If you want to conduct more complicated operations using factors in R, you should learn about a package called forcats.
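As a minimal sketch of how levels control order, the example below builds two factors from the same made-up vector of sizes (the data and variable names are assumptions for illustration only). When levels is omitted, R sorts the levels alphabetically; when levels is given, the factor keeps the order you wrote.

```r
# Made-up example data
sizes <- c("medium", "small", "large", "small")

# Without explicit levels, R orders the levels alphabetically
default_order <- factor(sizes)
levels(default_order)             # "large" "medium" "small"

# With explicit levels, the order is the one we choose
custom_order <- factor(sizes, levels = c("small", "medium", "large"))
levels(custom_order)              # "small" "medium" "large"

# Sorting a factor follows the order of its levels, not the alphabet
as.character(sort(custom_order))  # "small" "small" "medium" "large"
```

The same level order is what plotting functions typically use when they arrange categories on an axis or in a legend.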

Exercise

You want to classify zoo animals into three categories: mammals, birds, and others. You have a vector with this information; however, someone added an additional category and you want to remove it.

Create a proper factor using the vector below and assign it to the my_factor variable:

c("mammals", "mammals", "others", "others", "birds", "fish")

When you're done, press the Run and Check Code button to check your code.

Stuck? Here's a hint!

You should write:

my_factor <- factor(c("mammals", "mammals", "others", "others", "birds", "fish"), c("mammals", "others", "birds"))