R also has a special data type called factor. A factor in R is similar to an enum type in other programming languages: it holds values drawn from a predefined set of named categories, such as gender (F or M) or education level (for instance: No high school, High school, College graduate, Other).
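Here is a minimal sketch that makes the enum comparison concrete, using the education categories above. The sample responses are made up for illustration. As with enum members, the allowed values are fixed by the factor's levels, and each element is stored internally as an integer code that points into that set.

```r
# The four education categories listed above act like enum members
edu_levels <- c("No high school", "High school", "College graduate", "Other")

# Hypothetical sample responses, used only for illustration
education <- factor(c("High school", "College graduate", "High school"),
                    levels = edu_levels)

levels(education)       # the full predefined set, including unused categories
as.integer(education)   # each value is stored as a code into levels: 2 3 2
nlevels(education)      # number of allowed categories: 4
```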
Factors usually hold categorical values. In its basic form, the factor function takes two arguments:
factor(x, levels)
So, to create a new factor variable, we call the factor() function. The first argument, x, is a vector containing numeric, character, or logical values. These values are constrained by the second argument, levels, which is simply another vector listing the values that are allowed in x. Any value in x that is not listed in levels is changed to NA, R's marker for a missing value.
For example, suppose you have a gender vector and want to be sure that no values other than "F" or "M" will be in it. You would write:
factor(c("F", "M", "F", "F", "child"), levels = c("F", "M"))
This will return the result:
[1] F    M    F    F    <NA>
Levels: F M
Because "child" is not in the levels vector, it was changed to NA.
So the first step in building a factor variable is to provide a vector of values. Then you define the levels vector as the complete set of values allowed for that specific variable.
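The two steps can be written out explicitly. This sketch reuses the gender example, and the names gender_raw, gender_levels and rejected are illustrative only. It also shows one way to find out which raw values were rejected: compare them against the levels with %in% before they are silently turned into NA.

```r
# Step 1: the raw values, including one entry that is not a valid gender
gender_raw <- c("F", "M", "F", "F", "child")

# Step 2: every value this variable is allowed to take
gender_levels <- c("F", "M")

gender <- factor(gender_raw, levels = gender_levels)

# Raw values missing from the levels; these are the ones that became NA
rejected <- gender_raw[!gender_raw %in% gender_levels]
print(rejected)   # [1] "child"

# Counts per level; the NA entry is not counted by default
table(gender)
```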
We will mainly use factors to impose a specific order on the values in our vectors. For more complex operations on factors in R, you should learn about the forcats package.
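Here is a sketch of how levels control ordering. The sizes vector is a made-up example. By default factor() sorts the levels alphabetically, but an explicit levels vector sets the order that functions such as sort() and table() follow. Setting ordered = TRUE goes one step further and makes comparisons like < meaningful.

```r
sizes <- c("large", "small", "medium", "small")

# Without explicit levels, they are sorted alphabetically
levels(factor(sizes))   # "large" "medium" "small"

# Explicit levels impose a meaningful order instead
size_f <- factor(sizes, levels = c("small", "medium", "large"))
sort(size_f)            # values in level order: small small medium large

# ordered = TRUE also allows comparisons against a level
size_o <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
size_o < "large"        # FALSE TRUE TRUE TRUE
```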