R also has a special data type called a factor. A factor in R is like an enum type in other programming languages: it holds a predefined set of named values, such as gender (F or M) or education level (for instance: No high school, High school, College graduate, Other).
Factors usually contain categorical values. Have a look at the factor function below:
factor(vector, levels)
So, to create a new factor variable, we call the factor function. The first argument is a vector containing numerical, character, or logical values. The second argument, levels, strictly defines which values are allowed: it is simply another vector listing the values that can appear in the first one. If any value from vector is not listed in levels, that value will be changed to NA, a missing value.
For example, suppose you have a gender vector and want to be sure that no values other than "F" or "M" will be in it. You would write:
factor(c("F", "M", "F", "F", "child"), levels = c("F", "M"))
This will return the result:
[1] F    M    F    F    <NA>
Levels: F M
Because the "child" value was not in the levels vector, it was changed to NA.
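To see what the factor holds after it is created, you can inspect it with a few base R functions. The following is a minimal sketch that reuses the values from the example above; the variable name gender is only an illustration.

```r
# Hypothetical variable name; the values are the ones from the example above.
gender <- factor(c("F", "M", "F", "F", "child"), levels = c("F", "M"))

levels(gender)       # the allowed values: "F" "M"
is.na(gender)        # TRUE only for the fifth element, which was "child"
as.integer(gender)   # the underlying integer codes: 1 2 1 1 NA

# Count the values per level and include the missing value in the count.
table(gender, useNA = "ifany")
```

The integer codes show why a factor resembles an enum: each value is stored as the position of its level, and anything outside the levels has no code at all.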
So the first step in building a factor variable is to provide a vector of values. Then you define the levels vector, which lists all the allowed values for that variable.
We will mainly use factors to set a specific order of values in our vectors. If you want to conduct more complicated operations using factors in R, you should learn about a package called forcats.
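As a rough illustration of how levels set an order, the sketch below uses only base R. The education vector and the variable names are made up for this example.

```r
# Hypothetical data: education levels in no particular order.
education <- c("College graduate", "No high school",
               "High school", "No high school")

# Without explicit levels, factor() orders the levels alphabetically.
default_order <- factor(education)
levels(default_order)

# With explicit levels, the order is the one we choose.
custom_order <- factor(
  education,
  levels = c("No high school", "High school", "College graduate")
)
levels(custom_order)

# sort() and table() follow the level order, not the alphabet.
sort(custom_order)
table(custom_order)
```

Functions that group or sort by a factor generally follow the level order, which is why choosing the levels explicitly is useful when the categories have a natural sequence.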