11. Summary & Exercise 1

Instruction

In this part, we've learned some basic functions to clean our data. We know how to:

  • Change the data type of a column using functions such as as.factor() or as.character():
    survey$email <- as.character(survey$email)
  • Change column names using the colnames() function:
    colnames(survey)[1] <- "first_name"
  • Remove duplicate rows from the dataset by subsetting the data frame with the negated output of the duplicated() function:
    duplicates <- duplicated(survey)
    survey <- survey[!duplicates, ]
    
  • Separate and join columns with the separate() and unite() functions (from the tidyr package):
    library(tidyr)
    
    survey <- separate(
      survey,
      col = "name",
      sep = "_",
      into = c("first_name", "last_name"))
    
    united_names <- unite(
      survey,
      col = "name",
      sep = "_",
      first_name,
      last_name)
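The three base R steps above can be tried end to end on a small data frame. This is a minimal sketch: the survey data frame, its values, and the full_name column name are made up for illustration, and separate()/unite() are left out so that it runs without tidyr installed. Note that duplicated() flags only the repeats, so the first occurrence of each row is kept.

```r
# Hypothetical survey data, invented for illustration only
survey <- data.frame(
  name  = c("Anna_Smith", "Ben_Jones", "Anna_Smith"),
  email = factor(c("anna@example.com", "ben@example.com", "anna@example.com"))
)

# 1. Change a column's data type: factor -> character
survey$email <- as.character(survey$email)

# 2. Rename the first column by position (full_name is a hypothetical name)
colnames(survey)[1] <- "full_name"

# 3. Drop duplicate rows; duplicated() is TRUE for every repeat after the first
duplicates <- duplicated(survey)
survey <- survey[!duplicates, ]

print(survey)  # two rows remain: Anna_Smith and Ben_Jones
```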
    

Let's review all this information with a short quiz.

Exercise

Load the data/students.csv file into memory and assign it to the students variable. The file contains data about students at Southview Academy.
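A minimal sketch of the loading step, assuming base R's read.csv() is acceptable for this exercise. Because the course's data/students.csv is not available outside the exercise environment, the sketch first writes a tiny hypothetical stand-in file to a temporary path; its name and grade columns are invented.

```r
# Hypothetical stand-in for data/students.csv so the sketch runs anywhere;
# in the exercise itself, read "data/students.csv" directly instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,grade", "Anna_Smith,A", "Ben_Jones,B"), csv_path)

# read.csv() parses the header row into column names and returns a data frame
students <- read.csv(csv_path)
str(students)
```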