10. Duplicate values

Instruction

Unfortunately, something has gone wrong: we seem to have duplicate values in our dataset. To check whether this is really the case, we can use the duplicated() function. For a data frame, it returns a logical vector with one element per row: TRUE if the row repeats an earlier row, and FALSE otherwise (so the first occurrence of each row is marked FALSE). We use it like this:

duplicates <- duplicated(dataset_name)
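To see what that vector looks like, here is a minimal sketch on a made-up data frame. The names toy and toy_duplicates and the column values are hypothetical and are not part of the course data; rows 1 and 3 are deliberately identical.

```r
# Hypothetical toy data frame; rows 1 and 3 are identical
toy <- data.frame(
  id = c(1, 2, 1, 3),
  answer = c("yes", "no", "yes", "no")
)

# TRUE marks a row that repeats an earlier row;
# the first occurrence of each row stays FALSE
toy_duplicates <- duplicated(toy)
print(toy_duplicates)
# [1] FALSE FALSE  TRUE FALSE

# Summing a logical vector counts the TRUEs, i.e. the duplicated rows
print(sum(toy_duplicates))
# [1] 1
```

Only the third row is flagged, because it is the second appearance of the row with id 1 and answer "yes".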

With this vector, we can start subsetting. But be careful! We cannot write:

dataset_name <- dataset_name[duplicates, ]

That would keep only the rows where duplicates is TRUE, leaving us with just the duplicated rows. We want the opposite, so we write:

dataset_name <- dataset_name[!duplicates, ]

Note the use of the NOT operator (!), which flips every TRUE to FALSE and every FALSE to TRUE.
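The whole removal step can be sketched end to end on a small made-up data frame. The names toy, toy_duplicates and toy_clean are hypothetical; the final check with anyDuplicated() is an extra sanity check that is not part of the lesson.

```r
# Hypothetical toy data frame; rows 1 and 3 are identical
toy <- data.frame(
  id = c(1, 2, 1, 3),
  answer = c("yes", "no", "yes", "no")
)

# Flag rows that repeat an earlier row
toy_duplicates <- duplicated(toy)

# Keep only the rows that are NOT flagged as duplicates
toy_clean <- toy[!toy_duplicates, ]

print(nrow(toy))        # 4 rows before
print(nrow(toy_clean))  # 3 rows after

# anyDuplicated() returns 0 when no duplicated rows remain
print(anyDuplicated(toy_clean))
```

The first occurrence of each repeated row is kept, so no distinct row is lost.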

Exercise

Check for duplicates in the survey dataset using the duplicated() function. Assign the resulting vector to the duplicates variable. Remove the duplicates from the survey variable.