10. Duplicate values

## Instruction

Unfortunately, something has gone wrong: we seem to have duplicate rows in our dataset. To check whether this is really the case, we can use the duplicated() function. Given a data frame, it returns a logical vector with one element per row: TRUE if the row is an exact copy of an earlier row, and FALSE otherwise. The first occurrence of a repeated row is therefore marked FALSE. We use it like this:

duplicates <- duplicated(dataset_name)
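To make the result concrete, here is a minimal sketch on a made-up data frame. The name pets and its contents are assumptions for illustration only; they are not part of the course data.

```r
# Hypothetical toy data frame: rows 3 and 5 repeat rows 1 and 2 exactly
pets <- data.frame(
  name = c("Rex", "Tom", "Rex", "Mia", "Tom"),
  age  = c(3, 5, 3, 2, 5)
)

# One logical value per row: TRUE where the row repeats an earlier row
duplicates <- duplicated(pets)
print(duplicates)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Summing a logical vector counts the TRUEs, i.e. the number of repeated rows
sum(duplicates)
# [1] 2
```

Rows 1 and 2 are marked FALSE even though copies of them exist, because only the later repetitions count as duplicates.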

With this vector, we can start subsetting. But be careful! We should not write:

dataset_name <- dataset_name[duplicates, ]

That keeps only the rows where duplicates is TRUE, which means we would be left with nothing but the repeated copies. What we want is the opposite, so we write:

dataset_name <- dataset_name[!duplicates, ]

Note the use of the NOT operator (!), which turns every TRUE into FALSE and every FALSE into TRUE.
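The difference between the two subsetting expressions can be sketched on a small example. The data frame pets and the result names only_copies and deduplicated are hypothetical, chosen just for this illustration.

```r
# Hypothetical toy data frame: rows 3 and 5 repeat rows 1 and 2 exactly
pets <- data.frame(
  name = c("Rex", "Tom", "Rex", "Mia", "Tom"),
  age  = c(3, 5, 3, 2, 5)
)

duplicates <- duplicated(pets)

# The mistake: keeps only the repeated copies (rows 3 and 5)
only_copies <- pets[duplicates, ]
nrow(only_copies)
# [1] 2

# The intended result: keeps the first occurrence of every row (rows 1, 2, 4)
deduplicated <- pets[!duplicates, ]
nrow(deduplicated)
# [1] 3

# No repeated rows remain after the cleanup
any(duplicated(deduplicated))
# [1] FALSE
```

Base R also provides unique(), which drops repeated rows from a data frame in one step. Writing out the logical vector, as above, is useful when you want to inspect or count the duplicates before removing them.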

## Exercise

Check for duplicates in the survey dataset using the duplicated() function. Assign the resulting vector to the duplicates variable. Remove the duplicates from the survey variable.