Tidyr
2. Separating character columns

Instruction

Sometimes a single column holds two values. The name column contains both the first and the last name of a person, joined by an underscore. We can easily separate them into two new columns using the separate() function:

library(tidyr)
separate(users, name,
  into = c("first_name", "last_name"),
  sep = "_")

The arguments, in the order you will use them, are the dataset name (users), the column to separate (name), and the names of the columns that will hold the separated information (first_name and last_name).
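To see what separate() returns, here is a minimal sketch on a small data frame. The users object below is a hypothetical stand-in for the course's users dataset; its rows and its email column are invented for illustration. By default, separate() drops the original name column and adds first_name and last_name as character columns.

```r
library(tidyr)

# Hypothetical stand-in for the course's users dataset
users <- data.frame(
  name  = c("Anna_Smith", "Jan_Kowalski"),
  email = c("anna@example.com", "jan@example.org"),
  stringsAsFactors = FALSE
)

# Split "First_Last" into two new columns; the name column is removed
users_split <- separate(users, name,
  into = c("first_name", "last_name"),
  sep = "_")

users_split$first_name  # "Anna" "Jan"
users_split$last_name   # "Smith" "Kowalski"
```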

There is an optional argument, sep, that tells R what separates the values (e.g., a comma); R interprets it as a regular expression. Strictly speaking, we could almost leave it out here, because by default separate() splits on any run of non-alphanumeric characters, and the underscore is one of them. However, not all names in our data are punctuated the same way. For example, someone's name might be Lyndsey_D'Alessio. With the default separator, the apostrophe would also count as a split point, so R would throw a warning and truncate Lyndsey's last name to just 'D'. Setting sep = "_" avoids this.
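The difference between the default separator and sep = "_" can be checked on a one-row example. The people data frame is hypothetical. The default sep in tidyr's separate() is the regular expression "[^[:alnum:]]+", so both the underscore and the apostrophe are treated as split points.

```r
library(tidyr)

# Hypothetical one-row table with an apostrophe in the last name
people <- data.frame(name = "Lyndsey_D'Alessio", stringsAsFactors = FALSE)

# Default sep yields three pieces: "Lyndsey", "D", "Alessio".
# With only two target columns, separate() warns and discards the extra piece.
default_sep <- separate(people, name, into = c("first_name", "last_name"))
default_sep$last_name   # "D"

# An explicit sep splits on the underscore only
explicit_sep <- separate(people, name,
  into = c("first_name", "last_name"),
  sep = "_")
explicit_sep$last_name  # "D'Alessio"
```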

Exercise

Now, we want to separate email addresses into the email name and domain. Use the email column from users, and create the email_name and email_domain columns. Use the "at" symbol, @, as the sep argument. Assign the results to the users_new variable, and look at its first rows using head().

Hint

Type:

users_new <- separate(users, email, 
  into = c("email_name", "email_domain"),
  sep = "@")
head(users_new)
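Here is the same call run on a couple of sample addresses. The users data below is hypothetical, not the course dataset. The explicit sep = "@" matters here: with the default separator, the dots in anna.smith and example.com (and the underscore in jan_k) would also be treated as split points.

```r
library(tidyr)

# Hypothetical sample of email addresses
users <- data.frame(
  email = c("anna.smith@example.com", "jan_k@mail.example.org"),
  stringsAsFactors = FALSE
)

# "@" has no special meaning in a regular expression, so it matches literally
users_new <- separate(users, email,
  into = c("email_name", "email_domain"),
  sep = "@")

head(users_new)
# email_name:   "anna.smith", "jan_k"
# email_domain: "example.com", "mail.example.org"
```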