Great! Let's meet one more function in the apply family: tapply() (the name is short for "table apply").
tapply() is different: you use it when you need to apply a function to groups of members, that is, to subsets of the data rather than to each individual member.
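To make the contrast concrete, here is a minimal sketch with made-up values. The vectors sales and admin are hypothetical and are not part of the course data. sapply() calls the function once per element, while tapply() calls it once per group.

```r
# Hypothetical data: sales per hour and the administrator on duty in that hour
sales <- c(10, 20, 30, 40)
admin <- c("Alex", "John", "Alex", "John")

# sapply() applies the function to each element individually
per_element <- sapply(sales, function(x) x * 2)

# tapply() applies the function once per group defined by admin
per_group <- tapply(X = sales, INDEX = admin, FUN = sum)

print(per_element)
print(per_group)
```

The first result has one value per element of sales; the second has one value per distinct name in admin.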
Let's explain this concept with a data frame named monday_statistics and our sales list. The column named monday in monday_statistics represents the number of bracelets sold per hour. Another column, monday_admin, gives the name of the site administrator who made sure the site was working properly during those hours.
If you want to see the total number of bracelets sold on Monday, grouped by site administrator, you would use tapply() like this:
tapply(
  X = monday_statistics$monday,
  INDEX = monday_statistics$monday_admin,
  FUN = sum
)
Here we pass three arguments to tapply():
X: the vector (column) we want to analyze (monday_statistics$monday).
INDEX: the vector (column) used to group the data (monday_statistics$monday_admin).
FUN: the function that we want to apply to each subgroup (sum).
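If you want to try the call outside the course environment, the sketch below builds a small stand-in for monday_statistics. The column names come from the text above, but the six rows of values are invented for illustration, so the totals will not match the course's result.

```r
# Hypothetical stand-in for the course's data frame; all values are invented
monday_statistics <- data.frame(
  monday = c(5, 7, 3, 8, 6, 4),
  monday_admin = c("Alex", "Alex", "John", "John", "Tanya", "Tanya")
)

# Sum the hourly sales separately for each administrator
totals <- tapply(
  X = monday_statistics$monday,
  INDEX = monday_statistics$monday_admin,
  FUN = sum
)

print(totals)
```

With this invented data, each administrator covers two hours, so each total is the sum of two values.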
The code above gives us this result:
 Alex  John Tanya
 1208  1204  1202
For each administrator, tapply() returned the total sales on Monday. As you can see, Alex, John, and Tanya form the three groups. The values in the monday vector are summarized per group: the same function (sum()) was applied to each of the three groups.
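Because FUN is just an argument, you can swap in another summary function. The sketch below again uses invented values in a hypothetical monday_statistics. It first shows the groups that the grouping column defines (using split(), which is not covered in this section) and then applies mean instead of sum.

```r
# Hypothetical stand-in for the course's data frame; all values are invented
monday_statistics <- data.frame(
  monday = c(5, 7, 3, 8, 6, 4),
  monday_admin = c("Alex", "Alex", "John", "John", "Tanya", "Tanya")
)

# split() shows which values fall into each administrator's group
groups <- split(monday_statistics$monday, monday_statistics$monday_admin)
print(groups)

# Same grouping, different function: average sales per hour for each administrator
averages <- tapply(
  X = monday_statistics$monday,
  INDEX = monday_statistics$monday_admin,
  FUN = mean
)
print(averages)
```

Only FUN changed between the sum and the mean versions; the grouping by monday_admin stays the same.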