In this part of our course, we learned about data frames, the basic data structures used in R to store tabular data. Let's summarize what we learned.
Data frames are made up of columns and rows. There are several functions in R that we can use to examine the structure of a data frame:
- head() displays a fixed number of rows from the top of a data frame (six by default),
- nrow() and ncol() return the number of rows and the number of columns in a data frame, respectively.
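As a quick illustration of these functions, here is a minimal sketch. The `cities` data frame built below is a hypothetical stand-in for the course data set; its values are invented purely for demonstration.

```r
# Hypothetical toy data frame; all values are made up for illustration.
cities <- data.frame(
  city = c("A", "B", "C", "D", "E", "F", "G", "H"),
  country = c("Germany", "France", "Germany", "Italy",
              "France", "Germany", "Italy", "France"),
  population = c(500000, 1200000, 80000, 950000,
                 300000, 150000, 60000, 700000),
  stringsAsFactors = FALSE
)

first_six <- head(cities)       # first six rows (the default)
first_three <- head(cities, 3)  # first three rows

n_rows <- nrow(cities)  # number of rows
n_cols <- ncol(cities)  # number of columns

print(first_three)
print(c(rows = n_rows, columns = n_cols))
```

Passing a second argument to head() changes how many rows are shown, which is handy when six rows are too many or too few to get a feel for the data.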
There are several ways to access a data frame column:
- via an index: you can access a single column or multiple columns using column indexes, which start at 1:
cities[, c(3, 4, 5)]
- with the column's name:
cities[, c("city", "population")]
- with the $ operator, which returns a single column as a vector:
cities$population
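The column access styles listed above can be compared side by side. This is only a sketch: the `cities` data frame and its `area` column are hypothetical, with invented values.

```r
# Hypothetical toy data frame; all values are made up for illustration.
cities <- data.frame(
  city = c("Alphaville", "Betatown", "Gammaburg"),
  country = c("Germany", "France", "Germany"),
  population = c(500000, 1200000, 80000),
  area = c(120.5, 310.0, 45.2),
  stringsAsFactors = FALSE
)

# By index: columns 1 and 3 (indexes start at 1)
by_index <- cities[, c(1, 3)]

# By name: the same two columns, selected by their names
by_name <- cities[, c("city", "population")]

# With the $ operator: a single column, returned as a plain vector
pop <- cities$population

print(by_index)
print(pop)
```

Selecting by name is usually more robust than selecting by index, because the code keeps working if the column order changes.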
For row filtering, you can specify conditions and even combine multiple conditions using the logical operators & ("and") and | ("or"). For example:
cities[cities$country == "Germany", ]
This will retrieve only those rows for which the condition cities$country == "Germany" is TRUE.
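To show how conditions combine with & and |, here is a small sketch. The `cities` data frame and the population thresholds are assumptions chosen for illustration, not values from the course data.

```r
# Hypothetical toy data frame; all values are made up for illustration.
cities <- data.frame(
  city = c("Alphaville", "Betatown", "Gammaburg"),
  country = c("Germany", "France", "Germany"),
  population = c(500000, 1200000, 80000),
  stringsAsFactors = FALSE
)

# Single condition: rows where country is "Germany"
german <- cities[cities$country == "Germany", ]

# Two conditions joined with & ("and"): both must hold
big_german <- cities[cities$country == "Germany" &
                       cities$population > 100000, ]

# Two conditions joined with | ("or"): at least one must hold
french_or_small <- cities[cities$country == "France" |
                            cities$population < 100000, ]

print(german)
print(big_german)
print(french_or_small)
```

Note the comma after the condition: it tells R that the condition selects rows and that all columns should be kept.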
You can apply the standard analysis functions, such as summary(), to analyze a filtered data frame. Distributions of numeric variables can be visualized with histograms via the hist() function.
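Filtering and analysis can be chained, as in the following sketch. The `cities` data frame is again hypothetical with invented values, and the plot title and axis label are arbitrary choices.

```r
# Hypothetical toy data frame; all values are made up for illustration.
cities <- data.frame(
  city = c("A", "B", "C", "D", "E", "F"),
  country = c("Germany", "France", "Germany",
              "Germany", "France", "Germany"),
  population = c(500000, 1200000, 80000, 150000, 300000, 950000),
  stringsAsFactors = FALSE
)

# Filter first, then analyze the filtered data frame
german <- cities[cities$country == "Germany", ]

# Summary statistics for a numeric column of the filtered data
pop_summary <- summary(german$population)
print(pop_summary)

# Histogram of the same column; hist() draws the plot and
# invisibly returns an object describing the bins
h <- hist(german$population,
          main = "Population of German cities (toy data)",
          xlab = "Population")
```

When run as a script rather than interactively, R typically writes the histogram to a file instead of opening a plot window.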
In the exercises that follow, we will review everything we learned in this part of our course. Let's begin!