13. Merging two DataFrames – common column

Instruction

Good! Now that we have both DataFrames, it's time to merge them. Suppose we have two DataFrames – cars and owners:

cars_with_owners = cars.merge(owners, on='owner_id')

The code above creates a new variable, cars_with_owners, that contains data from both cars and owners. How does pandas know how to combine these two datasets? It uses the column passed in the on argument, in this case owner_id:

[Figure: Outer merge]

The merge method looks for rows that have the same owner_id value in both DataFrames and combines each matching pair into a single row. Usually, the join column is some sort of ID that is used consistently across multiple CSV files.
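To see the matching in action, here is a minimal sketch with two tiny hand-made DataFrames. The sample values and the car_id, model, and name columns are made up for illustration; only owner_id and the merge call come from the lesson.

```python
import pandas as pd

# Hypothetical sample data: only owner_id is taken from the lesson,
# the other columns and all values are invented for this sketch.
cars = pd.DataFrame({
    "car_id": [1, 2, 3],
    "model": ["Fiat 500", "Ford Focus", "Audi A4"],
    "owner_id": [10, 20, 10],
})
owners = pd.DataFrame({
    "owner_id": [10, 20, 30],
    "name": ["Anna", "Ben", "Cleo"],
})

# Match rows that share the same owner_id value.
cars_with_owners = cars.merge(owners, on="owner_id")

# Each car row now also carries the name of its owner.
print(cars_with_owners)
```

Owner 10 has two cars, so that owner's name appears on two rows. Owner 30 has no car, and with merge's default settings (an inner join) that owner does not appear in the result at all.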

Exercise

Create a new variable employee_all that will contain data from employee_names and employee_salaries. Use the employee_id column to merge both datasets. Finally, show the contents of employee_all.

Stuck? Here's a hint!

Start with:

employee_all = employee_names.merge(...)

Complete the code in the parentheses.