11. Dealing with outliers

Instruction

Great! We've identified an outlier in our dataset – now, what should we do about it?

That's another difficult question, because there is no universal answer. An outlier may be a mistake in the data. If that's the case, you can try to fix the erroneous value. If you can't fix it, it's best to remove it so that it doesn't distort your data.

An outlier can also be a correct, albeit atypical, value. In this case, your approach depends on what you intend to do with the data. The rule of thumb is this: if you don't really need that specific value, just delete the row so that it doesn't distort your analysis results.
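The two options for an erroneous value can be sketched as follows. The heights DataFrame, its values, and the assumption that 1750.0 is a mistyped 175.0 are all made up for illustration; they are not part of the course data.

```python
import pandas as pd

# Hypothetical data: the height in the row with index 2 looks like a typo.
heights = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "height_cm": [168.0, 181.0, 1750.0, 159.0],
})

# Option 1: we know the correct value (assumed here), so we fix it.
fixed = heights.copy()
fixed.loc[2, "height_cm"] = 175.0

# Option 2: we can't recover the correct value, so we remove the row.
removed = heights.drop(2)

print(fixed["height_cm"].max())  # 181.0
print(len(removed))              # 3
```

Which option fits depends on whether you can find out what the value should have been.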

A quick recap: to remove the row with index 5 from the cars DataFrame, just use:

cars = cars.drop(5)
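
As a small sketch of what drop() does here, the example below uses a made-up cars DataFrame (the column names, values, and index labels are assumptions, not the course's data). It shows that drop() looks up the row by its index label, returns a new DataFrame, and leaves the remaining labels unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the course's cars DataFrame.
cars = pd.DataFrame(
    {"brand": ["Audi", "Fiat", "Ford"], "price": [30000, 12000, 950000]},
    index=[3, 4, 5],
)

# drop() returns a new DataFrame; cars itself is not modified yet.
without_outlier = cars.drop(5)

print(len(cars))                       # 3
print(without_outlier.index.tolist())  # [3, 4]

# Reassigning, as in the lesson, keeps only the cleaned version.
cars = without_outlier
```

This is why the lesson assigns the result back to cars: without the assignment, the original DataFrame would still contain the outlier.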

Exercise

The row for the year 2000, with a temperature value of 280.0, has index 10. Delete this row from the temperatures DataFrame and store the result in temperatures.

Stuck? Here's a hint!

Use:

temperatures = temperatures.drop(index)

Instead of "index", provide the index from the exercise instruction.