Good! Duplicate rows are not easy to identify at first sight. Luckily, just like with
NaN values, we can check if a given DataFrame or specific column contains duplicates. To look for duplicates in a DataFrame named
cars, we can write:
To check if a specific column named
vin has any duplicate rows, we can write:
Both of these expressions return
True if there is at least one duplicate row.
Usually, it makes more sense to check for duplicates in the whole DataFrame rather than in single columns. For instance, in our case, two states could generate the exact same sales figures – and it would be perfectly legit.