First, you need to calculate the mean() and the std() of the rating column. Assign them to variables to keep the code clean:
movies_mean = movies['rating'].mean()
movies_std = movies['rating'].std()
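As a quick sanity check, here is a minimal sketch on a hypothetical five-row movies DataFrame. The ratings are made up and do not come from any real dataset. Note that pandas' std() computes the sample standard deviation (ddof=1) by default, while NumPy's np.std uses ddof=0, so the two can give slightly different thresholds.

```python
import pandas as pd

# Hypothetical stand-in for the real movies DataFrame
movies = pd.DataFrame({'rating': [5.0, 6.0, 7.0, 8.0, 9.0]})

movies_mean = movies['rating'].mean()           # 7.0
movies_std = movies['rating'].std()             # sample std (ddof=1): sqrt(10 / 4)
population_std = movies['rating'].std(ddof=0)   # population std (ddof=0): sqrt(10 / 5)

print(movies_mean, movies_std, population_std)
```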
Then select only those movies whose rating is smaller than movies_mean - movies_std or greater than movies_mean + movies_std, that is, movies whose rating lies more than one standard deviation away from the mean:
greater_rating = movies['rating'] > (movies_mean + movies_std)
smaller_rating = movies['rating'] < (movies_mean - movies_std)
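Each comparison returns a boolean Series aligned with the DataFrame's index. The sketch below uses hypothetical ratings to show what the two masks look like and how | combines them element-wise. If you write the comparisons inline instead of storing them in variables, wrap each one in parentheses, because | binds more tightly than > and <.

```python
import pandas as pd

# Hypothetical ratings: mean = 7.0, sample std is about 3.55
movies = pd.DataFrame({'rating': [2.0, 6.5, 7.0, 7.5, 12.0]})

movies_mean = movies['rating'].mean()
movies_std = movies['rating'].std()

greater_rating = movies['rating'] > (movies_mean + movies_std)
smaller_rating = movies['rating'] < (movies_mean - movies_std)

# Element-wise OR: True where the rating is an outlier on either side
outside = greater_rating | smaller_rating

# Inline equivalent; the parentheses around each comparison are required
outside_inline = (
    (movies['rating'] > movies_mean + movies_std)
    | (movies['rating'] < movies_mean - movies_std)
)

print(outside.tolist())  # [True, False, False, False, True]
```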
Use the .loc indexer to select the rating column, filter it with the combined condition (| is the element-wise OR of the two masks), and call count() to count the number of remaining rows:
movies.loc[greater_rating | smaller_rating, 'rating'].count()
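Putting the steps together, here is a self-contained sketch on hypothetical data, including one missing rating. Because True counts as 1, summing the combined mask gives the same number as the .loc/count() approach; both skip missing ratings, since comparisons with NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN stands for a movie without a rating
movies = pd.DataFrame({'rating': [2.0, 6.5, 7.0, 7.5, 12.0, np.nan]})

# mean() and std() skip NaN by default (skipna=True)
movies_mean = movies['rating'].mean()
movies_std = movies['rating'].std()

greater_rating = movies['rating'] > (movies_mean + movies_std)
smaller_rating = movies['rating'] < (movies_mean - movies_std)

# Approach from the text: select the filtered 'rating' values and count them
n_outside = movies.loc[greater_rating | smaller_rating, 'rating'].count()

# Equivalent: sum the boolean mask (True counts as 1, NaN comparisons are False)
n_outside_sum = (greater_rating | smaller_rating).sum()

print(n_outside, n_outside_sum)  # 2 2
```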