Tag: Learn Statistics

Going Down to South Park, Part 3: TF-IDF Analysis with R

Do you think it's possible to determine what a text document is about? Read on to see how we can use R to programmatically identify the main topics of several South Park seasons and episodes! In the second article of the series, I showed you how to use R to analyze the differences between South Park characters. I assumed that Eric Cartman is the naughtiest character in the show and tested this hypothesis with real data. Check it out to see my conclusion if you haven't already.

Going Down to South Park, Part 2: Text Analysis with R

Who do you think is the naughtiest character in South Park? You’ll know the answer by the end of this article—and I’m sure you’ll be surprised! In the previous article of this series, I showed you how to use R to analyze South Park dialog. We mostly focused on the show overall. This time, I’ll take a closer look at the most famous South Park characters. We’ll see how much they talk and how their sentiments change across the show.

Going Down to South Park, Part 1: Text Analysis with R

Have you ever liked a TV show so much that simply watching it wasn’t enough anymore? Read on to discover how I used R to analyze South Park dialog and ratings! South Park is an American TV show for adults that’s well known for being very satirical: the series has made fun of nearly every celebrity and isn’t afraid to be provocative. I literally watch the show every day. I also do lots of data analysis in R every day!

High-Performance Statistical Queries: Dependencies Between Continuous and Discrete Variables

In my previous articles, I explained how you can check for associations between two continuous variables and between two discrete variables. This time, we’ll check for linear dependencies between a continuous variable and a discrete variable. You can do this by measuring the variance between the means of the continuous variable in the different groups of the discrete variable. The null hypothesis here is that all variance between the group means is a result of the variance within each group.