In my previous articles, I explained how you can check for associations between two continuous and two discrete variables. This time, we’ll check for linear dependencies between continuous and discrete variables. You can do this by measuring the variance between the means of the continuous variable and different groups of the discrete variable. The null hypothesis here is that all variances between the means are a result of the variance within each group.
In my previous article, we looked at how you can calculate linear dependencies between two continuous variables with covariance and correlation. Both methods use the means of the two variables in their calculations. However, mean values and other population moments make no sense for categorical (nominal) variables. For instance, if you denote “Clerical” as 1 and “Professional” as 2 for an occupation variable, what does the average of 1.5 signify?
My previous article explained how to calculate frequencies using T-SQL queries. Frequencies are used to analyze the distribution of discrete variables. Today, we’ll continue learning about statistics and SQL. In particular, we’ll focus on calculating centers of distribution. In statistics, certain measurements are known as moments. You can describe continuous variables (i.e. a variable that has a large range of possible numbers, such as household incomes in a country) with population moments.