High-Performance Statistical Queries: Dependencies Between Continuous and Discrete Variables

In my previous articles, I explained how to check for associations between two continuous variables and between two discrete variables. This time, we’ll check for linear dependencies between a continuous and a discrete variable. You can do this by comparing the variance between the means of the continuous variable in the groups defined by the discrete variable with the variance within those groups. The null hypothesis here is that all the variance between the group means is a result of the variance within each group. If you reject this null hypothesis, then…

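The comparison described above is the standard one-way analysis of variance (ANOVA) F ratio: the mean square between groups divided by the mean square within groups. The articles compute their statistics with T-SQL queries; the Python sketch below is only an illustration of the same arithmetic, and the function name `one_way_anova_f` is hypothetical. It assumes at least two groups, more observations than groups, and some variation inside the groups.

```python
from collections import defaultdict

def one_way_anova_f(groups, values):
    """Hypothetical helper: return (F, df_between, df_within) for a one-way ANOVA.

    groups: discrete labels, one per observation
    values: continuous measurements aligned with groups
    """
    # Collect the continuous values per discrete group
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)

    n = len(values)
    k = len(by_group)
    grand_mean = sum(values) / n

    # Between-group sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )

    # Within-group sum of squares: how far observations lie from their own group mean
    ss_within = 0.0
    for vs in by_group.values():
        group_mean = sum(vs) / len(vs)
        ss_within += sum((v - group_mean) ** 2 for v in vs)

    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For example, `one_way_anova_f(["A", "A", "A", "B", "B", "B"], [1, 2, 3, 4, 5, 6])` gives F = 13.5 with 1 and 4 degrees of freedom. A large F, compared with the F distribution for those degrees of freedom, is evidence against the null hypothesis.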

High-Performance Statistical Queries: Dependencies Between Discrete Variables

In my previous article, we looked at how to calculate linear dependencies between two continuous variables with covariance and correlation. Both methods use the means of the two variables in their calculations. However, mean values and other population moments make no sense for categorical (nominal) variables. For instance, if you denote “Clerical” as 1 and “Professional” as 2 for an occupation variable, what does an average of 1.5 signify? You need another test for dependencies: a test that…

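The excerpt stops before naming the test. A common choice for two nominal variables is Pearson’s chi-squared test of independence, which compares observed counts in a contingency table with the counts expected if the variables were independent; treat it here as an assumption, not a summary of the article. The Python sketch below (hypothetical helper `chi_squared`) illustrates the arithmetic only; the articles themselves use T-SQL.

```python
from collections import Counter

def chi_squared(xs, ys):
    """Hypothetical helper: Pearson chi-squared statistic and degrees of freedom."""
    n = len(xs)
    observed = Counter(zip(xs, ys))  # joint counts; missing pairs count as 0
    x_totals = Counter(xs)           # row totals of the contingency table
    y_totals = Counter(ys)           # column totals of the contingency table

    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            # Count expected in this cell if x and y were independent
            expected = nx * ny / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected

    df = (len(x_totals) - 1) * (len(y_totals) - 1)
    return chi2, df
```

Note that no means of the category codes are involved: only counts per category and per combination of categories.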

High-Performance Statistical Queries: Linear Dependencies Between Continuous Variables

In my previous articles, I dealt with analyses of a single variable only. Now it is time to check whether two variables of interest are independent or related. For example, a person’s height correlates positively with shoe size: taller people tend to have larger shoe sizes, and shorter people tend to have smaller ones. You can find this and many more examples of positive associations at: http://examples.yourdictionary.com/positive-correlation-examples.html. A negative association is also possible. For example, an increase in the speed at which…

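Covariance and correlation are the measures this series uses for linear dependencies between continuous variables. As a minimal Python sketch of the textbook formulas (the articles use T-SQL, and the helper names here are hypothetical), sample covariance averages the products of deviations from the two means, and the Pearson correlation rescales it by both standard deviations so the result falls between -1 and 1.

```python
import math

def covariance(xs, ys):
    """Sample covariance, with n - 1 in the denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)
```

A positive association, like height and shoe size, yields a positive coefficient; a negative association yields a negative one. With toy inputs, `correlation([1, 2, 3], [2, 4, 6])` is 1.0 and `correlation([1, 2, 3], [3, 2, 1])` is -1.0.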

High-Performance Statistical Queries: Skewness and Kurtosis

In descriptive statistics, the first four population moments describe the center, spread, skewness, and kurtosis (peakedness) of a distribution. In this article, I explain the third and fourth population moments, skewness and kurtosis, and how to calculate them. The mean uses the values raised to the first power in the calculation, so it is the first population moment. The standard deviation uses squared values, so it is based on the second population moment. Skewness is the third, and kurtosis is the fourth.

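The progression from first to fourth power can be made concrete with central moments. The Python sketch below uses the simple moment-based (population) formulas; the articles’ T-SQL may apply sample-size corrections, so treat these definitions and the hypothetical helper names as assumptions. It also assumes the values are not all equal, since the second moment appears in a denominator.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: third central moment over the second raised to 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3
```

Symmetric data such as `[1, 2, 3]` has zero skewness, while `[1, 1, 4]`, with its long right tail, has positive skewness (about 0.71). The flat distribution `[1, 2, 3]` has negative excess kurtosis (-1.5).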

High-Performance Statistical Queries in SQL: Part 3 – Measuring the Spread of a Distribution

Besides knowing the center of a distribution in your data, you need to know how varied the observations are. In this article, we’ll explain how to find the spread of a distribution. Are you dealing with a very uniform or a widely spread population? To really understand what the numbers are saying, you must know the answer to this question. In the second part of this series, we discussed how to calculate the centers of a distribution. Like the center, the spread can…

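The excerpt does not list which spread measures the article covers. As an assumption, the sketch below gathers the usual ones (range, population and sample variance, standard deviation, and the coefficient of variation) using Python’s standard `statistics` module, which can be handy for cross-checking query results on a small sample. The helper name `spread_summary` is hypothetical.

```python
import statistics

def spread_summary(values):
    """Hypothetical helper: common measures of spread for a continuous variable."""
    sample_sd = statistics.stdev(values)
    return {
        "range": max(values) - min(values),
        "population_variance": statistics.pvariance(values),  # divides by n
        "sample_variance": statistics.variance(values),       # divides by n - 1
        "sample_stdev": sample_sd,
        # Standard deviation relative to the mean; assumes a non-zero mean
        "coefficient_of_variation": sample_sd / statistics.mean(values),
    }
```

For `[2, 4, 4, 4, 5, 5, 7, 9]`, the range is 7 and the population variance is 4, while the sample variance is 32/7. The coefficient of variation is useful for comparing spread between variables measured on different scales.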

High-Performance Statistical Queries in SQL: Part 2 – Calculating Centers of Distribution

My previous article explained how to calculate frequencies using T-SQL queries. Frequencies are used to analyze the distribution of discrete variables. Today, we’ll continue learning about statistics and SQL; in particular, we’ll focus on calculating centers of distribution. In statistics, certain measurements are known as moments. You can describe continuous variables (i.e., variables with a large range of possible values, such as household incomes in a country) with population moments. These moments give you insight into the distribution…

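Which centers the article computes is cut off in the excerpt; the mean, median, and mode are the usual candidates and are assumed here. The Python sketch below uses the standard `statistics` module (`multimode` requires Python 3.8 or later) to show how the three can differ on skewed data such as incomes. The helper name `centers` is hypothetical.

```python
import statistics

def centers(values):
    """Hypothetical helper: three common measures of the center of a distribution."""
    return {
        "mean": statistics.mean(values),        # first population moment
        "median": statistics.median(values),    # middle value after sorting
        "modes": statistics.multimode(values),  # every most-frequent value
    }
```

For the made-up values `[1, 2, 2, 3, 100]`, the mean is 21.6 while the median and the mode are both 2: a single extreme value pulls the mean far from where most observations lie.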
