If you’re familiar with the famous Russian nesting doll, then SQL subqueries should be a peace of cake to understand—subqueries are just queries nested within queries. An SQL subquery is often called an “inner” query; the main query is usually called the “outer” query. This article covers everything you need to know about correlated subqueries.
What Exactly is a Correlated Subquery?
A correlated SQL subquery is just a subquery that is executed many times—once for each record (row) returned by the outer (main) query. In other words, the outer query returns a table with multiple rows; the inner query then runs once for each of those rows. If your outer query returns 10 rows, then the inner query will run 10 times. And if your outer query returns 100 rows, the inner query will run 100 times.
When Should You Use Correlated Subqueries?
Some data questions can only be answered with correlated subqueries. This is particularly true when asking negative data questions.
Negative data questions arise when we search for records that don’t satisfy a particular condition. An example of a simple negative data question is: “Obtain the names of all movies not produced by Steven Spielberg.”
We’ll solve an example of a negative data question in this article. Before we do so, let’s first review the tables involved. We have two tables: the
employee table and the
payment_history table. The second table has a column named
payment_type that denotes whether a payment is part of a regular salary, a bonus, or an award. Let’s see some sample data from the tables we will use:
Here’s the negative data question we’d like to answer:
“Obtain the names of employees who never received an award”
The SQL query that answers the previous question is the following:
SELECT last_name, first_name FROM employee e1 WHERE NOT EXISTS (SELECT ph.last_name FROM payment_history ph WHERE ph.employee_id = e1.employee_id AND ph.payment_type = 'award')
Correlated Subqueries Vs. Simple Subqueries
The main difference between a correlated subquery and a simple subquery is that correlated subqueries reference columns from the table of the outer query.
In the above example,
ph.employee_id = e1.employee_id is a reference to the outer subquery table (e1). To identify a correlated subquery, just look for these kinds of references. If you find at least one, you have a correlated subquery!
The negative part of a data question is often solved in a correlated subquery by using the
NOT EXISTS operator in the
EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then
EXISTS evaluates to
TRUE. If the subquery returns an empty set, then
EXISTS evaluates to
FALSE. Note that we use
NOT EXISTS, which is simply the opposite of
The result of the previous query is:
In this example, we’ll try to obtain the names of all employees who earned higher salaries in March 2018 than their average monthly salaries for all previous months. Here’s the query we’ll run:
SELECT first_name, last_name FROM employee e1, payment_history ph WHERE e1.employee_id = ph.employee_id AND amount_paid > = ( SELECT AVG(amount_paid) FROM payment_history ph2 WHERE ph2.employee_id = e1.employee_id AND ph2.payment_date < '01/03/2018' AND ph2.payment_type = 'salary' ) AND month(ph.payment_date) =3 AND year(ph.payment_date) = 2018 AND ph.payment_type ='salary'
Enough Negativity. What About Positive Data Questions?
Must we use a correlated subquery to answer a positive data question? No, you don’t have to. You can still do so if you want to, though. For positive questions, we can usually just use a
JOIN condition or a relationship between two tables.
Let’s change our previous question to a positive one and solve it with a
JOIN instead a correlated subquery. The question becomes: “Obtain the names of employees who received award payments.” And the SQL query (without correlated subqueries) that answers this question is:
SELECT first_name, last_name FROM employee e1 JOIN payment_history ph ON ph.employee_id = e1.employee_id WHERE ph.payment_type =award'
The result is:
Word of Caution: Correlated Subqueries Are Generally Slow
I’d just like to mention that we try not to overuse correlated subqueries, if possible. Recall that a correlated subquery runs once for each record returned by the outer query. If the outer query returns thousands upon thousands of records, you can imagine how quickly your query will slow down in performance. In general, you should only use correlated subqueries if they’re absolutely necessary.
Try it Yourself!
We’ve seen that correlated subqueries are an important part of the SQL language and can help us answer different data questions, especially negative ones. We also explained how to recognize a correlated subquery and why we should generally try to avoid correlated subqueries, if possible, due to performance reasons.
To improve your subquery skills, try Vertabelo Academy’s SQL Basics course. We have a specific section devoted to subqueries, with plenty of exercises and examples to help you master the content.