Correlated Subqueries in 5 Minutes

sql correlated subquery, sql subquery, correlated subquery sql server, when to use correlated subquery, correlated subquery tutorial, non correlated subquery, subquery vs correlated subquery, correlated subquery oracle, correlated subquery mysql

If you’re familiar with the famous Russian nesting doll, then SQL subqueries should be a peace of cake to understand—subqueries are just queries nested within queries. An SQL subquery is often called an “inner” query; the main query is usually called the “outer” query. This article covers everything you need to know about correlated subqueries.

What Exactly is a Correlated Subquery?

A correlated SQL subquery is just a subquery that is executed many times—once for each record (row) returned by the outer (main) query. In other words, the outer query returns a table with multiple rows; the inner query then runs once for each of those rows. If your outer query returns 10 rows, then the inner query will run 10 times. And if your outer query returns 100 rows, the inner query will run 100 times.

When Should You Use Correlated Subqueries?

Some data questions can only be answered with correlated subqueries. This is particularly true when asking negative data questions.

Negative data questions arise when we search for records that don’t satisfy a particular condition. An example of a simple negative data question is: “Obtain the names of all movies not produced by Steven Spielberg.”

We’ll solve an example of a negative data question in this article. Before we do so, let’s first review the tables involved. We have two tables: the employee table and the payment_history table. The second table has a column named payment_type that denotes whether a payment is part of a regular salary, a bonus, or an award. Let’s see some sample data from the tables we will use:

employee

employee_id payment_type amount_paid payment_date
100 salary 2000.00 2018-Mar-02
101 salary 1800.00 2018-Mar-02
102 salary 1900.00 2018-Mar-02
101 award 500.00 2018-Mar-08
102 adjustment 124.70 2018-Mar-10

payment_history

employee_id payment_type amount_paid payment_date
100 salary 2000.00 2018-Mar-02
101 salary 1800.00 2018-Mar-02
102 salary 1900.00 2018-Mar-02
101 award 500.00 2018-Mar-08
102 adjustment 124.70 2018-Mar-10

Here’s the negative data question we’d like to answer:

“Obtain the names of employees who never received an award”

The SQL query that answers the previous question is the following:

SELECT last_name, first_name
FROM    employee e1
WHERE NOT EXISTS (SELECT ph.last_name 
                                      FROM payment_history ph 
                                      WHERE ph.employee_id = e1.employee_id 
                                      AND ph.payment_type = 'award')

Correlated Subqueries Vs. Simple Subqueries

The main difference between a correlated subquery and a simple subquery is that correlated subqueries reference columns from the table of the outer query.

In the above example, ph.employee_id = e1.employee_id is a reference to the outer subquery table (e1). To identify a correlated subquery, just look for these kinds of references. If you find at least one, you have a correlated subquery!

The negative part of a data question is often solved in a correlated subquery by using the NOT EXISTS operator in the WHERE clause. EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then EXISTS evaluates to TRUE. If the subquery returns an empty set, then EXISTS evaluates to FALSE. Note that we use NOT EXISTS, which is simply the opposite of EXISTS.

The result of the previous query is:

first_name last_name
John Smith
Alice Johnson

Another Example

In this example, we’ll try to obtain the names of all employees who earned higher salaries in March 2018 than their average monthly salaries for all previous months. Here’s the query we’ll run:

SELECT    first_name, last_name 
FROM       employee e1, payment_history ph
WHERE    e1.employee_id = ph.employee_id 
     AND     amount_paid > = (
                               SELECT AVG(amount_paid) FROM payment_history ph2
                               WHERE ph2.employee_id = e1.employee_id
                                     AND ph2.payment_date  < '01/03/2018'
                                     AND ph2.payment_type = 'salary' 
                               )
     AND    month(ph.payment_date) =3 
     AND    year(ph.payment_date) = 2018 
     AND    ph.payment_type ='salary'

Enough Negativity. What About Positive Data Questions?

Must we use a correlated subquery to answer a positive data question? No, you don’t have to. You can still do so if you want to, though. For positive questions, we can usually just use a JOIN condition or a relationship between two tables.

Let’s change our previous question to a positive one and solve it with a JOIN instead a correlated subquery. The question becomes: “Obtain the names of employees who received award payments.” And the SQL query (without correlated subqueries) that answers this question is:

SELECT    first_name, last_name 
FROM       employee e1 
                 JOIN payment_history ph ON  ph.employee_id = e1.employee_id 
WHERE    ph.payment_type =award'

The result is:

first_name last_name
Kate Miller

Word of Caution: Correlated Subqueries Are Generally Slow

I’d just like to mention that we try not to overuse correlated subqueries, if possible. Recall that a correlated subquery runs once for each record returned by the outer query. If the outer query returns thousands upon thousands of records, you can imagine how quickly your query will slow down in performance. In general, you should only use correlated subqueries if they’re absolutely necessary.

Try it Yourself!

We’ve seen that correlated subqueries are an important part of the SQL language and can help us answer different data questions, especially negative ones. We also explained how to recognize a correlated subquery and why we should generally try to avoid correlated subqueries, if possible, due to performance reasons.

To improve your subquery skills, try Vertabelo Academy’s SQL Basics course. We have a specific section devoted to subqueries, with plenty of exercises and examples to help you master the content.

Ignacio is a database consultant from Buenos Aires, Argentina. He’s worked for 15 years as a database consultant for IT companies like Informix and IBM. These days, he teaches databases at Sarmiento University and works as a PostgreSQL independent SQL consultant. A proud father of four kids with 54 years in his backpack, Ignacio plays soccer every Saturday afternoon, enjoying every match as if it’s his last one.

GET ACCESS TO EXPERT SQL CONTENT!