Correlated subqueries are the only way to solve some SQL statements. But they can be very slow. In this post, we’ll talk about why, how, and when to use them.
Subqueries are an important resource for increasing the expressive power of SQL. If you haven’t read our previous article, subqueries are simply a
SELECT statement inside another
SELECT. We can use them in different places inside a
SELECT, such as in the
FROM clauses. We can even use them in other statements like
Like simple subqueries, a correlated subquery contains a query within a query. It gets its name because the two queries are related; the inner query uses information obtained from the outer query (e.g. from a table referenced in the outer query). For the programmers among you, this is similar to a nested loop structure.
Before we examine correlated subqueries, let’s have a quick refresher on simple (non-correlated) subqueries.
— Vertabelo (@Vertabelo) December 12, 2016
A Simple Subquery
The following SQL example uses a simple subquery to obtain a list of employees who earn more than the average company salary. In this case, the subquery is shown in bolded text, and its result is the company’s average salary. The outer query compares each employee’s salary with the company average. Easy!
SELECT lastname, firstname, salary FROM employee WHERE salary > (SELECT avg(salary) FROM employee)
Example 1: A simple non-correlated subquery
Next up are correlated subqueries. These are very special, because sometimes they are the only way to solve a query. However, correlated subqueries must be used carefully. They can be slow, as we will explain later.
Using a Correlated Subquery
Let’s begin with an example of a correlated subquery in SQL. Suppose we want to find all employees with a salary higher than their average departmental salary. We would use the following query. Once again, I’ve bolded the subquery:
SELECT lastname, firstname, salary FROM employee e1 WHERE e1.salary > (SELECT avg(salary) FROM employee e2 WHERE e2.dept_id = e1.dept_id)
Example 2: A correlated subquery
The main difference between a correlated subquery and a simple subquery is that correlated subqueries reference columns from the outer table. In the above example,
e1.dept_id is a reference to the outer subquery table. To identify a correlated query, just look for these kinds of references. If you find at least one, you have a correlated subquery!
Let’s look at another example. Suppose we want to obtain the names of departments that have more than 10 employees. We can use the following correlated subquery:
SELECT deptname FROM department d1 WHERE 10 < (SELECT count(*) FROM employee e WHERE e.dept_id = d1.dept_id)
Example 3: Another correlated subquery
Time for one last example. We mentioned that subqueries can be part of the
SELECT clauses. In this example, we will use a correlated subquery in the
SELECT list to find every employee name, their salary, and the average salary of their department. We will obtain the average salary using a correlated subquery inside the
Here’s the code:
SELECT lastname, firstname, salary, (SELECT avg(salary) FROM employee e2 WHERE e2.dep_id = e1.dep_id) AS avg_dept_salary FROM employee e1
Example 4: A correlated subquery in the SELECT list
How Many Times Is a Correlated Subquery Executed?
Suppose we have a table called “
assigned_to_project” that stores the names of employees assigned to projects. We want to find all employees who are not assigned to any projects. The solution is the following query:
SELECT lastname, firstname, salary FROM employee e1 WHERE NOT EXISTS (SELECT project_id FROM assigned_to_project WHERE employee_id = e1.employee_id)
Example 5: A correlated subquery
First of all, this query is easy to analyze. The
NOT EXISTS clause is
TRUE when the subquery returns an empty result set. This happens only for those employees not assigned to any project. Again, quite easy!
However the purpose of this section is to analyze how many times the correlated subquery is executed. Try and figure out which of the following statements is right:
- It is executed just once.
- It is executed once for each project.
- It is executed once for each employee not assigned to any project.
- It is executed once for each employee in the company.
Let’s analyze these options. Suppose we have 1,000 employees and 20 projects. Further, we have 800 employees who are already assigned to a project. In this case, the first option has one execution, the second has 20 executions, the third has 200, and the final option has 1,000.
If you said that the correct answer was “once for each employee in the company”. you’re correct. If we go through the statement, this answer becomes obvious; we need to check if each employee has projects or not. According to the above example, though, this means that the subquery will be executed 1,000 times! In terms of performance, this is the worst scenario of the four.
Because correlated subqueries SQL tend to imply many executions, they also tend to be slow. As a rule, we try to avoid correlated subqueries because of this. But as we already mentioned, sometimes correlation is the only way to solve a query.
When to Use a Correlated Subquery
At this point of the article the reader should be thinking: Just when should I use correlated subqueries? There are some cases when we must. This is particularly true in queries where we’re looking for what we might call negatives.
Here’s an example of a ‘negative’ query. Let’s assume we have a payment history table with a column called
payment_type that indicates if a payment is regular salary, a bonus, or an award. If we want a query to return the employees who never received an award, we would use this query:
SELECT lastname, firstname FROM employees e1 WHERE NOT EXISTS (SELECT ph.lastname FROM payment_history ph WHERE ph.emp_id = e1.employee_id AND ph.payment_type =’award’)
Example 6: A correlated subquery using NOT EXISTS
EXISTS is an unary operator. It has only one operand, which is a subquery (correlated or not). If the subquery returns at least one record, then
TRUE. If the subquery returns no records,
FALSE. In this case, you must use a correlated subquery to get your results.
Using Correlated Subqueries In UPDATE or DELETE Statements
Sometimes we find correlated subqueries in UPDATE or DELETE statements. The following
UPDATE has a correlated subquery that obtains the new value of the
UPDATE employee emp SET all_money_made = (SELECT SUM(payment) FROM payment_history WHERE employee_id = emp.emp_id)
Example 7: A correlated subquery in an UPDATE
This article has shown us when to use a correlated subquery. Queries looking for negatives are good candidates, although there are other occasions when a correlation is the only real option. We’ve also seen how many times a correlated subquery is executed – usually many, many times. This is their biggest drawback.
Try It Yourself
Correlated subqueries are an important resource for the SQL developer. To learn more and improve your subquery skills, try Vertabelo Academy’s SQL Basics course. We have a specific section for subqueries, plus plenty of exercises and examples!