Correlated SQL Subqueries For Newbies

""

Correlated subqueries are the only way to solve some SQL statements. But they can be very slow. In this post, we’ll talk about why, how, and when to use them.

Subqueries are an important resource for increasing the expressive power of SQL. If you haven’t read our previous article, subqueries are simply a SELECT statement inside another SELECT. We can use them in different places inside a SELECT, such as in the WHERE, HAVING, or FROM clauses. We can even use them in other statements like UPDATE or DELETE.

Like simple subqueries, a correlated subquery contains a query within a query. It gets its name because the two queries are related; the inner query uses information obtained from the outer query (e.g. from a table referenced in the outer query). For the programmers among you, this is similar to a nested loop structure.

Before we examine correlated subqueries, let’s have a quick refresher on simple (non-correlated) subqueries.

A Simple Subquery

The following SQL example uses a simple subquery to obtain a list of employees who earn more than the average company salary. In this case, the subquery is shown in bolded text, and its result is the company’s average salary. The outer query compares each employee’s salary with the company average. Easy!

SELECT 
  lastname, 
  firstname, 
  salary 
FROM employee 
WHERE salary > (SELECT avg(salary) 
                FROM employee)

Example 1: A simple non-correlated subquery

Next up are correlated subqueries. These are very special, because sometimes they are the only way to solve a query. However, correlated subqueries must be used carefully. They can be slow, as we will explain later.

Using a Correlated Subquery

Let’s begin with an example of a correlated subquery in SQL. Suppose we want to find all employees with a salary higher than their average departmental salary. We would use the following query. Once again, I’ve bolded the subquery:

SELECT 
  lastname, 
  firstname, 
  salary
FROM employee e1
WHERE e1.salary > (SELECT avg(salary)
                   FROM employee e2 
                   WHERE e2.dept_id = e1.dept_id)

Example 2: A correlated subquery

The main difference between a correlated subquery and a simple subquery is that correlated subqueries reference columns from the outer table. In the above example, e1.dept_id is a reference to the outer subquery table. To identify a correlated query, just look for these kinds of references. If you find at least one, you have a correlated subquery!

Let’s look at another example. Suppose we want to obtain the names of departments that have more than 10 employees. We can use the following correlated subquery:

SELECT deptname
FROM department d1
WHERE 10 < (SELECT count(*)
            FROM employee e
            WHERE e.dept_id = d1.dept_id)

Example 3: Another correlated subquery

Time for one last example. We mentioned that subqueries can be part of the WHERE, FROM, HAVING and SELECT clauses. In this example, we will use a correlated subquery in the SELECT list to find every employee name, their salary, and the average salary of their department. We will obtain the average salary using a correlated subquery inside the SELECT.

Here’s the code:

SELECT 	
  lastname,
  firstname, 
  salary,
  (SELECT avg(salary) 
    FROM employee e2
    WHERE e2.dep_id = e1.dep_id) AS avg_dept_salary
FROM employee e1

Example 4: A correlated subquery in the SELECT list

How Many Times Is a Correlated Subquery Executed?

Suppose we have a table called “assigned_to_project” that stores the names of employees assigned to projects. We want to find all employees who are not assigned to any projects. The solution is the following query:

SELECT 
  lastname, 
  firstname, 
  salary
FROM employee e1
WHERE NOT EXISTS (SELECT project_id
                  FROM assigned_to_project 
                  WHERE employee_id = e1.employee_id)

Example 5: A correlated subquery

First of all, this query is easy to analyze. The NOT EXISTS clause is TRUE when the subquery returns an empty result set. This happens only for those employees not assigned to any project. Again, quite easy!

However the purpose of this section is to analyze how many times the correlated subquery is executed. Try and figure out which of the following statements is right:

  1. It is executed just once.
  2. It is executed once for each project.
  3. It is executed once for each employee not assigned to any project.
  4. It is executed once for each employee in the company.

Let’s analyze these options. Suppose we have 1,000 employees and 20 projects. Further, we have 800 employees who are already assigned to a project. In this case, the first option has one execution, the second has 20 executions, the third has 200, and the final option has 1,000.

If you said that the correct answer was “once for each employee in the company”. you’re correct. If we go through the statement, this answer becomes obvious; we need to check if each employee has projects or not. According to the above example, though, this means that the subquery will be executed 1,000 times! In terms of performance, this is the worst scenario of the four.

Because correlated subqueries SQL tend to imply many executions, they also tend to be slow. As a rule, we try to avoid correlated subqueries because of this. But as we already mentioned, sometimes correlation is the only way to solve a query.

 

correlated subquery, standard sql

When to Use a Correlated Subquery

At this point of the article the reader should be thinking: Just when should I use correlated subqueries? There are some cases when we must. This is particularly true in queries where we’re looking for what we might call negatives.

Here’s an example of a ‘negative’ query. Let’s assume we have a payment history table with a column called payment_type that indicates if a payment is regular salary, a bonus, or an award. If we want a query to return the employees who never received an award, we would use this query:

SELECT 
  lastname, 
  firstname
FROM employees e1
WHERE NOT EXISTS (SELECT ph.lastname 
                  FROM payment_history ph 
                  WHERE ph.emp_id = e1.employee_id 
                  AND ph.payment_type =’award’)

Example 6: A correlated subquery using NOT EXISTS

EXISTS is an unary operator. It has only one operand, which is a subquery (correlated or not). If the subquery returns at least one record, then EXISTS returns TRUE. If the subquery returns no records, EXISTS returns FALSE. In this case, you must use a correlated subquery to get your results.

Using Correlated Subqueries In UPDATE or DELETE Statements

Sometimes we find correlated subqueries in UPDATE or DELETE statements. The following UPDATE has a correlated subquery that obtains the new value of the all_money_made column:

UPDATE employee emp
SET all_money_made = (SELECT SUM(payment)
                      FROM payment_history 
                      WHERE employee_id = emp.emp_id)

Example 7: A correlated subquery in an UPDATE

This article has shown us when to use a correlated subquery. Queries looking for negatives are good candidates, although there are other occasions when a correlation is the only real option. We’ve also seen how many times a correlated subquery is executed – usually many, many times. This is their biggest drawback.

Try It Yourself

Correlated subqueries are an important resource for the SQL developer. To learn more and improve your subquery skills, try Vertabelo Academy’s SQL Basics course. We have a specific section for subqueries, plus plenty of exercises and examples!

Maria Alcaraz

Former Freelance Database Developer, Mother of 4 children

comments powered by Disqus

GET ACCESS TO EXPERT CONTENT!

Over 85.000 happy students
and counting!