The article describes what a subquery is and what these useful statements look like. We will cover basic examples with the IN, EXISTS, ANY, and ALL operators, look at subqueries in from and where clauses, and explore the difference between correlated and nested subqueries.
First, let’s start with an example database. To present some of these statements we need to have an example table and fill it with some data.
What is a Subquery?
A subquery is a SELECT statement with another SQL statement, like in the example below.
select * from product where id in (select product_id from provider_offer where provider_id = 156);
Subqueries are further classified as either a correlated subquery or a nested subquery. They are usually constructed in such a way to return:
- a table
select max(average.average_price) from (select product_category, avg(price) as average_price from product group by product_category) average;
- or a value
select id from purchase where value > (select avg(value) from purchase )
Nested subqueries are subqueries that don’t rely on an outer query. In other words, both queries in a nested subquery may be run as separate queries.
This type of subquery could be used almost everywhere, but it usually takes one of these formats:
SELECT FROM WHERE [NOT] IN (subquery)
Select * from client where city in (select city from provider);
The example subquery returns all clients that are from the same city as the product providers.
The IN operator checks if the value is within the table and retrieves the matching rows.
SELECT FROM WHERE expression comparison_operator [ANY| ALL] (subquery)
Subquery With ALL Operator
The ALL operator compares a value to every value from the result table.
For example, the following query returns all of the models and producers of bikes that have a price greater than the most expensive headphones.
select producer, model from product where product_category = 'bike' and price > ALL (select price from product where product_category = 'headphones');
Similar subquery but with ANY operator:
Subquery With ANY Operator
The ANY operator compares a value to each value in a table and evaluates whether or not the result of an inner query contains at least one row.
The following query returns all of the models and producers of bikes that have a price greater than at least one of the headphones.
select producer, model from product where product_category = 'bike' and price > ANY (select price from product where product_category = 'headphones');
You can also nest a subquery in another aubquery. For example:
Subquery Nested in Another Subquery Using IN Operator
This query returns producers and models of bikes that exist in provider’s offers from the USA.
select producer, model from product where product_category = 'bike' and id in (select distinct product_id from provider_offer where provider_id in (select id from provider where country = 'USA'));
The same could be done using joins.
select product.producer, product.model from product, provider_offer, provider where provider_offer.product_id = product.id and provider_offer.provider_id = provider.id and product_category = 'bike' and provider.country = 'USA';
Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.
Let’s start with a simple example.
The inner query calculates the average value and returns it. In the outer query’s where clause, we filter only those purchases which have a value greater than the inner query’s returned value.
Subquery Correlated in Where Clause
select id from purchase p1 where date > '2013-07-15' and value > (select avg(value) from purchase p2 where p1.date = p2.date)
The query returns purchases after 15/07/2014 with a total price greater than the average value from the same day.
The equivalent example, but with joining tables.
Select p1.id from purchase p1, purchase p2 where p1.date = p2.date and p1.date > '2013-07-15' group by p1.id having p1.value > avg(p2.value);
This example can also be written as a select statement with a subquery correlated in a from clause.
The subquery returns the table that contains the average value for each purchase for each day. We join this result with the Purchase table on column ‘date’ to check the condition date > ’15/07/2014′.
select id from purchase, (select date, avg(value) as average_value from purchase where date > '2013-07-15' group by date) average where purchase.date = average.date and purchase.date > '2013-07-15' and purchase.value > average.average_value;
Usually, this kind of subquery should be avoided because indexes can’t be used on a temporary table in memory.
Subquery With EXISTS
SELECT FROM WHERE [NOT] EXISTS (subquery)
The EXISTS operator checks if the row from the subquery matches any row in the outer query. If there’s no data matched, the EXISTS operator returns FALSE.
This Query returns all clients that ordered after 10/07/2013.
select id, company_name from client where exists (select * from purchase where client.id = purchase.client_id where date > '2013-07-10')
When a subquery is used, the query optimizer performs additional steps before the results from the subquery are used. If a query that contains a subquery can be written using a join, it should be done this way. Joins usually allow the query optimizer to retrieve the data in a more efficient way.
There are some limitations of using subqueries:
- In Oracle you can nest up to 255 levels of subqueries in a WHERE clause.
- In SQL Server you can nest up to 32 levels.