Long SQL Query vs. Recursive SQL Query

recursive queries, CTE, recursion, sql basics, common table expressions

Recursion is one of the central ideas in computer science. We can define it as a method for solving problems where the solution of the problem depends on solving a smaller instance of a problem. If this sounds complicated do not fret, in this article we will learn about recursion in SQL that you can practice and deepen in Vertabelo Academy.

Recursion is a way of solving hierarchical problems we find in data with common SQL. These types of queries are also called hierarchical queries. We can find recursion capability in standard SQL since SQL:1999 by way of recursive CTE’s or common table expressions.

Some RDMBS systems have their own way of implementing recursion, most notably Oracle’s databases with CONNECT BY statement. Since recursive CTE is in the SQL standard and it is shared with all major RDBMS vendors we will explore this type of recursion.

RECURSION WITH CTE

Recursion is best mastered with viewing some hierarchical structure. There is no better example of hierarchical data than organization structure of a large enterprise. So we will explore a typical employees table that contains data about employees and there direct superiors.

The table has identifier of the current employee and his direct superior as a reference to the same table. Besides the identifiers we also have in the table name and surname of the employee.

We will construct a query that searches through all the rows in the table, starting from the first row usually called the anchor row. In our table the anchor row is the top manager, he has no reporting manager in the hierarchy above him so his manager_id attribute is null.

  SELECT id,
         manager_id,
         first_name,
         last_name,
         0 depth_level
  FROM   employees
  WHERE  manager_id IS NULL
ID MANAGER_ID FIRST_NAME LAST_NAME
1 John McGee

Let’s say that we want to see who John manages, what would the query look like? Something like this:

SELECT id,
         manager_id,
         first_name,
         last_name
  FROM   employees cur
  WHERE  manager_id in (
    SELECT id
    FROM   employees
    WHERE  manager_id IS NULL
  )

And for managers of those managers:

SELECT id,
  manager_id,
  first_name,
  last_name
FROM employees
WHERE manager_id IN
  (SELECT id
  FROM employees
  WHERE manager_id IN
    ( SELECT id FROM employees WHERE manager_id IS NULL
    )
  ) 

As you see there is a patter emerging there for every new management level we construct a new subquey. This approach is bad as it takes into account a fixed number of levels.

Recursion is a way in which we take these subqueries and transform them so that they are general in a way that they represent the previous result of the query.

In our management example the general part is constructed so that we JOIN previous result set to the current one based on the management id.

  SELECT cur.id,
         cur.manager_id,
         cur.first_name,
         cur.last_name
  FROM   employees cur, previous
  WHERE  cur.manager_id = previous.id

This previous dataset is defined as a CTE.

So the full recursive function looks like this :

WITH previous(id, manager_id,first_name,last_name) AS (
  SELECT id,
         manager_id,
         first_name,
         last_name
  FROM   employees
  WHERE  manager_id IS NULL
  UNION ALL
  SELECT cur.id,
         cur.manager_id,
         cur.first_name,
         cur.last_name
  FROM   employees cur, previous
  WHERE  cur.manager_id = previous.id
)
SELECT *
FROM   previous;

The recursion starts with the top manager and is joined by every new level in management hierarchy. The final SELECT returns the whole dataset.

ID MANAGER_ID FIRST_NAME LAST_NAME
1 John McGee
2 1 Kate Doe
7 1 Ethan Lee
9 1 Emily McPers
3 2 Ethan Smith
4 2 Alexander Lam
8 7 Sophia Wilson
10 9 Jacob Gagnon
12 9 Madison Morin
5 4 Ava Marin
6 4 Olivia Roy
11 10 Logan Tremblay

We can expand this query to make it more useful, let’s say that we want to see the levels in hierarchy. We do this by constructing a new parameter that we increment, let’s call it depth_level. For every level of the hierarchy the number is increased by 1.

WITH previous(id, manager_id,first_name,last_name,depth_level) AS (
  SELECT id,
         manager_id,
         first_name,
         last_name,
         0 depth_level
  FROM   employees
  WHERE  manager_id IS NULL
  UNION ALL
  SELECT cur.id,
         cur.manager_id,
         cur.first_name,
         cur.last_name,
         previous.depth_level+1 depth_level
  FROM   employees cur, previous
  WHERE  cur.manager_id = previous.id
)
SELECT previous.*
FROM   previous;

So we get the levels of the hierarchy.

ID MANAGER_ID FIRST_NAME LAST_NAME DEPTH_LEVEL
1 John McGee 0
2 1 Kate Doe 1
7 1 Ethan Lee 1
9 1 Emily McPers 1
3 2 Ethan Smith 2
4 2 Alexander Lam 2
8 7 Sophia Wilson 2
10 9 Jacob Gagnon 2
12 9 Madison Morin 2
5 4 Ava Marin 3
6 4 Olivia Roy 3
11 10 Logan Tremblay 3

We can use this depth_level to represent the data in a more graphical way with the query

WITH previous(id, manager_id,first_name,last_name,depth_level) AS (
  SELECT id,
         manager_id,
         first_name,
         last_name,
         0 depth_level
  FROM   employees
  WHERE  manager_id IS NULL
  UNION ALL
  SELECT cur.id,
         cur.manager_id,
         cur.first_name,
         cur.last_name,
         previous.depth_level+1 depth_level
  FROM   employees cur, previous
  WHERE  cur.manager_id = previous.id
)
SELECT previous.*,
RPAD('.', (depth_level)*2, '.') || id AS tree
FROM   previous;

and the result set:

ID MANAGER_ID FIRST_NAME LAST_NAME DEPTH_LEVEL TREE
1 John McGee 0 1
2 1 Kate Doe 1 ..2
7 1 Ethan Lee 1 ..7
9 1 Emily McPers 1 ..9
3 2 Ethan Smith 2 ….3
4 2 Alexander Lam 2 ….4
8 7 Sophia Wilson 2 ….8
10 9 Jacob Gagnon 2 ….10
12 9 Madison Morin 2 ….12
5 4 Ava Marin 3 ……5
6 4 Olivia Roy 3 ……6
11 10 Logan Tremblay 3 ……11

Recursion is not the most intuitive part of computer science but it is integral. The best way to learn recursion is by diligent and persistent practice. There is no better place to practice SQL than on Vertableo Academy. So open your browser go through the examples on the Vertabelo Academy and you will be on your way to SQL mastery.

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.

GET ACCESS TO EXPERT SQL CONTENT!