How Recursive Common Table Expressions Work

CTE, recursive queries, WITH,

Recursive Common Table Expressions are immensely useful when you’re querying hierarchical data. Let’s explore what makes them work.

Common Table Expressions (CTEs) are some of the most useful constructions in SQL. Their main purpose is improving query design, which makes queries easier to read. One of the reasons CTEs are so popular is that they let you divide longer queries into shorter subqueries. These are easier to read and edit. (For a detailed introduction to CTEs, read this article.)

Another great feature of CTEs is that they can be recursive. In other words, a recursive query can refer back to itself until a certain threshold is reached. If you need to perform the same function numerous times, you only need to write the code once.

This type of CTE – the recursive CTE – will be the topic of this post.

Recursive CTE Syntax

Remember, CTEs are essentially named subqueries that function like a temporary view. When you write them (or when you’re looking for them in the code), look for the WITH clause, followed by a SELECT, INSERT, UPDATE, or DELETE statement.

Recursive CTEs are usually used in collections that have some type of hierarchy, like bosses and employees, folder structures, or thread structures in an online forum.

Let’s say we have a database for a company that has a few levels of employee hierarchy:

Id FirstName LastName BossId
101 Adam Smith NULL
201 Agata Johnson 101
301 Ewa Novak 201
401 Kate Garrus 401
501 Agnieszka Holmes 201
601 Thomas Bond 301
701 Hubert Gregory 201
801 Marika Walter 301

We want to get a list of all workers and their immediate bosses. A recursive CTE is ideal for this task. This is what it would look like:

WITH all_workers
AS
(
   SELECT Id, BossId, FirstName, LastName, 0 as Org_Level 
   FROM Employees
   WHERE BossId is NULL

   UNION ALL

   SELECT e.Id, e.BossId, e.FirstName, e.LastName, Org_Level + 1
   FROM Employees e INNER JOIN all_workers r 
   ON e.BossId = r.Id
)
SELECT * FROM all_workers

At first glance, this query can be a little complicated. Let’s analyze each statement:

   SELECT Id, BossId, FirstName, LastName, 0 as Org_Level 
   FROM Employees
   WHERE BossId is NULL

The first statement determines the “root” elements, or those at the top level of the hierarchy. In our example, these are the employees who don’t report to a boss, i.e. the Big Bosses themselves. We will expand our structure by working down from these elements.

   UNION ALL

   SELECT e.Id, e.BossId, e.FirstName, e.LastName, Org_Level + 1
   FROM Employees e INNER JOIN all_workers r 
   ON e.BossId = r.Id

Here we’ve got our recursive query, which is correlated with the result of the previous query. At this point, we have our root elements (the Big Bosses). We’ll now create a new collection of employees, but this time it is connected to the previous collection. This is the recursive element: it will keep on connecting subordinates with their subordinates until it gets to the lowest level of the hierarchy.

The UNION ALL command merges the results from each recursive round in a final result table. (Remember: in every step, a recursive CTE only links elements returned by the previous step. It does not go back to the beginning and link everyone with the root-level Big Bosses; it only links them to their immediate supervisor or manager.)

The result table should look like this:

Id BossId FirstName LastName Org_Level
101 NULL Adam Smith 0
201 101 Agata Johnson 1
301 201 Ewa Novak 2
501 201 Agnieszka Holmes 2
701 201 Hubert Gregory 2
601 301 Thomas Bond 3
801 301 Marika Walter 3
401 401 Kate Garrus 4

This table only has three hierarchy levels. But what if our table had, say, 3,000 levels? We would need to stop it at some point. Every recursive CTE statement needs an ending condition, one that defines how many iterations are needed. When the number of iterations is hit, the query stops running. The MAXRECURSION function is very helpful in this situation. All you need to do is add a line after the SELECT clause that states how many recursions (iterations) you need:

OPTION (MAXRECURSION 2)

If we apply this condition to our previous CTE, the result table will look like this:

Id BossId FirstName LastName Org_Level
101 NULL Adam Smith 0
201 101 Agata Johnson 1
301 201 Ewa Novak 2
501 201 Agnieszka Holmes 2
701 201 Hubert Gregory 2

Adapting Recursive CTEs to Different Databases

Different SQL databases have different syntaxes, so don’t be surprised if your recursive CTE needs to be modified a bit. For example:

  • In PostgreSQL and MySQL, use WITH RECURSIVE instead of WITH to define a recursive CTE.
  • Also in PostgreSQL, use LIMIT instead of MAXRECURSION to set the maximum number of iterations.
  • In Oracle, you need to add a condition in the WHERE clause to define iterations; the MAXRECURSION statement won’t work.

CTEs are very powerful SQL tools, especially when used recursively. Recursion is perfect for handling huge tables with many hierarchical levels. To learn more about CTEs, check out Vertabelo Academy’s Common Table Expressions course!

GET ACCESS TO EXPERT SQL CONTENT!