As SQL users, we usually focus on writing queries that return correct results. However, there are more things to consider when you’re writing a query; one of them is query performance. In this article, we’ll look at some examples where query response time is critical.
Scene One: 911 Call Center
Let’s suppose we’re at a 911 call center, when the phone rings. One of the operators answers the call; a witness reports that a man has been shot. After the emergency unit is dispatched, the operator takes notes to identify the perpetrator. The witness describes the perpetrator as a Caucasian man between the ages of 30 and 40 fleeing the crime scene in a green Ford. The following dialog was maintained between the 911 operator and the witness:
911 operator: Could you identify the plate number?
Witness: Not completely, but I saw that it ended in 853 and had an H in the three-character part of the plate. I’m not sure where the H was exactly.
911 operator: Do you remember anything else?
Witness: No, unfortunately not.
911 operator: Thanks for your help. Police units have been dispatched to your location.
Scene Two: The Police Station
Now, suppose you’re an information technology specialist working at a police station, and your specialization is SQL. The call about the shooting was passed on to your police station, and you need to act fast—you need to write a query to obtain identifiable information about the killer. You have access to a national database with information about all citizens and vehicles. You have the following two tables at your disposal:
Table 1: Citizen
Table 2: Vehicle
The query you write will look like this:
SELECT first_name, last_name, address FROM citizen JOIN vehicle ON citizen.citizen_id = vehicle.owner_id WHERE year(birthdate) between 1978 and 1988 -- condition to check age in [30,40] AND citizen.ethnicity = ‘caucasian’ -- condition to check ethnicity AND vehicle.color = ‘green’ -- condition to check car color AND vehicle.brand = ‘Ford’ -- condition to check car brand AND vehicle.plate like ‘%853’ -- condition to check plate number
However, when you execute this query, the response doesn’t appear. Do you know why? Each second of execution time is important, so let’s review some tips to accelerate the query’s performance.
Performance Tip 1: Add Indexes to Join Columns
An index is an internal database structure that belongs to a table and is used to speed up certain queries. Every index has a key, which is the column (or combination of columns) on which the index is based. Only queries with a filter/join on the key column can be accelerated. When we join two tables (like
vehicles), in order to have a fast response of the join query, we need to add an index to one of the joined tables, and the key of this index must be the join column. In this case, we will create a index to try to improve performance:
CREATE INDEX vehicle_i1 ON vehicle(owner_id);
We run the query again, but it still doesn’t finish. Adding an index wasn’t a bad idea; in fact, it’s one of the best ways to optimize query performance. In this specific case, though, there are likely other reasons why our query is slow. Again, time is of the essence. Let’s try another approach.
Performance Tip 2: If Possible, Work With Small Tables
The first thing to note is the size of the tables we’re working with. The
table person has 240 million rows; the
vehicle table has 185 million rows. Clearly, both tables are quite large.
However, we have a way of reducing the number of records: applying filters to the tables and creating a temporary intermediate result set (using Common Table Expressions, or CTEs) with fewer records. If your filtering skills are a bit rusty, feel free to check out our SQL Practice Set to brush up on some of the concepts.
Let’s first analyze the filters for the
citizen table: a Caucasian between the ages of 30 and 40. That doesn’t help much! Any other filters we could try? Yes! The witness mentioned that the suspect was “a Caucasian man.” Let’s add the condition gender = ‘male’. We can assume that the new condition reduces the number of rows in the result set by half.
Great! Let’s now analyze the
vehicles and the color (“green”), brand (“Ford”), and partial plate number filters.
The missing parts of the plate are only two letters. There are 26 * 26 plates of the form H[_][_]-853. So we have 676 different plate numbers that match the description the witness provided. This certainly narrows down our list of potential suspects.
As the witness couldn’t recall the exact position of the “H” in the plate, we also need to consider plates of the form [_]H[_]-853 and [_][_]H-853. So considering these three possibilities, we actually have 3 * 676 = 2028 potential plates. Still not bad! Moreover, if we apply the color (“green”) and brand (“Ford”) filters, we will reduce this number even more. It turns out we have fewer than 10 vehicles matching this description.
By applying these filters, we will get a reduced set of vehicles. We will use a CTE taken from the original
vehicles table with only the records of vehicles we are interested in. The query for the CTE is the following:
SELECT * FROM vehicles WHERE plate LIKE ‘%H%-853’ -- condition to check car color AND brand = ‘Ford’ -- condition to check car color AND color = ‘green’; -- condition to check car color
Scene Three: Identifying the Suspect
Well done! The result of the previous query has only 18 records! We can now perform a join with the
citizen table to obtain the owners of the 18 vehicles. We will use the previous query as a CTE and call it
reduced_vehicle. Let’s write the query:
WITH ( SELECT * FROM vehicles WHERE plate LIKE ‘%H%-853’ -- condition to check car color AND brand = ‘Ford’ -- condition to check car color AND color = ‘green’; -- condition to check car color ) as reduced_vehicle SELECT first_name, last_name, address FROM citizen, reduced_vehicle WHERE citizen.citizen_id = reduced_vehicle.owner_id AND year(birthdate) between 1978 and 1988 -- condition to check age in [30,40] AND citizen.ethnicity = ‘caucasian’ -- condition to check plate number
This query returns two records in less than one second; these are the suspects the police will pursue. Well done!
There are many ways to tackle a problem in SQL, but not all approaches are equally efficient. As an SQL developer, you need to always be on the lookout for potential optimizations that can improve query runtime.
Of course, that’s easier said than done; most SQL developers find query optimization a bit challenging. And at the end of the day, not all slow queries can be improved.
In this article, we learned how to:
- Translate real-world crime knowledge into SQL filters.
- Add an index to improve query performance.
- Improve query performance by reducing the number of rows in related tables.
There are many techniques you can employ to improve query performance; the best way to master these is to practice. Interested in sharpening your SQL skills or learning the basics? Check out our catalog of courses today to discover the magic of data!