Long SQL queries are notoriously hard for beginners to structure and understand. These five tips will teach you the best practices for writing and formatting complex SQL code.
As we all know, SQL queries are essential to database management. Without them, it would be extremely difficult to find and work with the information in a database.
Query length depends on the type of information we need and the size of the database. When we read a simple piece of code, we are usually able to understand it quite easily. But what if we’re dealing with a 1,500-line query? If we want to understand what we’ve done (or let others understand it) we have to organize our code properly. Fortunately, remembering just five useful tips will help us manage long SQL queries.
Tip 1: Indent Your Code
Indentation helps keep your query clean by identifying where each block of code begins. This makes program structure more understandable and enables developers to easily find a specific instruction. When you correctly indent your code, you can quickly see what you are selecting, which tables provide the data, and what restrictions apply to it (i.e. the
GROUP BY clauses).
Let’s see an example of properly indented code:
SELECT column1 ,column2 FROM table1 WHERE column3 IN ( SELECT TOP(1) column4 FROM table2 INNER JOIN table3 ON table2.column1 = table3.column1 )
Compare the code above to this sample:
SELECT column1, column2 FROM table1 WHERE column3 IN ( SELECT TOP(1) column4 FROM table2 INNER JOIN table3 ON table2.column1 = table3.column1)
Now, imagine that there are 100 similar instructions, too. Without indentation, that query will be very hard to read and understand!
There is no single accepted indentation style. Every company or programmer usually develops their own. Even so, your indentation style should make sense to others. To help you get started, I’ve provided links to style guides by ApexSQL and Simon Holywell.
Indentation is the first thing you should do when you start to write code. If you inherit a chunk of un-indented code, there are a lot of sites that let you format an SQL query for free.
Tip 2: Use the WITH Clause
WITH clause allows you to name a subquery block and treat it like a normal table. In this case, the
WITH clause is basically a replacement for a normal subquery.
Look at the query below. Its task is to show the number of hospitals in Los Angeles where patients whose last visit was more than 14 days ago received an average drug dosage over 1,000 units.
WITH patient_data AS ( SELECT patient_id, patient_name, hospital, drug_dosage FROM hospital_registry WHERE (last_visit > now() - interval '14 days' OR last_visit IS NULL) AND city = "Los Angeles" ) WITH average_dosage AS ( SELECT hospital, AVG(drug_dosage) AS Average FROM patient_data GROUP BY hospital ) SELECT count(hospital) FROM average_dosage; WHERE AVG(drug_dosage) > 1000
I’ll explain all elements in this query.
The first element is the
WITH clause. Using it, we can specify two subqueries as
average_dosage and use them like normal tables. The
patient_data subquery was used to create the
average_dosage subquery. So then
average_dosage becomes the
SELECT statement’s final data source.
This arrangement is much cleaner than writing everything in one query and putting all the subqueries in the
Because it makes code easier to read, the
WITH clause is ideal for simplifying complex SQL queries.
Tip 3: Explain Yourself with Comments
Code tells how, but comments explain why. Comments are some of the most important helpful features in any code, from a programming project in Java to an SQL query. They allow programmers to express their thought processes and their desired outcomes.
Always add comments to your code. You’ll be glad you did, and so will any other programmers who work on your SQL. Even if you are just taking a break, add comments – they will help you pick up where you left off.
This is how a very simple query looks without comments:
SELECT name, student_group, FROM course_marks WHERE points < 300 OR points IS NULL;
Do you really know exactly what the
WHERE means? Without comments, it is hard to understand what the coder wanted to accomplish.
Just a few comments are enough to do the job. Let’s see this same code, but with comments and explanations:
/*Get the list of every student who failed or was absent on exam*/ SELECT name, student_group, FROM course_marks WHERE points < 300 OR points IS NULL; --student fails when gets 300 or less points. --When student was absent, instead of points there is NULL in column
Better, isn’t it? These comments have explained the developer’s actions and helped us understand what the query does.
Tip 4: Break Queries into Steps
Complex queries are really a collection of blocks of code, which are much easier to control and to check for mistakes. Building your query a step at a time is a best practice for SQL development. This will enable you to find logical errors faster and be more confident of your results.
— Vertabelo (@Vertabelo) March 23, 2017
Look how much easier it is to understand this complex bit of SQL because it’s been broken into logical steps:
WITH subject AS ( SELECT student_id, subject_id, school_id, sum(exam_score) AS total_score FROM subject_marks GROUP BY student_id, subject_id, school_id ), student AS ( SELECT student_id, school_id, avg(total_score) AS average_student FROM subject GROUP BY student_id, school_id ) SELECT school_id, avg(average_student) FROM student GROUP BY school_id;
We want the result of this query to show us the average amount of points for every student from every school. The first stage (the
WITH subject clause) finds the sum of points for every subject. The second stage (the
WITH student clause) averages points for every student. By the time we reach the
SELECT, all we need to do is find the average amount of points and group the results by school.
By building a query this way, it becomes apparent that there are no long or difficult queries. It just depends on how you choose to break them up and write them.
Tip 5: Stick with One Naming Convention
It is also very important to use a single naming convention for tables, columns, and queries. This makes the code easier to write and read.
When a legacy database has tables named
UserSECOND_NEW, it looks very unprofessional and the coding gets chaotic. Follow the established rules for capitalization, underscores, spaces, etc. This also applies to query styles: stick with the same formatting rules and indentation patterns for the whole database. If you start a new line after a
SELECT, do it in all your queries.
Naming conventions can be very different from database to database. As with indentation, there is no single industry-recognized style. I recommend that you find which conventions are the most widely used and model your own style on those. These articles in the Vertabelo and Launch by Lunch blogs are a good place to start.
The best way to understand the difference between good and bad query styling is to see them for yourself. Below, we find an example of poor style:
SELECT name, SECONDNAME, Date_TIME, address, SecondAddress FROM registry, other_USERSData WHERE registry.id = other_USERSData.id;
This query, which uses one constant style, is much more readable:
SELECT name, second_name, date_time, address, second_address FROM registry, other_users_data WHERE registry.id = other_users_data.id;
To sum up, these tips will help you write better queries, long or short. The best thing you can do is to practice using good techniques. If you don’t know how to develop your skills, Vertabelo Academy is a great solution. There are a lot of lessons that will teach you about managing queries and their structures.
It will take time to learn how to write clean and smart SQL queries. You’ll need to understand table relationships, code and data planning, how to identify the parts of a query, and what results you can expect. Patience and hard work are the keys to success!