Regardless of the engine you are using (SQL Server, mySQL, Oracle, etc), you can prevent common errors and simplify the debugging process. This article will discuss some of the issues you’ll face and will help you tackle them successfully.
One of the best ways to prevent SQL errors is to keep your queries clear and readable. It’s very easy to forget the ideas behind your code! If you have to revisit it (and you will), messy code can be hard to understand. So let’s start with some tips for writing clearer SQL code.
Legibility and Formatting
select person_id from address adr, `option` opt, option_address_type opt_adt where adr.country_id = 1 and opt.person_id = adr.person_id and opt_adt.option_id = opt.option_id and opt_adt.type_id = adr.type_id and ((opt.sample = 1 OR opt.sample = 22) and opt.param = false
Do you find that easy to read? I don’t. But this same query looks much better when properly formatted:
SELECT person_id FROM address adr, “option” opt, option address_type opt_adt WHERE adr.country_id = 1 AND opt.person_id = adr.person_id AND opt_adt.option_id = opt.option_id AND opt_adt.type_id = adr.type_id AND ((opt.sample = 1 OR opt.sample = 22) AND opt.param = FALSE
It’s now quite clear that there is an extra parenthesis after the 4th AND, which will make the query fail.
Naming Tables and Fields
The positive impact of using naming conventions consistently is sometimes underestimated. In addition to following a convention, give tables and fields the most self-explanatory names that you can. For instance, if you store records for users and their purchases in two tables, you could name the tables
users_purchases. It’s apparent what each table stores. Naming the same tables “uss” and “usspchs” is not clear, so it’s not a good idea. Also, don’t switch from singular (user) to plural (users_purchases) table names; pick one and stick with it as part of your convention.
The same applies for table columns. Sometimes it is not clear what columns are for. A user role column that could have been named something like
user_role, might end up being called “param” or another undescriptive name.
If you are worried about table names getting too large and therefore queries becoming too verbose, you can use table aliases.
Aliases allow you to write more compact queries. However, if you are not careful, you may end up with unreadable lines of SQL. Here’s an example:
SELECT a.id, b.name FROM platform_events AS a LEFT JOIN platform_events_calendars AS b ON (b.platform_event_id = a.id) WHERE a.enabled = 1 AND b.platform is not null
Better aliases would make this query more descriptive:
SELECT Event.id, Calendar.name FROM platform_events AS Event LEFT JOIN platform_event_calendars AS Calendar ON (Calendar.platform_event_id = Event.id) WHERE Event.enabled = 1 AND Calendar.platform IS NOT NULL
Understanding Syntax and Reserved Words
SQL parsers will generally point out the location of a syntax error. Be sure to check the documentation if you are in doubt.
Sometimes the parser will return an empty message. Or it might give you a message for an error that you have a hard time finding. If this happens, check your parentheses and quotes; it’s easy to forget to properly close them.
Also, pay special attention to how you use reserved words. Try not to use them as part of your table or column names. For instance, if you have a column named
order, this name could get misinterpreted by the parser. Name columns and tables descriptively, without using reserved words. This will help prevent possible collisions. In the former example, you could name the columns
If you absolutely must use reserved words, enclose the field names that use them with double quotes (“), as per the SQL standard. However, your specific engine may use other escape characters. For example, MySQL uses single quotes (‘).
— Vertabelo (@Vertabelo) January 3, 2017
Say that you have a table called
table with a column called
from. You could write the query by escaping the reserved words, like this:
SELECT “from” FROM “table”;
While you can use escape characters to get round the use of reserved words, consider that every database documentation advises you not to use reserved words as object names. It’s really error prone.
“…It used to work!”
If a query stops working for no apparent reason, look for:
- A schema change
- A change in the database user’s permissions
- A change to the underlying engine (upgrading or downgrading)
The query’s error message will tell you where you should look. If you test the query in your environment and it works, then look at three scenarios shown above. You may have an outdated database schema, or your users have a different engine version or different permissions than you.
Writing Solid Queries
Now that we have covered some easy-to-spot SQL code bugs, let’s turn our attention to tips for writing solid queries.
Understand the Structure
First, I recommend familiarizing yourself with your database’s structure and information. To understand the general structure:
- Look at all existing tables in your model.
- Pay extra attention to the tables that you need to use for your query.
- Get familiar with your tables’ columns and their primary keys.
- Figure out how these tables relate to others; check for foreign keys.
- Check for unique columns and indexes.
Next, look at the data you will use in your query:
- Are there any records?
- How many rows are available in the tables?
- Find any columns holding null and empty values.
- Look at the various data values (numeric, Boolean, date, etc.) that are stored as strings.
- Beware accented characters! (Hint: Pay attention to the database’s charset.) Sometimes the data is wrongly encoded and stored with weird symbols instead of accented characters. This will make your query fail.
Knowing your data will help you approach a query in a more enlightened way. For instance, working with dates that are stored as strings is different than working with dates stored as DATETIME types. Or what if you are not supposed to have null values in a required column but your table definition allows null values? Or what if your table has many millions of records? Your query will have to make optimal use of indexes to prevent really slow execution times. These are all things that you want to know before you start writing!
There is no standard SQL command that will get all tables from a database. It depends on your database engine. Usually you will have a tool for visually exploring the tables that you can use. If not, check the engine’s documentation. For instance:
USE database_name; SHOW TABLES
SELECT * FROM dba_tables
- MS SQL:
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'
Knowing this, you could start exploring the available columns data with a simple
SELECT * FROM your_desired_table
Then you could see how many rows the table has:
SELECT COUNT(*) FROM your_desired_table
Depending on your engine, you can explore the table structure in greater detail using something like:
- MS SQL:
Understand What Your Query Should Retrieve
Sometimes queries are not as simple as
SELECTing a name
FROM a table. You may need to use averages, summarizations, groups, or values that depend on whether another related table satisfies a given value. My best advice is to be really specific in what you expect and try to consider all possible scenarios.
These techniques can help you to validate your result records and simplify a complex scenario:
- Populate testing tables with well-known data of your own and hit those in your query.
- If you are using so many columns that they overwhelm the output of your
SELECT, start with only the most meaningful columns and add others as you can.
- Use hard-coded values strategically in your query. For instance, replace a subselect with a known fixed value (or a list of values) so you can better focus on other aspects of the query.
- If you are using subqueries, test those independently before you try the entire query. This will ensure the subqueries are working as you intend.
- Test complex expressions (like ones involving math) separately.
- When possible, test and add the component parts of a complex query incrementally to ensure each step works.
Suppose you have a table like this:
CREATE TABLE “syscities” ( “id” INT(11) NOT NULL, “cityname” VARCHAR(512) NOT NULL, “population” INT(11) NOT NULL );
If you have a visual tool, you can use that to populate the table. Otherwise, use some
INSERTs to create a few test values:
INSERT INTO “syscities” (“id”, “cityname”, “population”) VALUES (1, 'Manchester', 1000000); INSERT INTO “syscities” (“id”, “cityname”, “population”) VALUES (2, 'Los Angeles', 2000000);
Having values that you know beforehand and makes it easier to see if your queries are working properly. For instance, if you wanted to select the ID of the city with the largest population, it’s easy to see that ID 2 (Los Angeles) would be the right value to return:
SELECT syscities.id FROM syscities ORDER BY syscities.population DESC LIMIT 1;
Validate with Some Real Data
Before jumping into all of your actual data, I suggest grabbing only a few records for validation purposes. (In this case, “few” could be a couple of records – or maybe hundreds or even thousands if you’re working with large datasets.) Put these known records into a testing table and work with that. Once you see that everything looks good, test the query with the full set.
Always keep in mind that data will change over time, so you’ll need to plan ahead and test your query with some common scenarios:
- No rows: What if your query (and its subqueries) are used on a table with no records?
- Only one row: What if there is only one record in the table?
- A few rows: Testing with odd and even numbers of rows may return different results. Make sure you use both even and odd amounts of records when you test.
- Many, many rows: Have you considered the possibility of having a really huge dataset? If some of the tables referenced in your query could become very large, you must anticipate the effect this can have on the query; maybe it kills the response time or even renders the query useless.For instance, the MySQL function
expr IN (value,...)is limited to the
max_allowed_packetvalue. If you put more than that number of elements in the list, the query will fail. Let’s continue with the
syscitiestable to illustrate this. Suppose I create the following (suboptimal) query:
SELECT syscities.cityname FROM syscities WHERE syscities.id IN (SELECT id FROM syscities WHERE population &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt; 10000)
If you are storing records for all major cities worldwide, the
SELECT used in the
WHERE condition would retrieve thousands of records, which could cause the IN clause to get too many IDs. The entire statement could fail.
One More Thing to Remember
Hopefully this guide will help you prevent some mistakes common to all SQL learners. Remember, the server you use (Oracle, MS SQL Server, MySQL, etc.) will impact how you learn SQL, since each one has its own interpretation of the SQL standard. And pay close attention when you search for help online, as you may find solutions that don’t apply to the server (or version) you are using.
If you have any comments, questions, or suggestions that could help other SQL learners, please share them in the comment section below.