Preventing Common SQL Mistakes

Preventing Common SQL Mistakes

Regardless of the engine you are using (SQL Server, mySQL, Oracle, etc), you can prevent common errors and simplify the debugging process. This article will discuss some of the issues you’ll face and will help you tackle them successfully.

One of the best ways to prevent SQL errors is to keep your queries clear and readable. It’s very easy to forget the ideas behind your code! If you have to revisit it (and you will), messy code can be hard to understand. So let’s start with some tips for writing clearer SQL code.

Legibility and Formatting

select person_id from   address adr, `option` opt, option_address_type opt_adt where  adr.country_id = 1 and  opt.person_id = adr.person_id and    opt_adt.option_id = opt.option_id and    opt_adt.type_id = adr.type_id and ((opt.sample = 1 OR opt.sample = 22) and opt.param = false

Do you find that easy to read? I don’t. But this same query looks much better when properly formatted:

SELECT 
    person_id
FROM
    address adr,
    “option” opt,
    option address_type opt_adt
WHERE
    adr.country_id = 1
        AND opt.person_id = adr.person_id
        AND opt_adt.option_id = opt.option_id
        AND opt_adt.type_id = adr.type_id
        AND ((opt.sample = 1 OR opt.sample = 22)
        AND opt.param = FALSE

baner sql operating on data

It’s now quite clear that there is an extra parenthesis after the 4th AND, which will make the query fail.

Naming Tables and Fields

The positive impact of using naming conventions consistently is sometimes underestimated. In addition to following a convention, give tables and fields the most self-explanatory names that you can. For instance, if you store records for users and their purchases in two tables, you could name the tables users and users_purchases. It’s apparent what each table stores. Naming the same tables “uss” and “usspchs” is not clear, so it’s not a good idea. Also, don’t switch from singular (user) to plural (users_purchases) table names; pick one and stick with it as part of your convention.

The same applies for table columns. Sometimes it is not clear what columns are for. A user role column that could have been named something like role or user_role, might end up being called “param” or another undescriptive name.

If you are worried about table names getting too large and therefore queries becoming too verbose, you can use table aliases.

Using Aliases

Aliases allow you to write more compact queries. However, if you are not careful, you may end up with unreadable lines of SQL. Here’s an example:

SELECT 
    a.id, b.name
FROM
   platform_events AS a
        LEFT JOIN  platform_events_calendars AS b ON (b.platform_event_id = a.id)
WHERE
    a.enabled = 1 
	    AND b.platform is not null

Better aliases would make this query more descriptive:

SELECT 
    Event.id, Calendar.name
FROM
    platform_events AS Event
        LEFT JOIN
    platform_event_calendars AS Calendar ON (Calendar.platform_event_id = Event.id)
WHERE
    Event.enabled = 1
        AND Calendar.platform IS NOT NULL

Understanding Syntax and Reserved Words

SQL parsers will generally point out the location of a syntax error. Be sure to check the documentation if you are in doubt.

Sometimes the parser will return an empty message. Or it might give you a message for an error that you have a hard time finding. If this happens, check your parentheses and quotes; it’s easy to forget to properly close them.

Also, pay special attention to how you use reserved words. Try not to use them as part of your table or column names. For instance, if you have a column named desc, maxvalue, or order, this name could get misinterpreted by the parser. Name columns and tables descriptively, without using reserved words. This will help prevent possible collisions. In the former example, you could name the columns sale_description, speed_maxvalue, or purchase_order.

If you absolutely must use reserved words, enclose the field names that use them with double quotes (“), as per the SQL standard. However, your specific engine may use other escape characters. For example, MySQL uses single quotes (‘).

Say that you have a table called table with a column called from. You could write the query by escaping the reserved words, like this:

SELECT 
    “from”
FROM
    “table”;

While you can use escape characters to get round the use of reserved words, consider that every database documentation advises you not to use reserved words as object names. It’s really error prone.

“…It used to work!”

If a query stops working for no apparent reason, look for:

  • A schema change
  • A change in the database user’s permissions
  • A change to the underlying engine (upgrading or downgrading)

The query’s error message will tell you where you should look. If you test the query in your environment and it works, then look at three scenarios shown above. You may have an outdated database schema, or your users have a different engine version or different permissions than you.

Writing Solid Queries

Now that we have covered some easy-to-spot SQL code bugs, let’s turn our attention to tips for writing solid queries.

Understand the Structure

First, I recommend familiarizing yourself with your database’s structure and information. To understand the general structure:

  • Look at all existing tables in your model.
  • Pay extra attention to the tables that you need to use for your query.
  • Get familiar with your tables’ columns and their primary keys.
  • Figure out how these tables relate to others; check for foreign keys.
  • Check for unique columns and indexes.

Next, look at the data you will use in your query:

  • Are there any records?
  • How many rows are available in the tables?
  • Find any columns holding null and empty values.
  • Look at the various data values (numeric, Boolean, date, etc.) that are stored as strings.
  • Beware accented characters! (Hint: Pay attention to the database’s charset.) Sometimes the data is wrongly encoded and stored with weird symbols instead of accented characters. This will make your query fail.

Knowing your data will help you approach a query in a more enlightened way. For instance, working with dates that are stored as strings is different than working with dates stored as DATETIME types. Or what if you are not supposed to have null values in a required column but your table definition allows null values? Or what if your table has many millions of records? Your query will have to make optimal use of indexes to prevent really slow execution times. These are all things that you want to know before you start writing!

sql cheat sheet

There is no standard SQL command that will get all tables from a database. It depends on your database engine. Usually you will have a tool for visually exploring the tables that you can use. If not, check the engine’s documentation. For instance:

  • MySQL: USE database_name; SHOW TABLES
  • Oracle: SELECT * FROM dba_tables
  • MS SQL: SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'

Knowing this, you could start exploring the available columns data with a simple SELECT:

SELECT * FROM your_desired_table

Then you could see how many rows the table has:

SELECT COUNT(*) FROM your_desired_table

Depending on your engine, you can explore the table structure in greater detail using something like:

  • MySQL: DESCRIBE your_table_name
  • Oracle: DESC your_table_name
  • MS SQL: SP_HELP your_table_name

Understand What Your Query Should Retrieve

Sometimes queries are not as simple as SELECTing a name FROM a table. You may need to use averages, summarizations, groups, or values that depend on whether another related table satisfies a given value. My best advice is to be really specific in what you expect and try to consider all possible scenarios.

These techniques can help you to validate your result records and simplify a complex scenario:

  • Populate testing tables with well-known data of your own and hit those in your query.
  • If you are using so many columns that they overwhelm the output of your SELECT, start with only the most meaningful columns and add others as you can.
  • Use hard-coded values strategically in your query. For instance, replace a subselect with a known fixed value (or a list of values) so you can better focus on other aspects of the query.
  • If you are using subqueries, test those independently before you try the entire query. This will ensure the subqueries are working as you intend.
  • Test complex expressions (like ones involving math) separately.
  • When possible, test and add the component parts of a complex query incrementally to ensure each step works.

Suppose you have a table like this:

baner sql creating tables

CREATE TABLE “syscities” (
    “id” INT(11) NOT NULL,
    “cityname” VARCHAR(512) NOT NULL,
    “population” INT(11) NOT NULL
);

If you have a visual tool, you can use that to populate the table. Otherwise, use some INSERTs to create a few test values:

INSERT INTO “syscities” (“id”, “cityname”, “population”)
VALUES (1, 'Manchester', 1000000);

INSERT INTO “syscities” (“id”, “cityname”, “population”)
VALUES (2, 'Los Angeles', 2000000);

Having values that you know beforehand and makes it easier to see if your queries are working properly. For instance, if you wanted to select the ID of the city with the largest population, it’s easy to see that ID 2 (Los Angeles) would be the right value to return:

SELECT 
    syscities.id
FROM
    syscities
ORDER BY syscities.population DESC LIMIT 1;

Validate with Some Real Data

Before jumping into all of your actual data, I suggest grabbing only a few records for validation purposes. (In this case, “few” could be a couple of records – or maybe hundreds or even thousands if you’re working with large datasets.) Put these known records into a testing table and work with that. Once you see that everything looks good, test the query with the full set.

Always keep in mind that data will change over time, so you’ll need to plan ahead and test your query with some common scenarios:

  • No rows: What if your query (and its subqueries) are used on a table with no records?
  • Only one row: What if there is only one record in the table?
  • A few rows: Testing with odd and even numbers of rows may return different results. Make sure you use both even and odd amounts of records when you test.
  • Many, many rows: Have you considered the possibility of having a really huge dataset? If some of the tables referenced in your query could become very large, you must anticipate the effect this can have on the query; maybe it kills the response time or even renders the query useless.For instance, the MySQL function expr IN (value,...) is limited to the max_allowed_packet value. If you put more than that number of elements in the list, the query will fail. Let’s continue with the syscities table to illustrate this. Suppose I create the following (suboptimal) query:
    SELECT 
        syscities.cityname
    FROM
        syscities
    WHERE
        syscities.id IN (SELECT 
                id
            FROM
                syscities
            WHERE
                population > 10000)
    

If you are storing records for all major cities worldwide, the SELECT used in the WHERE condition would retrieve thousands of records, which could cause the IN clause to get too many IDs. The entire statement could fail.

One More Thing to Remember

Hopefully this guide will help you prevent some mistakes common to all SQL learners. Remember, the server you use (Oracle, MS SQL Server, MySQL, etc.) will impact how you learn SQL, since each one has its own interpretation of the SQL standard. And pay close attention when you search for help online, as you may find solutions that don’t apply to the server (or version) you are using.

If you have any comments, questions, or suggestions that could help other SQL learners, please share them in the comment section below.

Engineer @ Axones

GET ACCESS TO EXPERT SQL CONTENT!