Non-Syntactic SQL Errors Spotted

non-syntactic sql errors, common sql errors, JOIN, NULL,

A look at common errors that arise when writing code as an SQL beginner.

Debugging, Headaches, and SQL

If you’re new to SQL, you’ve probably already run into a myriad of errors. Don’t worry — you’re not alone. All developers make mistakes; it’s a normal part of the development process. The key to mastering any programming language, SQL being no exception, is understanding where the most common mistakes lie.

Often, SQL errors are grouped into two categories: syntactic and non-syntactic. Identifying and correcting syntactic errors is quite easy, as databases have built-in features that help you with tracking errors. However, non-syntactic errors are much harder to identify because they don’t involve flaws in syntax that a database’s automated system would otherwise flag. Thus, they can often lead to serious problems if left uncorrected. Rest assured, though. With a bit of experience, you’ll learn how to handle most of these errors on your own.

In this article, we’ll discuss the four most common non-syntactic errors you will encounter as an SQL developer.

1. Missing JOIN Clauses

This is one of the most common errors in SQL. Developers often run into this issue when writing SQL joins. So, let’s say you’re writing a query to join tables. If you forget to include the ON clause of the JOIN statement, you’ll end up writing a Cartesian join instead, where each row of the right table(s) will be joined with each row of the left table(s). This mistake is dangerous, as it can quickly lead to row-count explosion.

Suppose we have a simple dim_customer table that we want to JOIN with a dim_date table, based on the birth dates of customers. Let’s say we want to see which generation is the most representative of our customers.

To do so, we write the following query:

SELECT 
  count(*), 
  year
FROM dim_customer 
JOIN dim_date 
ON (date_of_birth = date)
GROUP BY year

But an erroneous query would exclude the ON clause:

SELECT 
  count(*), 
  year
FROM dim_customer 
JOIN dim_date 
GROUP BY year

If you forget to include the ON clause, your results will of course differ and, depending on the table sizes, may take longer to compute.

How to identify:

You can usually spot this error by examining the cardinalities, or the number of records, of the underlying tables in your join. If the resulting set has a cardinality that is larger than that of the most populous table in your join, it’s a likely indication that something is wrong in your code.

2. Incorrect Calculations

Like any other language, SQL simply does what it’s told. Thus, calculations can yield incorrect results (in the business sense) if you as the developer implement the wrong algorithm for computing certain values. This type of error is often expensive for companies, as it eats away at valuable and limited resources.

Arithmetic calculations

Let’s look at some concrete examples. The most common arithmetic calculations concern proportions.Some relational database management systems will round quotients to the nearest integer. If we want to calculate the percentage of sales over total transactions, we would calculate this using simple division.

SELECT 
  SUM(CASE 
    WHEN TRANSACTION_TYPE =’sale’ THEN 1 ELSE 0 END) / COUNT(*) 
FROM TRANSACTIONS;

However, since the number of transactions is always an integer value, this kind of operation will yield integer division in some databases like PostgreSQL and Microsoft SQL. This is problematic if we need a percentage with floating-point values.

If you’d like a floating-point answer, you will need to CAST at least one of the integers to a real number (implicitly or explicitly) to avoid integer division. The most simple way to do this is by multiplying one of the integers by “1.0”.

SELECT 
  1.0*SUM(CASE 
    WHEN TRANSACTION_TYPE =’sale’ THEN 1 ELSE 0 END) / COUNT(*) 
FROM TRANSACTIONS;

Null values in expressions

Recall that any operation involving null results in null. In other words, if you perform division, multiplication, addition, or subtraction with a value of null, the result will always be null. This is a common mistake when you are working with some kind of outer join statement. If you’re performing an operation on a column from an outer table, be sure to safeguard your expression with a 1 or 0, depending on the specific operation.

Let’s look at a specific business request. Using our previous example, if for some reason TRANSACTION_TYPE contains the value null, your percentage will also be null. To get the correct result, which is 0, you will need to mask the transaction type with COALESCE or insert a new WHEN function to account for situations when a TRANSACTION_TYPE value is null.

So, the correct code would be:

SELECT 
  1.0*SUM(CASE 
    WHEN COALESCE(TRANSACTION_TYPE,’x’) =’sale’ AND TRANSACTION_SUBTYPE != ’direct’ THEN 1 
    WHEN COALESCE(TRANSACTION_TYPE,’x’) =’sale’ AND TRANSACTION_SUBTYPE THEN 2 ELSE 0 END) / COUNT(*) 
FROM TRANSACTIONS;

The above query could use some additional tweaking. Can you identify some areas for improvement?

Incorrect date manipulation

It’s crucial that you pay special attention to date representations, as not all countries use the same date formats. Thus, it’s always a good idea to encapsulate your dates with date masks so there is no ambiguity regarding which number corresponds to the day and which corresponds to the month. Here’s an example mask for the first of February 2017:

SELECT to_date('01.02.2017','dd.mm.yyyy')

How to identify:

To spot calculation errors, you need to have a solid software development life cycle (SDLC) in place. You should allocate some time to test your statements so you can ensure that your solution was coded properly. In addition, there is usually a QA team that tests your SQL code for bugs. After your code passes their screening, it typically goes through a user-acceptance test (UAT), where the company validates your product to make sure that what they’re paying for is what they are in fact receiving from you.

3. Wrong Variable Order

When constructing a DML statement, it is important to know which variable value you are inserting into which table column. Some combinations can lead to execution errors, such as if you try to insert a string variable into a field that expects a numerical data type.

However, certain variable combinations do not trigger any error signaling and are thus some of the most elusive errors. Such errors include inserting a numerical value into a field expecting a string (where implicit casting is performed), inserting the wrong variable value into a column of the corresponding type, and so on.

Let’s return to our example. Can you spot the errors in the following query?

INSERT
INTO dim_customer
  (
    id,
    cust_natural_key,
    first_name,
    last_name,
    address_residence,
    current_address_residence,
    date_of_birth,
    marital_status,
    gender
  )
  VALUES
  (
    100,101,1,
    'Barunchic',
    'London',
    'London',
    to_date('dd.mm.yyyy','01.01.2022'),
    'N',
    'R'
  )

There are three errors in the above query:

  • The number 1 is used for the first name of the customer
  • A future date is listed as the customer’s date of birth
  • R is used to indicate the customer’s gender

How to identify:

The best course of action when dealing with this type of error is mitigation rather than identification. To mitigate such an error, you should model your database to have good constraint coverage for its columns. So, in our example, we could construct different check constraints that would throw errors if we were to try to insert an erroneous row into the table.

To model your database in a clear and organized fashion, the best choice is our Vertabelo data modeler. To learn more about constraints, be sure to check out the resources offered through Vertabelo Academy. Even if you’ve designed your database well, be sure to perform a rudimentary analysis on the data to check if the columns have values that fall within appropriate ranges.

4. Unintentional Deletion of Data

This is the most dangerous type of error that you can run into in a database system. If you experience this problem, don’t panic—it happens to everyone at least once in their career.

Since this is a dangerous and costly mistake, you need to ensure that there are many different abstraction layers that prevent this situation from ever occurring in the first place. First and foremost, whenever you plan to delete data, be sure to double-check your code. If you do plan to delete data, you should do so in a transactional block. You can learn more about transactional blocks on Vertabelo Academy.

A good tactic for securing a DELETE statement is to always combine it with a hanging WHERE clause before migrating your code to production. Reviewing the script before deployment is also generally a good idea. If you miss a DELETE statement in your review, you will get a syntax error. You can always correct a syntax error, and the data will not be deleted!

If you do delete data and commit your transaction, there’s still something you can do on your end to correct the issue, using something called a “temporal request”. As this is a more advanced topic, we won’t delve into the details of this topic too much in this article. Essentially, with a temporal request, you receive a snapshot of the table at a certain point in time. Although this command is part of the SQL2011 standard, it is very RDBMS specific and changes from one vendor to the next. Note that the truncate command is not covered by a temporal request query, so if you truncate your table, you cannot use this command to get it back!

If a temporal request do not work, it’s time to call your DBA. They’ll give you some options for retrieving the data from regular backups or disaster recovery databases that they have set up for situations such as this.

How to identify:

Again, check the cardinalities of the tables you are working with. If the cardinality for any table is zero, you know you’ve got an empty table.

Summary

In this article, we covered the most common types of non-syntactic errors that you may encounter as an SQL developer. Of course, there are many more non-syntactic errors beyond the ones we’ve discussed here, and we encourage you to post any issues you encounter in the comment section below.

As you already know, the best (and perhaps only) way to gain experience is by practice. There’s no better way to master your SQL skills than through our Vertabelo Academy courses. Get started today!

Aldo Zelen

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.