As you start coding in SQL, you will use some statements and techniques over and over again. We call these “SQL patterns”. This series will look at the most common SQL patterns and consider how to use them.
Previously, we looked at the SQL pattern of matching NULLs. This is important when you are comparing columns containing NULL values. Today, we’re going to consider another SQL practice: conditional summarization with CASE operator.
What is Conditional Summarization?
When you are using aggregate functions to create report queries, you’ll frequently find yourself using conditional summarization with the
CASE operator. Remember that
CASE returns a value based on defined criteria. (For more about the
CASE expression, see this post and this one.) When you perform a summarization with
CASE, you’re simply adding up (summarizing) values that meet the
Obviously, you’ll be using the
SUM part of the query to aggregate these values. I know this sounds complicated, but it is not. Let’s use a simple example to explain it.
We’ll start by considering a basic
transactions table that contains transactional data from a small company.
transactions table has these columns:
id– A unique identifier for each transaction
datetime– The timestamp for the transaction
customer– The customer’s ID
creditcard– The ID of the type of credit card used
amount– The transaction amount, in dollars
account– The customer’s account number
type– The transaction type
Here’s the data we’d find in a table like this:
We want to find the sum of transaction amounts and the number of transactions completed before April 1st. Furthermore, we wanted these listed by individual customer. We could achieve this using the following query :
SELECT customer, SUM( CASE WHEN datetime >= TIMESTAMP '2017-04-01' THEN amount ELSE 0 END ) AS sum_amount_after, SUM(CASE WHEN datetime >= TIMESTAMP '2017-04-01' THEN 1 ELSE 0 END) AS transaction_count_after, SUM(CASE WHEN datetime < TIMESTAMP '2017-04-01' THEN amount ELSE 0 END) AS sum_amount_prior, SUM(CASE WHEN datetime < TIMESTAMP '2017-04-01' THEN 1 ELSE 0 END) AS transaction_count_prior FROM transactions WHERE datetime BETWEEN TIMESTAMP '2017-01-01' AND '2017-05-01' GROUP BY customer
The result of this query is:
Wait! How Did That Work?
This long query can be confusing, so let’s break it down a little. We will concentrate first on the part dealing with the April 1 cutoff (2017-04-01).
— Vertabelo (@Vertabelo) June 6, 2017
Below, we are looking at the transaction amount for customer ‘1’. Any transaction amounts that posted before 01.04.2017 will be set to “0”. We will name this column
SELECT customer, datetime, CASE WHEN datetime >= TIMESTAMP '2017-04-01' THEN amount ELSE 0 END AS amount_after, amount FROM transactions WHERE datetime BETWEEN TIMESTAMP '2017-01-01' AND '2017-05-01' AND customer = '1' ORDER BY datetime;
Alternately, we can replace the zeros shown in the results with a NULL in the
SELECT customer, datetime, CASE WHEN datetime >= TIMESTAMP '2017-04-01' THEN amount ELSE null END AS amount_after, amount FROM transactions WHERE datetime BETWEEN TIMESTAMP '2017-01-01' AND '2017-05-01' AND customer = '1' ORDER BY datetime;
Now if we summarize these columns, we will get:
- The total of all transactions for
- The sum of all transactions that posted after 01.04. Any transactions that posted before 01.04 are set to zero (or NULL) for
If we want to count how many transactions were posted after 01.04, we can modify the query and create a
COUNT statement that uses the same
CASE with NULL in the
SELECT customer, count(CASE WHEN datetime >= TIMESTAMP '2017-04-01' THEN amount ELSE NULL END) AS count_after FROM transactions WHERE datetime BETWEEN TIMESTAMP '2017-01-01' AND '2017-05-01' GROUP BY customer ORDER BY customer;
Note: This query is really fast, as the RDBMS only needs to access one table. Building aggregation queries that use just one table is a good way to get results quickly.
CASE with a zero in the
COUNT statement. What is the result and why? Tell us in the comments section below.