Common SQL Window Functions: Ranking Functions

Want to learn how to use SQL window functions?  Ranking functions are a good place to start!

Learning about SQL window functions usually comes after you’ve built a foundation in the language, but these powerful functions take your skills up a level. As you master them, you’ll find better ways to solve query problems.

When used for business intelligence applications, SQL queries combine data retrieval and advanced computations. These operations are more complex than those used in OLTP systems. In fact, the SQL standard was expanded with analytical functions specifically to support this kind of activity.

SQL now has several types of analytical functions, including:

  • Ranking functions
  • Aggregation functions
  • Positional functions
  • Statistical functions

Ranking functions are the easiest windows functions to learn; they serve as a nice little gateway to the other, more complicated functions.

SQL Window Functions Basics

Before we dive into ranking functions, it’s worth noting that all analytical function share some basic rules:

  • The processing of the analytical function is done after the GROUP BY processing of the SQL statement and before the ordering of the returned rows.
  • Analytic function execution organizes results into partitions and then computes functions over these partitions in a specified order.
  • Execution of the analytical function occurs after row filtering (WHERE) and grouping (GROUP BY) so the calculations can be performed on the GROUP BY results.

In SQL, window functions go by several names, including windowing functions, OVER functions, and analytical functions.  We’ll stick with “window function” or “analytical function” in this article.

What Are Ranking Functions?

Ranking analytical functions rank data based on the orders stated in the function specification. Let’s look at four ranking functions that each compute rank in a different way: RANK, DENSE RANK, ROW_NUMBER, and NTILE.

We’ll use a simple “sales” table to demonstrate each of these functions. The columns in this table are:

  • id” – The primary key of the table.
  • salesman_id” – The ID of the person who made the sale.
  • sales_item” – The ID of the item sold.
  • sales_num” – The number of items sold.
  • sales_price” – The price per individual item.
  • datetime” – The date of the sale.

RANK

RANK is a simple analytical function. It does not take any column list arguments, like all other rank analytical functions except NTILE. It returns a rank or a number based on the ordering of the rows by some condition.

Let’s say we want to rank our salespeople based on the number of items they’ve sold. We do this using the RANK function over the sales_num column, which records the number of sales for each rep.  Here’s the code:

SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  rank() over (order by sales_num)
FROM sales
ORDER BY sales_num;

And the result set is displayed as :

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK()OVER
(ORDERBYSALES_NUM)
1 10 100 4 50.5 1
2 30 200 7 50 2
5 60 200 8 30.5 3
4 50 200 15 110 4
6 70 200 22 10.5 5
3 40 200 40 10.5 6

ORDER BYis mandatory when using RANK, as it displays  records in an ascending or descending order based on our requirements. In the above example, we ranked “sales_num” in ascending order so the largest sales number has a correspondingly higher rank. (Note:  Ascending is the default order in an ORDER BY clause; you do not need to specify it with the ASC keyword.)

If we were to reverse the order by specifying the DESC keyword – which you do have to specify – the sale reps’ ranks would change:

SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  rank() over (order by sales_num desc)
FROM sales
ORDER BY sales_num;

The results:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK()OVER
(ORDERBYSALES_NUMDESC)
1 10 100 4 50.5 6
2 30 200 7 50 5
5 60 200 8 30.5 4
4 50 200 15 110 3
6 70 200 22 10.5 2
3 40 200 40 10.5 1

The ranks have reversed:   the lowest sales number now has the highest rank number.

An interesting situation arises when we have two records with identical values. Let’s look at that ascending example again.  Notice what happens when two salespeople have made the same number of sales:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK()OVER
(ORDERBYSALES_NUM)
1 10 100 4 50.5 1
2 30 200 7 50 2
7 80 200 7 20 2
5 60 200 8 30.5 4
4 50 200 15 110 5
6 70 200 22 10.5 6
3 40 200 40 10.5 7

Two reps have the same number of total sales (7) so RANK gives them the same rank (2). The next rank skips the next number (3) and that rep is ranked fourth.

What happened to that third-place spot?  What if we don’t want to skip rank numbers?

DENSE RANK

Like RANK, DENSE RANK takes no parameters. The difference between the two is in the handling of the rank number after records with identical ranking values, as demonstrated above. DENSE RANK is “dense”; it does not return any gaps between records. This is useful for describing any kind of competition where a tie might occur (e.g. two teams tie for second place in a sports tournament).

As we saw in our previous example, two sales reps shared the second place, but the results skipped third place and moved right on to fourth place. This makes no sense.

Let’s rework our previous example, this time using the DENSE_RANK function:

SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  rank() over (order by sales_num) as rank,
  dense_rank() over (order by sales_num) as dense_rank
FROM sales
ORDER BY sales_num;

And the result set is:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK DENSE
_RANK
1 10 100 4 50.5 1 1
2 30 200 7 50 2 2
7 80 200 7 20 2 2
5 60 200 8 30.5 4 3
4 50 200 15 110 5 4
6 70 200 22 10.5 6 5
3 40 200 40 10.5 7 6

Pay attention to the  row with an “8” in the “sales_num” column. RANK skipped “3”  and gave this row a rank of “4”.  If the tying records would have had more than two rows,  RANK would skip them all.  If there were ten identical values, RANK would skip all ten and give the next unique record the rank of 13.   DENSE_RANK keeps it “dense” and just continues, grouping all identical values into one rank and assigning the next unique value the next rank number – in this case, “3”.

ROW NUMBER

ROW_NUMBER function is different than DENSE_RANK and RANK as it does not group records that share ranking values: it simply lists all records in sequential order.  This is quite a commonly used function.

We can expand the results we got from DENSE_RANK by adding ROW_NUMBER over the sales_num column:

SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  rank() over (order by sales_num) as rank,
  dense_rank() over (order by sales_num) as dense_rank,
  row_number() over (order by sales_num) as row_number
FROM sales
ORDER BY sales_num;

And the result is:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK DENSE
_RANK
ROW
_NUMBER
1 10 100 4 50.5 1 1 1
2 30 200 7 50 2 2 2
7 80 200 7 20 2 2 3
5 60 200 8 30.5 4 3 4
4 50 200 15 110 5 4 5
6 70 200 22 10.5 6 5 6
3 40 200 40 10.5 7 6 7

You might wonder which of the identical-value records would receive which ranking.  This is totally undetermined, which means it would change during different executions.  So beware: many errors happen because of the misuse of ROW_NUMBER on datasets with non-unique arguments.

Note:  One interesting use case for ranking functions is filtering data. If we filtered the top two sales by price using RANK, it would look like:

with top_price as (
SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  RANK() OVER (ORDER BY sales_price) rank_sales_price
FROM sales)
select * from 
top_price 
where rank_sales_price < 3

The results:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
RANK
_SALES_PRICE
3 40 200 40 10.5 1
6 70 200 22 10.5 1

NTILE

Unlike the other ranking functions we’ve discussed in this article, NTILE takes arguments. And it’s not widely used. The NTILE function divides rows into equal groups and returns the rank, or number, of these groups.

Suppose we want to divide our sales reps into three groups.  We do this by specifying a 3 as the argument for NTILE (i.e. NTILE(3)) over the sales_num column:

SELECT id,
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  NTILE(3) over (order by sales_num ) as NTILE
FROM sales
ORDER BY sales_num;

We get the following ranks, which are based on the values in the sales_num column:

ID SALESMAN
_ID
SALES
_ITEM
SALES
_NUM
SALES
_PRICE
NTILE
1 10 100 4 50.5 1
2 30 200 7 50 1
7 80 200 7 20 1
5 60 200 8 30.5 2
4 50 200 15 110 2
6 70 200 22 10.5 3
3 40 200 40 10.5 3

Notice that the first group has three records and the other two groups only have two each. This is how NTILE works.  If there is a remainder in the division of rows, NTILE expands the groups by one apiece until the remainder is exhausted.  In our case, the remainder was one, so only the first group was expanded.

Want to Learn More About SQL Window Functions?

Now that you understand a bit about ranking functions, you’re on your way to the world of SQL window functions.  But don’t just read about them – practice what you’re learning.  I recommend Vertabelo Academy’s Window Functions course.  It’s a good, hands-on way to really dig into using analytical functions to power up your SQL.  And don’t forget to come back to our blog to learn more about the other types of window functions!

Data Warehouse Architect

GET ACCESS TO EXPERT SQL CONTENT!