Oracle Top-N Queries for Absolute Beginners

Oracle Top-N Queries for Absolute Beginners

It’s common to run a query using only part of a dataset – for example, the top 100 salespeople in a company. In this article, we’ll see how to use Oracle’s Top-N query method to query just these rows.

Top-N queries retrieve a defined number of rows (top or bottom) from a result set. In other words, they find the best or worst of something – the ten best selling cars in a certain region, the five most popular routers, the 20 worst-performing stores, etc. These kinds of queries are mostly used in Business Intelligence, where it’s important to track the performance of certain entities and thus find ways of increasing profits or cutting costs.

As database developer or BI developer, your role is to enable users to get results in a timely and dependable manner. In this article, we will explore some common ways of constructing Top-N queries in an Oracle database.

Get Set, Get Ready

First, we need to lay the groundwork for our queries by familiarizing ourselves with some basic analytical functions. If you aren’t familiar with SQL functions, I recommend using Vertabelo Academy, particularly the Standard SQL Functions Course.

Assuming you do have some background in SQL, let’s get started by looking at the data in a simple sales table:


This table contains the following information:

  • id – A unique customer identifier in the system.
  • account – The customer’s account number.
  • basket – Identifies the product group that was sold.
  • amount – The total number of products purchased.
  • sales date – The date of the sale.

Now let’s query this table and see what we have inside:

select * from sales order by amount desc;

Here is the result:


Now suppose you want to query the top five sales, or the five biggest sales. Usually, people execute this query using the Oracle pseudocolumn ROWNUM. ROWNUM returns a number indicating the order that the row is selected from the table. Here’s the code:

select * from sales where rownum <= 5 order by amount desc;

But this is a mistake, as you can see from the result set:


This is because ROWNUM is executed before the ORDER BY statement. To get around this, you use the inline view (a SELECT statement within the FROM clause of another SELECT, as shown below.) This tells Oracle to sort the data first.

select * from 
(select * from sales order by amount desc) 
where rownum <= 5;

And we get the correct result:


However, there is one problem: how we define the best sale. Are the five best sales the first five distinct best? The five non-distinct best? Or the first five ranked sales? The definition of the best sales will come from the business side of things, but figuring out the solution is up to you.

What Is a Top-N Query?

Now that we understand the business problems and the data, let’s move on to Top-N queries. I’ll look at each of these query types:

  • Top-N with ROW_NUMBER
  • Top-N Distinct
  • Top-N with RANK

Top-N queries don’t have a special command or keyword. They utilize an ordered inline view, select results from that view, and then limit the number of rows using ROWNUM.

Note for Oracle 12c users: Oracle 12c introduced a new clause, FETCH FIRST. This allows us to write Top-N queries by adding the FETCH FIRST clause at the end of a basic query. I’ve included examples of this method as well.

Top-N with ROW_NUMBER

This is the classic Top-N. It is non-distinct, meaning that there may be gaps in the ranking if two or more values are the same. The ROW_NUMBER analytical function returns a unique row for each returned row. We can take this query further than we can one with ROWNUM because of the windowing aspect of ROW_NUMBER. To learn more about analytical functions in Oracle, start here.

The query is :

SELECT *
FROM   (SELECT sales.*,
               ROW_NUMBER() OVER (ORDER BY amount DESC) AS amount_dense_rank 
        FROM   sales)
WHERE  amount_dense_rank  <= 3;

And the result we get is:


Oracle 12c Syntax

To get the same result using Oracle 12c, we simply write:

SELECT amount
FROM   sales
ORDER BY amount DESC
FETCH FIRST 3 ROWS ONLY;

And the result we get is:


Top-N Distinct

To get a Top-N with distinct results (i.e. no gaps in the ranked results), we use the DENSE_RANK analytical function. DENSE_RANK is similar to RANK (which we will discuss below), but it returns without any gaps in the results. We would use the Top-N Distinct query type when we want to see all the values and all the corresponding rows.

Here is an example of a Top-N query with DENSE_RANK:

SELECT *
FROM   (SELECT sales.*,
               DENSE_RANK() OVER (ORDER BY amount DESC) AS amount_dense_rank 
        FROM sales)
WHERE  amount_dense_rank  <= 3;

And the result:


We see that the same amounts are given the same rank – there are multiple 1s, 2s, and 3s – but there are no gaps. We will receive the three highest distinct values.

Top-N Non-Distinct with Ranked Values

A Top-N non-distinct query with ranked values will return an N number of the highest values and all the corresponding rows. To implement this, we use the RANK analytical function, which produces a sequential rank for each distinct value in the specified window.

Let’s look at an example with three amounts:

SELECT *
FROM   (SELECT sales.*,
               RANK() OVER (ORDER BY amount DESC) AS amount_rank 
        FROM   sales)
WHERE  amount_rank  <= 3;

And the results :


The RANK function returns the same number for non-distinct values and skips the number of distinct values to keep the rank consistent. If we had three sales that all had amounts of 100, there would be three “first-place” values with the amount of 100 and the next rank would be four.

Oracle 12c Syntax

You can achieve the same ranking results using Oracle 12c’s new syntax feature:

SELECT amount
FROM sales
ORDER BY amount DESC
FETCH FIRST 3 ROWS WITH TIES;

And we get the same result:


We’ve explored three simple methods of creating Top-N queries with analytical functions. And we’ve explored a new syntax available in the Oracle 12c database. While we’ve concentrated on finding the best performers, an inverse query that finds the worst performers is also possible. All you’d need to do is reverse the order of the analytical functions or inline views. Why not put in some practice with creating queries? I recommend the SQL Window Functions online course – the only interactive course for SQL analytical functions on the Internet. See what you can come up with!

Aldo Zelen

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.

comments powered by Disqus

GET ACCESS TO EXPERT CONTENT!

Over 85.000 happy students
and counting!