*Positional SQL window functions deal with data’s location in the set. In this post, we explain LEAD, LAG, and other positional functions.*

SQL window functions allow us to aggregate data while still using individual row values. We’ve already dealt with ranking functions and the use of partitions. In this post, we’ll examine positional window functions, which are **extremely helpful in reporting** and **summarizing data**. Specifically, we’ll look at `LAG`

, `LEAD`

, `FIRST_VALUE`

and `LAST_VALUE`

.

It is worthwhile mentioning that `LEAD`

mirrors `LAG`

and `FIRST_VALUE`

mirrors `LAST_VALUE`

. In some ways, these pairs can be used interchangeably.

We’re getting into a more complex area of SQL window functions now. As we go through this post, you’ll notice that some problems are solved more easily using window functions (as compared to using nested complex subquery combinations). Don’t worry if you feel a little bit overwhelmed at first. The topic isn’t easy and it takes time to master it. Just keep learning and practicing!

### Introducing SQL’s Positional Windows Functions

Once again, we’ll be working with the ** sales** table. This table tracks basic sales data using the following columns:

`id`

” – The primary key of the table.`salesman_id`

– The ID of the person who made the sale.`sales_item`

– The ID of the item sold.`sales_num`

– The number of items sold.`sales_price`

– The price per individual item.`datetime`

– The date of the sale.

Now, let’s start with the `FIRST VALUE`

function.

### FIRST VALUE

First, we are going to look at the analytical function called `FIRST VALUE`

. As the name suggests, this functions returns the **first** value in an ordered list of values.

Suppose we want to extract the** first six months of data** from our

table, but we want to list the salesman with the **sales****biggest sale volume first** for each month. To accomplish this, we use the `FIRST VALUE`

function with the following arguments:

`first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) `

We are feeding the `FIRST VALUE`

function with the `salesman_id`

argument over the window of a month where the data is sorted in descending order based on the `sales_num`

” column. We expect the salesman ID with the largest associated sale volume to be listed first for each month.

SELECT salesman_id, sales_item, sales_num, sales_price, extract(MONTH FROM datetime) month, first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY extract(MONTH FROM datetime);

Does the result live up to the expectation?

SALESMAN _ID |
SALES _ITEM |
SALES _NUM |
SALES _PRICE |
MONTH | TOP_ SALESMAN |
---|---|---|---|---|---|

10 | 8 | 22 | 160.5 | 1 | 10 |

70 | 80 | 2 | 60.5 | 1 | 10 |

10 | 6 | 550 | 16.1 | 2 | 10 |

30 | 25 | 50 | 26.1 | 2 | 10 |

30 | 25 | 330 | 12.1 | 3 | 30 |

10 | 10 | 50 | 55.1 | 3 | 30 |

40 | 200 | 40 | 10.5 | 3 | 30 |

70 | 200 | 22 | 10.5 | 3 | 30 |

10 | 200 | 20 | 2.1 | 3 | 30 |

10 | 200 | 20 | 1.1 | 3 | 30 |

50 | 200 | 15 | 110 | 3 | 30 |

60 | 200 | 8 | 30.5 | 3 | 30 |

30 | 200 | 7 | 50 | 3 | 30 |

10 | 100 | 4 | 50.5 | 3 | 30 |

10 | 40 | 40 | 5.1 | 4 | 10 |

30 | 25 | 30 | 2.1 | 5 | 30 |

50 | 600 | 215 | 10 | 6 | 50 |

**Try It Yourself:** You can simulate the results of using `FIRST_VALUE`

with other functions. See if you can find any without using the `RANK`

function.

### LAST VALUE

`LAST VALUE`

is the opposite of `FIRST VALUE`

; it returns the **last** value of an ordered list of values.

Let’s take the previous example and simply replace `FIRST VALUE`

with the `LAST VALUE`

:

SELECT salesman_id, sales_item, sales_num, sales_price, extract(MONTH FROM datetime) month, last_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY extract(MONTH FROM datetime);

We get this result:

SALESMAN _ID |
SALES _ITEM |
SALES _NUM |
SALES _PRICE |
MONTH | TOP_ SALESMAN |
---|---|---|---|---|---|

10 | 8 | 22 | 160.5 | 1 | 10 |

70 | 80 | 2 | 60.5 | 1 | 70 |

10 | 6 | 550 | 16.1 | 2 | 10 |

30 | 25 | 50 | 26.1 | 2 | 30 |

30 | 25 | 330 | 12.1 | 3 | 30 |

10 | 10 | 50 | 55.1 | 3 | 10 |

40 | 200 | 40 | 10.5 | 3 | 40 |

70 | 200 | 22 | 10.5 | 3 | 70 |

10 | 200 | 20 | 2.1 | 3 | 10 |

10 | 200 | 20 | 1.1 | 3 | 10 |

50 | 200 | 15 | 110 | 3 | 50 |

60 | 200 | 8 | 30.5 | 3 | 60 |

30 | 200 | 7 | 50 | 3 | 30 |

10 | 100 | 4 | 50.5 | 3 | 10 |

10 | 40 | 40 | 5.1 | 4 | 10 |

30 | 25 | 30 | 2.1 | 5 | 30 |

50 | 600 | 215 | 10 | 6 | 50 |

Now the values in `salesman_id`

that have the **smallest** associated `sales_number`

are listed first, just as we’d expect.

**Try It Yourself:** `LAST VALUE`

and `FIRST_VALUE`

are interchangeable – you can receive the same result you expect from `LAST VALUE`

using `FIRST VALUE`

. I will leave it up to you to figure out how.

There, that wasn’t too bad. Let’s move on to something more complex: the `LEAD`

and `LAG`

functions. You should note that, like `FIRST VALUE`

and `LAST VALUE`

, these have an * opposite yet interchangeable* relationship. Also, know that these are usually used in calculations as *helper* functions that show some **difference** between current value and the leading or lagging value.

We’ll start with the `LAG`

function.

### LAG

`LAG`

lets you access the information in another row without using a `SELF JOIN`

command. You simply specify the column name (the value) and how many rows back the data is. Once again, we are using this function with a window of data.

**Important:** `LAG`

returns the value from a row located **before** the current row in the specified sorting order.

The syntax of the function looks like this:

`LAG(value, offset, default_value)`

`value`

is the column we want to access`offset`

is the number of previous rows we skip to get to the value`default_value`

is what the function returns if there is no value in the row

`LEAD`

/ `LAG`

functions is usually used in SQL queries as a “helper” function in calculations that show some difference between current value and leading/lagging value.

Let’s try a simple `LAG`

function in our sales dataset:

SELECT salesman_id, sales_item, sales_num, sales_price, datetime as DATE, lag(sales_num,1,0) over (partition by salesman_id order by datetime desc) as LAG_SALES FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY salesman_id,datetime desc;

Here the `LAG_SALES`

column contains sales numbers that are lagging based on the datetime column. The column is reset for every salesman.

SALESMAN _ID |
SALES _ITEM |
SALES _NUM |
SALES _PRICE |
DATE | LAG_ SALES |
---|---|---|---|---|---|

10 | 40 | 40 | 5.1 | 14-APR-17 | 0 |

10 | 200 | 20 | 2.1 | 20-MAR-17 | 40 |

10 | 10 | 50 | 55.1 | 14-MAR-17 | 20 |

10 | 200 | 20 | 1.1 | 07-MAR-17 | 50 |

10 | 100 | 4 | 50.5 | 01-MAR-17 | 20 |

10 | 6 | 550 | 16.1 | 07-FEB-17 | 4 |

10 | 8 | 22 | 160.5 | 15-JAN-17 | 550 |

30 | 25 | 30 | 2.1 | 25-MAY-17 | 0 |

30 | 25 | 330 | 12.1 | 25-MAR-17 | 30 |

30 | 200 | 7 | 50 | 01-MAR-17 | 330 |

30 | 25 | 50 | 26.1 | 03-FEB-17 | 7 |

40 | 200 | 40 | 10.5 | 02-MAR-17 | 0 |

50 | 600 | 215 | 10 | 03-JUN-17 | 0 |

50 | 200 | 15 | 110 | 03-MAR-17 | 215 |

60 | 200 | 8 | 30.5 | 04-MAR-17 | 0 |

70 | 200 | 22 | 10.5 | 05-MAR-17 | 0 |

70 | 80 | 2 | 60.5 | 05-JAN-17 | 22 |

We can try a more complex example. We know who are our top salesmen, in terms of volume; let’s explore the difference between the sales volume of the current salesman and that of the top salesman. We’ll use a six-month timeframe again.

We’ll compute this in two steps.

#### Step 1

In this step, we are calculating `LAG`

sales numbers. We are gathering `sales_numbers`

values, offset by 1. If we do not find any row preceding the value, we put in the default value of 0.

SELECT salesman_id, sales_item, sales_num, sales_price, extract(MONTH FROM datetime) month, leg(sales_num,1,0) over (order by sales_num desc) as LAG_SALES FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY sales_num desc;

Here is the result:

SALESMAN _ID |
SALES _ITEM |
SALES _NUM |
SALES _PRICE |
MONTH | LAG _SALES |
---|---|---|---|---|---|

10 | 6 | 550 | 16.1 | 2 | 0 |

30 | 25 | 330 | 12.1 | 3 | 550 |

50 | 600 | 215 | 10 | 6 | 330 |

30 | 25 | 50 | 26.1 | 2 | 215 |

10 | 10 | 50 | 55.1 | 3 | 50 |

40 | 200 | 40 | 10.5 | 3 | 50 |

10 | 40 | 40 | 5.1 | 4 | 40 |

30 | 25 | 30 | 2.1 | 5 | 40 |

10 | 8 | 22 | 160.5 | 1 | 30 |

70 | 200 | 22 | 10.5 | 3 | 22 |

10 | 200 | 20 | 2.1 | 3 | 22 |

10 | 200 | 20 | 1.1 | 3 | 20 |

50 | 200 | 15 | 110 | 3 | 20 |

60 | 200 | 8 | 30.5 | 3 | 15 |

30 | 200 | 7 | 50 | 3 | 8 |

10 | 100 | 4 | 50.5 | 3 | 7 |

70 | 80 | 2 | 60.5 | 1 | 4 |

#### Step 2

Now we will compute the difference between the volume of the current row and the volume of the previous row:

SELECT salesman_id, sales_item, sales_num, sales_price, extract(MONTH FROM datetime) month, ABS(leg(sales_num,1,0) over (order by sales_num desc)-sales_num) as DIFFERENCE_SALES FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY sales_num desc;

The result is:

SALESMAN _ID |
SALES _ITEM |
SALES _NUM |
SALES _PRICE |
MONTH | DIFFERENCE _SALES |
---|---|---|---|---|---|

10 | 6 | 550 | 16.1 | 2 | 550 |

30 | 25 | 330 | 12.1 | 3 | 220 |

50 | 600 | 215 | 10 | 6 | 115 |

30 | 25 | 50 | 26.1 | 2 | 165 |

10 | 10 | 50 | 55.1 | 3 | 0 |

40 | 200 | 40 | 10.5 | 3 | 10 |

10 | 40 | 40 | 5.1 | 4 | 0 |

30 | 25 | 30 | 2.1 | 5 | 10 |

10 | 8 | 22 | 160.5 | 1 | 8 |

70 | 200 | 22 | 10.5 | 3 | 0 |

10 | 200 | 20 | 2.1 | 3 | 2 |

10 | 200 | 20 | 1.1 | 3 | 0 |

50 | 200 | 15 | 110 | 3 | 5 |

60 | 200 | 8 | 30.5 | 3 | 7 |

30 | 200 | 7 | 50 | 3 | 1 |

10 | 100 | 4 | 50.5 | 3 | 3 |

70 | 80 | 2 | 60.5 | 1 | 2 |

Now we can continue our analysis on the new column and see some interesting trends. We see that the top performers in sales volume are better then the rest by couple of orders of magnitude. Where we find zero we know that the salesman are selling the same amount of goods. See the resulting data-set and try to come up with some of your own insights.

### LEAD

`LEAD`

is an analytical function that **returns the value of an argument column** that is offset-ed by an inputted number or rows in the partition window. The function is extracting the offset downstream from the supplied execution order.

Like `LAG`

, the `LEAD`

function returns the value from some row **after** the current row (for example: the next row, or the row two rows after current), in the specified sorting order.

So the difference is in the direction in which we take the offset-ed value.

The syntax of the function looks the same as with the `LAG`

function (except the keyword):

`LEAD(value, offset, default_value)`

`value`

is the column we are calculating upon`offset`

is the number of rows lagging we skip to get to the value`default_value`

is the value the function returns if there is no value left in the lead

To demonstrate this function let’s get the same result we got from the `LAG`

example. To achieve this we will change one small thing besides the `LAG`

keyword.

This is the query:

SELECT salesman_id, sales_item, sales_num, sales_price, extract(MONTH FROM datetime) month, ABS(lead(sales_num,1,0) over (order by sales_num desc)-sales_num) as DIFFERENCE_SALES FROM sales WHERE extract(MONTH FROM datetime) <= 6 ORDER BY sales_num desc;

Can you spot the difference?

(*Hint:* This was the challenge in the `FIRST_VALUE`

/ `LAST_VALUE`

demonstration.)

### Conclusion

On your learning path, I encourage you to practice, practice, practice 😉 There is no better place for that than Vertabelo Academy.

The new functions are best mastered and practiced through the excellent hand-on courses of **Vertabelo Academy**. Practice coding as much as you can and use this article as a side companion, this is the best way to learn advanced SQL.