SQL Window Functions: Detailed Examples with Real-Life Use Cases

In SQL, window functions allow you to perform operations across a set of rows related to the current row without collapsing the results. This is powerful for complex analyses like running totals, moving averages, ranking, etc., while keeping all the rows in the result set.

In this blog, we will dive into SQL window functions and demonstrate their real-time applications with detailed examples and expected outputs.

What Are Window Functions?

A window function performs a calculation across a set of rows that are related to the current row, similar to how aggregate functions work. However, window functions do not collapse the result set into a single row like aggregates do. This allows for row-level analysis while performing operations like ranking, running totals, moving averages, etc.

Basic Syntax for Window Functions

The basic syntax for a window function is:

<window_function>(<expression>) OVER (PARTITION BY <column> ORDER BY <column>)

<window_function>: The function to perform on the data (e.g., ROW_NUMBER(), SUM(), RANK(), etc.).
<expression>: The column or expression to apply the function on.
PARTITION BY <column>: Optional. It divides the result set into partitions (groups of rows).
ORDER BY <column>: Specifies how rows are ordered within the partition for the window function to apply.

Let’s now explore different window functions with real-life examples, expected outputs, and explanations.

1. Ranking Functions

Ranking functions assign a rank to each row within a partition, typically based on a specified ordering. These functions are extremely useful when you want to rank values or create a leaderboard.

1.1 ROW_NUMBER()

The ROW_NUMBER() function assigns a unique sequential integer to each row within the partition. The numbering starts from 1 for each partition.

Example Use Case: Ranking Salespeople by Sales Amount

SELECT salesperson_id, sale_date, amount,
       ROW_NUMBER() OVER (PARTITION BY salesperson_id ORDER BY amount DESC) AS rank
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	rank
1	2024-01-02	700	1
1	2024-01-01	500	2
2	2024-01-02	800	1
2	2024-01-01	600	2
3	2024-01-01	450	1

Explanation:

ROW_NUMBER() ranks salespeople based on the amount they sold in descending order.

1.2 RANK()

The RANK() function assigns ranks to rows, but unlike ROW_NUMBER(), it handles ties by assigning the same rank to rows with equal values. However, there is a gap in the subsequent rank.

Example Use Case: Ranking Employees Based on Performance

SELECT employee_id, performance_score,
       RANK() OVER (ORDER BY performance_score DESC) AS rank
FROM employees;

Expected Output:

employee_id	performance_score	rank
2	92	1
3	92	1
1	85	3
5	85	3
4	78	5

Explanation:

Employees with the same performance score (92) get the same rank (1). The next rank is 3, skipping 2.

1.3 DENSE_RANK()

DENSE_RANK() works like RANK(), but it does not leave gaps in the ranking. If two rows have the same value, they get the same rank, but the next rank is immediately after the tied rank.

Example Use Case: Ranking Products Based on Sales Volume

SELECT product_id, sales_volume,
       DENSE_RANK() OVER (ORDER BY sales_volume DESC) AS rank
FROM products;

Expected Output:

product_id	sales_volume	rank
105	800	1
104	700	2
102	600	3
101	500	4
103	500	4

Explanation:

Products with equal sales volume (500) get the same rank, and the next rank is 4 without any gaps.

1.4 NTILE()

NTILE(n) divides the result set into n equal parts and assigns a bucket number to each row. The number of buckets must be specified (e.g., NTILE(4) for quartiles).

Example Use Case: Dividing Salespeople into Quartiles Based on Sales

SELECT salesperson_id, amount,
       NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales;

Expected Output:

salesperson_id	amount	quartile
2	800	1
1	700	1
2	600	2
1	500	3
3	450	4

Explanation:

The sales data is divided into 4 quartiles, with the highest sales in the 1st quartile and the lowest in the 4th.

2. Value Functions

These functions allow you to examine values from preceding or subsequent rows relative to the current row.

2.1 LAG()

LAG() allows you to access the value from a previous row within the same partition. This is useful for comparisons between consecutive rows.

Example Use Case: Comparing Daily Sales with Previous Day’s Sales

SELECT salesperson_id, sale_date, amount,
       LAG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS prev_sale
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	prev_sale
1	2024-01-01	500	NULL
1	2024-01-02	700	500
2	2024-01-01	600	NULL
2	2024-01-02	800	600
3	2024-01-01	450	NULL

Explanation:

The LAG() function returns the previous day’s sales. For the first row, there is no previous sale, so the result is NULL.

2.2 LEAD()

LEAD() works like LAG(), but it provides access to the value from a subsequent row.

Example Use Case: Comparing Current Sales with Next Day’s Sales

SELECT salesperson_id, sale_date, amount,
       LEAD(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS next_sale
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	next_sale
1	2024-01-01	500	700
1	2024-01-02	700	NULL
2	2024-01-01	600	800
2	2024-01-02	800	NULL
3	2024-01-01	450	NULL

Explanation:

The LEAD() function shows the sales for the next day. If there’s no next sale (like on the last day), it returns NULL.

2.3 FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE() and LAST_VALUE() allow you to access the first and last values within the window frame.

Example Use Case: Finding First and Last Sale for Each Salesperson

SELECT salesperson_id, sale_date, amount,
       FIRST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS first_sale,
       LAST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sale
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	first_sale	last_sale
1	2024-01-01	500	500	700
1	2024-01-02	700	500	700
2	2024-01-01	600	600	800
2	2024-01-02	800	600	800
3	2024-01-01	450	450	450

Explanation:

FIRST_VALUE() returns the first sale for each salesperson, while LAST_VALUE() returns the last sale.

3. Aggregate Functions

These functions compute an aggregate value over a window frame, similar to traditional aggregate functions like SUM() and AVG(), but with the ability to retain row-level detail.

3.1 SUM()

SUM() calculates the cumulative sum of a column over the window frame.

Example Use Case: Running Total of Sales

SELECT salesperson_id, sale_date, amount,
       SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	running_total
1	2024-01-01	500	500
1	2024-01-02	700	1200
2	2024-01-01	600	600
2	2024-01-02	800	1400
3	2024-01-01	450	450

Explanation:

SUM() calculates the cumulative sales total for each salesperson, with the totals updating for each row.

3.2 AVG()

AVG() calculates the moving average over a window frame.

Example Use Case: 7-Day Moving Average of Sales

SELECT salesperson_id, sale_date, amount,
       AVG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;

Expected Output:

salesperson_id	sale_date	amount	moving_avg
1	2024-01-01	500	500
1	2024-01-02	700	600
2	2024-01-01	600	600
2	2024-01-02	800	700
3	2024-01-01	450	450

Explanation:

AVG() calculates the moving average for each salesperson over the past 7 days, including the current sale.