SQL Window Functions: Detailed Examples with Real-Life Use Cases
In SQL, window functions allow you to perform operations across a set of rows related to the current row without collapsing the results. This is powerful for complex analyses like running totals, moving averages, ranking, etc., while keeping all the rows in the result set.
In this blog, we will dive into SQL window functions and demonstrate their real-time applications with detailed examples and expected outputs.
What Are Window Functions?
A window function performs a calculation across a set of rows that are related to the current row, similar to how aggregate functions work. However, window functions do not collapse the result set into a single row like aggregates do. This allows for row-level analysis while performing operations like ranking, running totals, moving averages, etc.
Basic Syntax for Window Functions
The basic syntax for a window function is:
<window_function>(<expression>) OVER (PARTITION BY <column> ORDER BY <column>)
<window_function>
: The function to perform on the data (e.g.,ROW_NUMBER()
,SUM()
,RANK()
, etc.).<expression>
: The column or expression to apply the function on.PARTITION BY <column>
: Optional. It divides the result set into partitions (groups of rows).ORDER BY <column>
: Specifies how rows are ordered within the partition for the window function to apply.
Let’s now explore different window functions with real-life examples, expected outputs, and explanations.
1. Ranking Functions
Ranking functions assign a rank to each row within a partition, typically based on a specified ordering. These functions are extremely useful when you want to rank values or create a leaderboard.
1.1 ROW_NUMBER()
The ROW_NUMBER()
function assigns a unique sequential integer to each row within the partition. The numbering starts from 1 for each partition.
Example Use Case: Ranking Salespeople by Sales Amount
SELECT salesperson_id, sale_date, amount,
ROW_NUMBER() OVER (PARTITION BY salesperson_id ORDER BY amount DESC) AS rank
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | rank |
1 | 2024-01-02 | 700 | 1 |
1 | 2024-01-01 | 500 | 2 |
2 | 2024-01-02 | 800 | 1 |
2 | 2024-01-01 | 600 | 2 |
3 | 2024-01-01 | 450 | 1 |
Explanation:
ROW_NUMBER()
ranks salespeople based on the amount they sold in descending order.
1.2 RANK()
The RANK()
function assigns ranks to rows, but unlike ROW_NUMBER()
, it handles ties by assigning the same rank to rows with equal values. However, there is a gap in the subsequent rank.
Example Use Case: Ranking Employees Based on Performance
SELECT employee_id, performance_score,
RANK() OVER (ORDER BY performance_score DESC) AS rank
FROM employees;
Expected Output:
employee_id | performance_score | rank |
2 | 92 | 1 |
3 | 92 | 1 |
1 | 85 | 3 |
5 | 85 | 3 |
4 | 78 | 5 |
Explanation:
- Employees with the same performance score (92) get the same rank (1). The next rank is 3, skipping 2.
1.3 DENSE_RANK()
DENSE_RANK()
works like RANK()
, but it does not leave gaps in the ranking. If two rows have the same value, they get the same rank, but the next rank is immediately after the tied rank.
Example Use Case: Ranking Products Based on Sales Volume
SELECT product_id, sales_volume,
DENSE_RANK() OVER (ORDER BY sales_volume DESC) AS rank
FROM products;
Expected Output:
product_id | sales_volume | rank |
105 | 800 | 1 |
104 | 700 | 2 |
102 | 600 | 3 |
101 | 500 | 4 |
103 | 500 | 4 |
Explanation:
- Products with equal sales volume (500) get the same rank, and the next rank is 4 without any gaps.
1.4 NTILE()
NTILE(n)
divides the result set into n
equal parts and assigns a bucket number to each row. The number of buckets must be specified (e.g., NTILE(4)
for quartiles).
Example Use Case: Dividing Salespeople into Quartiles Based on Sales
SELECT salesperson_id, amount,
NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales;
Expected Output:
salesperson_id | amount | quartile |
2 | 800 | 1 |
1 | 700 | 1 |
2 | 600 | 2 |
1 | 500 | 3 |
3 | 450 | 4 |
Explanation:
- The sales data is divided into 4 quartiles, with the highest sales in the 1st quartile and the lowest in the 4th.
2. Value Functions
These functions allow you to examine values from preceding or subsequent rows relative to the current row.
2.1 LAG()
LAG()
allows you to access the value from a previous row within the same partition. This is useful for comparisons between consecutive rows.
Example Use Case: Comparing Daily Sales with Previous Day’s Sales
SELECT salesperson_id, sale_date, amount,
LAG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS prev_sale
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | prev_sale |
1 | 2024-01-01 | 500 | NULL |
1 | 2024-01-02 | 700 | 500 |
2 | 2024-01-01 | 600 | NULL |
2 | 2024-01-02 | 800 | 600 |
3 | 2024-01-01 | 450 | NULL |
Explanation:
- The
LAG()
function returns the previous day’s sales. For the first row, there is no previous sale, so the result isNULL
.
2.2 LEAD()
LEAD()
works like LAG()
, but it provides access to the value from a subsequent row.
Example Use Case: Comparing Current Sales with Next Day’s Sales
SELECT salesperson_id, sale_date, amount,
LEAD(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS next_sale
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | next_sale |
1 | 2024-01-01 | 500 | 700 |
1 | 2024-01-02 | 700 | NULL |
2 | 2024-01-01 | 600 | 800 |
2 | 2024-01-02 | 800 | NULL |
3 | 2024-01-01 | 450 | NULL |
Explanation:
- The
LEAD()
function shows the sales for the next day. If there’s no next sale (like on the last day), it returnsNULL
.
2.3 FIRST_VALUE() and LAST_VALUE()
FIRST_VALUE()
and LAST_VALUE()
allow you to access the first and last values within the window frame.
Example Use Case: Finding First and Last Sale for Each Salesperson
SELECT salesperson_id, sale_date, amount,
FIRST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS first_sale,
LAST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sale
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | first_sale | last_sale |
1 | 2024-01-01 | 500 | 500 | 700 |
1 | 2024-01-02 | 700 | 500 | 700 |
2 | 2024-01-01 | 600 | 600 | 800 |
2 | 2024-01-02 | 800 | 600 | 800 |
3 | 2024-01-01 | 450 | 450 | 450 |
Explanation:
FIRST_VALUE()
returns the first sale for each salesperson, whileLAST_VALUE()
returns the last sale.
3. Aggregate Functions
These functions compute an aggregate value over a window frame, similar to traditional aggregate functions like SUM()
and AVG()
, but with the ability to retain row-level detail.
3.1 SUM()
SUM()
calculates the cumulative sum of a column over the window frame.
Example Use Case: Running Total of Sales
SELECT salesperson_id, sale_date, amount,
SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | running_total |
1 | 2024-01-01 | 500 | 500 |
1 | 2024-01-02 | 700 | 1200 |
2 | 2024-01-01 | 600 | 600 |
2 | 2024-01-02 | 800 | 1400 |
3 | 2024-01-01 | 450 | 450 |
Explanation:
SUM()
calculates the cumulative sales total for each salesperson, with the totals updating for each row.
3.2 AVG()
AVG()
calculates the moving average over a window frame.
Example Use Case: 7-Day Moving Average of Sales
SELECT salesperson_id, sale_date, amount,
AVG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;
Expected Output:
salesperson_id | sale_date | amount | moving_avg |
1 | 2024-01-01 | 500 | 500 |
1 | 2024-01-02 | 700 | 600 |
2 | 2024-01-01 | 600 | 600 |
2 | 2024-01-02 | 800 | 700 |
3 | 2024-01-01 | 450 | 450 |
Explanation:
AVG()
calculates the moving average for each salesperson over the past 7 days, including the current sale.