SQL Window Functions: Detailed Examples with Real-Life Use Cases

In SQL, window functions allow you to perform operations across a set of rows related to the current row without collapsing the results. This is powerful for complex analyses like running totals, moving averages, ranking, etc., while keeping all the rows in the result set.

In this post, we will dive into SQL window functions and demonstrate their real-world applications with detailed examples and expected outputs.


What Are Window Functions?

A window function performs a calculation across a set of rows that are related to the current row, similar to how aggregate functions work. However, window functions do not collapse the result set into a single row like aggregates do. This allows for row-level analysis while performing operations like ranking, running totals, moving averages, etc.
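
To make the contrast with GROUP BY concrete, here is a minimal sketch that runs both forms side by side. It uses Python's built-in sqlite3 module, which is an assumption made purely for illustration (it requires a bundled SQLite of version 3.25 or newer), and a small sales table modeled on the sample data used in the examples of this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

# GROUP BY collapses each salesperson's rows into a single row.
grouped = conn.execute(
    "SELECT salesperson_id, SUM(amount) FROM sales "
    "GROUP BY salesperson_id ORDER BY salesperson_id"
).fetchall()

# The window version keeps every row and repeats the partition total on each one.
windowed = conn.execute(
    "SELECT salesperson_id, amount, "
    "       SUM(amount) OVER (PARTITION BY salesperson_id) AS total "
    "FROM sales ORDER BY salesperson_id, sale_date"
).fetchall()

print(len(grouped), "rows from GROUP BY:", grouped)
print(len(windowed), "rows from the window function:", windowed)
```

The aggregate query returns three rows (one per salesperson), while the window query returns all five input rows with the total attached to each.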

Basic Syntax for Window Functions

The basic syntax for a window function is:

<window_function>(<expression>) OVER (PARTITION BY <column> ORDER BY <column>)
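  • <window_function>: The function to apply (e.g., ROW_NUMBER(), SUM(), RANK()).

  • <expression>: The column or expression the function operates on. Some functions, such as ROW_NUMBER() and RANK(), take no argument.

  • PARTITION BY <column>: Optional. Divides the result set into partitions (groups of rows), and the function is computed separately within each one. If omitted, the whole result set is treated as a single partition.

  • ORDER BY <column>: Specifies how rows are ordered within each partition. Ranking and offset functions such as RANK() and LAG() depend on it to produce meaningful results, and for aggregates such as SUM() it turns the calculation into a running one.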
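
The effect of each clause is easiest to see by applying the same function with different OVER definitions. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) and a sales table modeled on this post's sample data, and the column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

rows = conn.execute("""
    SELECT salesperson_id, sale_date,
           -- Empty OVER(): the window is the whole result set.
           COUNT(*) OVER () AS all_rows,
           -- PARTITION BY: the window is the current salesperson's rows.
           COUNT(*) OVER (PARTITION BY salesperson_id) AS rows_in_partition,
           -- Adding ORDER BY: the window grows row by row within the partition.
           COUNT(*) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS rows_so_far
    FROM sales
    ORDER BY salesperson_id, sale_date
""").fetchall()

for row in rows:
    print(row)
```

Reading the three count columns left to right shows the window shrinking from the full table, to one partition, to the rows seen so far in that partition.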

Let’s now explore different window functions with real-life examples, expected outputs, and explanations.

1. Ranking Functions

Ranking functions assign a rank to each row within a partition, typically based on a specified ordering. These functions are extremely useful when you want to rank values or create a leaderboard.

1.1 ROW_NUMBER()

The ROW_NUMBER() function assigns a unique sequential integer to each row within the partition. The numbering starts from 1 for each partition.

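Example Use Case: Ranking Each Salesperson’s Sales by Amount

SELECT salesperson_id, sale_date, amount,
       ROW_NUMBER() OVER (PARTITION BY salesperson_id ORDER BY amount DESC) AS row_num
FROM sales
ORDER BY salesperson_id, row_num;

Expected Output:

salesperson_id  sale_date   amount  row_num
1               2024-01-02  700     1
1               2024-01-01  500     2
2               2024-01-02  800     1
2               2024-01-01  600     2
3               2024-01-01  450     1

Explanation:

  • ROW_NUMBER() numbers each salesperson’s sales from the largest amount to the smallest, restarting at 1 for every salesperson_id. Avoid naming the output column rank, because RANK is a reserved word in some databases (for example, MySQL 8.0).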
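
A common follow-up is to keep only the top row of each partition, for example each salesperson's largest sale. A window function cannot be referenced directly in WHERE, so the numbering goes into a subquery. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) and this post's sample sales data; the alias names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

# Number each salesperson's sales by amount, then keep only row number 1.
top_sales = conn.execute("""
    SELECT salesperson_id, sale_date, amount
    FROM (
        SELECT salesperson_id, sale_date, amount,
               ROW_NUMBER() OVER (PARTITION BY salesperson_id
                                  ORDER BY amount DESC) AS row_num
        FROM sales
    ) AS numbered
    WHERE row_num = 1
    ORDER BY salesperson_id
""").fetchall()

for row in top_sales:
    print(row)
```

Changing the filter to row_num <= 3 would return the top three sales per salesperson instead.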

1.2 RANK()

The RANK() function assigns ranks to rows, but unlike ROW_NUMBER(), it handles ties by assigning the same rank to rows with equal values. The rank after a tie then skips ahead by the number of tied rows, which leaves a gap in the sequence.

Example Use Case: Ranking Employees Based on Performance

SELECT employee_id, performance_score,
       RANK() OVER (ORDER BY performance_score DESC) AS performance_rank
FROM employees
ORDER BY performance_rank, employee_id;

Expected Output:

employee_id  performance_score  performance_rank
2            92                 1
3            92                 1
1            85                 3
5            85                 3
4            78                 5

Explanation:

  • Employees with the same performance score (92) get the same rank (1). The next rank is 3, skipping 2.
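
To see how RANK() differs from ROW_NUMBER() on tied values, the two can be computed over the same ordering. This is a sketch under stated assumptions: Python's built-in sqlite3 module (SQLite 3.25 or newer), the employee scores from the example above, and an employee_id tie-breaker added to ROW_NUMBER() so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, performance_score INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 85), (2, 92), (3, 92), (4, 78), (5, 85)],
)

rows = conn.execute("""
    SELECT employee_id, performance_score,
           -- Tied scores share a rank, and the following rank is skipped.
           RANK() OVER (ORDER BY performance_score DESC) AS score_rank,
           -- Every row gets its own number; employee_id breaks the ties.
           ROW_NUMBER() OVER (ORDER BY performance_score DESC, employee_id) AS row_num
    FROM employees
    ORDER BY performance_score DESC, employee_id
""").fetchall()

for row in rows:
    print(row)
```

The rank column repeats for tied scores and jumps afterwards, while the row number never repeats.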

1.3 DENSE_RANK()

DENSE_RANK() works like RANK(), but it does not leave gaps in the ranking. If two rows have the same value, they get the same rank, but the next rank is immediately after the tied rank.

Example Use Case: Ranking Products Based on Sales Volume

SELECT product_id, sales_volume,
       DENSE_RANK() OVER (ORDER BY sales_volume DESC) AS sales_rank
FROM products
ORDER BY sales_rank, product_id;

Expected Output:

product_id  sales_volume  sales_rank
105         800           1
104         700           2
102         600           3
101         500           4
103         500           4

Explanation:

  • Products with equal sales volume (500) share rank 4, which follows rank 3 directly. A product with a lower sales volume would receive rank 5, with no gap.
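
The difference between RANK() and DENSE_RANK() only becomes visible on the row that follows a tie. The sketch below therefore adds one hypothetical product (106, with a sales volume of 400) that is not part of the example data. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, sales_volume INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(101, 500), (102, 600), (103, 500), (104, 700), (105, 800),
     (106, 400)],  # hypothetical extra row placed after the tie
)

rows = conn.execute("""
    SELECT product_id, sales_volume,
           RANK()       OVER (ORDER BY sales_volume DESC) AS rank_with_gaps,
           DENSE_RANK() OVER (ORDER BY sales_volume DESC) AS rank_without_gaps
    FROM products
    ORDER BY sales_volume DESC, product_id
""").fetchall()

for row in rows:
    print(row)
```

Both functions give the two tied products rank 4; for product 106, RANK() continues at 6 while DENSE_RANK() continues at 5.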

1.4 NTILE()

NTILE(n) divides the ordered rows into n buckets that are as equal in size as possible and assigns each row its bucket number, from 1 to n. When the row count is not divisible by n, the earlier buckets each receive one extra row. The number of buckets must be specified (e.g., NTILE(4) for quartiles).

Example Use Case: Dividing Sales into Quartiles by Amount

SELECT salesperson_id, amount,
       NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales
ORDER BY amount DESC;

Expected Output:

salesperson_id  amount  quartile
2               800     1
1               700     1
2               600     2
1               500     3
3               450     4

Explanation:

  • The five sales are split into 4 buckets, ordered from the highest amount to the lowest. Five rows cannot be divided evenly by four, so the 1st quartile holds two rows and the remaining quartiles hold one row each.
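
With only five rows, it is worth checking how many rows land in each bucket. The sketch below counts the bucket sizes for a few values of n. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) and this post's sample sales data; bucket_sizes is a helper name invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

def bucket_sizes(n):
    """Return (bucket, row_count) pairs for NTILE(n) over the sales table."""
    n = int(n)  # only an integer literal is interpolated into the SQL text
    return conn.execute(
        f"SELECT bucket, COUNT(*) FROM ("
        f"  SELECT NTILE({n}) OVER (ORDER BY amount DESC) AS bucket FROM sales"
        f") GROUP BY bucket ORDER BY bucket"
    ).fetchall()

quartiles = bucket_sizes(4)
halves = bucket_sizes(2)
print("NTILE(4):", quartiles)
print("NTILE(2):", halves)
```

In both cases the leftover row goes to the first bucket, so the buckets never differ in size by more than one row.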

2. Value Functions

These functions allow you to examine values from preceding or subsequent rows relative to the current row.

2.1 LAG()

LAG() allows you to access the value from a previous row within the same partition. This is useful for comparisons between consecutive rows.

Example Use Case: Comparing Daily Sales with Previous Day’s Sales

SELECT salesperson_id, sale_date, amount,
       LAG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS prev_sale
FROM sales
ORDER BY salesperson_id, sale_date;

Expected Output:

salesperson_id  sale_date   amount  prev_sale
1               2024-01-01  500     NULL
1               2024-01-02  700     500
2               2024-01-01  600     NULL
2               2024-01-02  800     600
3               2024-01-01  450     NULL

Explanation:

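  • The LAG() function returns the amount from the salesperson’s previous sale, ordered by sale_date. This is the previous day’s sale only when there is exactly one row per day. The first row in each partition has no previous row, so the result is NULL.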
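
LAG() is usually combined with arithmetic to get a change between consecutive rows, and it accepts optional offset and default arguments for cases where NULL is inconvenient. A minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer) and this post's sample sales data; the aliases change and prev_or_zero are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

rows = conn.execute("""
    SELECT salesperson_id, sale_date, amount,
           -- Difference from the previous sale; NULL when there is none.
           amount - LAG(amount) OVER w AS change,
           -- LAG(expr, offset, default): fall back to 0 instead of NULL.
           LAG(amount, 1, 0) OVER w AS prev_or_zero
    FROM sales
    WINDOW w AS (PARTITION BY salesperson_id ORDER BY sale_date)
    ORDER BY salesperson_id, sale_date
""").fetchall()

for row in rows:
    print(row)
```

The named WINDOW clause avoids repeating the same definition twice; not every database supports it, in which case the OVER (...) definition can be written out in each call.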

2.2 LEAD()

LEAD() works like LAG(), but it provides access to the value from a subsequent row.

Example Use Case: Comparing Current Sales with Next Day’s Sales

SELECT salesperson_id, sale_date, amount,
       LEAD(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS next_sale
FROM sales
ORDER BY salesperson_id, sale_date;

Expected Output:

salesperson_id  sale_date   amount  next_sale
1               2024-01-01  500     700
1               2024-01-02  700     NULL
2               2024-01-01  600     800
2               2024-01-02  800     NULL
3               2024-01-01  450     NULL

Explanation:

  • The LEAD() function returns the amount from the salesperson’s next sale, ordered by sale_date. If there is no next sale (as on each salesperson’s last row), it returns NULL.
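
Because LEAD() returns NULL on the last row of a partition, it can also be used to flag that row. The sketch below computes the change to the next sale and an is_last_sale flag. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) and this post's sample sales data; both aliases are illustrative, and the flag comes back as 0 or 1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

rows = conn.execute("""
    SELECT salesperson_id, sale_date, amount,
           -- How much the next sale differs from this one (NULL on the last row).
           LEAD(amount) OVER w - amount AS change_to_next,
           -- No following row in the partition means this is the last sale.
           LEAD(sale_date) OVER w IS NULL AS is_last_sale
    FROM sales
    WINDOW w AS (PARTITION BY salesperson_id ORDER BY sale_date)
    ORDER BY salesperson_id, sale_date
""").fetchall()

for row in rows:
    print(row)
```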

2.3 FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE() and LAST_VALUE() allow you to access the first and last values within the window frame.

Example Use Case: Finding First and Last Sale for Each Salesperson

SELECT salesperson_id, sale_date, amount,
       FIRST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS first_sale,
       LAST_VALUE(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sale
FROM sales
ORDER BY salesperson_id, sale_date;

Expected Output:

salesperson_id  sale_date   amount  first_sale  last_sale
1               2024-01-01  500     500         700
1               2024-01-02  700     500         700
2               2024-01-01  600     600         800
2               2024-01-02  800     600         800
3               2024-01-01  450     450         450

Explanation:

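  • FIRST_VALUE() returns the first sale for each salesperson, while LAST_VALUE() returns the last sale. LAST_VALUE() needs the explicit frame ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING because the default frame ends at the current row, in which case it would simply return the current row’s amount.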
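
The explicit frame on LAST_VALUE() in the query above is not decoration. With ORDER BY and no frame clause, the default frame stops at the current row (and its peers), so LAST_VALUE() returns the current row's own value. The sketch below compares both forms, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer) and this post's sample sales data; the aliases default_frame and full_frame are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 500), (1, "2024-01-02", 700),
     (2, "2024-01-01", 600), (2, "2024-01-02", 800),
     (3, "2024-01-01", 450)],
)

rows = conn.execute("""
    SELECT salesperson_id, sale_date,
           -- Default frame: ends at the current row, so this echoes amount.
           LAST_VALUE(amount) OVER (PARTITION BY salesperson_id
                                    ORDER BY sale_date) AS default_frame,
           -- Full-partition frame: the true last sale of the salesperson.
           LAST_VALUE(amount) OVER (PARTITION BY salesperson_id
                                    ORDER BY sale_date
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS full_frame
    FROM sales
    ORDER BY salesperson_id, sale_date
""").fetchall()

for row in rows:
    print(row)
```

On each salesperson's first row the two columns disagree, which is the symptom to look for when a LAST_VALUE() query seems to return the wrong value.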

3. Aggregate Functions

These functions compute an aggregate value over a window frame, similar to traditional aggregate functions like SUM() and AVG(), but with the ability to retain row-level detail.

3.1 SUM()

Used as a window function with ORDER BY, SUM() adds up the values from the start of the partition through the current row, which produces a running total.

Example Use Case: Running Total of Sales

SELECT salesperson_id, sale_date, amount,
       SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total
FROM sales
ORDER BY salesperson_id, sale_date;

Expected Output:

salesperson_id  sale_date   amount  running_total
1               2024-01-01  500     500
1               2024-01-02  700     1200
2               2024-01-01  600     600
2               2024-01-02  800     1400
3               2024-01-01  450     450

Explanation:

  • SUM() calculates the cumulative sales total for each salesperson, with the totals updating for each row.
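
One subtlety of running totals shows up when two rows share the same ORDER BY value. The default frame is range-based, so tied rows are added together and receive the same total; an explicit ROWS frame with a tie-breaker advances one row at a time. The sketch below uses hypothetical data (two sales on the same date, which the sample data does not contain) and assumes Python's built-in sqlite3 module with SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    # Hypothetical data: two sales fall on 2024-01-02.
    [("2024-01-01", 500), ("2024-01-02", 700), ("2024-01-02", 300)],
)

rows = conn.execute("""
    SELECT sale_date, amount,
           -- Default frame: rows with the same sale_date are summed together.
           SUM(amount) OVER (ORDER BY sale_date) AS range_total,
           -- ROWS frame plus a tie-breaker: strictly one row at a time.
           SUM(amount) OVER (ORDER BY sale_date, amount
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS rows_total
    FROM sales
    ORDER BY sale_date, amount
""").fetchall()

for row in rows:
    print(row)
```

Both columns end at the same grand total, but only rows_total shows the intermediate step between the two same-day sales.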

3.2 AVG()

Used with a bounded frame such as ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, AVG() calculates a moving average over the most recent rows.

Example Use Case: 7-Day Moving Average of Sales

SELECT salesperson_id, sale_date, amount,
       AVG(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales
ORDER BY salesperson_id, sale_date;

Expected Output:

salesperson_id  sale_date   amount  moving_avg
1               2024-01-01  500     500
1               2024-01-02  700     600
2               2024-01-01  600     600
2               2024-01-02  800     700
3               2024-01-01  450     450

Explanation:

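  • AVG() averages the current sale and up to six preceding sales for each salesperson. Because the frame counts rows rather than calendar days, this is a true 7-day moving average only when each salesperson has exactly one row per day. The sample data has at most two rows per salesperson, so each average covers only the rows available so far.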
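
A ROWS frame counts rows, not days, which matters as soon as the dates have gaps. The sketch below uses hypothetical data (one sale every three days, which is not part of the sample data) and reports, next to each average, how many rows the frame holds and the earliest date it reaches back to. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
# Hypothetical data: eight sales, three days apart, amounts 100 to 800.
start = date(2024, 1, 1)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [((start + timedelta(days=3 * i)).isoformat(), 100 * (i + 1)) for i in range(8)],
)

rows = conn.execute("""
    SELECT sale_date, amount,
           AVG(amount)    OVER w AS moving_avg,
           COUNT(*)       OVER w AS rows_in_frame,
           MIN(sale_date) OVER w AS earliest_date_in_frame
    FROM sales
    WINDOW w AS (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    ORDER BY sale_date
""").fetchall()

for row in rows:
    print(row)
```

For the last row (2024-01-22) the frame holds seven rows but reaches back to 2024-01-04, which is 18 days earlier. If the average must cover calendar days, one option is to fill in the missing dates first so that there is exactly one row per day.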