What you'll practice: the
OVERclause,PARTITION BY,ORDER BYinside a window,ROW_NUMBER,RANK,DENSE_RANK,LAG,LEAD,FIRST_VALUE,LAST_VALUE, and aggregates as window functions (SUM() OVER,AVG() OVER).
Window functions are the single most powerful feature in standard SQL that most intermediate users never touch. They let you compute "for each row, something that depends on a set of other rows" — running totals, ranks, moving averages, row-over-row differences — without collapsing the result set with a GROUP BY.
A window function is like a GROUP BY aggregate that does not reduce the row count. Instead of computing one value per group, it computes one value per row, where that value is calculated over a "window" of related rows.
The magic word is OVER. Anywhere you'd normally write an aggregate like SUM(salary), you can write SUM(salary) OVER (PARTITION BY department ORDER BY hired_on) and get a "running total by department, in hire order" attached to every employee row.
- Attach a window to an aggregate with
OVER () - Partition a window with
PARTITION BY - Order a window with
ORDER BYinside theOVER - Use the ranking functions:
ROW_NUMBER,RANK,DENSE_RANK,NTILE - Use the value functions:
LAG,LEAD,FIRST_VALUE,LAST_VALUE - Use aggregates as window functions:
SUM,AVG,COUNT,MIN,MAX - Understand frame clauses (
ROWS BETWEEN ...) at a high level
psql -U postgres -d sql_exercise -f 04-advanced-databases/14-window-functions/schema.sql
psql -U postgres -d sql_exercise -f 04-advanced-databases/14-window-functions/seed.sql
psql -U postgres -d sql_exerciseInside psql:
SET search_path TO adv_windows;Run both of these and look at the output:
SELECT department, AVG(salary)::INT FROM employee GROUP BY department;vs.
SELECT department, full_name, salary,
AVG(salary) OVER (PARTITION BY department)::INT AS dept_avg
FROM employee;Question: how many rows does each query return? What did OVER (PARTITION BY department) buy you over GROUP BY department?
Return full_name, department, salary, dept_avg, and salary - dept_avg. The goal: show each employee's salary next to how it compares to their department's average, in one pass.
Assign a per-department row number so the highest-paid employee in each department is 1, the next is 2, and so on.
Hint:
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rnBuilding on Exercise 3, return only the top earner per department. (Hint: wrap the query in a sub-query or CTE and filter WHERE rn = 1.)
Expected: 4 rows — one per department.
Same idea, but return the top two per department.
Return full_name, department, salary, and three columns:
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)RANK() OVER (PARTITION BY department ORDER BY salary DESC)DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC)
Question: notice the difference between RANK and DENSE_RANK on data (Grace and Donald? Or wherever there's a tie in your data). When is DENSE_RANK "the right answer"?
Split all employees into four pay quartiles and show which quartile each falls into.
Hint: NTILE(4) OVER (ORDER BY salary).
For every sale row, return rep_name, sold_at, amount, and a running total of that rep's sales up to and including that date.
Hint:
SUM(amount) OVER (PARTITION BY rep_name ORDER BY sold_at) AS running_totalShow each sale next to the previous sale (in time order, per rep) and the difference.
Hint:
LAG(amount) OVER (PARTITION BY rep_name ORDER BY sold_at) AS prev_amount,
amount - LAG(amount) OVER (PARTITION BY rep_name ORDER BY sold_at) AS diffSame as Exercise 9 but showing the next sale's amount per rep.
For every sale row, show the rep_name, sold_at, amount, and two extra columns:
first_sale_amount— the rep's very first sale amountlast_sale_amount— the rep's very last sale amount
Hint:
FIRST_VALUE(amount) OVER (PARTITION BY rep_name ORDER BY sold_at)
LAST_VALUE(amount) OVER (PARTITION BY rep_name ORDER BY sold_at
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)Question: why does LAST_VALUE need the explicit frame clause? (This is the window function "gotcha" everyone hits once.)
First, aggregate sales to (day, total). Then attach a 3-day (current + 2 previous) moving average to every day.
Hint:
WITH daily AS (
SELECT sold_at AS day, SUM(amount) AS total
FROM sale
GROUP BY sold_at
)
SELECT day, total,
AVG(total) OVER (
ORDER BY day
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)::INT AS moving_avg
FROM daily
ORDER BY day;For every employee, return full_name, department, salary, and pct_of_dept_total — their salary as a percentage of their department's total salary spend.
Hint: 100.0 * salary / SUM(salary) OVER (PARTITION BY department).
For every employee, return full_name, salary, and a cume_dist column using CUME_DIST() OVER (ORDER BY salary). Also add PERCENT_RANK() OVER (ORDER BY salary).
Question: what does CUME_DIST mean, and how is it different from PERCENT_RANK?
For every employee, return full_name, hired_on, and the number of days between their hire date and the next hire date (chronologically). Hint: LEAD(hired_on) OVER (ORDER BY hired_on) - hired_on.
Then wrap the query in a CTE and return the single largest gap.
For each month, compute the total sales. Then use a window function to attach the overall average monthly sales as a column, and return only months whose total exceeds that overall average.
Hint:
WITH monthly AS (
SELECT DATE_TRUNC('month', sold_at)::DATE AS month, SUM(amount) AS total
FROM sale GROUP BY 1
),
with_avg AS (
SELECT month, total, AVG(total) OVER () AS overall_avg FROM monthly
)
SELECT month, total, overall_avg::INT
FROM with_avg
WHERE total > overall_avg;Several window functions in one query can share a named window. Rewrite Exercise 6 using WINDOW w AS (PARTITION BY department ORDER BY salary DESC) and reference it with ROW_NUMBER() OVER w.
Classic interview question: given the sale dates, find the start and end of each contiguous "island" of dates where the same rep made sales on consecutive days (if any). Hint: the "gaps and islands" trick uses ROW_NUMBER() minus a date sequence number.
Add a pct_change column to your Exercise 9 result that shows the percent change from the previous sale. Handle the first row (no previous) by returning NULL.
DROP SCHEMA IF EXISTS adv_windows CASCADE;