Skip to content

Latest commit

 

History

History
239 lines (154 loc) · 7.15 KB

File metadata and controls

239 lines (154 loc) · 7.15 KB

Section 8 — Order of SQL

What you'll practice: the logical order in which SQL clauses actually execute, and why it matters. You write a query in the order SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, but the database evaluates it in a completely different order:

FROM        -> pick your tables / join them
WHERE       -> filter raw rows
GROUP BY    -> collapse rows into groups
HAVING      -> filter groups
SELECT      -> pick (and compute) the columns to return
ORDER BY    -> sort the result
LIMIT       -> take the top N rows

Understanding this order explains every "weird" SQL error you'll hit for the rest of your career.

Learning objectives

  • Recite the logical order of SQL clause execution
  • Understand why you can't reference a SELECT alias in WHERE
  • Understand why HAVING can reference aggregates but WHERE can't
  • Understand why ORDER BY can reference a SELECT alias
  • Reason about what a query returns by mentally executing the clauses in order

How to run these exercises

psql -U postgres -d sql_exercise \
     -f 02-sql-fundamentals/08-order-of-sql/schema.sql

psql -U postgres -d sql_exercise \
     -f 02-sql-fundamentals/08-order-of-sql/seed.sql
SET search_path TO sqlf_order;

SELECT
  (SELECT COUNT(*) FROM customer) AS customers,
  (SELECT COUNT(*) FROM product)  AS products,
  (SELECT COUNT(*) FROM orders)   AS orders;
-- Expected: 15, 15, 50

Exercises

Exercise 1 — The warmup: total revenue

Return the sum of orders.total across every row. Alias it total_revenue.

Hint: SELECT SUM(total) AS total_revenue FROM orders;


Exercise 2 — Filter before aggregating

Return the total revenue from orders placed in 2024.

Hint: WHERE order_date >= '2024-01-01' then SUM(total).

Question: think about the order of execution. WHERE runs before the SUM, so it's filtering rows, not the already-computed total.


Exercise 3 — GROUP BY after filter

For each customer, return the customer_id and the total they've spent across all orders. Sort by total descending.

Hint: GROUP BY customer_id.


Exercise 4 — The alias-in-WHERE trap

Try this query:

SELECT
    customer_id,
    SUM(total) AS spent
FROM orders
WHERE spent > 500       -- <-- this will fail
GROUP BY customer_id;

Question: what error do you get? Why? Explain it in terms of the logical order of execution.

Now fix it two ways:

  1. Move the filter to HAVING.
  2. Keep the logic in WHERE by avoiding the alias entirely (which isn't possible here — why?).

Exercise 5 — HAVING with aggregates

Return every customer who has spent more than $1,000 total. Show customer_id, total_spent.

Hint: SUM(total) in SELECT, HAVING SUM(total) > 1000.


Exercise 6 — WHERE + HAVING together

Return every customer whose 2024 orders total more than $500. Show customer_id and spent_2024.

Hint: WHERE order_date >= '2024-01-01' ... GROUP BY customer_id HAVING SUM(total) > 500.

Question: what would happen if you put the date filter in HAVING instead of WHERE? Would the result be the same? Would the query be slower?


Exercise 7 — ORDER BY can use a SELECT alias

Return each customer's customer_id and their total spending aliased spent, then sort by spent descending.

Hint: ORDER BY spent DESC (works because ORDER BY runs after SELECT).


Exercise 8 — Join + group + filter + sort + limit

Return the top 5 customers (by total spend) who live in either 'USA' or 'UK'. Show the customer's full name (concatenated), their country, and their total spend.

Hint: you'll need to use every clause:

  • FROM customer c JOIN orders o ON o.customer_id = c.id
  • WHERE c.country IN ('USA', 'UK')
  • GROUP BY c.id, c.first_name, c.last_name, c.country
  • ORDER BY total_spent DESC
  • LIMIT 5

Exercise 9 — Which product category drives the most revenue?

Return each product.category with the total revenue from orders of products in that category. Sort descending.

Hint: orders → join productGROUP BY product.categorySUM(orders.total).


Exercise 10 — The WHERE ... aggregate trap

Try this query:

SELECT customer_id, SUM(total)
FROM orders
WHERE SUM(total) > 500     -- <-- this will fail
GROUP BY customer_id;

Question: what error do you get? Explain it using the clause execution order.


Exercise 11 — Top category per year

Return the top 3 (category, year) combinations by total revenue. Show the category, the year (extracted from order_date), and the total revenue. Sort by total revenue descending.

Hint: EXTRACT(YEAR FROM order_date) gives you the year as a number. Then GROUP BY both category and year, and LIMIT 3.


Exercise 12 — Customers who have never ordered a book

Return the first_name and last_name of every customer who has never ordered anything from the 'Books' category.

Hint: one way:

SELECT first_name, last_name
FROM customer
WHERE id NOT IN (
    SELECT o.customer_id
    FROM orders o
    JOIN product p ON p.id = o.product_id
    WHERE p.category = 'Books'
);

Think about what order this query runs in: the subquery is evaluated first (just like a FROM/WHERE block), then its result is used by the outer WHERE.


Exercise 13 — Month-by-month revenue for 2023

Return each month of 2023 with the total revenue for that month. Columns: month (as a date truncated to month) and revenue. Sort by month.

Hint: DATE_TRUNC('month', order_date) in PostgreSQL.


Exercise 14 — Customers with more than 3 orders

Return the first_name, last_name, and order count of every customer who has placed more than 3 orders.

Hint: JOIN, GROUP BY, HAVING COUNT(*) > 3. Then reason: which part of the query runs where?


Exercise 15 — Mental execution trace

Without running it, describe — in order — what each clause of the following query does:

SELECT c.country, SUM(o.total) AS revenue
FROM customer c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2023-01-01'
GROUP BY c.country
HAVING SUM(o.total) > 1000
ORDER BY revenue DESC
LIMIT 3;

Write your answer as a 7-step trace: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. Then run the query to check.


Bonus challenges

Bonus 1 — Why does SELECT DISTINCT appear between SELECT and ORDER BY?

Read the official Postgres docs section on logical query processing, then explain in one paragraph where DISTINCT fits into the order.

Bonus 2 — Window function bonus

Return each order with a running total of revenue per customer, ordered by order_date. Use SUM(total) OVER (PARTITION BY customer_id ORDER BY order_date). Window functions run at roughly the same stage as SELECT — after GROUP BY and HAVING, before ORDER BY.

Bonus 3 — Rewrite with a CTE

Rewrite exercise 8 (top 5 US/UK customers) using a WITH (common table expression) to name the intermediate "customer totals" step. Does it make the query easier to read?


Cleanup

DROP SCHEMA sqlf_order CASCADE;