# Mastering PostgreSQL with Common Table Expressions (CTEs)

If you find yourself writing long, nested subqueries that are difficult to read and debug, **Common Table Expressions (CTEs)** are the feature you need. A CTE lets you define a temporary, named result set that you can reference within a `SELECT`, `INSERT`, `UPDATE`, or `DELETE` statement.

Think of a CTE as a temporary "view" or a variable that holds the result of a query, available only for the duration of the single statement it belongs to.
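
To illustrate that a CTE is not limited to `SELECT`, here is a minimal sketch of one feeding an `UPDATE`. The `demo_prices` table and its rows are hypothetical and exist only for this illustration; the table is temporary and dropped at the end.

```sql
-- Hypothetical scratch table, used only for this sketch
CREATE TEMP TABLE demo_prices (item text, price numeric(10, 2));
INSERT INTO demo_prices VALUES ('pen', 2.00), ('book', 12.00), ('lamp', 30.00);

-- The CTE computes the average once; the UPDATE then reads it
WITH avg_price AS (
    SELECT AVG(price) AS value
    FROM demo_prices
)
UPDATE demo_prices
SET price = price * 0.9
WHERE price > (SELECT value FROM avg_price);

-- Only 'lamp' was above the average, so only its price changes
SELECT item, price FROM demo_prices ORDER BY item;

DROP TABLE demo_prices;
```

The CTE `avg_price` is visible only to the `UPDATE` it is attached to; the final `SELECT` is a separate statement and could not reference it.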

## Why Use CTEs?

1. **Readability**: They break complex queries into logical, sequential steps, which makes your SQL much easier to read and understand.
2. **Reusability**: You can reference the same CTE multiple times within a single query instead of writing the same subquery over and over.
3. **Recursion**: Recursive CTEs (`WITH RECURSIVE`) are the standard SQL mechanism for recursive queries, which is essential for hierarchical data such as organizational charts or bills of materials.
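
As a small sketch of the reusability point, the query below references one CTE twice: once as the row source and once to compute a grand total. The `dept_totals` name and its figures are made up for illustration and are supplied inline with `VALUES`, so no table is required.

```sql
WITH dept_totals (department, total) AS (
    -- Hypothetical figures, defined inline for the example
    VALUES ('Engineering', 275000), ('HR', 135000), ('Sales', 155000)
)
SELECT
    d.department,
    d.total,
    -- Second reference to the same CTE: the grand total across all rows
    ROUND(100.0 * d.total / (SELECT SUM(total) FROM dept_totals), 1) AS pct_of_all
FROM dept_totals d
ORDER BY d.department;
```

Without the CTE, the inline data (or the subquery that produced it) would have to be written out in both places.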

## The Core Syntax: `WITH` Clause

A CTE is defined with the `WITH` keyword at the beginning of your query.

```sql
WITH cte_name AS (
    -- This is the CTE query definition
    SELECT column1, column2 FROM some_table WHERE condition
)
-- This is the main query that uses the CTE
SELECT *
FROM cte_name;
```
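
The template above uses placeholder names. A runnable instance, assuming nothing beyond a PostgreSQL session, might look like the following sketch; the CTE name `numbers` is arbitrary, and `generate_series` is a built-in set-returning function.

```sql
WITH numbers AS (
    -- CTE definition: the integers 1 through 5
    SELECT n
    FROM generate_series(1, 5) AS n
)
-- Main query: filter and transform the CTE's rows
SELECT n, n * n AS squared
FROM numbers
WHERE n > 2
ORDER BY n;
```

This returns three rows, for `n` equal to 3, 4, and 5, each with its square.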

You can also chain multiple CTEs together, separating them with commas.

```sql
WITH
    cte_1 AS (
        SELECT ...
    ),
    cte_2 AS (
        -- This CTE can even reference the one before it!
        SELECT ... FROM cte_1
    )
SELECT *
FROM cte_2;
```
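
A concrete version of the chained form could look like this sketch. The `orders` and `totals` names and the sample rows are hypothetical; the data is supplied inline with `VALUES` so the query runs on its own.

```sql
WITH
    orders (customer, amount) AS (
        -- Hypothetical sample rows
        VALUES ('ann', 10), ('ann', 25), ('ben', 40)
    ),
    totals AS (
        -- Second CTE builds on the first
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
SELECT customer, total
FROM totals
ORDER BY customer;
```

Here `ann` ends up with a total of 35 and `ben` with 40. Note that a CTE can only reference CTEs defined before it in the list, unless `WITH RECURSIVE` is used.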

---

## Let's See Some Examples

First, let's set up the `employees` table from the previous guide. Each row has a `department` and a `manager_id` that records who the employee reports to, which gives us more interesting scenarios to query.

```sql
-- (You can skip the CREATE TABLE if you already have the employees table)
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(50),
    salary NUMERIC(10, 2),
    manager_id INT -- id of the employee's manager; NULL for the CEO
);

-- Management structure used in the examples:
-- Heidi (8) is the CEO
-- Charlie (3) and Diana (4) report to Heidi
-- Alice (1), Bob (2), Frank (6), and Grace (7) report to Charlie
-- Eve (5) reports to Diana
TRUNCATE TABLE employees; -- Start fresh for clarity
INSERT INTO employees (id, name, department, salary, manager_id) VALUES
(1, 'Alice', 'Engineering', 90000.00, 3),
(2, 'Bob', 'Engineering', 80000.00, 3),
(3, 'Charlie', 'Engineering', 105000.00, 8),
(4, 'Diana', 'HR', 75000.00, 8),
(5, 'Eve', 'HR', 60000.00, 4),
(6, 'Frank', 'Sales', 75000.00, 3),
(7, 'Grace', 'Sales', 80000.00, 3),
(8, 'Heidi', 'Management', 120000.00, NULL);

-- Explicit ids bypass the SERIAL sequence, so move it past the highest id
SELECT setval(pg_get_serial_sequence('employees', 'id'), (SELECT MAX(id) FROM employees));
```

### Example 1: Improving Readability (Replacing a Subquery)

**Goal**: Find all employees in the 'Engineering' department who earn more than the department's average salary.

**The old way (with a subquery):**

```sql
SELECT
    name,
    salary
FROM employees
WHERE
    department = 'Engineering' AND
    salary > (SELECT AVG(salary) FROM employees WHERE department = 'Engineering');
```

This works, but the average is buried inside the `WHERE` clause, so the reader has to unpick the nested query before the filter makes sense.

**The cleaner way (with a CTE):**

```sql
WITH engineering_stats AS (
    -- Step 1: a single row holding the department's average salary
    SELECT AVG(salary) AS avg_salary
    FROM employees
    WHERE department = 'Engineering'
)
-- Step 2: compare each Engineering employee against that average
SELECT
    e.name,
    e.salary
FROM employees e
CROSS JOIN engineering_stats es
WHERE
    e.department = 'Engineering' AND
    e.salary > es.avg_salary;
```

**Result:**

| name    | salary    |
|---------|-----------|
| Charlie | 105000.00 |

The CTE version separates the logic into two clear steps: first, calculate the average salary for the department; second, use that result to filter the employees.
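
The same two-step shape extends to every department at once. The following is a sketch, not part of the original example: the CTE name `department_stats` is made up, and it assumes the `employees` sample data loaded above.

```sql
WITH department_stats AS (
    -- Step 1: one row per department with its average salary
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
-- Step 2: keep employees who earn more than their own department's average
SELECT
    e.name,
    e.department,
    e.salary,
    ROUND(ds.avg_salary, 2) AS dept_avg
FROM employees e
JOIN department_stats ds ON ds.department = e.department
WHERE e.salary > ds.avg_salary
ORDER BY e.department, e.name;
```

With the sample data this returns Charlie (Engineering), Diana (HR), and Grace (Sales). Heidi is excluded because she is the only person in Management and therefore earns exactly the department average.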

### Example 2: Chaining Multiple CTEs

**Goal**: Find all 'Sales' employees and show each salary alongside the department's maximum salary and the total number of employees in that department.

```sql
WITH sales_employees AS (
    -- Step 1: Get all employees in the Sales department
    SELECT
        name,
        salary
    FROM employees
    WHERE department = 'Sales'
),
sales_stats AS (
    -- Step 2: Calculate stats using the first CTE
    SELECT
        MAX(salary) AS max_salary,
        COUNT(*) AS num_employees
    FROM sales_employees
)
-- Step 3: Attach the single stats row to every Sales employee
SELECT
    se.name,
    se.salary,
    ss.max_salary,
    ss.num_employees
FROM sales_employees se
CROSS JOIN sales_stats ss
ORDER BY se.name;
```

**Result:**

| name  | salary   | max_salary | num_employees |
|-------|----------|------------|---------------|
| Frank | 75000.00 | 80000.00   | 2             |
| Grace | 80000.00 | 80000.00   | 2             |

This demonstrates how you can build a logical pipeline where each step is a simple, easy-to-understand CTE.
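
One practical consequence of the pipeline style is that an intermediate step can be inspected by changing only the final `SELECT`. The sketch below reuses the two CTEs from this example, unchanged, and reads from `sales_stats` directly; it assumes the sample `employees` data from earlier.

```sql
WITH sales_employees AS (
    SELECT name, salary
    FROM employees
    WHERE department = 'Sales'
),
sales_stats AS (
    SELECT
        MAX(salary) AS max_salary,
        COUNT(*) AS num_employees
    FROM sales_employees
)
-- Temporarily read from the intermediate step instead of the final join
SELECT * FROM sales_stats;
```

With the sample data this returns a single row: a `max_salary` of 80000.00 and a `num_employees` of 2.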

### Example 3: The Power of Recursion

Recursion is the most powerful feature of CTEs. A recursive CTE can query hierarchical data, such as an organizational chart.
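
Before turning to the org chart, the mechanics may be easier to see on a table-free sketch. A recursive CTE combines a starting query with a query that refers back to the CTE itself, joined by `UNION ALL`. The name `counter` is arbitrary.

```sql
WITH RECURSIVE counter AS (
    -- Starting row: evaluated once
    SELECT 1 AS n

    UNION ALL

    -- Recursive part: runs against the rows produced by the previous pass
    -- and stops once it returns no rows (here, after n reaches 5)
    SELECT n + 1
    FROM counter
    WHERE n < 5
)
SELECT n
FROM counter
ORDER BY n;
```

This returns the integers 1 through 5. Without the `WHERE n < 5` condition the recursive part would never come back empty, and the query would not terminate on its own.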

**Goal**: Find all employees who report to Charlie (ID 3), directly or indirectly.

We need a `RECURSIVE` CTE. It has two parts:
1. **Base Case**: The starting point of the recursion (Charlie's direct reports).
2. **Recursive Step**: The part that joins back to the CTE to find the next level of the hierarchy (the reports of the reports).

```sql
WITH RECURSIVE subordinates AS (
    -- 1. Base Case: Find the direct reports of Charlie (manager_id = 3)
    SELECT
        id,
        name,
        manager_id,
        1 AS level -- Start at level 1
    FROM employees
    WHERE manager_id = 3

    UNION ALL

    -- 2. Recursive Step: Join employees with the CTE to find the next level
    SELECT
        e.id,
        e.name,
        e.manager_id,
        s.level + 1
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.id -- Key recursive join
)
SELECT
    name,
    level
FROM subordinates
ORDER BY level, name;
```

**Result:**

| name  | level |
|-------|-------|
| Alice | 1     |
| Bob   | 1     |
| Frank | 1     |
| Grace | 1     |

In this structure, none of Charlie's direct reports manage anyone, so the recursive step finds no further rows and the query stops after one level. If Alice (ID 1) managed someone, that person would appear at `level` 2.
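
To see more than one level, the recursion can start from the top of the hierarchy instead. The following sketch assumes the sample `employees` data above; the `org_chart` name, the `path` column, and the depth cap of 20 are illustrative choices, the last one added as a simple guard in case `manager_id` values ever formed a cycle.

```sql
WITH RECURSIVE org_chart AS (
    -- Base case: the employee with no manager (the CEO)
    SELECT
        id,
        name,
        0 AS level,
        name::text AS path -- cast so both branches produce the same type
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: attach each employee to their manager's row
    SELECT
        e.id,
        e.name,
        oc.level + 1,
        oc.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
    WHERE oc.level < 20 -- hypothetical depth cap as a cycle guard
)
SELECT
    level,
    path
FROM org_chart
ORDER BY path;
```

With the sample data this returns eight rows: Heidi at level 0, Charlie and Diana at level 1, and the remaining five employees at level 2, for example `Heidi > Diana > Eve`.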

## Conclusion

Common Table Expressions are a fundamental tool for writing modern, maintainable SQL. They turn messy, nested subqueries into clean, logical workflows.

* Use them to **simplify complex queries**.
* Use them to **avoid repeating yourself**.
* Use `WITH RECURSIVE` to **tackle hierarchical data** with ease.

Once you get comfortable with CTEs, you'll find it hard to go back to writing complex queries any other way.

For more details, see the official [PostgreSQL documentation on `WITH` Queries (CTEs)](https://www.postgresql.org/docs/current/queries-with.html).