feat(sql): HAVING — post-aggregation filter (SQLR-52) (#161)
WHERE filters rows before grouping; HAVING filters groups after
aggregation. Closes the Phase 9e aggregates story.
Parser (src/sql/parser/select.rs):
- SelectQuery grows `having: Option<Expr>`, passed through raw from
sqlparser like WHERE. parse_aggregate_call / AggregateFn::from_name
exposed pub(crate) for the executor's HAVING lowering.
- HAVING without GROUP BY rejected with a typed NotImplemented (the
degenerate single-group form SQLite allows isn't worth the executor
branch in v0). HAVING + JOIN stays covered by the existing
GROUP-BY-over-JOIN rejection (SQLR-6 is the follow-up).
Executor (src/sql/executor.rs):
- lower_having_expr rewrites aggregate calls in the HAVING tree to
identifiers naming their output slot (SUM(salary) → "SUM(salary)"),
registering hidden trailing projection slots for aggregates and
GROUP BY keys referenced only in HAVING so aggregate_rows computes
them alongside the visible ones.
- New GroupRowScope resolves those identifiers against the group's
output row through the shared expression evaluator — comparisons,
AND/OR/NOT, arithmetic, IS NULL, LIKE, IN all work, with the same
NULL-as-false collapse WHERE applies (design-decisions §13).
- filter_groups_by_having runs after aggregation, before DISTINCT /
ORDER BY / LIMIT; hidden slots are stripped after filtering.
Tests: +13 executor tests (612 → 625 in the engine suite): COUNT/SUM
thresholds, aggregate alias, aggregate-only-in-HAVING, group-key-only-
in-HAVING, compound AND, ORDER BY/LIMIT composition, all-groups-
excluded, NULL-aggregate collapse, lowercase call form, no-GROUP-BY
rejection, out-of-scope column rejection, all four JOIN flavors
rejected cleanly.
Docs: supported-sql.md (HAVING semantics section + syntax block),
sql-engine.md aggregation pipeline, roadmap.md shipped entry, README,
design-decisions §13 wording, web docs page / sql-ref / roadmap.
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
docs/design-decisions.md (1 addition, 1 deletion)
@@ -398,7 +398,7 @@ A hidden primary that the registry itself owns sidesteps both problems: every `*
**Decision.** In [`eval_predicate`](../src/sql/executor.rs), a `WHERE` expression evaluating to `NULL` is treated as `false` — the row does *not* match.
-
**Why.** Matches SQL's three-valued logic in spirit: `NULL` propagates through comparisons, and a `WHERE` requires a definitely-true predicate. Doing strict 3VL would mean threading an explicit `Option<bool>` / "unknown" state through the evaluator. For a query surface that doesn't have `HAVING` or aggregate post-filters, implicit coercion to `false` at the `WHERE` boundary is equivalent for every statement we execute.
+
**Why.** Matches SQL's three-valued logic in spirit: `NULL` propagates through comparisons, and a `WHERE` requires a definitely-true predicate. Doing strict 3VL would mean threading an explicit `Option<bool>` / "unknown" state through the evaluator. Implicit coercion to `false` at the filter boundary is equivalent for every statement we execute; `HAVING` (SQLR-52) reuses the same collapse — a group whose predicate evaluates to `NULL` is dropped.
**Cost.** Diverges subtly from strict SQL on edge cases involving `NULL` through `NOT` / `AND` / `OR`. If this matters later, the evaluator can be upgraded to 3VL without touching callers.
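The collapse described in this decision can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's code: `Value`, `greater_than`, and `passes_filter` are hypothetical names chosen for the example, and the real `eval_predicate` may be shaped differently.

```rust
// Illustrative sketch only: `Value`, `greater_than`, and `passes_filter` are
// hypothetical names, not the engine's actual types or functions.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Bool(bool),
}

/// Comparison that propagates NULL: any non-integer operand (including NULL)
/// yields NULL in this sketch.
fn greater_than(lhs: &Value, rhs: &Value) -> Value {
    match (lhs, rhs) {
        (Value::Integer(a), Value::Integer(b)) => Value::Bool(a > b),
        _ => Value::Null,
    }
}

/// Filter boundary shared by WHERE and HAVING: only a definite `true` keeps
/// the row or group; `false` and NULL both drop it.
fn passes_filter(predicate: &Value) -> bool {
    matches!(predicate, Value::Bool(true))
}

fn main() {
    let avg = Value::Null; // e.g. AVG over an all-NULL group
    let predicate = greater_than(&avg, &Value::Integer(100));
    println!("predicate = {:?}, kept = {}", predicate, passes_filter(&predicate));
}
```

Because the coercion happens only at the filter boundary, the evaluator itself never needs an explicit "unknown" state, which is the trade-off the decision describes.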
- ~~`HAVING` (post-aggregation filter)~~ ✅ Shipped (SQLR-52) — group-row filter after aggregation; references GROUP BY keys, aggregate aliases, and direct aggregate calls (hidden-slot computation for HAVING-only aggregates). `HAVING` without `GROUP BY` stays rejected in v0.
- `CASE WHEN … THEN … END`, `BETWEEN`, `GLOB`, `REGEXP`, `LIKE … ESCAPE '<char>'`
- Aggregates / `GROUP BY` / `DISTINCT` *over* joins (needs a single executor pass that knows about multiple input streams)
- **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists.
- **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic.
- **`GROUP BY`**: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY <col>` without any aggregate behaves like an implicit `DISTINCT <col>`.
+
- **`HAVING`** (SQLR-52): post-aggregation filter over the grouped output. `WHERE` filters rows before grouping; `HAVING` filters groups after aggregation. Requires `GROUP BY` (see [HAVING semantics](#having-semantics-sqlr-52)).
- **Aggregates** (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN` / `MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)` / `COUNT(col)` and `NULL` for the rest.
- **`ORDER BY`**: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end *(Phase 7b)*. For aggregating queries the key resolves against the *output* row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows.
- **`LIMIT`**: non-negative integer literal. `LIMIT 0` is valid (returns zero rows). When `DISTINCT` is in play, `LIMIT` is applied after deduplication so it counts unique rows.
@@ -260,12 +262,28 @@ The executor includes a tiny optimizer: if the `WHERE` is exactly `<indexed_col>
- Three-valued logic: if the LHS is `NULL`, the result is `NULL`; if the RHS list contains a `NULL` and no other entry matches, the result is `NULL`. In a `WHERE` both cases collapse to "row excluded", matching SQLite.
- `IN (subquery)`, `IN UNNEST(...)`, and `BETWEEN` are not supported yet.
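The three-valued `IN` rule in the list above can be illustrated with a short Rust sketch. It models SQL `NULL` as `None` purely for illustration; `sql_in` is a hypothetical helper, and the engine's own value representation is not shown here and may differ.

```rust
/// Hypothetical helper: evaluates `lhs IN (list)` with SQL three-valued logic.
/// `None` stands in for NULL; `Some(true)` / `Some(false)` are definite results.
fn sql_in(lhs: Option<i64>, list: &[Option<i64>]) -> Option<bool> {
    // A NULL left-hand side makes the whole expression NULL.
    let lhs = lhs?;
    let mut saw_null = false;
    for item in list {
        match item {
            Some(v) if *v == lhs => return Some(true), // a definite match wins
            None => saw_null = true,                    // remember the NULL entry
            _ => {}
        }
    }
    // No match: NULL if the list held a NULL, otherwise a definite false.
    if saw_null { None } else { Some(false) }
}

fn main() {
    println!("{:?}", sql_in(Some(1), &[Some(1), None]));
    println!("{:?}", sql_in(Some(2), &[Some(1), None]));
    println!("{:?}", sql_in(None, &[Some(1)]));
}
```

At a `WHERE` or `HAVING` boundary, both `None` and `Some(false)` lead to the same outcome: the row or group is excluded.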
+
### `HAVING` semantics (SQLR-52)
+
+
- Post-aggregation filter: groups whose `HAVING` expression evaluates to false or `NULL` are dropped (NULL-as-false, the same three-valued-logic collapse `WHERE` applies).
+
- **Requires `GROUP BY`.** The degenerate no-`GROUP-BY` single-group form SQLite allows is rejected with a clear `NotImplemented` — use `WHERE` for row-level filters.
+
- **What's in scope:** the `GROUP BY` key columns (their per-group values), aggregate output columns by alias (`SUM(salary) AS total … HAVING total > 100`), and aggregate calls written out directly (`HAVING COUNT(*) > 1`, matched case-insensitively by canonical display form).
+
- Aggregates and `GROUP BY` keys referenced **only** in `HAVING` work too — `SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1` computes the count without projecting it.
+
- Any other column reference is an error (matches SQLite: `HAVING` sees the grouped output, not the raw rows).
+
- The expression surface is the same as `WHERE`: comparisons, `AND` / `OR` / `NOT`, arithmetic, `IS [NOT] NULL`, `LIKE`, `IN (list)`.
+
- Runs after `WHERE` + aggregation, before `DISTINCT`, `ORDER BY`, and `LIMIT`.
+
+
```sql
SELECT dept, COUNT(*) FROM emp GROUP BY dept HAVING COUNT(*) > 1;
SELECT dept, SUM(salary) AS total FROM emp GROUP BY dept HAVING total > 100000;
SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1 AND SUM(salary) > 100;
```
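To make the hidden-slot behavior concrete, here is a rough Rust sketch of the pipeline for `SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1`: aggregate with a hidden trailing `COUNT(*)` slot, filter the groups, then strip the hidden slot. All names (`Value`, `Grouped`, `aggregate`, `filter_and_strip`) are assumptions made for this example; the real `aggregate_rows` and `filter_groups_by_having` in the executor are not reproduced here.

```rust
use std::collections::BTreeMap;

// Hypothetical types for illustration; not the engine's definitions.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
}

/// Aggregation output: named slots, the last `hidden` of which exist only
/// so that HAVING can read them.
struct Grouped {
    names: Vec<String>,
    rows: Vec<Vec<Value>>,
    hidden: usize,
}

/// GROUP BY dept, computing COUNT(*) as a hidden trailing slot.
fn aggregate(emp: &[(&str, i64)]) -> Grouped {
    let mut counts: BTreeMap<&str, i64> = BTreeMap::new();
    for (dept, _salary) in emp {
        *counts.entry(*dept).or_insert(0) += 1;
    }
    let rows = counts
        .into_iter()
        .map(|(dept, n)| vec![Value::Text(dept.to_string()), Value::Integer(n)])
        .collect();
    Grouped { names: vec!["dept".to_string(), "COUNT(*)".to_string()], rows, hidden: 1 }
}

/// Keep groups whose named slot is an integer greater than `threshold`
/// (NULL or any other value drops the group), then strip the hidden slots.
fn filter_and_strip(mut g: Grouped, slot: &str, threshold: i64) -> Vec<Vec<Value>> {
    let idx = g.names.iter().position(|n| n == slot).expect("slot must be registered");
    g.rows.retain(|row| matches!(&row[idx], Value::Integer(n) if *n > threshold));
    let visible = g.names.len() - g.hidden;
    for row in &mut g.rows {
        row.truncate(visible);
    }
    g.rows
}

fn main() {
    let emp = [("eng", 100), ("eng", 120), ("ops", 90)];
    println!("{:?}", filter_and_strip(aggregate(&emp), "COUNT(*)", 1));

    // A NULL aggregate value collapses to "group dropped".
    let null_group = Grouped {
        names: vec!["dept".to_string(), "SUM(salary)".to_string()],
        rows: vec![vec![Value::Text("x".to_string()), Value::Null]],
        hidden: 1,
    };
    println!("{:?}", filter_and_strip(null_group, "SUM(salary)", 0));
}
```

Filtering before stripping is what lets the predicate see `COUNT(*)` even though the query never projects it.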
+
### What doesn't work
- **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5))
- **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land
- **Subqueries**, CTEs (`WITH`), views
-
- **`HAVING`** — pre-aggregation `WHERE` works; post-aggregation filtering does not yet
+
- **`HAVING` without `GROUP BY`** — the degenerate single-group form is rejected; `HAVING` with `GROUP BY` works (see [HAVING semantics](#having-semantics-sqlr-52))
- **`DISTINCT`** on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported)
- **`GROUP BY` on expressions** — bare column names only in v1