Commit 0744c0a

MagicalTux and claude committed
docs: trim completed B9 items from the roadmap
The B9 cluster's byte-exact slices (B9c–B9g, the B9d subset, the B9a EQP nodes) shipped this session; condense them into a one-paragraph "done" summary and keep only the genuinely-open items in "Remaining": B9a-seek (positive IN-on-indexed executor seek) and B9b (window-function EQP). Move the confirmed cost-model deferrals (B9h, B9j) into "Blocked / deferred by design" alongside B1b/B4, and update the §7 suggested order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent bc4a281 commit 0744c0a

1 file changed

Lines changed: 52 additions & 132 deletions

ROADMAP.md

@@ -1164,134 +1164,54 @@ gated on VDBE-vs-tree-walker parity, so it can't regress correctness):
11641164
- **B1c — RIGHT/FULL join inner seeks.** INNER/LEFT already seek; RIGHT/FULL still
11651165
materialize the inner table.
11661166

1167-
**Remaining — `EXPLAIN QUERY PLAN` fidelity & single-table access paths.** The
1168-
2026-06/07 differential sweep drove the derived/CTE/view flatten, CO-ROUTINE body,
1169-
scalar-subquery-node, `NOT INDEXED`, `IS`/parenthesized-column, and
1170-
`COLLATE`-mismatch work to completion (see Track A's EQP paragraph). What it left
1171-
open, ordered roughly cosmetic-EQP → executor-touching → cost-model. Each is gated on
1172-
plan **and** row differential parity vs the pinned `sqlite3 3.50.4`; the executor
1173-
already returns correct rows for every one (B9i differs only in an SQL-unspecified
1174-
tie/representative order), so they are perf/EQP-fidelity work, not correctness:
1175-
1176-
- **B9a — `IN (SELECT …)` `LIST SUBQUERY` + bloom filter.** **Partly done (EQP).**
1177-
A single non-correlated `[NOT] IN (SELECT …)` now renders `LIST SUBQUERY 1` (child =
1178-
the body's plan, then a `CREATE BLOOM FILTER` child) after the access, for the
1179-
*provably byte-exact* subset: graphite's access is a bare `SCAN` (so no seek to
1180-
diverge from SQLite's cost-model outer choice — an `… AND c=?` that graphite seeks
1181-
makes the line a SEARCH and declines, dodging the cases where SQLite scans-plus-bloom
1182-
where graphite would seek), and either the form is `NOT IN` or the IN column is not
1183-
seekable. This covers the common `NOT IN (SELECT …)`. *Still open (**B9a-seek**):* a
1184-
positive `IN` on an **indexed / rowid** column, which SQLite serves with a
1185-
per-candidate `SEARCH t (col=?)` — graphite folds the `IN` in the tree-walker and
1186-
scans, so it declines (no wrong node emitted). That needs the executor to evaluate
1187-
the non-correlated subquery to a value list and seek per value (the `find_in_constraint`
1188-
+ `try_index_in` path already seeks a literal `IN` list), plus the outer-access EQP;
1189-
and a compound / correlated body (`CORRELATED LIST SUBQUERY`, shared-id bump) stays
1190-
deferred.
1191-
- **B9b — window-function EQP.** `… OVER (…)` renders `CO-ROUTINE (subquery-N)` over
1192-
the windowed input; the `(subquery-N)` label is codegen-order-fragile (see the
1193-
`schema-sql-canonicalization` note), so this needs a deterministic-numbering model
1194-
before it can be byte-exact. Rows already correct.
1195-
- **B9c — remaining derived/CTE `LIMIT`-body EQP paths. ✅ Done.** The bare-`LIMIT`
1196-
pure-wildcard flatten, the *non-flattenable* co-routine cases (an `OFFSET` body, an
1197-
outer `WHERE`, or an outer aggregate over a `LIMIT` body → `CO-ROUTINE <name>` + the
1198-
outer `{SCAN|SEARCH} <name>`), AND the *flatten* variants are all done now: a
1199-
**narrower projection** over a bare-`LIMIT` body with no outer `WHERE` substitutes the
1200-
outer projection into the body (`SELECT a FROM (SELECT * FROM t LIMIT 5)` → `SCAN t
1201-
USING COVERING INDEX`), and a **single-term outer `ORDER BY`** pushes the projection +
1202-
ORDER BY into the body and recurses (`(… LIMIT 5) ORDER BY b` → `SCAN t USING INDEX
1203-
tb` / a full temp-b-tree if unindexed). *Residual:* a **multi-term** outer `ORDER BY`
1204-
over a `LIMIT` body still declines — SQLite full-sorts the materialized `LIMIT` rows,
1205-
whereas pushing the ORDER BY down would render a partial `LAST TERM` index walk when
1206-
the leading prefix is indexed. Boundary rules in the `eqp-derived-coroutine` memory.
1207-
- **B9d — `SEARCH` + `GROUP BY`/`DISTINCT` temp-b-tree node. ◐ Unambiguous subset
1208-
done.** The grouping b-tree now also materializes over a **rowid *range* seek**
1209-
(`WHERE a>? GROUP BY c` → `SEARCH t USING INTEGER PRIMARY KEY (rowid>?)#USE TEMP
1210-
B-TREE FOR GROUP BY`, and the `DISTINCT` analogue) — the rowid is the table's own
1211-
clustered key, so there is no secondary-index *choice* and thus no cost-model
1212-
divergence; the seek returns rows in rowid order, never the group/distinct-key
1213-
order, so the b-tree is always needed. Gated in `eqp_select` on a
1214-
`SEARCH … INTEGER PRIMARY KEY (rowid>?/<?)` access line (a rowid *equality* seek is a
1215-
single row → excluded), reusing `group_distinct_btree`'s existing "a secondary index
1216-
leads the first key column → decline" guard. *Still open (folded into B9h):* the
1217-
same node under a **secondary-index** seek (`WHERE b=? GROUP BY c`), where SQLite may
1218-
pick a *different* composite index `(b,c)` whose walk serves the grouping — a
1219-
cost-model index-choice decision.
1220-
- **B9e — `col = (scalar subquery)` seek. ✅ Done (SELECT).** `WHERE b = (SELECT …)`
1221-
(and `>`/`<`/etc.) against a non-correlated scalar subquery now seeks — the executor
1222-
folds the subquery to its value before the seek (`scan_source` single-table fast path),
1223-
and `eqp_access` recognizes the shape *structurally* (a placeholder-literal rewrite,
1224-
`placeholder_fold_seek_where`) so `EXPLAIN` renders the `SEARCH` without running the
1225-
subquery — matching SQLite, which plans the seek without evaluating it (so even
1226-
`b=(SELECT 1/0)` plans a `SEARCH`; the query still errors at execution as in SQLite).
1227-
Secondary index + INTEGER PRIMARY KEY, equality + range. Superset-safe. *Residuals:*
1228-
a **bare-column** subquery (`(SELECT x FROM u)`) does not fold (dropping its affinity
1229-
would be unsound), so it stays a SCAN (rows correct); and a **DELETE/UPDATE** with a
1230-
subquery `WHERE` stays a SCAN (SQLite renders a two-pass `USING COVERING INDEX` the
1231-
`sel`-less `eqp_access` can't reproduce). A **correlated** body / `EXISTS` /
1232-
`IN (SELECT)` correctly do not seek.
1233-
- **B9f — `GLOB 'prefix*'` prefix-range seek. ✅ Done.** A fixed-prefix `GLOB`
1234-
(always case-sensitive / byte-based) now seeks `col >= 'prefix' AND col < 'prefix⁺'`
1235-
on a BINARY index and reads `SEARCH … (b>? AND b<?)`. Implemented as a `BinaryOp::Glob`
1236-
arm in the shared `collect_range_constraints` (so the executor range seek and
1237-
`eqp_access` move in lockstep), gated on the column's collation being BINARY; the
1238-
`glob_prefix_range` helper extracts the literal prefix (up to the first `*`/`?`/`[`)
1239-
and increments the last byte `< 0xFF` (dropping trailing `0xFF`; a non-UTF-8
1240-
increment drops the upper bound → still a valid superset). A leading wildcard scans;
1241-
the full GLOB is re-applied so results are exact.
1242-
- **B9g — eq-prefix + trailing rowid range on a secondary index. ✅ Done.**
1243-
`WHERE b=? AND a>?` (a the IPK) *and* the bare `rowid`/`_rowid_`/`oid` alias spelling
1244-
now seek and render `SEARCH … USING INDEX ib (b=? AND rowid>?)`, bounding the
1245-
`(b, rowid)` index range directly — the rowid is the index's implicit trailing key.
1246-
Extended the existing eq-prefix + next-column range seek (executor `try_index_lookup`
1247-
+ `eqp_access`, in lockstep) with a `next_pos == idx_cols.len()` rowid-tail block,
1248-
and added a `rowid_alias_range` collector for the alias spelling.
1249-
- **B9h — cost-model single-table index *choice*.** SQLite prefers, among indexes
1250-
that share an equality prefix, the one whose walk does the most work: a composite
1251-
`(b,c)` over `(b)` when a trailing range (`b=? AND c>?`) or a `GROUP BY`/`ORDER BY c`
1252-
can ride the same walk; a *covering* index over a narrower one; the smallest
1253-
covering index for `count(*)`; a covering index for an `IN` list. graphite picks by
1254-
longest-equality-prefix only. It also decides *whether* a no-WHERE query uses a
1255-
covering scan at all (SQLite: narrow index beats a wide-row table scan, plain scan
1256-
beats it on a 2-column table) — so the covering-scan row-order parity (formerly B9i)
1257-
rides here too, as does the **secondary-index** `SEARCH` + `GROUP BY`/`DISTINCT`
1258-
b-tree left open by B9d. This changes the chosen access path, so it risks regressing
1259-
the EQP corpus and must be rolled out shape-by-shape with the full differential suite
1260-
— the single-table analogue of B1b. **Confirmed deferred by design (2026-07-04):** the
1261-
pinned `sqlite3 3.50.4` oracle has no stat4, so its choices depend on row-width /
1262-
index-width / index-count heuristics graphite can't reproduce without diverging the
1263-
EQP corpus — same class as B1b/B4. Needs a stat4-enabled oracle to become tractable.
1264-
- **B9i — covering-scan no-`ORDER BY` row order → subsumed by B9h (investigated
1265-
2026-07-04, nothing to fix in isolation).** The original premise was wrong: graphite's
1266-
covering read is *already* in index-walk order, and whenever graphite and SQLite pick
1267-
the **same** covering index the rows already match (verified: `SELECT DISTINCT b`,
1268-
`GROUP BY b`, `DISTINCT b COLLATE NOCASE`, `DISTINCT b,c`). Every remaining
1269-
no-`ORDER BY` divergence is a *different access-path choice* — SQLite uses a covering
1270-
index (and its walk order) exactly where its cost model says the narrow index beats a
1271-
full-row table scan (`SELECT b`/`count(*)`/`DISTINCT rowid` over a wide table → covering;
1272-
over a narrow 2-column table → plain `SCAN t`), and picks the smallest of several
1273-
covering indexes. graphite either over-selects a single covering index or stands down
1274-
with two, so the plan **and** the emitted order differ. Reproducing it is pure cost
1275-
modelling (row width, index width, number of indexes) — the same B9h/B4 problem, so
1276-
the row-order parity rides on B9h, not a separate execution-order change.
1277-
- **B9j — collation-aware index *selection* for a non-default-collation index.
1278-
Deferred (entangled, rows already correct).** `collect_eq_constraints` /
1279-
`collect_range_constraints` compare an explicit `COLLATE` to the *column's*
1280-
collation. When an index carries a *non-default* collation (`CREATE INDEX ib ON
1281-
t(b COLLATE NOCASE)` on a BINARY column), graphite is wrong in **both** directions vs
1282-
sqlite: `b='x' COLLATE NOCASE` should seek `ib` (graphite scans), and `b='x'` (BINARY
1283-
comparison) should *not* seek the NOCASE `ib` (graphite over-seeks it — rows still
1284-
correct via the WHERE re-apply, only EQP/perf). The correct model — an index serves a
1285-
comparison iff its per-column collation equals the comparison's *effective* collation
1286-
(explicit `COLLATE`, else the column's) — must be threaded into the index *selection*
1287-
at every one of the ~9 `collect_eq_constraints` call sites (the seek fast paths,
1288-
`eqp_access`, and `seek_order_prefix`'s ORDER-BY credit) in lockstep. The current
1289-
column-collation gate in the collector is itself an earlier ORDER-BY-ordering
1290-
correctness fix, so moving the check risks the extensive collation/seek/order suite
1291-
for a niche pattern with rows already correct. Deferred; a careful cross-cutting
1292-
refactor, not a quick slice.
1167+
**`EXPLAIN QUERY PLAN` fidelity & single-table access paths.** The 2026-06/07
1168+
differential sweep and the **B9** cluster (2026-07) closed the byte-exact slices:
1169+
derived/CTE/view flatten and every CO-ROUTINE-body shape, the scalar-subquery and
1170+
`IN`-subquery plan nodes, `NOT INDEXED`, `IS`/parenthesized-column and
1171+
`COLLATE`-mismatch seeks, `col = (scalar subquery)` seek (**B9e**), `GLOB 'prefix*'`
1172+
range seek (**B9f**), eq-prefix + trailing-rowid range (**B9g**), the whole `LIMIT`-body
1173+
flatten / co-routine taxonomy (**B9c**), and the rowid-range `GROUP BY`/`DISTINCT`
1174+
temp-b-tree node (**B9d** subset). Details live in Track A's EQP paragraph and the
1175+
`eqp-derived-coroutine` / `planner-index-seeks` memories. Still open (rows already
1176+
correct for all of these — perf/EQP-fidelity, not correctness):
1177+
1178+
- **B9a-seek — positive `IN (SELECT …)` on an indexed/rowid column.** The `LIST
1179+
SUBQUERY` + `CREATE BLOOM FILTER` render for a non-correlated `NOT IN` / unindexed `IN`
1180+
is done (**B9a**); a positive `IN` on a *seekable* column, which SQLite serves with a
1181+
per-candidate `SEARCH t (col=?)`, still declines (graphite folds the `IN` in the
1182+
tree-walker and scans — no wrong node). Needs the executor to evaluate the
1183+
non-correlated subquery to a value list and seek per value (`find_in_constraint` /
1184+
`try_index_in` already seek a literal `IN` list), plus the outer-access EQP. A
1185+
compound / correlated body (`CORRELATED LIST SUBQUERY`) stays deferred.
1186+
- **B9b — window-function EQP.** `… OVER (…)` renders `CO-ROUTINE (subquery-N)` over the
1187+
windowed input; the `(subquery-N)` label is codegen-order-fragile, so this needs a
1188+
deterministic-numbering model before it can be byte-exact.
12931189

12941190
**Blocked / deferred by design:**
1191+
- **B9h — cost-model single-table index *choice*.** SQLite prefers, among indexes
1192+
sharing an equality prefix, the one whose walk does the most work (composite `(b,c)`
1193+
over `(b)` for a trailing range / `GROUP BY`/`ORDER BY c`; a *covering* index over a
1194+
narrower one; the smallest covering index for `count(*)`/`IN`), and decides *whether*
1195+
a no-WHERE query covers at all (narrow index vs wide-row table scan). graphite picks
1196+
by longest-equality-prefix only. The covering-scan no-`ORDER BY` row-order parity
1197+
(investigated 2026-07-04 as B9i — graphite already walks index order, so it is *not* an
1198+
execution-order bug) and the secondary-index `SEARCH` + `GROUP BY`/`DISTINCT` b-tree
1199+
(left open by B9d) both ride here. **Deferred by design:** the pinned oracle has no
1200+
stat4, so its choices depend on row-width / index-width / index-count heuristics
1201+
graphite can't reproduce without diverging the EQP corpus — same class as B1b/B4;
1202+
needs a stat4-enabled oracle.
1203+
- **B9j — collation-aware index *selection* for a non-default-collation index.**
1204+
`collect_eq_constraints` / `collect_range_constraints` compare an explicit `COLLATE`
1205+
to the *column's* collation. When an index carries a *non-default* collation
1206+
(`CREATE INDEX ib ON t(b COLLATE NOCASE)` on a BINARY column) graphite is wrong both
1207+
ways vs sqlite (`b='x' COLLATE NOCASE` should seek `ib` but scans; `b='x'` should not
1208+
seek the NOCASE `ib` but over-seeks it — rows still correct via the WHERE re-apply,
1209+
EQP/perf only). The correct model — an index serves a comparison iff its per-column
1210+
collation equals the comparison's *effective* collation — must be threaded into index
1211+
*selection* at all ~9 `collect_eq_constraints` sites in lockstep; the collector's
1212+
current column-collation gate is itself an earlier ORDER-BY-ordering correctness fix,
1213+
so relocating it risks the whole collation/seek/order suite for a niche pattern.
1214+
Deferred; a careful cross-cutting refactor, not a quick slice.
12951215
- **B1b — cost-based join reordering.** graphite's per-cursor seek/bloom-filter
12961216
choices diverge from sqlite's cost-reordered plain scans *by design*; matching
12971217
the EQP would mean abandoning often-cheaper access paths. Results already correct.
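
The B9j rule (an index serves a comparison iff its per-column collation equals the comparison's *effective* collation, where an explicit `COLLATE` wins over the column's declared collation) can be sketched as below. This is a minimal illustration, not graphite's code: the `Collation` enum and the `effective_collation` / `index_serves` names are hypothetical, and the sketch ignores the other inputs the real collectors would consider.

```rust
// Hypothetical types and names, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Collation {
    Binary,
    NoCase,
    RTrim,
}

/// Effective collation of a comparison: an explicit COLLATE wins,
/// otherwise the column's declared collation applies.
fn effective_collation(explicit: Option<Collation>, column: Collation) -> Collation {
    explicit.unwrap_or(column)
}

/// An index column can serve the comparison only when its own collation
/// equals the comparison's effective collation.
fn index_serves(index_coll: Collation, explicit: Option<Collation>, column: Collation) -> bool {
    index_coll == effective_collation(explicit, column)
}
```

Under this model, with `CREATE INDEX ib ON t(b COLLATE NOCASE)` on a BINARY column, `b='x' COLLATE NOCASE` qualifies for `ib` while a plain `b='x'` does not, which matches the two directions the B9j entry describes.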
@@ -1504,12 +1424,12 @@ reasonable order:
15041424
5. **Track A leftovers** — the `Expr::Column` enrichment (source span + schema
15051425
field) that unblocks both **A-rn3-edge** and the 3-part-qualifier check, plus
15061426
the statement-level prepare pass for the lazy-validation gaps.
1507-
6. **B9a–B9j — `EXPLAIN QUERY PLAN` fidelity & single-table access paths** (Track B).
1508-
Independent, mostly small, differentially-gated slices; the executor already
1509-
returns correct rows, so they are perf/EQP-fidelity, not correctness. Do the
1510-
cosmetic-EQP and executor-touching ones (B9a–B9g) opportunistically; hold the
1511-
cost-model index-choice ones (B9d/B9h) until they can be rolled out shape-by-shape
1512-
without regressing the EQP corpus — same caution as B1b.
1427+
6. **B9a-seek / B9b — the last `EXPLAIN QUERY PLAN` fidelity slices** (Track B). The
1428+
rest of the B9 cluster (B9c–B9g, the B9d subset, the B9a EQP nodes) shipped in
1429+
2026-07; what's left is the positive-`IN`-on-indexed-column executor seek
1430+
(**B9a-seek**) and the fragile-numbering window EQP (**B9b**). The cost-model
1431+
index-choice items (**B9h**, **B9j**) are deferred by design — they need a
1432+
stat4-enabled oracle / a cross-cutting collation refactor (see §4).
15131433

15141434
**Deferred / blocked** (documented in §4): **B1b** join reordering and **B4**
15151435
`sqlite_stat4` (diverge from / unverifiable against the stat1-only oracle);
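
For readers following the B9f entry trimmed by this commit, the prefix-range derivation it describes (take the literal prefix up to the first `*`/`?`/`[`, drop trailing `0xFF` bytes, increment the last byte, and drop the upper bound if the result is not valid UTF-8) might look roughly like this. It is a sketch under assumptions: the signature and return shape are invented here and are not taken from graphite's actual `glob_prefix_range` helper.

```rust
/// Hypothetical sketch: derive `col >= lower AND col < upper` from a GLOB pattern.
/// Returns None when there is no literal prefix (a leading wildcard means a scan).
/// An upper bound of None means "lower bound only", which is still a valid superset.
fn glob_prefix_range(pattern: &[u8]) -> Option<(Vec<u8>, Option<Vec<u8>>)> {
    // The literal prefix ends at the first GLOB metacharacter.
    let end = pattern
        .iter()
        .position(|&b| matches!(b, b'*' | b'?' | b'['))
        .unwrap_or(pattern.len());
    if end == 0 {
        return None;
    }
    let lower = pattern[..end].to_vec();

    // Upper bound: drop trailing 0xFF bytes, then increment the last byte.
    let mut upper = lower.clone();
    while upper.last() == Some(&0xFF) {
        upper.pop();
    }
    let upper = match upper.len() {
        0 => None,
        n => {
            upper[n - 1] += 1; // cannot overflow: the last byte is below 0xFF
            // Drop the bound if the increment produced invalid UTF-8.
            std::str::from_utf8(&upper).is_ok().then_some(upper)
        }
    };
    Some((lower, upper))
}
```

Because the range is only a superset of the matching rows, the full `GLOB` predicate still has to be re-applied to each row the seek returns, as the roadmap entry notes.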
