@@ -1164,134 +1164,54 @@ gated on VDBE-vs-tree-walker parity, so it can't regress correctness):
 - **B1c — RIGHT/FULL join inner seeks.** INNER/LEFT already seek; RIGHT/FULL still
   materialize the inner table.

-**Remaining — `EXPLAIN QUERY PLAN` fidelity & single-table access paths.** The
-2026-06/07 differential sweep drove the derived/CTE/view flatten, CO-ROUTINE body,
-scalar-subquery-node, `NOT INDEXED`, `IS`/parenthesized-column, and
-`COLLATE`-mismatch work to completion (see Track A's EQP paragraph). What it left
-open, ordered roughly cosmetic-EQP → executor-touching → cost-model. Each is gated on
-plan **and** row differential parity vs the pinned `sqlite3 3.50.4`; the executor
-already returns correct rows for every one (B9i differs only in an SQL-unspecified
-tie/representative order), so they are perf/EQP-fidelity work, not correctness:
-
-- **B9a — `IN (SELECT …)` → `LIST SUBQUERY` + bloom filter.** ◐ **Partly done (EQP).**
-  A single non-correlated `[NOT] IN (SELECT …)` now renders `LIST SUBQUERY 1` (child =
-  the body's plan, then a `CREATE BLOOM FILTER` child) after the access, for the
-  *provably byte-exact* subset: graphite's access is a bare `SCAN` (so no seek to
-  diverge from SQLite's cost-model outer choice — an `… AND c=?` that graphite seeks
-  makes the line a SEARCH and declines, dodging the cases where SQLite scans-plus-bloom
-  where graphite would seek), and either the form is `NOT IN` or the IN column is not
-  seekable. This covers the common `NOT IN (SELECT …)`. *Still open (**B9a-seek**):* a
-  positive `IN` on an **indexed / rowid** column, which SQLite serves with a
-  per-candidate `SEARCH t (col=?)` — graphite folds the `IN` in the tree-walker and
-  scans, so it declines (no wrong node emitted). That needs the executor to evaluate
-  the non-correlated subquery to a value list and seek per value (the `find_in_constraint`
-  + `try_index_in` path already seeks a literal `IN` list), plus the outer-access EQP;
-  and a compound / correlated body (`CORRELATED LIST SUBQUERY`, shared-id bump) stays
-  deferred.
-- **B9b — window-function EQP.** `… OVER (…)` renders `CO-ROUTINE (subquery-N)` over
-  the windowed input; the `(subquery-N)` label is codegen-order-fragile (see the
-  `schema-sql-canonicalization` note), so this needs a deterministic-numbering model
-  before it can be byte-exact. Rows already correct.
-- **B9c — remaining derived/CTE `LIMIT`-body EQP paths. ✅ Done.** The bare-`LIMIT`
-  pure-wildcard flatten, the *non-flattenable* co-routine cases (an `OFFSET` body, an
-  outer `WHERE`, or an outer aggregate over a `LIMIT` body → `CO-ROUTINE <name>` + the
-  outer `{SCAN|SEARCH} <name>`), AND the *flatten* variants are all done now: a
-  **narrower projection** over a bare-`LIMIT` body with no outer `WHERE` substitutes the
-  outer projection into the body (`SELECT a FROM (SELECT * FROM t LIMIT 5)` → `SCAN t
-  USING COVERING INDEX`), and a **single-term outer `ORDER BY`** pushes the projection +
-  ORDER BY into the body and recurses (`(… LIMIT 5) ORDER BY b` → `SCAN t USING INDEX
-  tb` / a full temp-b-tree if unindexed). *Residual:* a **multi-term** outer `ORDER BY`
-  over a `LIMIT` body still declines — SQLite full-sorts the materialized `LIMIT` rows,
-  whereas pushing the ORDER BY down would render a partial `LAST TERM` index walk when
-  the leading prefix is indexed. Boundary rules in the `eqp-derived-coroutine` memory.
-- **B9d — `SEARCH` + `GROUP BY`/`DISTINCT` temp-b-tree node. ◐ Unambiguous subset
-  done.** The grouping b-tree now also materializes over a **rowid *range* seek**
-  (`WHERE a>? GROUP BY c` → `SEARCH t USING INTEGER PRIMARY KEY (rowid>?)#USE TEMP
-  B-TREE FOR GROUP BY`, and the `DISTINCT` analogue) — the rowid is the table's own
-  clustered key, so there is no secondary-index *choice* and thus no cost-model
-  divergence; the seek returns rows in rowid order, never the group/distinct-key
-  order, so the b-tree is always needed. Gated in `eqp_select` on a
-  `SEARCH … INTEGER PRIMARY KEY (rowid>?/<?)` access line (a rowid *equality* seek is a
-  single row → excluded), reusing `group_distinct_btree`'s existing "a secondary index
-  leads the first key column → decline" guard. *Still open (folded into B9h):* the
-  same node under a **secondary-index** seek (`WHERE b=? GROUP BY c`), where SQLite may
-  pick a *different* composite index `(b,c)` whose walk serves the grouping — a
-  cost-model index-choice decision.
-- **B9e — `col = (scalar subquery)` seek. ✅ Done (SELECT).** `WHERE b = (SELECT …)`
-  (and `>`/`<`/etc.) against a non-correlated scalar subquery now seeks — the executor
-  folds the subquery to its value before the seek (`scan_source` single-table fast path),
-  and `eqp_access` recognizes the shape *structurally* (a placeholder-literal rewrite,
-  `placeholder_fold_seek_where`) so `EXPLAIN` renders the `SEARCH` without running the
-  subquery — matching SQLite, which plans the seek without evaluating it (so even
-  `b=(SELECT 1/0)` plans a `SEARCH`; the query still errors at execution as in SQLite).
-  Secondary index + INTEGER PRIMARY KEY, equality + range. Superset-safe. *Residuals:*
-  a **bare-column** subquery (`(SELECT x FROM u)`) does not fold (dropping its affinity
-  would be unsound), so it stays a SCAN (rows correct); and a **DELETE/UPDATE** with a
-  subquery `WHERE` stays a SCAN (SQLite renders a two-pass `USING COVERING INDEX` the
-  `sel`-less `eqp_access` can't reproduce). A **correlated** body / `EXISTS` /
-  `IN (SELECT)` correctly do not seek.
-- **B9f — `GLOB 'prefix*'` prefix-range seek. ✅ Done.** A fixed-prefix `GLOB`
-  (always case-sensitive / byte-based) now seeks `col >= 'prefix' AND col < 'prefix⁺'`
-  on a BINARY index and reads `SEARCH … (b>? AND b<?)`. Implemented as a `BinaryOp::Glob`
-  arm in the shared `collect_range_constraints` (so the executor range seek and
-  `eqp_access` move in lockstep), gated on the column's collation being BINARY; the
-  `glob_prefix_range` helper extracts the literal prefix (up to the first `*`/`?`/`[`)
-  and increments the last byte `< 0xFF` (dropping trailing `0xFF`; a non-UTF-8
-  increment drops the upper bound → still a valid superset). A leading wildcard scans;
-  the full GLOB is re-applied so results are exact.
-- **B9g — eq-prefix + trailing rowid range on a secondary index. ✅ Done.**
-  `WHERE b=? AND a>?` (a the IPK) *and* the bare `rowid`/`_rowid_`/`oid` alias spelling
-  now seek and render `SEARCH … USING INDEX ib (b=? AND rowid>?)`, bounding the
-  `(b, rowid)` index range directly — the rowid is the index's implicit trailing key.
-  Extended the existing eq-prefix + next-column range seek (executor `try_index_lookup`
-  + `eqp_access`, in lockstep) with a `next_pos == idx_cols.len()` rowid-tail block,
-  and added a `rowid_alias_range` collector for the alias spelling.
-- **B9h — cost-model single-table index *choice*.** SQLite prefers, among indexes
-  that share an equality prefix, the one whose walk does the most work: a composite
-  `(b,c)` over `(b)` when a trailing range (`b=? AND c>?`) or a `GROUP BY`/`ORDER BY c`
-  can ride the same walk; a *covering* index over a narrower one; the smallest
-  covering index for `count(*)`; a covering index for an `IN` list. graphite picks by
-  longest-equality-prefix only. It also decides *whether* a no-WHERE query uses a
-  covering scan at all (SQLite: narrow index beats a wide-row table scan, plain scan
-  beats it on a 2-column table) — so the covering-scan row-order parity (formerly B9i)
-  rides here too, as does the **secondary-index** `SEARCH` + `GROUP BY`/`DISTINCT`
-  b-tree left open by B9d. This changes the chosen access path, so it risks regressing
-  the EQP corpus and must be rolled out shape-by-shape with the full differential suite
-  — the single-table analogue of B1b. **Confirmed deferred by design (2026-07-04):** the
-  pinned `sqlite3 3.50.4` oracle has no stat4, so its choices depend on row-width /
-  index-width / index-count heuristics graphite can't reproduce without diverging the
-  EQP corpus — same class as B1b/B4. Needs a stat4-enabled oracle to become tractable.
-- **B9i — covering-scan no-`ORDER BY` row order → subsumed by B9h (investigated
-  2026-07-04, nothing to fix in isolation).** The original premise was wrong: graphite's
-  covering read is *already* in index-walk order, and whenever graphite and SQLite pick
-  the **same** covering index the rows already match (verified: `SELECT DISTINCT b`,
-  `GROUP BY b`, `DISTINCT b COLLATE NOCASE`, `DISTINCT b,c`). Every remaining
-  no-`ORDER BY` divergence is a *different access-path choice* — SQLite uses a covering
-  index (and its walk order) exactly where its cost model says the narrow index beats a
-  full-row table scan (`SELECT b`/`count(*)`/`DISTINCT rowid` over a wide table → covering;
-  over a narrow 2-column table → plain `SCAN t`), and picks the smallest of several
-  covering indexes. graphite either over-selects a single covering index or stands down
-  with two, so the plan **and** the emitted order differ. Reproducing it is pure cost
-  modelling (row width, index width, number of indexes) — the same B9h/B4 problem, so
-  the row-order parity rides on B9h, not a separate execution-order change.
-- **B9j — collation-aware index *selection* for a non-default-collation index.
-  Deferred (entangled, rows already correct).** `collect_eq_constraints` /
-  `collect_range_constraints` compare an explicit `COLLATE` to the *column's*
-  collation. When an index carries a *non-default* collation (`CREATE INDEX ib ON
-  t(b COLLATE NOCASE)` on a BINARY column), graphite is wrong in **both** directions vs
-  sqlite: `b='x' COLLATE NOCASE` should seek `ib` (graphite scans), and `b='x'` (BINARY
-  comparison) should *not* seek the NOCASE `ib` (graphite over-seeks it — rows still
-  correct via the WHERE re-apply, only EQP/perf). The correct model — an index serves a
-  comparison iff its per-column collation equals the comparison's *effective* collation
-  (explicit `COLLATE`, else the column's) — must be threaded into the index *selection*
-  at every one of the ~9 `collect_eq_constraints` call sites (the seek fast paths,
-  `eqp_access`, and `seek_order_prefix`'s ORDER-BY credit) in lockstep. The current
-  column-collation gate in the collector is itself an earlier ORDER-BY-ordering
-  correctness fix, so moving the check risks the extensive collation/seek/order suite
-  for a niche pattern with rows already correct. Deferred; a careful cross-cutting
-  refactor, not a quick slice.
+**`EXPLAIN QUERY PLAN` fidelity & single-table access paths.** The 2026-06/07
+differential sweep and the **B9** cluster (2026-07) closed the byte-exact slices:
+derived/CTE/view flatten and every CO-ROUTINE-body shape, the scalar-subquery and
+`IN`-subquery plan nodes, `NOT INDEXED`, `IS`/parenthesized-column and
+`COLLATE`-mismatch seeks, `col = (scalar subquery)` seek (**B9e**), `GLOB 'prefix*'`
+range seek (**B9f**), eq-prefix + trailing-rowid range (**B9g**), the whole `LIMIT`-body
+flatten / co-routine taxonomy (**B9c**), and the rowid-range `GROUP BY`/`DISTINCT`
+temp-b-tree node (**B9d** subset). Details live in Track A's EQP paragraph and the
+`eqp-derived-coroutine` / `planner-index-seeks` memories. Still open (rows already
+correct for all of these — perf/EQP-fidelity, not correctness):
+
+- **B9a-seek — positive `IN (SELECT …)` on an indexed/rowid column.** The `LIST
+  SUBQUERY` + `CREATE BLOOM FILTER` render for a non-correlated `NOT IN` / unindexed `IN`
+  is done (**B9a**); a positive `IN` on a *seekable* column, which SQLite serves with a
+  per-candidate `SEARCH t (col=?)`, still declines (graphite folds the `IN` in the
+  tree-walker and scans — no wrong node). Needs the executor to evaluate the
+  non-correlated subquery to a value list and seek per value (`find_in_constraint` /
+  `try_index_in` already seek a literal `IN` list), plus the outer-access EQP. A
+  compound / correlated body (`CORRELATED LIST SUBQUERY`) stays deferred.
+- **B9b — window-function EQP.** `… OVER (…)` renders `CO-ROUTINE (subquery-N)` over the
+  windowed input; the `(subquery-N)` label is codegen-order-fragile, so this needs a
+  deterministic-numbering model before it can be byte-exact.

 **Blocked / deferred by design:**
+- **B9h — cost-model single-table index *choice*.** SQLite prefers, among indexes
+  sharing an equality prefix, the one whose walk does the most work (composite `(b,c)`
+  over `(b)` for a trailing range / `GROUP BY`/`ORDER BY c`; a *covering* index over a
+  narrower one; the smallest covering index for `count(*)`/`IN`), and decides *whether*
+  a no-WHERE query covers at all (narrow index vs wide-row table scan). graphite picks
+  by longest-equality-prefix only. The covering-scan no-`ORDER BY` row-order parity
+  (investigated 2026-07-04 as B9i — graphite already walks index order, so it is *not* an
+  execution-order bug) and the secondary-index `SEARCH` + `GROUP BY`/`DISTINCT` b-tree
+  (left open by B9d) both ride here. **Deferred by design:** the pinned oracle has no
+  stat4, so its choices depend on row-width / index-width / index-count heuristics
+  graphite can't reproduce without diverging the EQP corpus — same class as B1b/B4;
+  needs a stat4-enabled oracle.
+- **B9j — collation-aware index *selection* for a non-default-collation index.**
+  `collect_eq_constraints` / `collect_range_constraints` compare an explicit `COLLATE`
+  to the *column's* collation. When an index carries a *non-default* collation
+  (`CREATE INDEX ib ON t(b COLLATE NOCASE)` on a BINARY column) graphite is wrong both
+  ways vs sqlite (`b='x' COLLATE NOCASE` should seek `ib` but scans; `b='x'` should not
+  seek the NOCASE `ib` but over-seeks it — rows still correct via the WHERE re-apply,
+  EQP/perf only). The correct model — an index serves a comparison iff its per-column
+  collation equals the comparison's *effective* collation — must be threaded into index
+  *selection* at all ~9 `collect_eq_constraints` sites in lockstep; the collector's
+  current column-collation gate is itself an earlier ORDER-BY-ordering correctness fix,
+  so relocating it risks the whole collation/seek/order suite for a niche pattern.
+  Deferred; a careful cross-cutting refactor, not a quick slice.
 - **B1b — cost-based join reordering.** graphite's per-cursor seek/bloom-filter
   choices diverge from sqlite's cost-reordered plain scans *by design*; matching
   the EQP would mean abandoning often-cheaper access paths. Results already correct.
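
The B9j rule in this hunk (an index serves a comparison only when its per-column collation equals the comparison's effective collation) can be sketched in Rust as follows. This is an illustrative model under assumptions, not graphite's code: the `Collation` enum and the `effective_collation` / `index_serves` helpers are hypothetical names, and the fallback to the column's collation when no explicit `COLLATE` is present follows the wording of the removed B9j entry.

```rust
/// Hypothetical stand-in for the engine's collation type (not graphite's real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Collation {
    Binary,
    NoCase,
    RTrim,
}

/// Effective collation of a comparison: the explicit COLLATE if present,
/// otherwise the declared collation of the compared column.
pub fn effective_collation(explicit: Option<Collation>, column: Collation) -> Collation {
    explicit.unwrap_or(column)
}

/// An index column can serve the comparison only if its own collation
/// equals the comparison's effective collation.
pub fn index_serves(index_col: Collation, explicit: Option<Collation>, column: Collation) -> bool {
    index_col == effective_collation(explicit, column)
}
```

With `ib` on `t(b COLLATE NOCASE)` over a BINARY column, `index_serves(Collation::NoCase, Some(Collation::NoCase), Collation::Binary)` is true (`b='x' COLLATE NOCASE` may seek `ib`), while `index_serves(Collation::NoCase, None, Collation::Binary)` is false (plain `b='x'` must not).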
@@ -1504,12 +1424,12 @@ reasonable order:
 5. **Track A leftovers** — the `Expr::Column` enrichment (source span + schema
    field) that unblocks both **A-rn3-edge** and the 3-part-qualifier check, plus
    the statement-level prepare pass for the lazy-validation gaps.
-6. **B9a–B9j — `EXPLAIN QUERY PLAN` fidelity & single-table access paths** (Track B).
-   Independent, mostly small, differentially-gated slices; the executor already
-   returns correct rows, so they are perf/EQP-fidelity, not correctness. Do the
-   cosmetic-EQP and executor-touching ones (B9a–B9g) opportunistically; hold the
-   cost-model index-choice ones (B9d/B9h) until they can be rolled out shape-by-shape
-   without regressing the EQP corpus — same caution as B1b.
+6. **B9a-seek / B9b — the last `EXPLAIN QUERY PLAN` fidelity slices** (Track B). The
+   rest of the B9 cluster (B9c–B9g, the B9d subset, the B9a EQP nodes) shipped in
+   2026-07; what's left is the positive-`IN`-on-indexed-column executor seek
+   (**B9a-seek**) and the fragile-numbering window EQP (**B9b**). The index-choice
+   items are deferred by design: cost-model **B9h** needs a stat4-enabled oracle, and
+   collation-aware **B9j** needs a cross-cutting collation refactor (see §4).

 **Deferred / blocked** (documented in §4): **B1b** join reordering and **B4**
 `sqlite_stat4` (diverge from / unverifiable against the stat1-only oracle);
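
B9f, listed in the first hunk among the shipped slices, turns a fixed-prefix `GLOB` into a half-open byte range from the prefix up to the prefix with its last byte incremented. The removed B9f entry names a `glob_prefix_range` helper; the following is a minimal sketch of that computation under assumptions, not graphite's implementation. The signature, the byte-slice types, and the handling of a prefix made only of `0xFF` bytes are illustrative choices.

```rust
/// Sketch of a GLOB prefix-range helper (hypothetical signature).
/// Returns `None` when the pattern has no literal prefix (leading wildcard),
/// otherwise `(lower, upper)` where `upper` is exclusive and may be absent.
pub fn glob_prefix_range(pattern: &[u8]) -> Option<(Vec<u8>, Option<Vec<u8>>)> {
    // The literal prefix ends at the first GLOB metacharacter.
    let end = pattern
        .iter()
        .position(|b| matches!(*b, b'*' | b'?' | b'['))
        .unwrap_or(pattern.len());
    if end == 0 {
        return None; // leading wildcard: the caller falls back to a scan
    }
    let lower = pattern[..end].to_vec();

    // Upper bound: drop trailing 0xFF bytes, then increment the last byte.
    let mut upper = lower.clone();
    while upper.last() == Some(&0xFF) {
        upper.pop();
    }
    let upper = match upper.pop() {
        Some(last) => {
            upper.push(last + 1); // cannot overflow: `last` is below 0xFF here
            // A bound that is no longer valid UTF-8 is dropped; the range is
            // then a superset, and the full GLOB is re-applied to the rows.
            if std::str::from_utf8(&upper).is_ok() {
                Some(upper)
            } else {
                None
            }
        }
        None => None, // prefix was all 0xFF: no finite upper bound
    };
    Some((lower, upper))
}
```

For `b GLOB 'abc*'` the sketch yields the bounds `b >= 'abc' AND b < 'abd'`, the shape behind the `SEARCH … (b>? AND b<?)` line quoted in the B9f entry. Because the range is only a superset filter, the caller still has to re-apply the full `GLOB` to each row.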