Commit 419767e

MagicalTux and claude committed
feat(eqp): render the FOR IN-OPERATOR node for an indexed IN-subquery
When a scanned outer's WHERE has a single `col [NOT] IN (SELECT <c> FROM <table> [ORDER BY …])` whose subquery projects one plain *indexed* column, SQLite evaluates the IN by iterating that column's index rather than materializing the result, and renders a single `… FOR IN-OPERATOR` plan node in place of the `LIST SUBQUERY` / `CREATE BLOOM FILTER` subtree.

graphite now matches: a secondary index leading with the column renders `USING INDEX <name> FOR IN-OPERATOR`; the rowid / INTEGER PRIMARY KEY renders `USING ROWID SEARCH ON TABLE <table> FOR IN-OPERATOR`. The new `in_operator_index_node` gates on a bare-`SCAN` outer plus a simple body (no WHERE/GROUP/HAVING/DISTINCT/LIMIT/OFFSET/join/compound/CTE/window and a single plain-column projection).

When two or more plain indexes lead with the column, which one SQLite iterates is a cost-model tiebreak unverifiable against the stat1-only oracle, so those keep the `LIST SUBQUERY` form.

This is an EXPLAIN-QUERY-PLAN-only change; executed results are unaffected. Verified byte-for-byte vs sqlite3 3.50.4 (`tests/eqp_in_operator.rs`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent f9dfc3c commit 419767e

3 files changed

Lines changed: 242 additions & 17 deletions

ROADMAP.md

Lines changed: 25 additions & 16 deletions
@@ -1244,14 +1244,18 @@ correct for all of these — perf/EQP-fidelity, not correctness):
 when the rendered access line *is* the IN column's seek (`(in_col=?)`), so a competing
 equality/range seek on a different column declines rather than mis-render. Verified vs
 sqlite3 3.50.4 (`eqp_in_subquery.rs`: EQP + row parity, rowid + secondary + empty list).
-*Residual (separate, not B9a-seek):* a non-seekable / `NOT IN` whose subquery projects
-an *indexed* column — SQLite renders `USING INDEX <ix> FOR IN-OPERATOR` rather than
-`LIST SUBQUERY`+bloom; graphite still emits the bloom node there. Correlated / compound
-bodies stay deferred (`CORRELATED LIST SUBQUERY`).
-
-With B9a-seek shipped, the only open EQP-fidelity threads are the `FOR IN-OPERATOR`
-render residual above and **B9b** (window EQP), now confirmed deferred by design —
-see below.
+*`FOR IN-OPERATOR` (DONE 2026-07-05):* a scanned outer whose `[NOT] IN
+(SELECT <col> FROM <table> [ORDER BY …])` projects a single *indexed* plain column now
+renders SQLite's `USING INDEX <ix> FOR IN-OPERATOR` (secondary index) / `USING ROWID
+SEARCH ON TABLE <t> FOR IN-OPERATOR` (rowid / INTEGER PRIMARY KEY) in place of the
+`LIST SUBQUERY`+bloom subtree (`in_operator_index_node`, gated on a bare-`SCAN` outer +
+a simple, WHERE/LIMIT/DISTINCT/GROUP/join/expression-free body). When two+ plain indexes
+lead with the column the choice is a cost-model tiebreak we don't reproduce → it stays on
+`LIST SUBQUERY` (`eqp_in_operator.rs`). Correlated / compound bodies stay deferred
+(`CORRELATED LIST SUBQUERY`).
+
+With B9a-seek and `FOR IN-OPERATOR` shipped, the only open EQP-fidelity thread is **B9b**
+(window EQP), now confirmed deferred by design — see below.
 
 **Blocked / deferred by design:**
 - **B9h — cost-model single-table index *choice*.** SQLite prefers, among indexes
@@ -1496,10 +1500,12 @@ reasonable order:
 1. **B5b-2 — live storage cursors on the VDBE.** The largest remaining VDBE piece;
 it turns the materialized inner join into a seek-driven one and is the
 prerequisite for streaming correlated subqueries and windows. Perf/coverage,
-parity-gated, low risk. **B5b-2a landed** (INNER-join inner rowid seek on an
-INTEGER PRIMARY KEY — live `TableCursor::seek`, no inner materialization);
-remaining sub-steps (in-interpreter seek opcodes, secondary-index / WITHOUT
-ROWID seeks, N-table chains) tracked under §4's B5b-2 entry.
+parity-gated, low risk. **B5b-2a–2d landed** (INNER *and* LEFT inner rowid seek on an
+INTEGER PRIMARY KEY, compound-`ON` conjunct seek, and an N-table left-deep chain of
+ipk seeks — live `TableCursor::seek`, no inner materialization); the remaining
+sub-step is the in-interpreter `OpenRead`/`SeekRowid` opcodes (moving the seek into
+bytecode — an architectural refactor with no user-visible behavior change), plus the
+affinity-blocked secondary-index / `WITHOUT ROWID` seeks, tracked under §4's B5b-2 entry.
 2. **B5c-2 — correlated subqueries on the VDBE**, once B5b-2 lands the live-cursor
 machinery.
 3. **C9a → C9d — the concurrency story** (persistent read locks in `src/pager/`,
@@ -1512,10 +1518,13 @@ reasonable order:
 prepare pass for the lazy-validation gaps.
 6. **`EXPLAIN QUERY PLAN` fidelity (Track B) — essentially closed.** The whole B9
 cluster shipped in 2026-07 (B9a incl. the seekable-`IN` render, B9c–B9g, the B9d
-subset). What remains is deferred by design: the cost-model index-choice items
-(**B9h**, **B9j**), and **B9b** window EQP — whose co-routine body is itself the B9h
-index choice (see §4). The lone open non-blocked residual is the `FOR IN-OPERATOR`
-render node for a `NOT IN`/unindexed subquery over an indexed column.
+subset), and the **`FOR IN-OPERATOR`** render node landed 2026-07-05 (a scanned
+outer whose `[NOT] IN (SELECT <indexed-col> FROM <table>)` iterates the subquery
+column's index — `USING INDEX <name>` / `USING ROWID SEARCH ON TABLE <t>` — instead
+of a `LIST SUBQUERY`; an ambiguous multi-index choice stays on `LIST SUBQUERY`, a
+cost-model tiebreak deliberately not chased). What remains is deferred by design:
+the cost-model index-choice items (**B9h**, **B9j**), and **B9b** window EQP — whose
+co-routine body is itself the B9h index choice (see §4).
 
 **Deferred / blocked** (documented in §4): **B1b** join reordering and **B4**
 `sqlite_stat4` (diverge from / unverifiable against the stat1-only oracle);

src/exec/mod.rs

Lines changed: 112 additions & 1 deletion
@@ -14803,7 +14803,21 @@ impl Connection {
             })
         });
         let nonseek_case = (negated || !in_col_seekable) && bare_scan;
-        if (nonseek_case || seek_is_in_col) && self.eqp_scalar_bodies_renderable(&[body]) {
+        // With a bare `SCAN` outer, a simple indexed-column subquery is
+        // evaluated by iterating that index (a single `… FOR IN-OPERATOR`
+        // node) rather than materializing a `LIST SUBQUERY` + bloom filter.
+        let in_op_node = if nonseek_case {
+            self.in_operator_index_node(body)
+        } else {
+            None
+        };
+        if let Some(node) = in_op_node {
+            let n_id = *next_id;
+            *next_id += 1;
+            out.push((n_id, parent, node));
+        } else if (nonseek_case || seek_is_in_col)
+            && self.eqp_scalar_bodies_renderable(&[body])
+        {
             let list_id = *next_id;
             *next_id += 1;
             out.push((list_id, parent, String::from("LIST SUBQUERY 1")));
@@ -15116,6 +15130,103 @@ impl Connection {
         Ok(true)
     }
 
+    /// When an `[NOT] IN (SELECT …)` is evaluated by iterating an index on the
+    /// subquery's column instead of materializing its result, SQLite renders a
+    /// single `… FOR IN-OPERATOR` node (a child of the outer `SCAN`) in place of
+    /// the `LIST SUBQUERY` / `CREATE BLOOM FILTER` subtree. This happens for a
+    /// *simple* `SELECT <col> FROM <table> [ORDER BY …]` whose single projected
+    /// column is a plain column that is indexed: a secondary index leading with
+    /// the column renders `USING INDEX <name> FOR IN-OPERATOR`; the rowid /
+    /// INTEGER PRIMARY KEY renders `USING ROWID SEARCH ON TABLE <table> FOR
+    /// IN-OPERATOR`. Any `WHERE`/`GROUP BY`/`HAVING`/`DISTINCT`/`LIMIT`/`OFFSET`,
+    /// a join, a compound/CTE, an expression projection, or an unindexed column
+    /// disqualifies it (→ `None`, so the caller keeps the `LIST SUBQUERY` form).
+    fn in_operator_index_node(&self, body: &Select) -> Option<String> {
+        if body.distinct
+            || body.where_clause.is_some()
+            || !body.group_by.is_empty()
+            || body.having.is_some()
+            || body.limit.is_some()
+            || body.offset.is_some()
+            || !body.compound.is_empty()
+            || !body.ctes.is_empty()
+            || !body.window_defs.is_empty()
+        {
+            return None;
+        }
+        let from = body.from.as_ref()?;
+        if !from.joins.is_empty() {
+            return None;
+        }
+        let tref = &from.first;
+        if tref.subquery.is_some()
+            || tref.tvf_args.is_some()
+            || tref.schema.is_some()
+            || self.is_bare_tvf(tref)
+            || self.is_view(&tref.name)
+        {
+            return None;
+        }
+        // A single plain-column projection (no expression, no `*`).
+        if body.columns.len() != 1 {
+            return None;
+        }
+        let ResultColumn::Expr { expr, .. } = &body.columns[0] else {
+            return None;
+        };
+        let mut proj = expr;
+        while let Expr::Paren(inner) = proj {
+            proj = inner;
+        }
+        let Expr::Column {
+            column,
+            table,
+            schema: None,
+            ..
+        } = proj
+        else {
+            return None;
+        };
+        let meta = self.table_meta(&tref.name, tref.alias.as_deref()).ok()?;
+        // A qualifier on the projected column must name the subquery's table.
+        if let Some(t) = table {
+            let qual = tref.alias.as_deref().unwrap_or(&tref.name);
+            if !t.eq_ignore_ascii_case(qual) {
+                return None;
+            }
+        }
+        let col_idx = meta
+            .columns
+            .iter()
+            .position(|c| c.name.eq_ignore_ascii_case(column));
+        // The rowid / INTEGER PRIMARY KEY of a rowid table → ROWID search form.
+        let is_rowid = !meta.without_rowid
+            && (col_idx == meta.ipk && col_idx.is_some()
+                || (is_rowid_alias(column) && col_idx.is_none()));
+        if is_rowid {
+            return Some(alloc::format!(
+                "USING ROWID SEARCH ON TABLE {} FOR IN-OPERATOR",
+                tref.name
+            ));
+        }
+        // A plain (non-partial, non-expression) secondary index leading with the
+        // column → index-iteration form. Only when the choice is *unambiguous*:
+        // if two or more plain indexes lead with the column, which one SQLite
+        // iterates is a cost-model tiebreak (index width / uniqueness) we can't
+        // reproduce against the stat1-only oracle, so defer to the `LIST SUBQUERY`
+        // form (its pre-existing render) rather than guess the wrong index name.
+        let ci = col_idx?;
+        let ixs = self.indexes_of(&tref.name).ok()?;
+        let mut leading = ixs.iter().filter(|i| {
+            i.partial.is_none() && i.key_exprs.is_none() && i.cols.first() == Some(&ci)
+        });
+        let ix = leading.next()?;
+        if leading.next().is_some() {
+            return None;
+        }
+        Some(alloc::format!("USING INDEX {} FOR IN-OPERATOR", ix.name))
+    }
+
     /// [`eqp_access`](Self::eqp_access), then collapse a *secondary*-index seek to a
     /// plain `SCAN` when the table carries a `NOT INDEXED` hint — the hint forbids
     /// every secondary index (including an implicit `sqlite_autoindex_…` for a
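
The rowid-versus-index decision at the end of `in_operator_index_node` can be read in isolation. The following is a minimal sketch under stated assumptions, not graphite's code: `Index`, `Table`, `in_operator_node` and `example_table` are hypothetical stand-ins for the real schema metadata, and the alias list (`rowid`, `oid`, `_rowid_`) is an assumption about what the unshown `is_rowid_alias` helper accepts. The body-shape gates (WHERE, LIMIT, joins and so on) are deliberately left out.

```rust
// Simplified, hypothetical stand-ins for the schema metadata; these are not
// graphite's real types.
struct Index {
    name: &'static str,
    cols: Vec<usize>, // indexed column positions, leading column first
    partial: bool,    // true for a partial index (CREATE INDEX ... WHERE ...)
    expression: bool, // true when the index keys on an expression
}

struct Table {
    name: &'static str,
    columns: Vec<&'static str>,
    ipk: Option<usize>, // position of the INTEGER PRIMARY KEY column, if any
    without_rowid: bool,
    indexes: Vec<Index>,
}

/// Returns the plan-node text, or `None` to keep the `LIST SUBQUERY` form.
fn in_operator_node(t: &Table, column: &str) -> Option<String> {
    let col_idx = t
        .columns
        .iter()
        .position(|c| c.eq_ignore_ascii_case(column));
    // Assumed alias set; the real `is_rowid_alias` is not shown in the diff.
    let rowid_alias = ["rowid", "oid", "_rowid_"]
        .iter()
        .any(|a| a.eq_ignore_ascii_case(column));
    // Rowid form: the declared INTEGER PRIMARY KEY, or a rowid alias that is
    // not shadowed by a real column, on a table that actually has a rowid.
    let is_rowid = !t.without_rowid
        && ((col_idx.is_some() && col_idx == t.ipk) || (col_idx.is_none() && rowid_alias));
    if is_rowid {
        return Some(format!(
            "USING ROWID SEARCH ON TABLE {} FOR IN-OPERATOR",
            t.name
        ));
    }
    // Index form: exactly one plain index may lead with the column.
    let ci = col_idx?;
    let mut leading = t
        .indexes
        .iter()
        .filter(|i| !i.partial && !i.expression && i.cols.first() == Some(&ci));
    let ix = leading.next()?;
    if leading.next().is_some() {
        return None; // ambiguous: two or more candidates, so do not guess
    }
    Some(format!("USING INDEX {} FOR IN-OPERATOR", ix.name))
}

/// Models `t(a INTEGER PRIMARY KEY, b, c)` with `CREATE INDEX tb ON t(b)`.
fn example_table() -> Table {
    Table {
        name: "t",
        columns: vec!["a", "b", "c"],
        ipk: Some(0),
        without_rowid: false,
        indexes: vec![Index { name: "tb", cols: vec![1], partial: false, expression: false }],
    }
}
```

With `example_table()`, column `b` yields the `USING INDEX tb` form, `a` and `rowid` yield the rowid form, and `c` yields `None`. Pushing a second plain index that also leads with `b` turns the `b` case into `None`, which is the ambiguity rule the commit describes.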

tests/eqp_in_operator.rs

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+//! `[NOT] IN (SELECT …)` over an *indexed* subquery column, evaluated by
+//! iterating that index rather than materializing the result, renders SQLite's
+//! `… FOR IN-OPERATOR` plan node in place of the `LIST SUBQUERY` / `CREATE BLOOM
+//! FILTER` subtree. This fires only when the outer table is scanned (its IN
+//! column is not itself index-seekable) and the subquery is a *simple*
+//! `SELECT <col> FROM <table> [ORDER BY …]` whose single plain column is
+//! indexed: a secondary index leading with the column → `USING INDEX <name> FOR
+//! IN-OPERATOR`; the rowid / INTEGER PRIMARY KEY → `USING ROWID SEARCH ON TABLE
+//! <table> FOR IN-OPERATOR`. Any WHERE/LIMIT/DISTINCT/GROUP/join/expression, an
+//! unindexed column, or an ambiguous multi-index choice keeps the `LIST
+//! SUBQUERY` form. EQP-only: the executed results are unaffected. Verified
+//! byte-for-byte against the sqlite3 3.50.4 CLI.
+
+#![cfg(feature = "std")]
+
+use std::process::Command;
+
+fn sqlite3_available() -> bool {
+    Command::new("sqlite3").arg("--version").output().is_ok()
+}
+
+fn plan(bin: &str, base: &str, sql: &str) -> String {
+    let full = format!("{base} EXPLAIN QUERY PLAN {sql}");
+    let out = Command::new(bin)
+        .arg(":memory:")
+        .arg(&full)
+        .output()
+        .unwrap();
+    String::from_utf8_lossy(&out.stdout)
+        .lines()
+        .filter(|l| !l.trim().is_empty() && !l.starts_with("QUERY PLAN"))
+        .map(|l| l.trim_start_matches(|c: char| " |`*+_-".contains(c)))
+        .collect::<Vec<_>>()
+        .join("#")
+}
+
+// `w` is the scanned outer (no index on `p`); `t` supplies the subquery values
+// with exactly one index per candidate column (unambiguous choice).
+const BASE: &str = "CREATE TABLE t(a INTEGER PRIMARY KEY, b, c);\
+    CREATE INDEX tb ON t(b);\
+    INSERT INTO t VALUES(1,10,100),(2,20,200),(3,30,300);\
+    CREATE TABLE u(x, y); CREATE INDEX ux ON u(x); INSERT INTO u VALUES(10,1),(30,3);\
+    CREATE TABLE w(p, q); INSERT INTO w VALUES(10,1),(20,2),(99,3);";
+
+#[test]
+fn for_in_operator_plan_matches_sqlite() {
+    if !sqlite3_available() {
+        eprintln!("sqlite3 CLI not found; skipping");
+        return;
+    }
+    let g = env!("CARGO_BIN_EXE_graphitesql");
+    for q in [
+        // Secondary-index column → USING INDEX tb FOR IN-OPERATOR.
+        "SELECT * FROM w WHERE p IN (SELECT b FROM t)",
+        "SELECT * FROM w WHERE p NOT IN (SELECT b FROM t)",
+        "SELECT * FROM w WHERE p IN (SELECT b FROM t ORDER BY b)",
+        "SELECT * FROM w WHERE p IN (SELECT (b) FROM t)",
+        "SELECT * FROM w WHERE p IN (SELECT b AS bb FROM t)",
+        "SELECT * FROM w WHERE p IN (SELECT t.b FROM t)",
+        // rowid / INTEGER PRIMARY KEY → USING ROWID SEARCH ON TABLE t.
+        "SELECT * FROM w WHERE p IN (SELECT a FROM t)",
+        "SELECT * FROM w WHERE p IN (SELECT rowid FROM t)",
+        "SELECT * FROM w WHERE p NOT IN (SELECT a FROM t)",
+        // Controls that keep the LIST SUBQUERY / bloom form:
+        "SELECT * FROM w WHERE p IN (SELECT c FROM t)", // unindexed column
+        "SELECT * FROM w WHERE p IN (SELECT b FROM t WHERE c > 0)", // WHERE
+        "SELECT * FROM w WHERE p IN (SELECT b FROM t LIMIT 2)", // LIMIT
+        "SELECT * FROM w WHERE p IN (SELECT DISTINCT b FROM t)", // DISTINCT
+        "SELECT * FROM w WHERE p IN (SELECT b + 1 FROM t)", // expression
+        // Control: the IN column is itself index-seekable → outer SEARCH, RHS
+        // stays a materialized LIST SUBQUERY (no FOR IN-OPERATOR).
+        "SELECT * FROM u WHERE x IN (SELECT b FROM t)",
+        "SELECT * FROM t WHERE b IN (SELECT x FROM u)",
+    ] {
+        assert_eq!(plan("sqlite3", BASE, q), plan(g, BASE, q), "plan for {q}");
+    }
+}
+
+/// The plan node is cosmetic — the executed rows must be identical regardless.
+#[test]
+fn for_in_operator_rows_match() {
+    if !sqlite3_available() {
+        eprintln!("sqlite3 CLI not found; skipping");
+        return;
+    }
+    let g = env!("CARGO_BIN_EXE_graphitesql");
+    for q in [
+        "SELECT p FROM w WHERE p IN (SELECT b FROM t) ORDER BY p",
+        "SELECT p FROM w WHERE p NOT IN (SELECT b FROM t) ORDER BY p",
+        "SELECT p FROM w WHERE p IN (SELECT a FROM t) ORDER BY p",
+    ] {
+        let full = format!("{BASE} {q}");
+        let sq = Command::new("sqlite3")
+            .arg(":memory:")
+            .arg(&full)
+            .output()
+            .unwrap();
+        let gr = Command::new(g).arg(":memory:").arg(&full).output().unwrap();
+        assert_eq!(
+            String::from_utf8_lossy(&sq.stdout),
+            String::from_utf8_lossy(&gr.stdout),
+            "rows for {q}"
+        );
+    }
+}
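
The `plan` helper compares normalized text, not raw CLI output. A minimal sketch of just that normalization step, pulled out as a pure function so it can be exercised without spawning a process: `normalize_plan` is a hypothetical name, and the sample input in the comment is illustrative, not captured from a real `sqlite3` run.

```rust
/// Flatten EXPLAIN QUERY PLAN text into one `#`-joined string: drop blank
/// lines and the `QUERY PLAN` header, then strip the leading tree-drawing
/// characters (space, `|`, backtick, `*`, `+`, `_`, `-`) from each line.
fn normalize_plan(stdout: &str) -> String {
    stdout
        .lines()
        .filter(|l| !l.trim().is_empty() && !l.starts_with("QUERY PLAN"))
        .map(|l| l.trim_start_matches(|c: char| " |`*+_-".contains(c)))
        .collect::<Vec<_>>()
        .join("#")
}

// Illustrative input (not real CLI output):
//   "QUERY PLAN\n|--SCAN w\n`--USING INDEX tb FOR IN-OPERATOR\n"
// Each line loses its leading tree characters before the lines are joined.
```

Because every leading tree character is stripped, two plans that list the same nodes in the same order normalize identically even if their nesting depth differs, so the comparison checks node text and order only.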
