Skip to content

Commit 8d23155

Browse files
joaoh82claude
andauthored
fix(sql): validate table qualifiers in FTS function column args (SQLR-15) (#168)
fts_match() / bm25_score() extract their column argument syntactically (they need the column name to find the index, not its value), so the qualifier never passed through RowScope::lookup and SQLR-14's check. A bogus qualifier was silently dropped: fts_match(bogus.body, 'q') ran as fts_match(body, 'q'). Add RowScope::scope_name() (Some(name) for single-table scope, None for joined / group-row scopes) and run check_single_scope_qualifier in resolve_fts_args when the column arg is a CompoundIdentifier. Error wording matches SQLR-14. 3+-part identifiers now error like the main evaluator instead of being silently truncated. Audited the other syntactic extraction sites: JSON and vec_distance helpers evaluate args through eval_expr_scope (already covered); pager catalog re-parses are trusted engine-written SQL. Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1 parent 29b5d9a commit 8d23155

5 files changed

Lines changed: 169 additions & 7 deletions

File tree

docs/fts.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,8 @@ The function requires a TEXT column with an FTS index attached. Without one, it
105105
fts_match(body, ...): no FTS index on column 'body' (run CREATE INDEX <name> ON <table> USING fts(body) first)
106106
```
107107

108+
The column argument may be qualified (`fts_match(docs.body, 'q')` or the FROM alias). The qualifier is validated the same way as any column reference (SQLR-15): it must name the table in scope, else the query errors with `unknown table qualifier`.
109+
108110
This contrasts with SQLite's `MATCH` operator, which is parser-level; SQLRite uses a function-call shape because `sqlparser`'s SQLite dialect doesn't expose a `BinaryOperator::Match` variant we can dispatch on ([Q1](phase-8-plan.md#q1-match-operator-syntax)).
109111

110112
### `bm25_score(col, 'q')`
@@ -119,7 +121,7 @@ Notes:
119121

120122
- **`DESC`** is the conventional direction. `ASC` works (returns least-relevant first) but disables the optimizer probe — see [Optimizer hook](#optimizer-hook).
121123
- **Same query string** must appear in `WHERE fts_match(...)` and `ORDER BY bm25_score(...)`. The optimizer recognizes the joint pattern; mismatched strings fall through to a slow per-row evaluation.
122-
- **Same caveats as `fts_match`:** requires an FTS index on the column; errors clearly otherwise.
124+
- **Same caveats as `fts_match`:** requires an FTS index on the column; errors clearly otherwise. Qualified column args (`bm25_score(docs.body, …)`) are validated against the table in scope (SQLR-15) — note the optimizer probe only matches a *bare* column identifier, so a qualified arg always takes the per-row evaluation path.
123125

124126
The math is the standard BM25+ variant (`k1 = 1.5`, `b = 0.75`, fixed at SQLite FTS5 defaults). Full formula in [`src/sql/fts/bm25.rs`](../src/sql/fts/bm25.rs).
125127

docs/sql-engine.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -82,7 +82,7 @@ match query {
8282

8383
- `Expr::Nested(inner)` → recurse
8484
- `Expr::Identifier(ident)` → look up `ident.value` on the table at the given rowid
85-
- `Expr::CompoundIdentifier(parts)` → same, after validating the qualifier against the one table in scope — `scope_name` is the FROM alias when declared, else the table name, and anything else errors with `unknown table qualifier` (SQLR-14, matching the joined scope's behavior)
85+
- `Expr::CompoundIdentifier(parts)` → same, after validating the qualifier against the one table in scope — `scope_name` is the FROM alias when declared, else the table name, and anything else errors with `unknown table qualifier` (SQLR-14, matching the joined scope's behavior). Functions that extract a column *name* syntactically instead of resolving a value — `fts_match` / `bm25_score`'s first argument — run the same check via `RowScope::scope_name()` (SQLR-15)
8686
- `Expr::Value(v)` → convert a sqlparser literal to a runtime `Value`
8787
- `Expr::UnaryOp { op, expr }` → recurse on inner, apply `+` / `-` / `NOT`
8888
- `Expr::BinaryOp { left, op, right }` → recurse on both sides, apply the operator

docs/supported-sql.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -202,7 +202,7 @@ COUNT([DISTINCT] <column>) -- counts non-NULL values, option
202202
### What works
203203

204204
- **Projection**: `*` (all columns in declaration order), a bare column list, or an explicit list mixing bare columns and aggregate calls. Each item can carry an optional `AS alias` (the alias becomes the output column header and is recognized by `ORDER BY`).
205-
- **Qualified column references** (SQLR-14): `t.col` works in the projection, `WHERE`, `GROUP BY`, `ORDER BY`, and aggregate arguments, but the qualifier must name the table in scope — the `FROM` alias when one is declared, else the table name (ASCII case-insensitive). Anything else errors with `unknown table qualifier`, matching the joined-scope behavior and SQLite. Declaring an alias removes the table name from scope (`SELECT t.id FROM t AS a` errors), and the same validation applies to `UPDATE` / `DELETE` (including alias forms `UPDATE t AS a …` / `DELETE FROM t AS a …`).
205+
- **Qualified column references** (SQLR-14): `t.col` works in the projection, `WHERE`, `GROUP BY`, `ORDER BY`, and aggregate arguments, but the qualifier must name the table in scope — the `FROM` alias when one is declared, else the table name (ASCII case-insensitive). Anything else errors with `unknown table qualifier`, matching the joined-scope behavior and SQLite. Declaring an alias removes the table name from scope (`SELECT t.id FROM t AS a` errors), and the same validation applies to `UPDATE` / `DELETE` (including alias forms `UPDATE t AS a …` / `DELETE FROM t AS a …`). The column argument of `fts_match` / `bm25_score` gets the same check (SQLR-15) — `fts_match(bogus.body, 'q')` errors instead of silently dropping the qualifier.
206206
- **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists.
207207
- **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic.
208208
- **`GROUP BY`**: one or more column names, optionally qualified (`GROUP BY customers.name`) — the qualifier disambiguates same-named columns across joined tables (SQLR-6). Every non-aggregate item in the projection must appear in the `GROUP BY` list (rejected with a clear message — by the parser for single-table queries, by the executor for joined ones, where resolving qualifiers needs the schemas). `GROUP BY <col>` without any aggregate behaves like an implicit `DISTINCT <col>`.

src/sql/executor.rs

Lines changed: 43 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,14 @@ pub(crate) trait RowScope {
6363
/// single-table path; calling them from a joined query produces
6464
/// a `NotImplemented` error rather than wrong results.
6565
fn single_table_view(&self) -> Option<(&Table, i64)>;
66+
67+
/// The user-visible name a `t.col` qualifier must match in this
68+
/// scope (FROM alias if declared, else table name), or `None` when
69+
/// no single name applies (joined / group-row scopes). SQLR-15 —
70+
/// lets helpers that extract a column *name* syntactically (rather
71+
/// than resolving a value through `lookup`) run the same SQLR-14
72+
/// qualifier check as plain column references.
73+
fn scope_name(&self) -> Option<&str>;
6674
}
6775

6876
/// The default scope for non-join queries: one table, one rowid.
@@ -107,6 +115,10 @@ impl RowScope for SingleTableScope<'_> {
107115
fn single_table_view(&self) -> Option<(&Table, i64)> {
108116
Some((self.table, self.rowid))
109117
}
118+
119+
fn scope_name(&self) -> Option<&str> {
120+
Some(self.scope_name)
121+
}
110122
}
111123

112124
/// One table participating in a joined query, plus the user-visible
@@ -184,6 +196,12 @@ impl RowScope for JoinedScope<'_> {
184196
fn single_table_view(&self) -> Option<(&Table, i64)> {
185197
None
186198
}
199+
200+
fn scope_name(&self) -> Option<&str> {
201+
// Qualified references in joined scope resolve per-table in
202+
// `lookup`; there is no single name to check against.
203+
None
204+
}
187205
}
188206

189207
/// SQLR-14 — reject a `q.col` qualifier that doesn't name the single
@@ -2839,10 +2857,25 @@ fn resolve_fts_args<'t>(
28392857
};
28402858
let col_name = match col_expr {
28412859
Expr::Identifier(ident) => ident.value.clone(),
2842-
Expr::CompoundIdentifier(parts) => parts
2843-
.last()
2844-
.map(|p| p.value.clone())
2845-
.ok_or_else(|| SQLRiteError::Internal("empty compound identifier".to_string()))?,
2860+
// SQLR-15 — a `t.col` argument extracts the column *name*
2861+
// without going through `RowScope::lookup`, so it must run the
2862+
// SQLR-14 qualifier check itself: `fts_match(bogus.body, …)`
2863+
// errors instead of silently dropping the `bogus.`.
2864+
Expr::CompoundIdentifier(parts) => match parts.as_slice() {
2865+
[only] => only.value.clone(),
2866+
[q, c] => {
2867+
if let Some(scope_name) = scope.scope_name() {
2868+
check_single_scope_qualifier(Some(&q.value), scope_name, &c.value)?;
2869+
}
2870+
c.value.clone()
2871+
}
2872+
_ => {
2873+
return Err(SQLRiteError::NotImplemented(format!(
2874+
"compound identifier with {} parts is not supported",
2875+
parts.len()
2876+
)));
2877+
}
2878+
},
28462879
other => {
28472880
return Err(SQLRiteError::General(format!(
28482881
"{fn_name}() argument 0 must be a column reference, got {other:?}"
@@ -3649,6 +3682,12 @@ impl RowScope for GroupRowScope<'_> {
36493682
fn single_table_view(&self) -> Option<(&Table, i64)> {
36503683
None
36513684
}
3685+
3686+
fn scope_name(&self) -> Option<&str> {
3687+
// Output rows carry no table qualifier — `lookup` ignores
3688+
// qualifiers entirely, so there is nothing to validate.
3689+
None
3690+
}
36523691
}
36533692

36543693
/// Rewrite a HAVING expression for group-row evaluation: every

src/sql/mod.rs

Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2314,6 +2314,127 @@ mod tests {
23142314
assert!(resp.contains("3 rows returned"), "got: {resp}");
23152315
}
23162316

2317+
// -----------------------------------------------------------------
2318+
// SQLR-15 — table qualifiers in FTS function column args are
2319+
// validated like plain column references (SQLR-14). Before the
2320+
// fix, `fts_match(bogus.body, …)` silently dropped the `bogus.`.
2321+
// -----------------------------------------------------------------
2322+
2323+
/// Assert `sql` fails with the SQLR-14/15 unknown-qualifier message.
2324+
fn assert_fts_unknown_qualifier(db: &mut Database, sql: &str, qualifier: &str) {
2325+
let err = process_command(sql, db)
2326+
.expect_err("a bogus qualifier in an FTS function arg must error, not be ignored");
2327+
let msg = format!("{err}");
2328+
assert!(
2329+
msg.contains(&format!("unknown table qualifier '{qualifier}'")),
2330+
"expected unknown-qualifier error for `{sql}`, got: {msg}"
2331+
);
2332+
}
2333+
2334+
#[test]
2335+
fn fts_match_with_matching_table_qualifier_works() {
2336+
let mut db = seed_fts_table();
2337+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2338+
let resp = process_command(
2339+
"SELECT id FROM docs WHERE fts_match(docs.body, 'rust');",
2340+
&mut db,
2341+
)
2342+
.unwrap();
2343+
assert!(resp.contains("3 rows returned"), "got: {resp}");
2344+
// Case-insensitive match, same as plain column references.
2345+
let resp = process_command(
2346+
"SELECT id FROM docs WHERE fts_match(DOCS.body, 'rust');",
2347+
&mut db,
2348+
)
2349+
.unwrap();
2350+
assert!(resp.contains("3 rows returned"), "got: {resp}");
2351+
}
2352+
2353+
#[test]
2354+
fn fts_match_with_matching_alias_qualifier_works() {
2355+
let mut db = seed_fts_table();
2356+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2357+
let resp = process_command(
2358+
"SELECT id FROM docs AS d WHERE fts_match(d.body, 'rust');",
2359+
&mut db,
2360+
)
2361+
.unwrap();
2362+
assert!(resp.contains("3 rows returned"), "got: {resp}");
2363+
}
2364+
2365+
#[test]
2366+
fn fts_match_alias_shadows_table_name_as_qualifier() {
2367+
// Once `FROM docs AS d` declares an alias, the alias is the
2368+
// only valid qualifier — `docs.body` errors (SQLite semantics,
2369+
// mirrors the SQLR-14 plain-column tests).
2370+
let mut db = seed_fts_table();
2371+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2372+
assert_fts_unknown_qualifier(
2373+
&mut db,
2374+
"SELECT id FROM docs AS d WHERE fts_match(docs.body, 'rust');",
2375+
"docs",
2376+
);
2377+
}
2378+
2379+
#[test]
2380+
fn fts_match_unknown_qualifier_in_where_errors() {
2381+
let mut db = seed_fts_table();
2382+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2383+
assert_fts_unknown_qualifier(
2384+
&mut db,
2385+
"SELECT id FROM docs WHERE fts_match(bogus.body, 'rust');",
2386+
"bogus",
2387+
);
2388+
}
2389+
2390+
#[test]
2391+
fn bm25_score_unknown_qualifier_in_order_by_errors() {
2392+
// ORDER BY position exercises the scalar-eval path: the FTS
2393+
// probe only matches bare identifiers, so the qualified arg
2394+
// falls through to resolve_fts_args.
2395+
let mut db = seed_fts_table();
2396+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2397+
assert_fts_unknown_qualifier(
2398+
&mut db,
2399+
"SELECT id FROM docs ORDER BY bm25_score(bogus.body, 'rust') DESC LIMIT 1;",
2400+
"bogus",
2401+
);
2402+
}
2403+
2404+
#[test]
2405+
fn bm25_score_unknown_qualifier_in_where_errors() {
2406+
// (Projection position can't reach the FTS helper — scalar
2407+
// functions are rejected in the projection list outright, so
2408+
// WHERE-comparison is the other scalar-eval position.)
2409+
let mut db = seed_fts_table();
2410+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2411+
assert_fts_unknown_qualifier(
2412+
&mut db,
2413+
"SELECT id FROM docs WHERE bm25_score(bogus.body, 'rust') > 0.0;",
2414+
"bogus",
2415+
);
2416+
}
2417+
2418+
#[test]
2419+
fn bm25_score_with_matching_qualifier_still_ranks_correctly() {
2420+
// Qualified arg bypasses the FTS probe (bare-identifier match
2421+
// only) — the scalar path must produce the same top result.
2422+
let mut db = seed_fts_table();
2423+
process_command("CREATE INDEX ix_body ON docs USING fts (body);", &mut db).unwrap();
2424+
let out = process_command_with_render(
2425+
"SELECT id FROM docs WHERE fts_match(docs.body, 'rust') \
2426+
ORDER BY bm25_score(docs.body, 'rust') DESC LIMIT 1;",
2427+
&mut db,
2428+
)
2429+
.unwrap();
2430+
assert!(out.status.contains("1 row returned"), "got: {}", out.status);
2431+
let rendered = out.rendered.expect("SELECT should produce rendered output");
2432+
assert!(
2433+
rendered.contains(" 5 "),
2434+
"expected id=5 to be top-ranked; rendered:\n{rendered}"
2435+
);
2436+
}
2437+
23172438
// -----------------------------------------------------------------
23182439
// Phase 7b — vector distance functions through process_command
23192440
// -----------------------------------------------------------------

0 commit comments

Comments
 (0)