Skip to content

Commit 04bd48a

Browse files
dougborgclaude
andcommitted
docs(mcp): correct cookbook recipe accuracy issues (#665 round 2)
Four Copilot review comments — all real accuracy issues caught by reading the recipe against the actual implementation: - Line 76: fall-through snippet was synchronous but smart_search is async (async with session, await connection, await exec_driver_sql, cursor.all). Updated to match the real flow so copy/paste is safe. - Line 85: claimed empty-tokenize triggers fuzzy fall-through. Actual: empty tokens return [] directly (queries.py:396-397). Reframed and added the third path I'd missed (no-FTS-sidecar entities go straight to fuzzy). - Line 238: depends_on rationale was wrong — _variant_postprocess reads attrs_obj.product_or_material from the extended API payload, not from cached parents. Rewrote to describe depends_on accurately as declarative FK-join + cycle-detection metadata. - Line 327: "AND OR NOT" example claimed to trigger FTS5 syntax error but the tokenizer+quoting protect every user-typed query from producing invalid MATCH. Rewrote the section as "test-only path" documenting that the regression test patches exec_driver_sql. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent caace27 commit 04bd48a

1 file changed

Lines changed: 36 additions & 22 deletions

File tree

katana_mcp_server/docs/cookbook/catalog-search.md

Lines changed: 36 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -66,24 +66,33 @@ BM25 (`ORDER BY bm25(<entity>_fts)`), highest signal first.
6666

6767
```python
6868
try:
69-
rows = conn.exec_driver_sql(match_sql, params).fetchall()
69+
async with self._engine.session() as session:
70+
conn = await session.connection()
71+
cursor = await conn.exec_driver_sql(sql, (fts_match, limit))
72+
ids = [int(row[0]) for row in cursor.all()]
7073
except OperationalError as exc:
7174
if not _is_fts_syntax_error(exc):
7275
raise # genuine OperationalError (locked DB, missing table) — propagate
73-
rows = []
74-
if not rows:
75-
return await self.search_fuzzy(cls, query, limit=limit, ...)
76+
return await _fuzzy() # FTS5 syntax error → fuzzy
77+
if not ids:
78+
return await _fuzzy() # FTS5 returned no hits → fuzzy
7679
```
7780

7881
Two conditions trigger the fuzzy fall-through:
7982

80-
1. **FTS5 returned zero hits.** Common for partial matches, typos, or queries that
81-
tokenize to an empty list (e.g., a query of only punctuation).
83+
1. **FTS5 returned zero hits.** Common for partial matches and typos. Note that queries
84+
that *tokenize* to an empty list (whitespace-only or punctuation-only input) return
85+
`[]` directly without fuzzy — there's nothing to match against.
8286
1. **FTS5 raised an `OperationalError` whose underlying `orig` message starts with
8387
`fts5: syntax error` or `fts5: unknown`.** This is narrow on purpose — other
8488
`OperationalError`s (locked DB, missing FTS table, disk I/O) propagate so operators
8589
see real failures instead of degraded-but-silent search.
8690

91+
A third path exists: entities **without** an FTS sidecar (the lookup-only types listed
92+
in the coverage table below) skip FTS entirely and go straight to fuzzy. `smart_search`
93+
on `CachedTaxRate`, for example, has no `tax_rate_fts` table to query — fuzzy is the
94+
only available path.
95+
8796
The narrowing is done via `str(exc.orig).startswith(...)` (with a defensive fallback to
8897
`str(exc)` if `exc.orig` happens to be `None`, which SQLAlchemy shouldn't produce in
8998
practice but is guarded for safety). `str(exc)` on its own would include SQLAlchemy's
@@ -230,12 +239,14 @@ The four cache-only fields on `CachedVariant`:
230239
| `display_name` | `parent.name` + variant `config_attributes` | Human-readable result rendering |
231240
| `supplier_item_codes_text` | `" ".join(supplier_item_codes)` | FTS5 tokenizes whitespace-separated text; a list field wouldn't be indexable |
232241

233-
The variant `EntitySpec` declares `depends_on=("product", "material")` so the parent
234-
records exist by the time the postprocess hook reads them. Cold-start ordering — parents
235-
before children — is the responsibility of the caller (today, `ensure_variants_synced`
236-
explicitly `asyncio.gather`s product+material syncs before running the variant sync);
237-
`depends_on` is documentation + a future scheduler hook (see
238-
`_validate_dependency_graph`).
242+
**An aside on `depends_on`**: the postprocess hook gets parent data from the *extended
243+
API payload* (the variant fetch uses `extend=[PRODUCT_OR_MATERIAL]` so each variant
244+
arrives with its parent inlined), **not** from cached parent rows. The variant
245+
`EntitySpec`'s `depends_on=("product", "material")` is therefore declarative metadata —
246+
useful for FK-join queries against the cache and for `_validate_dependency_graph`
247+
cycle-detection at engine open, but not load-bearing for variant denormalization itself.
248+
Cold-start ordering today is handled explicitly: `ensure_variants_synced`
249+
`asyncio.gather`s the product + material syncs before running the variant sync.
239250

240251
## Soft-state filtering
241252

@@ -320,20 +331,23 @@ hits = await catalog.smart_search(CachedProduct, "kitchen knife")
320331
Tokenize: `["kitchen", "knife"]`. Match: `"kitchen"* AND "knife"*`. Behavior unchanged
321332
from the legacy cache — multi-token AND with prefix expansion.
322333

323-
### Syntax-error fall-through
334+
### Syntax-error fall-through (the test-only path)
324335

325-
```python
326-
hits = await catalog.smart_search(CachedVariant, "AND OR NOT")
327-
# → falls through to search_fuzzy; returns whatever difflib scores best
328-
```
336+
The `\W+` tokenizer plus `_build_fts_match`'s double-quote escaping protect every
337+
user-typed query from producing an invalid FTS5 MATCH expression. Reserved-word queries
338+
like `"AND OR NOT"` tokenize to `["AND", "OR", "NOT"]` and emit the valid (if odd)
339+
expression `"AND"* AND "OR"* AND "NOT"*` — the words land inside the quoted string
340+
literals and don't trigger FTS5's keyword parser.
329341

330-
Tokenize produces tokens that, after FTS5 sees them, form an invalid match expression.
331-
FTS5 raises `OperationalError("fts5: syntax error near ...")`. `_is_fts_syntax_error`
332-
returns `True`; `search_fuzzy` runs.
342+
This is by design — `_is_fts_syntax_error` exists to handle the rare case where a future
343+
generator regression or a SQLite version bump produces a query the FTS5 parser rejects.
344+
The regression test for this path patches `exec_driver_sql` to raise
345+
`OperationalError("fts5: syntax error near ...")` directly and asserts the fall-through
346+
routes to fuzzy.
333347

334348
A `LockedError` or `OperationalError("no such table: ...")` would **not** match the
335-
prefix check and would propagate. That's the design — silent fall-through is reserved
336-
for the recoverable case.
349+
`fts5:` prefix check and would propagate. That's the design — silent fall-through is
350+
reserved for the recoverable case.
337351

338352
## Where to look
339353

0 commit comments

Comments
 (0)