fix(store): match ANY term in multi-word FTS queries (OR, not implicit AND) #448

Open
joniiiAi wants to merge 1 commit into Gentleman-Programming:main from joniiiAi:fix/fts5-multiword-or-search

Problem

sanitizeFTS wraps each query word in quotes and joins them with a space. FTS5 treats a space between terms as an implicit AND, so a multi-word query only matches documents that contain every term.

For natural-language / multi-word searches this means near-zero recall: the query returns 0 hits unless a single document happens to contain all the words.

Reproducible at the SQL level on any populated DB:

-- implicit AND (current behaviour): every term required
SELECT count(*) FROM observations_fts
WHERE observations_fts MATCH '"email" "pipeline" "stalled" "leads"';   -- 0

-- OR: matches any term
SELECT count(*) FROM observations_fts
WHERE observations_fts MATCH '"email" OR "pipeline" OR "stalled" OR "leads"'; -- many

In practice, mem_search, the HTTP /search endpoint, and the context-injection hook (which all share this helper) return nothing for typical multi-word queries, even though the data is clearly present and findable by single-term search.

Fix

Join the quoted terms with OR instead of a space.

Relevance is preserved by the callers' existing ORDER BY fts.rank (bm25): documents matching more — and rarer — terms still rank highest. So the previous "best" matches stay on top, and partial matches simply become reachable instead of being dropped entirely. Each term stays quoted, so the original special-char protection is unchanged.

Diff is 3 lines of logic (plus an updated doc comment).
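
For readers without the diff open, here is a minimal sketch of the OR join described above. It is not the actual code from internal/store: the name sanitizeFTSSketch and the exact quote-stripping rule (removing every embedded double quote) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeFTSSketch is a hypothetical stand-in for the real sanitizeFTS helper.
// It quotes each whitespace-separated word and joins the quoted terms with OR,
// so an FTS5 MATCH expression matches documents containing ANY of the terms.
func sanitizeFTSSketch(query string) string {
	words := strings.Fields(query)
	terms := make([]string, 0, len(words))
	for _, w := range words {
		// Assumption: drop embedded double quotes so a term cannot
		// terminate its own quoting early.
		w = strings.ReplaceAll(w, `"`, "")
		if w == "" {
			continue
		}
		terms = append(terms, `"`+w+`"`)
	}
	// Joining with a space here would give FTS5's implicit AND;
	// " OR " is the behaviour this change introduces.
	return strings.Join(terms, " OR ")
}

func main() {
	fmt.Println(sanitizeFTSSketch("email pipeline stalled leads"))
	// prints: "email" OR "pipeline" OR "stalled" OR "leads"
}
```

A single-term query produces just the quoted term, and an empty or blank query produces an empty string, which mirrors the cases listed under Tests.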

Alternative considered

A hybrid "AND first, fall back to OR if 0 results" approach would also work and would keep strict-AND precision when it happens to match. I went with plain OR + bm25 because it's a minimal change, matches how FTS engines are normally queried, and the rank ordering already delivers the precision benefit (all-term matches float to the top) without a second query.

Tests

  • TestSanitizeFTS — table-driven unit test for the OR join, quote-stripping, empty/blank, single term.
  • TestSearchMatchesAnyTermInMultiWordQuery — regression test: a query whose terms are only partially present now returns the document (0 results before this change).
ok  github.com/Gentleman-Programming/engram/internal/store

Verified locally end-to-end: rebuilt the binary, and mem_search / engram search / the /search HTTP endpoint now return ranked results for previously-empty multi-word queries.

…t AND)

sanitizeFTS joined quoted terms with a space, which FTS5 treats as an
implicit AND. Multi-word / natural-language queries therefore required
EVERY term to appear in a single document and returned 0 hits for almost
all real queries (e.g. searching "email pipeline stalled leads" when no
single observation contains all four words).

Join the quoted terms with OR instead. Relevance is preserved by the
callers' existing `ORDER BY fts.rank` (bm25): documents matching more —
and rarer — terms still rank highest, so the previous best matches stay
on top while partial matches become reachable.

Adds a unit test for sanitizeFTS and a regression test proving a
partial multi-word query now matches (0 hits before).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>