fix(store): match ANY term in multi-word FTS queries (OR, not implicit AND) #448
Open
joniiiAi wants to merge 1 commit into
…t AND)
sanitizeFTS joined quoted terms with a space, which FTS5 treats as an implicit AND. Multi-word / natural-language queries therefore required EVERY term to appear in a single document and returned 0 hits for almost all real queries (e.g. searching "email pipeline stalled leads" when no single observation contains all four words).
Join the quoted terms with OR instead. Relevance is preserved by the callers' existing `ORDER BY fts.rank` (bm25): documents matching more, and rarer, terms still rank highest, so the previous best matches stay on top while partial matches become reachable.
Adds a unit test for sanitizeFTS and a regression test proving a partial multi-word query now matches (0 hits before).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
`sanitizeFTS` wraps each query word in quotes and joins them with a space. FTS5 treats a space between terms as an implicit AND, so a multi-word query only matches documents that contain every term.
For natural-language / multi-word searches this means near-zero recall: the query returns 0 hits unless a single document happens to contain all the words.
Reproducible at the SQL level on any populated DB:
In practice `mem_search` (and the HTTP `/search` + context-injection hook, which share this helper) return nothing for typical multi-word queries, even though the data is clearly present and findable by single-term search.
Fix
Join the quoted terms with `OR` instead of a space.
Relevance is preserved by the callers' existing `ORDER BY fts.rank` (bm25): documents matching more, and rarer, terms still rank highest. So the previous "best" matches stay on top, and partial matches simply become reachable instead of being dropped entirely. Each term stays quoted, so the original special-char protection is unchanged.
Diff is 3 lines of logic (plus an updated doc comment).
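For readers without the diff open, the change can be illustrated with a minimal Go sketch. This is an assumption about the helper's shape based on the description in this PR, not the actual source: the function body, the handling of embedded quotes, and the skipping of empty terms are all illustrative choices.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeFTS is a stand-in for the helper described in this PR, not the
// project's actual code. It quotes every whitespace-separated term and joins
// the terms with OR so that FTS5 matches documents containing ANY term.
func sanitizeFTS(query string) string {
	var terms []string
	for _, w := range strings.Fields(query) {
		// Assumption: drop double quotes inside the word so each term is
		// wrapped in exactly one pair of quotes.
		w = strings.ReplaceAll(w, `"`, "")
		if w == "" {
			continue // the word consisted only of quote characters
		}
		terms = append(terms, `"`+w+`"`)
	}
	// Joining with a plain space would be an implicit AND in FTS5.
	return strings.Join(terms, " OR ")
}

func main() {
	fmt.Println(sanitizeFTS("email pipeline stalled leads"))
	// prints: "email" OR "pipeline" OR "stalled" OR "leads"
	fmt.Println(sanitizeFTS(`"auth"`))
	// prints: "auth"
}
```

Because every term is still quoted, a user word such as `OR` or `AND` is passed to FTS5 as a string rather than as an operator, which is the special-char protection the description refers to.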
Alternative considered
A hybrid "AND first, fall back to OR if 0 results" would also work and keeps strict-AND precision when it happens to match. I went with plain `OR` + bm25 because it's a one-line change, matches how FTS engines are normally queried, and the rank ordering already delivers the precision benefit (all-term matches float to the top) without a second query.
Tests
`TestSanitizeFTS`: table-driven unit test for the OR join, quote-stripping, empty/blank, single term.
`TestSearchMatchesAnyTermInMultiWordQuery`: regression test in which a query whose terms are only partially present now returns the document (0 results before this change).
Verified locally end-to-end: rebuilt the binary, and `mem_search` / `engram search` / the `/search` HTTP endpoint now return ranked results for previously-empty multi-word queries.