fix(store): embed searchVec query with the store's pinned model #690
Ciel2142 wants to merge 1 commit into
store.searchVector() and the internal multi-query vector path embedded the query with the global QMD_EMBED_MODEL (getDefaultLlamaCpp) instead of the store's pinned embed model. A store whose config.models.embed differs from QMD_EMBED_MODEL hit a sqlite-vec dimension mismatch (e.g. stored 768d vs query 4096d) and loaded the wrong, larger model at query time.

Thread getLlm(store) through the bound store.searchVec into getEmbedding, which already accepts an llmOverride.

Backward compatible: the new param is optional; session and precomputedEmbedding callers are unaffected, so hybrid/precomputed search paths are unchanged.

Adds a regression test that pins a non-default store embed model and asserts searchVec embeds the query with it (hits returned, no dim mismatch, global default llm untouched).

Refs: qmd-prm, qmd-se8, qmd-mg2
Prior-art note for reviewers: this is a follow-up to #497 (closed), not a duplicate. #497 fixed the query-time embed model name; this PR fixes a different mechanism in the same [...] So #497's [...] The fix threads [...]
Problem
store.searchVector() and the internal multi-query vector path embed the search-time query with the global QMD_EMBED_MODEL (getDefaultLlamaCpp()) instead of the store's pinned embed model. When a store is created with config.models.embed different from QMD_EMBED_MODEL, the query vector width differs from the stored vectors and sqlite-vec rejects it with a dimension mismatch before any comparison. For example, a store embedded with embeddinggemma-300M holds 768-dim stored vectors, while env QMD_EMBED_MODEL = Qwen3-Embedding-8B embeds the query at 4096-dim. The same path also loads the wrong, often much larger model at query time.
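To make the failure mode concrete, here is a minimal sketch of a fixed-width vector index that validates the query width before computing any distance. FixedWidthIndex and its error text are hypothetical stand-ins written for illustration; they are not sqlite-vec's API or its actual message.

```typescript
// Hypothetical stand-in for a fixed-width vector table; not sqlite-vec's real API.
class FixedWidthIndex {
  private rows: number[][] = [];
  constructor(readonly dim: number) {}

  add(vec: number[]): void {
    this.check(vec);
    this.rows.push(vec);
  }

  // Returns the position of the nearest stored row by squared Euclidean distance.
  nearest(query: number[]): number {
    this.check(query); // width is validated before any distance is computed
    let best = -1;
    let bestDist = Infinity;
    this.rows.forEach((row, i) => {
      const d = row.reduce((acc, x, j) => acc + (x - query[j]) * (x - query[j]), 0);
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    });
    return best;
  }

  private check(vec: number[]): void {
    if (vec.length !== this.dim) {
      throw new Error(`dimension mismatch: expected ${this.dim}, received ${vec.length}`);
    }
  }
}

// Store side: vectors produced by a 768-dim model.
const index = new FixedWidthIndex(768);
index.add(Array.from({ length: 768 }, () => 0.1));

// Query side: a 4096-dim vector, as a different embed model would produce.
let mismatchError = "";
try {
  index.nearest(Array.from({ length: 4096 }, () => 0.1));
} catch (e) {
  mismatchError = (e as Error).message;
}
```

The query never reaches the distance loop: the width check fails first, which matches the "rejected before any comparison" behavior described above.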
Root cause
searchVec() calls getEmbedding(query, model, true, session) without an llmOverride, so getEmbedding falls back to getDefaultLlamaCpp() (the env model). The store's own llm (store.llm) is never threaded into the query-embed step.
getEmbedding already accepts an llmOverride parameter; it simply isn't passed.
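The fallback can be sketched as follows. This is a simplified, synchronous illustration under assumptions: the Embedder interface, makeEmbedder, and the shortened getEmbedding signature are hypothetical, and the real function in src/store.ts takes more arguments (query, model, a boolean flag, session, llm). The session-then-override-then-default order follows the PR description.

```typescript
// Hypothetical minimal types; the real LlamaCpp and session types in src/store.ts differ.
interface Embedder {
  label: string;
  embed(text: string): number[];
}

const makeEmbedder = (label: string, dim: number): Embedder => ({
  label,
  embed: (text) => Array.from({ length: dim }, (_, i) => (text.length + i) % 7),
});

// Stand-in for the global model selected by QMD_EMBED_MODEL.
const globalDefault = makeEmbedder("global-default", 4);
const getDefaultLlamaCpp = (): Embedder => globalDefault;

// Resolution order described in the PR: session first, then the override,
// then the global default.
function getEmbedding(text: string, session?: Embedder, llmOverride?: Embedder): number[] {
  const llm = session ?? llmOverride ?? getDefaultLlamaCpp();
  return llm.embed(text);
}

const storeLlm = makeEmbedder("store-pinned", 3);

// Without an override, the global default embeds the query (the bug).
const unfixedQuery = getEmbedding("hello");
// With the store's llm passed as the override, the pinned model embeds it.
const fixedQuery = getEmbedding("hello", undefined, storeLlm);
```

In this sketch unfixedQuery has the global model's width (4) and fixedQuery has the store's width (3), which is the difference the missing argument makes.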
Hybrid search (store.search) is unaffected: it pre-computes query embeddings via the store session/llm and passes them as precomputedEmbedding, which short-circuits before getEmbedding. Only the inline-embed paths hit the bug.
Fix
Thread getLlm(store) (which returns store.llm ?? getDefaultLlamaCpp()) through the bound store.searchVec into getEmbedding, mirroring the existing expandQuery/rerank llm threading. Three small edits in src/store.ts:
- store.searchVec passes getLlm(store) as the new last arg
- searchVec() gains an optional llm?: LlamaCpp parameter
- getEmbedding(query, model, true, session, llm)
Backward compatible: the new param is optional; session- and precomputedEmbedding-based callers are unaffected (session is preferred over the override, and precomputed skips getEmbedding entirely).
Test
Adds a regression test in test/store.test.ts: it pins a store to a non-default (3-dim) embed model, sets the global default to a different (4-dim) model, embeds a doc, then calls store.searchVec(). It asserts hits are returned with no dimension mismatch and that the global default llm is never used to embed the query. The test fails on the current code with "Dimension mismatch ... Expected 3 ... received 4" and passes with the fix.
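The shape of that check, together with the threading it exercises, can be sketched with test doubles. This is not the test in test/store.test.ts: FakeLlm, FakeStore, the simplified synchronous searchVec, and the exact error wording are all assumptions made for illustration. Only the getLlm fallback (store.llm ?? getDefaultLlamaCpp()) and the optional llm parameter come from the description above.

```typescript
// Hypothetical test doubles; the real test uses the project's own helpers.
interface FakeLlm {
  dim: number;
  calls: number; // how many times embed() was invoked
  embed(text: string): number[];
}

function fakeLlm(dim: number): FakeLlm {
  const llm: FakeLlm = {
    dim,
    calls: 0,
    embed: (text) => {
      llm.calls += 1;
      return Array.from({ length: dim }, (_, i) => text.length + i);
    },
  };
  return llm;
}

const globalLlm = fakeLlm(4); // plays the role of the QMD_EMBED_MODEL default
const getDefaultLlamaCpp = (): FakeLlm => globalLlm;

interface FakeStore {
  llm?: FakeLlm;
  vectors: number[][];
}

// Mirrors the helper described in the PR: store.llm ?? getDefaultLlamaCpp().
const getLlm = (store: FakeStore): FakeLlm => store.llm ?? getDefaultLlamaCpp();

// Simplified searchVec: the optional llm parameter is the one the fix adds.
function searchVec(store: FakeStore, query: string, llm?: FakeLlm): number[] {
  const q = (llm ?? getDefaultLlamaCpp()).embed(query);
  const hits: number[] = [];
  store.vectors.forEach((v, i) => {
    if (v.length !== q.length) {
      throw new Error(`Dimension mismatch: expected ${v.length}, received ${q.length}`);
    }
    hits.push(i);
  });
  return hits;
}

// Pin the store to a 3-dim model and index one document with it.
const store: FakeStore = { llm: fakeLlm(3), vectors: [] };
store.vectors.push(getLlm(store).embed("a document"));

// Fixed call path: the store's llm is threaded into the query embed.
const hits = searchVec(store, "a query", getLlm(store));
```

Calling searchVec(store, "a query") without the third argument reproduces the unfixed path in this sketch: the 4-dim global double embeds the query and the width check throws.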