Skip to content

fix(store): embed searchVec query with the store's pinned model#690

Open
Ciel2142 wants to merge 1 commit into
tobi:mainfrom
Ciel2142:fix/searchvec-pinned-embed-model
Open

fix(store): embed searchVec query with the store's pinned model#690
Ciel2142 wants to merge 1 commit into
tobi:mainfrom
Ciel2142:fix/searchvec-pinned-embed-model

Conversation

@Ciel2142
Copy link
Copy Markdown

Problem

store.searchVector() (and the internal multi-query vector path) embed the
search-time query with the global QMD_EMBED_MODEL (getDefaultLlamaCpp())
instead of the store's pinned embed model. When a store is created with
config.models.embed different from QMD_EMBED_MODEL, the query vector width
differs from the stored vectors and sqlite-vec rejects it before any comparison:

SqliteError: Dimension mismatch for query vector for the "embedding" column. Expected 768 dimensions but received 4096.

(store embedded with embeddinggemma-300M → 768-dim stored vectors; env
QMD_EMBED_MODEL = Qwen3-Embedding-8B → query embedded at 4096-dim.) It also
loads the wrong, often much larger model at query time.

Root cause

searchVec() calls getEmbedding(query, model, true, session) without an
llmOverride, so getEmbedding falls back to getDefaultLlamaCpp() (the env
model). The store's own llm (store.llm) is never threaded into the query-embed
step. getEmbedding already accepts an llmOverride parameter — it simply isn't
passed.

Hybrid search (store.search) is unaffected: it pre-computes query embeddings
via the store session/llm and passes them as precomputedEmbedding, which
short-circuits before getEmbedding. Only the inline-embed paths hit the bug.

Fix

Thread getLlm(store) (which returns store.llm ?? getDefaultLlamaCpp())
through the bound store.searchVec into getEmbedding, mirroring the existing
expandQuery/rerank llm threading. Three small edits in src/store.ts:

  1. the bound store.searchVec passes getLlm(store) as the new last arg
  2. searchVec() gains an optional llm?: LlamaCpp parameter
  3. it forwards that into getEmbedding(query, model, true, session, llm)

Backward compatible — the new param is optional; session- and
precomputedEmbedding-based callers are unaffected (session is preferred over
the override, and precomputed skips getEmbedding entirely).

Test

Adds a regression test in test/store.test.ts: it pins a store to a
non-default (3-dim) embed model, sets the global default to a different (4-dim)
model, embeds a doc, then calls store.searchVec(). It asserts hits are returned
with no dimension mismatch and that the global default llm is never used to embed
the query. The test fails on the current code with Dimension mismatch ... Expected 3 ... received 4 and passes with the fix.

store.searchVector() and the internal multi-query vector path embedded the
query with the global QMD_EMBED_MODEL (getDefaultLlamaCpp) instead of the
store's pinned embed model. A store whose config.models.embed differs from
QMD_EMBED_MODEL hit a sqlite-vec dimension mismatch (e.g. stored 768d vs
query 4096d) and loaded the wrong, larger model at query time.

Thread getLlm(store) through the bound store.searchVec into getEmbedding,
which already accepts an llmOverride. Backward compatible: the new param is
optional; session and precomputedEmbedding callers are unaffected, so
hybrid/precomputed search paths are unchanged.

Adds a regression test that pins a non-default store embed model and asserts
searchVec embeds the query with it (hits returned, no dim mismatch, global
default llm untouched).

Refs: qmd-prm, qmd-se8, qmd-mg2
@Ciel2142
Copy link
Copy Markdown
Author

Ciel2142 commented Jun 1, 2026

Prior-art note for reviewers: this is a follow-up to #497 (closed), not a duplicate.

#497 fixed the query-time embed model namestore.js hardcoded DEFAULT_EMBED_MODEL = "embeddinggemma". On current main that's export const DEFAULT_EMBED_MODEL = DEFAULT_EMBED_MODEL_URI (src/store.ts:45), and #501 made a dimension mismatch throw instead of silently rebuilding. Those address the wrong-model-string path.

This PR fixes a different mechanism in the same searchVec path: the llm instance that runs the query embed. On current main, the inline-embed path still embeds with getDefaultLlamaCpp() (env QMD_EMBED_MODEL) and never threads store.llm:

src/store.ts:1890  searchVec(..., session, precomputedEmbedding)                              // store.llm never passed
src/store.ts:3471  const embedding = precomputedEmbedding ?? await getEmbedding(query, model, true, session);   // no llmOverride
src/store.ts:3562  await (llmOverride ?? getDefaultLlamaCpp()).embed(formattedText, { model, isQuery });        // env fallback

So #497's DEFAULT_EMBED_MODEL_URI change does not cover a store pinned via config.models.embed that differs from env QMD_EMBED_MODEL: the inline-embed path (no session, no precomputedEmbedding) still loads the env model and fails with the same Dimension mismatch ... Expected N ... received M symptom. Hybrid / precomputed / session callers are unaffected — they short-circuit before getEmbedding, which is why this slipped past the #497 fix.

The fix threads getLlm(store) (store.llm ?? getDefaultLlamaCpp()) into that one path, mirroring the existing expandQuery/rerank llm threading. The added regression test fails on current main (dim mismatch) and passes with the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant