
feat(embeddings): provider-common query/document prefix support#1585

Open
valda wants to merge 1 commit into vectorize-io:main from valda:feat/embeddings-prefix-config

Conversation


@valda valda commented May 11, 2026

Summary

Adds two new environment variables and a thin template-method layer so every embedding backend can prepend distinct prefixes to query-side vs. document-side texts. Asymmetric retrieval models (Ruri v3, intfloat/multilingual-e5-*, BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*, etc.) require such prefixes to reach their published retrieval scores. Empty prefixes (the default) preserve byte-identical pre-existing behavior, so this is a zero-risk addition for users who don't opt in.

New env vars:

  • HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX
  • HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX

Set both for asymmetric models; leave both empty for symmetric models. Example for Ruri v3 (per its model card):

HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX=検索クエリ: 
HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX=検索文書: 

Design

The Embeddings base class becomes a template method:

  • A concrete __init__(query_prefix="", doc_prefix="") stores both prefixes as instance attributes.
  • A concrete encode(self, texts, purpose: Literal["query", "document"] = "document") applies the configured prefix via _apply_prefix(texts, purpose), then delegates to an abstract _encode_impl(texts).
  • Each backend renames encode → _encode_impl and accepts query_prefix / doc_prefix kwargs that it forwards via super().__init__(...).

This puts prefix application in exactly one place — structurally impossible to forget for new backends, and zero per-backend code duplication. Default empty prefix is a no-op (the _apply_prefix helper short-circuits when both prefixes are empty), so backends that previously passed encode(texts) directly are byte-identical.
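The base-class design above can be sketched roughly as follows. This is my reconstruction from the bullets, not the PR's actual code; class and method names follow the description, but bodies are illustrative:

```python
from abc import ABC, abstractmethod
from typing import Literal

Purpose = Literal["query", "document"]


class Embeddings(ABC):
    """Template method: prefix application lives here, encoding in subclasses."""

    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix = query_prefix
        self.doc_prefix = doc_prefix

    def _apply_prefix(self, texts: list[str], purpose: Purpose) -> list[str]:
        # Short-circuit: with both prefixes empty, input passes through untouched.
        if not self.query_prefix and not self.doc_prefix:
            return texts
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        return [prefix + t for t in texts]

    def encode(self, texts: list[str], purpose: Purpose = "document") -> list[list[float]]:
        return self._encode_impl(self._apply_prefix(texts, purpose))

    @abstractmethod
    def _encode_impl(self, texts: list[str]) -> list[list[float]]: ...
```

A new backend only implements `_encode_impl`; it cannot bypass prefix handling because callers go through the concrete `encode`.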

create_embeddings_from_env() reads config.embeddings_query_prefix / config.embeddings_doc_prefix once and plumbs them to every provider branch (local / TEI / OpenAI / openrouter / Cohere / LiteLLM / LiteLLMSDK / Google) via a single prefix_kwargs dict — uniform plumbing so missing one branch can't create a silent "prefix not applied for provider X" bug.
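The factory plumbing, reduced to a sketch with hypothetical stand-ins for the real config object and provider classes (only the single prefix_kwargs dict pattern is the point):

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical minimal stand-in for the real Hindsight config.
    embeddings_provider: str = "local"
    embeddings_query_prefix: str = ""
    embeddings_doc_prefix: str = ""


class LocalSTEmbeddings:
    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix, self.doc_prefix = query_prefix, doc_prefix


class OpenAIEmbeddings(LocalSTEmbeddings):
    pass


def create_embeddings_from_env(config: Config):
    # Read the two prefix settings once and forward them identically to
    # every provider branch, so no backend can silently miss them.
    prefix_kwargs = {
        "query_prefix": config.embeddings_query_prefix,
        "doc_prefix": config.embeddings_doc_prefix,
    }
    if config.embeddings_provider == "local":
        return LocalSTEmbeddings(**prefix_kwargs)
    if config.embeddings_provider == "openai":
        return OpenAIEmbeddings(**prefix_kwargs)
    raise ValueError(f"unknown provider: {config.embeddings_provider}")
```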

Internal callsites that are actually queries (memory_engine.py:3005 primary recall, :6078 reflect mental-model search) now pass purpose="query" through generate_embeddings_batch. All other call sites remain on the default purpose="document". The async path uses functools.partial to bind purpose before run_in_executor, since the executor API doesn't accept kwargs directly.
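The executor detail can be sketched as follows (simplified signature; `loop.run_in_executor` genuinely forwards only positional arguments to the callable, so the keyword is bound ahead of time):

```python
import asyncio
import functools


async def generate_embeddings_batch(embeddings, texts, purpose="document"):
    """Simplified sketch of the async path; the real signature may differ."""
    loop = asyncio.get_running_loop()
    # run_in_executor(executor, func, *args) accepts no kwargs, so the
    # keyword-only routing argument is pre-bound with functools.partial.
    call = functools.partial(embeddings.encode, texts, purpose=purpose)
    return await loop.run_in_executor(None, call)
```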

Files touched

  • hindsight_api/config.py: 2 env var bindings + dataclass fields + from_env() wiring (+10 lines)
  • hindsight_api/engine/embeddings.py: base class template-method + 7 backend renames + factory prefix_kwargs plumbing (~177 lines net of churn)
  • hindsight_api/engine/retain/embedding_utils.py: thread purpose through generate_embedding / generate_embeddings_batch (+functools.partial for executor)
  • hindsight_api/engine/retain/embedding_processing.py: thread purpose through the retain facade
  • hindsight_api/engine/memory_engine.py: tag 2 query callsites with purpose="query"
  • tests/test_embeddings_prefix.py: 10 new tests

Total: +298 / -84.

Compatibility note

This is the authoritative prefix-injection mechanism for Hindsight-mediated embeddings. If a downstream provider (e.g., a TEI deployment) is independently configured to prepend its own prefix, this would be double-applied. Set Hindsight's env vars to empty in that case. Documented as a comment in config.py.

Validation

  • 10 new tests cover: per-purpose prefix application, default purpose="document", empty-prefix byte-identical pass-through, async executor path threads purpose, config reads env vars + empty defaults.
  • 76 existing embedding-related tests still pass (no regression).
  • I dogfooded the patch by running an end-to-end embedding migration trial against my own Hindsight deployment (~19k rows, vchord BM25 backend) with cl-nagoya/ruri-v3-70m and ruri-v3-130m, applying the configured prefix to re-embed and query paths. The patch worked transparently; the encoded prefix appears exactly once on both sides. (The migration itself wasn't a quality improvement on my code-mixed / ops-memory corpus, but that's a downstream model-fit question, orthogonal to this PR.)
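The two core behaviors (per-purpose routing and empty-prefix pass-through) can be checked with a pytest-style sketch; the fake backend below is a hypothetical stand-in implementing the interface described above, not code from the PR:

```python
from typing import Literal


class FakeEmbeddings:
    """Recording fake implementing the encode(texts, purpose) contract."""

    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix, self.doc_prefix = query_prefix, doc_prefix
        self.seen: list[str] = []  # texts as the backend would receive them

    def encode(self, texts, purpose: Literal["query", "document"] = "document"):
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        prefixed = [prefix + t for t in texts] if prefix else list(texts)
        self.seen.extend(prefixed)
        return [[float(len(t))] for t in prefixed]


def test_prefix_per_purpose():
    emb = FakeEmbeddings(query_prefix="Q: ", doc_prefix="D: ")
    emb.encode(["hello"], purpose="query")
    emb.encode(["world"])  # default purpose is "document"
    assert emb.seen == ["Q: hello", "D: world"]


def test_empty_prefix_passthrough():
    emb = FakeEmbeddings()
    emb.encode(["unchanged"])
    assert emb.seen == ["unchanged"]
```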

Test plan

  • uv run pytest tests/test_embeddings_prefix.py — 10/10 pass
  • uv run pytest tests/test_embeddings_openai_batch_size.py tests/test_custom_embedding_dimension.py tests/test_gemini_embeddings.py tests/test_litellm_sdk_embeddings.py — 76 pass / 16 skipped (network-dependent), 0 fail
  • ruff check clean on touched files
  • ruff format no diffs on touched files
  • Empty-prefix smoke: LocalSTEmbeddings(model).encode([text]) returns identical vectors with or without the patch (template-method short-circuits)

Adds HINDSIGHT_API_EMBEDDINGS_{QUERY,DOC}_PREFIX env vars and a
template-method `Embeddings.encode(texts, purpose)` that applies the
configured prefix before delegating to subclass `_encode_impl(texts)`.
Empty prefixes (default) preserve byte-identical pre-existing behavior.

Asymmetric retrieval models (ruri v3, intfloat/multilingual-e5-*,
BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*) need distinct
query/document prefixes to reach published retrieval scores. Without
this, callers either lose retrieval quality silently or wrap the encoder
externally — duplicating prefix logic for every deployment.

Patch surface:
- config.py: 2 new env vars + dataclass fields
- embeddings.py: base class gets concrete `encode(texts, purpose)` and
  `_apply_prefix` helper; all 7 backends (Local, TEI, OpenAI, Cohere,
  LiteLLM, LiteLLMSDK, Gemini) rename `encode` → `_encode_impl` and
  accept query_prefix/doc_prefix via super().__init__
- create_embeddings_from_env: prefix kwargs plumbed uniformly to every
  provider branch (8 incl. openrouter which reuses OpenAIEmbeddings)
- embedding_utils.py: `generate_embedding{,s_batch}` thread `purpose`
  through; batch path uses functools.partial since run_in_executor
  can't pass kwargs directly
- embedding_processing.py: same purpose-threading for the retain facade
- memory_engine.py: 2 query callsites (recall + reflect search) tagged
  `purpose="query"`

Tests cover: prefix application per purpose, default purpose="document",
empty-prefix byte-identical pass-through, async executor path threads
purpose correctly, config reads env vars. 10 new + 76 existing
embedding-related tests pass.
