
feat(embeddings): provider-common query/document prefix support#1585

Open
valda wants to merge 1 commit into vectorize-io:main from valda:feat/embeddings-prefix-config

Conversation


@valda valda commented May 11, 2026

Summary

Adds two new environment variables and a thin template-method layer so every embedding backend can prepend distinct prefixes to query-side vs. document-side texts. Asymmetric retrieval models (Ruri v3, intfloat/multilingual-e5-*, BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*, etc.) require such prefixes to reach their published retrieval scores. Empty prefixes (the default) preserve byte-identical pre-existing behavior, so this is a zero-risk addition for users who don't opt in.

New env vars:

  • HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX
  • HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX

Set both for asymmetric models; leave both empty for symmetric models. Example for Ruri v3 (per its model card):

HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX=検索クエリ: 
HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX=検索文書: 

Design

The Embeddings base class becomes a template method:

  • A concrete __init__(query_prefix="", doc_prefix="") stores both prefixes as instance attributes.
  • A concrete encode(self, texts, purpose: Literal["query", "document"] = "document") applies the configured prefix via _apply_prefix(texts, purpose), then delegates to an abstract _encode_impl(texts).
  • Each backend renames encode → _encode_impl and accepts query_prefix / doc_prefix kwargs that it forwards via super().__init__(...).

This puts prefix application in exactly one place — structurally impossible to forget for new backends, and zero per-backend code duplication. Default empty prefix is a no-op (the _apply_prefix helper short-circuits when both prefixes are empty), so backends that previously passed encode(texts) directly are byte-identical.
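The base-class design above can be sketched roughly as follows. This is my reconstruction from the bullets, not the PR's actual code; class and method names follow the description, but bodies are illustrative:

```python
from abc import ABC, abstractmethod
from typing import Literal

Purpose = Literal["query", "document"]


class Embeddings(ABC):
    """Template method: prefix application lives here, encoding in subclasses."""

    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix = query_prefix
        self.doc_prefix = doc_prefix

    def _apply_prefix(self, texts: list[str], purpose: Purpose) -> list[str]:
        # Short-circuit: with both prefixes empty, input passes through untouched.
        if not self.query_prefix and not self.doc_prefix:
            return texts
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        return [prefix + t for t in texts]

    def encode(self, texts: list[str], purpose: Purpose = "document") -> list[list[float]]:
        return self._encode_impl(self._apply_prefix(texts, purpose))

    @abstractmethod
    def _encode_impl(self, texts: list[str]) -> list[list[float]]: ...
```

A new backend only implements `_encode_impl`; it cannot bypass prefix handling because callers go through the concrete `encode`.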

create_embeddings_from_env() reads config.embeddings_query_prefix / config.embeddings_doc_prefix once and plumbs them to every provider branch (local / TEI / OpenAI / openrouter / Cohere / LiteLLM / LiteLLMSDK / Google) via a single prefix_kwargs dict — uniform plumbing so missing one branch can't create a silent "prefix not applied for provider X" bug.
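The factory plumbing, reduced to a sketch with hypothetical stand-ins for the real config object and provider classes (only the single prefix_kwargs dict pattern is the point):

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical minimal stand-in for the real Hindsight config.
    embeddings_provider: str = "local"
    embeddings_query_prefix: str = ""
    embeddings_doc_prefix: str = ""


class LocalSTEmbeddings:
    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix, self.doc_prefix = query_prefix, doc_prefix


class OpenAIEmbeddings(LocalSTEmbeddings):
    pass


def create_embeddings_from_env(config: Config):
    # Read the two prefix settings once and forward them identically to
    # every provider branch, so no backend can silently miss them.
    prefix_kwargs = {
        "query_prefix": config.embeddings_query_prefix,
        "doc_prefix": config.embeddings_doc_prefix,
    }
    if config.embeddings_provider == "local":
        return LocalSTEmbeddings(**prefix_kwargs)
    if config.embeddings_provider == "openai":
        return OpenAIEmbeddings(**prefix_kwargs)
    raise ValueError(f"unknown provider: {config.embeddings_provider}")
```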

Internal callsites that are actually queries (memory_engine.py:3005 primary recall, :6078 reflect mental-model search) now pass purpose="query" through generate_embeddings_batch. All other call sites remain on the default purpose="document". The async path uses functools.partial to bind purpose before run_in_executor, since the executor API doesn't accept kwargs directly.
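The executor detail can be sketched as follows (simplified signature; `loop.run_in_executor` genuinely forwards only positional arguments to the callable, so the keyword is bound ahead of time):

```python
import asyncio
import functools


async def generate_embeddings_batch(embeddings, texts, purpose="document"):
    """Simplified sketch of the async path; the real signature may differ."""
    loop = asyncio.get_running_loop()
    # run_in_executor(executor, func, *args) accepts no kwargs, so the
    # keyword-only routing argument is pre-bound with functools.partial.
    call = functools.partial(embeddings.encode, texts, purpose=purpose)
    return await loop.run_in_executor(None, call)
```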

Files touched

  • hindsight_api/config.py: 2 env var bindings + dataclass fields + from_env() wiring (+10 lines)
  • hindsight_api/engine/embeddings.py: base class template-method + 7 backend renames + factory prefix_kwargs plumbing (~177 lines net of churn)
  • hindsight_api/engine/retain/embedding_utils.py: thread purpose through generate_embedding / generate_embeddings_batch (+functools.partial for executor)
  • hindsight_api/engine/retain/embedding_processing.py: thread purpose through the retain facade
  • hindsight_api/engine/memory_engine.py: tag 2 query callsites with purpose="query"
  • tests/test_embeddings_prefix.py: 10 new tests

Total: +298 / -84.

Compatibility note

This is the authoritative prefix-injection mechanism for Hindsight-mediated embeddings. If a downstream provider (e.g., a TEI deployment) is independently configured to prepend its own prefix, this would be double-applied. Set Hindsight's env vars to empty in that case. Documented as a comment in config.py.

Validation

  • 10 new tests cover: per-purpose prefix application, default purpose="document", empty-prefix byte-identical pass-through, async executor path threads purpose, config reads env vars + empty defaults.
  • 76 existing embedding-related tests still pass (no regression).
  • I dogfooded the patch by running an end-to-end embedding migration trial against my own Hindsight deployment (~19k rows, vchord BM25 backend) with cl-nagoya/ruri-v3-70m and ruri-v3-130m, applying the configured prefix to re-embed and query paths. The patch worked transparently; the encoded prefix appears exactly once on both sides. (The migration itself wasn't a quality improvement on my code-mixed / ops-memory corpus, but that's a downstream model-fit question, orthogonal to this PR.)
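The two core behaviors (per-purpose routing and empty-prefix pass-through) can be checked with a pytest-style sketch; the fake backend below is a hypothetical stand-in implementing the interface described above, not code from the PR:

```python
from typing import Literal


class FakeEmbeddings:
    """Recording fake implementing the encode(texts, purpose) contract."""

    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix, self.doc_prefix = query_prefix, doc_prefix
        self.seen: list[str] = []  # texts as the backend would receive them

    def encode(self, texts, purpose: Literal["query", "document"] = "document"):
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        prefixed = [prefix + t for t in texts] if prefix else list(texts)
        self.seen.extend(prefixed)
        return [[float(len(t))] for t in prefixed]


def test_prefix_per_purpose():
    emb = FakeEmbeddings(query_prefix="Q: ", doc_prefix="D: ")
    emb.encode(["hello"], purpose="query")
    emb.encode(["world"])  # default purpose is "document"
    assert emb.seen == ["Q: hello", "D: world"]


def test_empty_prefix_passthrough():
    emb = FakeEmbeddings()
    emb.encode(["unchanged"])
    assert emb.seen == ["unchanged"]
```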

Test plan

  • uv run pytest tests/test_embeddings_prefix.py — 10/10 pass
  • uv run pytest tests/test_embeddings_openai_batch_size.py tests/test_custom_embedding_dimension.py tests/test_gemini_embeddings.py tests/test_litellm_sdk_embeddings.py — 76 pass / 16 skipped (network-dependent), 0 fail
  • ruff check clean on touched files
  • ruff format no diffs on touched files
  • Empty-prefix smoke: LocalSTEmbeddings(model).encode([text]) returns identical vectors with or without the patch (template-method short-circuits)

Adds HINDSIGHT_API_EMBEDDINGS_{QUERY,DOC}_PREFIX env vars and a
template-method `Embeddings.encode(texts, purpose)` that applies the
configured prefix before delegating to subclass `_encode_impl(texts)`.
Empty prefixes (default) preserve byte-identical pre-existing behavior.

Asymmetric retrieval models (ruri v3, intfloat/multilingual-e5-*,
BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*) need distinct
query/document prefixes to reach published retrieval scores. Without
this, callers either lose retrieval quality silently or wrap the encoder
externally — duplicating prefix logic for every deployment.

Patch surface:
- config.py: 2 new env vars + dataclass fields
- embeddings.py: base class gets concrete `encode(texts, purpose)` and
  `_apply_prefix` helper; all 7 backends (Local, TEI, OpenAI, Cohere,
  LiteLLM, LiteLLMSDK, Gemini) rename `encode` → `_encode_impl` and
  accept query_prefix/doc_prefix via super().__init__
- create_embeddings_from_env: prefix kwargs plumbed uniformly to every
  provider branch (8 incl. openrouter which reuses OpenAIEmbeddings)
- embedding_utils.py: `generate_embedding{,s_batch}` thread `purpose`
  through; batch path uses functools.partial since run_in_executor
  can't pass kwargs directly
- embedding_processing.py: same purpose-threading for the retain facade
- memory_engine.py: 2 query callsites (recall + reflect search) tagged
  `purpose="query"`

Tests cover: prefix application per purpose, default purpose="document",
empty-prefix byte-identical pass-through, async executor path threads
purpose correctly, config reads env vars. 10 new + 76 existing
embedding-related tests pass.
