feat(embeddings): provider-common query/document prefix support #1585
valda wants to merge 1 commit into
Adds HINDSIGHT_API_EMBEDDINGS_{QUERY,DOC}_PREFIX env vars and a
template-method `Embeddings.encode(texts, purpose)` that applies the
configured prefix before delegating to subclass `_encode_impl(texts)`.
Empty prefixes (default) preserve byte-identical pre-existing behavior.
Asymmetric retrieval models (ruri v3, intfloat/multilingual-e5-*,
BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*) need distinct
query/document prefixes to reach published retrieval scores. Without
this, callers either lose retrieval quality silently or wrap the encoder
externally — duplicating prefix logic for every deployment.
Patch surface:
- config.py: 2 new env vars + dataclass fields
- embeddings.py: base class gets concrete `encode(texts, purpose)` and
`_apply_prefix` helper; all 7 backends (Local, TEI, OpenAI, Cohere,
LiteLLM, LiteLLMSDK, Gemini) rename `encode` → `_encode_impl` and
accept query_prefix/doc_prefix via super().__init__
- create_embeddings_from_env: prefix kwargs plumbed uniformly to every
provider branch (8 incl. openrouter which reuses OpenAIEmbeddings)
- embedding_utils.py: `generate_embedding{,s_batch}` thread `purpose`
through; batch path uses functools.partial since run_in_executor
can't pass kwargs directly
- embedding_processing.py: same purpose-threading for the retain facade
- memory_engine.py: 2 query callsites (recall + reflect search) tagged
`purpose="query"`
Tests cover: prefix application per purpose, default purpose="document",
empty-prefix byte-identical pass-through, async executor path threads
purpose correctly, config reads env vars. 10 new + 76 existing
embedding-related tests pass.
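The executor detail mentioned above (binding `purpose` with `functools.partial` because `run_in_executor` forwards only positional arguments) can be sketched as follows. `encode_batch` and this simplified `generate_embeddings_batch` are illustrative stand-ins, not the real Hindsight helpers:

```python
import asyncio
import functools

def encode_batch(texts, purpose="document"):
    # Stand-in for the synchronous embedding call; the real code delegates
    # to Embeddings.encode(texts, purpose).
    return [f"{purpose}:{t}" for t in texts]

async def generate_embeddings_batch(texts, purpose="document"):
    loop = asyncio.get_running_loop()
    # run_in_executor passes only positional args to the callable, so the
    # keyword argument must be bound up front with functools.partial.
    return await loop.run_in_executor(
        None, functools.partial(encode_batch, texts, purpose=purpose)
    )

result = asyncio.run(generate_embeddings_batch(["hello"], purpose="query"))
```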
Summary
Adds two new environment variables and a thin template-method layer so every embedding backend can prepend distinct prefixes to query-side vs document-side texts. Required by asymmetric retrieval models (Ruri v3, intfloat/multilingual-e5-*, BAAI/bge-large-zh-v1.5, Snowflake/snowflake-arctic-embed-*, etc.) to reach published retrieval scores. Empty prefixes (default) preserve byte-identical pre-existing behavior — this is a zero-risk addition for users who don't opt in.
New env vars:
- `HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX`
- `HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX`

Set both for asymmetric models; leave both empty for symmetric models. For Ruri v3, use the prefix strings given in its model card.
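As a rough sketch of the opt-in, the two variables can be set and read back as below. The placeholder prefix values are hypothetical; substitute the exact strings from your model's card (the Ruri v3 card specifies its own query/document prefixes):

```python
import os

# Hypothetical placeholder values; copy the exact prefix strings from
# the model card of the asymmetric model you deploy.
os.environ["HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX"] = "<query prefix from model card>"
os.environ["HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX"] = "<doc prefix from model card>"

# Reading them back the way a from_env()-style loader would, with empty
# defaults so symmetric models are unaffected:
query_prefix = os.environ.get("HINDSIGHT_API_EMBEDDINGS_QUERY_PREFIX", "")
doc_prefix = os.environ.get("HINDSIGHT_API_EMBEDDINGS_DOC_PREFIX", "")
```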
Design
The `Embeddings` base class becomes a template method:

- `__init__(query_prefix="", doc_prefix="")` — both stored as instance attributes.
- `encode(self, texts, purpose: Literal["query", "document"] = "document")` applies the configured prefix via `_apply_prefix(texts, purpose)`, then delegates to an abstract `_encode_impl(texts)`.
- Each backend renames `encode` → `_encode_impl` and accepts `query_prefix`/`doc_prefix` kwargs that it forwards via `super().__init__(...)`.

This puts prefix application in exactly one place — structurally impossible to forget for new backends, and zero per-backend code duplication. The default empty prefix is a no-op (the `_apply_prefix` helper short-circuits when both prefixes are empty), so backends that previously passed `encode(texts)` through directly are byte-identical.

`create_embeddings_from_env()` reads `config.embeddings_query_prefix`/`config.embeddings_doc_prefix` once and plumbs them to every provider branch (local / TEI / OpenAI / openrouter / Cohere / LiteLLM / LiteLLMSDK / Google) via a single `prefix_kwargs` dict — uniform plumbing, so missing one branch can't create a silent "prefix not applied for provider X" bug.

Internal callsites that are actually queries (`memory_engine.py:3005` primary recall, `:6078` reflect mental-model search) now pass `purpose="query"` through `generate_embeddings_batch`. All other call sites remain on the default `purpose="document"`. The async path uses `functools.partial` to bind `purpose` before `run_in_executor`, since the executor API doesn't accept kwargs directly.

Files touched
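The design above can be sketched minimally as follows. The toy `EchoEmbeddings` backend and the sample prefix strings are illustrative only, not the real backends or any model's actual prefixes:

```python
from abc import ABC, abstractmethod
from typing import Literal

Purpose = Literal["query", "document"]

class Embeddings(ABC):
    def __init__(self, query_prefix: str = "", doc_prefix: str = ""):
        self.query_prefix = query_prefix
        self.doc_prefix = doc_prefix

    def _apply_prefix(self, texts: list[str], purpose: Purpose) -> list[str]:
        # Short-circuit: with both prefixes empty (the default), inputs
        # pass through untouched, byte-identical to the old behavior.
        if not self.query_prefix and not self.doc_prefix:
            return texts
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        return [prefix + t for t in texts]

    def encode(self, texts: list[str], purpose: Purpose = "document"):
        # Template method: prefixing happens in exactly one place,
        # then the subclass hook does the actual embedding.
        return self._encode_impl(self._apply_prefix(texts, purpose))

    @abstractmethod
    def _encode_impl(self, texts: list[str]):
        ...

class EchoEmbeddings(Embeddings):
    # Toy backend that returns its inputs so prefixing is observable.
    def _encode_impl(self, texts: list[str]):
        return texts

enc = EchoEmbeddings(query_prefix="query: ", doc_prefix="passage: ")
```

A subclass cannot forget the prefix step, because `encode` lives only on the base class and subclasses implement `_encode_impl` instead.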
- `hindsight_api/config.py`: 2 env var bindings + dataclass fields + `from_env()` wiring (+10 lines)
- `hindsight_api/engine/embeddings.py`: base class template method + 7 backend renames + factory `prefix_kwargs` plumbing (~177 lines net of churn)
- `hindsight_api/engine/retain/embedding_utils.py`: thread `purpose` through `generate_embedding`/`generate_embeddings_batch` (+ `functools.partial` for the executor)
- `hindsight_api/engine/retain/embedding_processing.py`: thread `purpose` through the retain facade
- `hindsight_api/engine/memory_engine.py`: tag 2 query callsites with `purpose="query"`
- `tests/test_embeddings_prefix.py`: 10 new tests

Total: +298 / -84.
Compatibility note
This is the authoritative prefix-injection mechanism for Hindsight-mediated embeddings. If a downstream provider (e.g., a TEI deployment) is independently configured to prepend its own prefix, the prefix would be double-applied. Set Hindsight's env vars to empty in that case. Documented as a comment in `config.py`.

Validation
Tests cover: prefix application per purpose, default `purpose="document"`, empty-prefix byte-identical pass-through, async executor path threads `purpose`, config reads env vars + empty defaults.

Test plan
- `uv run pytest tests/test_embeddings_prefix.py` — 10/10 pass
- `uv run pytest tests/test_embeddings_openai_batch_size.py tests/test_custom_embedding_dimension.py tests/test_gemini_embeddings.py tests/test_litellm_sdk_embeddings.py` — 76 pass / 16 skipped (network-dependent), 0 fail
- `ruff check` clean on touched files
- `ruff format` no diffs on touched files
- `LocalSTEmbeddings(model).encode([text])` returns identical vectors with or without the patch (template method short-circuits)
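The first three covered behaviors could be tested roughly as below. This is a self-contained sketch with a toy `FakeEmbeddings` stand-in, not the contents of the real `tests/test_embeddings_prefix.py`:

```python
class FakeEmbeddings:
    # Toy stand-in mirroring the base-class contract: prefix, then encode.
    def __init__(self, query_prefix="", doc_prefix=""):
        self.query_prefix, self.doc_prefix = query_prefix, doc_prefix

    def encode(self, texts, purpose="document"):
        if not self.query_prefix and not self.doc_prefix:
            return list(texts)  # empty-prefix pass-through
        prefix = self.query_prefix if purpose == "query" else self.doc_prefix
        return [prefix + t for t in texts]

def test_query_prefix_applied():
    enc = FakeEmbeddings(query_prefix="Q: ", doc_prefix="D: ")
    assert enc.encode(["a"], purpose="query") == ["Q: a"]

def test_default_purpose_is_document():
    enc = FakeEmbeddings(query_prefix="Q: ", doc_prefix="D: ")
    assert enc.encode(["a"]) == ["D: a"]

def test_empty_prefix_passthrough():
    assert FakeEmbeddings().encode(["a"]) == ["a"]
```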