Add dimensions parameter to OllamaDocumentEmbedder and OllamaTextEmbedder
Is your feature request related to a problem? Please describe.
The Ollama SDK (ollama-python >= 0.6.2) exposes a dimensions parameter on client.embed(...) that allows server-side embedding truncation via Matryoshka Representation Learning (MRL). Models such as qwen3-embedding, nomic-embed-text-v1.5, and mxbai-embed-large natively support reduced dimensions — useful for reducing vector store footprint, similarity search latency, and memory usage in HNSW indexes.
Currently, OllamaDocumentEmbedder and OllamaTextEmbedder in Haystack do not expose this parameter. Passing it through generation_kwargs (which is forwarded as options=...) does not work, because the Ollama SDK treats dimensions as a top-level argument of the request payload, not as part of options. As a result, users always receive the full-dimension vector and must truncate and re-normalize it client-side, which wastes bandwidth and adds redundant boilerplate everywhere the embedder is used.
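For reference, the client-side workaround looks roughly like the sketch below. The helper name truncate_and_renormalize and the sample vector are made up for illustration; it assumes the model's leading components are meaningful on their own (as with MRL-trained models) and that downstream code expects unit-length vectors.

```python
import math


def truncate_and_renormalize(embedding: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` components and rescale to unit L2 norm.

    Hypothetical helper: this is the boilerplate each consumer currently
    has to carry, not part of Haystack or the Ollama SDK.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]


full_vector = [3.0, 4.0, 12.0]  # made-up values, not real model output
short_vector = truncate_and_renormalize(full_vector, 2)
```

Note that the full vector still has to travel from the server before this runs, which is the bandwidth cost mentioned above.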
Describe the solution you'd like
Add an optional dimensions: int | None = None parameter to OllamaDocumentEmbedder.__init__ (and OllamaTextEmbedder for API symmetry), forwarded to self._client.embed(...):
class OllamaDocumentEmbedder:
    def __init__(
        self,
        model: str = "nomic-embed-text",
        url: str = "http://localhost:11434",
        generation_kwargs: dict[str, Any] | None = None,
        timeout: int = 120,
        keep_alive: str | int | None = None,
        dimensions: int | None = None,  # NEW
        # ... other args
    ):
        ...
        self.dimensions = dimensions

    def _embed_batch(
        self,
        texts_to_embed: list[str],
        batch_size: int,
        generation_kwargs: dict[str, Any] | None = None,
    ) -> list[list[float]]:
        all_embeddings = []
        for i in tqdm(
            range(0, len(texts_to_embed), batch_size),
            disable=not self.progress_bar,
            desc="Calculating embeddings",
        ):
            batch = texts_to_embed[i : i + batch_size]
            result = self._client.embed(
                model=self.model,
                input=batch,
                options=generation_kwargs,
                dimensions=self.dimensions,  # HERE
                keep_alive=self.keep_alive,
            )
            all_embeddings.extend(result["embeddings"])
        return all_embeddings
Implementation notes
- dimensions=None preserves current behavior (full-dim) → fully backward compatible.
- Include dimensions in to_dict() / from_dict() for pipeline serialization.
- Apply the same change to OllamaTextEmbedder for API parity.
- Optional validation: warn when dimensions is set but the model does not support MRL (the Ollama server already returns an error in that case, so this may be unnecessary).
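The serialization note could be sketched as follows. EmbedderConfigSketch is a hypothetical stand-in class, not the real component, and it hand-rolls the type / init_parameters layout instead of calling Haystack's own serialization helpers so that the snippet runs on its own. The point is only that dimensions round-trips and that older serialized pipelines without the key still load.

```python
from __future__ import annotations

from typing import Any


class EmbedderConfigSketch:
    """Hypothetical stand-in for the embedder; only serialization is shown."""

    def __init__(self, model: str = "nomic-embed-text", dimensions: int | None = None):
        self.model = model
        self.dimensions = dimensions

    def to_dict(self) -> dict[str, Any]:
        # Assumed layout: a type string plus the init parameters.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"model": self.model, "dimensions": self.dimensions},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> EmbedderConfigSketch:
        # A missing "dimensions" key falls back to the default of None.
        return cls(**data["init_parameters"])


restored = EmbedderConfigSketch.from_dict(EmbedderConfigSketch(dimensions=512).to_dict())
legacy = EmbedderConfigSketch.from_dict(
    {"type": "unused", "init_parameters": {"model": "nomic-embed-text"}}
)
```

Because the default is None, a pipeline serialized before the change deserializes to the same full-dimension behavior as today.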
Describe alternatives you've considered
- Truncate + re-normalize client-side: works, but wastes bandwidth (the full vector is still transferred) and adds boilerplate to every consumer of the embedder.
- Pass via generation_kwargs / options: does not work, because dimensions is not a field of Ollama's options; it is a top-level field of the /api/embed request payload.
- Subclass the embedder: works as a local workaround but becomes tech debt, since this is a first-class parameter of the upstream SDK and belongs on the Haystack component.
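To illustrate why the generation_kwargs route fails, here is a rough sketch of the two request shapes. build_embed_payload is a hypothetical helper written for this issue, not an SDK function, and it only models the fields discussed here (model, input, options, dimensions).

```python
from __future__ import annotations

from typing import Any


def build_embed_payload(
    model: str,
    texts: list[str],
    options: dict[str, Any] | None = None,
    dimensions: int | None = None,
) -> dict[str, Any]:
    """Assemble a dict shaped like the /api/embed request body (illustrative only)."""
    payload: dict[str, Any] = {"model": model, "input": texts}
    if options is not None:
        payload["options"] = options
    if dimensions is not None:
        payload["dimensions"] = dimensions  # top-level field, per the description above
    return payload


# What happens today when a user tries generation_kwargs={"dimensions": 512}:
nested = build_embed_payload("qwen3-embedding", ["hello"], options={"dimensions": 512})

# What the proposed parameter would send instead:
top_level = build_embed_payload("qwen3-embedding", ["hello"], dimensions=512)
```

In the first payload the value ends up under options, where it is not a recognized field; only the second places it where the endpoint looks for it.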
Additional context
- Ollama SDK reference: ollama-python >= 0.6.2, embed() accepts dimensions: Optional[int].
- Ollama server: supports dimensions on the /api/embed endpoint (MRL truncation).
- Real-world use case: running qwen3-embedding-0.6b locally via Ollama in a hybrid retrieval pipeline (dense + SPLADE). Truncating from 1024 → 512 dims roughly halves HNSW index size with minimal recall loss when a cross-encoder re-ranker is in place.
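A back-of-the-envelope check of the index size claim, as a sketch under stated assumptions: vectors stored as float32 (4 bytes per value), a made-up corpus of one million documents, and HNSW graph links ignored (they do not shrink with the dimension, which is why the saving is "roughly" half rather than exactly half).

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (no graph links, no metadata)."""
    return num_vectors * dimensions * bytes_per_value


NUM_DOCS = 1_000_000  # hypothetical corpus size, for illustration only

full_size = raw_vector_bytes(NUM_DOCS, 1024)
reduced_size = raw_vector_bytes(NUM_DOCS, 512)

print(f"1024 dims: {full_size / 1e9:.2f} GB, 512 dims: {reduced_size / 1e9:.2f} GB")
```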