
Commit 1598fc3

docs: add vLLM embedders (#11151)
1 parent 3d0a2fb commit 1598fc3

8 files changed

Lines changed: 635 additions & 1 deletion

File tree

docs-website/docs/pipeline-components/embedders.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -56,5 +56,7 @@ These are the Embedders available in Haystack:
 | [STACKITDocumentEmbedder](embedders/stackitdocumentembedder.mdx) | Enables document embedding using the STACKIT API. |
 | [VertexAITextEmbedder](embedders/vertexaitextembedder.mdx) | Computes embeddings for text (such as a query) using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAITextEmbedder](embedders/googlegenaitextembedder.mdx) integration instead._** |
 | [VertexAIDocumentEmbedder](embedders/vertexaidocumentembedder.mdx) | Computes embeddings for documents using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAIDocumentEmbedder](embedders/googlegenaidocumentembedder.mdx) integration instead._** |
+| [VLLMTextEmbedder](embedders/vllmtextembedder.mdx) | Computes the embeddings of a string using models served with vLLM. |
+| [VLLMDocumentEmbedder](embedders/vllmdocumentembedder.mdx) | Computes the embeddings of a list of documents using models served with vLLM. |
 | [WatsonxTextEmbedder](embedders/watsonxtextembedder.mdx) | Computes embeddings for text (such as a query) using IBM Watsonx models. |
 | [WatsonxDocumentEmbedder](embedders/watsonxdocumentembedder.mdx) | Computes embeddings for documents using IBM Watsonx models. |
```
docs-website/docs/pipeline-components/embedders/vllmdocumentembedder.mdx

Lines changed: 175 additions & 0 deletions
---
title: "VLLMDocumentEmbedder"
id: vllmdocumentembedder
slug: "/vllmdocumentembedder"
description: "This component computes the embeddings of a list of documents using models served with vLLM."
---

# VLLMDocumentEmbedder

This component computes the embeddings of a list of documents using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `documents`: A list of documents |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMDocumentEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMDocumentEmbedder` computes the embeddings of a list of documents and stores the resulting vectors in the `embedding` field of each document. It expects a vLLM server to be running and reachable at the URL set by the `api_base_url` parameter (`http://localhost:8000/v1` by default). To embed a single string, such as a query, use [`VLLMTextEmbedder`](vllmtextembedder.mdx).

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant ones.

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.
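For example, here is a minimal configuration sketch that reads the key from the environment through the `Secret` API (it assumes the server was started with a matching `--api-key` value):

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# The key is resolved from the VLLM_API_KEY environment variable at runtime,
# so it is never hard-coded or serialized in plain text.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    api_key=Secret.from_env_var("VLLM_API_KEY"),
)
```

This only configures the component; computing embeddings still requires a reachable vLLM server.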
### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.
### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```
### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
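As a sketch (assuming the served model supports Matryoshka truncation):

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# Ask the server to truncate each output vector to 256 dimensions.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    dimensions=256,
)
```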
### Batching and failure handling

`VLLMDocumentEmbedder` encodes documents in batches. Use `batch_size` (default `32`) to control how many documents are sent in a single request to the vLLM server, and `progress_bar` to toggle the progress indicator.

By default (`raise_on_failure=False`), failed embedding requests are logged and processing continues with the remaining documents. Set `raise_on_failure=True` to raise an exception instead.
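For example, this configuration sketch sends larger batches and fails fast on errors:

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# Send 64 documents per request, hide the progress bar, and raise
# an exception on the first failed request instead of skipping it.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    batch_size=64,
    progress_bar=False,
    raise_on_failure=True,
)
```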
### Instructions

Some embedding models require prepending the document text with an instruction to work better for retrieval. For example, if you use [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2), you should prefix your document with the following instruction: "passage:".

This is how it works with `VLLMDocumentEmbedder`:

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

instruction = "passage:"
embedder = VLLMDocumentEmbedder(
    model="intfloat/e5-large-v2",
    prefix=instruction,
)
```

### Embedding metadata

Documents often come with a set of metadata. If the metadata fields are distinctive and semantically meaningful, you can embed them along with the text of the document to improve retrieval. Pass the relevant fields through `meta_fields_to_embed`; they are concatenated to the document text using `embedding_separator` (a newline by default):

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    meta_fields_to_embed=["title"],
)

docs_with_embeddings = embedder.run(documents=[doc])["documents"]
```
## Usage

Install the `vllm-haystack` package to use the `VLLMDocumentEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
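To require authentication, the server can also be started with the `--api-key` flag mentioned above (a sketch; the environment variable here is a placeholder for your own key):

```bash
vllm serve google/embeddinggemma-300m --api-key "$VLLM_API_KEY"
```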
### On its own

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

## [-0.0215301513671875, 0.01499176025390625, ...]
```
### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("document_embedder", document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("document_embedder", "writer")

indexing_pipeline.run({"document_embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```
docs-website/docs/pipeline-components/embedders/vllmtextembedder.mdx

Lines changed: 138 additions & 0 deletions
---
title: "VLLMTextEmbedder"
id: vllmtextembedder
slug: "/vllmtextembedder"
description: "This component computes the embeddings of a string using models served with vLLM."
---

# VLLMTextEmbedder

This component computes the embeddings of a string using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A vector (list of float numbers) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMTextEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMTextEmbedder` expects a vLLM server to be running and reachable at the URL set by the `api_base_url` parameter (`http://localhost:8000/v1` by default). Use this component to embed a simple string, such as a query, into a vector. For embedding lists of documents, use [`VLLMDocumentEmbedder`](vllmdocumentembedder.mdx).

When you perform embedding retrieval, use this component first to transform your query into a vector. Then, the embedding Retriever will use the vector to search for similar or relevant documents.
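To illustrate the comparison step, here is a minimal sketch of how a retriever ranks documents by cosine similarity between the query vector and each document vector (plain Python, independent of Haystack and vLLM; the vectors are toy values):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The query vector is compared against each document vector;
# the document with the highest score is the most relevant.
query_emb = [1.0, 0.0, 1.0]
doc_embs = {"doc_a": [1.0, 0.0, 0.9], "doc_b": [0.0, 1.0, 0.0]}
best = max(doc_embs, key=lambda d: cosine_similarity(query_emb, doc_embs[d]))
print(best)  # doc_a scores higher than doc_b
```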
If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.
### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```
### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
### Instructions

Some embedding models require prepending the text with an instruction to work better for retrieval. For example, if you use [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5#model-list), you should prefix your query with the following instruction: "Represent this sentence for searching relevant passages:".

This is how it works with `VLLMTextEmbedder`:

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

instruction = "Represent this sentence for searching relevant passages:"
embedder = VLLMTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    prefix=instruction,
)
```
## Usage

Install the `vllm-haystack` package to use the `VLLMTextEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
### On its own

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

text_embedder = VLLMTextEmbedder(model="google/embeddinggemma-300m")
print(text_embedder.run("I love pizza!"))

## {'embedding': [-0.0215301513671875, 0.01499176025390625, ...], 'meta': {...}}
```
### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```

docs-website/sidebars.js

Lines changed: 2 additions & 0 deletions
```diff
@@ -312,6 +312,8 @@ export default {
 'pipeline-components/embedders/stackittextembedder',
 'pipeline-components/embedders/vertexaidocumentembedder',
 'pipeline-components/embedders/vertexaitextembedder',
+'pipeline-components/embedders/vllmdocumentembedder',
+'pipeline-components/embedders/vllmtextembedder',
 'pipeline-components/embedders/watsonxdocumentembedder',
 'pipeline-components/embedders/watsonxtextembedder',
 'pipeline-components/embedders/external-integrations-embedders',
```

docs-website/versioned_docs/version-2.28-unstable/pipeline-components/embedders.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -56,5 +56,7 @@ These are the Embedders available in Haystack:
 | [STACKITDocumentEmbedder](embedders/stackitdocumentembedder.mdx) | Enables document embedding using the STACKIT API. |
 | [VertexAITextEmbedder](embedders/vertexaitextembedder.mdx) | Computes embeddings for text (such as a query) using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAITextEmbedder](embedders/googlegenaitextembedder.mdx) integration instead._** |
 | [VertexAIDocumentEmbedder](embedders/vertexaidocumentembedder.mdx) | Computes embeddings for documents using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAIDocumentEmbedder](embedders/googlegenaidocumentembedder.mdx) integration instead._** |
+| [VLLMTextEmbedder](embedders/vllmtextembedder.mdx) | Computes the embeddings of a string using models served with vLLM. |
+| [VLLMDocumentEmbedder](embedders/vllmdocumentembedder.mdx) | Computes the embeddings of a list of documents using models served with vLLM. |
 | [WatsonxTextEmbedder](embedders/watsonxtextembedder.mdx) | Computes embeddings for text (such as a query) using IBM Watsonx models. |
 | [WatsonxDocumentEmbedder](embedders/watsonxdocumentembedder.mdx) | Computes embeddings for documents using IBM Watsonx models. |
```
