
Commit 1598fc3

docs: add vLLM embedders (#11151)
1 parent 3d0a2fb commit 1598fc3

8 files changed

Lines changed: 635 additions & 1 deletion

File tree

docs-website/docs/pipeline-components/embedders.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -56,5 +56,7 @@ These are the Embedders available in Haystack:
 | [STACKITDocumentEmbedder](embedders/stackitdocumentembedder.mdx) | Enables document embedding using the STACKIT API. |
 | [VertexAITextEmbedder](embedders/vertexaitextembedder.mdx) | Computes embeddings for text (such as a query) using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAITextEmbedder](embedders/googlegenaitextembedder.mdx) integration instead._** |
 | [VertexAIDocumentEmbedder](embedders/vertexaidocumentembedder.mdx) | Computes embeddings for documents using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAIDocumentEmbedder](embedders/googlegenaidocumentembedder.mdx) integration instead._** |
+| [VLLMTextEmbedder](embedders/vllmtextembedder.mdx) | Computes the embeddings of a string using models served with vLLM. |
+| [VLLMDocumentEmbedder](embedders/vllmdocumentembedder.mdx) | Computes the embeddings of a list of documents using models served with vLLM. |
 | [WatsonxTextEmbedder](embedders/watsonxtextembedder.mdx) | Computes embeddings for text (such as a query) using IBM Watsonx models. |
 | [WatsonxDocumentEmbedder](embedders/watsonxdocumentembedder.mdx) | Computes embeddings for documents using IBM Watsonx models. |
```
docs-website/docs/pipeline-components/embedders/vllmdocumentembedder.mdx

Lines changed: 175 additions & 0 deletions
---
title: "VLLMDocumentEmbedder"
id: vllmdocumentembedder
slug: "/vllmdocumentembedder"
description: "This component computes the embeddings of a list of documents using models served with vLLM."
---

# VLLMDocumentEmbedder

This component computes the embeddings of a list of documents using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `documents`: A list of documents |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMDocumentEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMDocumentEmbedder` computes the embeddings of a list of documents and stores the resulting vectors in the `embedding` field of each document. It expects a vLLM server to be running and reachable at the URL set by the `api_base_url` parameter (`http://localhost:8000/v1` by default). To embed a single string, such as a query, use [`VLLMTextEmbedder`](vllmtextembedder.mdx).

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant ones.

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.
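For example, here is a minimal configuration sketch that reads the key from the environment through the `Secret` API (it assumes the server was started with a matching `--api-key` value):

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# The key is resolved from the VLLM_API_KEY environment variable at runtime,
# so it is never hard-coded or serialized in plain text.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    api_key=Secret.from_env_var("VLLM_API_KEY"),
)
```

This only configures the component; computing embeddings still requires a reachable vLLM server.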
### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.
### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```
### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
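As a sketch (assuming the served model supports Matryoshka truncation):

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# Ask the server to truncate each output vector to 256 dimensions.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    dimensions=256,
)
```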
### Batching and failure handling

`VLLMDocumentEmbedder` encodes documents in batches. Use `batch_size` (default `32`) to control how many documents are sent in a single request to the vLLM server, and `progress_bar` to toggle the progress indicator.

By default (`raise_on_failure=False`), failed embedding requests are logged and processing continues with the remaining documents. Set `raise_on_failure=True` to raise an exception instead.
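For example, this configuration sketch sends larger batches and fails fast on errors:

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

# Send 64 documents per request, hide the progress bar, and raise
# an exception on the first failed request instead of skipping it.
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    batch_size=64,
    progress_bar=False,
    raise_on_failure=True,
)
```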
### Instructions

Some embedding models require prepending the document text with an instruction to work better for retrieval. For example, if you use [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2), you should prefix your document with the following instruction: "passage:".

This is how it works with `VLLMDocumentEmbedder`:

```python
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

instruction = "passage:"
embedder = VLLMDocumentEmbedder(
    model="intfloat/e5-large-v2",
    prefix=instruction,
)
```

### Embedding metadata

Documents often come with a set of metadata. If the metadata fields are distinctive and semantically meaningful, you can embed them along with the text of the document to improve retrieval. Pass the relevant fields through `meta_fields_to_embed`; they are concatenated to the document text using `embedding_separator` (a newline by default):

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    meta_fields_to_embed=["title"],
)

docs_with_embeddings = embedder.run(documents=[doc])["documents"]
```
## Usage

Install the `vllm-haystack` package to use the `VLLMDocumentEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
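To require authentication, the server can also be started with the `--api-key` flag mentioned above (a sketch; the environment variable here is a placeholder for your own key):

```bash
vllm serve google/embeddinggemma-300m --api-key "$VLLM_API_KEY"
```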
### On its own

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

## [-0.0215301513671875, 0.01499176025390625, ...]
```
### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("document_embedder", document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("document_embedder", "writer")

indexing_pipeline.run({"document_embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```
docs-website/docs/pipeline-components/embedders/vllmtextembedder.mdx

Lines changed: 138 additions & 0 deletions
---
title: "VLLMTextEmbedder"
id: vllmtextembedder
slug: "/vllmtextembedder"
description: "This component computes the embeddings of a string using models served with vLLM."
---

# VLLMTextEmbedder

This component computes the embeddings of a string using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A vector (list of float numbers) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMTextEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMTextEmbedder` expects a vLLM server to be running and reachable at the URL set by the `api_base_url` parameter (`http://localhost:8000/v1` by default). Use this component to embed a simple string, such as a query, into a vector. For embedding lists of documents, use [`VLLMDocumentEmbedder`](vllmdocumentembedder.mdx).

When you perform embedding retrieval, use this component first to transform your query into a vector. Then, the embedding Retriever will use the vector to search for similar or relevant documents.
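To illustrate the comparison step, here is a minimal sketch of how a retriever ranks documents by cosine similarity between the query vector and each document vector (plain Python, independent of Haystack and vLLM; the vectors are toy values):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The query vector is compared against each document vector;
# the document with the highest score is the most relevant.
query_emb = [1.0, 0.0, 1.0]
doc_embs = {"doc_a": [1.0, 0.0, 0.9], "doc_b": [0.0, 1.0, 0.0]}
best = max(doc_embs, key=lambda d: cosine_similarity(query_emb, doc_embs[d]))
print(best)  # doc_a scores higher than doc_b
```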
If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.
### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```
### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
### Instructions

Some embedding models require prepending the text with an instruction to work better for retrieval. For example, if you use [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5#model-list), you should prefix your query with the following instruction: "Represent this sentence for searching relevant passages:".

This is how it works with `VLLMTextEmbedder`:

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

instruction = "Represent this sentence for searching relevant passages:"
embedder = VLLMTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    prefix=instruction,
)
```
## Usage

Install the `vllm-haystack` package to use the `VLLMTextEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
### On its own

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

text_embedder = VLLMTextEmbedder(model="google/embeddinggemma-300m")
print(text_embedder.run("I love pizza!"))

## {'embedding': [-0.0215301513671875, 0.01499176025390625, ...], 'meta': {...}}
```
### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```

docs-website/sidebars.js

Lines changed: 2 additions & 0 deletions
```diff
@@ -312,6 +312,8 @@ export default {
 'pipeline-components/embedders/stackittextembedder',
 'pipeline-components/embedders/vertexaidocumentembedder',
 'pipeline-components/embedders/vertexaitextembedder',
+'pipeline-components/embedders/vllmdocumentembedder',
+'pipeline-components/embedders/vllmtextembedder',
 'pipeline-components/embedders/watsonxdocumentembedder',
 'pipeline-components/embedders/watsonxtextembedder',
 'pipeline-components/embedders/external-integrations-embedders',
```

docs-website/versioned_docs/version-2.28-unstable/pipeline-components/embedders.mdx

Lines changed: 2 additions & 0 deletions
```diff
@@ -56,5 +56,7 @@ These are the Embedders available in Haystack:
 | [STACKITDocumentEmbedder](embedders/stackitdocumentembedder.mdx) | Enables document embedding using the STACKIT API. |
 | [VertexAITextEmbedder](embedders/vertexaitextembedder.mdx) | Computes embeddings for text (such as a query) using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAITextEmbedder](embedders/googlegenaitextembedder.mdx) integration instead._** |
 | [VertexAIDocumentEmbedder](embedders/vertexaidocumentembedder.mdx) | Computes embeddings for documents using models through VertexAI Embeddings API. **_This integration will be deprecated soon. We recommend using [GoogleGenAIDocumentEmbedder](embedders/googlegenaidocumentembedder.mdx) integration instead._** |
+| [VLLMTextEmbedder](embedders/vllmtextembedder.mdx) | Computes the embeddings of a string using models served with vLLM. |
+| [VLLMDocumentEmbedder](embedders/vllmdocumentembedder.mdx) | Computes the embeddings of a list of documents using models served with vLLM. |
 | [WatsonxTextEmbedder](embedders/watsonxtextembedder.mdx) | Computes embeddings for text (such as a query) using IBM Watsonx models. |
 | [WatsonxDocumentEmbedder](embedders/watsonxdocumentembedder.mdx) | Computes embeddings for documents using IBM Watsonx models. |
```
