Commit c3069c8

feat: add vLLM embedders (#3163)

* feat: add vLLM embedders
* improvements
* integration tests on the ci
* fixes
* lower bound pin for more-itertools
* more pins
* rm serde methods

1 parent 1c1c57e · commit c3069c8

16 files changed: 1079 additions & 30 deletions
File tree

.github/workflows/vllm.yml

Lines changed: 32 additions & 5 deletions

```diff
@@ -30,6 +30,7 @@ env:
   PYTHONUNBUFFERED: "1"
   FORCE_COLOR: "1"
   VLLM_MODEL: "Qwen/Qwen3-0.6B"
+  VLLM_EMBEDDING_MODEL: "sentence-transformers/all-MiniLM-L6-v2"
   # we only test on Ubuntu to keep vLLM server running simple
   TEST_MATRIX_OS: '["ubuntu-latest"]'
   # vLLM is not compatible with Python 3.14. https://github.com/vllm-project/vllm/issues/34096
@@ -88,12 +89,13 @@ jobs:
           "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cpu-cp38-abi3-manylinux_2_35_x86_64.whl" \
           --torch-backend cpu

-      - name: Start vLLM server
+      - name: Start vLLM chat server
        env:
          VLLM_TARGET_DEVICE: "cpu"
          VLLM_CPU_KVCACHE_SPACE: "4"
        run: |
          nohup hatch run -- vllm serve ${{ env.VLLM_MODEL }} \
+            --port 8000 \
            --reasoning-parser qwen3 \
            --max-model-len 1024 \
            --enforce-eager \
@@ -102,20 +104,45 @@ jobs:
            --tool-call-parser hermes \
            --max-num-seqs 1 &

-          # Wait for the vLLM server to be ready with a timeout of 300 seconds
+          # Wait for the vLLM chat server to be ready with a timeout of 300 seconds
          timeout=300
          while [ $timeout -gt 0 ] && ! curl -sSf http://localhost:8000/health > /dev/null 2>&1; do
-            echo "Waiting for vLLM server to start..."
+            echo "Waiting for vLLM chat server to start..."
            sleep 10
            ((timeout-=10))
          done

          if [ $timeout -eq 0 ]; then
-            echo "Timed out waiting for vLLM server to start."
+            echo "Timed out waiting for vLLM chat server to start."
            exit 1
          fi

-          echo "vLLM server started successfully."
+          echo "vLLM chat server started successfully."
+
+      - name: Start vLLM embedding server
+        env:
+          VLLM_TARGET_DEVICE: "cpu"
+          VLLM_CPU_KVCACHE_SPACE: "4"
+        run: |
+          nohup hatch run -- vllm serve ${{ env.VLLM_EMBEDDING_MODEL }} \
+            --port 8001 \
+            --enforce-eager \
+            --max-num-seqs 1 &
+
+          # Wait for the vLLM embedding server to be ready with a timeout of 300 seconds
+          timeout=300
+          while [ $timeout -gt 0 ] && ! curl -sSf http://localhost:8001/health > /dev/null 2>&1; do
+            echo "Waiting for vLLM embedding server to start..."
+            sleep 10
+            ((timeout-=10))
+          done
+
+          if [ $timeout -eq 0 ]; then
+            echo "Timed out waiting for vLLM embedding server to start."
+            exit 1
+          fi
+
+          echo "vLLM embedding server started successfully."

      - name: Lint
        if: matrix.python-version == '3.10' && runner.os == 'Linux'
```
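Both server steps rely on the same poll-with-timeout readiness pattern. A minimal Python sketch of that logic, where the `check` callable stands in for the `curl http://localhost:<port>/health` probe (the names and short intervals here are illustrative, not part of the workflow):

```python
import time

def wait_until_ready(check, timeout=300.0, interval=10.0):
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Example: a probe that only succeeds on its third call, like a server
# that is still booting when the first health checks arrive.
attempts = {"n": 0}
def fake_health_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until_ready(fake_health_probe, timeout=5.0, interval=0.01)
# ready → True (the probe succeeds on its third call, well inside the timeout)
```

Like the shell version, the failure path is a plain boolean: the caller decides whether a timeout means `exit 1`.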

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -78,7 +78,7 @@ Please check out our [Contribution Guidelines](CONTRIBUTING.md) for all the deta
 | [togetherai-haystack](integrations/togetherai/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/togetherai-haystack.svg)](https://pypi.org/project/togetherai-haystack) | [![Test / togetherai](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/togetherai.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/togetherai.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-togetherai/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-togetherai/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-togetherai-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-togetherai-combined/htmlcov/index.html) |
 | [unstructured-fileconverter-haystack](integrations/unstructured/) | File converter | [![PyPI - Version](https://img.shields.io/pypi/v/unstructured-fileconverter-haystack.svg)](https://pypi.org/project/unstructured-fileconverter-haystack) | [![Test / unstructured](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/unstructured.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/unstructured.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-unstructured/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-unstructured/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-unstructured-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-unstructured-combined/htmlcov/index.html) |
 | [valkey-haystack](integrations/valkey/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/valkey-haystack.svg)](https://pypi.org/project/valkey-haystack) | [![Test / valkey](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/valkey.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/valkey.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-valkey/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-valkey/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-valkey-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-valkey-combined/htmlcov/index.html) |
-| [vllm-haystack](integrations/vllm/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack) | [![Test / vllm](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/vllm.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/vllm.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-vllm/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-vllm/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-vllm-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-vllm-combined/htmlcov/index.html) |
+| [vllm-haystack](integrations/vllm/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack) | [![Test / vllm](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/vllm.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/vllm.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-vllm/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-vllm/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-vllm-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-vllm-combined/htmlcov/index.html) |
 | [watsonx-haystack](integrations/watsonx/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/watsonx-haystack.svg?color=orange)](https://pypi.org/project/watsonx-haystack) | [![Test / watsonx](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/watsonx.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/watsonx.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-watsonx/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-watsonx/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-watsonx-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-watsonx-combined/htmlcov/index.html) |
 | [weave-haystack](integrations/weave/) | Tracer | [![PyPI - Version](https://img.shields.io/pypi/v/weave-haystack.svg)](https://pypi.org/project/weave-haystack) | [![Test / weave](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/weave.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/weave.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-weave/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-weave/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-weave-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-weave-combined/htmlcov/index.html) |
 | [weaviate-haystack](integrations/weaviate/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/weaviate-haystack.svg)](https://pypi.org/project/weaviate-haystack) | [![Test / weaviate](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/weaviate.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/weaviate.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-weaviate/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-weaviate/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-weaviate-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-weaviate-combined/htmlcov/index.html) |
```

integrations/vllm/README.md

Lines changed: 12 additions & 3 deletions

````diff
@@ -11,10 +11,19 @@

 Refer to the general [Contribution Guidelines](https://github.com/deepset-ai/haystack-core-integrations/blob/main/CONTRIBUTING.md).

-To run integration tests locally, you need to have a running vLLM server. Refer to the [workflow file](https://github.com/deepset-ai/haystack-core-integrations/blob/main/.github/workflows/vllm.yml) for more details.
+To run integration tests locally, you need two vLLM servers running in parallel: one for the chat generator on port `8000` and one for the embedders on port `8001`. Refer to the [workflow file](https://github.com/deepset-ai/haystack-core-integrations/blob/main/.github/workflows/vllm.yml) for more details.

-For example, on macOS, you can install [vLLM-metal](https://github.com/vllm-project/vllm-metal) and run the server with:
+For example, on macOS, you can install [vLLM-metal](https://github.com/vllm-project/vllm-metal) and start the chat generator server with:

 ```bash
-source ~/.venv-vllm-metal/bin/activate && vllm serve Qwen/Qwen3-0.6B --reasoning-parser qwen3 --max-model-len 1024 --enforce-eager --enable-auto-tool-choice --tool-call-parser hermes
+# chat generator server (port 8000)
+source ~/.venv-vllm-metal/bin/activate && vllm serve Qwen/Qwen3-0.6B --reasoning-parser qwen3 --max-model-len 1024 --enforce-eager --enable-auto-tool-choice --tool-call-parser hermes
+```
+
+vLLM-metal does not support embedding models. On macOS, you can run the embedding server via the CPU Docker image:
+
+```bash
+# embedders server (port 8001)
+docker run --rm -p 8001:8000 -e VLLM_CPU_OMP_THREADS_BIND=0-3 vllm/vllm-openai-cpu:latest \
+  --model sentence-transformers/all-MiniLM-L6-v2 --enforce-eager
 ```
````

integrations/vllm/pydoc/config_docusaurus.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -1,6 +1,8 @@
 loaders:
   - modules:
       - haystack_integrations.components.generators.vllm.chat.chat_generator
+      - haystack_integrations.components.embedders.vllm.text_embedder
+      - haystack_integrations.components.embedders.vllm.document_embedder
     search_path: [../src]
 processors:
   - type: filter
```

integrations/vllm/pyproject.toml

Lines changed: 2 additions & 2 deletions

```diff
@@ -22,7 +22,7 @@ classifiers = [
   "Programming Language :: Python :: Implementation :: CPython",
   "Programming Language :: Python :: Implementation :: PyPy",
 ]
-dependencies = ["haystack-ai>=2.23.0", "openai"]
+dependencies = ["haystack-ai>=2.23.0", "openai", "more_itertools>=9.0.0", "tqdm>=4.48.0"]

 [project.urls]
 Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm#readme"
@@ -66,7 +66,7 @@ integration = 'pytest -m "integration" {args:tests}'
 all = 'pytest {args:tests}'
 unit-cov-retry = 'pytest --cov=haystack_integrations --reruns 3 --reruns-delay 30 -x -m "not integration" {args:tests}'
 integration-cov-append-retry = 'pytest --cov=haystack_integrations --cov-append --reruns 3 --reruns-delay 30 -x -m "integration" {args:tests}'
-types = "mypy -p haystack_integrations.components.generators.vllm {args}"
+types = "mypy -p haystack_integrations.components.generators.vllm -p haystack_integrations.components.embedders.vllm -p haystack_integrations.common.vllm {args}"

 [tool.mypy]
 install_types = true
```
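The new `more_itertools>=9.0.0` lower bound suggests the document embedder chunks its inputs into batches before calling the embedding endpoint. A hedged, stdlib-only sketch of that batching idea — `batched` here mirrors `more_itertools.batched`, and is not the integration's actual code:

```python
from itertools import islice

def batched(iterable, n):
    """Yield successive lists of up to n items, like more_itertools.batched."""
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch

# Seven documents embedded in batches of three → 3 requests instead of 7.
texts = [f"doc {i}" for i in range(7)]
batches = list(batched(texts, 3))
# → [['doc 0', 'doc 1', 'doc 2'], ['doc 3', 'doc 4', 'doc 5'], ['doc 6']]
```

Batching keeps each request to the server bounded, which matters here since the CI servers run with `--max-num-seqs 1`.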

integrations/vllm/src/haystack_integrations/common/py.typed

Whitespace-only changes.

Lines changed: 3 additions & 0 deletions

```diff
@@ -0,0 +1,3 @@
+# SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai>
+#
+# SPDX-License-Identifier: Apache-2.0
```
Lines changed: 38 additions & 0 deletions

```diff
@@ -0,0 +1,38 @@
+# SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai>
+#
+# SPDX-License-Identifier: Apache-2.0
+
+from typing import Any
+
+from haystack.utils import Secret
+from haystack.utils.http_client import init_http_client
+from openai import AsyncOpenAI, OpenAI
+
+
+def _create_openai_clients(
+    api_key: Secret | None,
+    api_base_url: str,
+    timeout: float | None,
+    max_retries: int | None,
+    http_client_kwargs: dict[str, Any] | None,
+) -> tuple[OpenAI, AsyncOpenAI]:
+    """
+    Build sync and async OpenAI clients pointing at a vLLM server.
+
+    A placeholder api key is used when the user did not supply one and no `VLLM_API_KEY` env var is set, because the
+    OpenAI client requires a non-empty value.
+    `timeout` and `max_retries` are only forwarded when provided: when None, the OpenAI client's own defaults apply.
+    """
+    resolved_api_key = "placeholder-api-key"
+    if api_key is not None and (value := api_key.resolve_value()):
+        resolved_api_key = value
+
+    client_kwargs: dict[str, Any] = {"api_key": resolved_api_key, "base_url": api_base_url}
+    if timeout is not None:
+        client_kwargs["timeout"] = timeout
+    if max_retries is not None:
+        client_kwargs["max_retries"] = max_retries
+
+    sync_client = OpenAI(http_client=init_http_client(http_client_kwargs, async_client=False), **client_kwargs)
+    async_client = AsyncOpenAI(http_client=init_http_client(http_client_kwargs, async_client=True), **client_kwargs)
+    return sync_client, async_client
```
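The key-resolution branch in the helper above can be exercised in isolation. A minimal sketch with a stand-in `FakeSecret` (the real code uses `haystack.utils.Secret`; `resolve_api_key` is an illustrative extraction, not a function in the integration):

```python
def resolve_api_key(api_key):
    """Fall back to a placeholder when no key is supplied or the secret
    resolves to an empty value, since the OpenAI client rejects empty keys."""
    resolved = "placeholder-api-key"
    if api_key is not None and (value := api_key.resolve_value()):
        resolved = value
    return resolved

class FakeSecret:
    """Stand-in for haystack.utils.Secret: resolve_value() may return None."""
    def __init__(self, value):
        self._value = value
    def resolve_value(self):
        return self._value

print(resolve_api_key(None))                  # placeholder-api-key
print(resolve_api_key(FakeSecret(None)))      # placeholder-api-key
print(resolve_api_key(FakeSecret("")))        # placeholder-api-key
print(resolve_api_key(FakeSecret("my-key")))  # my-key
```

The walrus expression covers both "no secret object" and "secret resolves empty" in one condition, which is why a `Secret` whose env var is unset still yields the placeholder.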

integrations/vllm/src/haystack_integrations/components/embedders/py.typed

Whitespace-only changes.

Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+# SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai>
+#
+# SPDX-License-Identifier: Apache-2.0
+
+from .document_embedder import VLLMDocumentEmbedder
+from .text_embedder import VLLMTextEmbedder
+
+__all__ = ["VLLMDocumentEmbedder", "VLLMTextEmbedder"]
```
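The exported pair follows Haystack's usual embedder split: a text embedder that turns one query string into one vector, and a document embedder that writes vectors back onto document objects. A stdlib-only sketch of that contract — `FakeDoc`, `embed`, and both function names are illustrative stand-ins, not the integration's API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FakeDoc:
    """Stand-in for haystack.Document: content plus an embedding slot."""
    content: str
    embedding: Optional[List[float]] = None

def embed(text):
    # Stand-in for a request to the vLLM /v1/embeddings endpoint.
    return [float(len(text)), 0.0]

def embed_text(text):
    """Text-embedder contract: one string in, one vector out."""
    return embed(text)

def embed_documents(docs):
    """Document-embedder contract: vectors are written back onto the docs."""
    for doc in docs:
        doc.embedding = embed(doc.content)
    return docs

query_vec = embed_text("hello")          # [5.0, 0.0]
docs = embed_documents([FakeDoc("hello"), FakeDoc("hi")])
# docs[0].embedding == [5.0, 0.0], docs[1].embedding == [2.0, 0.0]
```

The split matters in pipelines: the text embedder feeds a retriever at query time, while the document embedder runs at indexing time.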
