
Commit 5212d3b

Sync Core Integrations API reference (vllm) on Docusaurus (#11138)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Parent: 3553f1a · Commit: 5212d3b

11 files changed

Lines changed: 1881 additions & 88 deletions

File tree

  • docs-website
    • reference_versioned_docs
      • version-2.18/integrations-api
      • version-2.19/integrations-api
      • version-2.20/integrations-api
      • version-2.21/integrations-api
      • version-2.22/integrations-api
      • version-2.23/integrations-api
      • version-2.24/integrations-api
      • version-2.25/integrations-api
      • version-2.26/integrations-api
      • version-2.27/integrations-api
    • reference/integrations-api

docs-website/reference/integrations-api/vllm.md

Lines changed: 171 additions & 8 deletions
@@ -118,7 +118,7 @@ Create the OpenAI clients.
 #### run
 
 ```python
-run(documents: list[Document]) -> dict[str, Any]
+run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
 ```
 
 Embed a list of Documents.
@@ -129,14 +129,16 @@ Embed a list of Documents.
 
 **Returns:**
 
-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
   - `documents`: The input documents with their `embedding` field populated.
   - `meta`: Information about the usage of the model.
 
 #### run_async
 
 ```python
-run_async(documents: list[Document]) -> dict[str, Any]
+run_async(
+    documents: list[Document],
+) -> dict[str, list[Document] | dict[str, Any]]
 ```
 
 Asynchronously embed a list of Documents.
@@ -147,7 +149,7 @@ Asynchronously embed a list of Documents.
 
 **Returns:**
 
-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
   - `documents`: The input documents with their `embedding` field populated.
   - `meta`: Information about the usage of the model.

@@ -245,7 +247,7 @@ Create the OpenAI clients.
 #### run
 
 ```python
-run(text: str) -> dict[str, Any]
+run(text: str) -> dict[str, list[float] | dict[str, Any]]
 ```
 
 Embed a single string.
@@ -256,14 +258,14 @@ Embed a single string.
 
 **Returns:**
 
-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with:
   - `embedding`: The embedding of the input text.
   - `meta`: Information about the usage of the model.
 
 #### run_async
 
 ```python
-run_async(text: str) -> dict[str, Any]
+run_async(text: str) -> dict[str, list[float] | dict[str, Any]]
 ```
 
 Asynchronously embed a single string.
@@ -274,7 +276,7 @@ Asynchronously embed a single string.
 
 **Returns:**
 
-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with:
   - `embedding`: The embedding of the input text.
   - `meta`: Information about the usage of the model.

@@ -532,3 +534,164 @@ Run the VLLM chat generator on the given input data asynchronously.
 
 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
   - `replies`: A list containing the generated responses as ChatMessage instances.
+
+## haystack_integrations.components.rankers.vllm.ranker
+
+### VLLMRanker
+
+Ranks Documents based on their similarity to a query using models served with [vLLM](https://docs.vllm.ai/).
+
+It expects a vLLM server to be running and accessible at the `api_base_url` parameter and uses the
+`/rerank` endpoint exposed by vLLM.
+
+### Starting the vLLM server
+
+Before using this component, start a vLLM server with a reranker model:
+
+```bash
+vllm serve BAAI/bge-reranker-base
+```
+
+For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
+
+### Usage example
+
+```python
+from haystack import Document
+from haystack_integrations.components.rankers.vllm import VLLMRanker
+
+ranker = VLLMRanker(model="BAAI/bge-reranker-base")
+docs = [
+    Document(content="The capital of Brazil is Brasilia."),
+    Document(content="The capital of France is Paris."),
+]
+result = ranker.run(query="What is the capital of France?", documents=docs)
+print(result["documents"][0].content)
+```
+
+### Usage example with vLLM-specific parameters
+
+Pass vLLM-specific parameters via the `extra_parameters` dictionary. They are merged into the
+request body sent to the `/rerank` endpoint.
+
+```python
+ranker = VLLMRanker(
+    model="BAAI/bge-reranker-base",
+    extra_parameters={"truncate_prompt_tokens": 256},
+)
+```
+
+#### __init__
+
+```python
+__init__(
+    *,
+    model: str,
+    api_key: Secret | None = Secret.from_env_var("VLLM_API_KEY", strict=False),
+    api_base_url: str = "http://localhost:8000/v1",
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+    meta_fields_to_embed: list[str] | None = None,
+    meta_data_separator: str = "\n",
+    http_client_kwargs: dict[str, Any] | None = None,
+    extra_parameters: dict[str, Any] | None = None
+) -> None
+```
+
+Creates an instance of VLLMRanker.
+
+**Parameters:**
+
+- **model** (<code>str</code>) – The name of the reranker model served by vLLM. Check
+  [vLLM documentation](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#supported-models) for
+  information on supported models.
+- **api_key** (<code>Secret | None</code>) – The vLLM API key. Defaults to the `VLLM_API_KEY` environment variable.
+  Only required if the vLLM server was started with `--api-key`.
+- **api_base_url** (<code>str</code>) – The base URL of the vLLM server.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. If `None`, all documents are returned.
+- **score_threshold** (<code>float | None</code>) – If set, documents with a relevance score below this value are dropped.
+  Applied after `top_k`, so the output may contain fewer than `top_k` documents.
+- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be concatenated with the document
+  content before reranking.
+- **meta_data_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the document content.
+- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
+  `httpx.AsyncClient`. For more information, see the
+  [HTTPX documentation](https://www.python-httpx.org/api/#client).
+- **extra_parameters** (<code>dict\[str, Any\] | None</code>) – Additional parameters merged into the request body sent to the vLLM
+  `/rerank` endpoint. Use this to pass parameters not part of the standard rerank API, such as
+  `truncate_prompt_tokens`. See the
+  [vLLM docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#rerank-api) for more information.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.
+
+#### warm_up
+
+```python
+warm_up() -> None
+```
+
+Create the httpx clients.
+
+#### run
+
+```python
+run(
+    query: str,
+    documents: list[Document],
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+) -> dict[str, list[Document] | dict[str, Any]]
+```
+
+Returns a list of Documents ranked by their similarity to the given query.
+
+**Parameters:**
+
+- **query** (<code>str</code>) – Query string.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to rank.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. Overrides the value set at initialization.
+- **score_threshold** (<code>float | None</code>) – Minimum relevance score required for a document to be returned. Overrides
+  the value set at initialization.
+
+**Returns:**
+
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
+  - `documents`: Documents sorted from most to least relevant.
+  - `meta`: Information about the model and usage.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.
+
+#### run_async
+
+```python
+run_async(
+    query: str,
+    documents: list[Document],
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+) -> dict[str, list[Document] | dict[str, Any]]
+```
+
+Asynchronously returns a list of Documents ranked by their similarity to the given query.
+
+**Parameters:**
+
+- **query** (<code>str</code>) – Query string.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to rank.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. Overrides the value set at initialization.
+- **score_threshold** (<code>float | None</code>) – Minimum relevance score required for a document to be returned. Overrides
+  the value set at initialization.
+
+**Returns:**
+
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
+  - `documents`: Documents sorted from most to least relevant.
+  - `meta`: Information about the model and usage.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.
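Since `run_async` mirrors the signature of `run`, several rerank calls can be awaited concurrently. The sketch below uses a `StubRanker` stand-in (not part of the integration) so the pattern runs without a live vLLM server:

```python
import asyncio
from typing import Any


class StubRanker:
    # Stand-in with the same run_async surface as the real ranker, so the
    # concurrency pattern can be shown without a live vLLM server.
    async def run_async(self, query: str, documents: list[str]) -> dict[str, Any]:
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        return {"documents": documents, "meta": {"query": query}}


async def main() -> list[dict[str, Any]]:
    ranker = StubRanker()
    queries = ["capital of France?", "capital of Brazil?"]
    docs = ["Paris is the capital of France.", "Brasilia is the capital of Brazil."]
    # Awaiting the calls with gather lets them overlap instead of running serially.
    return await asyncio.gather(*(ranker.run_async(q, docs) for q in queries))


results = asyncio.run(main())
print(len(results))  # → 2
```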
