Commit fdffbff

docs: add vLLM Ranker docs page (#11154)

1 parent 8f9b896

6 files changed

Lines changed: 283 additions & 33 deletions


docs-website/docs/pipeline-components/rankers.mdx

Lines changed: 1 addition & 0 deletions
@@ -26,3 +26,4 @@ Rankers are a group of components that order documents by given criteria. Their
 | [TransformersSimilarityRanker](rankers/transformerssimilarityranker.mdx) | A legacy version of [SentenceTransformersSimilarityRanker](rankers/sentencetransformerssimilarityranker.mdx). |
 | [SentenceTransformersDiversityRanker](rankers/sentencetransformersdiversityranker.mdx) | A Diversity Ranker based on Sentence Transformers. |
 | [SentenceTransformersSimilarityRanker](rankers/sentencetransformerssimilarityranker.mdx) | A model-based Ranker that orders documents based on their relevance to the query. It uses a cross-encoder model to produce query and document embeddings. It then compares the similarity of the query embedding to the document embeddings to produce a ranking with the most similar documents appearing first. <br /> <br />It's a powerful Ranker that takes word order and syntax into account. You can use it to improve the initial ranking done by a weaker Retriever, but it's also more expensive computationally than the Rankers that don't use models. |
+| [VLLMRanker](rankers/vllmranker.mdx) | Ranks documents based on their similarity to the query using reranker models served with vLLM. |
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
title: "VLLMRanker"
id: vllmranker
slug: "/vllmranker"
description: "This component ranks documents based on their similarity to the query using reranker models served with vLLM."
---

# VLLMRanker

This component ranks documents based on their similarity to the query using reranker models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | In a query pipeline, after a component that returns a list of documents, such as a [Retriever](../retrievers.mdx) |
| **Mandatory init variables** | `model`: The name of the reranker model served by vLLM |
| **Mandatory run variables** | `query`: A query string <br /> <br />`documents`: A list of document objects |
| **Output variables** | `documents`: A list of document objects |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an HTTP server, which `VLLMRanker` uses to rerank documents through the `/rerank` endpoint.
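For orientation, the request that endpoint receives can be sketched in plain Python. The field names (`model`, `query`, `documents`, `top_n`) follow the Jina-style rerank API that vLLM implements and are an assumption here, not something this page specifies:

```python
import json

# Hypothetical /rerank request body. Field names follow the Jina-style
# rerank API that vLLM implements; treat them as an assumption, not a spec.
payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
    "top_n": 2,
}
print(json.dumps(payload, indent=2))
```

The server responds with per-document relevance scores, which the component uses to reorder the documents.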

`VLLMRanker` expects a vLLM server to be running and accessible at the `api_base_url` parameter (by default, `http://localhost:8000/v1`). Use this component after a Retriever in a query pipeline to reorder the retrieved documents by relevance to the query.

You can also specify the `top_k` parameter to set the maximum number of documents to return, and the `score_threshold` parameter to drop documents with a relevance score below a given value.
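As a toy illustration of how these two parameters interact (plain Python mirroring the behavior described above, not the component's actual implementation):

```python
# Toy sketch: apply score_threshold first, then keep the top_k
# highest-scoring documents. Illustrative only; the real component
# computes scores by calling the vLLM server.
def rank(scored_docs: list[tuple[str, float]], top_k: int, score_threshold: float) -> list[str]:
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:top_k]]

docs = [
    ("Paris is in France", 0.92),
    ("Berlin is in Germany", 0.10),
    ("Lyon is in France", 0.75),
]
print(rank(docs, top_k=2, score_threshold=0.5))
# ['Paris is in France', 'Lyon is in France']
```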

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of reranker models. Check the [vLLM supported models docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#supported-models) for the list of supported architectures and models.

### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are merged into the request body sent to the `/rerank` endpoint. Use this to pass parameters that are not part of the standard rerank API, such as `truncate_prompt_tokens`. See the [vLLM rerank API docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#rerank-api) for details.

```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    extra_parameters={"truncate_prompt_tokens": 256},
)
```
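Conceptually, the merge amounts to overlaying the dictionary onto the standard request fields. A minimal sketch, assuming a plain dictionary merge (the integration's actual request construction may differ):

```python
# Sketch: extra_parameters entries are overlaid onto the standard
# rerank request fields before the request is sent.
base_payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "Cities in France",
    "documents": ["Paris is in France", "Lyon is in France"],
}
extra_parameters = {"truncate_prompt_tokens": 256}
request_body = {**base_payload, **extra_parameters}
print(sorted(request_body))
# ['documents', 'model', 'query', 'truncate_prompt_tokens']
```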

### Embedding meta fields

Some use cases benefit from including meta information (such as a title) alongside the document content when reranking. Pass the names of the meta fields to include through the `meta_fields_to_embed` parameter; they will be concatenated with the document content using `meta_data_separator`.

```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    meta_fields_to_embed=["title"],
    meta_data_separator="\n",
)
```
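The concatenation itself can be sketched as follows (a toy illustration of the described behavior; the exact text the component builds may differ):

```python
# Sketch: join the selected meta fields and the document content with
# the separator, producing the text that gets reranked.
def text_to_rerank(content, meta, meta_fields_to_embed, meta_data_separator="\n"):
    fields = [str(meta[name]) for name in meta_fields_to_embed if name in meta]
    return meta_data_separator.join(fields + [content])

meta = {"title": "Capitals of Europe"}
text = text_to_rerank("The capital of France is Paris.", meta, ["title"])
print(text)
# Capitals of Europe
# The capital of France is Paris.
```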

## Usage

Install the `vllm-haystack` package to use the `VLLMRanker`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with a reranker model:

```bash
vllm serve BAAI/bge-reranker-base
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).

### On its own

```python
from haystack import Document
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(model="BAAI/bge-reranker-base")

docs = [
    Document(content="The capital of Brazil is Brasilia."),
    Document(content="The capital of France is Paris."),
]
result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"][0].content)

## The capital of France is Paris.
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.vllm import VLLMRanker

docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = VLLMRanker(model="BAAI/bge-reranker-base")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
result = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    },
)

print(result["ranker"]["documents"][0])

## Document(id=..., content: 'Paris is in France', score: ...)
```

docs-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -502,6 +502,7 @@ export default {
 'pipeline-components/rankers/sentencetransformersdiversityranker',
 'pipeline-components/rankers/sentencetransformerssimilarityranker',
 'pipeline-components/rankers/transformerssimilarityranker',
+'pipeline-components/rankers/vllmranker',
 'pipeline-components/rankers/external-integrations-rankers',
 ],
 },

docs-website/versioned_docs/version-2.28/pipeline-components/rankers.mdx

Lines changed: 1 addition & 0 deletions
