Commit fdffbff

docs: add vLLM Ranker docs page (#11154)

1 parent 8f9b896

6 files changed

Lines changed: 283 additions & 33 deletions


docs-website/docs/pipeline-components/rankers.mdx

Lines changed: 1 addition & 0 deletions
@@ -26,3 +26,4 @@ Rankers are a group of components that order documents by given criteria. Their
 | [TransformersSimilarityRanker](rankers/transformerssimilarityranker.mdx) | A legacy version of [SentenceTransformersSimilarityRanker](rankers/sentencetransformerssimilarityranker.mdx). |
 | [SentenceTransformersDiversityRanker](rankers/sentencetransformersdiversityranker.mdx) | A Diversity Ranker based on Sentence Transformers. |
 | [SentenceTransformersSimilarityRanker](rankers/sentencetransformerssimilarityranker.mdx) | A model-based Ranker that orders documents based on their relevance to the query. It uses a cross-encoder model to produce query and document embeddings. It then compares the similarity of the query embedding to the document embeddings to produce a ranking with the most similar documents appearing first. <br /> <br />It's a powerful Ranker that takes word order and syntax into account. You can use it to improve the initial ranking done by a weaker Retriever, but it's also more expensive computationally than the Rankers that don't use models. |
+| [VLLMRanker](rankers/vllmranker.mdx) | Ranks documents based on their similarity to the query using reranker models served with vLLM. |
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
title: "VLLMRanker"
id: vllmranker
slug: "/vllmranker"
description: "This component ranks documents based on their similarity to the query using reranker models served with vLLM."
---

# VLLMRanker

This component ranks documents based on their similarity to the query using reranker models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | In a query pipeline, after a component that returns a list of documents, such as a [Retriever](../retrievers.mdx) |
| **Mandatory init variables** | `model`: The name of the reranker model served by vLLM |
| **Mandatory run variables** | `query`: A query string <br /> <br />`documents`: A list of document objects |
| **Output variables** | `documents`: A list of document objects |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an HTTP server, which `VLLMRanker` uses to rerank documents through the `/rerank` endpoint.
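For orientation, the request that endpoint receives can be sketched in plain Python. The field names (`model`, `query`, `documents`, `top_n`) follow the Jina-style rerank API that vLLM implements and are an assumption here, not something this page specifies:

```python
import json

# Hypothetical /rerank request body. Field names follow the Jina-style
# rerank API that vLLM implements; treat them as an assumption, not a spec.
payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
    "top_n": 2,
}
print(json.dumps(payload, indent=2))
```

The server responds with per-document relevance scores, which the component uses to reorder the documents.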

`VLLMRanker` expects a vLLM server to be running and accessible at the `api_base_url` parameter (by default, `http://localhost:8000/v1`). Use this component after a Retriever in a query pipeline to reorder the retrieved documents by relevance to the query.

You can also specify the `top_k` parameter to set the maximum number of documents to return, and the `score_threshold` parameter to drop documents with a relevance score below a given value.
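As a toy illustration of how these two parameters interact (plain Python mirroring the behavior described above, not the component's actual implementation):

```python
# Toy sketch: apply score_threshold first, then keep the top_k
# highest-scoring documents. Illustrative only; the real component
# computes scores by calling the vLLM server.
def rank(scored_docs: list[tuple[str, float]], top_k: int, score_threshold: float) -> list[str]:
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:top_k]]

docs = [
    ("Paris is in France", 0.92),
    ("Berlin is in Germany", 0.10),
    ("Lyon is in France", 0.75),
]
print(rank(docs, top_k=2, score_threshold=0.5))
# ['Paris is in France', 'Lyon is in France']
```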

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of reranker models. Check the [vLLM supported models docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#supported-models) for the list of supported architectures and models.

### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are merged into the request body sent to the `/rerank` endpoint. Use this to pass parameters that are not part of the standard rerank API, such as `truncate_prompt_tokens`. See the [vLLM rerank API docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#rerank-api) for details.

```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    extra_parameters={"truncate_prompt_tokens": 256},
)
```
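Conceptually, the merge amounts to overlaying the dictionary onto the standard request fields. A minimal sketch, assuming a plain dictionary merge (the integration's actual request construction may differ):

```python
# Sketch: extra_parameters entries are overlaid onto the standard
# rerank request fields before the request is sent.
base_payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "Cities in France",
    "documents": ["Paris is in France", "Lyon is in France"],
}
extra_parameters = {"truncate_prompt_tokens": 256}
request_body = {**base_payload, **extra_parameters}
print(sorted(request_body))
# ['documents', 'model', 'query', 'truncate_prompt_tokens']
```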

### Embedding meta fields

Some use cases benefit from including meta information (such as a title) alongside the document content when reranking. Pass the names of the meta fields to include through the `meta_fields_to_embed` parameter; they will be concatenated with the document content using `meta_data_separator`.

```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    meta_fields_to_embed=["title"],
    meta_data_separator="\n",
)
```
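The concatenation itself can be sketched as follows (a toy illustration of the described behavior; the exact text the component builds may differ):

```python
# Sketch: join the selected meta fields and the document content with
# the separator, producing the text that gets reranked.
def text_to_rerank(content, meta, meta_fields_to_embed, meta_data_separator="\n"):
    fields = [str(meta[name]) for name in meta_fields_to_embed if name in meta]
    return meta_data_separator.join(fields + [content])

meta = {"title": "Capitals of Europe"}
text = text_to_rerank("The capital of France is Paris.", meta, ["title"])
print(text)
# Capitals of Europe
# The capital of France is Paris.
```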

## Usage

Install the `vllm-haystack` package to use the `VLLMRanker`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with a reranker model:

```bash
vllm serve BAAI/bge-reranker-base
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).

### On its own

```python
from haystack import Document
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(model="BAAI/bge-reranker-base")

docs = [
    Document(content="The capital of Brazil is Brasilia."),
    Document(content="The capital of France is Paris."),
]
result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"][0].content)

## The capital of France is Paris.
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.vllm import VLLMRanker

docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = VLLMRanker(model="BAAI/bge-reranker-base")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
result = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    },
)

print(result["ranker"]["documents"][0])

## Document(id=..., content: 'Paris is in France', score: ...)
```

docs-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -502,6 +502,7 @@ export default {
 'pipeline-components/rankers/sentencetransformersdiversityranker',
 'pipeline-components/rankers/sentencetransformerssimilarityranker',
 'pipeline-components/rankers/transformerssimilarityranker',
+'pipeline-components/rankers/vllmranker',
 'pipeline-components/rankers/external-integrations-rankers',
 ],
 },

docs-website/versioned_docs/version-2.28/pipeline-components/rankers.mdx

Lines changed: 1 addition & 0 deletions
