|
| 1 | +--- |
| 2 | +title: "VLLMRanker" |
| 3 | +id: vllmranker |
| 4 | +slug: "/vllmranker" |
| 5 | +description: "This component ranks documents based on their similarity to the query using reranker models served with vLLM." |
| 6 | +--- |
| 7 | + |
| 8 | +# VLLMRanker |
| 9 | + |
| 10 | +This component ranks documents based on their similarity to the query using reranker models served with [vLLM](https://docs.vllm.ai/). |
| 11 | + |
| 12 | +<div className="key-value-table"> |
| 13 | + |
| 14 | +| | | |
| 15 | +| --- | --- | |
| 16 | +| **Most common position in a pipeline** | In a query pipeline, after a component that returns a list of documents such as a [Retriever](../retrievers.mdx) | |
| 17 | +| **Mandatory init variables** | `model`: The name of the reranker model served by vLLM | |
| 18 | +| **Mandatory run variables** | `query`: A query string <br /> <br />`documents`: A list of document objects | |
| 19 | +| **Output variables** | `documents`: A list of document objects | |
| 20 | +| **API reference** | [vLLM](/reference/integrations-vllm) | |
| 21 | +| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm | |
| 22 | + |
| 23 | +</div> |
| 24 | + |
| 25 | +## Overview |
| 26 | + |
| 27 | +[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an HTTP server, which `VLLMRanker` uses to rerank documents through the `/rerank` endpoint. |
| 28 | + |
| 29 | +`VLLMRanker` expects a vLLM server to be running and reachable at the URL set by the `api_base_url` parameter (`http://localhost:8000/v1` by default). Use this component after a Retriever in a query pipeline to reorder the retrieved documents by relevance to the query. |
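Under the hood, the component sends an HTTP request to the server's rerank endpoint. The sketch below builds such a request body by hand to show its shape; the field names follow the vLLM rerank API docs linked above, but treat them as illustrative rather than authoritative, and note the endpoint path is derived here by appending `/rerank` to the default `api_base_url`:

```python
import json

# Default base URL used by VLLMRanker; the rerank endpoint hangs off it.
api_base_url = "http://localhost:8000/v1"
endpoint = f"{api_base_url}/rerank"

# Minimal rerank request: a model name, a query, and the document texts.
payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
}
print(endpoint)
print(json.dumps(payload, indent=2))
```

The server responds with a list of results, each carrying the index of the original document and a relevance score, which the component maps back onto the document objects.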
| 30 | + |
| 31 | +You can also specify the `top_k` parameter to set the maximum number of documents to return, and the `score_threshold` parameter to drop documents with a relevance score below a given value. |
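The two parameters compose in the natural way: documents scoring below the threshold are dropped, and at most `top_k` of the remainder are returned. Here is a stand-alone plain-Python sketch of that selection logic, not the component's actual implementation (in particular, treating the threshold as inclusive is an assumption):

```python
def select(scored_docs, top_k=None, score_threshold=None):
    """Sketch of how top_k and score_threshold combine:
    sort by score, drop low scores, keep at most top_k."""
    ranked = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    if score_threshold is not None:
        ranked = [d for d in ranked if d["score"] >= score_threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked

docs = [
    {"content": "Paris is in France", "score": 0.92},
    {"content": "Berlin is in Germany", "score": 0.15},
    {"content": "Lyon is in France", "score": 0.78},
]
# Keeps the two high-scoring documents; Berlin falls below the threshold.
print(select(docs, top_k=2, score_threshold=0.5))
```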
| 32 | + |
| 33 | +If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API. |
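For example, assuming a placeholder key `my-secret-key`, the server and the client environment can be configured like this (the key value is purely illustrative):

```shell
# Server side: require an API key.
vllm serve BAAI/bge-reranker-base --api-key my-secret-key

# Client side: VLLMRanker reads the key from this variable by default.
export VLLM_API_KEY=my-secret-key
```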
| 34 | + |
| 35 | +### Compatible models |
| 36 | + |
| 37 | +vLLM supports a range of reranker models. Check the [vLLM supported models docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#supported-models) for the list of supported architectures and models. |
| 38 | + |
| 39 | +### vLLM-specific parameters |
| 40 | + |
| 41 | +You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are merged into the request body sent to the `/rerank` endpoint. Use this to pass parameters that are not part of the standard rerank API, such as `truncate_prompt_tokens`. See the [vLLM rerank API docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#rerank-api) for details. |
| 42 | + |
| 43 | +```python |
| 44 | +ranker = VLLMRanker( |
| 45 | + model="BAAI/bge-reranker-base", |
| 46 | + extra_parameters={"truncate_prompt_tokens": 256}, |
| 47 | +) |
| 48 | +``` |
| 49 | + |
| 50 | +### Embedding meta fields |
| 51 | + |
| 52 | +Some use cases benefit from including meta information (such as a title) alongside the document content when reranking. Pass the names of the meta fields to include through the `meta_fields_to_embed` parameter; they will be concatenated with the document content using `meta_data_separator`. |
| 53 | + |
| 54 | +```python |
| 55 | +ranker = VLLMRanker( |
| 56 | + model="BAAI/bge-reranker-base", |
| 57 | + meta_fields_to_embed=["title"], |
| 58 | + meta_data_separator="\n", |
| 59 | +) |
| 60 | +``` |
| 61 | + |
| 62 | +## Usage |
| 63 | + |
| 64 | +To use `VLLMRanker`, install the `vllm-haystack` package: |
| 65 | + |
| 66 | +```shell |
| 67 | +pip install vllm-haystack |
| 68 | +``` |
| 69 | + |
| 70 | +### Starting the vLLM server |
| 71 | + |
| 72 | +Before using this component, start a vLLM server with a reranker model: |
| 73 | + |
| 74 | +```bash |
| 75 | +vllm serve BAAI/bge-reranker-base |
| 76 | +``` |
| 77 | + |
| 78 | +For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/). |
| 79 | + |
| 80 | +### On its own |
| 81 | + |
| 82 | +```python |
| 83 | +from haystack import Document |
| 84 | +from haystack_integrations.components.rankers.vllm import VLLMRanker |
| 85 | + |
| 86 | +ranker = VLLMRanker(model="BAAI/bge-reranker-base") |
| 87 | + |
| 88 | +docs = [ |
| 89 | + Document(content="The capital of Brazil is Brasilia."), |
| 90 | + Document(content="The capital of France is Paris."), |
| 91 | +] |
| 92 | +result = ranker.run(query="What is the capital of France?", documents=docs) |
| 93 | +print(result["documents"][0].content) |
| 94 | + |
| 95 | +## The capital of France is Paris. |
| 96 | +``` |
| 97 | + |
| 98 | +### In a pipeline |
| 99 | + |
| 100 | +```python |
| 101 | +from haystack import Document, Pipeline |
| 102 | +from haystack.components.retrievers.in_memory import InMemoryBM25Retriever |
| 103 | +from haystack.document_stores.in_memory import InMemoryDocumentStore |
| 104 | +from haystack_integrations.components.rankers.vllm import VLLMRanker |
| 105 | + |
| 106 | +docs = [ |
| 107 | + Document(content="Paris is in France"), |
| 108 | + Document(content="Berlin is in Germany"), |
| 109 | + Document(content="Lyon is in France"), |
| 110 | +] |
| 111 | +document_store = InMemoryDocumentStore() |
| 112 | +document_store.write_documents(docs) |
| 113 | + |
| 114 | +retriever = InMemoryBM25Retriever(document_store=document_store) |
| 115 | +ranker = VLLMRanker(model="BAAI/bge-reranker-base") |
| 116 | + |
| 117 | +document_ranker_pipeline = Pipeline() |
| 118 | +document_ranker_pipeline.add_component(instance=retriever, name="retriever") |
| 119 | +document_ranker_pipeline.add_component(instance=ranker, name="ranker") |
| 120 | + |
| 121 | +document_ranker_pipeline.connect("retriever.documents", "ranker.documents") |
| 122 | + |
| 123 | +query = "Cities in France" |
| 124 | +result = document_ranker_pipeline.run( |
| 125 | + data={ |
| 126 | + "retriever": {"query": query, "top_k": 3}, |
| 127 | + "ranker": {"query": query, "top_k": 2}, |
| 128 | + }, |
| 129 | +) |
| 130 | + |
| 131 | +print(result["ranker"]["documents"][0]) |
| 132 | + |
| 133 | +## Document(id=..., content: 'Paris is in France', score: ...) |
| 134 | +``` |