@@ -118,7 +118,7 @@ Create the OpenAI clients.
 #### run

 ```python
-run(documents: list[Document]) -> dict[str, Any]
+run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
 ```

 Embed a list of Documents.
@@ -129,14 +129,16 @@ Embed a list of Documents.

 **Returns:**

-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
 - `documents`: The input documents with their `embedding` field populated.
 - `meta`: Information about the usage of the model.

 #### run_async

 ```python
-run_async(documents: list[Document]) -> dict[str, Any]
+run_async(
+    documents: list[Document],
+) -> dict[str, list[Document] | dict[str, Any]]
 ```

 Asynchronously embed a list of Documents.
@@ -147,7 +149,7 @@ Asynchronously embed a list of Documents.

 **Returns:**

-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
 - `documents`: The input documents with their `embedding` field populated.
 - `meta`: Information about the usage of the model.

@@ -245,7 +247,7 @@ Create the OpenAI clients.
 #### run

 ```python
-run(text: str) -> dict[str, Any]
+run(text: str) -> dict[str, list[float] | dict[str, Any]]
 ```

 Embed a single string.
@@ -256,14 +258,14 @@ Embed a single string.

 **Returns:**

-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with:
 - `embedding`: The embedding of the input text.
 - `meta`: Information about the usage of the model.

 #### run_async

 ```python
-run_async(text: str) -> dict[str, Any]
+run_async(text: str) -> dict[str, list[float] | dict[str, Any]]
 ```

 Asynchronously embed a single string.
@@ -274,7 +276,7 @@ Asynchronously embed a single string.

 **Returns:**

-- <code>dict\[str, Any\]</code> – A dictionary with:
+- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with:
 - `embedding`: The embedding of the input text.
 - `meta`: Information about the usage of the model.

@@ -532,3 +534,164 @@ Run the VLLM chat generator on the given input data asynchronously.

 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
 - `replies`: A list containing the generated responses as ChatMessage instances.
+
+## haystack_integrations.components.rankers.vllm.ranker
+
+### VLLMRanker
+
+Ranks Documents based on their similarity to a query using models served with [vLLM](https://docs.vllm.ai/).
+
+It expects a vLLM server to be running and reachable at the URL given by the `api_base_url` parameter,
+and uses the `/rerank` endpoint exposed by vLLM.
+
+### Starting the vLLM server
+
+Before using this component, start a vLLM server with a reranker model:
+
+```bash
+vllm serve BAAI/bge-reranker-base
+```
+
+For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
+
+### Usage example
+
+```python
+from haystack import Document
+from haystack_integrations.components.rankers.vllm import VLLMRanker
+
+ranker = VLLMRanker(model="BAAI/bge-reranker-base")
+docs = [
+    Document(content="The capital of Brazil is Brasilia."),
+    Document(content="The capital of France is Paris."),
+]
+result = ranker.run(query="What is the capital of France?", documents=docs)
+print(result["documents"][0].content)
+```
+
+### Usage example with vLLM-specific parameters
+
+Pass vLLM-specific parameters via the `extra_parameters` dictionary. They are merged into the
+request body sent to the `/rerank` endpoint.
+
+```python
+ranker = VLLMRanker(
+    model="BAAI/bge-reranker-base",
+    extra_parameters={"truncate_prompt_tokens": 256},
+)
+```
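As a sketch of what "merged into the request body" implies: the extras are combined with the standard rerank fields into one payload. The base field names here follow vLLM's rerank API (`model`, `query`, `documents`); the values are illustrative, and no request is actually sent:

```python
# Standard fields the component would send to /rerank (illustrative values).
base_body = {
    "model": "BAAI/bge-reranker-base",
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
}
# extra_parameters are merged in, so vLLM-only options ride along.
extra_parameters = {"truncate_prompt_tokens": 256}
request_body = {**base_body, **extra_parameters}
print(request_body["truncate_prompt_tokens"])
```

Because the merge puts `extra_parameters` last, an extra key with the same name as a standard field would override it.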
+
+#### __init__
+
+```python
+__init__(
+    *,
+    model: str,
+    api_key: Secret | None = Secret.from_env_var("VLLM_API_KEY", strict=False),
+    api_base_url: str = "http://localhost:8000/v1",
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+    meta_fields_to_embed: list[str] | None = None,
+    meta_data_separator: str = "\n",
+    http_client_kwargs: dict[str, Any] | None = None,
+    extra_parameters: dict[str, Any] | None = None
+) -> None
+```
+
+Creates an instance of VLLMRanker.
+
+**Parameters:**
+
+- **model** (<code>str</code>) – The name of the reranker model served by vLLM. Check the
+[vLLM documentation](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#supported-models) for
+information on supported models.
+- **api_key** (<code>Secret | None</code>) – The vLLM API key. Defaults to the `VLLM_API_KEY` environment variable.
+Only required if the vLLM server was started with `--api-key`.
+- **api_base_url** (<code>str</code>) – The base URL of the vLLM server.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. If `None`, all documents are returned.
+- **score_threshold** (<code>float | None</code>) – If set, documents with a relevance score below this value are dropped.
+Applied after `top_k`, so the output may contain fewer than `top_k` documents.
+- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be concatenated with the document
+content before reranking.
+- **meta_data_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the document content.
+- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
+`httpx.AsyncClient`. For more information, see the
+[HTTPX documentation](https://www.python-httpx.org/api/#client).
+- **extra_parameters** (<code>dict\[str, Any\] | None</code>) – Additional parameters merged into the request body sent to the vLLM
+`/rerank` endpoint. Use this to pass parameters not part of the standard rerank API, such as
+`truncate_prompt_tokens`. See the
+[vLLM docs](https://docs.vllm.ai/en/stable/models/pooling_models/scoring/#rerank-api) for more information.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.
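A sketch of the meta-field concatenation described by `meta_fields_to_embed` and `meta_data_separator` (assumed behavior inferred from the parameter descriptions; `embed_meta` is a hypothetical helper, not part of the component's API):

```python
def embed_meta(content, meta, fields, separator="\n"):
    # Join the selected meta fields with the document content, using the
    # separator, to build the text that is sent for reranking.
    parts = [str(meta[f]) for f in fields if meta.get(f) is not None]
    return separator.join(parts + [content])

text = embed_meta(
    "Paris is the capital.",
    {"title": "France", "year": 2024},
    ["title"],
)
# text == "France\nParis is the capital."
```

Fields missing from a document's meta are simply skipped, so documents with sparse metadata still rerank on their content alone.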
+
+#### warm_up
+
+```python
+warm_up() -> None
+```
+
+Create the httpx clients.
+
+#### run
+
+```python
+run(
+    query: str,
+    documents: list[Document],
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+) -> dict[str, list[Document] | dict[str, Any]]
+```
+
+Returns a list of Documents ranked by their similarity to the given query.
+
+**Parameters:**
+
+- **query** (<code>str</code>) – Query string.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to rank.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. Overrides the value set at initialization.
+- **score_threshold** (<code>float | None</code>) – Minimum relevance score required for a document to be returned. Overrides
+the value set at initialization.
+
+**Returns:**
+
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
+- `documents`: Documents sorted from most to least relevant.
+- `meta`: Information about the model and usage.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.
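The documented ordering, truncate to `top_k` first and then apply `score_threshold`, can be sketched as follows (a hypothetical illustration with made-up scores, not the component's actual code):

```python
def select(scored_docs, top_k=None, score_threshold=None):
    # Sort by relevance, truncate to top_k, then drop low scores --
    # so the output may contain fewer than top_k documents.
    ranked = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if score_threshold is not None:
        ranked = [d for d in ranked if d["score"] >= score_threshold]
    return ranked

docs = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.4},
    {"id": "c", "score": 0.7},
]
print(select(docs, top_k=2, score_threshold=0.8))  # only "a" survives
```

Note that because the threshold is applied after truncation, a document scoring above the threshold can still be excluded if it falls outside the top `top_k`.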
+
+#### run_async
+
+```python
+run_async(
+    query: str,
+    documents: list[Document],
+    top_k: int | None = None,
+    score_threshold: float | None = None,
+) -> dict[str, list[Document] | dict[str, Any]]
+```
+
+Asynchronously returns a list of Documents ranked by their similarity to the given query.
+
+**Parameters:**
+
+- **query** (<code>str</code>) – Query string.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to rank.
+- **top_k** (<code>int | None</code>) – The maximum number of Documents to return. Overrides the value set at initialization.
+- **score_threshold** (<code>float | None</code>) – Minimum relevance score required for a document to be returned. Overrides
+the value set at initialization.
+
+**Returns:**
+
+- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with:
+- `documents`: Documents sorted from most to least relevant.
+- `meta`: Information about the model and usage.
+
+**Raises:**
+
+- <code>ValueError</code> – If `top_k` is not > 0.