Commit 5cdac70

docs: FastembedLateInteractionRanker - refactor docs (#11116)

1 parent fc67668 commit 5cdac70

6 files changed: 218 additions & 24 deletions

File tree

docs-website/docs/pipeline-components/rankers.mdx

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Rankers are a group of components that order documents by given criteria. Their
 | [AmazonBedrockRanker](rankers/amazonbedrockranker.mdx) | Ranks documents based on their similarity to the query using Amazon Bedrock models. |
 | [CohereRanker](rankers/cohereranker.mdx) | Ranks documents based on their similarity to the query using Cohere rerank models. |
 | [FastembedRanker](rankers/fastembedranker.mdx) | Ranks documents based on their similarity to the query using cross-encoder models supported by FastEmbed. |
-| [FastembedColbertRanker](rankers/fastembedcolbertranker.mdx) | Ranks documents based on their similarity to the query using ColBERT models supported by FastEmbed. |
+| [FastembedLateInteractionRanker](rankers/fastembedlateinteractionranker.mdx) | Ranks documents based on their similarity to the query using late interaction models supported by FastEmbed. |
 | [HuggingFaceTEIRanker](rankers/huggingfaceteiranker.mdx) | Ranks documents based on their similarity to the query using a Text Embeddings Inference (TEI) API endpoint. |
 | [JinaRanker](rankers/jinaranker.mdx) | Ranks documents based on their similarity to the query using Jina AI models. |
 | [LLMRanker](rankers/llmranker.mdx) | Ranks documents for a query using a Large Language Model, which returns ranked document indices as JSON. |

docs-website/docs/pipeline-components/rankers/fastembedcolbertranker.mdx renamed to docs-website/docs/pipeline-components/rankers/fastembedlateinteractionranker.mdx

Lines changed: 32 additions & 21 deletions
@@ -1,11 +1,11 @@
 ---
-title: "FastembedColbertRanker"
-id: fastembedcolbertranker
-slug: "/fastembedcolbertranker"
-description: "Use this component to rank documents based on ColBERT late-interaction scoring using models supported by FastEmbed."
+title: "FastembedLateInteractionRanker"
+id: fastembedlateinteractionranker
+slug: "/fastembedlateinteractionranker"
+description: "Use this component to rank documents based on late interaction scoring using models supported by FastEmbed."
 ---
 
-# FastembedColbertRanker
+# FastembedLateInteractionRanker
 
 Use this component to rank documents based on their similarity to the query using ColBERT models via FastEmbed.
 
@@ -23,11 +23,11 @@ Use this component to rank documents based on their similarity to the query using
 
 ## Overview
 
-`FastembedColbertRanker` ranks documents using **ColBERT late-interaction scoring**. Unlike cross-encoder rankers (which encode the query and document together), ColBERT encodes the query and each document independently into token-level embeddings, then computes a **MaxSim** score: for each query token, it finds the most similar document token, and sums these maximum similarities into a final relevance score.
+`FastembedLateInteractionRanker` ranks documents using **late interaction scoring**. Unlike cross-encoder rankers (which encode the query and document together), ColBERT encodes the query and each document independently into token-level embeddings, then computes a **MaxSim** score: for each query token, it finds the most similar document token, and sums these maximum similarities into a final relevance score.
 
 This approach gives ColBERT a strong balance between accuracy and efficiency — it is more expressive than bi-encoders while being faster than cross-encoders at inference time.
 
-`FastembedColbertRanker` is most useful in query pipelines such as a retrieval-augmented generation (RAG) pipeline or a document search pipeline. Use it after a Retriever to rerank a candidate set of documents by relevance. When combining with a Retriever, set the Retriever's `top_k` higher than the Ranker's `top_k` — retrieve a broad candidate set, then let ColBERT select the best ones.
+`FastembedLateInteractionRanker` is most useful in query pipelines such as a retrieval-augmented generation (RAG) pipeline or a document search pipeline. Use it after a Retriever to rerank a candidate set of documents by relevance. When combining with a Retriever, set the Retriever's `top_k` higher than the Ranker's `top_k` — retrieve a broad candidate set, then let ColBERT select the best ones.
 
 By default, this component uses the `colbert-ir/colbertv2.0` model. For details on different initialization settings, check out the [API reference](/reference/fastembed-embedders) page.
 
@@ -52,7 +52,7 @@ pip install fastembed-haystack
 You can set the path where the model is stored in a cache directory. You can also set the number of threads a single `onnxruntime` session can use.
 
 ```python
-ranker = FastembedColbertRanker(
+ranker = FastembedLateInteractionRanker(
     model_name="colbert-ir/colbertv2.0",
     cache_dir="/your_cache_directory",
     threads=2,
@@ -62,7 +62,7 @@ ranker = FastembedColbertRanker(
 For offline encoding of large document sets, enable data-parallel processing:
 
 ```python
-ranker = FastembedColbertRanker(
+ranker = FastembedLateInteractionRanker(
     model_name="colbert-ir/colbertv2.0",
     batch_size=64,
     parallel=2,  # number of parallel processes; 0 = use all cores
@@ -73,15 +73,17 @@ ranker = FastembedColbertRanker(
 
 ### On its own
 
-This example uses `FastembedColbertRanker` to rank two simple documents.
+This example uses `FastembedLateInteractionRanker` to rank two simple documents.
 
 ```python
 from haystack import Document
-from haystack_integrations.components.rankers.fastembed import FastembedColbertRanker
+from haystack_integrations.components.rankers.fastembed import (
+    FastembedLateInteractionRanker,
+)
 
 docs = [Document(content="Paris"), Document(content="Berlin")]
 
-ranker = FastembedColbertRanker(model_name="colbert-ir/colbertv2.0", top_k=1)
+ranker = FastembedLateInteractionRanker(model_name="colbert-ir/colbertv2.0", top_k=1)
 
 result = ranker.run(query="City in Germany", documents=docs)
 print(result["documents"][0].content)
@@ -90,21 +92,30 @@ print(result["documents"][0].content)
 
 ### In a pipeline
 
-Below is an example of a full RAG pipeline that retrieves documents using embedding similarity, reranks them with `FastembedColbertRanker`, and generates an answer with an LLM.
+Below is an example of a full RAG pipeline that retrieves documents using embedding similarity, reranks them with `FastembedLateInteractionRanker`, and generates an answer with an LLM.
+
+This example uses the `HuggingFaceLocalChatGenerator`, which requires additional packages:
+
+```shell
+pip install "transformers[torch]"
+```
 
 ```python
 from haystack import Document, Pipeline
 from haystack.document_stores.in_memory import InMemoryDocumentStore
-from haystack.components.embedders import (
-    SentenceTransformersDocumentEmbedder,
-    SentenceTransformersTextEmbedder,
-)
+
 from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
 from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
 from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 from haystack.components.writers import DocumentWriter
 from haystack.dataclasses import ChatMessage
-from haystack_integrations.components.rankers.fastembed import FastembedColbertRanker
+from haystack_integrations.components.rankers.fastembed import (
+    FastembedLateInteractionRanker,
+)
+from haystack_integrations.components.embedders.fastembed import (
+    FastembedDocumentEmbedder,
+    FastembedTextEmbedder,
+)
 
 # Set up and populate the document store
 document_store = InMemoryDocumentStore()
@@ -115,7 +126,7 @@ docs = [
 ]
 
 indexing = Pipeline()
-indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
+indexing.add_component("embedder", FastembedDocumentEmbedder())
 indexing.add_component("writer", DocumentWriter(document_store=document_store))
 indexing.connect("embedder", "writer")
 indexing.run({"embedder": {"documents": docs}})
@@ -132,14 +143,14 @@ prompt_template = [
 
 # Build the query pipeline with ColBERT reranking
 rag = Pipeline()
-rag.add_component("text_embedder", SentenceTransformersTextEmbedder())
+rag.add_component("text_embedder", FastembedTextEmbedder())
 rag.add_component(
     "retriever",
     InMemoryEmbeddingRetriever(document_store=document_store, top_k=3),
 )
 rag.add_component(
     "ranker",
-    FastembedColbertRanker(model_name="colbert-ir/colbertv2.0", top_k=2),
+    FastembedLateInteractionRanker(model_name="colbert-ir/colbertv2.0", top_k=2),
 )
 rag.add_component(
     "prompt_builder",

docs-website/sidebars.js

Lines changed: 1 addition & 1 deletion
@@ -487,7 +487,7 @@ export default {
 'pipeline-components/rankers/amazonbedrockranker',
 'pipeline-components/rankers/cohereranker',
 'pipeline-components/rankers/fastembedranker',
-'pipeline-components/rankers/fastembedcolbertranker',
+'pipeline-components/rankers/fastembedlateinteractionranker',
 'pipeline-components/rankers/huggingfaceteiranker',
 'pipeline-components/rankers/jinaranker',
 'pipeline-components/rankers/llmranker',

docs-website/versioned_docs/version-2.27/pipeline-components/rankers.mdx

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Rankers are a group of components that order documents by given criteria. Their
 | [AmazonBedrockRanker](rankers/amazonbedrockranker.mdx) | Ranks documents based on their similarity to the query using Amazon Bedrock models. |
 | [CohereRanker](rankers/cohereranker.mdx) | Ranks documents based on their similarity to the query using Cohere rerank models. |
 | [FastembedRanker](rankers/fastembedranker.mdx) | Ranks documents based on their similarity to the query using cross-encoder models supported by FastEmbed. |
+| [FastembedLateInteractionRanker](rankers/fastembedlateinteractionranker.mdx) | Ranks documents based on their similarity to the query using late interaction models supported by FastEmbed. |
 | [HuggingFaceTEIRanker](rankers/huggingfaceteiranker.mdx) | Ranks documents based on their similarity to the query using a Text Embeddings Inference (TEI) API endpoint. |
 | [JinaRanker](rankers/jinaranker.mdx) | Ranks documents based on their similarity to the query using Jina AI models. |
 | [LLMRanker](rankers/llmranker.mdx) | Ranks documents for a query using a Large Language Model, which returns ranked document indices as JSON. |
Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
+---
+title: "FastembedLateInteractionRanker"
+id: fastembedlateinteractionranker
+slug: "/fastembedlateinteractionranker"
+description: "Use this component to rank documents based on late interaction scoring using models supported by FastEmbed."
+---
+
+# FastembedLateInteractionRanker
+
+Use this component to rank documents based on their similarity to the query using ColBERT models via FastEmbed.
+
+<div className="key-value-table">
+
+| | |
+| --- | --- |
+| **Most common position in a pipeline** | In a query pipeline, after a component that returns a list of documents such as a [Retriever](../retrievers.mdx) |
+| **Mandatory run variables** | `documents`: A list of documents <br /> <br />`query`: A query string |
+| **Output variables** | `documents`: A list of documents |
+| **API reference** | [FastEmbed](/reference/fastembed-embedders) |
+| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/fastembed |
+
+</div>
+
+## Overview
+
+`FastembedLateInteractionRanker` ranks documents using **late interaction scoring**. Unlike cross-encoder rankers (which encode the query and document together), ColBERT encodes the query and each document independently into token-level embeddings, then computes a **MaxSim** score: for each query token, it finds the most similar document token, and sums these maximum similarities into a final relevance score.
+
+This approach gives ColBERT a strong balance between accuracy and efficiency — it is more expressive than bi-encoders while being faster than cross-encoders at inference time.
+
+`FastembedLateInteractionRanker` is most useful in query pipelines such as a retrieval-augmented generation (RAG) pipeline or a document search pipeline. Use it after a Retriever to rerank a candidate set of documents by relevance. When combining with a Retriever, set the Retriever's `top_k` higher than the Ranker's `top_k` — retrieve a broad candidate set, then let ColBERT select the best ones.
+
+By default, this component uses the `colbert-ir/colbertv2.0` model. For details on different initialization settings, check out the [API reference](/reference/fastembed-embedders) page.
+
+:::note
+ColBERT scores are **unnormalized sums** (not probabilities). Their magnitude depends on query length and document length, typically ranging from ~3 to ~30. They are meaningful for ranking within a single query but should not be compared across different queries.
+:::
+
+### Compatible Models
+
+You can find the compatible ColBERT models in the [FastEmbed documentation](https://qdrant.github.io/fastembed/examples/Supported_Models/).
+
+### Installation
+
+To start using this integration with Haystack, install the package with:
+
+```shell
+pip install fastembed-haystack
+```
+
+### Parameters
+
+You can set the path where the model is stored in a cache directory. You can also set the number of threads a single `onnxruntime` session can use.
+
+```python
+ranker = FastembedLateInteractionRanker(
+    model_name="colbert-ir/colbertv2.0",
+    cache_dir="/your_cache_directory",
+    threads=2,
+)
+```
+
+For offline encoding of large document sets, enable data-parallel processing:
+
+```python
+ranker = FastembedLateInteractionRanker(
+    model_name="colbert-ir/colbertv2.0",
+    batch_size=64,
+    parallel=2,  # number of parallel processes; 0 = use all cores
+)
+```
+
+## Usage
+
+### On its own
+
+This example uses `FastembedLateInteractionRanker` to rank two simple documents.
+
+```python
+from haystack import Document
+from haystack_integrations.components.rankers.fastembed import (
+    FastembedLateInteractionRanker,
+)
+
+docs = [Document(content="Paris"), Document(content="Berlin")]
+
+ranker = FastembedLateInteractionRanker(model_name="colbert-ir/colbertv2.0", top_k=1)
+
+result = ranker.run(query="City in Germany", documents=docs)
+print(result["documents"][0].content)
+# Berlin
+```
+
+### In a pipeline
+
+Below is an example of a full RAG pipeline that retrieves documents using embedding similarity, reranks them with `FastembedLateInteractionRanker`, and generates an answer with an LLM.
+
+This example uses the `HuggingFaceLocalChatGenerator`, which requires additional packages:
+
+```shell
+pip install "transformers[torch]"
+```
+
+```python
+from haystack import Document, Pipeline
+from haystack.document_stores.in_memory import InMemoryDocumentStore
+
+from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
+from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
+from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
+from haystack.components.writers import DocumentWriter
+from haystack.dataclasses import ChatMessage
+from haystack_integrations.components.rankers.fastembed import (
+    FastembedLateInteractionRanker,
+)
+from haystack_integrations.components.embedders.fastembed import (
+    FastembedDocumentEmbedder,
+    FastembedTextEmbedder,
+)
+
+# Set up and populate the document store
+document_store = InMemoryDocumentStore()
+docs = [
+    Document(content="Paris is the capital of France."),
+    Document(content="Berlin is the capital of Germany."),
+    Document(content="Madrid is the capital of Spain."),
+]
+
+indexing = Pipeline()
+indexing.add_component("embedder", FastembedDocumentEmbedder())
+indexing.add_component("writer", DocumentWriter(document_store=document_store))
+indexing.connect("embedder", "writer")
+indexing.run({"embedder": {"documents": docs}})
+
+# Define the chat prompt template
+prompt_template = [
+    ChatMessage.from_system("You are a helpful assistant."),
+    ChatMessage.from_user(
+        "Given these documents, answer the question.\n"
+        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
+        "Question: {{query}}\nAnswer:",
+    ),
+]
+
+# Build the query pipeline with ColBERT reranking
+rag = Pipeline()
+rag.add_component("text_embedder", FastembedTextEmbedder())
+rag.add_component(
+    "retriever",
+    InMemoryEmbeddingRetriever(document_store=document_store, top_k=3),
+)
+rag.add_component(
+    "ranker",
+    FastembedLateInteractionRanker(model_name="colbert-ir/colbertv2.0", top_k=2),
+)
+rag.add_component(
+    "prompt_builder",
+    ChatPromptBuilder(
+        template=prompt_template,
+        required_variables={"query", "documents"},
+    ),
+)
+rag.add_component(
+    "llm",
+    HuggingFaceLocalChatGenerator(model="HuggingFaceTB/SmolLM2-360M-Instruct"),
+)
+
+rag.connect("text_embedder.embedding", "retriever.query_embedding")
+rag.connect("retriever.documents", "ranker.documents")
+rag.connect("ranker.documents", "prompt_builder.documents")
+rag.connect("prompt_builder.prompt", "llm.messages")
+
+query = "What is the capital of Germany?"
+result = rag.run(
+    {
+        "text_embedder": {"text": query},
+        "ranker": {"query": query},
+        "prompt_builder": {"query": query},
+    },
+)
+print(result["llm"]["replies"][0].text)
+```
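
The `:::note` in the page above warns that ColBERT scores are unnormalized sums whose magnitude grows with query length, so raw scores should not be compared across queries. A toy check makes this concrete; as before, this is a hypothetical pure-Python MaxSim, not the FastEmbed implementation, and the embeddings are invented.

```python
# Demonstrate that MaxSim totals scale with the number of query tokens:
# the same document gets a larger raw score just because the query is longer.

def maxsim_score(query_tokens, doc_tokens):
    # Sum over query tokens of the best dot-product match among doc tokens.
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
        for q in query_tokens
    )

doc = [[1.0, 0.0], [0.0, 1.0]]

short_query = [[1.0, 0.0]]                          # 1 query token
long_query = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # 3 query tokens

print(maxsim_score(short_query, doc))  # 1.0
print(maxsim_score(long_query, doc))   # 3.0 -- larger only because the query is longer
```

This is why the docs treat the scores as a within-query ranking signal rather than a calibrated relevance probability.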

docs-website/versioned_sidebars/version-2.27-sidebars.json

Lines changed: 2 additions & 1 deletion
@@ -483,6 +483,7 @@
 "pipeline-components/rankers/amazonbedrockranker",
 "pipeline-components/rankers/cohereranker",
 "pipeline-components/rankers/fastembedranker",
+"pipeline-components/rankers/fastembedlateinteractionranker",
 "pipeline-components/rankers/huggingfaceteiranker",
 "pipeline-components/rankers/jinaranker",
 "pipeline-components/rankers/llmranker",
@@ -700,4 +701,4 @@
 ]
 }
 ]
-}
\ No newline at end of file
+}
