
Commit 01ee2d9

Clarify ViDoRe v1-v3 evaluation in Readme (#137)
* update readme
* Potential fix for pull request finding
* Potential fix for pull request finding
* Potential fix for pull request finding
* Update README for MTEB evaluation process clarity

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
1 parent 62dd6a5 commit 01ee2d9

1 file changed

Lines changed: 25 additions & 2 deletions

File tree

README.md

---

## Evaluating single-model retrievers on ViDoRe v1–v3 with MTEB

We have moved from in-house evaluations to the general [MTEB](https://github.com/embeddings-benchmark/mteb/tree/main) evaluation framework for retrieval models.

Here are the main steps to evaluate your retriever and submit it to the ViDoRe v1–v3 leaderboards; see the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/) for full details. This section covers the MTEB leaderboards only; for our in-house pipeline leaderboard, see the section below.

1. Create your model implementation file (if it does not already exist) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations), then open a PR to the [MTEB repository](https://github.com/embeddings-benchmark/mteb) with your changes; examples for ColPali-like models can be found in [this file](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
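
For orientation, the implementation files live under `mteb/models/model_implementations/` in the MTEB repository, and your new file sits alongside the existing ones. The tree below is only a sketch; `my_custom_model.py` is the placeholder file name reused in the import of step 2:

```
mteb/
└── models/
    └── model_implementations/
        ├── colpali_models.py    # existing ColPali-like implementations
        └── my_custom_model.py   # your new implementation (placeholder name)
```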

2. Evaluate your model:

```python
import mteb

# Import the implementation file added in step 1; `MyCustomModel` and
# `my_args` are placeholders for your own model class and its arguments.
from mteb.models.model_implementations.my_custom_model import MyCustomModel

my_model = MyCustomModel(my_args)

# Load the ViDoRe v3 tasks and run the evaluation.
tasks = mteb.get_tasks(["ViDoRe (v3)"])
results = mteb.evaluate(my_model, tasks=tasks)
```

3. Open a PR on the [MTEB results repository](https://github.com/embeddings-benchmark/results/tree/main) with the generated results file to submit your results to the leaderboard.
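
Before opening that PR, you can sanity-check which result files the evaluation produced. This is only a sketch: the exact output location depends on your MTEB version and configuration, so the `results/` directory below is an assumption rather than a documented default.

```python
from pathlib import Path

# Assumed output folder; point this at wherever your MTEB run wrote its results.
results_dir = Path("results")

# Each task's scores are stored as a JSON file; these are the files to include
# in the PR to the results repository.
for path in sorted(results_dir.rglob("*.json")):
    print(path)
```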

4. Once the previous steps are done, you can ask the MTEB team to evaluate your model on the private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues).

## Evaluating a complex pipeline

Pipeline evaluation allows you to evaluate **complete end-to-end retrieval systems** on the ViDoRe v3 benchmark datasets. Unlike traditional retriever evaluation that focuses on individual model components, pipeline evaluation lets you test:

- **Custom preprocessing pipelines** (e.g., OCR → chunking → embedding); a minimal sketch of such a pipeline is shown after this list
- **Arbitrary retrieval logic** that goes beyond standard dense/sparse retrievers
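
As a concrete (and deliberately toy) illustration of the kind of OCR → chunking → embedding pipeline listed above, here is a minimal sketch. Every name in it (`ocr_page`, `chunk_text`, `embed`, `retrieve`) is an illustrative placeholder rather than an API of this repository or of MTEB, and the bag-of-words "embedding" only stands in for a real retrieval model.

```python
from collections import Counter
from math import sqrt


def ocr_page(page_image_path: str) -> str:
    # Placeholder OCR step: a real pipeline would call an OCR engine here.
    return f"text extracted from {page_image_path}"


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    # Naive fixed-size chunking; real pipelines often chunk by layout or headings.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" so the sketch stays self-contained.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: dict[str, Counter], top_k: int = 5) -> list[str]:
    # Rank chunk ids by similarity to the query; the ranked ids are what a
    # pipeline submission reports for each benchmark query.
    query_vec = embed(query)
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:top_k]


# Index one (hypothetical) document page, then run a query against it.
index = {
    f"doc1_page1_chunk{i}": embed(chunk)
    for i, chunk in enumerate(chunk_text(ocr_page("doc1_page1.png")))
}
print(retrieve("extracted text", index))
```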

### 📊 Results Repository & Submission Guidelines

**This repository serves as the primary community results repository for visual document retrieval benchmarks using complex pipelines.** We encourage researchers and practitioners to submit their pipeline evaluation results to create a centralized location where the community can compare different approaches and track progress on ViDoRe v3 datasets.
