dataset: Add elastic kb retrieval#4487

Merged
KennethEnevoldsen merged 13 commits into embeddings-benchmark:main from emilia-elastic:add-elastic-kb-retrieval
Apr 30, 2026

@emilia-elastic
Contributor

  • I have tested that the dataset runs with the mteb package.
  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • mteb/baseline-random encoder
    • intfloat/multilingual-e5-small or another small model
  • I have checked that the performance is neither trivial (close to perfect scores) nor random.
  • I have considered the size of the dataset and reduced it if it is too big (e.g. 2048 examples for binary classification)
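
For readers who want to reproduce the checklist runs, here is a minimal sketch that fills in the `mteb run -m {model_name} -t {task_name}` template quoted above. The task name `ElasticKBRetrieval` is taken from the commit messages in this PR; the helper `mteb_run_command` is a hypothetical name introduced only for illustration and is not part of mteb. The sketch only builds the command string and does not invoke the CLI.

```python
import shlex


def mteb_run_command(model_name: str, task_name: str) -> str:
    """Fill in the checklist template `mteb run -m {model_name} -t {task_name}`.

    Hypothetical helper for illustration only; it is not part of mteb.
    shlex.join quotes any argument that contains shell-special characters.
    """
    return shlex.join(["mteb", "run", "-m", model_name, "-t", task_name])


if __name__ == "__main__":
    # Model name from the checklist; task name from this PR's commit messages.
    print(mteb_run_command("intfloat/multilingual-e5-small", "ElasticKBRetrieval"))
```

The printed string can be pasted into a shell where the mteb package is installed. The random baseline from the checklist can be substituted for the model name in the same way.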

fix: config

add: results

fix: remove eval results, add prompt and fix sample_creation

- Remove local evaluation results (not part of task PR)
- Add query prompt for instruction-tuned models
- Change sample_creation to "found and created" (mix of real chat queries and synthetic)

fix: clarify description for real-world vs synthetic query grounding

add: baseline results for ElasticKBRetrieval
@KennethEnevoldsen changed the title from "Add elastic kb retrieval" to "dataset: Add elastic kb retrieval" on Apr 23, 2026
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks great (for context, this is intended for the RTEB private subset).

Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py Outdated
Contributor

Can you upload these as a PR on the results repository instead? (It looks like you also have other results; you can just submit the full list.)

Contributor

(but let us do it once the dataset is finalized)

Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py Outdated
@KennethEnevoldsen
Contributor

Note that you need to tag it as "private" as well, using "is_public=False" in the TaskMetadata (this is partly the reason why the tests fail).
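
To illustrate what the `is_public=False` flag expresses, here is a minimal sketch using a stand-in dataclass. `TaskMetadataSketch` and `private_task_names` are hypothetical names invented for this example; they are not mteb's real `TaskMetadata` or any mteb API, and the default of `True` for `is_public` is an assumption. Only the field name `is_public` and the task name come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskMetadataSketch:
    """Hypothetical stand-in; NOT mteb's real TaskMetadata class."""

    name: str
    is_public: bool = True  # assumption: tasks are public unless marked otherwise


def private_task_names(tasks: list[TaskMetadataSketch]) -> list[str]:
    """Return the names of tasks tagged as private (is_public=False)."""
    return [task.name for task in tasks if not task.is_public]


if __name__ == "__main__":
    tasks = [
        TaskMetadataSketch(name="SomePublicTask"),  # made-up public task
        TaskMetadataSketch(name="ElasticKBRetrieval", is_public=False),
    ]
    print(private_task_names(tasks))
```

In the actual task file, the flag would sit alongside the other metadata fields; the exact set of required fields is defined by mteb and is not reproduced here.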

@Samoed added the "new dataset" label (Issues related to adding a new task or dataset) on Apr 25, 2026
Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py Outdated
Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

I think it is just a documentation issue now, and a reupload from my side afterwards.

Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py
Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py Outdated
Comment thread mteb/tasks/retrieval/eng/elastic_kb_retrieval.py Outdated
@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) April 30, 2026 07:41
@KennethEnevoldsen
Contributor

@emilia-elastic added a PR with the results - can you confirm that everything looks as expected there?

@KennethEnevoldsen KennethEnevoldsen merged commit e792bce into embeddings-benchmark:main Apr 30, 2026
12 of 13 checks passed

3 participants