move mteb_version to individual subsets in result files #4401

Open
fzoll wants to merge 1 commit into embeddings-benchmark:main from fzoll:fix/per-subset-mteb-version

Contributor

@fzoll fzoll commented Apr 16, 2026

Store mteb_version per subset in score dicts so that results evaluated with different MTEB versions can be merged without erasing existing subset scores. The top-level mteb_version is now computed as the latest version across all subsets.

  • Add mteb_version to each subset's score dict in from_task_results()
  • Remove mteb_version from default merge criteria (is_mergeable/merge)
  • Compute top-level mteb_version from per-subset versions after merge
  • Fix bug in aggregated_task.py where mteb_version was always overwritten regardless of version mismatch condition
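
A minimal sketch of why per-subset versions make merging non-destructive. This is not the code from this PR: the `merge_scores` helper and the `{split: [subset_dict, ...]}` layout with an `hf_subset` key are assumptions used for illustration only.

```python
from copy import deepcopy


def merge_scores(existing: dict, new: dict) -> dict:
    """Merge two {split: [subset_score_dict, ...]} mappings (hypothetical helper).

    Each subset dict carries its own "mteb_version", so subsets evaluated with
    different versions can sit side by side. If a subset appears in both
    inputs, the entry from `new` wins.
    """
    merged = deepcopy(existing)
    for split, new_entries in new.items():
        # Index the entries already present for this split by subset name.
        by_subset = {entry["hf_subset"]: entry for entry in merged.get(split, [])}
        for entry in new_entries:
            by_subset[entry["hf_subset"]] = deepcopy(entry)
        merged[split] = list(by_subset.values())
    return merged


old = {"test": [{"hf_subset": "en-de", "main_score": 0.71, "mteb_version": "2.12.4"}]}
new = {"test": [{"hf_subset": "en-fr", "main_score": 0.68, "mteb_version": "2.20.1"}]}
merged = merge_scores(old, new)
versions = {e["hf_subset"]: e["mteb_version"] for e in merged["test"]}
```

Here `versions` keeps `"2.12.4"` for `en-de` and `"2.20.1"` for `en-fr`; neither subset is dropped because of the version mismatch.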

Close #4354

@Samoed Samoed changed the title feat: move mteb_version to individual subsets in result files move mteb_version to individual subsets in result files Apr 16, 2026
Comment thread mteb/results/task_result.py
@fzoll fzoll force-pushed the fix/per-subset-mteb-version branch from 992c78a to cb55fea Compare April 16, 2026 16:15
@fzoll fzoll requested a review from Samoed April 16, 2026 16:17
Contributor Author

fzoll commented Apr 16, 2026

Removed the dead Criteria.MTEB_VERSION check and unused mteb_version variable assignments.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good, though we need to decide what to do with the top-level version. I would suggest just removing it (see comment below).

I would also keep the option to ensure the same version is required to merge, but not have it as the default.

With this change, we could also save the results per subset (so if you stop a run you don't have to start from scratch). This is probably a separate PR though.

Comment thread mteb/results/task_result.py
Comment thread tests/test_results/test_task_results.py Outdated
assert subsets["en-fr"]["mteb_version"] == "2.20.1"

# Top-level version should be the latest
assert merged.mteb_version == "2.20.1"
Contributor

hmm but shouldn't this be a range or list?

It is also an option to remove it completely, but then add a property that returns a range. Then we of course also need to fix it when loading historic result files (which then just assigns the top-level mteb version to each subset).
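
The historic-file handling suggested above could look roughly like this. It is a sketch on plain dicts, not the loader in `mteb`: the function name `backfill_subset_versions` and the assumed file layout (a top-level `mteb_version` plus `scores` keyed by split) are illustrative assumptions.

```python
from copy import deepcopy


def backfill_subset_versions(result: dict) -> dict:
    """Copy the top-level version onto subsets that lack one (hypothetical helper).

    Older result files only record a single top-level "mteb_version". Subsets
    that already carry their own version are left untouched.
    """
    upgraded = deepcopy(result)
    top_level = upgraded.get("mteb_version")
    for entries in upgraded.get("scores", {}).values():
        for entry in entries:
            entry.setdefault("mteb_version", top_level)
    return upgraded


legacy = {
    "mteb_version": "2.12.4",
    "scores": {
        "test": [
            {"hf_subset": "en-de", "main_score": 0.71},
            {"hf_subset": "en-fr", "main_score": 0.68, "mteb_version": "2.20.1"},
        ]
    },
}
upgraded = backfill_subset_versions(legacy)
```

After the call, `en-de` inherits `"2.12.4"` while `en-fr` keeps its own `"2.20.1"`, and `legacy` itself is not modified.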

@fzoll fzoll force-pushed the fix/per-subset-mteb-version branch from cb55fea to 40bac77 Compare April 17, 2026 10:44
Contributor Author

fzoll commented Apr 17, 2026

Good point — restored Criteria.MTEB_VERSION as an opt-in check (not in default criteria). Regarding the top-level mteb_version: agree it could become a range or be removed in favor of per-subset versions. Happy to tackle that in a follow-up PR if you'd like.
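
For readers following along, an opt-in version check can be sketched as below. The `MergeCriteria` enum, `DEFAULT_CRITERIA`, and this `is_mergeable` signature are hypothetical stand-ins, not the actual `Criteria` class or method in `mteb/results/task_result.py`.

```python
from enum import Enum


class MergeCriteria(Enum):
    """Hypothetical merge criteria; each value names a field to compare."""

    TASK_NAME = "task_name"
    MTEB_VERSION = "mteb_version"


# The version check is deliberately absent from the defaults (opt-in only).
DEFAULT_CRITERIA = (MergeCriteria.TASK_NAME,)


def is_mergeable(a: dict, b: dict, criteria=DEFAULT_CRITERIA) -> bool:
    """Return True if both results agree on every requested criterion."""
    return all(a.get(c.value) == b.get(c.value) for c in criteria)


run_a = {"task_name": "ExampleTask", "mteb_version": "2.12.4"}
run_b = {"task_name": "ExampleTask", "mteb_version": "2.20.1"}

default_ok = is_mergeable(run_a, run_b)
strict_ok = is_mergeable(
    run_a, run_b, criteria=(MergeCriteria.TASK_NAME, MergeCriteria.MTEB_VERSION)
)
```

With the defaults the two runs merge; passing `MergeCriteria.MTEB_VERSION` explicitly restores the stricter behaviour for callers who want it.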

@KennethEnevoldsen
Contributor

> Regarding the top-level mteb_version: agree it could become a range or be removed in favor of per-subset versions. Happy to tackle that in a follow-up PR if you'd like.

Hmm but I don't want to merge it in with the top-level being a specific version when the sub-levels are not that version (I believe it would cause confusion)

@fzoll fzoll force-pushed the fix/per-subset-mteb-version branch from 40bac77 to e7e7eb3 Compare April 17, 2026 12:55
Contributor Author

fzoll commented Apr 17, 2026

Good point — changed the top-level mteb_version to a range (e.g. "2.12.4-2.20.1") when subsets were evaluated with different versions. Single version if all match.

Member

Samoed commented Apr 17, 2026

@fzoll Can you keep the commits separate? With force-pushed commits this is very hard to review

assert subsets["en-fr"]["mteb_version"] == "2.20.1"

# Top-level should be the latest found across subsets
assert merged.mteb_version == "2.20.1"
Contributor

It should be "2.12.4-2.20.1", and subsets["en-de"]["mteb_version"] == "2.12.4"

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

One minor change; otherwise I think this is good.

Contributor

github-actions Bot commented May 3, 2026

This pull request has been automatically marked as stale due to inactivity.

@github-actions github-actions Bot added the stale label May 3, 2026