move mteb_version to individual subsets in result files#4401
move mteb_version to individual subsets in result files#4401fzoll wants to merge 1 commit intoembeddings-benchmark:mainfrom
Conversation
992c78a to
cb55fea
Compare
|
Removed the dead |
KennethEnevoldsen
left a comment
There was a problem hiding this comment.
Looks good, though we need to decide what to do with the top level version. I would suggest just removing (see comment below).
I would also keep the option to ensure the same version is required to merge, but not have it as the default.
With this change, we could also save the results per. subset (so if you stop a run you don't have to start from scratch). This is probably a seperate PR though.
| assert subsets["en-fr"]["mteb_version"] == "2.20.1" | ||
|
|
||
| # Top-level version should be the latest | ||
| assert merged.mteb_version == "2.20.1" |
There was a problem hiding this comment.
hmm but shouldn't this be a range or list?
It is also an option to remove it completely, but the add a property that returns a range. Then we also of course need to fix it when loading historic result files (which then just assigns the mteb version to each subset)
cb55fea to
40bac77
Compare
|
Good point — restored |
Hmm but I don't want to merge it in with the top-level being a specific version when the sub-levels are not that version (I believe it would cause confusion) |
Store mteb_version per subset in score dicts so that results evaluated with different MTEB versions can be merged without erasing existing subset scores. The top-level mteb_version is now computed as the latest version across all subsets. - Add mteb_version to each subset's score dict in from_task_results() - Remove mteb_version from default merge criteria (is_mergeable/merge) - Compute top-level mteb_version from per-subset versions after merge - Fix bug in aggregated_task.py where mteb_version was always overwritten regardless of version mismatch condition Closes embeddings-benchmark#4354
40bac77 to
e7e7eb3
Compare
|
Good point — changed the top-level |
|
@fzoll Can you keep the commits separate? With force-pushed commits this is very hard to review |
| assert subsets["en-fr"]["mteb_version"] == "2.20.1" | ||
|
|
||
| # Top-level should be the latest found across subsets | ||
| assert merged.mteb_version == "2.20.1" |
There was a problem hiding this comment.
it should be "2.12.4-2.20.1" and subsets["en-de"]["mteb_version"]==2.12.4
KennethEnevoldsen
left a comment
There was a problem hiding this comment.
A minor change otherwise I think this is good
|
This pull request has been automatically marked as stale due to inactivity. |
Store mteb_version per subset in score dicts so that results evaluated with different MTEB versions can be merged without erasing existing subset scores. The top-level mteb_version is now computed as the latest version across all subsets.
Close #4354