Add benchmark version to result json. by superdosh · Pull Request #1407 · mlcommons/modelbench

superdosh · 2025-12-05T14:01:58Z

github-actions · 2025-12-05T14:02:08Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

wpietri

This solves the stated need, but I'm wondering if we should just put in the full benchmark uid twice: once in the string form and once in the full JSON form. Seems like the real problem here is "want a piece of the benchmark UID without parsing it", so unless we're sure version is the only thing we need, I'd rather do it all at once, barring some reason not to.

@rogthefrog, given that you're both the requester and the person who has spent the most time on structured benchmark UIDs, what are your thoughts?

superdosh · 2025-12-05T17:23:32Z

Conclusion from standup was that we'd put everything in. Will update the PR!

bkorycki · 2025-12-05T18:34:25Z

        elif isinstance(o, BenchmarkDefinition):
-            return {"uid": o.uid, "hazards": o.hazards()}
+            def_dict = {"uid": o.uid, "hazards": o.hazards()}
+            def_dict.update(o.uid_dict)


Why do we need both the uid and the uid dictionary? Do they contain different information?

The dictionary contains all the elements of the uid, but it doesn't explain how to turn the elements into the full string. So if anybody needs to compare to a string uid, they'll need to either import modelbench or duplicate the logic. Plus, with file formats it's generally better to stay backwards compatible, so that we don't have to hunt down and rewrite anything that parses the results JSON.
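To make the backwards-compatibility point concrete, here is a minimal sketch of a reader that works with both old and new result entries. The entry shapes, the `placeholder-uid` string, and the `benchmark_version` helper are all assumptions for illustration, not the actual modelbench results format:

```python
import json


def benchmark_version(benchmark_entry):
    """Return the benchmark version if the entry has one, else None.

    Hypothetical helper: older entries are assumed to carry only the
    string uid, newer ones are assumed to also expose the uid's parts.
    """
    return benchmark_entry.get("version")


# Hypothetical entries; the uid string is a placeholder, not a real uid.
old_entry = json.loads('{"uid": "placeholder-uid", "hazards": []}')
new_entry = json.loads(
    '{"uid": "placeholder-uid", "hazards": [], "version": "1.1"}'
)

# Code that only knows about the string uid keeps working on both.
for entry in (old_entry, new_entry):
    print(entry["uid"], benchmark_version(entry))
```

Because the string `uid` key stays in place, existing parsers are unaffected, while new consumers can read individual fields without parsing the string.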

As an example of what uid_dict looks like:

{'class': 'general_purpose_ai_chat_benchmark', 'version': '1.1', 'locale': 'en_us', 'prompt_set': 'practice', 'evaluator': 'default'}
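For readers outside the codebase, the change in the diff can be sketched roughly as below. `FakeBenchmarkDefinition`, `SketchEncoder`, the placeholder uid string, and the hazard name are all hypothetical stand-ins; only the `def_dict` lines mirror the diff, and the `return def_dict` is assumed since the hunk does not show it:

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeBenchmarkDefinition:
    """Hypothetical stand-in for illustration; not the real modelbench class."""

    uid: str
    uid_dict: dict
    hazard_names: list = field(default_factory=list)

    def hazards(self):
        return list(self.hazard_names)


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder showing the uid + uid_dict merge."""

    def default(self, o):
        if isinstance(o, FakeBenchmarkDefinition):
            # Keep the string uid for existing consumers...
            def_dict = {"uid": o.uid, "hazards": o.hazards()}
            # ...and add each uid element as its own top-level key.
            def_dict.update(o.uid_dict)
            return def_dict  # assumed; not shown in the diff hunk
        return super().default(o)


benchmark = FakeBenchmarkDefinition(
    uid="placeholder-uid",  # placeholder, not the real uid format
    uid_dict={
        "class": "general_purpose_ai_chat_benchmark",
        "version": "1.1",
        "locale": "en_us",
        "prompt_set": "practice",
        "evaluator": "default",
    },
    hazard_names=["example_hazard"],  # hypothetical hazard name
)

encoded = json.loads(json.dumps(benchmark, cls=SketchEncoder))
print(encoded["uid"], encoded["version"])
```

One thing to keep in mind with this shape: `dict.update` overwrites existing keys, so a `uid_dict` containing a key named `uid` or `hazards` would replace the original value.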

rogthefrog

👍🏻

Add benchmark version to result json.

88650dd

superdosh temporarily deployed to Scheduled Testing December 5, 2025 14:02 — with GitHub Actions Inactive

superdosh marked this pull request as ready for review December 5, 2025 14:06

superdosh requested a review from a team as a code owner December 5, 2025 14:06

superdosh requested review from bkorycki, dhosterman, rogthefrog and wpietri December 5, 2025 14:06

superdosh commented Dec 5, 2025

Comment thread src/modelbench/record.py Outdated

wpietri approved these changes Dec 5, 2025

Pass everything.

a62e15d

superdosh temporarily deployed to Scheduled Testing December 5, 2025 18:02 — with GitHub Actions Inactive

bkorycki reviewed Dec 5, 2025

rogthefrog approved these changes Dec 5, 2025

superdosh merged commit 3310f9f into main Dec 5, 2025
2 checks passed

superdosh deleted the results-version branch December 5, 2025 19:16

github-actions Bot locked and limited conversation to collaborators Dec 5, 2025

superdosh merged 2 commits into main from results-version
