Commit 4128601

[algoperf submissions] Mv scoring/ from algorithmic-efficiency (dev)
1 parent cd3078b commit 4128601

23 files changed

Lines changed: 6776 additions & 3 deletions
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+name: Scoring Tests
+
+# Runs the scoring code's linting and unit tests. The scoring code lives in
+# `scoring/` and computes the AlgoPerf leaderboard from submission logs.
+on:
+  push:
+    paths:
+      - 'scoring/**'
+      - 'pyproject.toml'
+      - '.github/workflows/scoring_tests.yml'
+  pull_request:
+    paths:
+      - 'scoring/**'
+      - 'pyproject.toml'
+      - '.github/workflows/scoring_tests.yml'
+
+jobs:
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install ruff
+        run: |
+          python -m pip install --upgrade pip
+          pip install ruff==0.12.0
+      - name: Lint scoring/
+        run: ruff check scoring/
+      - name: Format check scoring/
+        run: ruff format --check scoring/
+
+  pytest:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install scoring package
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .[dev]
+      - name: Run scoring unit tests
+        # Runs the whole scoring/ suite: the log-parsing unit tests
+        # (test_scoring_utils.py) and the end-to-end scoring smoke test
+        # (test_score_submissions.py), which reproduces the published v0.5
+        # leaderboard and guards the score-aggregation path.
+        run: pytest scoring/

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
 .DS_Store
 __pycache__/
 *.pyc
+
+# Scoring output artifacts (see README "Scoring")
+scoring_results*/
+*.egg-info/

CONTRIBUTING.md

Lines changed: 4 additions & 0 deletions
@@ -7,3 +7,7 @@ Generally we encourage people to become MLCommons members if they wish to contri
 Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.
 
 MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.
+
+## Scoring code
+
+The leaderboard scoring code lives in [`scoring/`](./scoring/). See the [Scoring section of the README](./README.md#scoring) for how to install it and regenerate the leaderboard.

README.md

Lines changed: 60 additions & 3 deletions
@@ -9,13 +9,13 @@ This repository hosts the official rolling leaderboard for the [**AlgoPerf: Trai
 The benchmark measures neural network training speedups due to algorithmic improvements in training algorithms.
 The leaderboard tracks the aggregate performance of different algorithms on a variety of [workloads](https://github.com/mlcommons/algorithmic-efficiency/blob/main/DOCUMENTATION.md#workloads) and under two different [tuning rulesets](https://github.com/mlcommons/algorithmic-efficiency/blob/main/DOCUMENTATION.md#tuning).
 
-> [!NOTE]
+> [!NOTE]
 > **If you want to submit to the AlgoPerf benchmark, please open a PR with your submission. The AlgoPerf working group will review your submission and potentially evaluate your submission on all workloads. For more details, see the [How to Submit](#how-to-submit) section.**
 
 ## Live Leaderboards
 
-> **Leaderboard Version:** 0.6
-> **Last Updated:** 2025-03-24 15:07 UTC
+> **Leaderboard Version:** 0.6
+> **Last Updated:** 2025-03-24 15:07 UTC
 > **Using Benchmark Version:** [latest](https://github.com/mlcommons/algorithmic-efficiency)
 
 > [!TIP]
@@ -58,6 +58,63 @@ To submit your algorithm for evaluation on the AlgoPerf leaderboard, please foll
 2. **Create a Pull Request:** Fork this repository, create a new branch and add your submission code to a new folder within either `submissions/external_tuning/` or `submissions/self_tuning`. Open a pull request (PR) to the `evaluation` branch of this repository. Make sure to fill out the PR template asking for information such as submission name, authors, affiliations, etc.
 3. **PR Review and Evaluation:** The AlgoPerf working group will review your PR. Based on our available resources and the perceived potential of the method, it will be selected for a free evaluation and merged into the `evaluation` branch. The working group will run your submission on all workloads and push the results, as well as the updated leaderboard, to the `main`branch.
 
+## Scoring
+
+The code that computes this leaderboard lives in [`scoring/`](./scoring/). Given a
+directory of submission logs (such as those under [`previous_leaderboards/`](./previous_leaderboards/)),
+it computes the performance profiles, time-to-target, AlgoPerf benchmark scores, and
+speedups used in the tables above. This code was moved here from the
+[`scoring/` directory of the algorithmic-efficiency repository](https://github.com/mlcommons/algorithmic-efficiency)
+so that the repository that hosts the leaderboard also owns the code that produces it.
+
+### Installation
+
+The scoring code is self-contained. To run it, set up a fresh
+Python (>=3.11) environment, e.g. via `conda` or `virtualenv`:
+
+```bash
+python3 -m venv env && source env/bin/activate
+pip3 install -e .  # installs the scoring tooling (numpy, pandas, scipy, ...)
+```
+
+> [!NOTE]
+> The `scoring/workload_targets*.json` files are generated from the benchmark
+> definitions in [algorithmic-efficiency](https://github.com/mlcommons/algorithmic-efficiency)
+> (`scoring/generate_workload_targets.py`) and committed here. Each file is
+> frozen for one benchmark version, carrying that version's base/held-out
+> workload sets and per-workload targets. Regenerate and re-copy when a
+> benchmark version changes the workloads or targets.
+
+### Regenerating the leaderboard
+
+The current targets are the default. To score an older leaderboard, pass that
+version's file via `--workload_targets` (e.g. `scoring/workload_targets_v05.json`).
+
+```bash
+# External tuning ruleset
+python -m scoring.score_submissions \
+  --submission_directory previous_leaderboards/algoperf_v06/logs/external_tuning \
+  --compute_performance_profiles \
+  --output_dir scoring_results_external_tuning
+
+# Self-tuning ruleset (add --self_tuning_ruleset)
+python -m scoring.score_submissions \
+  --submission_directory previous_leaderboards/algoperf_v06/logs/self_tuning \
+  --compute_performance_profiles \
+  --self_tuning_ruleset \
+  --output_dir scoring_results_self_tuning
+
+# Reproduce the v0.5 leaderboard (8 base + 6 held-out workloads)
+python -m scoring.score_submissions \
+  --workload_targets scoring/workload_targets_v05.json \
+  --submission_directory previous_leaderboards/algoperf_v05/logs/external_tuning \
+  --compute_performance_profiles \
+  --output_dir scoring_results_v05_external
+```
+
+See the [scoring methodology](https://github.com/mlcommons/algorithmic-efficiency/blob/main/docs/DOCUMENTATION.md#scoring)
+in the benchmark documentation for details on how scores are computed.
+
 ## Citation
 
 If you use the _AlgoPerf benchmark_ in your research, please consider citing our paper.
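
The Scoring section added to the README above names performance profiles and time-to-target among the quantities the scoring code computes. As a rough illustration only, the sketch below builds a Dolan-Moré style performance profile from a small, made-up table of time-to-target values. The function name `performance_profile`, the submission and workload names, and all numbers are hypothetical; this is not the code in `scoring/`, whose normalization and handling of missed targets may differ.

```python
import math


def performance_profile(times, taus):
  """Fraction of workloads each submission solves within tau x the best time.

  `times` maps submission -> {workload: time-to-target}; math.inf marks a
  workload on which the target was not reached.
  """
  workloads = sorted({w for per_sub in times.values() for w in per_sub})
  # Fastest time any submission achieved on each workload.
  best = {
    w: min(per_sub.get(w, math.inf) for per_sub in times.values())
    for w in workloads
  }
  profile = {}
  for sub, per_sub in times.items():
    fractions = []
    for tau in taus:
      solved = 0
      for w in workloads:
        t = per_sub.get(w, math.inf)
        # A workload nobody solved counts as unsolved for everyone.
        if math.isfinite(best[w]) and t <= tau * best[w]:
          solved += 1
      fractions.append(solved / len(workloads))
    profile[sub] = fractions
  return profile


# Made-up time-to-target values in seconds (illustrative only).
example_times = {
  'submission_a': {'workload_1': 100.0, 'workload_2': 300.0},
  'submission_b': {'workload_1': 150.0, 'workload_2': math.inf},
}
example_profile = performance_profile(example_times, taus=[1.0, 2.0])
print(example_profile)
```

In this toy table `submission_a` is fastest on both workloads, so its profile is 1.0 at every tau, while `submission_b` only reaches one of the two workloads within twice the best time.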

pyproject.toml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+###############################################################################
+# MLCommons AlgoPerf: Leaderboard Scoring Tooling #
+###############################################################################
+# This package contains the scoring code used to compute the AlgoPerf
+# leaderboard (performance profiles, time-to-target, benchmark scores and
+# speedups) from submission logs stored in this repository.
+#
+# It was moved here from the `scoring/` directory of
+# https://github.com/mlcommons/algorithmic-efficiency so that the repository
+# that owns the leaderboard also owns the code that produces it.
+
+[project]
+name = "algoperf-submissions"
+version = "0.6.0"
+description = "Scoring tooling for the MLCommons AlgoPerf: Training Algorithms leaderboard"
+authors = [
+  { name = "MLCommons Algorithms Working Group", email = "algorithms@mlcommons.org" },
+]
+license = { text = "Apache 2.0" }
+readme = "README.md"
+requires-python = ">=3.11"
+
+
+dependencies = [
+  "absl-py==2.1.0",
+  "numpy==2.1.3",
+  "pandas==2.2.3",
+  "matplotlib==3.9.2",
+  "scipy==1.14.1",
+  "tabulate==0.9.0",
+]
+
+[project.optional-dependencies]
+dev = ["pytest==8.3.3", "ruff==0.12.0"]
+
+[build-system]
+requires = ["setuptools>=61"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools.packages.find]
+# Only package the scoring code; ignore submissions/ and previous_leaderboards/.
+include = ["scoring*"]
+
+###############################################################################
+# Linting & Formatting Configurations #
+###############################################################################
+[tool.ruff]
+line-length = 80
+indent-width = 2
+target-version = "py311"
+
+[tool.ruff.format]
+quote-style = "single"

scoring/__init__.py

Whitespace-only changes.

scoring/algoperf_v05/__init__.py

Whitespace-only changes.
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+import json
+import os
+import struct
+
+import numpy as np
+from absl import app, flags, logging
+
+flags.DEFINE_integer(
+  'held_out_workloads_seed',
+  None,
+  'Random seed for scoring. AlgoPerf v0.5 seed: 3438810845',
+)
+flags.DEFINE_string(
+  'output_filename',
+  'held_out_workloads.json',
+  'Path to file to record sampled held_out workloads.',
+)
+FLAGS = flags.FLAGS
+
+HELD_OUT_WORKLOADS = {
+  'librispeech': [
+    'librispeech_conformer_attention_temperature',
+    'librispeech_conformer_layernorm',
+    # 'librispeech_conformer_gelu', # Removed due to bug in target setting procedure
+    'librispeech_deepspeech_no_resnet',
+    'librispeech_deepspeech_norm_and_spec_aug',
+    'librispeech_deepspeech_tanh',
+  ],
+  'imagenet': [
+    'imagenet_resnet_silu',
+    'imagenet_resnet_gelu',
+    'imagenet_resnet_large_bn_init',
+    'imagenet_vit_glu',
+    'imagenet_vit_post_ln',
+    'imagenet_vit_map',
+  ],
+  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
+  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
+  'fastmri': ['fastmri_model_size', 'fastmri_tanh', 'fastmri_layernorm'],
+  'criteo1tb': [
+    'criteo1tb_layernorm',
+    'criteo1tb_embed_init',
+    'criteo1tb_resnet',
+  ],
+}
+
+
+def save_held_out_workloads(held_out_workloads, filename):
+  with open(filename, 'w') as f:
+    json.dump(held_out_workloads, f)
+
+
+def main(_):
+  rng_seed = FLAGS.held_out_workloads_seed
+  output_filename = FLAGS.output_filename
+
+  if rng_seed is None:
+    rng_seed = struct.unpack('I', os.urandom(4))[0]
+
+  logging.info('Using RNG seed %d', rng_seed)
+  rng = np.random.default_rng(rng_seed)
+
+  sampled_held_out_workloads = []
+  for _, v in HELD_OUT_WORKLOADS.items():
+    sampled_index = rng.integers(len(v))
+    sampled_held_out_workloads.append(v[sampled_index])
+
+  logging.info(f'Sampled held-out workloads: {sampled_held_out_workloads}')
+  save_held_out_workloads(sampled_held_out_workloads, output_filename)
+
+
+if __name__ == '__main__':
+  app.run(main)
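
The script above samples one held-out variant per dataset family and falls back to a random seed when none is supplied. The sketch below mirrors that flow with the standard library only, so it will not reproduce the NumPy draws of the original. `resolve_seed`, `sample_one_per_family`, and the trimmed `CANDIDATES` table are hypothetical names used for illustration. It also shows why an `is None` check, rather than a truthiness check, is needed if an explicit seed of 0 should be honoured.

```python
import os
import random
import struct

# Trimmed copy of two families from HELD_OUT_WORKLOADS, for illustration.
CANDIDATES = {
  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
}


def resolve_seed(seed=None):
  """Returns `seed`, or a fresh 32-bit seed when `seed` is None."""
  if seed is None:  # `if not seed` would also discard an explicit 0.
    seed = struct.unpack('I', os.urandom(4))[0]
  return seed


def sample_one_per_family(candidates, seed):
  """Picks one variant per family, deterministically for a given seed."""
  rng = random.Random(seed)  # Stand-in for np.random.default_rng.
  return [
    variants[rng.randrange(len(variants))]
    for variants in candidates.values()
  ]


first = sample_one_per_family(CANDIDATES, resolve_seed(0))
second = sample_one_per_family(CANDIDATES, resolve_seed(0))
print(first == second)
```

Because both calls resolve to the same seed, the two samples are identical, which is the property that makes a published seed useful for auditing the held-out selection.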
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+["librispeech_conformer_layernorm", "imagenet_resnet_large_bn_init", "ogbg_model_size", "wmt_glu_tanh", "fastmri_tanh", "criteo1tb_embed_init"]
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+["librispeech_conformer_gelu", "imagenet_resnet_silu", "ogbg_gelu", "wmt_post_ln", "fastmri_model_size", "criteo1tb_layernorm"]
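
The two JSON lists above each hold one sampled workload per dataset family. A minimal sketch of a consistency check against the candidate pool from the sampling script follows. `check_sample` is a hypothetical helper, not part of `scoring/`, and the pool is copied from `HELD_OUT_WORKLOADS` with the commented-out `librispeech_conformer_gelu` entry left out.

```python
# Active candidates, copied from the sampling script in this commit.
HELD_OUT_WORKLOADS = {
  'librispeech': [
    'librispeech_conformer_attention_temperature',
    'librispeech_conformer_layernorm',
    'librispeech_deepspeech_no_resnet',
    'librispeech_deepspeech_norm_and_spec_aug',
    'librispeech_deepspeech_tanh',
  ],
  'imagenet': [
    'imagenet_resnet_silu',
    'imagenet_resnet_gelu',
    'imagenet_resnet_large_bn_init',
    'imagenet_vit_glu',
    'imagenet_vit_post_ln',
    'imagenet_vit_map',
  ],
  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
  'fastmri': ['fastmri_model_size', 'fastmri_tanh', 'fastmri_layernorm'],
  'criteo1tb': [
    'criteo1tb_layernorm',
    'criteo1tb_embed_init',
    'criteo1tb_resnet',
  ],
}


def check_sample(sample, candidates):
  """Returns the sampled names that are not in the active candidate pool."""
  known = {name for variants in candidates.values() for name in variants}
  return [name for name in sample if name not in known]


first_list = [
  'librispeech_conformer_layernorm',
  'imagenet_resnet_large_bn_init',
  'ogbg_model_size',
  'wmt_glu_tanh',
  'fastmri_tanh',
  'criteo1tb_embed_init',
]
second_list = [
  'librispeech_conformer_gelu',
  'imagenet_resnet_silu',
  'ogbg_gelu',
  'wmt_post_ln',
  'fastmri_model_size',
  'criteo1tb_layernorm',
]
print(check_sample(first_list, HELD_OUT_WORKLOADS))
print(check_sample(second_list, HELD_OUT_WORKLOADS))
```

Under these assumptions the first list passes cleanly, while the second contains `librispeech_conformer_gelu`, which appears in the script only as a commented-out candidate.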
