Commit 6f43721

[algoperf submissions] Mv scoring/ from algorithmic-efficiency (dev)
1 parent cd3078b commit 6f43721

22 files changed

Lines changed: 6701 additions & 0 deletions
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+name: Scoring Tests
+
+# Runs the scoring code's linting and unit tests. The scoring code lives in
+# `scoring/` and computes the AlgoPerf leaderboard from submission logs.
+on:
+  push:
+    paths:
+      - 'scoring/**'
+      - 'pyproject.toml'
+      - '.github/workflows/scoring_tests.yml'
+  pull_request:
+    paths:
+      - 'scoring/**'
+      - 'pyproject.toml'
+      - '.github/workflows/scoring_tests.yml'
+
+jobs:
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install ruff
+        run: |
+          python -m pip install --upgrade pip
+          pip install ruff==0.12.0
+      - name: Lint scoring/
+        run: ruff check scoring/
+      - name: Format check scoring/
+        run: ruff format --check scoring/
+
+  pytest:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install scoring package
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .[dev]
+      - name: Run scoring unit tests
+        run: pytest scoring/test_scoring_utils.py

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
 .DS_Store
 __pycache__/
 *.pyc
+
+# Scoring output artifacts (see README "Scoring")
+scoring_results*/
+*.egg-info/

CONTRIBUTING.md

Lines changed: 4 additions & 0 deletions
@@ -7,3 +7,7 @@ Generally we encourage people to become MLCommons members if they wish to contri
 Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.
 
 MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.
+
+## Scoring code
+
+The leaderboard scoring code lives in [`scoring/`](./scoring/). See the [Scoring section of the README](./README.md#scoring) for how to install it and regenerate the leaderboard. Changes to `scoring/` are linted and unit-tested by the `Scoring Tests` GitHub Actions workflow.

README.md

Lines changed: 61 additions & 0 deletions
@@ -58,6 +58,67 @@ To submit your algorithm for evaluation on the AlgoPerf leaderboard, please foll
 2. **Create a Pull Request:** Fork this repository, create a new branch and add your submission code to a new folder within either `submissions/external_tuning/` or `submissions/self_tuning`. Open a pull request (PR) to the `evaluation` branch of this repository. Make sure to fill out the PR template asking for information such as submission name, authors, affiliations, etc.
 3. **PR Review and Evaluation:** The AlgoPerf working group will review your PR. Based on our available resources and the perceived potential of the method, it will be selected for a free evaluation and merged into the `evaluation` branch. The working group will run your submission on all workloads and push the results, as well as the updated leaderboard, to the `main`branch.
 
+## Scoring
+
+The code that computes this leaderboard lives in [`scoring/`](./scoring/). Given a
+directory of submission logs (such as those under [`previous_leaderboards/`](./previous_leaderboards/)),
+it computes the performance profiles, time-to-target, AlgoPerf benchmark scores, and
+speedups used in the tables above. This code was moved here from the
+[`scoring/` directory of the algorithmic-efficiency repository](https://github.com/mlcommons/algorithmic-efficiency)
+so that the repository that hosts the leaderboard also owns the code that produces it.
+
+### Installation
+
+The scoring code is self-contained: the per-workload target metrics, target
+values, and step hints it needs are vendored in
+[`scoring/workload_targets.json`](./scoring/workload_targets.json), so it
+requires neither the `algoperf` package nor JAX/PyTorch/TensorFlow — just a
+small numerical/plotting stack. As with the benchmark itself, set up a fresh
+Python (>=3.11) environment, e.g. via `conda` or `virtualenv`:
+
+```bash
+python3 -m venv env && source env/bin/activate
+pip3 install -e .   # installs the scoring tooling (numpy, pandas, scipy, ...)
+```
+
+> [!NOTE]
+> The `scoring/workload_targets*.json` files are generated from the benchmark
+> definitions in [algorithmic-efficiency](https://github.com/mlcommons/algorithmic-efficiency)
+> (`scoring/generate_workload_targets.py`) and vendored here. Each file is
+> frozen for one benchmark version, carrying that version's base/held-out
+> workload sets and per-workload targets. Regenerate and re-copy when a
+> benchmark version changes the workloads or targets.
+
+### Regenerating the leaderboard
+
+The current (v0.6) targets are the default. To score an older leaderboard, pass
+`--workload_targets` for that version's file (e.g. `scoring/workload_targets_v05.json`).
+
+```bash
+# External tuning ruleset
+python -m scoring.score_submissions \
+  --submission_directory previous_leaderboards/algoperf_v06/logs/external_tuning \
+  --compute_performance_profiles \
+  --output_dir scoring_results_external_tuning
+
+# Self-tuning ruleset (add --self_tuning_ruleset)
+python -m scoring.score_submissions \
+  --submission_directory previous_leaderboards/algoperf_v06/logs/self_tuning \
+  --compute_performance_profiles \
+  --self_tuning_ruleset \
+  --output_dir scoring_results_self_tuning
+
+# Reproduce the v0.5 leaderboard (8 base + 6 held-out workloads)
+python -m scoring.score_submissions \
+  --workload_targets scoring/workload_targets_v05.json \
+  --submission_directory previous_leaderboards/algoperf_v05/logs/external_tuning \
+  --compute_performance_profiles \
+  --output_dir scoring_results_v05_external
+```
+
+See the [scoring methodology](https://github.com/mlcommons/algorithmic-efficiency/blob/main/docs/DOCUMENTATION.md#scoring)
+in the benchmark documentation for details on how scores are computed.
+
 ## Citation
 
 If you use the _AlgoPerf benchmark_ in your research, please consider citing our paper.
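
The three scoring invocations added to the README differ only in the log directory, the output directory, and two optional flags. As a rough sketch of how they could be assembled from a script, the helper below builds the argument list for one run. The function name `build_scoring_command` is hypothetical and not part of this commit; only the module path `scoring.score_submissions` and the flag names are taken from the README text.

```python
import sys


def build_scoring_command(
  submission_directory,
  output_dir,
  self_tuning=False,
  workload_targets=None,
):
  """Hypothetical helper: returns the argv list for one scoring run."""
  cmd = [
    sys.executable,
    '-m',
    'scoring.score_submissions',
    '--submission_directory',
    submission_directory,
    '--compute_performance_profiles',
    '--output_dir',
    output_dir,
  ]
  if self_tuning:
    # The README adds this flag only for the self-tuning ruleset.
    cmd.append('--self_tuning_ruleset')
  if workload_targets is not None:
    # Omitted by default, in which case the v0.6 targets apply.
    cmd += ['--workload_targets', workload_targets]
  return cmd


if __name__ == '__main__':
  v05 = build_scoring_command(
    'previous_leaderboards/algoperf_v05/logs/external_tuning',
    'scoring_results_v05_external',
    workload_targets='scoring/workload_targets_v05.json',
  )
  print(' '.join(v05))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the package is installed.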

pyproject.toml

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+###############################################################################
+# MLCommons AlgoPerf: Leaderboard Scoring Tooling #
+###############################################################################
+# This package contains the scoring code used to compute the AlgoPerf
+# leaderboard (performance profiles, time-to-target, benchmark scores and
+# speedups) from submission logs stored in this repository.
+#
+# It was moved here from the `scoring/` directory of
+# https://github.com/mlcommons/algorithmic-efficiency so that the repository
+# that owns the leaderboard also owns the code that produces it.
+
+[project]
+name = "algoperf-submissions"
+version = "0.6.0"
+description = "Scoring tooling for the MLCommons AlgoPerf: Training Algorithms leaderboard"
+authors = [
+  { name = "MLCommons Algorithms Working Group", email = "algorithms@mlcommons.org" },
+]
+license = { text = "Apache 2.0" }
+readme = "README.md"
+requires-python = ">=3.11"
+
+# The scoring code is intentionally self-contained: per-workload target metrics,
+# target values, and step hints are vendored in scoring/workload_targets.json
+# (generated from algorithmic-efficiency), so scoring needs none of algoperf,
+# JAX, PyTorch, or TensorFlow -- just the numerical/plotting stack below.
+dependencies = [
+  "absl-py==2.1.0",
+  "numpy>=2.0.2",
+  "pandas>=2.0.1",
+  "matplotlib>=3.9.2",
+  "scipy>=1.13.0", # used by scoring/compute_speedups.py (was undeclared upstream)
+  "tabulate==0.9.0",
+]
+
+[project.optional-dependencies]
+dev = ["pytest==8.3.3", "ruff==0.12.0"]
+
+[build-system]
+requires = ["setuptools>=45"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools.packages.find]
+# Only package the scoring code; ignore submissions/ and previous_leaderboards/.
+include = ["scoring*"]
+
+###############################################################################
+# Linting & Formatting Configurations #
+###############################################################################
+[tool.ruff]
+line-length = 80
+indent-width = 2
+target-version = "py311"
+
+[tool.ruff.format]
+quote-style = "single"

scoring/__init__.py

Whitespace-only changes.

scoring/algoperf_v05/__init__.py

Whitespace-only changes.
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+import json
+import os
+import struct
+
+import numpy as np
+from absl import app, flags, logging
+
+flags.DEFINE_integer(
+  'held_out_workloads_seed',
+  None,
+  'Random seed for scoring. AlgoPerf v0.5 seed: 3438810845',
+)
+flags.DEFINE_string(
+  'output_filename',
+  'held_out_workloads.json',
+  'Path to file to record sampled held_out workloads.',
+)
+FLAGS = flags.FLAGS
+
+HELD_OUT_WORKLOADS = {
+  'librispeech': [
+    'librispeech_conformer_attention_temperature',
+    'librispeech_conformer_layernorm',
+    # 'librispeech_conformer_gelu', # Removed due to bug in target setting procedure
+    'librispeech_deepspeech_no_resnet',
+    'librispeech_deepspeech_norm_and_spec_aug',
+    'librispeech_deepspeech_tanh',
+  ],
+  'imagenet': [
+    'imagenet_resnet_silu',
+    'imagenet_resnet_gelu',
+    'imagenet_resnet_large_bn_init',
+    'imagenet_vit_glu',
+    'imagenet_vit_post_ln',
+    'imagenet_vit_map',
+  ],
+  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
+  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
+  'fastmri': ['fastmri_model_size', 'fastmri_tanh', 'fastmri_layernorm'],
+  'criteo1tb': [
+    'criteo1tb_layernorm',
+    'criteo1tb_embed_init',
+    'criteo1tb_resnet',
+  ],
+}
+
+
+def save_held_out_workloads(held_out_workloads, filename):
+  with open(filename, 'w') as f:
+    json.dump(held_out_workloads, f)
+
+
+def main(_):
+  rng_seed = FLAGS.held_out_workloads_seed
+  output_filename = FLAGS.output_filename
+
+  if rng_seed is None:
+    rng_seed = struct.unpack('I', os.urandom(4))[0]
+
+  logging.info('Using RNG seed %d', rng_seed)
+  rng = np.random.default_rng(rng_seed)
+
+  sampled_held_out_workloads = []
+  for v in HELD_OUT_WORKLOADS.values():
+    sampled_index = rng.integers(len(v))
+    sampled_held_out_workloads.append(v[sampled_index])
+
+  logging.info(f'Sampled held-out workloads: {sampled_held_out_workloads}')
+  save_held_out_workloads(sampled_held_out_workloads, output_filename)
+
+
+if __name__ == '__main__':
+  app.run(main)
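
The sampling logic in the script above comes down to "seed an RNG, then draw one variant per dataset". The following is a minimal sketch of that idea, not the committed implementation. It deliberately swaps `np.random.default_rng` for the standard-library `random.Random` so that it runs without NumPy, which means it will not reproduce the v0.5 sample even when given the v0.5 seed. `POOL` is a trimmed, illustrative copy of the candidate table, and `sample_held_out` is a hypothetical name. The sketch compares the seed against `None` rather than relying on truthiness, so that an explicit seed of 0 is honoured.

```python
import os
import random
import struct

# Trimmed, illustrative pool: dataset -> candidate held-out variants.
POOL = {
  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
  'fastmri': ['fastmri_model_size', 'fastmri_tanh', 'fastmri_layernorm'],
}


def sample_held_out(pool, seed=None):
  """Draws one variant per dataset; returns (seed_used, picks)."""
  if seed is None:
    # No seed given: derive a 32-bit seed from OS entropy and report it back
    # so that the draw can be repeated later.
    seed = struct.unpack('I', os.urandom(4))[0]
  rng = random.Random(seed)
  picks = [
    variants[rng.randrange(len(variants))] for variants in pool.values()
  ]
  return seed, picks


if __name__ == '__main__':
  seed_used, picks = sample_held_out(POOL, seed=0)
  print(seed_used, picks)
```

Returning the seed alongside the picks mirrors the script's habit of logging the seed, which is what makes a randomly seeded draw reproducible after the fact.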
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+["librispeech_conformer_layernorm", "imagenet_resnet_large_bn_init", "ogbg_model_size", "wmt_glu_tanh", "fastmri_tanh", "criteo1tb_embed_init"]
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+["librispeech_conformer_gelu", "imagenet_resnet_silu", "ogbg_gelu", "wmt_post_ln", "fastmri_model_size", "criteo1tb_layernorm"]
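
Each of the two one-line JSON files above records one sampled variant per dataset. A quick consistency check against the candidate table in the sampling script can be sketched as follows. The function name `check_sample` and the constants `SAMPLE_A` and `SAMPLE_B` are hypothetical. The table is copied from the script with the commented-out `librispeech_conformer_gelu` entry left out, and the commit does not state which file name belongs to which list.

```python
# Candidate table as it appears in the sampling script in this commit
# (the commented-out 'librispeech_conformer_gelu' entry is omitted).
HELD_OUT_WORKLOADS = {
  'librispeech': [
    'librispeech_conformer_attention_temperature',
    'librispeech_conformer_layernorm',
    'librispeech_deepspeech_no_resnet',
    'librispeech_deepspeech_norm_and_spec_aug',
    'librispeech_deepspeech_tanh',
  ],
  'imagenet': [
    'imagenet_resnet_silu',
    'imagenet_resnet_gelu',
    'imagenet_resnet_large_bn_init',
    'imagenet_vit_glu',
    'imagenet_vit_post_ln',
    'imagenet_vit_map',
  ],
  'ogbg': ['ogbg_gelu', 'ogbg_silu', 'ogbg_model_size'],
  'wmt': ['wmt_post_ln', 'wmt_attention_temp', 'wmt_glu_tanh'],
  'fastmri': ['fastmri_model_size', 'fastmri_tanh', 'fastmri_layernorm'],
  'criteo1tb': [
    'criteo1tb_layernorm',
    'criteo1tb_embed_init',
    'criteo1tb_resnet',
  ],
}

# The two sampled lists added by this commit, in the order they appear.
SAMPLE_A = [
  'librispeech_conformer_layernorm',
  'imagenet_resnet_large_bn_init',
  'ogbg_model_size',
  'wmt_glu_tanh',
  'fastmri_tanh',
  'criteo1tb_embed_init',
]
SAMPLE_B = [
  'librispeech_conformer_gelu',
  'imagenet_resnet_silu',
  'ogbg_gelu',
  'wmt_post_ln',
  'fastmri_model_size',
  'criteo1tb_layernorm',
]


def check_sample(sample, pool):
  """Returns (names not in the pool, datasets without exactly one pick)."""
  unknown = [w for w in sample if not any(w in v for v in pool.values())]
  miscounted = [
    dataset
    for dataset, variants in pool.items()
    if sum(w in variants for w in sample) != 1
  ]
  return unknown, miscounted


if __name__ == '__main__':
  print(check_sample(SAMPLE_A, HELD_OUT_WORKLOADS))
  print(check_sample(SAMPLE_B, HELD_OUT_WORKLOADS))
```

Under these assumptions the first list passes cleanly, while the second is flagged because it contains `librispeech_conformer_gelu`, the variant that the script's comment marks as removed due to a bug in the target-setting procedure.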
