Commit a2c496a
[OMNIML-4788] specdec_bench: configuration.json provenance + upload_to_s3 (#1531)
> [!WARNING]
> **Breaking on-disk schema change (specdec_bench v1.0.0).** This PR
> renames the acceptance-rate metric fields across the `AcceptanceRate` /
> `MTBench` / `SpecBench` writers:
>
> | Old (pre-1.0.0) | New (1.0.0) |
> |---|---|
> | `Request_AR` | `Request_AL` |
> | `Category_AR` | `Category_AL` |
> | `Average_AR` | `Average_AL` |
> | (none) | `Joint_Acceptance_Rate` (new) |
>
> The renamed values were always **acceptance length** (mean tokens
> generated per inference step), not a rate, and the visualizer reads
> `*_AL`. Pre-1.0.0 runs in S3 have `*_AR` keys and no
> `Joint_Acceptance_Rate`; they must be re-run or post-processed before
> they can be compared with 1.0.0 runs. The visualizer aggregates runs by
> `specdec_bench` major version, so accidental cross-methodology comparison
> is blocked.
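For pre-1.0.0 results that are post-processed rather than re-run, the key rename and the major-version guard could look roughly like the sketch below. It is a minimal illustration, not the visualizer's code: the helper names are hypothetical, and treating a run with no `specdec_bench_version` as major version 0 is an assumption (the field is introduced by this PR).

```python
from statistics import mean
from typing import Dict, List

# Renames from the specdec_bench 1.0.0 table above.
LEGACY_KEY_MAP = {
    "Request_AR": "Request_AL",
    "Category_AR": "Category_AL",
    "Average_AR": "Average_AL",
}


def acceptance_length(tokens_per_step: List[int]) -> float:
    """Mean number of tokens generated per inference step."""
    if not tokens_per_step:
        raise ValueError("need at least one inference step")
    return float(mean(tokens_per_step))


def upgrade_legacy_metrics(metrics: Dict) -> Dict:
    """Copy of a pre-1.0.0 metrics dict with *_AR keys renamed to *_AL.

    Values are unchanged (they were already acceptance lengths). This does
    not create Joint_Acceptance_Rate, which pre-1.0.0 output never recorded.
    """
    return {LEGACY_KEY_MAP.get(key, key): value for key, value in metrics.items()}


def major_version(version: str) -> int:
    """Major component of a semver string such as '1.0.0'."""
    return int(version.split(".")[0])


def comparable(run_a: Dict, run_b: Dict) -> bool:
    """True only if both runs share a specdec_bench major version."""
    # Assumption: a run without specdec_bench_version predates 1.0.0 (major 0).
    a = major_version(run_a.get("specdec_bench_version", "0.0.0"))
    b = major_version(run_b.get("specdec_bench_version", "0.0.0"))
    return a == b
```

For example, a step sequence of `[3, 4, 2, 3]` accepted tokens gives an acceptance length of 3.0, and a `1.0.0` run compared with an unversioned legacy run is rejected until the legacy run is re-run or explicitly upgraded.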
### What does this PR do?
Type of change: new feature
Adds reproducibility provenance to `specdec_bench/configuration.json`
and ports `upload_to_s3.py` from `iputterman/specdec_bench@main`
(personal-namespace fork) into upstream. This is the first PR in a
multi-stage migration off Izzy's fork now that he's left the team.
Tracked in [OMNIML-4788](https://jirasw.nvidia.com/browse/OMNIML-4788).
**Provenance fields added to configuration.json** (alongside existing
argv / engine_version / gpu / python_version):
- `specdec_bench_version` — methodology semver declared in
`specdec_bench/__init__.py`. Bump minor on additive metrics, major on
changed metric *definitions*. The visualizer (Phase 4 of the migration)
will aggregate runs by major version so plots don't accidentally compare
across methodology changes.
- `specdec_bench_sha`, `modelopt_sha`, `modelopt_version`,
`nmm_sandbox_sha`, `container_image` — code/runtime provenance. Each
prefers an env var set by the harness (`SPECDEC_BENCH_SHA`,
`MODELOPT_SHA`, `MODELOPT_VERSION`, `NMM_SANDBOX_SHA`,
`CONTAINER_IMAGE`) and falls back to `git rev-parse` /
`modelopt.__version__` when running standalone. The env-var preference
is necessary because the runtime container has no `.git/` (the launcher
packager tarballs source without git metadata) and may not have
`modelopt` installed.
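- `checkpoint.{path, size_bytes, index_sha256, index_source}`: a cheap
reproducibility fingerprint that hashes `model.safetensors.index.json`
(or `config.json` as a fallback). The hash changes whenever the index
contents change (tensor names, shard files, or total size); tensor values
themselves are not hashed, so a weight update that keeps the same layout
and size would not change it.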
- `serving_config` — engine-level config dict captured after init via a
new `Model.get_serving_config()` method. VLLM dumps `AsyncEngineArgs` +
the live `vllm_config.to_dict()`; SGLANG dumps the `engine_kwargs`
passed to `sgl.Engine`; TRTLLM left at the base default `{}` for a later
iteration.
- `timestamp` — UTC ISO 8601.
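The env-var-first lookup, the checkpoint fingerprint, and the timestamp described in this list could be captured along the following lines. This is a sketch under stated assumptions, not the actual `dump_env` code: the function names are hypothetical, and defining `size_bytes` as the total size of all files in the checkpoint directory is a guess.

```python
import hashlib
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Files to fingerprint, in order of preference (per the description above).
FINGERPRINT_FILES = ("model.safetensors.index.json", "config.json")


def resolve_sha(env_var: str, repo_dir: Optional[str] = None) -> Optional[str]:
    """Prefer a harness-set env var (e.g. SPECDEC_BENCH_SHA); else ask git."""
    value = os.environ.get(env_var)
    if value:
        return value
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # No git binary, or no .git/ directory (e.g. inside the runtime container).
        return None
    return result.stdout.strip() or None


def checkpoint_fingerprint(path: str) -> dict:
    """Hash the first fingerprint file found in the checkpoint directory."""
    ckpt = Path(path)
    for name in FINGERPRINT_FILES:
        source = ckpt / name
        if source.is_file():
            return {
                "path": str(ckpt),
                # Assumption: size_bytes is the summed size of every file in the dir.
                "size_bytes": sum(f.stat().st_size for f in ckpt.rglob("*") if f.is_file()),
                "index_sha256": hashlib.sha256(source.read_bytes()).hexdigest(),
                "index_source": name,
            }
    return {"path": str(ckpt), "size_bytes": None, "index_sha256": None, "index_source": None}


def utc_timestamp() -> str:
    """Current time as a UTC ISO 8601 string (ends in '+00:00')."""
    return datetime.now(timezone.utc).isoformat()
```

Returning `None` instead of raising when git is unavailable keeps a standalone run from failing just because provenance is incomplete; the harness env vars remain the authoritative source.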
**Other changes**
- `upload_to_s3.py` + `specdec_bench/s3_utils.py` ported from
`iputterman/specdec_bench@main`. The script recognizes run dirs by
sentinel files and refuses to overwrite existing S3 prefixes.
- `_redact_config` allowlists `tokenizer`, `tokenizer_path`,
`tokenizer_mode`, `tokenizer_revision` so the model path stops being
redacted (latent bug from substring-matching `token` ⊂ `tokenizer`).
- `requirements_speed.txt`: `boto3`, `botocore` added (used by
`s3_utils`).
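The `_redact_config` fix amounts to checking an explicit allowlist before the substring test. The sketch below shows the idea; the function and constant names and the list of sensitive substrings are assumptions, and only the four allowlisted keys come from this PR.

```python
# Assumed sensitive substrings; the real _redact_config list may differ.
SENSITIVE_SUBSTRINGS = ("token", "secret", "password")
# Keys that contain "token" but are not secrets (from this PR).
ALLOWED_KEYS = frozenset({"tokenizer", "tokenizer_path", "tokenizer_mode", "tokenizer_revision"})
REDACTED = "<redacted>"


def redact_config(config: dict) -> dict:
    """Return a copy of config with secret-looking values replaced."""
    out = {}
    for key, value in config.items():
        lowered = str(key).lower()
        if isinstance(value, dict):
            out[key] = redact_config(value)  # recurse into nested sections
        elif lowered in ALLOWED_KEYS:
            out[key] = value  # allowlist wins over the substring match
        elif any(s in lowered for s in SENSITIVE_SUBSTRINGS):
            out[key] = REDACTED
        else:
            out[key] = value
    return out
```

Without the allowlist check, `"token" in "tokenizer"` is true, so the model path stored under `tokenizer` would be redacted.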
**Out of scope** (deferred to Phase 1b / Phase 2):
- `--sweep_config` driver that emits per-run-dir nesting
`<sweep>/<NNN_dataset_c<conc>>/`
- `--s3_upload` flag baked into `run.py` itself
- Launcher auto-injection of the provenance env vars (currently the
example YAML sets them statically)
- `container_digest` (enroot integration) and full GPU/driver inventory
- TRTLLM `get_serving_config()`
### Usage
```bash
# Run a smoke benchmark (Qwen3.5-4B + vLLM + MTP draft=3) — example YAML included
uv run launch.py --yaml examples/Qwen/Qwen3.5-4B/specdec_bench_mtp.yaml --yes
# After the job finishes, upload the run directory to S3:
S3_KEY_ID=team-specdec-workgroup \
S3_SECRET=... \
python upload_to_s3.py /path/to/sweep_dir s3://team-specdec-workgroup/results
```
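The upload behavior described above (find run dirs by a sentinel file, refuse to overwrite an existing prefix) can be sketched as follows. This is hypothetical and not the ported `upload_to_s3.py`: the sentinel file name `configuration.json` is an assumption, the real script's dry-run and skip-existing options are not reproduced, and the client is passed in so the logic can be exercised without network access.

```python
from pathlib import Path

# Assumption: a run directory is any directory containing this file.
SENTINEL = "configuration.json"


def discover_runs(root: str) -> list:
    """All directories at or below root that contain the sentinel file."""
    return sorted(p.parent for p in Path(root).rglob(SENTINEL) if p.is_file())


def prefix_exists(s3_client, bucket: str, prefix: str) -> bool:
    """True if any object already lives under `prefix/` in the bucket."""
    prefix = prefix.rstrip("/") + "/"  # avoid "run_a" matching "run_a2"
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0


def upload_run(s3_client, run_dir: Path, bucket: str, prefix: str) -> int:
    """Upload every file in run_dir under prefix; refuse if the prefix is taken."""
    if prefix_exists(s3_client, bucket, prefix):
        raise FileExistsError(f"s3://{bucket}/{prefix} already exists; refusing to overwrite")
    uploaded = 0
    for f in sorted(run_dir.rglob("*")):
        if f.is_file():
            key = f"{prefix.rstrip('/')}/{f.relative_to(run_dir).as_posix()}"
            s3_client.upload_file(str(f), bucket, key)
            uploaded += 1
    return uploaded
```

With boto3, `s3_client` would be `boto3.client("s3", ...)`; how the script maps `S3_KEY_ID` / `S3_SECRET` onto `aws_access_key_id` / `aws_secret_access_key` (and which endpoint it uses) is not stated here, so treat that wiring as an assumption.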
### Testing
Cluster-tested end-to-end on cw-dfw (Slurm job 11978794, NeMo Run
experiment `cicd_1779403623`, ~19 min wall):
- Qwen3.5-4B + vLLM + MTP draft=3 + SPEED-Bench-Internal/qualitative (80
requests)
- `configuration.json` (22 KB) had all eight new provenance fields
populated
- Mean `Request_AR` (the metric renamed `Request_AL` in this PR) of 3.327
vs 3.330 on the pre-Phase-1a run: within noise, so the methodology is
unchanged
- `upload_to_s3.py` (a real upload, not a dry run) landed the results at
[s3://team-specdec-workgroup/results/qwen35_4_mtp_smoke_2026-05-21/specdec_bench_mtp/](https://app.s8k.io/buckets/team-specdec-workgroup/?prefix=results%2Fqwen35_4_mtp_smoke_2026-05-21%2F),
where the visualizer at http://10.131.132.205:8080 can pick them up.
### Before your PR is "Ready for review"
- Is this change backward compatible?: ✅
  - `configuration.json` only gains fields. `upload_to_s3.py` /
    `s3_utils.py` are new files. `Model.get_serving_config()` defaults to
    `{}`, so existing subclasses without an override behave as before. The
    `*_AR` to `*_AL` metric rename is the exception; see the warning at the
    top.
- If you copied code from any other sources or added a new PIP
  dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
  - `boto3` / `botocore` are Apache 2.0 (permissive); `upload_to_s3.py` +
    `s3_utils.py` are ported from a private NVIDIA repo with explicit
    copyright headers retained.
- Did you write any new necessary tests?: ❌
  - Validated by cluster smoke (see Testing). Will add unit tests for
    `dump_env` provenance fields and `upload_to_s3._discover_runs` in a
    follow-up.
- Did you update
  [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
  N/A
  - Internal-facing tooling.
- Did you get Claude approval on this PR?: ❌ (triggering after open)
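The backward-compatibility argument for `Model.get_serving_config()` rests on the base-class default. The sketch below illustrates it with hypothetical classes: `Model` here is a stand-in rather than the real specdec_bench base class, and `RecordingEngineModel` only mimics the SGLANG approach of reporting the kwargs it passed to its engine.

```python
from typing import Any, Dict


class Model:
    """Hypothetical stand-in for the specdec_bench Model base class."""

    def get_serving_config(self) -> Dict[str, Any]:
        # Base default: capture nothing. Subclasses that never override this
        # keep working, and configuration.json records an empty dict.
        return {}


class RecordingEngineModel(Model):
    """Illustrative subclass that reports the kwargs it would give its engine."""

    def __init__(self, **engine_kwargs: Any) -> None:
        self.engine_kwargs = dict(engine_kwargs)
        # A real backend would construct its engine here from engine_kwargs.

    def get_serving_config(self) -> Dict[str, Any]:
        return {"engine_kwargs": dict(self.engine_kwargs)}


def serving_config_entry(model: Model) -> Dict[str, Any]:
    """Fragment that would be merged into configuration.json."""
    return {"serving_config": model.get_serving_config()}
```

Because the default is a plain method returning `{}` rather than an abstract method, a backend such as TRTLLM can ship without an override and still produce a valid `configuration.json`.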
### Additional Information
Tracked in JIRA
[OMNIML-4788](https://jirasw.nvidia.com/browse/OMNIML-4788). The full
multi-phase plan is on that ticket's SPEC block — this PR is Phase 1a.
Cherry-picked alongside the harness change are two example YAMLs
(`examples/Qwen/Qwen3.5-4B/specdec_bench.yaml` for the NONE
autoregressive baseline, `..._mtp.yaml` for the MTP run) that gave us
cluster-test evidence. Can be split out if preferred.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added an S3 upload CLI for benchmark results with dry-run and
    skip-existing options
  * Automatic capture of run configuration, provenance and redacted
    environment into saved config
  * Models now export serving configuration for reproducible runs
  * New launcher entrypoint and example job configs for Qwen SPEED-Bench
    runs
* **Documentation**
  * README section describing S3 upload usage and supported local layouts
* **Bug Fixes / Changes**
  * Acceptance-rate metric keys renamed in output (AR -> AL)
* **Tests / CI**
  * New tests for redaction and S3 utilities; CI now runs specdec_bench
    examples
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chenhany <chenhany@nvidia.com>

Parent: 999c999 (commit a2c496a)
20 files changed (1099 additions, 31 deletions) in:
- .github/workflows
- examples/specdec_bench
  - specdec_bench
    - datasets
    - metrics
    - models
- tests/examples/specdec_bench
- tools/launcher
  - common/specdec_bench
  - examples/Qwen/Qwen3.5-4B