[NV] Add GitHub Action to collect SPEED-Bench AL matrix (#1650)
* Add GitHub Action to collect SPEED-Bench AL matrix
Push-button (workflow_dispatch) collection of the DeepSeek-V4-Pro
SPEED-Bench acceptance-length matrix (thinking on/off x MTP 1-8) on
self-hosted B300 runners, optionally opening a PR that updates
benchmarks/speedbench-reference-al.yaml.
- benchmarks/single_node/dsv4_fp4_b300_vllm_speedbench_matrix.sh:
per (thinking, MTP) cell, serve vLLM, run SPEED-Bench, derive AL from
/metrics, and emit the YAML matrix. Serves from MODEL_PATH (the local
pre-staged weights resolved by the launcher), falling back to MODEL for
a standalone local run. Carries a temporary --chat-template-kwargs shim
until vllm-project/vllm#44244 lands in the benchmark image (idempotent,
applied only for thinking-on cells).
- runners/launch_b300-nv.sh: add opt-in BENCH_SCRIPT_OVERRIDE and
SALLOC_TIME_LIMIT hooks; both default to the prior behavior.
- .github/workflows/speedbench-al.yml: workflow_dispatch entry point;
MODEL is the HF id so the launcher resolves the staged MODEL_PATH.
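The "derive AL from /metrics" step could be sketched roughly as below. This is an illustration under stated assumptions, not the collector script itself: the counter names (`vllm:spec_decode_num_drafts_total`, `vllm:spec_decode_num_accepted_tokens_total`) are assumed and may differ between vLLM versions, AL is assumed to be 1 + accepted tokens / drafts over the delta between two scrapes, and the helper names `sum_counter` and `al_from_metrics` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed counter names; override them if the serving image exposes different ones.
DRAFTS_METRIC="${DRAFTS_METRIC:-vllm:spec_decode_num_drafts_total}"
ACCEPTED_METRIC="${ACCEPTED_METRIC:-vllm:spec_decode_num_accepted_tokens_total}"

# Hypothetical helper: sum one Prometheus counter across all label sets.
# $1 = metric name, $2 = file holding a saved /metrics scrape.
sum_counter() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/\{.*/, "", metric)            # drop the {label="..."} part
      if (metric == name) total += $NF   # sample value is the last field
    }
    END { printf "%.0f\n", total + 0 }
  ' "$2"
}

# Hypothetical helper: AL = 1 + accepted / drafts, using counter deltas
# between a scrape taken before the benchmark and one taken after it.
# $1 = "before" scrape file, $2 = "after" scrape file.
al_from_metrics() {
  local before="$1" after="$2" drafts accepted
  drafts=$(( $(sum_counter "$DRAFTS_METRIC" "$after") - $(sum_counter "$DRAFTS_METRIC" "$before") ))
  accepted=$(( $(sum_counter "$ACCEPTED_METRIC" "$after") - $(sum_counter "$ACCEPTED_METRIC" "$before") ))
  awk -v d="$drafts" -v a="$accepted" 'BEGIN {
    if (d <= 0) { print "no drafts recorded" > "/dev/stderr"; exit 1 }
    printf "%.3f\n", 1 + a / d
  }'
}

# Example use per (thinking, MTP) cell, assuming the server listens on port 8000:
#   curl -s http://localhost:8000/metrics > before.prom
#   ... run SPEED-Bench ...
#   curl -s http://localhost:8000/metrics > after.prom
#   al_from_metrics before.prom after.prom
```

Taking deltas between two scrapes rather than reading the counters once keeps warm-up or earlier requests on the same server from leaking into the cell's result.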
* speedbench-al: default open-pr to false (artifact-only by default)
Make the workflow default to Option 1 (upload the AL matrix as an
artifact for manual review/paste) rather than auto-opening a PR. The
auto-PR path stays available as an opt-in (open-pr: true), but keeping
it off by default avoids exposing a write-scoped PAT on the self-hosted
runner and matches the repo's artifact-collection convention.
* speedbench-al: parameterize model + relocate collector script
Address review:
- Model is now a workflow input (model + model-prefix, default
deepseek-ai/DeepSeek-V4-Pro / dsv4). MODEL, MODEL_PREFIX, EXP_NAME,
BENCH_SCRIPT_OVERRIDE, artifact names and the Create-PR branch/title/body
are all derived from those inputs. The emitted YAML top-level key is now
derived from the model (MODEL_KEY, defaults to the model basename lowercased).
- Move the collector to benchmarks/single_node/speedbench/dsv4_fp4_b300_vllm.sh
and fix its benchmark_lib.sh source path (../ -> ../../) to match the deeper directory.
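The MODEL_KEY default described above (model basename, lowercased) might be derived along these lines. This is a minimal sketch: the function name `derive_model_key` is hypothetical and the collector's actual implementation may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return MODEL_KEY if it is already set; otherwise take
# the part of the model id after the last "/" and lowercase it.
derive_model_key() {
  local model="$1" base
  base="${model##*/}"                    # strip the "org/" prefix, if any
  if [ -n "${MODEL_KEY:-}" ]; then
    printf '%s\n' "$MODEL_KEY"           # an explicit override wins
  else
    printf '%s\n' "$base" | tr '[:upper:]' '[:lower:]'
  fi
}

# Example: derive_model_key "deepseek-ai/DeepSeek-V4-Pro" prints deepseek-v4-pro
```

Using `tr` instead of bash's `${var,,}` expansion keeps the sketch working on older bash versions.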