Commit d8933d7

[NV] Add GitHub Action to collect SPEED-Bench AL matrix (#1650)
* Add GitHub Action to collect SPEED-Bench AL matrix

  Push-button (workflow_dispatch) collection of the DeepSeek-V4-Pro SPEED-Bench acceptance-length matrix (thinking on/off x MTP 1-8) on self-hosted B300 runners, optionally opening a PR that updates benchmarks/speedbench-reference-al.yaml.

  - benchmarks/single_node/dsv4_fp4_b300_vllm_speedbench_matrix.sh: per (thinking, MTP) cell, serve vLLM, run SPEED-Bench, derive AL from /metrics, and emit the YAML matrix. Serves from MODEL_PATH (the local pre-staged weights resolved by the launcher), falling back to MODEL for a standalone local run. Carries a temporary --chat-template-kwargs shim until vllm-project/vllm#44244 lands in the benchmark image (idempotent, applied only for thinking-on cells).
  - runners/launch_b300-nv.sh: add opt-in BENCH_SCRIPT_OVERRIDE and SALLOC_TIME_LIMIT hooks; both default to the prior behavior.
  - .github/workflows/speedbench-al.yml: workflow_dispatch entry point; MODEL is the HF id so the launcher resolves the staged MODEL_PATH.

* speedbench-al: default open-pr to false (artifact-only by default)

  Make the workflow default to Option 1 (upload the AL matrix as an artifact for manual review/paste) rather than auto-opening a PR. The auto-PR path stays available as an opt-in (open-pr: true), but keeping it off by default avoids exposing a write-scoped PAT on the self-hosted runner and matches the repo's artifact-collection convention.

* speedbench-al: parameterize model + relocate collector script

  Address review:

  - Model is now a workflow input (model + model-prefix, default deepseek-ai/DeepSeek-V4-Pro / dsv4). MODEL, MODEL_PREFIX, EXP_NAME, BENCH_SCRIPT_OVERRIDE, artifact names and the Create-PR branch/title/body are all derived from those inputs. The emitted YAML top-level key is now derived from the model (MODEL_KEY, defaults to the model basename lowercased).
  - Move the collector to benchmarks/single_node/speedbench/dsv4_fp4_b300_vllm.sh and fix its benchmark_lib.sh source path (../ -> ../../) for the deeper dir.
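The commit message says the collector derives AL from the server's /metrics endpoint but does not show how. The following is a minimal sketch of one way that could work, not the collector's actual code. It assumes AL is computed as 1 + accepted draft tokens / draft steps (one verified token per step plus the accepted speculative tokens), and it takes the two counter names as arguments because the exact metric names exposed by the benchmark image are an assumption here. The function name `acceptance_length` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the formula and the metric names passed in are assumptions,
# not taken from the collector script in this commit.

# acceptance_length METRICS_FILE ACCEPTED_METRIC DRAFTS_METRIC
# Sums every sample of the two counters (across all label sets) in a saved
# Prometheus text dump and prints 1 + accepted / drafts to three decimals.
acceptance_length() {
  local metrics_file="$1" accepted_metric="$2" drafts_metric="$3"
  awk -v acc="$accepted_metric" -v drafts="$drafts_metric" '
    /^#/   { next }                    # skip HELP/TYPE comment lines
    NF < 2 { next }                    # skip blank or malformed lines
    {
      name = $1
      sub(/[{].*/, "", name)           # drop the {label="..."} suffix
      if (name == acc)         a += $NF
      else if (name == drafts) d += $NF
    }
    END {
      if (d == 0) { print "nan"; exit 1 }   # no draft steps recorded
      printf "%.3f\n", 1 + a / d
    }
  ' "$metrics_file"
}
```

A caller would first save the endpoint output to a file (for example with `curl -s` against the server's /metrics URL, whose host and port depend on how the server was launched) and then pass the file plus the two counter names. Because each (thinking, MTP) cell starts a fresh server, a single snapshot at the end of the cell is enough under this sketch; a long-lived server would need a before/after difference instead.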
1 parent 437e01a commit d8933d7

3 files changed

Lines changed: 563 additions & 1 deletion

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
+name: SpeedBench AL Collection
+
+# Push-button (workflow_dispatch) collection of a SPEED-Bench acceptance-length
+# (AL) matrix: thinking_on/off x MTP levels, for the given model (defaults to
+# DeepSeek-V4-Pro). Produces the golden reference consumed by the
+# synthetic-acceptance framework and (optionally) opens a PR updating
+# benchmarks/speedbench-reference-al.yaml.
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner:
+        description: "Self-hosted GPU runner label (B300)"
+        required: false
+        type: string
+        default: 'b300'
+      model:
+        description: "HF model id (basename must be in launcher STAGED_MODELS for pre-staged local weights)"
+        required: false
+        type: string
+        default: 'deepseek-ai/DeepSeek-V4-Pro'
+      model-prefix:
+        description: "Model prefix; drives launcher MODEL_PATH resolution, exp name, collector script, and artifact names"
+        required: false
+        type: string
+        default: 'dsv4'
+      image:
+        description: "vLLM container image"
+        required: false
+        type: string
+        default: 'vllm/vllm-openai:v0.21.0'
+      mtp-list:
+        description: "Space-separated MTP levels (num_speculative_tokens)"
+        required: false
+        type: string
+        default: '1 2 3 4 5 6 7 8'
+      thinking-modes:
+        description: "Space-separated thinking modes to collect"
+        required: false
+        type: string
+        default: 'off on'
+      category:
+        description: "SPEED-Bench category"
+        required: false
+        type: string
+        default: 'coding'
+      output-len:
+        description: "Per-request output length"
+        required: false
+        type: string
+        default: '4096'
+      thinking-kwargs:
+        description: "chat_template_kwargs JSON for thinking-on cells (match golden config)"
+        required: false
+        type: string
+        default: '{"thinking": true, "reasoning_effort": "high"}'
+      salloc-time:
+        description: "Slurm allocation minutes (16 server starts ~ several hours)"
+        required: false
+        type: string
+        default: '480'
+      open-pr:
+        description: "Open a PR updating benchmarks/speedbench-reference-al.yaml (default off: artifact-only, paste values in manually)"
+        required: false
+        type: boolean
+        default: false
+      ref:
+        description: "Git ref (branch/sha) to checkout"
+        required: false
+        type: string
+
+permissions:
+  contents: read
+
+env:
+  HF_TOKEN: ${{ secrets.HF_TOKEN }}
+  HF_HUB_CACHE: '/mnt/hf_hub_cache/'
+  # Drive the single-node path in runners/launch_b300-nv.sh. MODEL is the HF id;
+  # its basename (e.g. DeepSeek-V4-Pro) must be in the launcher's STAGED_MODELS so
+  # the launcher resolves MODEL_PATH to the pre-staged local weights and mounts
+  # them. The collector serves from MODEL_PATH (see SERVE_MODEL), so no download.
+  MODEL: ${{ inputs.model }}
+  MODEL_PREFIX: ${{ inputs.model-prefix }}
+  PRECISION: fp4
+  FRAMEWORK: vllm
+  EXP_NAME: ${{ inputs.model-prefix }}_speedbench
+  IMAGE: ${{ inputs.image }}
+  TP: '8'
+  EP_SIZE: '1'
+  DP_ATTENTION: 'false'
+  SPEC_DECODING: mtp
+  # Run the AL-matrix collector instead of the auto-selected throughput script.
+  BENCH_SCRIPT_OVERRIDE: benchmarks/single_node/speedbench/${{ inputs.model-prefix }}_fp4_b300_vllm.sh
+  SALLOC_TIME_LIMIT: ${{ inputs.salloc-time }}
+  # Matrix-collector tunables (propagated into the container via srun --export=ALL).
+  MTP_LIST: ${{ inputs.mtp-list }}
+  THINKING_MODES: ${{ inputs.thinking-modes }}
+  CATEGORY: ${{ inputs.category }}
+  SPEEDBENCH_OUTPUT_LEN: ${{ inputs.output-len }}
+  CHAT_TEMPLATE_KWARGS_ON: ${{ inputs.thinking-kwargs }}
+  OUT_YAML: /workspace/speedbench-reference-al.yaml
+  PYTHONDONTWRITEBYTECODE: '1'
+  PYTHONPYCACHEPREFIX: /tmp/inferencex-pycache
+
+jobs:
+  collect-al:
+    runs-on: ${{ inputs.runner }}
+    timeout-minutes: 600
+    name: "SpeedBench AL matrix | ${{ inputs.category }} | mtp=[${{ inputs.mtp-list }}] | thinking=[${{ inputs.thinking-modes }}]"
+    steps:
+      - name: Resource cleanup (pre-run)
+        run: &resource-cleanup |
+          # Cleanup Docker resources
+          if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+            echo "[Docker] Cleaning up resources ..."
+            docker ps -aq | xargs -r docker rm -f
+            docker network prune -f
+            while [ -n "$(docker ps -aq)" ]; do
+              docker ps -a
+              sleep 5
+            done
+          fi
+
+          # Cleanup SLURM resources
+          if command -v squeue >/dev/null 2>&1; then
+            echo "[Slurm] Cleaning up jobs with name: ${{ runner.name }} ..."
+            scancel --name="${{ runner.name }}" || true
+            while [ -n "$(squeue --name='${{ runner.name }}' --noheader --format='%i')" ]; do
+              squeue --name="${{ runner.name }}"
+              sleep 5
+            done
+          fi
+
+          # Cleanup AL-matrix outputs from a prior job on this runner so a stale
+          # matrix from a previous run is never picked up as this job's output.
+          rm -rf "${{ github.workspace }}/speedbench_results" 2>/dev/null || true
+
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          token: ${{ secrets.REPO_PAT }}
+          fetch-depth: 0
+          ref: ${{ inputs.ref || github.sha }}
+          clean: true
+          submodules: true
+
+      - name: Cleanup stale outputs (pre-run)
+        run: |
+          rm -f speedbench-reference-al.yaml || true
+          rm -f gpu_metrics.csv || true
+          rm -rf speed_bench_data || true
+
+      - name: Collect AL matrix
+        env:
+          RUNNER_NAME: ${{ runner.name }}
+        run: |
+          set -euo pipefail
+          bash ./runners/launch_${RUNNER_NAME%%_*}.sh
+
+          if [ ! -f "speedbench-reference-al.yaml" ]; then
+            echo "AL collection failed: speedbench-reference-al.yaml not produced." >&2
+            exit 1
+          fi
+          echo "### SpeedBench AL matrix" >> "$GITHUB_STEP_SUMMARY"
+          echo '```yaml' >> "$GITHUB_STEP_SUMMARY"
+          cat speedbench-reference-al.yaml >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
+
+      - name: Upload AL matrix artifact
+        if: always()
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: speedbench-reference-al-${{ inputs.model-prefix }}
+          path: speedbench-reference-al.yaml
+          if-no-files-found: warn
+
+      - name: Open PR updating reference yaml
+        if: ${{ inputs.open-pr && success() }}
+        env:
+          GH_TOKEN: ${{ secrets.REPO_PAT }}
+        run: |
+          set -euo pipefail
+          # NOTE: the reference yaml is keyed by model at the top level. This
+          # overwrites it with the current model's matrix; when more than one
+          # model is collected, replace this cp with a per-model-key YAML merge.
+          cp speedbench-reference-al.yaml benchmarks/speedbench-reference-al.yaml
+
+          BRANCH="speedbench-al/${{ inputs.model-prefix }}-auto-${{ github.run_id }}"
+          git config user.name "github-actions"
+          git config user.email "github-actions@github.com"
+          git checkout -b "$BRANCH"
+          git add benchmarks/speedbench-reference-al.yaml
+          if git diff --cached --quiet; then
+            echo "No change in reference yaml; skipping PR."
+            exit 0
+          fi
+          git commit -m "Update SpeedBench AL reference matrix for ${{ inputs.model }} (auto, run ${{ github.run_id }})"
+          git push -u origin "$BRANCH"
+          gh pr create \
+            --title "Update SpeedBench AL reference matrix for ${{ inputs.model-prefix }} (auto)" \
+            --body "Auto-generated by the SpeedBench AL Collection workflow (run ${{ github.run_id }}). Model: \`${{ inputs.model }}\`, category: \`${{ inputs.category }}\`, MTP: \`${{ inputs.mtp-list }}\`, thinking: \`${{ inputs.thinking-modes }}\`, output_len: \`${{ inputs.output-len }}\`. Please review the measured values before merging." \
+            --base main \
+            --head "$BRANCH"
+
+      - name: Upload server logs
+        if: always()
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: speedbench_server_logs-${{ inputs.model-prefix }}
+          path: speedbench_results/server_*.log
+          if-no-files-found: ignore
+
+      - name: Resource cleanup (post-run)
+        if: always()
+        run: *resource-cleanup
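
The "Collect AL matrix" step picks its launcher with the parameter expansion `${RUNNER_NAME%%_*}`, which strips everything from the first underscore onward. The sketch below isolates that expansion so its behavior is easy to check. The function name `launcher_for` and the sample runner names (such as `b300-nv_0`) are hypothetical; only the expansion itself and the `runners/launch_b300-nv.sh` path come from this commit.

```bash
#!/usr/bin/env bash
# Sketch of the launcher lookup used by the "Collect AL matrix" step.
# launcher_for is a hypothetical helper; the workflow inlines the expansion.

# launcher_for RUNNER_NAME
# Prints the launcher script path for a runner. "%%_*" removes the longest
# suffix matching "_*", i.e. everything from the FIRST underscore to the end,
# so a per-node suffix on the runner name does not change which script runs.
launcher_for() {
  local runner_name="$1"
  printf './runners/launch_%s.sh\n' "${runner_name%%_*}"
}
```

Under this reading, a runner named `b300-nv_0` and one named `b300-nv_1` would both resolve to `./runners/launch_b300-nv.sh`, and a name with no underscore is used unchanged. Note that the `runner` workflow input is a label used for `runs-on`, while the lookup uses `runner.name`, so the two need not be the same string.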
