Commit b6fa101

cquil11 authored and Oseltamivir committed
feat: performance changelog triggered runs (as opposed to nightly) (#267) [skip-sweep]
* add logic for event driven runs: new single workflow that runs on merge to main, new perf-changelog.yaml to track performance changes, new logic to parse changelog, removed cron job in full sweep schedulers
* testing pt 1
* raise error if yaml diff in perf changelog is not valid
* remove unused imports in process_changelog.py
* config data key fix
* raise error if test-config subprocess fails to run
* backfill changelog
* backfill changelog pt 2
* backfill changelog pt 3
* backfill changelog pt 4
* backfill changelog pt 5
* backfill changelog pt 6
* add always() condition to upload changelog metadata
* backfill changelog pt 7 (test)
* backfill changelog pt 8 (revert test)
* backfill changelog pt 9
* backfill changelog pt 11
* change if condition for jobs in run sweep workflow
* debugging run sweep workflow
* debugging run sweep workflow pt 2
* debugging run sweep workflow pt 3 (revert)
* debugging run sweep workflow pt 4
* debugging run sweep workflow pt 5
* debugging run sweep workflow pt 6
* debugging run sweep workflow pt 7
* add always() condition to upload changelog metadata (add back, this got removed)
* add bmk prefix to results
* backfill changelog official
* for concurrency group, use more unique sha
1 parent 4f9eb0c commit b6fa101

14 files changed

Lines changed: 867 additions & 138 deletions

.github/workflows/benchmark-multinode-tmpl.yml

Lines changed: 1 addition & 1 deletion
@@ -170,5 +170,5 @@ jobs:
       - name: Upload results
         uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
         with:
-          name: ${{ env.RESULT_FILENAME }}
+          name: bmk_${{ env.RESULT_FILENAME }}
           path: agg_${{ env.RESULT_FILENAME }}_*.json

.github/workflows/benchmark-tmpl.yml

Lines changed: 1 addition & 1 deletion
@@ -169,5 +169,5 @@ jobs:
       - name: Upload result
         uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
         with:
-          name: ${{ env.RESULT_FILENAME }}
+          name: bmk_${{ env.RESULT_FILENAME }}
           path: agg_${{ env.RESULT_FILENAME }}.json

.github/workflows/collect-results.yml

Lines changed: 5 additions & 5 deletions
@@ -3,7 +3,7 @@ name: Template - Collect Results
 on:
   workflow_call:
     inputs:
-      exp-name:
+      result-prefix:
         required: false
         type: string
         default: ''
@@ -26,18 +26,18 @@ jobs:
         uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
         with:
           path: results/
-          pattern: ${{ inputs.exp-name && format('{0}_*', inputs.exp-name) || '*' }}
+          pattern: ${{ inputs.result-prefix && format('{0}_*', inputs.result-prefix) || '*' }}

       - name: Print summary
         run: |
           pip install tabulate
           python3 utils/summarize.py results/ >> $GITHUB_STEP_SUMMARY

       - name: Aggregate results
-        run: python3 utils/collect_results.py results/ ${{ inputs.exp-name || 'all' }}
+        run: python3 utils/collect_results.py results/ ${{ inputs.result-prefix || 'all' }}

       - name: Upload aggregated results
         uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
         with:
-          name: results_${{ inputs.exp-name || 'all' }}
-          path: agg_${{ inputs.exp-name || 'all' }}.json
+          name: results_${{ inputs.result-prefix || 'all' }}
+          path: agg_${{ inputs.result-prefix || 'all' }}.json
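
The `pattern` expression in this file selects which artifacts get downloaded: a non-empty `result-prefix` becomes `<prefix>_*`, and an empty one falls back to `*`. Together with the `bmk_` prefix now added to benchmark artifact names, this lets the collector skip unrelated artifacts. A rough Python sketch of that selection follows; `fnmatch` is only an assumed stand-in for the action's own glob matching, and the helper names and artifact names are hypothetical.

```python
from fnmatch import fnmatch


def artifact_pattern(result_prefix: str = "") -> str:
    """Mirror `inputs.result-prefix && format('{0}_*', ...) || '*'`."""
    return f"{result_prefix}_*" if result_prefix else "*"


def select_artifacts(names: list[str], result_prefix: str = "") -> list[str]:
    """Keep the artifact names that match the derived pattern.

    Assumption: fnmatch approximates the glob matching done by
    actions/download-artifact closely enough for simple prefixes.
    """
    pattern = artifact_pattern(result_prefix)
    return [name for name in names if fnmatch(name, pattern)]


if __name__ == "__main__":
    # Hypothetical artifact names, for illustration only.
    names = ["bmk_run_a", "bmk_run_b", "changelog-metadata", "results_bmk"]
    print(select_artifacts(names, "bmk"))
```

With the `bmk` prefix only the two benchmark artifacts are kept; with no prefix every artifact matches.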

.github/workflows/full-sweep-1k1k-scheduler.yml

Lines changed: 0 additions & 2 deletions
@@ -2,8 +2,6 @@ name: "Full Sweep Scheduler - 1k1k"

 on:
   workflow_dispatch:
-  schedule:
-    - cron: "0 0 * * *"

 jobs:
   get-dsr1-configs:

.github/workflows/full-sweep-1k8k-scheduler.yml

Lines changed: 0 additions & 2 deletions
@@ -2,8 +2,6 @@ name: "Full Sweep Scheduler - 1k8k"

 on:
   workflow_dispatch:
-  schedule:
-    - cron: "0 0 * * *"

 jobs:
   get-dsr1-configs:

.github/workflows/full-sweep-8k1k-scheduler.yml

Lines changed: 0 additions & 2 deletions
@@ -2,8 +2,6 @@ name: "Full Sweep Scheduler - 8k1k"

 on:
   workflow_dispatch:
-  schedule:
-    - cron: "0 0 * * *"

 jobs:
   get-dsr1-configs:

.github/workflows/run-sweep.yml

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
+name: "Run Sweep"
+run-name: Run Sweep - ${{ github.event.pull_request.title || github.ref_name }}
+
+concurrency:
+  group: sweep-${{ github.event.pull_request.number || github.sha }}
+  cancel-in-progress: true
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "perf-changelog.yaml"
+  pull_request:
+    branches:
+      - main
+    types:
+      - ready_for_review
+      - synchronize
+      - labeled
+    paths:
+      - "perf-changelog.yaml"
+
+jobs:
+  setup:
+    runs-on: ubuntu-latest
+    if: >-
+      (github.event_name == 'pull_request' && !github.event.pull_request.draft && contains(github.event.pull_request.labels.*.name, 'sweep-enabled')) ||
+      (github.event_name != 'pull_request' && !contains(github.event.head_commit.message, '[skip-sweep]'))
+    outputs:
+      search-space-config: ${{ steps.setup.outputs.search-space-config }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        with:
+          fetch-depth: 0
+
+      - id: setup
+        run: |
+          pip install pydantic
+
+          if [ "${{ github.event_name }}" == "pull_request" ]; then
+            BASE_REF="origin/${{ github.base_ref }}"
+            HEAD_REF="${{ github.event.pull_request.head.sha }}"
+          else
+            BASE_REF="${{ github.event.before }}"
+            HEAD_REF="${{ github.event.after }}"
+          fi
+
+          CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/process_changelog.py \
+            --changelog-file ${GITHUB_WORKSPACE}/perf-changelog.yaml \
+            --base-ref "$BASE_REF" \
+            --head-ref "$HEAD_REF")
+
+          echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT
+
+  sweep-multi-node-1k1k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['1k1k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
+    name: multi-node 1k1k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['1k1k'] }}
+    secrets: inherit
+    with: &multi-node-inputs
+      isl: ${{ matrix.config.isl }}
+      osl: ${{ matrix.config.osl }}
+      max-model-len: ${{ matrix.config.max-model-len }}
+      runner: ${{ matrix.config.runner }}
+      image: ${{ matrix.config.image }}
+      model: ${{ matrix.config.model }}
+      model-prefix: ${{ matrix.config.model-prefix }}
+      framework: ${{ matrix.config.framework }}
+      precision: ${{ matrix.config.precision }}
+      exp-name: ${{ matrix.config.exp-name }}
+      conc-list: ${{ toJson(matrix.config.conc) }}
+      spec-decoding: ${{ matrix.config.spec-decoding }}
+      disagg: ${{ matrix.config.disagg }}
+
+      prefill-num-worker: ${{ matrix.config.prefill.num-worker }}
+      prefill-tp: ${{ matrix.config.prefill.tp }}
+      prefill-ep: ${{ matrix.config.prefill.ep }}
+      prefill-dp-attn: ${{ matrix.config.prefill.dp-attn }}
+      prefill-additional-settings: ${{ toJson(matrix.config.prefill.additional-settings) }}
+
+      decode-num-worker: ${{ matrix.config.decode.num-worker }}
+      decode-tp: ${{ matrix.config.decode.tp }}
+      decode-ep: ${{ matrix.config.decode.ep }}
+      decode-dp-attn: ${{ matrix.config.decode.dp-attn }}
+      decode-additional-settings: ${{ toJson(matrix.config.decode.additional-settings) }}
+
+  sweep-multi-node-1k8k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['1k8k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
+    name: multi-node 1k8k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['1k8k'] }}
+    secrets: inherit
+    with: *multi-node-inputs
+
+  sweep-multi-node-8k1k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['8k1k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
+    name: multi-node 8k1k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['8k1k'] }}
+    secrets: inherit
+    with: *multi-node-inputs
+
+  sweep-single-node-1k1k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['1k1k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-tmpl.yml
+    name: single-node 1k1k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['1k1k'] }}
+    secrets: inherit
+    with: &single-node-inputs
+      exp-name: ${{ matrix.config.exp-name }}
+      isl: ${{ matrix.config.isl }}
+      osl: ${{ matrix.config.osl }}
+      max-model-len: ${{ matrix.config.max-model-len }}
+      runner: ${{ matrix.config.runner }}
+      image: ${{ matrix.config.image }}
+      model: ${{ matrix.config.model }}
+      model-prefix: ${{ matrix.config.model-prefix }}
+      framework: ${{ matrix.config.framework }}
+      precision: ${{ matrix.config.precision }}
+      tp: ${{ matrix.config.tp }}
+      ep: ${{ matrix.config.ep }}
+      dp-attn: ${{ matrix.config.dp-attn }}
+      conc: ${{ matrix.config.conc }}
+      spec-decoding: ${{ matrix.config.spec-decoding }}
+      disagg: ${{ matrix.config.disagg }}
+
+  sweep-single-node-1k8k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['1k8k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-tmpl.yml
+    name: single-node 1k8k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['1k8k'] }}
+    secrets: inherit
+    with: *single-node-inputs
+
+  sweep-single-node-8k1k:
+    needs: setup
+    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['8k1k']) != 'null' }}
+    uses: ./.github/workflows/benchmark-tmpl.yml
+    name: single-node 8k1k /
+    strategy:
+      fail-fast: false
+      matrix:
+        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['8k1k'] }}
+    secrets: inherit
+    with: *single-node-inputs
+
+  collect-results:
+    needs:
+      [
+        sweep-single-node-1k1k,
+        sweep-single-node-1k8k,
+        sweep-single-node-8k1k,
+        sweep-multi-node-1k1k,
+        sweep-multi-node-1k8k,
+        sweep-multi-node-8k1k,
+        setup,
+      ]
+    if: ${{ always() && needs.setup.result != 'skipped' }}
+    uses: ./.github/workflows/collect-results.yml
+    secrets: inherit
+    with:
+      result-prefix: "bmk"
+
+  upload-changelog-metadata:
+    needs: [setup, collect-results]
+    if: ${{ always() && needs.setup.result != 'skipped' }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Extract and save changelog metadata
+        env:
+          CONFIG_JSON: ${{ needs.setup.outputs.search-space-config }}
+        run: |
+          echo "$CONFIG_JSON" | jq '.changelog_metadata' > changelog_metadata.json
+
+      - name: Upload changelog artifact
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: changelog-metadata
+          path: changelog_metadata.json
+
+  calc-success-rate:
+    needs: collect-results
+    if: ${{ always() && needs.collect-results.result != 'skipped'}}
+    runs-on: ubuntu-latest
+
+    env:
+      RESULTS_DIR: "results/"
+      STATS_FILENAME: "run_stats"
+      GITHUB_TOKEN: ${{ secrets.REPO_PAT }}
+
+    steps:
+      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        with:
+          token: ${{ secrets.REPO_PAT }}
+          fetch-depth: 0
+
+      - name: Download results artifacts
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          path: ${{ env.RESULTS_DIR }}
+          pattern: results_*
+
+      - name: Install python dependencies
+        run: pip install PyGithub
+
+      - name: Calculate success rate
+        run: python3 utils/calc_success_rate.py $STATS_FILENAME
+
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: "run-stats"
+          path: ${{ env.STATS_FILENAME }}.json
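
The gating in this workflow has two layers: the `setup` job's `if` decides whether a sweep runs at all, and each `sweep-*` job runs only when its slice of `search-space-config` is not `null`. The Python sketch below restates both checks as a reading aid. The function names are hypothetical, and the JSON shape (`single_node`, `multi_node`, `changelog_metadata`, keyed by `1k1k`/`1k8k`/`8k1k`) is inferred from the expressions above rather than from `process_changelog.py`, which this page does not show.

```python
import json
from typing import Any, Optional


def should_run_setup(
    event_name: str,
    is_draft: bool,
    labels: list[str],
    head_commit_message: Optional[str],
) -> bool:
    """Restate the `setup` job's `if` condition.

    GitHub's `contains()` ignores case, so comparisons are lower-cased.
    """
    if event_name == "pull_request":
        has_label = any(label.lower() == "sweep-enabled" for label in labels)
        return (not is_draft) and has_label
    return "[skip-sweep]" not in (head_commit_message or "").lower()


def job_matrix(search_space_config: str, topology: str, seq_len: str) -> Any:
    """Return the matrix list for one sweep job, or None if the job is skipped.

    Mirrors `toJson(fromJson(cfg).<topology>['<seq_len>']) != 'null'`.
    """
    config = json.loads(search_space_config)
    return (config.get(topology) or {}).get(seq_len)


if __name__ == "__main__":
    # Hypothetical config payload, for illustration only.
    cfg = json.dumps({
        "single_node": {"1k1k": [{"exp-name": "example"}]},
        "multi_node": {},
        "changelog_metadata": {},
    })
    print(job_matrix(cfg, "single_node", "1k1k"))  # non-null: job runs
    print(job_matrix(cfg, "multi_node", "8k1k"))   # None: job skipped
```

Because the commit title itself carries `[skip-sweep]`, the push-side check would skip the sweep for this very commit.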

perf-changelog.yaml

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+- config-keys:
+    - 70b-fp8-*-vllm
+  description: |
+    - Add compilation-config: '{"custom_ops": ["-rms_norm", "-quant_fp8", "-silu_and_mul"]}' as
+      extra config to all benchmarks/70b_fp8_mi*.sh scripts
+    - 6-7% uplift for llama for 6/8 configs
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/95
+- config-keys:
+    - gptoss-fp4-*-trt
+  description: |
+    - Upgrade GPT-OSS TRT images from 'release:1.1.0rc2.post2' to '1.2.0rc0.post1'
+    - Add NCCL_GRAPH_REGISTER=0 to benchmarks/gptoss_fp4_b200_trt_slurm.sh
+    - Change kv_cache_config.dtype from 'auto' to 'fp8' in benchmarks/gptoss_fp4_b200_trt_slurm.sh
+    - Remove MOE_BACKEND=CUTLASS, now just defaults to TRTLLM
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/110
+- config-keys:
+    - gptoss*
+    - dsr1*
+  description: |
+    - Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on
+      h100/h200/b200/mi300/mi325/mi355
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/149
+- config-keys:
+    - gptoss-fp4-b200-vllm
+    - gptoss-fp4-h100-vllm
+    - gptoss-fp4-h200-vllm
+  description: |
+    - Upgrade vLLM from 0.10.2 to 0.11.0 for GPT-OSS NVIDIA single-node configs
+    - Adds compilation-config: '{"cudagraph_mode":"PIECEWISE"} accordingly since vLLM 0.11.0
+      requires now defaults to FULL_AND_PIECEWISE
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/159
+- config-keys:
+    - dsr1*
+  description: |
+    - Fixes bug where 1k8k and 8k1k full sweeps had incorrect max-model-len for DeepSeek
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/163
+- config-keys:
+    - dsr1-fp4-b200-sglang
+    - dsr1-fp8-b200-sglang
+    - dsr1-fp8-h200-sglang
+  description: |
+    - Consolidates H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64
+      image tag and updates deprecated SGLang server arguments to their current equivalents.
+    - --enable-flashinfer-trtllm-moe & --enable-ep-moe is no longer available in sglang so we needed to change it
+    - ep: 4 for all tp: 4 entries (3 occurrences in dsr1-fp4-b200-sglang)
+    - ep: 8 for all tp: 8 entries (6 occurrences across dsr1-fp4-b200-sglang and dsr1-fp8-b200-sglang)
+    - dsr1_fp4_b200_docker.sh: Replaced --enable-ep-moe with --ep-size $EP_SIZE and --enable-flashinfer-trtllm-moe with
+      --moe-runner-backend flashinfer_trtllm
+    - dsr1_fp8_b200_docker.sh: Replaced --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm and
+      added --ep-size $EP_SIZE
+    - launch_b200-nvd.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
+    - launch_b200-tg.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/204
+- config-keys:
+    - gptoss-fp4-mi355x-vllm
+    - gptoss-fp4-b200-vllm
+  description: |
+    - Extend concurrency to 128 for gptoss mi355x/b200 vllm configurations
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/209
+- config-keys:
+    - gptoss-fp4-b200-trt
+  description: |
+    - Extend concurrency to 128 for gptoss b200 TRT configurations
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/233
+- config-keys:
+    - "*gb200-sglang"
+  description: |
+    - Introducing some improvements in GB200 SGLang DSR1 submission
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/257
+- config-keys:
+    - dsr1-fp8-h200-trt
+  description: |
+    - Update TRT image from nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 to nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc2
+    - Increase concurrency for some configurations
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/266
+- config-keys:
+    - gptoss-fp4-b200-vllm
+    - gptoss-fp4-h100-vllm
+    - gptoss-fp4-h200-vllm
+  description: |
+    - Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2
+    - Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh
+    PR: https://github.com/InferenceMAX/InferenceMAX/pull/273
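
Several `config-keys` entries above use wildcards (`70b-fp8-*-vllm`, `gptoss*`, `"*gb200-sglang"`), so a changelog entry can name a whole family of benchmark configs. One plausible way to resolve such patterns is shell-style globbing, sketched below in Python. This is an assumption about the semantics, not the contents of `utils/process_changelog.py`: the function name, the known-key list, and the choice to raise when a pattern matches nothing are all hypothetical.

```python
from fnmatch import fnmatchcase


def expand_config_keys(patterns: list[str], known_keys: list[str]) -> list[str]:
    """Expand glob-style config-key patterns against the known config names.

    Results keep first-match order and contain no duplicates.
    Assumption: a pattern that matches nothing is treated as an error.
    """
    matched: list[str] = []
    for pattern in patterns:
        hits = [key for key in known_keys if fnmatchcase(key, pattern)]
        if not hits:
            raise ValueError(f"config key pattern {pattern!r} matched nothing")
        for key in hits:
            if key not in matched:
                matched.append(key)
    return matched


if __name__ == "__main__":
    # Hypothetical set of config names, for illustration only.
    known = [
        "gptoss-fp4-b200-vllm",
        "gptoss-fp4-b200-trt",
        "dsr1-fp8-h200-trt",
        "dsr1-fp4-gb200-sglang",
    ]
    print(expand_config_keys(["gptoss-fp4-*-trt"], known))
    print(expand_config_keys(["gptoss*", "dsr1*"], known))
```

Failing loudly on an unmatched pattern is one way to catch a mistyped key before any benchmark job is scheduled.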
