Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40) #1586
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you! PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26642196095 |
d5728f5 to 562aa3f
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26824028702 |
|
There seems to be a bug in the visualizer: the results look normal, but the plot shows the interactivity as too large. |
|
@kedarpotdar-nv @jgangani could you have a look? |
|
Actually, the median TPOT is shown as 0 in the table (while the mean is good), so the issue is not in the visualizer; there is a real issue with TPOT. Compared to NVIDIA/srt-slurm#173, the decode side added load-balance-method, which is buggy and may be the cause. |
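For illustration only: a median of 0 can coexist with a reasonable mean when more than half of the per-request TPOT samples are recorded as 0. The sketch below uses made-up numbers, not values from this run, and `summarize` is a hypothetical helper rather than part of the benchmark harness.

```python
from statistics import mean, median

# Hypothetical per-request TPOT samples in milliseconds (illustrative only,
# not taken from the run discussed above). Six of nine samples are 0.
tpot_ms = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 40.0, 45.0, 50.0]


def summarize(samples):
    """Return the mean and median of a list of samples."""
    return {"mean": mean(samples), "median": median(samples)}


summary = summarize(tpot_ms)
print(summary)
```

If the real per-request data looks like this, the zeros themselves (for example, requests that report no decode time) would be the thing to investigate, rather than the plot.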
|
and |
ep-num-redundant-experts: 16
enable-dp-attention: true
enable-dp-lm-head: true
max-running-requests: 18400
EP40 max-running-requests typo
Medium Severity
The EP=40 wide-EP recipe sets decode max-running-requests to 18400, while the other four new sweep recipes and perf-changelog.yaml specify 18432 for Weiliang-aligned decode tuning.
Reviewed by Cursor Bugbot for commit 4b2448d.
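A minimal sketch of how this kind of drift could be caught automatically, assuming the decode `max-running-requests` values have already been read out of the five recipe files. The values mirror those quoted in the comment above; the dictionary keys and the `find_outliers` helper are hypothetical and not part of the repo.

```python
from collections import Counter

# Decode max-running-requests per sweep point, as quoted in the review
# comment above. The keys are hypothetical labels, not real file names.
decode_max_running_requests = {
    "ep12": 18432,
    "ep16": 18432,
    "ep24": 18432,
    "ep32": 18432,
    "ep40": 18400,
}


def find_outliers(values):
    """Return the entries whose value differs from the most common value."""
    expected, _count = Counter(values.values()).most_common(1)[0]
    return {name: v for name, v in values.items() if v != expected}


outliers = find_outliers(decode_max_running_requests)
print(outliers)
```

Treating the majority value as the expected one is a design shortcut; a stricter check would compare each recipe against the value recorded in perf-changelog.yaml.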
4b2448d to 08bd9ee
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26974513741 |
08bd9ee to 4dc8a67
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27032160756 |
6b6de89 to a70b40a
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27032160756 |
b14ac97 to d02ccd7
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033619607 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033666549 |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit d02ccd7.
EPLB_ARGS=()
if [ "${DP_ATTENTION}" = "true" ]; then
    MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
    EPLB_ARGS=(--enable-eplb --eplb-config '{"communicator":"torch_nccl", "use_async": false}')
B200 vLLM drops EPLB flags
Medium Severity
The B200 DSV4 vLLM launch script no longer passes --enable-eplb / --eplb-config when DP_ATTENTION is true, while the same commit bumps dsv4-fp4-b200-vllm to v0.22.0 with no perf-changelog entry for either change.
Reviewed by Cursor Bugbot for commit d02ccd7.
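To make the behavioral difference concrete, here is a rough Python model of the argument assembly in the excerpt above. It is only a sketch: the flag names and values are copied from the excerpt, while `build_extra_args` and its `enable_eplb` switch are hypothetical names used to contrast the old behavior (EPLB flags passed) with the new one (EPLB flags dropped).

```python
import json


def build_extra_args(dp_attention: bool, enable_eplb: bool = True) -> list[str]:
    """Model the extra launch args; enable_eplb is a hypothetical switch."""
    args: list[str] = []
    if dp_attention:
        # MoE backend flag, as in the shell excerpt.
        args += ["--moe-backend", "deep_gemm_mega_moe"]
        if enable_eplb:
            # EPLB flags that the review comment reports as dropped.
            eplb_config = {"communicator": "torch_nccl", "use_async": False}
            args += ["--enable-eplb", "--eplb-config", json.dumps(eplb_config)]
    return args


old_args = build_extra_args(dp_attention=True, enable_eplb=True)
new_args = build_extra_args(dp_attention=True, enable_eplb=False)
print(old_args)
print(new_args)
```

Comparing `old_args` and `new_args` shows exactly which flags disappear, which is the kind of change the comment suggests recording in perf-changelog.yaml.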
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033807266 |
7493408 to d02ccd7
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033910325 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033992200 |
Add 5 wide-EP sweep configs (EP=12/16/24/32/40) from Weiliang, remove 4 old dominated configs that are no longer on the frontier.
94d8944 to 74f1a45
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27034082957 |


Summary
Note
Medium Risk
Changes multinode SLURM benchmark topology and serving flags for expensive GB300 runs; mistakes could schedule wrong node/GPU layouts or skew published perf, but scope is config/recipes only.
Overview
Replaces the DSV4 GB300 dynamo-sglang 8k/1k disaggregated search space with a wide expert-parallel sweep aligned to srt-slurm PR#173: five 18-node points at decode EP 12/16/24/32/40 with matching concurrencies (12000 down to 2048), plus a retained low-concurrency 1p1d tp4/tp4 case. Four older dominated topology entries (mixed node counts and DEP16-focused decode) are removed from nvidia-master.yaml.

The linked srt-slurm recipe YAMLs are retargeted or added (disagg-gb300-*-18-c*.yaml): node counts, decode TP/EP, multi-frontend routing, benchmark concurrency, and tuned SGLang decode settings (swa-full-tokens-ratio=0.20, max-running-requests=18432, prefill enable-dp-lm-head / cuda-graph-max-bs=512). One new recipe file is added for the EP=24 topology.

Smaller benchmark pins: dsv4-fp4-b200-vllm moves from a nightly image digest to vllm/vllm-openai:v0.22.0; gptoss-fp4-mi355x-vllm on AMD master switches the model to amd/gpt-oss-120b-w-mxfp4-a-fp8. The B200 DSv4 vLLM launch script drops EPLB flags when DP attention is enabled. perf-changelog.yaml documents the GB300 sweep.

Reviewed by Cursor Bugbot for commit 74f1a45. Bugbot is set up for automated code reviews on this repo.