Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40) by yhyang201 · Pull Request #1586 · SemiAnalysisAI/InferenceX

yhyang201 · 2026-05-29T14:08:25Z

Summary

Add 5 new search-space entries and recipe files for DSV4 GB300 non-MTP wide-EP sweep, matching srt-slurm PR#173 topology (18 nodes total).
EP sizes: 12/16/24/32/40, decode nodes: 3/4/6/8/10, concurrencies: 12000/8192/3000/2500/2048.
Uses InferenceX env vars and sglang_config (megamoe, W4A4, nightly-20260520 image), with Weiliang's tuned decode params (swa-full-tokens-ratio=0.20, max-running-requests=18432).
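
The five sweep points follow a simple pattern: each decode EP size divided by its decode node count is 4, and the 18-node total leaves the remaining nodes for prefill. This matches the two recipe file names visible later in this thread (`15p1d` with `dep12`, `8p1d` with `dep40`). A minimal Python sketch of that arithmetic follows. The 4-GPUs-per-node figure, the `SweepPoint` type, and the `sweep_points` helper are assumptions inferred from the numbers above, not code from this PR.

```python
from dataclasses import dataclass

TOTAL_NODES = 18    # from the PR summary
GPUS_PER_NODE = 4   # assumption: inferred from EP size / decode node count

# Decode EP size -> benchmark concurrency, as listed in the summary.
EP_TO_CONCURRENCY = {12: 12000, 16: 8192, 24: 3000, 32: 2500, 40: 2048}


@dataclass(frozen=True)
class SweepPoint:
    decode_ep: int
    decode_nodes: int
    prefill_nodes: int
    concurrency: int


def sweep_points() -> list[SweepPoint]:
    """Derive decode/prefill node counts for each EP size in the sweep."""
    points = []
    for ep, concurrency in EP_TO_CONCURRENCY.items():
        if ep % GPUS_PER_NODE != 0:
            raise ValueError(f"EP={ep} does not fill whole nodes")
        decode_nodes = ep // GPUS_PER_NODE
        # Assumption: every node not used for decode runs prefill.
        prefill_nodes = TOTAL_NODES - decode_nodes
        points.append(SweepPoint(ep, decode_nodes, prefill_nodes, concurrency))
    return points


if __name__ == "__main__":
    for point in sweep_points():
        print(point)
```

Under these assumptions the decode node counts come out as 3/4/6/8/10, matching the summary, and the prefill node counts as 15/14/12/10/8.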

Note

Medium Risk
Changes multinode SLURM benchmark topology and serving flags for expensive GB300 runs; mistakes could schedule wrong node/GPU layouts or skew published perf, but scope is config/recipes only.

Overview
Replaces the DSV4 GB300 dynamo-sglang 8k/1k disaggregated search space with a wide expert-parallel sweep aligned to srt-slurm PR#173: five 18-node points at decode EP 12/16/24/32/40 with matching concurrencies (12000 down to 2048), plus a retained low-concurrency 1p1d tp4/tp4 case. Four older dominated topology entries (mixed node counts and DEP16-focused decode) are removed from nvidia-master.yaml.

The linked srt-slurm recipe YAMLs are retargeted or added (disagg-gb300-*-18-c*.yaml): node counts, decode TP/EP, multi-frontend routing, benchmark concurrency, and tuned SGLang decode settings (swa-full-tokens-ratio=0.20, max-running-requests=18432, prefill enable-dp-lm-head / cuda-graph-max-bs=512). One new recipe file is added for the EP=24 topology.

Smaller benchmark pins: dsv4-fp4-b200-vllm moves from a nightly image digest to vllm/vllm-openai:v0.22.0; gptoss-fp4-mi355x-vllm on AMD master switches the model to amd/gpt-oss-120b-w-mxfp4-a-fp8. The B200 DSv4 vLLM launch script drops EPLB flags when DP attention is enabled. perf-changelog.yaml documents the GB300 sweep.

Reviewed by Cursor Bugbot for commit 74f1a45. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-29T14:08:38Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-30T02:43:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26642196095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26642196095

github-actions · 2026-05-30T11:51:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26642196095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26642196095

github-actions · 2026-06-02T17:49:20Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26824028702
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26824028702

weireweire · 2026-06-03T05:03:13Z

There seems to be a bug in the visualizer: the result looks normal, but the plot shows its interactivity as far too large.

dsv4 | deepseek-ai/DeepSeek-V4-Pro | GB300-CW | DYNAMO-SGLANG | FP4 | 8192 | 1024 | 4 | 4 | true | 15 | 60 | 12 | 12 | true | 1 | 12 | 12000 | 147376 | 0 | 0 | 0 | 18.4703 | 0 | 54.141 | 0 | 0 | 0 | 164.389 | 0 | 0
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

weireweire · 2026-06-03T06:04:06Z

@kedarpotdar-nv @jgangani could you have a look?

weireweire · 2026-06-03T08:46:44Z

Actually, the median TPOT is shown as 0 in the table (while the mean is good), so the issue is not the visualizer; there is a real problem with TPOT.

Compared with NVIDIA/srt-slurm#173, the decode side adds load-balance-method, which is buggy and may be the cause.
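
A symptom like this (a median of 0 next to a plausible mean) can be caught automatically before results are plotted. The following is a minimal Python sketch of such a check. The field names `mean_tpot_ms` and `median_tpot_ms`, the `zero_median_fields` helper, and the sample values are hypothetical; they are not the InferenceX result schema or numbers from this run.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical (mean, median) field-name pairs to cross-check.
DEFAULT_PAIRS = (("mean_tpot_ms", "median_tpot_ms"),)


def zero_median_fields(
    row: Mapping[str, float],
    pairs: Sequence[Tuple[str, str]] = DEFAULT_PAIRS,
) -> list[str]:
    """Return median fields that are 0 (or missing) while the mean is positive."""
    suspicious = []
    for mean_key, median_key in pairs:
        mean = row.get(mean_key, 0.0)
        median = row.get(median_key, 0.0)
        if mean > 0 and median == 0:
            suspicious.append(median_key)
    return suspicious


if __name__ == "__main__":
    # Illustrative values only.
    sample = {"mean_tpot_ms": 20.0, "median_tpot_ms": 0.0}
    print(zero_median_fields(sample))
```

A non-empty result would point at the benchmark output itself rather than at the plotting layer.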

weireweire · 2026-06-03T08:57:00Z

Also, enable_multiple_frontends: true with the num set to 8 is missing.

cursor · 2026-06-03T16:09:32Z

+      ep-num-redundant-experts: 16
+      enable-dp-attention: true
+      enable-dp-lm-head: true
+      max-running-requests: 18400


EP40 max-running-requests typo

Medium Severity

The EP=40 wide-EP recipe sets decode max-running-requests to 18400, while the other four new sweep recipes and perf-changelog.yaml specify 18432 for Weiliang-aligned decode tuning.

Reviewed by Cursor Bugbot for commit 4b2448d.
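
A mismatch like this can be flagged mechanically by comparing one setting across all recipes in the sweep. The following is a minimal Python sketch, assuming the values have already been read out of the recipe files. The `find_outliers` helper and the `ep12` ... `ep40` labels are hypothetical; the numbers are the ones quoted in the finding above.

```python
from collections import Counter


def find_outliers(values: dict[str, int]) -> dict[str, int]:
    """Return entries whose value differs from the most common value."""
    if not values:
        return {}
    most_common_value, _count = Counter(values.values()).most_common(1)[0]
    return {name: v for name, v in values.items() if v != most_common_value}


if __name__ == "__main__":
    # Decode max-running-requests per sweep point, as described by Bugbot.
    max_running_requests = {
        "ep12": 18432,
        "ep16": 18432,
        "ep24": 18432,
        "ep32": 18432,
        "ep40": 18400,
    }
    print(find_outliers(max_running_requests))
```

The check treats the majority value as the reference, so it only helps when most recipes already agree.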

github-actions · 2026-06-05T01:16:10Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26974513741
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26974513741

github-actions · 2026-06-05T18:15:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27032160756
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27032160756

github-actions · 2026-06-05T18:32:27Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27032160756
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27032160756

github-actions · 2026-06-05T18:41:30Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27032160756
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27032160756

github-actions · 2026-06-05T18:45:37Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033619607
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27033619607

github-actions · 2026-06-05T18:46:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033666549
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27033666549

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit d02ccd7.

cursor · 2026-06-05T18:47:08Z

-EPLB_ARGS=()
 if [ "${DP_ATTENTION}" = "true" ]; then
    MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
-    EPLB_ARGS=(--enable-eplb --eplb-config '{"communicator":"torch_nccl", "use_async": false}')


B200 vLLM drops EPLB flags

Medium Severity

The B200 DSV4 vLLM launch script no longer passes --enable-eplb / --eplb-config when DP_ATTENTION is true, while the same commit bumps dsv4-fp4-b200-vllm to v0.22.0 with no perf-changelog entry for either change.

Additional Locations (1)

.github/configs/nvidia-master.yaml#L1758-L1759

Reviewed by Cursor Bugbot for commit d02ccd7.

github-actions · 2026-06-05T18:49:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033807266
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27033807266

github-actions · 2026-06-05T18:52:02Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033910325
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27033910325

github-actions · 2026-06-05T18:53:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27033992200
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27033992200

github-actions · 2026-06-05T21:38:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27034082957
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27034082957

yhyang201 requested a review from a team May 29, 2026 14:08

yhyang201 requested review from jgangani and kedarpotdar-nv as code owners May 29, 2026 14:08

github-project-automation Bot added this to InferenceMAX Board May 29, 2026

yhyang201 added a commit that referenced this pull request May 29, 2026

Append perf-changelog entry for PR #1586

d5728f5

yhyang201 added the full-sweep-enabled label May 29, 2026

claude Bot reviewed May 29, 2026

Comment thread ...ti_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-8p1d-dep4-dep40-18-c2048.yaml Outdated

yhyang201 added a commit that referenced this pull request Jun 2, 2026

Append perf-changelog entry for PR #1586

562aa3f

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from d5728f5 to 562aa3f Compare June 2, 2026 13:47

cursor Bot reviewed Jun 3, 2026

yhyang201 added a commit that referenced this pull request Jun 4, 2026

Append perf-changelog entry for PR #1586

7be1353

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 4b2448d to 08bd9ee Compare June 4, 2026 19:27

cursor Bot reviewed Jun 4, 2026

Comment thread ..._node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-15p1d-dep4-dep12-18-c12000.yaml

yhyang201 added a commit that referenced this pull request Jun 5, 2026

Append perf-changelog entry for PR #1586

4b71b80

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 08bd9ee to 4dc8a67 Compare June 5, 2026 18:14

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch 2 times, most recently from 6b6de89 to a70b40a Compare June 5, 2026 18:36

yhyang201 closed this Jun 5, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 5, 2026

yhyang201 reopened this Jun 5, 2026

yhyang201 added the sweep-enabled label Jun 5, 2026

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from b14ac97 to d02ccd7 Compare June 5, 2026 18:45

yhyang201 requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 5, 2026 18:45

yhyang201 removed the sweep-enabled label Jun 5, 2026

cursor Bot reviewed Jun 5, 2026

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 7493408 to d02ccd7 Compare June 5, 2026 18:51

Add DSV4 GB300 wide-EP sweep configs, remove dominated old configs

74f1a45

Add 5 wide-EP sweep configs (EP=12/16/24/32/40) from Weiliang, remove 4 old dominated configs that are no longer on the frontier.

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 94d8944 to 74f1a45 Compare June 5, 2026 18:55
