Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40)#1586

Open
yhyang201 wants to merge 1 commit into main from add-dsv4-gb300-weiliang-wideep-sweep

yhyang201 (Collaborator) commented May 29, 2026

Summary

  • Add 5 new search-space entries and recipe files for DSV4 GB300 non-MTP wide-EP sweep, matching srt-slurm PR#173 topology (18 nodes total).
  • EP sizes: 12/16/24/32/40, decode nodes: 3/4/6/8/10, concurrencies: 12000/8192/3000/2500/2048.
  • Uses InferenceX env vars and sglang_config (megamoe, W4A4, nightly-20260520 image), with Weiliang's tuned decode params (swa-full-tokens-ratio=0.20, max-running-requests=18432).
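
The three lists in the summary can be cross-checked with a little arithmetic. The sketch below is illustrative only: `SweepPoint`, `TOTAL_NODES`, and `remaining_nodes` are hypothetical names, not identifiers from the recipe files. It assumes the decode EP group spans every GPU on the decode nodes, and it makes no claim about how the non-decode nodes are split between prefill and other roles.

```python
from dataclasses import dataclass

TOTAL_NODES = 18  # from the summary: every sweep point uses 18 nodes in total


@dataclass(frozen=True)
class SweepPoint:
    """One wide-EP sweep point (hypothetical helper type)."""

    ep_size: int
    decode_nodes: int
    concurrency: int

    @property
    def gpus_per_decode_node(self) -> float:
        # Assumption: the EP group covers all GPUs on the decode nodes.
        return self.ep_size / self.decode_nodes

    @property
    def remaining_nodes(self) -> int:
        # Nodes left for everything that is not decode.
        return TOTAL_NODES - self.decode_nodes


# Values copied from the summary bullets, in the order they are listed.
EP_SIZES = [12, 16, 24, 32, 40]
DECODE_NODES = [3, 4, 6, 8, 10]
CONCURRENCIES = [12000, 8192, 3000, 2500, 2048]

SWEEP = [
    SweepPoint(ep, nodes, conc)
    for ep, nodes, conc in zip(EP_SIZES, DECODE_NODES, CONCURRENCIES)
]

for point in SWEEP:
    print(
        f"EP={point.ep_size:>2} decode_nodes={point.decode_nodes:>2} "
        f"gpus/node={point.gpus_per_decode_node:.0f} "
        f"remaining_nodes={point.remaining_nodes:>2} "
        f"concurrency={point.concurrency}"
    )
```

Under that assumption every point works out to 4 GPUs per decode node, and concurrency falls as the decode EP size grows.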

Note

Medium Risk
Changes multinode SLURM benchmark topology and serving flags for expensive GB300 runs; mistakes could schedule wrong node/GPU layouts or skew published perf, but scope is config/recipes only.

Overview
Replaces the DSV4 GB300 dynamo-sglang 8k/1k disaggregated search space with a wide expert-parallel sweep aligned to srt-slurm PR#173: five 18-node points at decode EP 12/16/24/32/40 with matching concurrencies (12000 down to 2048), plus a retained low-concurrency 1p1d tp4/tp4 case. Four older dominated topology entries (mixed node counts and DEP16-focused decode) are removed from nvidia-master.yaml.

The linked srt-slurm recipe YAMLs are retargeted or added (disagg-gb300-*-18-c*.yaml): node counts, decode TP/EP, multi-frontend routing, benchmark concurrency, and tuned SGLang decode settings (swa-full-tokens-ratio=0.20, max-running-requests=18432, prefill enable-dp-lm-head / cuda-graph-max-bs=512). One new recipe file is added for the EP=24 topology.

Smaller benchmark pins: dsv4-fp4-b200-vllm moves from a nightly image digest to vllm/vllm-openai:v0.22.0; gptoss-fp4-mi355x-vllm on AMD master switches the model to amd/gpt-oss-120b-w-mxfp4-a-fp8. The B200 DSv4 vLLM launch script drops EPLB flags when DP attention is enabled. perf-changelog.yaml documents the GB300 sweep.

Reviewed by Cursor Bugbot for commit 74f1a45.

github-actions Bot (Contributor) commented

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

yhyang201 added a commit that referenced this pull request Jun 2, 2026
yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from d5728f5 to 562aa3f on June 2, 2026 13:47

@weireweire

There seems to be a bug in the visualizer: the result looks normal, but the plot shows the interactivity as far too large.

dsv4 | deepseek-ai/DeepSeek-V4-Pro | GB300-CW | DYNAMO-SGLANG | FP4 | 8192 | 1024 | 4 | 4 | true | 15 | 60 | 12 | 12 | true | 1 | 12 | 12000 | 147376 | 0 | 0 | 0 | 18.4703 | 0 | 54.141 | 0 | 0 | 0 | 164.389 | 0 | 0
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --


@weireweire

@kedarpotdar-nv @jgangani could you have a look?

@weireweire

Actually, the median TPOT is shown as 0 in the table (while the mean is good), so the issue is not in the visualizer; there is a real issue with TPOT.

Compared to NVIDIA/srt-slurm#173, the decode side added load-balance-method, which is buggy and may be the cause.
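
A zero median next to a healthy mean is easy to catch automatically before results are plotted. The following is a minimal sketch of such a check, not part of the benchmark tooling: the function name `zero_median_metrics` and the `mean_<metric>` / `median_<metric>` key convention are assumptions, and the numbers in the example row are placeholders rather than values from this run.

```python
def zero_median_metrics(row: dict[str, float]) -> list[str]:
    """Return metric names whose median is 0 while the mean is positive.

    Assumes (hypothetically) that a result row stores each metric under
    the keys "mean_<metric>" and "median_<metric>".
    """
    flagged = []
    for key, median in row.items():
        if not key.startswith("median_"):
            continue
        metric = key[len("median_"):]
        mean = row.get(f"mean_{metric}")
        # A positive mean with a zero median suggests a broken aggregation
        # or missing per-request samples rather than a real measurement.
        if median == 0 and mean is not None and mean > 0:
            flagged.append(metric)
    return sorted(flagged)


# Placeholder row for illustration only; not data from the PR.
example_row = {
    "mean_tpot": 20.0,
    "median_tpot": 0.0,
    "mean_ttft": 100.0,
    "median_ttft": 90.0,
}
print(zero_median_metrics(example_row))
```

A check like this could run on each result row before plotting, so a derived quantity such as interactivity is never computed from a zero TPOT.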

@weireweire

Also, enable_multiple_frontends: true with the frontend count set to 8 is missing.

ep-num-redundant-experts: 16
enable-dp-attention: true
enable-dp-lm-head: true
max-running-requests: 18400

EP40 max-running-requests typo

Medium Severity

The EP=40 wide-EP recipe sets decode max-running-requests to 18400, while the other four new sweep recipes and perf-changelog.yaml specify 18432 for Weiliang-aligned decode tuning.
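
A discrepancy like this can be caught with a small consistency check across the recipes. The sketch below is an assumption-laden illustration: `find_outliers` is a hypothetical helper and the `ep12` ... `ep40` keys are made-up labels. Only the two values (18432 for four recipes, 18400 for EP=40) come from the finding above.

```python
from collections import Counter


def find_outliers(values_by_recipe: dict[str, int]) -> dict[str, int]:
    """Return the recipes whose value differs from the most common one."""
    if not values_by_recipe:
        return {}
    most_common_value, _ = Counter(values_by_recipe.values()).most_common(1)[0]
    return {
        name: value
        for name, value in values_by_recipe.items()
        if value != most_common_value
    }


# Decode max-running-requests per sweep recipe, as described in the finding.
# The dictionary keys are hypothetical labels, not real file names.
decode_max_running_requests = {
    "ep12": 18432,
    "ep16": 18432,
    "ep24": 18432,
    "ep32": 18432,
    "ep40": 18400,
}

print(find_outliers(decode_max_running_requests))  # {'ep40': 18400}
```

Majority voting is a deliberate simplification here; comparing each recipe against the value recorded in perf-changelog.yaml would be stricter.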

Reviewed by Cursor Bugbot for commit 4b2448d.

yhyang201 added a commit that referenced this pull request Jun 4, 2026
yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 4b2448d to 08bd9ee on June 4, 2026 19:27
yhyang201 added a commit that referenced this pull request Jun 5, 2026
yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 08bd9ee to 4dc8a67 on June 5, 2026 18:14
yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch 2 times, most recently from 6b6de89 to a70b40a, on June 5, 2026 18:36
yhyang201 closed this Jun 5, 2026
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit d02ccd7.

Comment thread .github/configs/amd-master.yaml
EPLB_ARGS=()
if [ "${DP_ATTENTION}" = "true" ]; then
MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
EPLB_ARGS=(--enable-eplb --eplb-config '{"communicator":"torch_nccl", "use_async": false}')

B200 vLLM drops EPLB flags

Medium Severity

The B200 DSV4 vLLM launch script no longer passes --enable-eplb / --eplb-config when DP_ATTENTION is true, while the same commit bumps dsv4-fp4-b200-vllm to v0.22.0 with no perf-changelog entry for either change.

Reviewed by Cursor Bugbot for commit d02ccd7.

yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 7493408 to d02ccd7 on June 5, 2026 18:51
Add 5 wide-EP sweep configs (EP=12/16/24/32/40) from Weiliang,
remove 4 old dominated configs that are no longer on the frontier.
yhyang201 force-pushed the add-dsv4-gb300-weiliang-wideep-sweep branch from 94d8944 to 74f1a45 on June 5, 2026 18:55