[NV] Update B300 DSV4 SGLang Pareto sweep by Ankur-singh · Pull Request #1575 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-05-27T22:16:44Z

Rebased copy of #1552 with origin/main merged in and the perf-changelog.yaml conflict resolved. Original PR: #1552.

Summary

- update dsv4-fp4-b300-sglang to lmsysorg/sglang:nightly-dev-cu13-20260522-7cf193fe
- align the B300 DSV4 non-MTP SGLang sweep with the 2026-05-19 8k/1k submission frontier:
  - TP8 no DP attention: c1, c2, c4, c8, c16, c32, c64
  - DEP8 with DP attention: c512, c768, c1024, c1536, c2048
- map the single-node launch script to the TP-only flashinfer_mxfp4 recipe vs the DEP8 mixed-chunk DeepEP/DeepGEMM recipe

Testing

bash -n benchmarks/single_node/dsv4_fp4_b300_sglang.sh
git diff --check
uv run --python 3.13 --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-sglang --config-files .github/configs/nvidia-master.yaml --no-evals
uv run --python 3.13 --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-sglang --config-files .github/configs/nvidia-master.yaml --evals-only
uv run --python 3.13 --with pydantic --with pyyaml python utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc

Note

Low Risk
Benchmark matrix and shell launch tuning only; no application auth, data, or serving logic outside perf harness configs.

Overview
Aligns the DeepSeek-V4-Pro FP4 B300 SGLang (non-MTP) benchmark with the 2026-05-19 8k/1k submission frontier and a newer nightly container.

In .github/configs/nvidia-master.yaml, bumps dsv4-fp4-b300-sglang to lmsysorg/sglang:nightly-dev-cu13-20260604-14ed9b44 and replaces sparse single-conc search-space rows with conc-list sweeps: TP8 / dp-attn: false for c1–c64 and TP8 EP8 / dp-attn: true for c512–c2048 (same pattern for 1k/1k and 8k/1k). Comments now describe selection via dp-attn instead of per-CONC script logic.
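As a rough illustration of what the conc-list rows expand to, the sketch below enumerates the sweep points implied by the description above. The field names (`seq`, `parallelism`, `dp_attn`, `conc_list`) and the `expand` helper are hypothetical and are not taken from nvidia-master.yaml or generate_sweep_configs.py; only the concurrency values and the TP8 / TP8 EP8 split come from this PR.

```python
from itertools import product

# Sequence-length scenarios named in the PR, kept as labels because the
# exact token counts are not stated here.
SEQ_LENS = ["1k/1k", "8k/1k"]

# Hypothetical row layout; concurrency values are the ones listed in the PR.
ROWS = [
    {"parallelism": "TP8", "dp_attn": False,
     "conc_list": [1, 2, 4, 8, 16, 32, 64]},
    {"parallelism": "TP8 EP8", "dp_attn": True,
     "conc_list": [512, 768, 1024, 1536, 2048]},
]


def expand(seq_lens, rows):
    """Flatten conc-list rows into one sweep point per concurrency value."""
    points = []
    for seq, row in product(seq_lens, rows):
        for conc in row["conc_list"]:
            points.append({
                "seq": seq,
                "parallelism": row["parallelism"],
                "dp_attn": row["dp_attn"],
                "conc": conc,
            })
    return points


points = expand(SEQ_LENS, ROWS)
print(len(points), "sweep points")
```

Under these assumptions each scenario contributes 7 low-concurrency points and 5 high-concurrency points, and the dp-attn value alone tells the launch script which recipe applies.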

benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_sglang.sh drops the old CONC-keyed launch profiles in favor of two recipes keyed by DP_ATTENTION: low-latency flashinfer_mxfp4 TP-only vs throughput MegaMoE / mixed-chunk DEP8. Model loading skips hf download when the runner uses a staged /data/models path; serving uses --model-path $MODEL and fixed --max-running-requests. Adds shared SGLang JIT/opt env defaults and documents mount paths for the dev image.
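The switch from CONC-keyed profiles to two recipes keyed by DP_ATTENTION could look roughly like the sketch below. It is written in Python rather than the script's bash, and it is not the script's actual logic: the `RECIPES` table, the `select_recipe` helper, and the accepted spellings of a true value are all assumptions. Only the recipe descriptions are taken from the text above.

```python
# Recipe labels come from the PR text; the dict layout is hypothetical.
RECIPES = {
    False: {"profile": "low-latency", "parallelism": "TP-only",
            "moe": "flashinfer_mxfp4"},
    True: {"profile": "throughput", "parallelism": "DEP8",
           "moe": "MegaMoE / mixed-chunk"},
}


def parse_bool(value: str) -> bool:
    """Interpret an environment-style flag (assumed spellings)."""
    return value.strip().lower() in {"1", "true", "yes"}


def select_recipe(env: dict) -> dict:
    """Pick a launch recipe from DP_ATTENTION alone, ignoring CONC."""
    return RECIPES[parse_bool(env.get("DP_ATTENTION", "false"))]


print(select_recipe({"DP_ATTENTION": "true", "CONC": "1024"})["profile"])
print(select_recipe({"DP_ATTENTION": "false", "CONC": "8"})["profile"])
```

Keying on a single flag means the concurrency value no longer affects which recipe runs, so adding a new concurrency to a conc-list needs no script change.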

perf-changelog.yaml documents the config-key change and PR link.

Reviewed by Cursor Bugbot for commit fbeb15a. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-27T22:16:53Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get an approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-27T22:23:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26542138862
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26542138862

claude · 2026-05-27T22:26:47Z

+- config-keys:
+    - dsv4-fp4-b300-sglang
+  description:
+    - "Update DeepSeek-V4-Pro FP4 B300 SGLang non-MTP sweep to the 2026-05-19 8k/1k submission frontier: TP8 no-DP-attention c1-c64 and DEP8 DP-attention c512/c768/c1024/c1536/c2048"
+    - "Use lmsysorg/sglang:nightly-dev-cu13-20260522-7cf193fe to pick up the merged SGLang warmup path"
+    - "Map dp-attn=false to TP8 flashinfer_mxfp4 with chunked-prefill 8192; map dp-attn=true to DEP8 mixed-chunk MegaMoE throughput settings"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1552


🔴 The new perf-changelog.yaml entry's pr-link points to PR #1552 (the closed/superseded original), but the merging PR is #1575. This breaks the file's convention (every neighboring entry links to its own merging PR) and will trip utils/merge_with_reuse.sh: its conflict-resolution helper skips entries whose link doesn't end with /pull/<current-pr>, and the post-merge assert last["pr-link"].endswith("/$PR") would fail. Change the link to /pull/1575 (or use the XXX placeholder).

Extended reasoning...

What the bug is

The new entry appended at perf-changelog.yaml:3175-3181 sets:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1552

…but the PR being merged is #1575 (the rebased successor — the PR description itself notes "Rebased copy of #1552 with origin/main merged in"). PR #1552 is closed and will not contain the merge commit for these changes.

Why this is a real issue, not just cosmetic

Every other recent entry in perf-changelog.yaml links to its own merging PR (e.g. the immediately preceding dsv4-fp4-mi355x-sglang entry → /pull/1568, others → /pull/1555, /pull/1558, /pull/1516, /pull/1354). AGENTS.md documents this convention. More importantly, the repo's canonical merge tool utils/merge_with_reuse.sh (referenced from .claude/commands/merge-prs.md) enforces it programmatically:

At ~line 136, when resolving perf-changelog.yaml conflicts it filters incoming entries with: if "XXX" not in link and not link.endswith(f"/pull/{pr}"): continue. With pr=1575 and link=…/pull/1552, this entry would be silently skipped during automated conflict resolution, producing "No PR contributions found".

At ~line 172, after the merge it runs assert last["pr-link"].endswith("/$PR"). With $PR=1575 and the last entry pointing at /pull/1552, this assertion would fail.

Step-by-step proof

1. Maintainer runs utils/merge_with_reuse.sh 1575.

2. The script fetches origin/main, attempts the merge, and hits a conflict in perf-changelog.yaml (likely, given how frequently this file changes).

3. The embedded Python helper walks the incoming entries from the PR branch. For each new entry it checks:
   if "XXX" not in link and not link.endswith(f"/pull/{pr}"): continue
   With pr = "1575" and link = "https://github.com/SemiAnalysisAI/InferenceX/pull/1552", the condition is true → the entry is dropped from the merged result.

4. Even if no conflict arises (so the helper isn't invoked), the post-merge sanity check runs:
   assert last["pr-link"].endswith(f"/{pr}")
   Since the last entry is the new one and ends with /1552, this assertion raises and the merge tool aborts.

5. Downstream readers using pr-link to find the diff/merge commit land on a closed, superseded PR instead of the one that actually merged.
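The two checks quoted above can be reproduced in isolation. The sketch below wraps the quoted conditions in two small functions; the function names and the `BASE` constant are hypothetical, and this is not the code from utils/merge_with_reuse.sh, only a restatement of the conditions cited in this review.

```python
BASE = "https://github.com/SemiAnalysisAI/InferenceX/pull/"


def keeps_entry(link: str, pr: str) -> bool:
    """Inverse of the quoted skip condition:
    skip when "XXX" not in link and not link.endswith(f"/pull/{pr}")."""
    return "XXX" in link or link.endswith(f"/pull/{pr}")


def last_entry_ok(entries: list, pr: str) -> bool:
    """Same predicate as the quoted post-merge assert on the last entry."""
    return entries[-1]["pr-link"].endswith(f"/{pr}")


pr = "1575"
for suffix in ("1552", "1575", "XXX"):
    print(suffix, keeps_entry(BASE + suffix, pr))

print(last_entry_ok([{"pr-link": BASE + "1552"}], pr))
print(last_entry_ok([{"pr-link": BASE + "1575"}], pr))
```

With pr set to 1575, the /pull/1552 link fails both checks, while /pull/1575 passes both and the XXX placeholder survives the filter.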

Fix

Update line 3181 to:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1575

or, equivalently, use the XXX placeholder that merge_with_reuse.sh rewrites at merge time:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The historical reference to #1552 (which the PR description already provides) can stay in the PR description; the changelog entry should point at the PR that actually lands the change, both for convention and for tooling correctness.

github-actions · 2026-05-29T18:00:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26542138862
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26542138862

github-actions · 2026-06-04T22:03:55Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26542138862
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26542138862

Resolve conflicts keeping PR #1575's Pareto two-line design for dsv4-fp4-b300-sglang: dev-cu13 image + conc-list search-space, with the DP_ATTENTION benchmark script relocated to fixed_seq_len/. Keep main for all other configs, the -mtp entry, and the changelog (our entry appended).

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-06-05T00:22:46Z

 set -x
 PYTHONNOUSERSITE=1 sglang serve \
-    --model-path $MODEL_PATH --served-model-name $MODEL \
+    --model-path $MODEL \


Server ignores staged MODEL_PATH

High Severity

On B300, launch_b300-nv.sh keeps MODEL as the Hugging Face id and sets MODEL_PATH to pre-staged weights, but this script passes --model-path $MODEL and no longer uses MODEL_PATH or --served-model-name, unlike dsr1_fp4_b300.sh and dsv4_fp4_b300_sglang_mtp.sh.
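One way to picture the behavior the reviewer expects is the sketch below, written in Python rather than the script's bash. It mirrors the removed line from the diff above (`--model-path $MODEL_PATH --served-model-name $MODEL`) and falls back to the model id when no staged path is set. The `model_args` helper, the fallback rule, and the example model id and path are hypothetical and do not describe how the PR ultimately resolves the comment.

```python
from typing import List, Optional


def model_args(model: str, model_path: Optional[str] = None) -> List[str]:
    """Build the model-related serve arguments.

    If a staged path is provided, load weights from it and keep the
    Hugging Face id as the served name; otherwise pass the id directly.
    """
    if model_path:
        return ["--model-path", model_path, "--served-model-name", model]
    return ["--model-path", model]


# Placeholder values, not the real model id or mount path.
print(model_args("example-org/example-model", "/data/models/example-model"))
print(model_args("example-org/example-model"))
```

Keeping `--served-model-name` set to the id lets benchmark clients keep addressing the model by the same name regardless of where the weights were loaded from.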

github-actions · 2026-06-05T02:11:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26987594224
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26987594224

YAMY1234 and others added 3 commits May 22, 2026 00:05

Update B300 DSV4 SGLang sweep

a947d18

Resolve B300 DSV4 SGLang PR conflicts

c183bc8

Merge remote-tracking branch 'origin/main' into dsv4-b300-sglang-html…

deee4cc

…-frontier # Conflicts: # perf-changelog.yaml

Ankur-singh requested a review from a team May 27, 2026 22:16

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners May 27, 2026 22:16

github-project-automation Bot added this to InferenceMAX Board May 27, 2026

Ankur-singh added the full-sweep-enabled label May 27, 2026

cursor Bot reviewed May 27, 2026

Comment thread .github/configs/nvidia-master.yaml

claude Bot reviewed May 27, 2026

Ankur-singh mentioned this pull request Jun 4, 2026

[NV] Update B300 DSV4 SGLang Pareto sweep #1552

Closed

Ankur-singh added 2 commits June 4, 2026 15:37

dsv4-fp4-b300-sglang: use dev-cu13 image; fix changelog pr-link and s…

999175d

…tale mount comment

cursor Bot reviewed Jun 5, 2026

Ankur-singh wants to merge 5 commits into main from dsv4-b300-sglang-html-frontier
