[WIP] Update Dsv4 B300 configs by wzhao18 · Pull Request #1656 · SemiAnalysisAI/InferenceX

wzhao18 · 2026-06-03T17:26:26Z

Note

Low Risk
Benchmark and runner config only; no auth, serving API, or production inference path changes.

Overview
Switches the dsv4-fp4-b300-vllm recipe to the nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint instead of the upstream deepseek-ai/DeepSeek-V4-Pro weights, and registers DeepSeek-V4-Pro-NVFP4 in the B300 runner’s pre-staged model list so jobs can load from scratch without a fresh HF pull.

The single-node benchmark script no longer passes --moe-backend deep_gemm_mega_moe when data-parallel attention is enabled. That block is commented out, so DP layouts run with the default MoE backend for this NVFP4 setup.

Reviewed by Cursor Bugbot for commit 41630f2. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-03T17:26:37Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-03T17:29:44Z

+- config-keys:
+    - dsv4-fp4-b300-vllm
+  description:
+    - "Update B300 dsv4 image to nvfp4"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652


🔴 Two issues in the new perf-changelog entry for dsv4-fp4-b300-vllm: (1) The pr-link references #1652, but that's the MiniMax-M2.5 PR (the entry immediately above) — this is a copy-paste, the link should point to this PR (#1656). (2) The description says "Update B300 dsv4 image to nvfp4", but the diff only changes the model field; the image (vllm/vllm-openai:v0.22.0) is unchanged, so it should say something like "Update B300 dsv4 model to NVFP4 checkpoint".

Extended reasoning...

Bug 1: Wrong PR link

The new changelog entry at line 3438 sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652. However, PR #1652 is [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs (already merged at commit 7d4063d) — a completely different PR. The current PR is #1656 ("[WIP] Update Dsv4 B300 configs"). The entry immediately above (line 3432, for minimaxm2.5-fp4-b300-dynamo-vllm) correctly references #1652, so this is almost certainly a copy-paste error from the adjacent entry.
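A check along these lines could catch this kind of copy-paste before merge. It is only a sketch: `pr_link_matches` is a hypothetical helper, not part of the repo's tooling, and it assumes the pr-link always ends in the PR number.

```shell
# Sketch only: pr_link_matches is a hypothetical helper, not part of the repo's tooling.
# Succeeds when the last path segment of the pr-link equals the expected PR number.
pr_link_matches() {
    # $1: pr-link value from the changelog entry; $2: number of the PR adding it
    [ "${1##*/}" = "$2" ]
}

link="https://github.com/SemiAnalysisAI/InferenceX/pull/1652"
if pr_link_matches "$link" 1656; then
    echo "pr-link OK"
else
    echo "pr-link points at #${link##*/}, expected #1656"
fi
```

With the link from this entry, the check takes the else branch and reports the mismatch between #1652 and #1656.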

Bug 2: Misleading description ("image" vs "model")

The description says "Update B300 dsv4 image to nvfp4". But the diff in .github/configs/nvidia-master.yaml only changes the model field:

- model: deepseek-ai/DeepSeek-V4-Pro
+ model: nvidia/DeepSeek-V4-Pro-NVFP4

The image field (vllm/vllm-openai:v0.22.0) is unchanged. In perf-changelog.yaml "image" consistently refers to the Docker container image — e.g. line 3430 of the same file says "Image: vllm/vllm-openai:v0.20.1". The PR's own Cursor Bugbot summary even acknowledges this: "Image, search space, and other recipe fields are unchanged." Calling the model swap an "image" change is misleading.
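To make the image-versus-model distinction concrete, here is a rough sketch that compares the two fields before and after the change. The fragments are reduced to the two fields quoted above (the real recipe has more), and `field_value` is a hypothetical helper that only handles flat `key: value` lines, not general YAML.

```shell
# Sketch only: field_value is a hypothetical helper that reads a "key: value" line
# from a flat YAML fragment on stdin; it is not a general YAML parser.
field_value() {
    # $1: key name
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" | head -n 1
}

# Reduced fragments holding only the two fields discussed in this review.
before='image: vllm/vllm-openai:v0.22.0
model: deepseek-ai/DeepSeek-V4-Pro'
after='image: vllm/vllm-openai:v0.22.0
model: nvidia/DeepSeek-V4-Pro-NVFP4'

for key in image model; do
    old=$(printf '%s\n' "$before" | field_value "$key")
    new=$(printf '%s\n' "$after" | field_value "$key")
    if [ "$old" = "$new" ]; then
        echo "$key: unchanged ($old)"
    else
        echo "$key: $old -> $new"
    fi
done
```

Only the model line is reported as changed, which is why "model" rather than "image" is the accurate word for the changelog description.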

Step-by-step proof

The PR metadata shows this is PR #1656, titled "[WIP] Update Dsv4 B300 configs".

Recent commit history (7d4063d) confirms PR #1652 is a separate, already-merged PR titled "[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs".

In perf-changelog.yaml, the entry at line 3432 (the MiniMax-M2.5 entry) legitimately uses pr-link: .../pull/1652.

The newly added entry at line 3438 (this PR's entry) uses the same pr-link: .../pull/1652, duplicating the link from the entry above instead of referencing #1656.

Reading the diff in .github/configs/nvidia-master.yaml, only model: changes; image: stays at vllm/vllm-openai:v0.22.0. So "Update B300 dsv4 image to nvfp4" describes a change that did not happen, and omits the change that did.

Suggested fix

- config-keys:
    - dsv4-fp4-b300-vllm
  description:
    - "Update B300 dsv4 model to NVFP4 checkpoint (nvidia/DeepSeek-V4-Pro-NVFP4)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656

Impact

Documentation-only — no runtime behavior is affected. But the changelog is the canonical record for tracing perf changes back to their PR, and the incorrect link points readers to an unrelated PR with no context for this model swap.

github-actions · 2026-06-03T17:45:30Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26901571751
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26901571751

github-actions · 2026-06-04T17:06:58Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26901571751
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26901571751

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.



cursor · 2026-06-04T19:02:22Z

-fi
+# if [ "${DP_ATTENTION}" = "true" ]; then
+#     MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
+# fi


DP-attn megamoe backend disabled

Medium Severity

With DP_ATTENTION=true, the script no longer passes --moe-backend deep_gemm_mega_moe, but dsv4-fp4-b300-vllm still schedules high-concurrency dp-attn/ep points. That diverges from the prior B300 pareto recipe and from dsv4_fp4_b300_vllm_mtp.sh / B200 vLLM siblings, so those runs may not match the intended serving path.
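If the intent is to keep the mega-MoE backend available for some runs rather than comment it out, one option is to gate it behind an explicit switch. The sketch below is an assumption, not the script's current logic: the second argument (a `USE_MEGA_MOE`-style toggle) is hypothetical, and only the `--moe-backend deep_gemm_mega_moe` flag and the `DP_ATTENTION` check come from the diff.

```shell
# Sketch only: the second argument is a hypothetical toggle, not a variable from the PR.
# Prints the extra MoE flags (or nothing) for a given DP-attention setting.
moe_backend_flags() {
    # $1: DP_ATTENTION ("true"/"false"); $2: hypothetical mega-MoE toggle (default "true")
    if [ "${1:-false}" = "true" ] && [ "${2:-true}" = "true" ]; then
        printf '%s\n' "--moe-backend deep_gemm_mega_moe"
    fi
}

echo "dp-attn, mega-moe on:  [$(moe_backend_flags true true)]"
echo "dp-attn, mega-moe off: [$(moe_backend_flags true false)]"
echo "no dp-attn:            [$(moe_backend_flags false true)]"
```

Only the first call prints the flag. A gate like this would let the NVFP4 recipe opt out while the MTP and B200 siblings keep their existing behavior, if that is the intended split.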


github-actions · 2026-06-04T22:40:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26973148843
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26973148843

Use nvfp4 model

ee16194

wzhao18 requested a review from a team June 3, 2026 17:26

wzhao18 requested review from jgangani and kedarpotdar-nv as code owners June 3, 2026 17:26

github-project-automation Bot added this to InferenceMAX Board Jun 3, 2026

Update PR number

e8ecb38

wzhao18 added the full-sweep-enabled label Jun 3, 2026

claude Bot reviewed Jun 3, 2026


wzhao18 and others added 2 commits June 4, 2026 12:00

Stage dsv4 b300

8df7b13

Merge branch 'main' into wzhao/dsv4-b300-nvfp4

41630f2

cursor Bot reviewed Jun 4, 2026


wzhao18 wants to merge 4 commits into main from wzhao/dsv4-b300-nvfp4
