Skip to content

[WIP] Update Dsv4 B300 configs#1656

Open
wzhao18 wants to merge 4 commits into
mainfrom
wzhao/dsv4-b300-nvfp4
Open

[WIP] Update Dsv4 B300 configs#1656
wzhao18 wants to merge 4 commits into
mainfrom
wzhao/dsv4-b300-nvfp4

Conversation

@wzhao18
Copy link
Copy Markdown
Collaborator

@wzhao18 wzhao18 commented Jun 3, 2026

Note

Low Risk
Benchmark and runner config only; no auth, serving API, or production inference path changes.

Overview
Switches the dsv4-fp4-b300-vllm recipe to the nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint instead of the upstream deepseek-ai/DeepSeek-V4-Pro weights, and registers DeepSeek-V4-Pro-NVFP4 in the B300 runner’s pre-staged model list so jobs can load from scratch without a fresh HF pull.

The single-node benchmark script no longer passes --moe-backend deep_gemm_mega_moe when data-parallel attention is enabled—that block is commented out, so DP layouts run with default MoE backend behavior for this NVFP4 setup.

Reviewed by Cursor Bugbot for commit 41630f2. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml Outdated
Comment on lines +3434 to +3438
- config-keys:
- dsv4-fp4-b300-vllm
description:
- "Update B300 dsv4 image to nvfp4"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Two issues in the new perf-changelog entry for dsv4-fp4-b300-vllm: (1) The pr-link references #1652, but that's the MiniMax-M2.5 PR (the entry immediately above) — this is a copy-paste, the link should point to this PR (#1656). (2) The description says "Update B300 dsv4 image to nvfp4", but the diff only changes the model field; the image (vllm/vllm-openai:v0.22.0) is unchanged, so it should say something like "Update B300 dsv4 model to NVFP4 checkpoint".

Extended reasoning...

Bug 1: Wrong PR link

The new changelog entry at line 3438 sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652. However, PR #1652 is [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs (already merged at commit 7d4063d) — a completely different PR. The current PR is #1656 ("[WIP] Update Dsv4 B300 configs"). The entry immediately above (line 3432, for minimaxm2.5-fp4-b300-dynamo-vllm) correctly references #1652, so this is almost certainly a copy-paste error from the adjacent entry.

Bug 2: Misleading description ("image" vs "model")

The description says "Update B300 dsv4 image to nvfp4". But the diff in .github/configs/nvidia-master.yaml only changes the model field:

-  model: deepseek-ai/DeepSeek-V4-Pro
+  model: nvidia/DeepSeek-V4-Pro-NVFP4

The image field (vllm/vllm-openai:v0.22.0) is unchanged. In perf-changelog.yaml "image" consistently refers to the Docker container image — e.g. line 3430 of the same file says "Image: vllm/vllm-openai:v0.20.1". The PR's own Cursor Bugbot summary even acknowledges this: "Image, search space, and other recipe fields are unchanged." Calling the model swap an "image" change is misleading.

Step-by-step proof

  1. The PR metadata shows this is PR [WIP] Update Dsv4 B300 configs #1656, titled "[WIP] Update Dsv4 B300 configs".
  2. Recent commit history (7d4063d) confirms PR [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs #1652 is a separate, already-merged PR titled "[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs".
  3. In perf-changelog.yaml, the entry at line 3432 (the MiniMax-M2.5 entry) legitimately uses pr-link: .../pull/1652.
  4. The newly added entry at line 3438 (this PR's entry) uses the same pr-link: .../pull/1652 — duplicating the link from the line above instead of referencing [WIP] Update Dsv4 B300 configs #1656.
  5. Reading the diff in .github/configs/nvidia-master.yaml, only model: changes; image: stays at vllm/vllm-openai:v0.22.0. So "Update B300 dsv4 image to nvfp4" describes a change that did not happen, and omits the change that did.

Suggested fix

- config-keys:
    - dsv4-fp4-b300-vllm
  description:
    - "Update B300 dsv4 model to NVFP4 checkpoint (nvidia/DeepSeek-V4-Pro-NVFP4)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656

Impact

Documentation-only — no runtime behavior is affected. But the changelog is the canonical record for tracing perf changes back to their PR, and the incorrect link points readers to an unrelated PR with no context for this model swap.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 41630f2. Configure here.

fi
# if [ "${DP_ATTENTION}" = "true" ]; then
# MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
# fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DP-attn megamoe backend disabled

Medium Severity

With DP_ATTENTION=true, the script no longer passes --moe-backend deep_gemm_mega_moe, but dsv4-fp4-b300-vllm still schedules high-concurrency dp-attn/ep points. That diverges from the prior B300 pareto recipe and from dsv4_fp4_b300_vllm_mtp.sh / B200 vLLM siblings, so those runs may not match the intended serving path.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 41630f2. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant