[NVIDIA] GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG #387

Merged
cquil11 merged 6 commits into main from jgangani_gptoss_gb200_b200_1k1k on Jan 7, 2026

@jgangani (Collaborator) commented Jan 6, 2026

  1. Adds GPTOSS GB200 DISAGG configurations for 1k1k and 8k1k workloads
  2. Explicitly assigns EP in the NVIDIA master file for AGG so that DP attention runs correctly with MoE EP.

Note

Introduces multi-node DISAGG benchmarking for GPT-OSS on GB200 and fixes explicit EP settings for DP-attention on B200 TRT.

  • Adds gptoss-fp4-gb200-dynamo-trt to nvidia-master.yaml with 1k1k and 8k1k search spaces (prefill/decode worker counts, TP/EP, DP-attn, conc-lists, and token/batch/mem settings)
  • Updates gptoss-fp4-b200-trt search-space to explicitly set ep: tp for DP-attn configs and adjust concurrency ranges
  • Enhances benchmarks/gptoss_fp4_b200_trt_slurm.sh to conditionally configure MoE AllToAll: disable when EP_SIZE=1, use MNNVL when EP_SIZE>1
  • Adds benchmarks/gptoss_fp4_gb200_dynamo-trt_slurm.sh to clone/run Dynamo TRT DISAGG sweeps and submit SLURM jobs
  • Updates runners/launch_gb200-nv.sh for dynamo-trt (GPT-OSS model path/served name) and broadens result directory matching
  • Records changes in perf-changelog.yaml
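
The EP and AllToAll rules in the bullets above can be sketched in shell. This is a rough illustration under stated assumptions, not the code from benchmarks/gptoss_fp4_b200_trt_slurm.sh: the function names `resolve_ep_size` and `moe_alltoall_mode`, the `true`/`false` DP-attention argument, and the output labels `disabled` and `mnnvl` are hypothetical stand-ins. Only `EP_SIZE`, the EP=TP rule for DP-attention configs, the EP=1 default, and the disable/MNNVL split come from the PR text.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and output labels are illustrative only.

# resolve_ep_size TP DP_ATTENTION -> prints the EP size to use.
# DP-attention configs get EP equal to TP; others keep the default EP=1
# (assumption: non-DP-attention configs are left at the default).
resolve_ep_size() {
  local tp=$1 dp_attention=$2
  if [ "$dp_attention" = "true" ]; then
    echo "$tp"
  else
    echo 1
  fi
}

# moe_alltoall_mode EP_SIZE -> prints "disabled" or "mnnvl".
# These labels stand in for whatever value the real config key expects.
moe_alltoall_mode() {
  local ep_size=$1
  if (( ep_size > 1 )); then
    echo "mnnvl"
  else
    echo "disabled"
  fi
}

EP_SIZE=$(resolve_ep_size 4 true)
echo "EP_SIZE=${EP_SIZE} alltoall=$(moe_alltoall_mode "$EP_SIZE")"
```

With TP=4 and DP attention enabled, the sketch resolves EP to 4 and selects the MNNVL path; with DP attention off it falls back to EP=1 and disables AllToAll.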

Written by Cursor Bugbot for commit b011864.

Jatin Gangani added 2 commits January 5, 2026 14:11
…ith the master file refactor MR

2. Add GB200 DISAGG configs

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
- Explicitly assign EP=TP for DP attention AGG candidates. EP was defaulted=1 during multinode refactor

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
@jgangani jgangani requested a review from a team as a code owner January 6, 2026 02:08
@jgangani jgangani changed the title Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG Draft: Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @jgangani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant updates to the configuration and benchmarking infrastructure for GPTOSS models on GB200 systems. It primarily focuses on enabling and optimizing disaggregated inference configurations using Dynamo-TRT, alongside refining existing aggregated configurations by explicitly managing Expert Parallelism for Data Parallel attention. These changes aim to improve the flexibility and correctness of model deployment and benchmarking across various parallelism strategies.

Highlights

  • New GPTOSS DISAGG Configurations: Added new configurations for GPTOSS GB200 DISAGG setups, specifically for 1k1k and 8k1k workloads, enabling disaggregated inference with Dynamo-TRT.
  • Explicit EP Assignment for AGG: Explicitly assigned Expert Parallelism (EP) in the NVIDIA master configuration file for aggregated (AGG) setups to ensure correct execution of Data Parallel (DP) attention with Mixture of Experts (MoE) EP.
  • All2All Method Control: Implemented conditional logic to disable the All2All communication method for MoE Tensor Parallel (TP) when EP_SIZE is 1, while retaining it for Data Parallel Expert Parallelism (DEP) configurations.

@gemini-code-assist (Contributor, Bot) left a comment

Code Review

This pull request adds new disaggregated configurations for GPT-OSS on GB200 and updates existing configurations to explicitly set the expert parallelism (EP) size for DP attention, which is a good fix. The changes look mostly good, but I have a few suggestions to improve maintainability and robustness.

Specifically, I've pointed out some opportunities to reduce duplication in the YAML configuration using anchors, noted a fragile dependency on a personal git branch in a new benchmark script, and suggested some minor consistency improvements in shell scripts. I've also left reminders for placeholders like TODOs and XXX for the PR link that should be addressed before merging.

Comment thread benchmarks/gptoss_fp4_gb200_dynamo-trt_slurm.sh Outdated
Comment thread .github/configs/nvidia-master.yaml
Comment thread .github/configs/nvidia-master.yaml
Comment thread .github/configs/nvidia-master.yaml Outdated
Comment thread .github/configs/nvidia-master.yaml Outdated
Comment thread benchmarks/gptoss_fp4_gb200_dynamo-trt_slurm.sh Outdated
Comment thread benchmarks/gptoss_fp4_gb200_dynamo-trt_slurm.sh Outdated
Comment thread perf-changelog.yaml Outdated
- Addressed few gemini code review comments

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Comment thread runners/launch_gb200-nv.sh
Comment thread .github/configs/nvidia-master.yaml Outdated
- "DECODE_NODES=8"

gptoss-fp4-gb200-dynamo-trt:
  image: jwillthomson/dynamo-trtllm-1.2.0rc2-min-tokens-fix-v2
Collaborator commented:

plz use official images

Collaborator (Author) commented:

Updated with official image with the latest commit.

@jgangani jgangani force-pushed the jgangani_gptoss_gb200_b200_1k1k branch 2 times, most recently from b243a20 to b53f844 January 6, 2026 08:03
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
@jgangani jgangani force-pushed the jgangani_gptoss_gb200_b200_1k1k branch from b53f844 to 5c93796 January 6, 2026 08:03
ntasks_per_node=4

gen_nodes=$(((DECODE_TP + 3)/4 * DECODE_NUM_WORKERS))
total_nodes=$((PREFILL_NUM_WORKERS + gen_nodes))

Node calculation formula over-allocates resources for multi-worker configs

The gen_nodes formula (DECODE_TP + 3)/4 * DECODE_NUM_WORKERS allocates one node per worker when TP < 4, but the YAML configuration expects workers to share nodes. For example, the "D:4xTP2" config has num-worker: 4, tp: 2, and DECODE_NODES=2 in additional-settings. The formula calculates gen_nodes=4 (one node per worker), but only 2 nodes are needed (8 GPUs total fits on 2 nodes). This causes the sbatch request to allocate 5 total nodes instead of the expected 3, wasting cluster resources.
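
One way to avoid the over-allocation described above is to pack small workers onto shared nodes. The sketch below is an illustration, not the fix that was merged. It assumes 4 GPUs per node (inferred from `ntasks_per_node=4`) and that a worker with TP below 4 never spans two nodes. The names `decode_nodes` and `GPUS_PER_NODE` are hypothetical, and the prefill worker count of 1 is chosen only so the totals match the numbers in the comment.

```shell
#!/usr/bin/env bash
# Sketch only; decode_nodes and GPUS_PER_NODE are hypothetical names.
GPUS_PER_NODE=4   # assumed from ntasks_per_node=4

# decode_nodes TP NUM_WORKERS -> prints the number of decode nodes.
decode_nodes() {
  local tp=$1 workers=$2
  if (( tp < GPUS_PER_NODE )); then
    # Workers smaller than a node share it: fit as many as possible per node.
    local per_node=$(( GPUS_PER_NODE / tp ))
    echo $(( (workers + per_node - 1) / per_node ))
  else
    # Workers of a node or larger keep the original per-worker rounding.
    echo $(( (tp + GPUS_PER_NODE - 1) / GPUS_PER_NODE * workers ))
  fi
}

# "D:4xTP2" from the review comment: 4 workers, TP=2.
PREFILL_NUM_WORKERS=1   # assumed, so that the totals match the comment
gen_nodes=$(decode_nodes 2 4)
total_nodes=$(( PREFILL_NUM_WORKERS + gen_nodes ))
echo "gen_nodes=${gen_nodes} total_nodes=${total_nodes}"
```

For the D:4xTP2 case this yields 2 decode nodes and 3 total nodes, matching `DECODE_NODES=2`, while configurations with TP of 4 or more produce the same result as the original formula.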

Comment thread perf-changelog.yaml Outdated
- gptoss-fp4-b200-trt
description:
- Explicitly add EP=TP for DP attention configs. Multinode Refactor inadvertently changed default EP=1
- Add GPTOSS DISAGG configurations for GB200 and B200
Collaborator commented:

not disagg for B200 right?

Collaborator commented:

also pls specify "GPTOSS DISAGG for 1k1k and 8k1k"

Collaborator commented:

trying to make these slightly more detailed since they are now displayed on inferencemax dot ai

Collaborator (Author) commented:

Thanks for the catch. Removed B200 from DISAGG comment in latest commit.

@cquil11 (Collaborator) commented Jan 6, 2026

@jgangani left some comments
may we close https://github.com/InferenceMAX/InferenceMAX/pull/232 ?

@cquil11 (Collaborator) left a comment

lgtm once comments are addressed!

@cquil11 cquil11 moved this to In Progress in InferenceMAX Board Jan 6, 2026
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
@jgangani (Collaborator, Author) commented Jan 6, 2026

> @jgangani left some comments may we close #232?

closed.

@jgangani jgangani changed the title Draft: Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026
Comment thread runners/launch_gb200-nv.sh Outdated
@jgangani jgangani changed the title GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Draft: GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
@jgangani (Collaborator, Author) commented Jan 7, 2026

@cquil11 can you please merge this?

@jgangani jgangani changed the title Draft: GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 7, 2026
@cquil11 cquil11 merged commit 645b97a into main Jan 7, 2026
57 checks passed
@cquil11 cquil11 deleted the jgangani_gptoss_gb200_b200_1k1k branch January 7, 2026 15:36
@github-project-automation github-project-automation Bot moved this from In Progress to Done in InferenceMAX Board Jan 7, 2026
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG [NVIDIA] GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Apr 8, 2026