[NVIDIA] GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG by jgangani · Pull Request #387 · SemiAnalysisAI/InferenceX

jgangani · 2026-01-06T02:08:10Z

Adds GPTOSS GB200 DISAGG configurations for 1k1k and 8k1k workloads.
Explicitly assigns EP in the NVIDIA master file for AGG so that DP attention runs correctly with MoE EP.

Note

Introduces multi-node DISAGG benchmarking for GPT-OSS on GB200 and fixes explicit EP settings for DP-attention on B200 TRT.

- Adds gptoss-fp4-gb200-dynamo-trt to nvidia-master.yaml with 1k1k and 8k1k search spaces (prefill/decode worker counts, TP/EP, DP-attn, conc-lists, and token/batch/mem settings)
- Updates gptoss-fp4-b200-trt search-space to explicitly set ep: tp for DP-attn configs and adjust concurrency ranges
- Enhances benchmarks/gptoss_fp4_b200_trt_slurm.sh to conditionally configure MoE AllToAll: disable when EP_SIZE=1, use MNNVL when EP_SIZE>1
- Adds benchmarks/gptoss_fp4_gb200_dynamo-trt_slurm.sh to clone/run Dynamo TRT DISAGG sweeps and submit SLURM jobs
- Updates runners/launch_gb200-nv.sh for dynamo-trt (GPT-OSS model path/served name) and broadens result directory matching
- Records changes in perf-changelog.yaml
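
The `ep: tp` rule for DP-attn configs can be pictured as a pre-launch sanity check. The sketch below is illustrative only and is not taken from the PR: the function name `check_dp_attention_ep` and its three positional arguments (TP size, EP size, DP-attention flag) are hypothetical. It rejects a DP-attention config whose EP differs from its TP, which is the situation the PR describes after EP defaulted to 1.

```shell
# Hypothetical check: a DP-attention config is expected to carry EP equal to TP.
# Arguments (all illustrative): TP size, EP size, DP-attention flag ("true"/"false").
check_dp_attention_ep() {
    cfg_tp="$1"
    cfg_ep="$2"
    cfg_dp_attn="$3"
    if [ "$cfg_dp_attn" = "true" ] && [ "$cfg_ep" -ne "$cfg_tp" ]; then
        echo "DP attention enabled but EP ($cfg_ep) != TP ($cfg_tp)" >&2
        return 1
    fi
    return 0
}
```

For example, `check_dp_attention_ep 8 1 true` returns non-zero, while `check_dp_attention_ep 8 8 true` and `check_dp_attention_ep 8 1 false` both succeed.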

Written by Cursor Bugbot for commit b011864. This will update automatically on new commits.

gemini-code-assist · 2026-01-06T02:08:30Z

Summary of Changes

Hello @jgangani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant updates to the configuration and benchmarking infrastructure for GPTOSS models on GB200 systems. It primarily focuses on enabling and optimizing disaggregated inference configurations using Dynamo-TRT, alongside refining existing aggregated configurations by explicitly managing Expert Parallelism for Data Parallel attention. These changes aim to improve the flexibility and correctness of model deployment and benchmarking across various parallelism strategies.

Highlights

- New GPTOSS DISAGG Configurations: Added new configurations for GPTOSS GB200 DISAGG setups, specifically for 1k1k and 8k1k workloads, enabling disaggregated inference with Dynamo-TRT.
- Explicit EP Assignment for AGG: Explicitly assigned Expert Parallelism (EP) in the NVIDIA master configuration file for aggregated (AGG) setups to ensure correct execution of Data Parallel (DP) attention with Mixture of Experts (MoE) EP.
- All2All Method Control: Implemented conditional logic to disable the All2All communication method for MoE Tensor Parallel (TP) when EP_SIZE is 1, while retaining it for Data Parallel Expert Parallelism (DEP) configurations.
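
The All2All conditional can be sketched roughly as follows. This is a minimal illustration, not the logic from benchmarks/gptoss_fp4_b200_trt_slurm.sh: the function name `select_moe_alltoall` and the returned labels `disabled` and `mnnvl` are hypothetical placeholders, not real TRT-LLM option values. Only the `EP_SIZE` input and the rule itself (no AllToAll at EP_SIZE=1, MNNVL when EP_SIZE>1) come from the PR description.

```shell
# Hypothetical helper: pick an MoE AllToAll mode from the expert-parallel size.
# The output labels are illustrative placeholders, not actual option values.
select_moe_alltoall() {
    sel_ep_size="${1:-}"
    case "$sel_ep_size" in
        ''|*[!0-9]*|0*)
            # Reject empty, non-numeric, zero, or zero-padded input.
            echo "EP_SIZE must be a positive integer, got '$sel_ep_size'" >&2
            return 1
            ;;
    esac
    if [ "$sel_ep_size" -eq 1 ]; then
        echo "disabled"   # pure TP MoE: no expert-parallel exchange needed
    else
        echo "mnnvl"      # EP > 1: use the MNNVL AllToAll path
    fi
}
```

A caller could branch on the result, for example `mode=$(select_moe_alltoall "$EP_SIZE") || exit 1`, and then write whichever setting the real script expects.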

gemini-code-assist

Code Review

This pull request adds new disaggregated configurations for GPT-OSS on GB200 and updates existing configurations to explicitly set the expert parallelism (EP) size for DP attention, which is a good fix. The changes look mostly good, but I have a few suggestions to improve maintainability and robustness.

Specifically, I've pointed out some opportunities to reduce duplication in the YAML configuration using anchors, noted a fragile dependency on a personal git branch in a new benchmark script, and suggested some minor consistency improvements in shell scripts. I've also left reminders for placeholders like TODOs and XXX for the PR link that should be addressed before merging.

functionstackx · 2026-01-06T03:43:09Z

        - "DECODE_NODES=8"
+
+gptoss-fp4-gb200-dynamo-trt:
+  image: jwillthomson/dynamo-trtllm-1.2.0rc2-min-tokens-fix-v2


plz use official images

Updated with official image with the latest commit.

cursor · 2026-01-06T08:10:58Z

+ntasks_per_node=4
+
+gen_nodes=$(((DECODE_TP + 3)/4 * DECODE_NUM_WORKERS))
+total_nodes=$((PREFILL_NUM_WORKERS + gen_nodes))


Node calculation formula over-allocates resources for multi-worker configs

The gen_nodes formula (DECODE_TP + 3)/4 * DECODE_NUM_WORKERS allocates one node per worker when TP < 4, but the YAML configuration expects workers to share nodes. For example, the "D:4xTP2" config has num-worker: 4, tp: 2, and DECODE_NODES=2 in additional-settings. The formula calculates gen_nodes=4 (one node per worker), but only 2 nodes are needed (8 GPUs total fits on 2 nodes). This causes the sbatch request to allocate 5 total nodes instead of the expected 3, wasting cluster resources.

Additional Locations (1)

.github/configs/nvidia-master.yaml#L1099-L1109
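
The arithmetic in this comment can be reproduced with a small sketch. The first function mirrors the formula quoted from the script (each worker rounded up to whole nodes); the second is one possible alternative that rounds up the total GPU count instead. Both function names are hypothetical, the 4 GPUs per node default is taken from `ntasks_per_node=4`, and the packed variant assumes workers with TP below the node size may share a node and that TP either divides or is a multiple of the node size. It is not necessarily the fix adopted in the PR.

```shell
# Formula quoted in the review: round each worker up to whole nodes, then multiply.
nodes_per_worker_rounding() {
    n_tp="$1"; n_workers="$2"; n_gpn="${3:-4}"
    echo $(( (n_tp + n_gpn - 1) / n_gpn * n_workers ))
}

# Alternative sketch: pack workers onto shared nodes by rounding up total GPUs.
nodes_packed() {
    n_tp="$1"; n_workers="$2"; n_gpn="${3:-4}"
    echo $(( (n_tp * n_workers + n_gpn - 1) / n_gpn ))
}

# "D:4xTP2": 4 decode workers at TP=2 on 4-GPU nodes.
nodes_per_worker_rounding 2 4   # prints 4
nodes_packed 2 4                # prints 2
```

With one prefill node, the first result gives the 5-node request described above and the second gives the expected 3.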

cquil11 · 2026-01-06T16:03:51Z

+    - gptoss-fp4-b200-trt
+  description:
+    - Explicitly add EP=TP for DP attention configs. Multinode Refactor inadvertently changed default EP=1
+    - Add GPTOSS DISAGG configurations for GB200 and B200


not disagg for B200 right?

also pls specify "GPTOSS DISAGG for 1k1k and 8k1k"

trying to make these slightly more detailed since they are now displayed on inferencemax dot ai

Thanks for the catch. Removed B200 from DISAGG comment in latest commit.

cquil11 · 2026-01-06T16:05:14Z

@jgangani left some comments
may we close https://github.com/InferenceMAX/InferenceMAX/pull/232 ?

cquil11

lgtm once comments are addressed!

jgangani · 2026-01-06T20:43:20Z

> @jgangani left some comments may we close #232 ?

closed.

jgangani · 2026-01-07T03:29:57Z

@cquil11 can you please merge this?

Jatin Gangani added 2 commits January 5, 2026 14:11

1. Update B200 AGG configs to EP. These were mistakenly set to EP=1 with the master file refactor MR
2. Add GB200 DISAGG configs

0feefdb

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

- Add GPTOSS DISAGG configurations
- Explicitly assign EP=TP for DP attention AGG candidates. EP was defaulted=1 during multinode refactor

7aae8cd

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

jgangani requested a review from a team as a code owner January 6, 2026 02:08

github-project-automation Bot added this to InferenceMAX Board Jan 6, 2026

jgangani changed the title ~~Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG~~ Draft: Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026

gemini-code-assist Bot reviewed Jan 6, 2026

- Update PR # in perfchangelog
- Addressed few gemini code review comments

4752c9a

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

jgangani requested review from cquil11 and functionstackx January 6, 2026 02:17

cursor Bot reviewed Jan 6, 2026

Comment thread runners/launch_gb200-nv.sh

cquil11 added the sweep-enabled label Jan 6, 2026

functionstackx reviewed Jan 6, 2026

jgangani force-pushed the jgangani_gptoss_gb200_b200_1k1k branch 2 times, most recently from b243a20 to b53f844 January 6, 2026 08:03

Update with dynamo release branch and release container

5c93796

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

jgangani force-pushed the jgangani_gptoss_gb200_b200_1k1k branch from b53f844 to 5c93796 January 6, 2026 08:03

cursor Bot reviewed Jan 6, 2026

cquil11 reviewed Jan 6, 2026

cquil11 approved these changes Jan 6, 2026

cquil11 moved this to In Progress in InferenceMAX Board Jan 6, 2026

Updated perfchangelog description based on PR feedback

e07fcc9

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

jgangani changed the title ~~Draft: Add GPTOSS DISAGG Configurations + Assign EP explicitly for AGG~~ GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026

cursor Bot reviewed Jan 6, 2026

Comment thread runners/launch_gb200-nv.sh Outdated

jgangani changed the title ~~GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG~~ Draft: GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 6, 2026

update postprocessing pattern matching

b011864

Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

jgangani changed the title ~~Draft: GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG~~ GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Jan 7, 2026

cquil11 merged commit 645b97a into main Jan 7, 2026
57 checks passed

cquil11 deleted the jgangani_gptoss_gb200_b200_1k1k branch January 7, 2026 15:36

github-project-automation Bot moved this from In Progress to Done in InferenceMAX Board Jan 7, 2026

cquil11 added the NVIDIA label Apr 8, 2026

cquil11 changed the title ~~GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG~~ [NVIDIA] GPTOSS GB200 DISAGG Configurations + Assign EP explicitly for AGG Apr 8, 2026
