[NVIDIA] Enable GPTOSS GB200 DISAGG #232

Closed
jgangani wants to merge 15 commits into main from jgangani_gptoss_disagg

@jgangani
Collaborator

This MR enables disaggregation for GPTOSS on GB200.

Modified files to add GPTOSS to Disagg runners and workflow.

Successful tests here:
https://github.com/InferenceMAX/InferenceMAX/actions/runs/19353241086/job/55369372877

@functionstackx
Collaborator

thanks for this contribution @jgangani

Can you explain what this means? Are all of the datapoints just 4 GPUs for prefill only and then 4 GPUs for decode only? If not, can you explain the parallelism config and the concurrency for each datapoint?

./submit_disagg.sh mtp=off tp 1 1 1 512 20000 "0.9" 0 0 "128 256 512"
./submit_disagg.sh mtp=off tp 1 1 2 1024 20000 "0.9" 0 0 "64 128 256"
./submit_disagg.sh mtp=off tep 1 1 2 1024 20000 "0.9" 0 0 "64 256"
./submit_disagg.sh mtp=off tp 1 1 4 2048 20000 "0.9" 0 0 "8 16 32 64 128"
./submit_disagg.sh mtp=off tp 1 1 8 2048 20000 "0.9" 0 0 "1 2 4 8 16"

Contributor

Copilot AI left a comment

Pull Request Overview

This PR enables disaggregation support for the GPT-OSS 120B model on GB200 hardware. The changes add GPT-OSS as a supported model alongside the existing DeepSeek-R1 configurations, implementing model-specific benchmark configurations and updating workflows to handle the new model.

Key Changes:

  • Added GPT-OSS model detection and configuration in the GB200 launch script
  • Implemented GPT-OSS-specific benchmark configurations with 8k/1k input/output sequence lengths
  • Updated workflows to support GPT-OSS model selection and dynamic model prefix generation

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| runners/launch_gb200-nv.sh | Added GPT-OSS model path configuration, branch checkout logic, and model-specific benchmark parameters for disaggregation testing |
| .github/workflows/gb200-tests.yml | Added GPT-OSS to model options and implemented dynamic model prefix mapping for cleaner experiment naming |
| .github/workflows/full-sweep-8k1k-scheduler.yml | Added GPT-OSS configuration matrix entry and updated result collection dependencies |
| .github/workflows/benchmark-multinode-tmpl.yml | Updated filename parsing patterns to match new hyphenated format (gpus-N, ctx-N, gen-N) and added MODEL environment variable |
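
To make the hyphenated format mentioned for benchmark-multinode-tmpl.yml concrete, here is a minimal bash sketch that pulls the `gpus-N`, `ctx-N`, and `gen-N` values out of a name. Only those three tokens come from the overview above; the helper `extract_field` and the example filename are hypothetical and are not taken from the workflow itself.

```shell
#!/usr/bin/env bash
# Print the number that follows "<key>-" in a result filename.
# Hypothetical helper: the real workflow's parsing may differ.
extract_field() {
    local key="$1" name="$2"
    # Require a non-alphanumeric character (or start of string) before the key
    # so that, for example, "regen-3" is not mistaken for "gen-3".
    local re="(^|[^A-Za-z0-9])${key}-([0-9]+)"
    if [[ "$name" =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

# Hypothetical filename; only the gpus-N / ctx-N / gen-N tokens are from the PR.
name="gptoss_gpus-8_ctx-1_gen-4.json"
echo "gpus=$(extract_field gpus "$name") ctx=$(extract_field ctx "$name") gen=$(extract_field gen "$name")"
```

Keeping the regex in a variable is the usual way to use `[[ ... =~ ... ]]` portably across bash versions.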

if [[ $MODEL == *"gpt-oss"* ]]; then
# GPT-OSS specific benchmark configurations
if [ "$isl" = "8192" ] && [ "$osl" = "1024" ]; then

Copilot AI Nov 14, 2025

Unnecessary blank line with trailing whitespace. Remove this line or the trailing spaces.

Suggested change

Copilot uses AI. Check for mistakes.

  # Find all result subdirectories in this logs directory
- RESULT_SUBDIRS=$(find "$LOGS_DIR" -name "ctx*_gen*_[td]ep*_batch*_eplb*_mtp*" -type d)
+ RESULT_SUBDIRS=$(find "$LOGS_DIR" -name "ctx*_gen*_*_batch*_eplb*_mtp*" -type d)

Copilot AI Nov 14, 2025

The find pattern 'ctx*_gen*_*_batch*_eplb*_mtp*' uses a generic wildcard (*) in the middle which may match unintended directory names. Consider using a more specific pattern like 'ctx*_gen*_[td]ep*_batch*_eplb*_mtp*' or 'ctx*_gen*_{tp,tep,dep}_batch*_eplb*_mtp*' to match only valid parallelism strategies (tp/tep/dep).

Suggested change
- RESULT_SUBDIRS=$(find "$LOGS_DIR" -name "ctx*_gen*_*_batch*_eplb*_mtp*" -type d)
+ RESULT_SUBDIRS=$(find "$LOGS_DIR" -name "ctx*_gen*_{tp,tep,dep}_batch*_eplb*_mtp*" -type d)
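
A note on the brace form suggested here: `find -name` matches shell-style glob patterns (`*`, `?`, `[...]`), and the shell does not perform brace expansion inside double quotes, so `{tp,tep,dep}` would be compared literally. A minimal sketch of one way to restrict the match to the three strategies is to OR several `-name` tests. The function name `find_result_subdirs` and the assumption that the strategy tag may carry a numeric suffix (for example `tep2`) are hypothetical.

```shell
#!/usr/bin/env bash
# List result directories whose parallelism tag starts with tp, tep, or dep.
# Assumption: names look like ctx1_gen1_tep2_batch64_eplb0_mtp0 (hypothetical).
find_result_subdirs() {
    local logs_dir="$1"
    find "$logs_dir" -type d \( \
        -name "ctx*_gen*_tp*_batch*_eplb*_mtp*" -o \
        -name "ctx*_gen*_[td]ep*_batch*_eplb*_mtp*" \
    \) | sort
}

# Usage: falls back to the current directory when LOGS_DIR is unset.
RESULT_SUBDIRS=$(find_result_subdirs "${LOGS_DIR:-.}")
```

The second `-name` test is the pre-PR pattern for tep/dep; the first adds plain tp.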

"framework": "dynamo-sglang",
"mtp": "off",
}
# GPTOSS

Copilot AI Nov 14, 2025

[nitpick] Inconsistent comment formatting. The DeepSeek comment on line 93 uses '# DSR1' while this uses '# GPTOSS' with different indentation. Align the comment indentation with line 93 for consistency.

Suggested change
# GPTOSS
# GPTOSS

@functionstackx
Collaborator

also @jgangani please merge this into the main branch/release candidate instead of using a side branch: ai-dynamo/dynamo@release/0.5.1-rc0.20251105...jthomson04/gpt-oss-disagg-slurm

@jgangani
Collaborator Author

jgangani commented Nov 14, 2025

> thanks for this contribution @jgangani
>
> Can you explain what this means? Are all of the datapoints just 4 GPUs for prefill only and then 4 GPUs for decode only? If not, can you explain the parallelism config and the concurrency for each datapoint?
>
> ./submit_disagg.sh mtp=off tp 1 1 1 512 20000 "0.9" 0 0 "128 256 512"
> ./submit_disagg.sh mtp=off tp 1 1 2 1024 20000 "0.9" 0 0 "64 128 256"
> ./submit_disagg.sh mtp=off tep 1 1 2 1024 20000 "0.9" 0 0 "64 256"
> ./submit_disagg.sh mtp=off tp 1 1 4 2048 20000 "0.9" 0 0 "8 16 32 64 128"
> ./submit_disagg.sh mtp=off tp 1 1 8 2048 20000 "0.9" 0 0 "1 2 4 8 16"

The argument order is as follows:
<gen_server_config> <ctx_num> <gen_num_servers> <gen_tp_size> <gen_bs> <gen_max_num_tokens>
1 GPU for prefill; 2/4/8 GPUs for decode.
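
As a rough illustration of that ordering, the sketch below labels the positional arguments of one call. Only the six bracketed names come from the reply above. The function name `describe_disagg_args` is hypothetical, and treating the first argument as the MTP flag and the last argument as the concurrency list is an assumption; the `"0.9" 0 0` arguments are not explained in this thread and are left unlabeled. This is not the actual `submit_disagg.sh` interface.

```shell
#!/usr/bin/env bash
# Label the positional arguments of one submit_disagg.sh invocation.
# Positions 2-7 use the names given in the reply; everything else is a guess.
describe_disagg_args() {
    local mtp_flag="$1"            # assumption: first argument is the MTP flag
    local gen_server_config="$2" ctx_num="$3" gen_num_servers="$4"
    local gen_tp_size="$5" gen_bs="$6" gen_max_num_tokens="$7"
    # Assumption: the last argument is the list of concurrencies to sweep.
    local concurrencies="${!#}"
    printf '%s gen_server_config=%s ctx_num=%s gen_num_servers=%s ' \
        "$mtp_flag" "$gen_server_config" "$ctx_num" "$gen_num_servers"
    printf 'gen_tp_size=%s gen_bs=%s gen_max_num_tokens=%s conc=[%s]\n' \
        "$gen_tp_size" "$gen_bs" "$gen_max_num_tokens" "$concurrencies"
}

# One of the calls quoted in this thread.
describe_disagg_args mtp=off tp 1 1 4 2048 20000 "0.9" 0 0 "8 16 32 64 128"
```

Read this way, each call in the list uses one context (prefill) server and one generation (decode) server, with the decode tensor-parallel size varying per line.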

@jgangani
Collaborator Author

> also @jgangani please merge this into the main branch/release candidate instead of using a side branch: ai-dynamo/dynamo@release/0.5.1-rc0.20251105...jthomson04/gpt-oss-disagg-slurm

Yes, that was the goal. I wanted to test the MR before merging this into the release branch. Will update.

@functionstackx
Collaborator

@jgangani thanks! Can u please enable 1k/8k and 1k/1k on gptoss gb200 in this PR too? Thanks!

@jgangani
Collaborator Author

jgangani commented Nov 15, 2025

@functionstackx Switched to dynamo release branch.

@jgangani
Collaborator Author

jgangani commented Nov 16, 2025

> @jgangani thanks! Can u please enable 1k/8k and 1k/1k on gptoss gb200 in this PR too? Thanks!

I am working on 1k1k DISAGG Pareto configs next. 1k8k DISAGG will probably be on par with AGG since it is predominantly decode. Hence, I recommend we merge this MR first. Does that make sense?

@functionstackx
Collaborator

functionstackx commented Nov 16, 2025

if u can submit gb200 agg for 1k/8k in this PR too

@cquil11
Collaborator

cquil11 commented Dec 3, 2025

we're gonna hold off on this til #251 gets merged this week

@cquil11
Collaborator

cquil11 commented Dec 7, 2025

@jgangani so sorry brother but can you please rebase with main following the convention set forth in https://github.com/InferenceMAX/InferenceMAX/pull/251 ?

@jgangani
Collaborator Author

jgangani commented Dec 7, 2025

Yes, I am working on it. Will open another MR based off post-251 merge.

@cquil11
Collaborator

cquil11 commented Dec 17, 2025

@jgangani hi! where are we on this?

@jgangani
Collaborator Author

> @jgangani hi! where are we on this?

GB200 DISAGG for 8k1k is ready with refactored code. I can create an MR right away if need be. Still working through 1k1k config exploration. I will need a few more days for 1k1k.

@cquil11
Collaborator

cquil11 commented Dec 18, 2025

@jgangani ok, no worries

@functionstackx
Collaborator

happy new year @jgangani , what is the eta on this?

@jgangani
Collaborator Author

jgangani commented Jan 5, 2026

> happy new year @jgangani , what is the eta on this?

Happy new year! @cquil11 I have the branch ready for 1k1k, however, GB200 runners are not picking up the jobs, they get stuck at "Cleaning up resources" for hours and then get canceled. Can you take a quick look to see if you have a fix? https://github.com/InferenceMAX/InferenceMAX/actions/runs/20614644592/job/59388613154

@jgangani
Collaborator Author

jgangani commented Jan 5, 2026

> happy new year @jgangani , what is the eta on this?
>
> Happy new year! @cquil11 I have the branch ready for 1k1k, however, GB200 runners are not picking up the jobs, they get stuck at "Cleaning up resources" for hours and then get canceled. Can you take a quick look to see if you have a fix? https://github.com/InferenceMAX/InferenceMAX/actions/runs/20614644592/job/59388613154

nvm. Rebase seems to have fixed it. There was an MR to remove sudo from benchmark_multinode yaml.

@jgangani jgangani closed this Jan 6, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to Done in InferenceMAX Board Jan 6, 2026
@functionstackx functionstackx deleted the jgangani_gptoss_disagg branch January 11, 2026 19:50
@cquil11 cquil11 changed the title Enable GPTOSS GB200 DISAGG [NVIDIA] Enable GPTOSS GB200 DISAGG Apr 8, 2026
4 participants