Commit 54071ce

Fridge003, claude, and yhyang201 authored
[GB300][SGLang] Enable W4A4 megamoe and bump SGLang image for dsv4-fp4-gb300-dynamo-sglang (#1382)
* [GB300][SGLang] Enable W4A4 megamoe and bump SGLang image for dsv4-fp4-gb300-dynamo-sglang

- Append SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS=1 and SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND=1 wherever SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE is set in the gb300 non-mtp recipes.
- Update SGLang container image from lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev to lmsysorg/sglang:nightly-dev-cu13-20260514-f7efff32.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update perf-changelog.yaml

* Update perf-changelog.yaml

* Add custom_tokenizer to dsv4 non-MTP recipes for nightly image compatibility

The new nightly image's transformers does not recognize deepseek_v4 model type, causing benchmark_serving.py to crash on tokenizer loading.

* Update perf-changelog.yaml

* Enable W4A4 megamoe FP4-acts/MXF4-kind opts on GB300 disagg recipes

Adds SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS=1 and SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND=1 to the prefill/decode env blocks of the 5 GB300 disagg recipes that run with moe-a2a-backend: megamoe.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
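The rule in the first bullet of the commit message ("append the two flags wherever SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE is set") can be sketched in Python as follows. This is an illustration only, not tooling from this repository: the helper name `enable_w4a4`, the constant names, and the reading of "set" as "the key is present in the env block" are all assumptions.

```python
# Hypothetical helper: illustrates the rule from the commit message.
# Nothing here is part of SGLang or of this repository's tooling.
MEGA_MOE_SWITCH = "SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE"
W4A4_FLAGS = {
    "SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS": "1",
    "SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND": "1",
}


def enable_w4a4(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with the W4A4 flags added when megamoe is set.

    Assumption: "set" means the switch key is present in the env block.
    """
    updated = dict(env)  # leave the caller's mapping untouched
    if MEGA_MOE_SWITCH in updated:
        for name, value in W4A4_FLAGS.items():
            # setdefault keeps any value a recipe has already chosen
            updated.setdefault(name, value)
    return updated
```

Using `setdefault` rather than plain assignment is a design choice of this sketch: a recipe that has already pinned one of the flags keeps its own value.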
1 parent 542a246 commit 54071ce

6 files changed

Lines changed: 26 additions & 10 deletions

benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-10p1d-dep4-dep16-14-c8192.yaml

Lines changed: 4 additions & 2 deletions
@@ -81,8 +81,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "8192"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,8 +105,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "1280"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-12p1d-dep4-dep12-15-c21504.yaml

Lines changed: 4 additions & 2 deletions
@@ -81,8 +81,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "8192"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,8 +105,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "1280"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-1p1d-dep4-dep16-5-c1024.yaml

Lines changed: 4 additions & 2 deletions
@@ -81,8 +81,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "8192"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,8 +105,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "1280"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-4p1d-dep4-dep16-8-c1024.yaml

Lines changed: 4 additions & 2 deletions
@@ -81,8 +81,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "8192"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,8 +105,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "1280"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-8p1d-dep4-dep16-12-c4096.yaml

Lines changed: 4 additions & 2 deletions
@@ -81,8 +81,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "8192"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,8 +105,9 @@ backend:
 SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
 SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "1280"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_FP4_ACTS: "1"
+SGLANG_OPT_DEEPGEMM_MEGA_MOE_USE_MXF4_KIND: "1"
 SGLANG_OPT_USE_ONLINE_COMPRESS: "1"
-SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"
 NCCL_MNNVL_ENABLE: "1"
 NCCL_CUMEM_ENABLE: "1"
 SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

perf-changelog.yaml

Lines changed: 6 additions & 0 deletions
@@ -2481,6 +2481,12 @@
     - "Turn to tp=4 for best perf"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1375
 
+- config-keys:
+    - dsv4-fp4-gb300-dynamo-sglang
+  description:
+    - "Enable W4A4 (MXFP4) megamoe by appending w4a4 related environ flags when megamoe is enabled"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1382
+
 - config-keys:
     - dsr1-fp8-b200-sglang-mtp
   description:
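The changelog entries visible in this hunk share one shape: a `config-keys` list, a `description` list, and a `pr-link` URL. A minimal Python sketch of a check for that shape is given below. The function name `check_entry` and the rules it applies are assumptions inferred only from the entries shown here; the repository may validate perf-changelog.yaml differently, or not at all.

```python
# Hypothetical check: the rules are inferred from the entries in this hunk only.
REQUIRED_KEYS = ("config-keys", "description", "pr-link")


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one changelog entry (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    for key in ("config-keys", "description"):
        value = entry.get(key)
        # Both fields appear as non-empty YAML lists in the entries shown.
        if key in entry and not (isinstance(value, list) and value):
            problems.append(f"{key} should be a non-empty list")
    if "pr-link" in entry and not str(entry["pr-link"]).startswith("https://"):
        problems.append("pr-link should be an https URL")
    return problems


# The entry added by this commit, written as the dict a YAML loader would produce.
entry = {
    "config-keys": ["dsv4-fp4-gb300-dynamo-sglang"],
    "description": [
        "Enable W4A4 (MXFP4) megamoe by appending w4a4 related environ flags when megamoe is enabled"
    ],
    "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/1382",
}
```

Calling `check_entry(entry)` on the entry above returns an empty list, since all three keys are present with the expected types.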