Commit 21a4ab0

Update qwen3.5-fp8-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 (#1451)
1 parent 5c9f96a commit 21a4ab0

4 files changed

Lines changed: 11 additions & 2 deletions

.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -2416,7 +2416,7 @@ qwen3.5-fp8-b200-sglang-mtp:
 
 
 qwen3.5-fp8-b300-sglang-mtp:
-  image: lmsysorg/sglang:v0.5.11-cu130
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: Qwen/Qwen3.5-397B-A17B-FP8
   model-prefix: qwen3.5
   runner: b300
@@ -2435,7 +2435,7 @@ qwen3.5-fp8-b300-sglang-mtp:
     - { tp: 4, ep: 1, conc-start: 4, conc-end: 256, spec-decoding: mtp }
 
 qwen3.5-fp8-b300-sglang:
-  image: lmsysorg/sglang:v0.5.10.post1-cu130
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: Qwen/Qwen3.5-397B-A17B-FP8
   model-prefix: qwen3.5
   runner: b300

benchmarks/single_node/qwen3.5_fp8_b300.sh

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path=$MODEL --host=0.
 --kv-cache-dtype fp8_e4m3 \
 --mamba-ssm-dtype bfloat16 \
 --attention-backend trtllm_mha \
+--mm-attention-backend triton_attn \
 --moe-runner-backend flashinfer_trtllm \
 --cuda-graph-max-bs $CONC \
 --max-running-requests $CONC \

benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --mod
 --kv-cache-dtype fp8_e4m3 \
 --mamba-ssm-dtype bfloat16 \
 --attention-backend trtllm_mha \
+--mm-attention-backend triton_attn \
 --moe-runner-backend flashinfer_trtllm \
 --cuda-graph-max-bs $CONC \
 --max-running-requests $CONC \

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -3068,3 +3068,10 @@
   description:
     - "Bump image to rocm/sgl-dev:rocm720-mi35x-8c3b5aa-20260521-DSv4"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1548
+
+- config-keys:
+    - qwen3.5-fp8-b300-sglang
+    - qwen3.5-fp8-b300-sglang-mtp
+  description:
+    - "Update SGLang image from v0.5.10.post1-cu130 / v0.5.11-cu130 (30d old) to v0.5.12-cu130"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1451
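
The changelog entry above describes moving two configs from different starting tags to one common tag. As a rough illustration only, the sketch below shows one way such a bump could be sanity-checked: it parses tags of the form seen in this commit (vMAJOR.MINOR.PATCH, an optional .postN, and a -cuNNN suffix) and confirms that the new tag is newer within the same repository and CUDA variant. The names parse_tag and is_upgrade are hypothetical and are not part of the repository or of SGLang, and the tag grammar is an assumption drawn only from the three tags shown here.

```python
import re

# Hypothetical helper, not part of the repository. The pattern covers only
# the tag shapes that appear in this commit, e.g. "v0.5.10.post1-cu130".
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?-(cu\d+)$")


def parse_tag(image):
    """Split 'repo:tag' into (repo, sortable version tuple, CUDA suffix)."""
    repo, _, tag = image.partition(":")
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    major, minor, patch, post, cuda = match.groups()
    # A missing ".postN" is treated as post-release 0 so tuples compare cleanly.
    return repo, (int(major), int(minor), int(patch), int(post or 0)), cuda


def is_upgrade(old, new):
    """True when new is a strictly newer tag of the same repo and CUDA variant."""
    old_repo, old_ver, old_cuda = parse_tag(old)
    new_repo, new_ver, new_cuda = parse_tag(new)
    return old_repo == new_repo and old_cuda == new_cuda and new_ver > old_ver


if __name__ == "__main__":
    target = "lmsysorg/sglang:v0.5.12-cu130"
    for previous in ("lmsysorg/sglang:v0.5.10.post1-cu130",
                     "lmsysorg/sglang:v0.5.11-cu130"):
        print(previous, "->", target, is_upgrade(previous, target))
```

Comparing integer tuples rather than raw strings matters here because a plain string comparison would rank v0.5.9 above v0.5.12.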