
Commit 8862360

Update glm5-fp4-b300-sglang and -mtp SGLang image to v0.5.12-cu130 (#1420)
Concurrency 128 is breaking, but the Pareto frontier works.
1 parent d4948f9 commit 8862360

4 files changed

Lines changed: 22 additions & 2 deletions


.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -2352,7 +2352,7 @@ glm5-fp4-b200-sglang-mtp:
 # does not have a B300-specific recipe, so this config reuses the existing
 # GLM-5 FP4 B200 SGLang recipe as-is until B300-specific tuning is available.
 glm5-fp4-b300-sglang:
-image: lmsysorg/sglang:v0.5.11-cu130
+image: lmsysorg/sglang:v0.5.12-cu130
 model: nvidia/GLM-5-NVFP4
 model-prefix: glm5
 runner: b300
@@ -2373,7 +2373,7 @@ glm5-fp4-b300-sglang:
 - { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }
 
 glm5-fp4-b300-sglang-mtp:
-image: lmsysorg/sglang:v0.5.11-cu130
+image: lmsysorg/sglang:v0.5.12-cu130
 model: nvidia/GLM-5-NVFP4
 model-prefix: glm5
 runner: b300
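
Both GLM-5 B300 keys need to end up on the same image tag. The sketch below checks that, and it rests on two assumptions. First, each config key starts at column 0 of `nvidia-master.yaml`. Second, the key's `image:` field is indented directly beneath it. `image_for_key` is a hypothetical helper, and the awk script only matches lines; it does not parse YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the container image declared
# under a top-level config key. This is line-based awk matching, not a YAML parser.
# Assumes top-level keys start at column 0 and their fields are indented.
image_for_key() {
  local file="$1" key="$2"
  awk -v k="${key}:" '
    $1 == k { in_block = 1; next }            # found the requested top-level key
    in_block && /^[^ \t#]/ { exit }           # reached the next top-level key, stop
    in_block && $1 == "image:" { print $2; exit }
  ' "$file"
}

# Usage (from the repo root):
# for key in glm5-fp4-b300-sglang glm5-fp4-b300-sglang-mtp; do
#   echo "$key -> $(image_for_key .github/configs/nvidia-master.yaml "$key")"
# done
```

An empty result means the key was not found, or it has no `image:` field before the next top-level key.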

benchmarks/single_node/glm5_fp4_b300.sh

Lines changed: 7 additions & 0 deletions
@@ -24,6 +24,13 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
+# Downgrade flashinfer to the version pinned in sglang v0.5.11 to test the
+# trtllm batched-GEMM regression suspicion from sgl-project/sglang#25563
+# (suggested by @trevor-m). sglang v0.5.12's pyproject.toml moved from
+# flashinfer_python==0.6.8.post1 → 0.6.11.post1, and the trtllm GEMM crash
+# at bs=128 + EAGLE on B300 appeared in the same image bump.
+pip install --no-deps "flashinfer_python==0.6.8.post1" "flashinfer_cubin==0.6.8.post1"
+
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}
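
This change exists to test whether the flashinfer version is behind the crash. It therefore helps to confirm which version is actually installed before the server starts. A minimal sketch, assuming `python3` is on the container's PATH. `PINNED_FLASHINFER` and `check_flashinfer_pin` are hypothetical names and are not in the script.

```shell
#!/usr/bin/env bash
# Hypothetical names: PINNED_FLASHINFER and check_flashinfer_pin are illustrative only.
PINNED_FLASHINFER="0.6.8.post1"   # version the comment attributes to sglang v0.5.11

# Log and succeed if the installed version matches the pin; otherwise report on stderr and fail.
check_flashinfer_pin() {
  local installed="$1"
  if [[ "$installed" == "$PINNED_FLASHINFER" ]]; then
    echo "flashinfer pin OK: $installed"
    return 0
  fi
  echo "flashinfer mismatch: got '$installed', expected '$PINNED_FLASHINFER'" >&2
  return 1
}

# Usage, right after the pip install line (assumes python3 is on PATH):
# installed=$(python3 -c 'import importlib.metadata as m; print(m.version("flashinfer_python"))')
# check_flashinfer_pin "$installed" || exit 1
```

Failing fast here keeps a silently ignored downgrade from producing benchmark numbers that look like a clean A/B result but are not.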

benchmarks/single_node/glm5_fp4_b300_mtp.sh

Lines changed: 6 additions & 0 deletions
@@ -24,6 +24,12 @@ nvidia-smi
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
 pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
+# Downgrade flashinfer to the version pinned in sglang v0.5.11 to test the
+# trtllm batched-GEMM regression suspicion from sgl-project/sglang#25563
+# (suggested by @trevor-m). sglang v0.5.12's pyproject.toml moved from
+# flashinfer_python==0.6.8.post1 → 0.6.11.post1, and the trtllm GEMM crash
+# at bs=128 + EAGLE on B300 appeared in the same image bump.
+pip install --no-deps "flashinfer_python==0.6.8.post1" "flashinfer_cubin==0.6.8.post1"
 
 export SGL_ENABLE_JIT_DEEPGEMM=1
 export SGLANG_ENABLE_SPEC_V2=1
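
Both scripts hard-code the same pin. Because the pin is an experiment, one option is to make it switchable, so that one script can run with and without the downgrade for an A/B comparison. A minimal sketch: `FLASHINFER_PIN`, its `image` sentinel, and `flashinfer_pin_args` are hypothetical and are not used by the current scripts.

```shell
#!/usr/bin/env bash
# Hypothetical: print the pip requirement specs for a flashinfer pin, one per line.
# An empty or missing argument falls back to 0.6.8.post1; the sentinel "image"
# prints nothing, meaning "keep whatever flashinfer ships in the SGLang image".
flashinfer_pin_args() {
  local pin="${1:-0.6.8.post1}"
  if [[ "$pin" == "image" ]]; then
    return 0
  fi
  printf '%s\n' "flashinfer_python==${pin}" "flashinfer_cubin==${pin}"
}

# Usage in the benchmark script (bash 4+ for mapfile):
# mapfile -t pkgs < <(flashinfer_pin_args "${FLASHINFER_PIN:-}")
# if (( ${#pkgs[@]} > 0 )); then pip install --no-deps "${pkgs[@]}"; fi
```

Leaving `FLASHINFER_PIN` unset keeps the behavior introduced by this commit. Setting it to `image` runs with the flashinfer bundled in the v0.5.12-cu130 image.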

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -3043,6 +3043,13 @@
 - "Update SGLang image from nightly-dev-cu13-20260518-c67b2870 to nightly-dev-cu13-20260519-dbac4647"
 pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1492
 
+- config-keys:
+- glm5-fp4-b300-sglang
+- glm5-fp4-b300-sglang-mtp
+description:
+- "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
+pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1420
+
 - config-keys:
 - dsr1-fp4-b200-sglang-mtp
 description:
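
In this repo, config changes appear to be recorded in `perf-changelog.yaml` as entries that list the affected config keys. The sketch below checks that both bumped keys appear as list items, assuming one key per `- key` line. `changelog_lists_key` is a hypothetical helper, and it only checks that a key is listed somewhere, not which entry lists it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if KEY appears as a bare list item ("- KEY") in FILE.
# Exact string comparison, so glm5-fp4-b300-sglang does not match the -mtp key.
changelog_lists_key() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == "-" && $2 == k && NF == 2 { found = 1 } END { exit !found }' "$file"
}

# Usage (from the repo root):
# for key in glm5-fp4-b300-sglang glm5-fp4-b300-sglang-mtp; do
#   changelog_lists_key perf-changelog.yaml "$key" || echo "missing changelog entry: $key"
# done
```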
