Commit 5481fbf

Revert "Update glm5-fp4-b300-sglang and -mtp SGLang image to v0.5.12-cu130 (#…" (#1563)
This reverts commit 8862360.
1 parent 8862360 commit 5481fbf

4 files changed

Lines changed: 2 additions & 22 deletions

.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -2352,7 +2352,7 @@ glm5-fp4-b200-sglang-mtp:
 # does not have a B300-specific recipe, so this config reuses the existing
 # GLM-5 FP4 B200 SGLang recipe as-is until B300-specific tuning is available.
 glm5-fp4-b300-sglang:
-image: lmsysorg/sglang:v0.5.12-cu130
+image: lmsysorg/sglang:v0.5.11-cu130
 model: nvidia/GLM-5-NVFP4
 model-prefix: glm5
 runner: b300
@@ -2373,7 +2373,7 @@ glm5-fp4-b300-sglang:
 - { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }
 
 glm5-fp4-b300-sglang-mtp:
-image: lmsysorg/sglang:v0.5.12-cu130
+image: lmsysorg/sglang:v0.5.11-cu130
 model: nvidia/GLM-5-NVFP4
 model-prefix: glm5
 runner: b300

benchmarks/single_node/glm5_fp4_b300.sh

Lines changed: 0 additions & 7 deletions
@@ -24,13 +24,6 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
-# Downgrade flashinfer to the version pinned in sglang v0.5.11 to test the
-# trtllm batched-GEMM regression suspicion from sgl-project/sglang#25563
-# (suggested by @trevor-m). sglang v0.5.12's pyproject.toml moved from
-# flashinfer_python==0.6.8.post1 → 0.6.11.post1, and the trtllm GEMM crash
-# at bs=128 + EAGLE on B300 appeared in the same image bump.
-pip install --no-deps "flashinfer_python==0.6.8.post1" "flashinfer_cubin==0.6.8.post1"
-
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}

benchmarks/single_node/glm5_fp4_b300_mtp.sh

Lines changed: 0 additions & 6 deletions
@@ -24,12 +24,6 @@ nvidia-smi
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
 pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
-# Downgrade flashinfer to the version pinned in sglang v0.5.11 to test the
-# trtllm batched-GEMM regression suspicion from sgl-project/sglang#25563
-# (suggested by @trevor-m). sglang v0.5.12's pyproject.toml moved from
-# flashinfer_python==0.6.8.post1 → 0.6.11.post1, and the trtllm GEMM crash
-# at bs=128 + EAGLE on B300 appeared in the same image bump.
-pip install --no-deps "flashinfer_python==0.6.8.post1" "flashinfer_cubin==0.6.8.post1"
 
 export SGL_ENABLE_JIT_DEEPGEMM=1
 export SGLANG_ENABLE_SPEC_V2=1

perf-changelog.yaml

Lines changed: 0 additions & 7 deletions
@@ -3043,13 +3043,6 @@
 - "Update SGLang image from nightly-dev-cu13-20260518-c67b2870 to nightly-dev-cu13-20260519-dbac4647"
 pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1492
 
-- config-keys:
-- glm5-fp4-b300-sglang
-- glm5-fp4-b300-sglang-mtp
-description:
-- "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
-pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1420
-
 - config-keys:
 - dsr1-fp4-b200-sglang-mtp
 description:
