Commit 67230af

Klaud-Cold, github-actions[bot], claude-fix-bot, functionstackx authored
Update glm5-fp8-b300-sglang and glm5-fp8-b300-sglang-mtp SGLang image to v0.5.12-cu130 (#1421)
Ref #1154

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
Co-authored-by: claude-fix-bot <claude-fix-bot@local>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
1 parent e6212e9 commit 67230af

4 files changed

Lines changed: 20 additions & 4 deletions

.github/configs/nvidia-master.yaml

Lines changed: 2 additions & 2 deletions
@@ -2250,7 +2250,7 @@ glm5-fp8-b200-sglang-agentic:
     - { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32, 64, 128] }
 
 glm5-fp8-b300-sglang:
-  image: lmsysorg/sglang:v0.5.11-cu130
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: zai-org/GLM-5-FP8
   model-prefix: glm5
   runner: b300
@@ -2269,7 +2269,7 @@ glm5-fp8-b300-sglang:
     - { tp: 8, ep: 1, conc-start: 4, conc-end: 256 }
 
 glm5-fp8-b300-sglang-mtp:
-  image: lmsysorg/sglang:v0.5.11-cu130
+  image: lmsysorg/sglang:v0.5.12-cu130
   model: zai-org/GLM-5-FP8
   model-prefix: glm5
   runner: b300
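
A quick way to confirm that both `glm5-fp8-b300-sglang` and `glm5-fp8-b300-sglang-mtp` picked up the new tag is to read the `image:` value back out of the config. The sketch below is not part of this commit: `image_for_key` is a hypothetical helper, and it assumes each entry is a top-level YAML key with `image:` nested directly beneath it, as the hunks for this file suggest. The demo runs against a throwaway sample file rather than the real `.github/configs/nvidia-master.yaml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the `image:` value found
# under one top-level key of a YAML file.
image_for_key() {
  local key="$1" file="$2"
  awk -v key="${key}:" '
    $0 == key                  { in_block = 1; next }  # found the requested key
    in_block && /^[^ #]/       { exit }                # next top-level key: stop
    in_block && $1 == "image:" { print $2; exit }      # first image line wins
  ' "$file"
}

# Demo on a temporary file that mirrors the two updated entries.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
glm5-fp8-b300-sglang:
  image: lmsysorg/sglang:v0.5.12-cu130
  model: zai-org/GLM-5-FP8

glm5-fp8-b300-sglang-mtp:
  image: lmsysorg/sglang:v0.5.12-cu130
  model: zai-org/GLM-5-FP8
EOF

for key in glm5-fp8-b300-sglang glm5-fp8-b300-sglang-mtp; do
  echo "$key -> $(image_for_key "$key" "$sample")"
done
```

The exact-match test on the key line keeps `glm5-fp8-b300-sglang` from also matching `glm5-fp8-b300-sglang-mtp`. A real YAML parser would be more robust if the file layout differs from this assumption.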

benchmarks/single_node/glm5_fp8_b300.sh

Lines changed: 5 additions & 1 deletion
@@ -25,7 +25,11 @@ if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
 pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
 
-export SGL_ENABLE_JIT_DEEPGEMM=1
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}
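
The script already uses the `${PORT:-8888}` default-value expansion for the port. The same pattern could make the workaround overridable, so that restoring `=1` after the upstream fix does not require editing the script. This is only a sketch of that idea and is not what the commit does; the commit hard-codes `0`.

```shell
#!/usr/bin/env bash
# Sketch only (not in the commit): default to the workaround value 0, but keep
# whatever the caller already exported, e.g. SGL_ENABLE_JIT_DEEPGEMM=1.
export SGL_ENABLE_JIT_DEEPGEMM="${SGL_ENABLE_JIT_DEEPGEMM:-0}"
echo "SGL_ENABLE_JIT_DEEPGEMM=${SGL_ENABLE_JIT_DEEPGEMM}"
```

With this form, running the script with `SGL_ENABLE_JIT_DEEPGEMM=1` in the environment would re-enable JIT DeepGemm for a single run, while an unset variable keeps the workaround active. The trade-off is that a stale value in the caller's environment could silently undo the workaround.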

benchmarks/single_node/glm5_fp8_b300_mtp.sh

Lines changed: 5 additions & 1 deletion
@@ -25,7 +25,11 @@ if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
 pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
 
-export SGL_ENABLE_JIT_DEEPGEMM=1
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 export SGLANG_ENABLE_SPEC_V2=1
 
 SERVER_LOG=/workspace/server.log

perf-changelog.yaml

Lines changed: 8 additions & 0 deletions
@@ -2747,3 +2747,11 @@
   description:
     - "Update SGLang image from v0.5.9-rocm700-mi30x to v0.5.12-rocm700-mi30x"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1425
+
+- config-keys:
+    - glm5-fp8-b300-sglang
+    - glm5-fp8-b300-sglang-mtp
+  description:
+    - "Update SGLang image from v0.5.11-cu130 to v0.5.12-cu130"
+    - "Disable JIT DeepGemm (SGL_ENABLE_JIT_DEEPGEMM=0) to bypass v0.5.12 DeepGemm TMA-descriptor regression on B300 — see sgl-project/sglang#25551"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1421
