Commit f4b05bc

Revert "Fix glm5-fp8-b300 DeepGemm regression" (#1536)
This reverts commit 12fb33e.
1 parent 12fb33e commit f4b05bc

3 files changed

Lines changed: 14 additions & 29 deletions

benchmarks/single_node/glm5_fp8_b300.sh

Lines changed: 7 additions & 11 deletions
@@ -23,17 +23,13 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
-pip install --break-system-packages --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
-
-# Testing @trevor-m's suggestion in sgl-project/sglang#25551 (comment 4481466979):
-# downgrade sgl-deep-gemm 0.1.0 → 0.0.1 inside the v0.5.12 container to check
-# whether the deepgemm version jump is what causes the B300 TMA-descriptor
-# CUDA_ERROR_ILLEGAL_ADDRESS regression. Re-enabling JIT DeepGemm so the
-# downgraded version actually runs.
-# --break-system-packages required: the container's Python is PEP-668 externally-managed,
-# so the previous attempt silently failed and left the bundled 0.1.0 in place.
-pip install --break-system-packages --no-deps "sgl-deep-gemm==0.0.1"
-export SGL_ENABLE_JIT_DEEPGEMM=1
+pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
+
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}
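
The reverted comment notes that an earlier `pip install` silently failed because the container's Python is PEP-668 externally managed, leaving the bundled package version in place. A minimal sketch, not part of this commit, of how a benchmark script could confirm a pin actually took effect. The helper names `check_pin` and `installed_version` are hypothetical, and the lookup assumes `python3` is on the PATH:

```shell
# Hypothetical helper (not in the commit): compare an installed version
# string against the expected pin and fail loudly on mismatch.
check_pin() {
  local pkg="$1" want="$2" got="$3"
  if [ "$got" != "$want" ]; then
    echo "ERROR: $pkg is '$got', expected '$want'" >&2
    return 1
  fi
  echo "OK: $pkg==$want"
}

# Hypothetical helper: look up the installed version through
# importlib.metadata. Prints nothing if the package or python3 is missing.
installed_version() {
  python3 -c 'import sys; from importlib.metadata import version; print(version(sys.argv[1]))' "$1" 2>/dev/null || true
}

# Example usage, commented out so the sketch has no side effects:
# check_pin transformers 5.2.0 "$(installed_version transformers)" || exit 1
```

Placing such a check right after the `pip install` line would turn a silent no-op install into an immediate, visible failure rather than a benchmark run against the wrong version.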

benchmarks/single_node/glm5_fp8_b300_mtp.sh

Lines changed: 7 additions & 11 deletions
@@ -23,17 +23,13 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
-pip install --break-system-packages --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
-
-# Testing @trevor-m's suggestion in sgl-project/sglang#25551 (comment 4481466979):
-# downgrade sgl-deep-gemm 0.1.0 → 0.0.1 inside the v0.5.12 container to check
-# whether the deepgemm version jump is what causes the B300 TMA-descriptor
-# CUDA_ERROR_ILLEGAL_ADDRESS regression. Re-enabling JIT DeepGemm so the
-# downgraded version actually runs.
-# --break-system-packages required: the container's Python is PEP-668 externally-managed,
-# so the previous attempt silently failed and left the bundled 0.1.0 in place.
-pip install --break-system-packages --no-deps "sgl-deep-gemm==0.0.1"
-export SGL_ENABLE_JIT_DEEPGEMM=1
+pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
+
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 export SGLANG_ENABLE_SPEC_V2=1
 
 SERVER_LOG=/workspace/server.log
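
The restored comment asks for the flag to go back to `=1` once the upstream regression is fixed. A minimal sketch, not part of this commit, of one way to make that switch possible without editing the script: default to the workaround value but honor a value already set by the caller. The function name `resolve_jit_deepgemm` is hypothetical.

```shell
# Hypothetical variant (not in the commit): print the caller's value of
# SGL_ENABLE_JIT_DEEPGEMM if set and non-empty, otherwise the workaround
# default of 0.
resolve_jit_deepgemm() {
  echo "${SGL_ENABLE_JIT_DEEPGEMM:-0}"
}

# Export the resolved value and log it so the server log records which
# kernel path the run used.
SGL_ENABLE_JIT_DEEPGEMM="$(resolve_jit_deepgemm)"
export SGL_ENABLE_JIT_DEEPGEMM
echo "SGL_ENABLE_JIT_DEEPGEMM=${SGL_ENABLE_JIT_DEEPGEMM}"
```

With this shape, a one-off retest against a fixed upstream build could be launched as `SGL_ENABLE_JIT_DEEPGEMM=1 ./glm5_fp8_b300_mtp.sh` while the default runs keep the workaround.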

perf-changelog.yaml

Lines changed: 0 additions & 7 deletions
@@ -3050,10 +3050,3 @@
   description:
     - "Update SGLang image from v0.5.11-cu130 (5d old) to v0.5.12-cu130"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1475
-
-- config-keys:
-    - glm5-fp8-b300-sglang
-    - glm5-fp8-b300-sglang-mtp
-  description:
-    - "Test @trevor-m's suggestion in sgl-project/sglang#25551: pin sgl-deep-gemm==0.0.1 inside v0.5.12 container to isolate whether the deep-gemm 0.0.1→0.1.0 upgrade is the source of the B300 CUDA_ERROR_ILLEGAL_ADDRESS regression."
-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1512

0 commit comments