@@ -23,17 +23,13 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
-pip install --break-system-packages --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
-
-# Testing @trevor-m's suggestion in sgl-project/sglang#25551 (comment 4481466979):
-# downgrade sgl-deep-gemm 0.1.0 → 0.0.1 inside the v0.5.12 container to check
-# whether the deepgemm version jump is what causes the B300 TMA-descriptor
-# CUDA_ERROR_ILLEGAL_ADDRESS regression. Re-enabling JIT DeepGemm so the
-# downgraded version actually runs.
-# --break-system-packages required: the container's Python is PEP-668 externally-managed,
-# so the previous attempt silently failed and left the bundled 0.1.0 in place.
-pip install --break-system-packages --no-deps "sgl-deep-gemm==0.0.1"
-export SGL_ENABLE_JIT_DEEPGEMM=1
+pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
+
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 
 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}
@@ -23,17 +23,13 @@ nvidia-smi
 
 if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
 
-pip install --break-system-packages --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
-
-# Testing @trevor-m's suggestion in sgl-project/sglang#25551 (comment 4481466979):
-# downgrade sgl-deep-gemm 0.1.0 → 0.0.1 inside the v0.5.12 container to check
-# whether the deepgemm version jump is what causes the B300 TMA-descriptor
-# CUDA_ERROR_ILLEGAL_ADDRESS regression. Re-enabling JIT DeepGemm so the
-# downgraded version actually runs.
-# --break-system-packages required: the container's Python is PEP-668 externally-managed,
-# so the previous attempt silently failed and left the bundled 0.1.0 in place.
-pip install --break-system-packages --no-deps "sgl-deep-gemm==0.0.1"
-export SGL_ENABLE_JIT_DEEPGEMM=1
+pip install --no-deps "transformers==5.2.0" "huggingface-hub==1.4.1"
+
+# Workaround for sgl-project/sglang#25551: v0.5.12 DeepGemm TMA-descriptor
+# regression on B300 (sm_120) crashes CUDA graph capture with
+# CUDA_ERROR_ILLEGAL_ADDRESS. Disabling JIT DeepGemm bypasses the affected
+# kernel path. Restore to =1 once the upstream regression is fixed.
+export SGL_ENABLE_JIT_DEEPGEMM=0
 export SGLANG_ENABLE_SPEC_V2=1
 
 SERVER_LOG=/workspace/server.log
@@ -3050,10 +3050,3 @@
   description:
     - "Update SGLang image from v0.5.11-cu130 (5d old) to v0.5.12-cu130"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1475
-
-- config-keys:
-    - glm5-fp8-b300-sglang
-    - glm5-fp8-b300-sglang-mtp
-  description:
-    - "Test @trevor-m's suggestion in sgl-project/sglang#25551: pin sgl-deep-gemm==0.0.1 inside v0.5.12 container to isolate whether the deep-gemm 0.0.1→0.1.0 upgrade is the source of the B300 CUDA_ERROR_ILLEGAL_ADDRESS regression."
-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1512
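
The removed comment notes that an earlier pin attempt "silently failed and left the bundled 0.1.0 in place". A minimal sketch of a guard against that kind of silent miss follows. It is not part of this PR: the helper names `installed_version` and `verify_pin` are hypothetical, and the sketch assumes `python3` is on the PATH inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the PR): fail loudly when a pinned package
# did not end up at the requested version.

# Print the installed version of a distribution, or nothing if it is absent.
installed_version() {
    python3 -c 'import sys
from importlib.metadata import version, PackageNotFoundError
try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    pass' "$1"
}

# Compare the expected version with the one actually present.
# Usage: verify_pin <name> <expected> [<actual>]
# The optional third argument overrides the lookup, mainly for testing.
verify_pin() {
    local name=$1 expected=$2
    local actual=${3-$(installed_version "$name")}
    if [[ "$actual" != "$expected" ]]; then
        echo "ERROR: $name is '${actual:-missing}', expected '$expected'" >&2
        return 1
    fi
}
```

Under these assumptions, a benchmark script could run `verify_pin transformers 5.2.0 || exit 1` right after the `pip install` line, so that a pin that did not take effect stops the run instead of benchmarking the bundled version.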