
Commit 48c1840

GLM5.1-FP4-MI355X-SGLang: bump Image to v0.5.12.post1-rocm720-mi35x-20260529 (#1593)
* glm5.1-fp4-mi355x-sglang: bump SGLang ROCm image to v0.5.12.post1-20260529

  Fixes the GSM8K accuracy regression reported in sgl-project/sglang#25742 (v0.5.12-20260517 dropped to ~0.32 at TP=2). Local eval-only runs with the new image recover to gsm8k strict-match 0.975 at TP=2/conc=64 and 0.974 at TP=4/conc=16.

* Update Perf-Changelog
1 parent e0cd8f7 commit 48c1840

2 files changed

Lines changed: 9 additions & 1 deletion


.github/configs/amd-master.yaml

Lines changed: 1 addition & 1 deletion
@@ -684,7 +684,7 @@ glm5-fp8-mi355x-atom:
     - { tp: 8, conc-start: 4, conc-end: 256 }
 
 glm5.1-fp4-mi355x-sglang:
-  image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415
+  image: lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260529
   model: amd/GLM-5.1-MXFP4
   model-prefix: glm5.1
   runner: mi355x

perf-changelog.yaml

Lines changed: 8 additions & 0 deletions
@@ -3220,3 +3220,11 @@
   description:
     - "Update GB300 FP4 GLM-5 8k1k low-latency sweep to mirror NVIDIA/srt-slurm#175: add a 5th 1p17d topology (decode_nodes/workers=17), and lower decode max-running-requests / cuda-graph-max-bs / benchmark concurrency per-zip-index from a flat 4096/1024 to 128/64/32/16/1 (mrr & cuda-graph) and 128/64/32/16/12 (concurrency)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1583
+
+- config-keys:
+    - glm5.1-fp4-mi355x-sglang
+  description:
+    - "Bump SGLang ROCm image from v0.5.10rc0-rocm720-mi35x-20260415 to v0.5.12.post1-rocm720-mi35x-20260529"
+    - "Picks up the fix for the GSM8K accuracy regression reported in sgl-project/sglang#25742 (v0.5.12-20260517 collapsed to ~0.32 at TP=2)"
+    - "Local eval-only runs on MI355X recover to gsm8k strict-match 0.975 at TP=2/conc=64 and 0.974 at TP=4/conc=16, well above the 0.92 upstream gate added in sgl-project/sglang#26396"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1593
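
For reference, the accuracy claim in the changelog entry can be sanity-checked with a few lines of Python. This is only a sketch: the scores and the 0.92 gate are copied from the entry above, while the names `GSM8K_GATE`, `reported_scores`, `passes_gate`, and `failing_configs` are hypothetical and do not come from this repository or from SGLang. It also assumes the gate is inclusive (a score equal to the gate passes), which the source does not state.

```python
# Hypothetical gate check; none of these names exist in the repo or in SGLang.
GSM8K_GATE = 0.92  # upstream gate cited in the changelog entry

# (tp, concurrency) -> gsm8k strict-match score, as reported in the commit message
reported_scores = {
    (2, 64): 0.975,
    (4, 16): 0.974,
}


def passes_gate(score: float, gate: float = GSM8K_GATE) -> bool:
    """Return True when the score meets the gate (assumed inclusive)."""
    return score >= gate


def failing_configs(scores: dict, gate: float = GSM8K_GATE) -> list:
    """Return the sorted (tp, concurrency) pairs whose score is below the gate."""
    return sorted(cfg for cfg, score in scores.items() if not passes_gate(score, gate))


if __name__ == "__main__":
    for (tp, conc), score in sorted(reported_scores.items()):
        status = "PASS" if passes_gate(score) else "FAIL"
        print(f"TP={tp} conc={conc}: {score:.3f} {status}")
```

Under the same assumption, the regressed score of roughly 0.32 reported for v0.5.12-20260517 at TP=2 would fail this check, while both runs on the new image pass.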
