
Commit 9b50d69

note namespace-package patcher fix in changelog (re-trigger sweep)
1 parent 79bb67a commit 9b50d69

1 file changed

Lines changed: 1 addition & 1 deletion

perf-changelog.yaml

@@ -3455,5 +3455,5 @@
- config-keys:
- dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp
description:
3458-
- "Root-cause fix for MoRI dispatch-buffer corruption at low concurrency: replace the harness env clamp (bandaid) with an in-place patch (patches/apply_moriep_dispatch_floor.py, run by server_sglang.sh) that floors num_max_dispatch_tokens_per_rank to 256 inside the installed sglang moriep.py. NOTE a full-file overlay was tried first and crashed the scheduler (AttributeError: MoriEPDispatcher has no attribute expert_mask_gpu) because the lmsysorg image ships a downstream-patched moriep.py that diverges from the upstream v0.5.12.post1 tag; the surgical in-place patch preserves the vendor fork. The per-rank All2All receive buffer is sized worldSize*maxNumInpTokenPerRank; at conc-64/TP8/MTP3 the value collapses to 32, overrunning the dispatch kernel's receive slots (only guard is an assert compiled out under -DNDEBUG) -> silent corruption (decodes fine, gsm8k=0). Confirmed on MI355X: dispatch=32->0.00, 64->0.00 (one wavefront insufficient), >=256->0.94. Throughput unchanged. Upstream: sgl-project/sglang#27194, ROCm/mori#356."
3458+
- "Root-cause fix for MoRI dispatch-buffer corruption at low concurrency: replace the harness env clamp (bandaid) with an in-place patch (patches/apply_moriep_dispatch_floor.py, run by server_sglang.sh) that floors num_max_dispatch_tokens_per_rank to 256 inside the installed sglang moriep.py. NOTE a full-file overlay was tried first and crashed the scheduler (AttributeError: MoriEPDispatcher has no attribute expert_mask_gpu) because the lmsysorg image ships a downstream-patched moriep.py that diverges from the upstream v0.5.12.post1 tag; the surgical in-place patch preserves the vendor fork. The per-rank All2All receive buffer is sized worldSize*maxNumInpTokenPerRank; at conc-64/TP8/MTP3 the value collapses to 32, overrunning the dispatch kernel's receive slots (only guard is an assert compiled out under -DNDEBUG) -> silent corruption (decodes fine, gsm8k=0). Confirmed on MI355X: dispatch=32->0.00, 64->0.00 (one wavefront insufficient), >=256->0.94. Throughput unchanged. Upstream: sgl-project/sglang#27194, ROCm/mori#356. Also fixes namespace-package crash in patcher (sglang.__file__ is None in lmsysorg vendor image)."
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1659
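The added sentence says the patcher crashed because `sglang.__file__` is None in the vendor image, which is how Python reports a namespace package (a package directory with no `__init__.py`). The following is a minimal sketch of one way to locate a file inside a package without touching `__file__`. It is not the code in patches/apply_moriep_dispatch_floor.py, and the helper names `find_package_dirs` and `find_file_in_package` are hypothetical. It relies only on the standard-library `importlib.util.find_spec` and the `submodule_search_locations` attribute of the returned spec.

```python
import importlib.util
from pathlib import Path


def find_package_dirs(package_name: str) -> list[Path]:
    """Return the directories backing a package without using __file__.

    Works for regular packages (one directory) and namespace packages
    (one or more directories, __file__ is None).
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        raise ModuleNotFoundError(f"package {package_name!r} is not importable")
    locations = spec.submodule_search_locations
    if not locations:
        # A plain module (not a package) has no search locations.
        raise ValueError(f"{package_name!r} is a module, not a package")
    return [Path(p) for p in locations]


def find_file_in_package(package_name: str, relative_path: str) -> Path:
    """Return the first existing file at relative_path under the package."""
    for base in find_package_dirs(package_name):
        candidate = base / relative_path
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{relative_path!r} not found under package {package_name!r}"
    )
```

A patcher built this way would call `find_file_in_package("sglang", ...)` with the relative path of moriep.py inside the installed package. That path is not stated in the changelog entry, so it is left elided here. Iterating over every search location also covers the case where a namespace package is split across several directories.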
