raise MoRI dispatch-buffer floor to 256 (warpSize=64 proven insufficient)
The conc-64 run with the warpSize floor (64) still scored gsm8k=0.00
(run 26919517564), disproving the one-wavefront hypothesis. The per-rank
dispatch buffer must hold the routing fan-in (a receiving rank takes tokens
from all worldSize peers), not just one warp-chunk. Empirically on MI355X:
dispatch=32 -> 0.00, dispatch=64 -> 0.00, dispatch>=256 -> 0.94. Clamp to the
proven 256. Throughput is unchanged; the corrupt run's ~3% edge was dropped
work, not real speed.
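
The sizing arithmetic behind the fix can be sketched as follows. This is only an illustration of the formula quoted in the changelog entry, max(CONC_LIST)/TP*(MTP+1), and of the 256 floor; the function and constant names are hypothetical, integer division is an assumption, and the real clamp lives in server_sglang.sh.

```python
# Hypothetical names for illustration; the actual logic is shell code in
# server_sglang.sh and may differ in detail.
MORI_DISPATCH_FLOOR = 256  # floor this commit reports as working on MI355X


def raw_dispatch_tokens(conc_list, tp, mtp):
    """Unclamped per-rank size: max(CONC_LIST) / TP * (MTP + 1).

    Integer division is assumed here; the harness may round differently.
    """
    return max(conc_list) // tp * (mtp + 1)


def clamped_dispatch_tokens(conc_list, tp, mtp, floor=MORI_DISPATCH_FLOOR):
    """Apply the floor so low-concurrency runs never drop below it."""
    return max(raw_dispatch_tokens(conc_list, tp, mtp), floor)


if __name__ == "__main__":
    # conc-64 / TP8 / MTP3, the failing configuration from the commit message
    print(raw_dispatch_tokens([64], tp=8, mtp=3))      # 32
    print(clamped_dispatch_tokens([64], tp=8, mtp=3))  # 256
```

With the clamp in place, configurations whose raw size already exceeds 256 are unaffected; only the low-concurrency case is raised.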
perf-changelog.yaml: 1 addition & 1 deletion
@@ -3455,5 +3455,5 @@
 - config-keys:
 - dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp
 description:
-- "Throwaway: validate warpSize floor fix — clamp MORI_MAX_DISPATCH_TOKENS_DECODE >= 64 (CDNA3/4 wavefront) in server_sglang.sh. The MoRI All2All dispatch kernel writes warpSize-aligned receive slots (destTokId = flagSlotId*warpSize + laneId), so a per-rank buffer < 64 overruns its region -> silent corruption (conc-64/TP8/MTP3 -> 32 tokens -> gsm8k=0). If gsm8k recovers, 64 is the minimal correct floor (best perf vs the proven-but-larger 256)."
+- "Fix MoRI dispatch-buffer corruption at low concurrency: clamp MORI_MAX_DISPATCH_TOKENS_DECODE >= 256 in server_sglang.sh. The harness sizes the per-rank All2All dispatch buffer from max(CONC_LIST)/TP*(MTP+1), which collapses to 32 at conc-64/TP8/MTP3 and silently corrupts the dispatch kernel's receive slots (decodes fine, gsm8k=0). Confirmed on MI355X: dispatch=32->0.00, dispatch=64->0.00 (warpSize alone insufficient), dispatch>=256->0.94. Throughput unchanged (the corrupt run's ~3% edge was dropped work)."