
Commit 9983cc0

add dispatch token clamp (>=256) and run benchmark+eval at conc-64
Clamp MORI_MAX_DISPATCH_TOKENS_DECODE to minimum 256 when DP+EP are both enabled, preventing SGLang's low-latency All2All kernel from being selected. That kernel silently corrupts outputs at small buffer sizes. Run A of A/B test: benchmark + eval WITH clamp on conc-64 DEP8+MTP3.
1 parent 45f69f5 commit 9983cc0

2 files changed

Lines changed: 10 additions & 2 deletions

benchmarks/multi_node/amd_utils/server_sglang.sh

Lines changed: 9 additions & 0 deletions
@@ -248,6 +248,15 @@ if [[ "$DECODE_MTP_SIZE" -gt 0 ]]; then
     MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MOE_MAX_INPUT_TOKENS_DECODE * (DECODE_MTP_SIZE + 1)))
 fi
 
+# Clamp dispatch tokens to >= 256 to avoid the low-latency All2All kernel
+# variant in MoRI which silently corrupts outputs at small buffer sizes.
+if [[ "$DECODE_ENABLE_DP" == "true" ]] && [[ "$DECODE_ENABLE_EP" == "true" ]]; then
+    if [[ $MORI_MAX_DISPATCH_TOKENS_DECODE -lt 256 ]]; then
+        echo "[WARN] Clamping MORI_MAX_DISPATCH_TOKENS_DECODE from $MORI_MAX_DISPATCH_TOKENS_DECODE to 256 (All2All kernel threshold)"
+        MORI_MAX_DISPATCH_TOKENS_DECODE=256
+    fi
+fi
+
 # =============================================================================
 # Cluster Topology Configuration
 # =============================================================================
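
The clamp rule in this hunk can be tried in isolation with a small sketch. The function name `clamp_dispatch_tokens` and the `MIN_DISPATCH_TOKENS` variable are hypothetical and do not appear in the commit. Only the 256 threshold and the DP+EP condition come from the diff, and the value 32 is the dispatch size named in the changelog entry. Unlike the script, the sketch sends the warning to stderr so that stdout carries only the resulting value.

```shell
# Hypothetical helper, not part of the commit.
# Usage: clamp_dispatch_tokens <requested_tokens> <enable_dp> <enable_ep>
MIN_DISPATCH_TOKENS=256  # threshold taken from the diff

clamp_dispatch_tokens() {
    # Clamp only when both DP and EP are enabled and the request is too small.
    if [ "$2" = "true" ] && [ "$3" = "true" ] && [ "$1" -lt "$MIN_DISPATCH_TOKENS" ]; then
        echo "[WARN] Clamping dispatch tokens from $1 to $MIN_DISPATCH_TOKENS" >&2
        echo "$MIN_DISPATCH_TOKENS"
    else
        echo "$1"
    fi
}

clamp_dispatch_tokens 32 true true    # DP+EP on, below threshold: prints 256
clamp_dispatch_tokens 32 true false   # EP off: prints 32 unchanged
clamp_dispatch_tokens 512 true true   # already above threshold: prints 512
```

Note that a request of exactly 256 passes through unchanged, matching the `-lt 256` test in the script.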

perf-changelog.yaml

Lines changed: 1 addition & 2 deletions
@@ -3448,6 +3448,5 @@
 - config-keys:
     - dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp
   description:
-    - "Throwaway: conc-64-only gsm8k eval for DEP8+MTP3 to reproduce SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK < 256 corruption (dispatch=32 triggers broken All2All kernel, expect 0pct gsm8k). Not for merge."
+    - "Throwaway: conc-64 DEP8+MTP3 benchmark+eval WITH dispatch token clamp (MORI_MAX_DISPATCH_TOKENS_DECODE >= 256). A/B test for All2All kernel corruption fix."
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1659
-  evals-only: true
