Commit b6f909e

throwaway: conc-64 gsm8k eval for DEP8+MTP3 to reproduce dispatch token corruption
Narrow dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp search-space to a single DEP8+MTP3 conc-64 entry. With max(CONC_LIST)=64, the server computes SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=32, which is below the 256 threshold that selects the correct All2All kernel. Expected: ~0% gsm8k (silent corruption from the low-latency All2All variant). Not for merge — throwaway validation of the dispatch token bug.
1 parent 1b23499 commit b6f909e
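
The commit message and the diff comment both quote the arithmetic `64/8*4=32` and a threshold of 256. The following is a minimal sketch of that calculation only, not the server's actual code. The function and parameter names are hypothetical. Reading the literal 8 as the DEP8 rank count and the literal 4 as `DECODE_MTP_SIZE + 1` is an assumption, since the commit gives just the numbers.

```python
# Sketch of the arithmetic quoted in the commit ("64/8*4=32").
# All names here are hypothetical; only the numbers come from the commit.

DISPATCH_TOKEN_THRESHOLD = 256  # threshold cited in the commit message


def dispatch_tokens_per_rank(conc_list, ranks=8, tokens_per_request=4):
    """Return max(conc_list) / ranks * tokens_per_request.

    Assumption: `ranks` is the DEP8 group size and `tokens_per_request`
    is DECODE_MTP_SIZE + 1. The commit states only the literals 8 and 4.
    """
    return max(conc_list) // ranks * tokens_per_request


def below_threshold(conc_list):
    """True when the computed value falls under the 256 threshold."""
    return dispatch_tokens_per_rank(conc_list) < DISPATCH_TOKEN_THRESHOLD


if __name__ == "__main__":
    tokens = dispatch_tokens_per_rank([64])
    print(tokens, below_threshold([64]))  # prints: 32 True
```

With `conc-list: [ 64 ]` the sketch reproduces the value 32 stated in the commit, which is below 256, the condition the commit associates with the corrupted gsm8k results.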

2 files changed

Lines changed: 11 additions & 136 deletions

.github/configs/amd-master.yaml

Lines changed: 4 additions & 136 deletions
@@ -1986,123 +1986,10 @@ dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp:
   - isl: 8192
     osl: 1024
     search-space:
-    # MTP configurations
-    # 1P1D pure TP8
-    - spec-decoding: "mtp"
-      conc-list: [ 1, 2, 4, 8 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 1
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "DECODE_NODES=1"
-        - "DECODE_MTP_SIZE=3"
-
-    # 1P2D TP8
-    - spec-decoding: "mtp"
-      conc-list: [ 2, 4, 8, 16, 32 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 2
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "DECODE_NODES=2"
-        - "DECODE_MTP_SIZE=3"
-
-    # 1P2D TP8
-    - spec-decoding: "mtp"
-      conc-list: [ 32, 64 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 2
-        tp: 8
-        ep: 1
-        dp-attn: false
-        additional-settings:
-        - "DECODE_NODES=2"
-        - "DECODE_MTP_SIZE=3"
-
-    # 1*DEP8 + 1*DEP8
-    - spec-decoding: "mtp"
-      conc-list: [ 640, 512 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "DECODE_NODES=1"
-        - "DECODE_MTP_SIZE=3"
-
-    # 1*DEP8 + 1*DEP8
-    - spec-decoding: "mtp"
-      conc-list: [ 256 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "DECODE_NODES=1"
-        - "DECODE_MTP_SIZE=3"
-
-
-    # 1*DEP8 + 1*DEP8
-    - spec-decoding: "mtp"
-      conc-list: [ 128 ]
-      prefill:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "PREFILL_NODES=1"
-      decode:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "DECODE_NODES=1"
-        - "DECODE_MTP_SIZE=3"
-
-    # 1*DEP8 + 1*DEP8
+    # THROWAWAY (not for merge): conc-64 only DEP8+MTP3 to reproduce
+    # SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK < 256 corruption.
+    # max(CONC_LIST)=64 → dispatch_tokens=64/8*4=32 → broken All2All kernel.
+    # 1*DEP8 + 1*DEP8, MTP3
     - spec-decoding: "mtp"
       conc-list: [ 64 ]
       prefill:
@@ -2121,25 +2008,6 @@ dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp:
         - "DECODE_NODES=1"
         - "DECODE_MTP_SIZE=3"
 
-    # 2*DEP8 + 1*DEP8
-    - spec-decoding: "mtp"
-      conc-list: [ 1024, 2048, 4096 ]
-      prefill:
-        num-worker: 2
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "PREFILL_NODES=2"
-      decode:
-        num-worker: 1
-        tp: 8
-        ep: 8
-        dp-attn: true
-        additional-settings:
-        - "DECODE_NODES=1"
-        - "DECODE_MTP_SIZE=1"
-
 
 # DSv4-Pro FP4 on MI355X via SGLang. Uses a rocm720 mi35x image built off the
 # amd/deepseek_v4 branch in sgl-project/sglang; the SHA is encoded in the

perf-changelog.yaml

Lines changed: 7 additions & 0 deletions
@@ -3395,3 +3395,10 @@
   description:
     - "Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark; image rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627
+
+- config-keys:
+    - dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp
+  description:
+    - "Throwaway: conc-64-only gsm8k eval for DEP8+MTP3 to reproduce SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK < 256 corruption (dispatch=32 triggers broken All2All kernel, expect ~0% gsm8k). Not for merge."
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PENDING
+  evals-only: true
