Commit cfd608e
[ROCm][gpt-oss] Pass GateMode.INTERLEAVE for MXFP4 W4A16 fused MoE
The MXFP4 W4A16 weight-load path in oracle/mxfp4.py uses
shuffle_weight_a16w4 (is_guinterleave=True), which interleaves gate/up
columns within each weight tile. The CK/FlyDSL MoE kernels in aiter
must be told this via gate_mode=GateMode.INTERLEAVE so they decode the
gate/up packing correctly.
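To make the layout difference concrete, here is a minimal pure-Python sketch under a toy model. A fused gate/up weight row holds `2 * n` output columns. In SEPARATED, all gate columns come first and all up columns follow. In INTERLEAVE, each block of `tile` gate columns is followed by the matching `tile` up columns. The real aiter tile shape and interleave granularity may differ. `interleave_gate_up`, `split_gate_up` and the string mode names are hypothetical helpers, not aiter APIs.

```python
from typing import List, Tuple


def interleave_gate_up(gate: List[float], up: List[float], tile: int) -> List[float]:
    """Toy INTERLEAVE packing (hypothetical granularity): each block of
    `tile` gate columns is immediately followed by the matching up block."""
    if len(gate) != len(up) or len(gate) % tile:
        raise ValueError("gate/up must have equal length divisible by tile")
    packed: List[float] = []
    for start in range(0, len(gate), tile):
        packed.extend(gate[start:start + tile])
        packed.extend(up[start:start + tile])
    return packed


def split_gate_up(packed: List[float], mode: str, tile: int) -> Tuple[List[float], List[float]]:
    """Recover (gate, up) from packed columns according to the gate mode."""
    if mode == "SEPARATED":
        # All gate columns first, then all up columns.
        half = len(packed) // 2
        return packed[:half], packed[half:]
    if mode == "INTERLEAVE":
        # Walk the packed row one (gate block, up block) pair at a time.
        gate: List[float] = []
        up: List[float] = []
        for start in range(0, len(packed), 2 * tile):
            gate.extend(packed[start:start + tile])
            up.extend(packed[start + tile:start + 2 * tile])
        return gate, up
    raise ValueError(f"unknown gate mode: {mode!r}")


packed = interleave_gate_up([1, 2, 3, 4], [10, 20, 30, 40], tile=2)
# packed == [1, 2, 10, 20, 3, 4, 30, 40]
```

Decoding `packed` with "INTERLEAVE" recovers the original gate and up columns. Decoding it with "SEPARATED" silently mixes gate and up columns, which is why the kernel must be told which packing was used.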
Without the explicit gate_mode, aiter defaults to SEPARATED and (since
ROCm/aiter#3123) dispatches the (SEPARATED + Swiglu + per_1x32 + fp4x2)
case to a path that returns garbage for shuffled weights or crashes
during CK2stages JIT for the unshuffled Quark variant
(amd/gpt-oss-20b-w-mxfp4-a-bf16). This was the root cause of ROCM-25517
(gpt-oss-120b W4A16 gsm8k acc = 0) and ROCM-25478 (gpt-oss-20b Quark
JIT crash).
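The toy model below shows why a wrong gate mode produces wrong numbers instead of an error. Both decodings yield halves of the correct shape. A SwiGLU (`silu(gate) * up`) over the mis-split halves is therefore well-formed but numerically wrong. The sketch uses the same hypothetical block interleave as a stand-in for aiter's real per-tile packing, and it does not model fp4x2 quantization or per_1x32 scales. The CK2stages JIT crash is a separate failure mode that this toy does not reproduce.

```python
import math
from typing import List, Tuple


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    return [silu(g) * u for g, u in zip(gate, up)]


def decode(packed: List[float], mode: str, tile: int) -> Tuple[List[float], List[float]]:
    """Hypothetical decoder: SEPARATED = [gate..., up...]; otherwise block-interleaved."""
    if mode == "SEPARATED":
        half = len(packed) // 2
        return packed[:half], packed[half:]
    gate: List[float] = []
    up: List[float] = []
    for s in range(0, len(packed), 2 * tile):
        gate += packed[s:s + tile]
        up += packed[s + tile:s + 2 * tile]
    return gate, up


# Weights packed by an interleaving shuffle: [g0, g1, u0, u1, g2, g3, u2, u3]
packed = [0.5, -1.0, 2.0, 3.0, 1.5, 0.0, -2.0, 4.0]
correct = swiglu(*decode(packed, "INTERLEAVE", tile=2))
wrong = swiglu(*decode(packed, "SEPARATED", tile=2))  # same shape, wrong values
```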
Other paths are unaffected:
- FP8 W8A8 (DeepSeek-V4-Pro, DeepSeek-V3.2): shuffled with
quark_ocp_mx.py:shuffle_weight(layout=(16,16)) — non-interleaved.
use_mxfp4_w4a16 is False, default SEPARATED preserved.
- MXFP4 W4A4 (amd/DeepSeek-R1-0528-MXFP4): shuffled via
rocm_aiter_ops.shuffle_weights — non-interleaved. use_mxfp4_w4a16
is False, default SEPARATED preserved.
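The rule above reduces to a single predicate. Only the MXFP4 W4A16 path, whose weights go through the interleaving shuffle, requests INTERLEAVE. Every other path leaves gate_mode unset, so aiter keeps its SEPARATED default. This is a hedged sketch, not the code in this commit. `GateMode` below is a local stand-in for aiter's enum, with member names taken from this message. `select_gate_mode` and `moe_extra_kwargs` are hypothetical helpers.

```python
from enum import Enum
from typing import Any, Dict, Optional


class GateMode(Enum):
    # Local stand-in for aiter's GateMode (only the members named in this commit).
    SEPARATED = "separated"
    INTERLEAVE = "interleave"


def select_gate_mode(use_mxfp4_w4a16: bool) -> Optional[GateMode]:
    """Return the gate mode to pass explicitly, or None to keep aiter's default."""
    return GateMode.INTERLEAVE if use_mxfp4_w4a16 else None


def moe_extra_kwargs(use_mxfp4_w4a16: bool) -> Dict[str, Any]:
    mode = select_gate_mode(use_mxfp4_w4a16)
    # Omit the kwarg entirely (rather than passing SEPARATED) on unaffected paths.
    return {} if mode is None else {"gate_mode": mode}


# gpt-oss MXFP4 W4A16 -> {"gate_mode": GateMode.INTERLEAVE}
# FP8 W8A8 / MXFP4 W4A4 -> {} (aiter default SEPARATED)
```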
The gate_mode kwarg was added to aiter.fused_moe in
ROCm/aiter#3123 (aiter>=0.1.14). To stay compatible with older aiter
releases shipped with vllm (e.g. aiter 0.1.13.post1 in the vllm-rocm:nightly
image), we probe the aiter signature and drop the kwarg when it is
unsupported. This is safe because pre-ROCm/aiter#3123 aiter tolerated the
implicit SEPARATED default for interleave-shuffled weights.
GateMode itself only exists on aiter>=0.1.14 and is imported under
try/except for the same reason.
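The compatibility shim described above can be sketched with `inspect.signature`. The shim checks whether the installed `fused_moe` declares a `gate_mode` parameter and drops the kwarg if it does not. It also imports `GateMode` under try/except so that older aiter still imports cleanly. This is a hypothetical sketch, not the code in this commit. The import path `from aiter import GateMode` is an assumption. The two stub functions stand in for the old and new `aiter.fused_moe` signatures so that the example runs without aiter installed.

```python
import inspect
from typing import Any, Callable, Dict

try:  # GateMode only exists on aiter>=0.1.14 (import path is an assumption)
    from aiter import GateMode  # type: ignore
except ImportError:
    GateMode = None  # older aiter: no enum, and fused_moe has no gate_mode kwarg


def accepts_kwarg(fn: Callable[..., Any], name: str) -> bool:
    """True if `fn` declares a parameter called `name`."""
    try:
        return name in inspect.signature(fn).parameters
    except (TypeError, ValueError):  # some builtins/extensions expose no signature
        return False


def filter_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Drop any kwargs that `fn` does not declare."""
    return {k: v for k, v in kwargs.items() if accepts_kwarg(fn, k)}


# Stubs standing in for fused_moe on aiter 0.1.13.post1 (old) and >=0.1.14 (new).
def fused_moe_old(hidden_states, w1, w2): ...
def fused_moe_new(hidden_states, w1, w2, gate_mode=None): ...
```

In practice the probe result would likely be computed once and cached rather than re-inspected on every forward call.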
Validation on MI355X (gfx950):
- vllm@main + aiter@main (6aeba41), openai/gpt-oss-120b W4A16, gsm8k:
    TP=1: 0.000 -> 0.905
    TP=8: 0.000 -> 0.905
- vllm@main + aiter@main, amd/gpt-oss-20b-w-mxfp4-a-bf16, TP=2, enforce-eager:
    CK2stages JIT crash -> serves cleanly
- vllm-rocm:nightly + aiter 0.1.13.post1, openai/gpt-oss-120b W4A16, gsm8k:
    TP=1: 0.910 (backward compat: gate_mode kwarg silently dropped)
- vllm-rocm:v0.22.0 + aiter@main, openai/gpt-oss-120b W4A16, gsm8k:
    TP=1: 0.895
- amd/gpt-oss120b-w-mxfp4-a-fp8 W4A8 (this PR composes with vllm-project#44804), TP=8:
    mc=1: 326, mc=8: 2087, mc=32: 6523, mc=64: 11610 tok/s
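The W4A8 throughput line can be read as per-request throughput: aggregate tok/s divided by the concurrency level. This reading assumes that `mc` means max concurrency, which the commit does not spell out. The numbers are the ones reported above. `scaling_table` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Aggregate throughput (tok/s) at TP=8, keyed by the `mc` value from the commit.
throughput: Dict[int, int] = {1: 326, 8: 2087, 32: 6523, 64: 11610}


def scaling_table(tp: Dict[int, int]) -> Dict[int, Tuple[float, float]]:
    """Map mc -> (tok/s per concurrent request, aggregate speedup vs lowest mc)."""
    base = tp[min(tp)]
    return {mc: (tok_s / mc, tok_s / base) for mc, tok_s in tp.items()}


for mc, (per_req, speedup) in scaling_table(throughput).items():
    print(f"mc={mc:>2}: {per_req:6.1f} tok/s per request, {speedup:4.1f}x aggregate vs mc=1")
```

Under that reading, going from mc=1 to mc=64 raises aggregate throughput about 35.6x. Over the same range, per-request throughput drops from 326 to about 181 tok/s.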
Reference: sgl-project/sglang#25580 (sglang's
equivalent fix). Recommended by aiter maintainer (XiaobingZhang) on
ROCm/aiter#3586.
Signed-off-by: Rohan Potdar <rohan.potdar@amd.com>
1 parent 2ed0a96
2 files changed
Lines changed: 37 additions & 0 deletions
Lines changed in the second file: 13 additions & 0 deletions