Commit d3a617f
minimaxm3-fp8-mi300x-vllm: enable AITER kernels for MXFP8 on MI300X
Enable AITER on MI300X/gfx942 for MiniMax-M3 MXFP8 via the single master
toggle VLLM_ROCM_USE_AITER=1. The per-component AITER flags (_MOE, _LINEAR,
_RMSNORM, _FP8BMM) default to True and are gated behind the master flag, so
they are left at their defaults. VLLM_ROCM_USE_AITER_MHA defaults to True and
is explicitly set to 0 to keep attention on TRITON_ATTN, since the MXFP8
checkpoint lacks calibrated q/prob scales for ROCm FP8 attention.
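The gating described above can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation: the helper names (`effective_aiter_flags`, `_as_bool`) are hypothetical, and treating the master flag as off when unset is an assumption. Only the environment variable names and the component defaults come from this commit message.

```python
# Environment set by this commit for the AITER path.
AITER_ENV = {
    "VLLM_ROCM_USE_AITER": "1",      # master toggle
    "VLLM_ROCM_USE_AITER_MHA": "0",  # explicit override: keep attention on TRITON_ATTN
}

# Per-component flags named in the commit message; all default to True.
COMPONENT_DEFAULTS = {
    "MOE": True,
    "LINEAR": True,
    "RMSNORM": True,
    "FP8BMM": True,
    "MHA": True,
}


def _as_bool(value: str) -> bool:
    """Interpret an environment string as a boolean (hypothetical parser)."""
    return value.strip().lower() in ("1", "true")


def effective_aiter_flags(env: dict) -> dict:
    """Return which components would use AITER: master AND component flag."""
    # Assumption: the master flag is treated as off when it is not set.
    master = _as_bool(env.get("VLLM_ROCM_USE_AITER", "0"))
    flags = {}
    for name, default in COMPONENT_DEFAULTS.items():
        raw = env.get(f"VLLM_ROCM_USE_AITER_{name}")
        component = default if raw is None else _as_bool(raw)
        flags[name] = master and component
    return flags


if __name__ == "__main__":
    for name, enabled in effective_aiter_flags(AITER_ENV).items():
        print(f"{name}: {'AITER' if enabled else 'non-AITER'}")
```

With only the two variables set, every component except MHA resolves to the AITER path, which is why the remaining flags can stay at their defaults.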
Also set AMD-recommended numerically-inert MI300X runtime knobs:
TORCH_BLAS_PREFER_HIPBLASLT=1, NCCL_MIN_NCHANNELS=112 (RCCL channels, raised
above the ~32-64 default for TP8), GPU_MAX_HW_QUEUES=2 (HIP streams, capped
below the default of 4). All changes affect kernel selection and runtime
behavior only; GSM8K accuracy holds at ~0.95.
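A minimal sketch of how the three runtime knobs above could be passed to a server process. The helper name `build_launch_env` is hypothetical, and setting the variables before the process starts is an assumption about when they are read. Only the variable names and values come from this commit message.

```python
import os

# Runtime knobs listed in this commit, as environment strings.
RUNTIME_ENV = {
    "TORCH_BLAS_PREFER_HIPBLASLT": "1",  # prefer hipBLASLt for BLAS calls
    "NCCL_MIN_NCHANNELS": "112",         # RCCL channel floor for TP8
    "GPU_MAX_HW_QUEUES": "2",            # cap HIP hardware queues
}


def build_launch_env(base=None):
    """Return a copy of the base environment with the runtime knobs applied."""
    env = dict(os.environ if base is None else base)
    env.update(RUNTIME_ENV)  # the knobs override any inherited values
    return env


if __name__ == "__main__":
    env = build_launch_env()
    for key in RUNTIME_ENV:
        print(f"{key}={env[key]}")
```

The returned mapping can be handed to `subprocess.Popen(..., env=env)` so the knobs apply to the launched process without altering the parent's environment.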
Measured uplift (8xMI300X, 1k input / 1k output tokens, total tok/s/gpu):
+5.6% to +10.8% across concurrency 4 to 256; concurrency 1-2 unchanged
(latency-bound).
Co-authored-by: Gong Zheng <zgong@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 6462ac0 commit d3a617f
2 files changed
Lines changed: 15 additions & 0 deletions
Lines changed: 7 additions & 0 deletions
First file: 7 lines added after original line 36 (new lines 37-43).
Second file: 8 lines added after original line 3873 (new lines 3874-3881).