Commit b990fb6
MinimaxM2.5-FP8-MI325x-vLLM: pin AITER FA attention backend
vLLM PR #36702 (between v0.18.0 and v0.21.0) flipped the dense
full-attention default on ROCm from ROCM_AITER_FA to ROCM_ATTN, causing
a ~38% throughput regression for MiniMax-M2.5 FP8 on MI325X
(vllm-project/vllm#43029).
Align benchmarks/single_node/minimaxm2.5_fp8_mi325x.sh with the merged
upstream recipe (vllm-project/recipes#481) to restore the v0.18.0
attention path on the v0.21.0 image:
- export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 (asm/hip paged-attention
  auto-dispatch)
- pass --attention-backend ROCM_AITER_FA to vllm serve

1 parent e0cd8f7, commit b990fb6
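
The two changes listed above can be sketched as a dry run. This is only an illustration, not the contents of benchmarks/single_node/minimaxm2.5_fp8_mi325x.sh: the MODEL and PORT variables and their defaults are hypothetical placeholders, and the real script likely passes further arguments that are not shown in this commit message. Only the environment variable and the --attention-backend value come from the commit.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: MODEL and PORT are hypothetical
# placeholders, not values taken from the benchmark script.
set -e

MODEL="${MODEL:-your-org/your-model}"   # hypothetical model identifier
PORT="${PORT:-8000}"                    # hypothetical port

# From the commit message: enable the shuffled KV-cache layout
# (asm/hip paged-attention auto-dispatch).
export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1

# From the commit message: pin the attention backend explicitly rather
# than relying on the ROCm default, which changed between releases.
SERVE_CMD=(vllm serve "$MODEL" --port "$PORT" --attention-backend ROCM_AITER_FA)

# Dry run: print the command instead of launching the server.
printf '%s ' "${SERVE_CMD[@]}"
echo
```

Replacing the final two lines with `"${SERVE_CMD[@]}"` would launch the server; building the command as an array keeps each argument intact when it is expanded.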
1 file changed
Lines changed: 2 additions & 2 deletions