Commit 1b71db4
MinimaxM2.5-FP8-MI325x-vLLM: pin AITER FA attention backend (#1594)
* MinimaxM2.5-FP8-MI325x-vLLM: pin AITER FA attention backend
vLLM PR #36702 (between v0.18.0 and v0.21.0) flipped the dense
full-attention default on ROCm from ROCM_AITER_FA to ROCM_ATTN, causing
a ~38% throughput regression for MiniMax-M2.5 FP8 on MI325X
(vllm-project/vllm#43029).
Align benchmarks/single_node/minimaxm2.5_fp8_mi325x.sh with the merged
upstream recipe (vllm-project/recipes#481) to restore the v0.18.0
attention path on the v0.21.0 image:
- export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 (asm/hip paged-attention
  auto-dispatch)
- pass --attention-backend ROCM_AITER_FA to vllm serve
* Update the perf-changelog
* runners/launch_mi325x-amds.sh: propagate srun failures
* minimaxm2.5-fp8-mi325x-vllm: align with upstream MiniMax-M2.5 ROCm recipe
* runners/launch_mi325x-amds.sh: derive PORT per job; sudo -n in cleanup
Use `40000 + (JOB_ID % 10000)` instead of the hard-coded 8888. A
non-SLURM Docker workload on chi-mi325x-pod1-019 had bound :8888, so
every sweep job scheduled on that node failed in sock.bind() with
EADDRINUSE before vLLM started. Also harden the benchmark_logs cleanup
trap with `sudo -n` so that it fails fast in a non-TTY session instead
of hanging.
* minimaxm2.5-fp8-mi325x-vllm: gate SHUFFLE_KV_CACHE_LAYOUT per (TP, CONC)
Set VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 (recipes#481 pillar 2) only at
shapes where AITER's gfx942 ASM paged-attention kernel exists:
TP=2 EP=1 with CONC<=16, and TP=8 EP=8 with CONC<=64. Above those
concurrencies, pa_fwd_asm fails with
`get_heuristic_kernel: cannot get heuristic kernel!`
(gqa=6, block_size=32, qTile=0) and every request returns HTTP 500.
This mirrors the per-shape toggle in the mi355x sibling.
Refs: vllm#43029, sweep run 26692603804.

1 parent 48c1840
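The first change pins the attention backend on the `vllm serve` command line. A minimal sketch, assuming the benchmark script assembles its serve invocation in a bash array; `build_serve_cmd`, `SERVE_CMD`, and the model and tensor-parallel arguments are hypothetical, and only `--attention-backend ROCM_AITER_FA` is taken from the commit:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the serve command as an array so each flag stays one argument.
# Only "--attention-backend ROCM_AITER_FA" comes from the commit; the rest is illustrative.
build_serve_cmd() {
  local model="$1" tp="$2"
  # Pin the dense full-attention backend explicitly, because vLLM PR #36702
  # changed the ROCm default from ROCM_AITER_FA to ROCM_ATTN.
  SERVE_CMD=(vllm serve "$model"
             --tensor-parallel-size "$tp"
             --attention-backend ROCM_AITER_FA)
}
```

Usage would look like `build_serve_cmd "$MODEL" 8 && "${SERVE_CMD[@]}"`. Keeping the flags in an array avoids word-splitting surprises when the model path or extra options contain spaces.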
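The commit message does not show how `srun` failures were being lost in runners/launch_mi325x-amds.sh. One common cause is piping `srun` through `tee`, which makes the pipeline report `tee`'s exit status. A sketch under that assumption; `run_logged` is a hypothetical helper, not the script's actual code:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, append its combined output to a log,
# and return the command's own exit status rather than tee's.
run_logged() {
  local log="$1"
  shift
  "$@" 2>&1 | tee -a "$log"
  # PIPESTATUS[0] holds the status of the first command in the pipeline that just ran.
  return "${PIPESTATUS[0]}"
}
```

A launcher would then call something like `run_logged "$LOG" srun ... || exit $?`, so a failed job step fails the runner instead of being reported as success. `set -o pipefail` is an alternative, but it changes behavior for every pipeline in the script.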
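The per-job port formula from the launcher change maps any numeric job ID into 40000 to 49999. A sketch assuming the job ID is numeric (for example `SLURM_JOB_ID`); `derive_port` and `cleanup_benchmark_logs` are hypothetical names. The trap's real body is not shown in the commit, so the `chown` below is only a stand-in for whatever privileged cleanup it performs. The point it illustrates is `sudo -n`, which refuses to prompt for a password and fails immediately instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a numeric job ID into the 40000-49999 port range.
derive_port() {
  local job_id="$1"
  case "$job_id" in
    ''|*[!0-9]*) echo "derive_port: job id must be numeric" >&2; return 1 ;;
  esac
  # Jobs collide only when their IDs differ by a multiple of 10000,
  # and a stray listener on a fixed port such as 8888 no longer blocks every job.
  echo $(( 40000 + (job_id % 10000) ))
}

# Hypothetical cleanup: the chown stands in for the real trap body.
cleanup_benchmark_logs() {
  local dir="${1:-benchmark_logs}"
  [ -d "$dir" ] || return 0
  # -n: never prompt; fail at once if a password would be required (or sudo is missing).
  if ! sudo -n chown -R "$(id -u):$(id -g)" "$dir" 2>/dev/null; then
    echo "warning: could not reclaim $dir without a password" >&2
  fi
}
```

A launcher might use `PORT=$(derive_port "$SLURM_JOB_ID")` and `trap cleanup_benchmark_logs EXIT`. The derived port is only likely to be free, not guaranteed, so a bind failure is still possible if another process happens to hold it.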
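The per-(TP, CONC) gate for the KV-cache shuffle layout reduces to a small predicate. A sketch with thresholds copied from the commit message; `shuffle_kv_supported` and `configure_kv_layout` are hypothetical names, and the real script may read TP, EP, and concurrency from differently named variables:

```bash
#!/usr/bin/env bash
# Hypothetical predicate: succeed only at shapes where, per the commit message,
# AITER's gfx942 ASM paged-attention kernel exists.
shuffle_kv_supported() {
  local tp="$1" ep="$2" conc="$3"
  if [ "$tp" -eq 2 ] && [ "$ep" -eq 1 ] && [ "$conc" -le 16 ]; then
    return 0
  fi
  if [ "$tp" -eq 8 ] && [ "$ep" -eq 8 ] && [ "$conc" -le 64 ]; then
    return 0
  fi
  return 1
}

# Export the layout flag only for supported shapes; otherwise make sure it is unset,
# since pa_fwd_asm has no heuristic kernel there and requests fail with HTTP 500.
configure_kv_layout() {
  if shuffle_kv_supported "$@"; then
    export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1
  else
    unset VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT
  fi
}
```

Calling `configure_kv_layout "$TP" "$EP" "$CONC"` before launching the server keeps the decision in one place, so adding a newly supported shape means editing a single predicate.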
4 files changed
Lines changed: 45 additions & 11 deletions
File tree
- .github/configs
- benchmarks/single_node
- runners