Commit c138338
minimaxm2.5-fp8-h200-vllm: switch 8k/1k attention backend to FLASH_ATTN (#1668)
* perf: switch minimaxm2.5-fp8-h200-vllm 8k/1k attention backend to FLASH_ATTN
Switch the attention backend for the 8k/1k cell of minimaxm2.5-fp8-h200-vllm
from FLASHINFER to FLASH_ATTN. ISL-conditional: the 1k/1k cell is unchanged
(keeps FLASHINFER + --enable-flashinfer-autotune, byte-identical to prior
behavior); only ISL=8192 triggers the swap.
Appends a perf-changelog entry.
* chore: set perf-changelog pr-link for minimaxm2.5-fp8-h200-vllm FA3 swap
* chore: point perf-changelog pr-link to PR #1668
1 parent ea4f575 commit c138338
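
The ISL-conditional selection described in the first commit bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `select_attention_config` and the returned dictionary shape are hypothetical, and it assumes the 8k/1k cell drops `--enable-flashinfer-autotune` when it leaves FLASHINFER, which the commit message implies but does not state outright.

```python
def select_attention_config(isl: int) -> dict:
    """Hypothetical helper: choose the attention backend for a benchmark cell by ISL."""
    if isl == 8192:
        # 8k/1k cell: swap to FLASH_ATTN.
        # Assumption: the FlashInfer autotune flag is not passed here.
        return {"backend": "FLASH_ATTN", "extra_args": []}
    # Any other cell (for example 1k/1k, ISL=1024) keeps the prior behavior.
    return {
        "backend": "FLASHINFER",
        "extra_args": ["--enable-flashinfer-autotune"],
    }


if __name__ == "__main__":
    for isl in (1024, 8192):
        print(isl, select_attention_config(isl))
```

Keying the branch on the exact value `ISL=8192` rather than on a threshold mirrors the commit's wording ("only ISL=8192 triggers the swap") and leaves every other cell on its existing path.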
2 files changed
Lines changed: 17 additions & 2 deletions
Lines changed: 10 additions & 2 deletions
| Hunk | Original file lines | Diff lines | Change |
|---|---|---|---|
| 1 | 42-47 | 42-55 | 8 lines added (diff lines 45-52) |
| 2 | 55-62 | 63-70 | 2 lines removed (original lines 58-59), 2 lines added (diff lines 66-67) |
| Hunk | Original file lines | Diff lines | Change |
|---|---|---|---|
| 1 | 3474-3476 | 3474-3483 | 7 lines added (diff lines 3477-3483) |