
minimaxm2.5-fp8-h200-vllm: switch 8k/1k attention backend to FLASH_ATTN #1667

Closed
RohitNagraj wants to merge 2 commits into SemiAnalysisAI:main from RohitNagraj:minimax-m25-h200-vllm-8k1k-fa3

Conversation

@RohitNagraj
Collaborator

@RohitNagraj RohitNagraj commented Jun 4, 2026

Switch the attention backend for the 8k/1k cell of minimaxm2.5-fp8-h200-vllm from FLASHINFER to FLASH_ATTN. The change is conditional on ISL: the 1k/1k cell is unchanged (it keeps FLASHINFER + --enable-flashinfer-autotune, byte-identical to prior behavior), and only ISL=8192 triggers the swap.

Appends a perf-changelog entry.


Note

Low Risk
Benchmark-only vLLM flag selection for one ISL cell; no auth, data, or production serving paths.

Overview
For MiniMax-M2.5 FP8 H200 vLLM (minimaxm2.5_fp8_h200.sh), vllm serve now picks the attention stack from ISL: when ISL=8192 (8k/1k), it uses FLASH_ATTN and does not pass --enable-flashinfer-autotune; for all other input lengths it keeps FLASHINFER plus FlashInfer autotune, matching prior 1k/1k behavior.
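The ISL-conditional selection described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of minimaxm2.5_fp8_h200.sh: the function name `select_attention` and its output format are hypothetical, and how the backend name is actually handed to `vllm serve` is not shown in this PR description, so it is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ISL-conditional backend choice.
# Names here are illustrative and are not taken from the real script.

# Prints the attention backend, followed by any extra serve flags,
# for the given input sequence length (ISL).
select_attention() {
    local isl="$1"
    if [ "$isl" -eq 8192 ]; then
        # 8k/1k cell: FLASH_ATTN, and no FlashInfer autotune flag.
        echo "FLASH_ATTN"
    else
        # Every other ISL (e.g. 1k/1k): prior behavior, unchanged.
        echo "FLASHINFER --enable-flashinfer-autotune"
    fi
}

select_attention 8192
select_attention 1024
```

Keeping the decision in one branch on ISL makes it easy to confirm that the 1k/1k path still produces exactly the arguments it did before.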

A perf-changelog.yaml entry documents this under minimaxm2.5-fp8-h200-vllm.

Reviewed by Cursor Bugbot for commit 424fe77. Bugbot is set up for automated code reviews on this repo.

Contributor

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@RohitNagraj
Collaborator Author

Superseded by #1668, which is opened from a branch in this repository instead of a fork. Closing in favor of that PR.
