minimaxm2.5-fp8-h200-vllm: switch 8k/1k attention backend to FLASH_ATTN by RohitNagraj · Pull Request #1667 · SemiAnalysisAI/InferenceX

RohitNagraj · 2026-06-04T22:04:14Z

Switch the attention backend for the 8k/1k cell of minimaxm2.5-fp8-h200-vllm from FLASHINFER to FLASH_ATTN. ISL-conditional: the 1k/1k cell is unchanged (keeps FLASHINFER + --enable-flashinfer-autotune, byte-identical to prior behavior); only ISL=8192 triggers the swap.

Appends a perf-changelog entry.

Note

Low Risk
Benchmark-only vLLM flag selection for one ISL cell; it does not touch auth, data, or production serving paths.

Overview
For MiniMax-M2.5 FP8 H200 vLLM (minimaxm2.5_fp8_h200.sh), vllm serve now picks the attention stack from ISL: when ISL=8192 (8k/1k), it uses FLASH_ATTN and does not pass --enable-flashinfer-autotune; for all other input lengths it keeps FLASHINFER plus FlashInfer autotune, matching prior 1k/1k behavior.
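The ISL switch described above is a two-way branch. Below is a minimal bash sketch of that branch under stated assumptions. The names ISL, ATTN_BACKEND, EXTRA_ARGS and select_attention are hypothetical and are not taken from minimaxm2.5_fp8_h200.sh. The sketch only prints the choice and does not launch vllm serve.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ISL-conditional attention-backend choice.
# ISL, ATTN_BACKEND, EXTRA_ARGS and select_attention are illustrative
# names (assumptions), not the identifiers used in minimaxm2.5_fp8_h200.sh.
set -eo pipefail

select_attention() {
  local isl="$1"
  if [[ "$isl" == "8192" ]]; then
    # 8k/1k cell: use FLASH_ATTN and pass no FlashInfer autotune flag.
    ATTN_BACKEND="FLASH_ATTN"
    EXTRA_ARGS=()
  else
    # Every other ISL (e.g. 1k/1k): keep FLASHINFER plus autotune, as before.
    ATTN_BACKEND="FLASHINFER"
    EXTRA_ARGS=(--enable-flashinfer-autotune)
  fi
}

# Fall back to 1024 when ISL is unset, purely for illustration.
select_attention "${ISL:-1024}"
echo "backend=${ATTN_BACKEND} extra_args=${EXTRA_ARGS[*]}"
```

Keeping the extra flags in an array means the 8k/1k case adds no argument at all when expanded as "${EXTRA_ARGS[@]}", rather than an empty-string argument. The backend name would then be handed to vllm serve, for example through vLLM's VLLM_ATTENTION_BACKEND environment variable. The actual script may wire this up differently.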

A perf-changelog.yaml entry documents this under minimaxm2.5-fp8-h200-vllm.

Reviewed by Cursor Bugbot for commit 424fe77. Bugbot is set up for automated code reviews on this repo.


claude Bot · Claude Code Review

This pull request is from a fork, so automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

RohitNagraj · 2026-06-04T23:35:43Z

Superseded by #1668, which is opened from a branch in this repository instead of a fork. Closing in favor of that PR.

RohitNagraj requested a review from a team June 4, 2026 22:04

github-project-automation Bot added this to InferenceMAX Board Jun 4, 2026

claude Bot reviewed Jun 4, 2026


chore: set perf-changelog pr-link for minimaxm2.5-fp8-h200-vllm FA3 swap (424fe77)

RohitNagraj closed this Jun 4, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 4, 2026

claude Bot mentioned this pull request Jun 4, 2026

minimaxm2.5-fp8-h200-vllm: switch 8k/1k attention backend to FLASH_ATTN #1668

Merged

RohitNagraj wants to merge 2 commits into SemiAnalysisAI:main from RohitNagraj:minimax-m25-h200-vllm-8k1k-fa3
