add attention backend recommendation for Minimax 2.5 by faradawn · Pull Request #512 · vllm-project/recipes

faradawn · 2026-06-05T04:17:09Z

For longer sequence lengths, use the FlashAttention backend for best performance with MiniMax on H200 FP8.

Reference: SemiAnalysisAI/InferenceX#1668

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

vercel · 2026-06-05T04:17:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	Jun 5, 2026 4:18am

gemini-code-assist

Code Review

This pull request updates the deployment guide for MiniMax-M2.5 to recommend specific attention backends (FLASHINFER or FLASH_ATTN) based on sequence length for H200 FP8. The reviewer suggested phrasing improvements to make the instructions clearer and more direct, along with minor punctuation corrections.

gemini-code-assist · 2026-06-05T04:17:56Z

+  For H200 FP8, choose the attention backend by sequence length for best performance:
+  shorter sequences (e.g. 1024) should keep the command above with `--attention-backend FLASHINFER`
+  and `--enable-flashinfer-autotune`, while longer input sequences (e.g. 8192)
+  can prefer FlashAttention by replacing those flags with `--attention-backend FLASH_ATTN`.


The phrase 'can prefer FlashAttention' is slightly awkward. It is clearer and more direct to state that longer sequences 'should use' FlashAttention for optimal performance. Also, adding commas after 'e.g.' is standard style.

For H200 FP8, choose the attention backend by sequence length for best performance: shorter sequences (e.g., 1024) should keep the command above with `--attention-backend FLASHINFER` and `--enable-flashinfer-autotune`, while longer input sequences (e.g., 8192) should use FlashAttention by replacing those flags with `--attention-backend FLASH_ATTN`.
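
As a rough illustration of the rule above, the sketch below picks the attention flags from an input sequence length. The flag strings are taken from the suggestion; the helper name `attention_backend_flags` and the 4096 cutoff are assumptions made only for this example. The PR gives 1024 and 8192 as sample lengths and does not state where the crossover lies, so the cutoff should be measured on your own workload.

```python
from typing import List

# Hypothetical cutoff: the PR only cites 1024 (short) and 8192 (long) as examples,
# so this value is an assumption and should be tuned by benchmarking.
ASSUMED_LONG_SEQUENCE_THRESHOLD = 4096


def attention_backend_flags(
    input_len: int, threshold: int = ASSUMED_LONG_SEQUENCE_THRESHOLD
) -> List[str]:
    """Return the attention-related CLI flags for a given input sequence length."""
    if input_len < 0:
        raise ValueError("input_len must be non-negative")
    if input_len >= threshold:
        # Longer inputs: swap the FlashInfer flags for FlashAttention.
        return ["--attention-backend", "FLASH_ATTN"]
    # Shorter inputs: keep FlashInfer together with its autotune flag.
    return ["--attention-backend", "FLASHINFER", "--enable-flashinfer-autotune"]


if __name__ == "__main__":
    for length in (1024, 8192):
        print(length, " ".join(attention_backend_flags(length)))
```

The returned list can be appended to the rest of the serve command from the guide; passing a different `threshold` lets you encode whatever crossover your own measurements show.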

add attention backend recommendation

8bb9ae7

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

faradawn requested a review from esmeetu June 5, 2026 04:17

gemini-code-assist Bot reviewed Jun 5, 2026


vercel Bot deployed to Preview June 5, 2026 04:18

esmeetu merged commit 66bc7f2 into vllm-project:main Jun 5, 2026
4 checks passed

esmeetu merged 1 commit into vllm-project:main from faradawn:switch-flashattn


2 participants
