feat: FP8 ViT Attention w/ FlashInfer by AlpinDale · Pull Request #1660 · dphnAI/aphrodite-engine

AlpinDale · 2026-04-28T09:00:46Z

vllm-project/vllm#38065

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cec407bab


chatgpt-codex-connector · 2026-04-28T09:06:28Z

+        mm_cfg = get_multimodal_config()
+        scale_path = mm_cfg.mm_encoder_fp8_scale_path if mm_cfg is not None else None
+        if scale_path is None:


Load static FP8 scales without relying on global config context

process_weights_after_loading fetches mm_encoder_fp8_scale_path via get_multimodal_config(), but this method runs after model init, when set_current_aphrodite_config is no longer active (see BaseModelLoader.load_model, which calls process_weights_after_loading after the init context). On that normal path mm_cfg is None, so scale_path is None even when --mm-encoder-fp8-scale-path is set. The static scales are never loaded and the scale buffers stay at 1.0. Because _fp8_dynamic_scale is already False, the scales are never updated dynamically either.
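One possible way to address this, shown as a minimal self-contained sketch rather than the actual patch: read the scale path in __init__, while the config context is still active, and store it on the layer so that process_weights_after_loading does not need the global config. MultiModalConfig, set_current_config, VitAttention, scales_loaded_from and load_model below are hypothetical stand-ins that only mimic the behavior described in this comment; they are not the Aphrodite classes.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class MultiModalConfig:
    """Hypothetical stand-in for the real multimodal config."""
    mm_encoder_fp8_scale_path: Optional[str] = None


_current_config: Optional[MultiModalConfig] = None


@contextmanager
def set_current_config(cfg: MultiModalConfig) -> Iterator[None]:
    """Mimics a config context that is only active during model init."""
    global _current_config
    previous = _current_config
    _current_config = cfg
    try:
        yield
    finally:
        _current_config = previous


def get_multimodal_config() -> Optional[MultiModalConfig]:
    return _current_config


class VitAttention:
    """Hypothetical layer; only the config handling is modeled."""

    def __init__(self) -> None:
        mm_cfg = get_multimodal_config()
        # Capture the path now, while the config context is still active.
        self._fp8_scale_path = (
            mm_cfg.mm_encoder_fp8_scale_path if mm_cfg is not None else None
        )
        self._fp8_dynamic_scale = self._fp8_scale_path is None
        self.scales_loaded_from: Optional[str] = None

    def process_weights_after_loading(self) -> None:
        # Uses the stored path, so it works outside the config context.
        if self._fp8_scale_path is None:
            return
        # A real implementation would read the scale file here.
        self.scales_loaded_from = self._fp8_scale_path


def load_model(cfg: MultiModalConfig) -> VitAttention:
    """Mimics the loader order: init inside the context, post-processing after."""
    with set_current_config(cfg):
        layer = VitAttention()
    layer.process_weights_after_loading()
    return layer


layer = load_model(MultiModalConfig(mm_encoder_fp8_scale_path="scales.json"))
print(layer.scales_loaded_from, layer._fp8_dynamic_scale)
```

With this ordering the static and dynamic decisions are both derived from the same stored value, so they cannot disagree after the context exits.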


chatgpt-codex-connector · 2026-04-28T09:06:28Z

+        self.fp8_enabled = True
+        self._fp8_dynamic_scale = mm_cfg.mm_encoder_fp8_scale_path is None
+        self.fp8_quant = QuantFP8(static=True, group_shape=GroupShape.PER_TENSOR)


Require FlashInfer backend when enabling FP8 ViT attention

This branch enables FP8 whenever mm_encoder_attn_dtype == "fp8", but it does not ensure that self.attn_backend is FLASHINFER. With default CUDA backend selection, ViT prefers FLASH_ATTN over FLASHINFER, so users can set --mm-encoder-attn-dtype fp8 and still execute the _forward_fa/_forward_sdpa paths with no FP8 quantization at all. The result is a silent config mismatch: the FP8 option appears enabled but is effectively ignored unless the backend is separately forced.
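A guard along the following lines could make the mismatch fail loudly instead of silently. This is a minimal sketch under assumptions: AttnBackend and resolve_fp8_enabled are hypothetical names introduced for illustration, and raising an error (rather than, say, overriding the backend) is one design choice among several.

```python
from enum import Enum, auto
from typing import Optional


class AttnBackend(Enum):
    """Hypothetical stand-in for the backend enum named in the review."""
    FLASH_ATTN = auto()
    FLASHINFER = auto()
    TORCH_SDPA = auto()


def resolve_fp8_enabled(attn_dtype: Optional[str], backend: AttnBackend) -> bool:
    """Return True only when FP8 is requested and the backend can honor it.

    Raises ValueError when FP8 is requested on any other backend, so the
    option is never silently ignored.
    """
    if attn_dtype != "fp8":
        return False
    if backend is not AttnBackend.FLASHINFER:
        raise ValueError(
            "FP8 ViT attention was requested but the selected backend is "
            f"{backend.name}; only FLASHINFER is supported in this sketch."
        )
    return True


print(resolve_fp8_enabled("fp8", AttnBackend.FLASHINFER))
print(resolve_fp8_enabled(None, AttnBackend.FLASH_ATTN))
```

Calling resolve_fp8_enabled("fp8", AttnBackend.FLASH_ATTN) raises ValueError, which surfaces the problem at construction time rather than during a forward pass.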


feat: FP8 ViT Attention w/ FlashInfer

7cec407

Signed-off-by: AlpinDale <alpindale@gmail.com>

AlpinDale merged commit 1241957 into main Apr 28, 2026
1 check failed

AlpinDale deleted the feat/fp8-vit-attn branch April 28, 2026 09:05
