[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) by EmmonsCurse · Pull Request #7233 · PaddlePaddle/FastDeploy

EmmonsCurse · 2026-04-08T03:25:57Z

Cherry-pick of #7183 (authored by @K11OntheBoat) to release/2.6.

Motivation

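When deploying a multimodal model, passing the --deploy-modality 'text' switch yields a clean text-only runtime, with no leftover multimodal components interfering with the service's resources or inference performance. Benefit: with the switch enabled, the xx multimodal model reaches 2.5x the QPS on a text-only benchmark.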

Modifications

enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates that the multimodal runtime is active; enable_mm_runtime=false means a text-only runtime.
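The split between the two flags can be pictured with a minimal sketch. This is a hypothetical stand-in, not the actual code in fastdeploy/config.py: the class name `ModelConfigSketch`, the `MIXED` enum member, and the way the property is derived are assumptions for illustration. Only `enable_mm`, `enable_mm_runtime`, and the `text` modality come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    """Hypothetical stand-in enum; only TEXT is named in this PR."""

    MIXED = "mixed"  # assumed name for the default multimodal deployment
    TEXT = "text"


@dataclass
class ModelConfigSketch:
    """Illustrative config holder, not FastDeploy's real class."""

    enable_mm: bool  # the model itself has multimodal capability
    deploy_modality: DeployModality = DeployModality.MIXED

    @property
    def enable_mm_runtime(self) -> bool:
        # Assumed derivation: the multimodal runtime is active only when the
        # model supports it and the deployment is not restricted to text.
        return self.enable_mm and self.deploy_modality is not DeployModality.TEXT


text_only = ModelConfigSketch(enable_mm=True, deploy_modality=DeployModality.TEXT)
print(text_only.enable_mm, text_only.enable_mm_runtime)
```

Keeping the capability flag separate from the runtime flag lets code that only cares about what the model can do keep reading `enable_mm`, while code that allocates runtime resources reads `enable_mm_runtime`.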

Usage or Command

Start the multimodal model's service with the --deploy-modality 'text' switch.
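As a rough illustration of how such a switch might be parsed, here is a small `argparse` sketch. It is not FastDeploy's real argument parser: the `--model` argument, the `mixed` choice, and the default value are assumptions. Only the `--deploy-modality` flag and its `text` value appear in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical launcher parser for illustration only."""
    parser = argparse.ArgumentParser(description="hypothetical launcher sketch")
    parser.add_argument("--model", required=True)  # assumed argument
    parser.add_argument(
        "--deploy-modality",
        choices=["mixed", "text"],  # "mixed" is an assumed default name
        default="mixed",
    )
    return parser


# Parse an explicit argv list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--model", "/path/to/model", "--deploy-modality", "text"]
)
print(args.deploy_modality)
```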

Accuracy Tests

With the Base model, text-only requests produce identical input tokens and output tokens whether --deploy-modality 'text' is turned on or off.
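A check of this kind can be sketched as a token-by-token comparison of two runs. The helper name `first_mismatch` and the token ids below are made-up placeholders, not data from this PR; collecting the tokens from a running service is left out.

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token id, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they differ at the
    # position where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Placeholder token ids for illustration only.
baseline_tokens = [101, 7, 42, 9]    # switch off
text_mode_tokens = [101, 7, 42, 9]   # switch on
print(first_mismatch(baseline_tokens, text_mode_tokens))
```

Reporting the first mismatching index, rather than a plain boolean, makes it easier to see where two runs start to diverge.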

Checklist

Add at least a tag in the PR title.
Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

paddle-bot · 2026-04-08T03:26:10Z

Thanks for your contribution!

PaddlePaddle-bot

🤖 AI Code Review | 2026-04-08 11:55 CST

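📋 Review summary

PR overview: Enables a text-only deployment mode for multimodal models. The --deploy-modality 'text' switch gives a multimodal model a text-only runtime, improving text-only inference performance (2.5x QPS).

Scope of changes: config.py, engine/, worker/, model_executor/layers/attention/, input/, output/, spec_decode/

Affected tags: [Optimization] [DataProcessor] [KVCache]

📝 PR convention check

The PR template check did not pass. The issues are listed below:

Suggested title (can be copied directly):

[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) (already in place, conforms to the convention)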

Checklist issues:

Add at least a tag in the PR title. ✅ Already has [Cherry-Pick][Optimization]
Format your code, run pre-commit before commit. ✅ Pre-commit passed
Add unit tests. ⚠️ No unit tests added
Provide accuracy results. ✅ The PR description states that this was verified
If the current PR is submitting to the release branch... ✅ Cherry-pick PR

Note: the Approval check failed because the fastdeploy/spec_decode directory was modified, which requires approval from specific developers (@freeliuzc, @Deleter-D).

Issues

Severity	File	Summary
🟡 Suggestion	`fastdeploy/worker/input_batch.py:229-231`	`image_features` initialization logic could be improved

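Overall assessment

The code logic is correct and the design is sound. By adding the enable_mm_runtime and enable_rope_3d_runtime properties, the PR successfully implements the deployment optimization for multimodal models in text-only mode. All model_config.enable_mm replacements on the critical paths have been made correctly, and the logic that forces 3D RoPE off in TEXT mode is clear.

Recommendation: add unit tests for DeployModality.TEXT mode and confirm the approvals required by the Approval check.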
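A unit test along these lines might look like the sketch below. It exercises a hypothetical helper, `resolve_runtime_flags`, rather than FastDeploy's real config classes: the helper, its parameters, and the rule it encodes (TEXT mode turns off both the multimodal runtime and 3D RoPE) are assumptions drawn from the review text above, not from the actual code.

```python
import unittest


def resolve_runtime_flags(enable_mm: bool, rope_3d: bool, deploy_modality: str) -> dict:
    """Hypothetical helper mirroring the behavior described in the review."""
    text_only = deploy_modality == "text"
    return {
        "enable_mm_runtime": enable_mm and not text_only,
        # Assumed rule: 3D RoPE is forced off whenever TEXT mode is selected.
        "enable_rope_3d_runtime": rope_3d and not text_only,
    }


class TextModeFlagsTest(unittest.TestCase):
    def test_text_mode_disables_mm_runtime_and_rope_3d(self):
        flags = resolve_runtime_flags(True, True, "text")
        self.assertFalse(flags["enable_mm_runtime"])
        self.assertFalse(flags["enable_rope_3d_runtime"])

    def test_other_modality_keeps_flags(self):
        # "mixed" is an assumed name for the non-text deployment mode.
        flags = resolve_runtime_flags(True, True, "mixed")
        self.assertTrue(flags["enable_mm_runtime"])
        self.assertTrue(flags["enable_rope_3d_runtime"])


# Run the suite explicitly so the sketch does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TextModeFlagsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```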

PaddlePaddle-bot · 2026-04-08T03:55:53Z

            )
+            if self.is_mm_model:
+                self.image_features = None
+                self.image_features_list = None


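🟡 Suggestion: the initialization logic for image_features and image_features_list could be clearer.

The current logic explicitly sets them to None only when enable_mm=False and is_mm_model=True. Although the is not None checks in _prepare_inputs and swap_states keep this safe, explicitly initializing them in every branch would improve readability and robustness.

Consider adding the following in the else branch as well:

self.image_features = None  # Built before the forward
self.image_features_list = None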
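The suggestion amounts to giving every instance the two attributes regardless of which branch runs. The sketch below shows that pattern on a hypothetical class, `InputBatchSketch`, which is not the real class in fastdeploy/worker/input_batch.py. Its constructor arguments and the `has_image_inputs` helper are assumptions for illustration.

```python
class InputBatchSketch:
    """Illustrative stand-in showing unconditional attribute initialization."""

    def __init__(self, is_mm_model: bool, enable_mm_runtime: bool):
        self.is_mm_model = is_mm_model
        self.enable_mm_runtime = enable_mm_runtime
        # Initialize in every code path so later `is not None` checks never
        # hit a missing attribute, whatever combination of flags is used.
        self.image_features = None  # built before the forward pass
        self.image_features_list = None

    def has_image_inputs(self) -> bool:
        # Hypothetical helper: safe to call on any instance.
        return self.image_features is not None


batch = InputBatchSketch(is_mm_model=True, enable_mm_runtime=False)
print(batch.has_image_inputs())
```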

codecov-commenter · 2026-04-08T04:57:00Z

Codecov Report

❌ Patch coverage is 62.00000% with 19 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@36909bf). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/input_batch.py	41.17%	8 Missing and 2 partials ⚠️
fastdeploy/config.py	57.14%	4 Missing and 2 partials ⚠️
fastdeploy/engine/async_llm.py	0.00%	0 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py	66.66%	0 Missing and 1 partial ⚠️
...executor/layers/attention/dsa_attention_backend.py	0.00%	1 Missing ⚠️
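The headline figures can be cross-checked against the per-file rows with a little arithmetic. The sketch below assumes that Codecov counts both missing lines and partials as uncovered, which is consistent with the table but is an inference here. The implied number of changed lines is derived from the reported 62% and is not stated in the report itself.

```python
# (missing, partials) per file, copied from the table above.
uncovered = {
    "fastdeploy/worker/input_batch.py": (8, 2),
    "fastdeploy/config.py": (4, 2),
    "fastdeploy/engine/async_llm.py": (0, 1),
    "fastdeploy/engine/common_engine.py": (0, 1),
    "...executor/layers/attention/dsa_attention_backend.py": (1, 0),
}

# Assumption: a partial counts as an uncovered line in the headline number.
total_uncovered = sum(missing + partials for missing, partials in uncovered.values())

patch_coverage = 0.62  # reported patch coverage
# Derived, not reported: changed lines implied by the two figures above.
implied_changed_lines = round(total_uncovered / (1 - patch_coverage))

print(total_uncovered, implied_changed_lines)
```

The per-file sum matches the 19 uncovered lines in the headline, and together with 62% patch coverage it implies roughly 50 changed lines that the coverage tool tracked.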

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7233   +/-   ##
==============================================
  Coverage               ?   73.86%           
==============================================
  Files                  ?      376           
  Lines                  ?    52903           
  Branches               ?     8251           
==============================================
  Hits                   ?    39079           
  Misses                 ?    11095           
  Partials               ?     2729

Flag	Coverage Δ
GPU	`73.86% <62.00%> (?)`


Split enable_mm (PaddlePaddle#7183)

4e0d632

Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

EmmonsCurse had a problem deploying to Metax_ci April 8, 2026 03:26 — with GitHub Actions Failure

EmmonsCurse mentioned this pull request Apr 8, 2026

[Optimization] Enable text-only deployment for multimodal models #7183

Merged

5 tasks

PaddlePaddle-bot reviewed Apr 8, 2026


freeliuzc approved these changes Apr 8, 2026


zhoutianzi666 approved these changes Apr 8, 2026


Jiang-Jia-Jun merged commit 6b78981 into PaddlePaddle:release/2.6 Apr 8, 2026
32 of 37 checks passed

EmmonsCurse deleted the cherry-pick/7183/release/2.6 branch April 12, 2026 11:06


7 participants
