[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183)#7233

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:release/2.6 from EmmonsCurse:cherry-pick/7183/release/2.6
Apr 8, 2026

Conversation

@EmmonsCurse
Collaborator

Cherry-pick of #7183 (authored by @K11OntheBoat) to release/2.6.

Dev PR: #7183


Motivation

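When deploying a multimodal model, enabling the --deploy-modality 'text' switch yields a clean text-only runtime, with no leftover multimodal components interfering with the service's resources or inference performance. Benefit: with the switch enabled, QPS of the xx multimodal model on a text-only benchmark improved 2.5x.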

Modifications

enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates that the multimodal runtime is in use; enable_mm_runtime=false means a text-only runtime.
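
The relationship between the two flags can be sketched as follows. This is a minimal illustration under stated assumptions, not FastDeploy's actual config code: the class `ModelConfigSketch`, the field `rope_3d`, and the `MIXED` enum member are hypothetical. Only `enable_mm`, `enable_mm_runtime`, `enable_rope_3d_runtime`, and `DeployModality.TEXT` are names that appear in this PR.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    """TEXT is named in the PR; MIXED is a hypothetical default for this sketch."""
    MIXED = "mixed"
    TEXT = "text"


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for the real model config."""
    enable_mm: bool                 # the model has multimodal capability
    rope_3d: bool = False           # hypothetical: the model uses 3D RoPE
    deploy_modality: DeployModality = DeployModality.MIXED

    @property
    def enable_mm_runtime(self) -> bool:
        # The multimodal runtime is used only when the model is multimodal
        # and the deployment is not restricted to text.
        return self.enable_mm and self.deploy_modality is not DeployModality.TEXT

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # 3D RoPE is forced off whenever the runtime is text-only.
        return self.rope_3d and self.enable_mm_runtime


cfg = ModelConfigSketch(enable_mm=True, rope_3d=True, deploy_modality=DeployModality.TEXT)
print(cfg.enable_mm, cfg.enable_mm_runtime, cfg.enable_rope_3d_runtime)  # True False False
```

Keeping `enable_mm` unchanged while deriving the runtime flags separately lets code that only needs to know "is this a multimodal checkpoint" keep working, while runtime paths consult the derived flags.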

Usage or Command

Start the service for a multimodal model with the --deploy-modality 'text' switch.
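
How a launcher might read this switch can be sketched with `argparse`. This is an illustration only: the parser below is hypothetical and is not FastDeploy's argument parser, and the `"mixed"` choice and default are assumptions. Only the `--deploy-modality` flag and its `'text'` value come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical launcher parser that accepts the --deploy-modality switch."""
    parser = argparse.ArgumentParser(description="launcher sketch (not the real CLI)")
    parser.add_argument(
        "--deploy-modality",
        choices=["mixed", "text"],  # "mixed" is an assumed value, not taken from the PR
        default="mixed",
        help="'text' requests a text-only runtime for a multimodal model",
    )
    return parser


# Simulate a command line that passes: --deploy-modality 'text'
args = build_parser().parse_args(["--deploy-modality", "text"])
text_only = args.deploy_modality == "text"
print(text_only)  # True
```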

Accuracy Tests

On the Base model, text-only requests produce identical input tokens and output tokens with --deploy-modality 'text' enabled and disabled.
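
A check of this kind can be sketched as a token-by-token comparison of two runs. The helper names, the dictionary layout (`input_ids`, `output_ids`), and the token IDs below are hypothetical placeholders and are not data from this PR.

```python
from typing import Dict, List, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> int:
    """Return the index of the first differing token, or -1 if the sequences match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No difference in the shared prefix: equal only if the lengths also match.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def runs_match(run_a: Dict[str, List[int]], run_b: Dict[str, List[int]]) -> bool:
    """True when both the input and the output token IDs are identical."""
    return all(first_mismatch(run_a[k], run_b[k]) == -1 for k in ("input_ids", "output_ids"))


# Placeholder token IDs, made up for illustration only.
with_flag = {"input_ids": [1, 15, 27], "output_ids": [42, 7, 2]}
without_flag = {"input_ids": [1, 15, 27], "output_ids": [42, 7, 2]}
print(runs_match(with_flag, without_flag))  # True
```

Reporting the first mismatching index, rather than only a boolean, makes it easier to see where two runs diverge if they ever do.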

Checklist

  • Add at least a tag in the PR title.
  • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
@paddle-bot

paddle-bot bot commented Apr 8, 2026

Thanks for your contribution!

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-08 11:55 CST

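📋 Review Summary

PR overview: Enables a text-only deployment mode for multimodal models. The --deploy-modality 'text' switch gives a multimodal model a text-only runtime, improving text-only inference performance (2.5x QPS).

Scope of changes: config.py, engine/, worker/, model_executor/layers/attention/, input/, output/, spec_decode/

Impact tags: [Optimization] [DataProcessor] [KVCache]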

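📝 PR Convention Check

The PR template check did not pass. The issues are listed below:

Suggested title (can be copied directly):

  • [Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) (already in place, conforms to the convention)

Checklist issues

  • Add at least a tag in the PR title. ✅ Present: [Cherry-Pick][Optimization]
  • Format your code, run pre-commit before commit. ✅ Pre-commit passed
  • Add unit tests. ⚠️ No unit tests added
  • Provide accuracy results. ✅ The PR description states that this was verified
  • If the current PR is submitting to the release branch... ✅ Cherry-pick PR

Note: The Approval check failed because the fastdeploy/spec_decode directory was modified, which requires approval from specific developers (@freeliuzc, @Deleter-D).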

Issues

Level | File | Summary
🟡 Suggestion | fastdeploy/worker/input_batch.py:229-231 | The image_features initialization logic could be improved

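Overall assessment

The code logic is correct and the design is sound. By adding the enable_mm_runtime and enable_rope_3d_runtime properties, the PR implements the text-only deployment optimization for multimodal models. The model_config.enable_mm replacements on all critical paths have been completed correctly, and the logic that forces 3D RoPE off in TEXT mode is clear.

Consider adding unit tests for the DeployModality.TEXT mode and confirming the approvals required by the Approval check.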

)
if self.is_mm_model:
    self.image_features = None
    self.image_features_list = None

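🟡 Suggestion: The initialization logic for image_features and image_features_list could be clearer.

The current logic explicitly sets them to None only when enable_mm=False and is_mm_model=True. Although the is not None checks in _prepare_inputs and swap_states keep this safe, explicitly initializing them in every branch would improve readability and robustness.

Consider adding the following in the else branch as well: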

self.image_features = None  # Built before the forward
self.image_features_list = None
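
One way to guarantee that both attributes exist in every branch is to initialize them unconditionally before any branching. The sketch below assumes this approach. The class `InputBatchSketch` and its constructor arguments are hypothetical and do not reproduce the real code in fastdeploy/worker/input_batch.py.

```python
from typing import Any, List, Optional


class InputBatchSketch:
    """Hypothetical stand-in for the batch object discussed in the review."""

    def __init__(self, enable_mm_runtime: bool, is_mm_model: bool) -> None:
        self.enable_mm_runtime = enable_mm_runtime
        self.is_mm_model = is_mm_model
        # Initialize unconditionally so the attributes exist on every path,
        # whichever combination of flags is in effect.
        self.image_features: Optional[Any] = None  # built before the forward pass
        self.image_features_list: Optional[List[Any]] = None

    def has_image_features(self) -> bool:
        # Mirrors the "is not None" style of check mentioned in the review.
        return self.image_features is not None


# Every flag combination yields an object with both attributes defined.
batches = [InputBatchSketch(mm_rt, is_mm) for mm_rt in (True, False) for is_mm in (True, False)]
print(all(hasattr(b, "image_features_list") for b in batches))  # True
```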

@codecov-commenter

Codecov Report

❌ Patch coverage is 62.00000% with 19 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@36909bf). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
fastdeploy/worker/input_batch.py | 41.17% | 8 Missing and 2 partials ⚠️
fastdeploy/config.py | 57.14% | 4 Missing and 2 partials ⚠️
fastdeploy/engine/async_llm.py | 0.00% | 0 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py | 66.66% | 0 Missing and 1 partial ⚠️
...executor/layers/attention/dsa_attention_backend.py | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7233   +/-   ##
==============================================
  Coverage               ?   73.86%           
==============================================
  Files                  ?      376           
  Lines                  ?    52903           
  Branches               ?     8251           
==============================================
  Hits                   ?    39079           
  Misses                 ?    11095           
  Partials               ?     2729           
Flag | Coverage Δ
GPU | 73.86% <62.00%> (?)


@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 6b78981 into PaddlePaddle:release/2.6 Apr 8, 2026
32 of 37 checks passed
7 participants