[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) #7233
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
Thanks for your contribution!

fastdeploy-bot left a comment:

🤖 AI Code Review
2026-04-08 11:55 CST
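📋 Review Summary
PR overview: Enables a text-only deployment mode for multimodal models. Passing the `--deploy-modality 'text'` switch to a multimodal model yields a text-only runtime and improves text-only inference performance (2.5x QPS).
Scope of changes: config.py, engine/, worker/, model_executor/layers/attention/, input/, output/, spec_decode/
Impact tags: [Optimization] [DataProcessor] [KVCache]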
📝 PR Convention Check
The PR template check did not pass; the issues are listed below:
Suggested title (ready to copy):
[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) (already present and compliant)
Checklist issues:
- Add at least a tag in the PR title. ✅ Present ([Cherry-Pick][Optimization])
- Format your code, run `pre-commit` before commit. ✅ Pre-commit passed
- Add unit tests. ⚠️ No unit tests added
- Provide accuracy results. ✅ Stated as verified in the PR description
- If the current PR is submitting to the `release` branch... ✅ Cherry-pick PR
Note: the Approval check failed because the PR modifies the fastdeploy/spec_decode directory, which requires approval from specific developers (@freeliuzc, @Deleter-D).
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/worker/input_batch.py:229-231 | image_features initialization logic could be improved |
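Overall Assessment
The code logic is correct and the design is sound. By adding the enable_mm_runtime and enable_rope_3d_runtime properties, the PR lets multimodal models be deployed in an optimized text-only mode. All replacements of model_config.enable_mm on critical paths are complete and correct, and the logic that forces 3D RoPE off in TEXT mode is clear.
Recommend adding unit tests for the DeployModality.TEXT mode and confirming the approvals required by the Approval check.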
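The two runtime properties can be pictured as flags derived from the model's capability plus the requested deploy modality. This is a minimal, hypothetical sketch, not FastDeploy's actual config code: `ConfigSketch`, the `rope_3d` field, and the `MIXED` member are illustrative names; only `enable_mm`, `enable_mm_runtime`, `enable_rope_3d_runtime`, and `DeployModality.TEXT` come from the review. It assumes the runtime flags are pure functions of those inputs.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    # TEXT is the mode named in the PR; MIXED is a hypothetical
    # placeholder for the default (multimodal) deployment.
    MIXED = "mixed"
    TEXT = "text"


@dataclass
class ConfigSketch:
    enable_mm: bool  # model capability: the checkpoint can handle images
    rope_3d: bool  # hypothetical: the model normally uses 3D RoPE
    deploy_modality: DeployModality = DeployModality.MIXED

    @property
    def enable_mm_runtime(self) -> bool:
        # Build the multimodal runtime only for a multimodal model
        # that was not restricted to text at deploy time.
        return self.enable_mm and self.deploy_modality is not DeployModality.TEXT

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # TEXT mode forces 3D RoPE off regardless of the model setting.
        return self.rope_3d and self.deploy_modality is not DeployModality.TEXT


cfg = ConfigSketch(enable_mm=True, rope_3d=True, deploy_modality=DeployModality.TEXT)
print(cfg.enable_mm, cfg.enable_mm_runtime, cfg.enable_rope_3d_runtime)  # True False False
```

Leaving `enable_mm` untouched and deriving a separate runtime flag preserves the distinction the PR draws between what the model can do and what a given deployment actually runs.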
    if self.is_mm_model:
        self.image_features = None
        self.image_features_list = None
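🟡 Suggestion: the initialization logic for image_features and image_features_list could be clearer.
The current logic sets them explicitly to None only when enable_mm=False and is_mm_model=True. Although the `is not None` checks in _prepare_inputs and swap_states keep this safe, explicitly initializing them on every branch would improve readability and robustness.
Suggest adding the following to the else branch as well:

    self.image_features = None  # Built before the forward
    self.image_features_list = None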
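One way to apply this suggestion is to initialize both fields once, before any modality-specific branching, so that no path can leave them unset. The sketch assumes a heavily simplified batch object: `InputBatchSketch`, `max_num_seqs`, and the one-slot-per-sequence list are hypothetical and not taken from fastdeploy/worker/input_batch.py; only the field names and the `is not None` guard in `swap_states` come from the review.

```python
from typing import Any, List, Optional


class InputBatchSketch:
    """Hypothetical stand-in for the batch state in input_batch.py;
    only the image-feature fields are modeled."""

    def __init__(self, max_num_seqs: int, enable_mm_runtime: bool) -> None:
        # Define both fields on every path before any branching.
        self.image_features: Optional[Any] = None  # Built before the forward
        self.image_features_list: Optional[List[Any]] = None
        if enable_mm_runtime:
            # Multimodal runtime only: one feature slot per sequence.
            self.image_features_list = [None] * max_num_seqs

    def swap_states(self, i: int, j: int) -> None:
        # The `is not None` guard makes this a no-op in a text-only runtime.
        if self.image_features_list is not None:
            feats = self.image_features_list
            feats[i], feats[j] = feats[j], feats[i]


text_batch = InputBatchSketch(max_num_seqs=4, enable_mm_runtime=False)
text_batch.swap_states(0, 1)  # safe: nothing to swap
```

Hoisting the initialization above the branch is an alternative to repeating it in the else branch; both give every path a defined attribute.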
Codecov Report
❌ Patch coverage is

    @@              Coverage Diff               @@
    ##           release/2.6    #7233   +/-   ##
    ==============================================
      Coverage             ?   73.86%
    ==============================================
      Files                ?      376
      Lines                ?    52903
      Branches             ?     8251
    ==============================================
      Hits                 ?    39079
      Misses               ?    11095
      Partials             ?     2729
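The coverage figure can be reproduced from the counts in the table. This sketch assumes Codecov's documented defaults: partially covered lines count as not covered, so coverage is hits / (hits + misses + partials), and the result is rounded down to two decimal places. The three counts sum to the reported 52903 lines.

```python
def codecov_percent(hits: int, misses: int, partials: int, precision: int = 2) -> str:
    """Coverage as hits / (hits + misses + partials), rounded down to
    `precision` decimal places, using integer math to avoid float error."""
    lines = hits + misses + partials
    scale = 10 ** precision
    # Percentage scaled by 10**precision and floored: 7386 means 73.86%.
    scaled = hits * 100 * scale // lines
    whole, frac = divmod(scaled, scale)
    return f"{whole}.{frac:0{precision}d}%"


print(codecov_percent(hits=39079, misses=11095, partials=2729))  # 73.86%
```

Ordinary rounding of 39079 / 52903 (about 73.869%) would give 73.87%; the reported 73.86% is consistent with rounding down.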
Merged commit 6b78981 into PaddlePaddle:release/2.6
Cherry-pick of #7183 (authored by @K11OntheBoat) to `release/2.6`.
devPR: #7183
Motivation
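When deploying a multimodal model with the `--deploy-modality 'text'` switch enabled, you get a clean text-only runtime: no superfluous multimodal components compete for the service's resources or degrade inference performance. Benefit: with this switch enabled on an xx multimodal model, QPS on a text-only benchmark improves by 2.5x.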
Modifications
enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates a multimodal runtime; enable_mm_runtime=false means a text-only runtime.
Usage or Command
When launching a service with a multimodal model, add the `--deploy-modality 'text'` switch.
Accuracy Tests
On the Base model, with `--deploy-modality 'text'` turned on and off, the input and output tokens for text-only requests are identical.
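This accuracy check amounts to comparing per-request token ids between two runs. A minimal sketch, assuming deterministic decoding (so the two runs are comparable) and a hypothetical record shape that maps each request id to its (input token ids, output token ids); the PR does not describe how the ids are collected from the server.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: request id -> (input token ids, output token ids),
# collected once with --deploy-modality 'text' and once without it.
RunTokens = Dict[str, Tuple[List[int], List[int]]]


def diff_runs(baseline: RunTokens, text_only: RunTokens) -> List[str]:
    """Return the ids of requests whose input or output tokens differ,
    including requests that appear in only one of the two runs."""
    mismatched = []
    for req_id in sorted(baseline.keys() | text_only.keys()):
        if baseline.get(req_id) != text_only.get(req_id):
            mismatched.append(req_id)
    return mismatched


base = {"r1": ([1, 2, 3], [7, 8]), "r2": ([4], [9])}
assert diff_runs(base, dict(base)) == []  # identical runs: nothing to report
```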
Checklist
- Add at least a tag in the PR title.
  - Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Format your code, run `pre-commit` before commit.
- Add unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.