[Models] add fleet model fallback 2 #7964
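Thanks for your contribution!
CI report generated from the following code (updated every 30 minutes):
1. Required jobs: 8/10 passed
2. Failure details: 🔴 Approval: approval required (confidence: high). This job requires manual approval; CI continues only after the approval is completed. Suggested action: obtain manual approval.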
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7964 +/- ##
==========================================
Coverage ? 67.56%
==========================================
Files ? 468
Lines ? 65903
Branches ? 10169
==========================================
Hits ? 44525
Misses ? 18540
Partials ? 2838
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-06-03 13:56:29
📋 Review Summary
PR overview: Adds paddlefleet as a model inference backend. It replaces core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel so that the KV Cache and high-performance Attention are reused.
Scope of changes: model_executor/models/paddleformers/, model_executor/graph_optimization/, engine/args_utils.py, config.py, tests/model_executor/fallback/
Impact tags: [Models] [FDConfig] [Graph Optimization] [CI] [Engine]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/model_executor/models/paddleformers/base_fleet.py | assert is used for runtime validation and is silently skipped under Python -O |
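To illustrate the assert finding, here is a minimal sketch rather than the actual base_fleet.py code. The function names are hypothetical, and the built-in compile() with its optimize argument stands in for launching the interpreter with -O. The assert-based check disappears when optimization is on, while an explicit raise keeps working.

```python
# Hypothetical check, modelled loosely on "assert forward_meta is not None".
SOURCE = '''
def check_with_assert(forward_meta):
    assert forward_meta is not None, "forward_meta is required"
    return "passed through"
'''


def load_check(optimize):
    """Compile SOURCE at the given optimization level (1 strips asserts, like -O)."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["check_with_assert"]


def check_with_raise(forward_meta):
    """Explicit validation that does not depend on the optimization level."""
    if forward_meta is None:
        raise ValueError("forward_meta is required")
    return "passed through"


def outcome(check):
    """Call the check with None and report either its result or the exception name."""
    try:
        return check(None)
    except Exception as exc:
        return type(exc).__name__


debug_check = load_check(optimize=0)      # asserts active
optimized_check = load_check(optimize=1)  # asserts removed

print("default:", outcome(debug_check))
print("optimized:", outcome(optimized_check))
print("explicit raise:", outcome(check_with_raise))
```

Under optimization the assert-based check lets None through, which is why the review suggests an explicit raise for runtime validation.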
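Status of Previous Findings
| Finding | Issue | Status |
|---|---|---|
| F1 | args_utils.py help string concatenation is missing a space separator | |
| F2 | graph_opt_backend does not support *args; may raise TypeError when CUDAGraph is enabled | 🔄 Partially fixed |
| F3 | Direct access to multi_latent_attention may raise AttributeError | ✅ Fixed |
| F4 | conftest.py logs a version number that does not match the installed version | |
| F5 | graph_opt_backend(*args, **kwargs) passes *args, but GraphOptBackend.__call__ accepts only **kwargs; still raises TypeError when CUDAGraph is enabled | |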
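📝 PR Convention Check
Compliant.
Overall Assessment
F3 (MLA AttributeError) has been fixed, and the overall implementation logic is clear. Previous findings F1 and F4 remain unfixed, and the core problem in F5 (GraphOptBackend.__call__ does not accept *args) persists: enabling paddlefleet on the CUDAGraph path raises TypeError. Please follow up, or state explicitly in the PR that paddlefleet does not yet support CUDAGraph mode. In the new code, assert forward_meta is not None should be replaced with an explicit raise.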
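The F5 failure mode can be reproduced in isolation. The sketch below does not use the real GraphOptBackend; KwargsOnlyBackend, ArgsAwareBackend, forward, and try_call are hypothetical stand-ins that only show why forwarding positional arguments to a keyword-only __call__ raises TypeError.

```python
class KwargsOnlyBackend:
    """Hypothetical wrapper whose __call__ accepts keyword arguments only."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, **kwargs):
        return self.fn(**kwargs)


class ArgsAwareBackend:
    """Hypothetical wrapper that forwards positional and keyword arguments."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)


def forward(hidden_states, forward_meta=None):
    """Stand-in for a model forward function."""
    return f"forward({hidden_states}, forward_meta={forward_meta})"


def try_call(backend, *args, **kwargs):
    """Invoke the backend and return the result or the TypeError message."""
    try:
        return backend(*args, **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A positional argument reaches a keyword-only __call__ and fails.
print(try_call(KwargsOnlyBackend(forward), "h", forward_meta="m"))
# The same wrapper works when every argument is passed by keyword.
print(try_call(KwargsOnlyBackend(forward), hidden_states="h", forward_meta="m"))
# Accepting *args in __call__ removes the failure.
print(try_call(ArgsAwareBackend(forward), "h", forward_meta="m"))
```

Either fix shown here would work in principle: pass every argument by keyword at the call site, or widen __call__ to accept *args. Which one fits the real code depends on how the CUDAGraph path invokes the backend.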
gongshaotian
left a comment
LGTM for GraphOptBackend
/re-run all-failed
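Motivation
Adds PaddleFleet as a model inference backend (--model-impl paddlefleet). By replacing core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, the PaddleFleet model structure reuses FastDeploy's KV Cache and high-performance Attention computation. This is a cleaned-up version of #7732.
Modifications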
- config.py: adds paddlefleet to the ModelImpl type definition
- engine/args_utils.py: supports the --model-impl paddlefleet CLI argument and adds validation logic
- model_executor/models/paddleformers/base_fleet.py: adds the PaddleFleetModelBase base class, the FastDeployAttention layer, and the patch_paddlefleet_core_attention replacement function
- model_executor/models/paddleformers/__init__.py: registers the PaddleFleetForCausalLM model class
Usage or Command
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet
Accuracy Tests
N/A (this PR adds the PaddleFleet inference backend; logits alignment data against a reference implementation has not been provided yet)
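The core_attention replacement described under Motivation can be pictured as a walk over the transformer layers that swaps one attribute. This is a rough sketch in plain Python, not the PR's patch_paddlefleet_core_attention. Every class and function here is a hypothetical stand-in, and the self_attention attribute name is an assumption about the layer layout.

```python
class FleetCoreAttention:
    """Stand-in for the attention module that ships with the layer."""

    def __call__(self, query, key, value):
        return "fleet core attention"


class SelfAttention:
    """Stand-in attention block that delegates to its core_attention attribute."""

    def __init__(self):
        self.core_attention = FleetCoreAttention()

    def __call__(self, query, key, value):
        return self.core_attention(query, key, value)


class TransformerLayer:
    """Stand-in layer; the self_attention attribute name is assumed."""

    def __init__(self):
        self.self_attention = SelfAttention()


class ServingAttentionStub:
    """Stand-in for the serving-side attention kernel that owns the KV cache."""

    def __init__(self, layer_id):
        self.layer_id = layer_id

    def __call__(self, query, key, value):
        return f"serving attention kernel (layer {self.layer_id})"


def patch_core_attention_sketch(layers, make_attention):
    """Replace core_attention on each layer and return how many were patched."""
    patched = 0
    for layer_id, layer in enumerate(layers):
        attention = getattr(layer, "self_attention", None)
        # Skip layers that do not expose the expected attribute.
        if attention is None or not hasattr(attention, "core_attention"):
            continue
        attention.core_attention = make_attention(layer_id)
        patched += 1
    return patched


layers = [TransformerLayer(), TransformerLayer()]
before = layers[0].self_attention(None, None, None)
patched_count = patch_core_attention_sketch(layers, ServingAttentionStub)
after = layers[1].self_attention(None, None, None)
print(before, "->", after, f"({patched_count} layers patched)")
```

Because the surrounding block only calls self.core_attention, the rest of the model structure is left untouched, which is the property the Motivation relies on.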
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.