
[Models] add fleet model fallback 2#7964

Merged
gongshaotian merged 6 commits into PaddlePaddle:develop from xiaoguoguo626807:fleet1
Jun 3, 2026
Conversation

@xiaoguoguo626807

Contributor

Motivation

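Adds PaddleFleet as a model inference backend (--model-impl paddlefleet). By replacing core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, the PR reuses FastDeploy's KV Cache and high-performance Attention computation on top of the PaddleFleet model structure. This is a cleaner version of #7732.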

Modifications

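  • config.py: adds the paddlefleet ModelImpl type definition
  • engine/args_utils.py: supports the --model-impl paddlefleet CLI argument and adds validation logic
  • model_executor/models/paddleformers/base_fleet.py: adds the PaddleFleetModelBase base class, the FastDeployAttention layer, and the patch_paddlefleet_core_attention replacement function
  • model_executor/models/paddleformers/__init__.py: registers the PaddleFleetForCausalLM model class
  • test_fallback_fleet_model.py: requires separate PaddleFormers and PaddleFleet dependencies; uses the pytest conftest.py hook mechanism to install them dynamically at test run time, avoiding pollution of the global environment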
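
The core_attention replacement described above can be pictured with a small, framework-free sketch. Every class and function here (CoreAttention, SelfAttention, TransformerLayer, patch_core_attention, and this toy FastDeployAttention) is a hypothetical stand-in written for illustration; the real PaddleFleet and FastDeploy classes, attribute names, and signatures in base_fleet.py may differ.

```python
# Hypothetical stand-ins for illustration only; not the PaddleFleet/FastDeploy API.
class CoreAttention:
    """Placeholder for the default attention computation of a layer."""

    def __call__(self, q, k, v, forward_meta=None):
        return "default-attention"


class FastDeployAttention:
    """Placeholder for an attention layer backed by a KV-cache-aware kernel."""

    def __init__(self, layer_id):
        self.layer_id = layer_id

    def __call__(self, q, k, v, forward_meta=None):
        if forward_meta is None:
            raise ValueError("forward_meta is required")
        return f"fastdeploy-attention(layer={self.layer_id})"


class SelfAttention:
    def __init__(self):
        self.core_attention = CoreAttention()


class TransformerLayer:
    def __init__(self):
        self.self_attention = SelfAttention()


def patch_core_attention(layers):
    """Swap each layer's core_attention in place and return how many were patched."""
    patched = 0
    for layer_id, layer in enumerate(layers):
        attn = getattr(layer, "self_attention", None)
        # Skip layers that do not expose the expected attribute.
        if attn is None or not hasattr(attn, "core_attention"):
            continue
        attn.core_attention = FastDeployAttention(layer_id)
        patched += 1
    return patched


layers = [TransformerLayer() for _ in range(3)]
num_patched = patch_core_attention(layers)
result = layers[1].self_attention.core_attention(None, None, None, forward_meta={})
```

The point of the pattern is that the surrounding layer structure stays untouched: only the attribute holding the attention computation is rebound, so the rest of the model keeps calling the same attribute name.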

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet
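
As a rough illustration of how a --model-impl flag with validation might be wired up, here is a minimal argparse sketch. It is not the code in engine/args_utils.py: the parser name, the "auto" default, and the set of accepted values are assumptions, and only the value paddlefleet comes from this PR.

```python
import argparse

# Hypothetical set of accepted values; only "paddlefleet" is taken from the PR.
MODEL_IMPL_CHOICES = ("auto", "paddlefleet")


def build_parser():
    """Build a toy parser that rejects unknown --model-impl values."""
    parser = argparse.ArgumentParser(prog="api_server_sketch")
    parser.add_argument("--model", required=True, help="Path to the model directory.")
    parser.add_argument(
        "--model-impl",
        default="auto",
        choices=MODEL_IMPL_CHOICES,
        help="Which model implementation to use.",
    )
    return parser


args = build_parser().parse_args(
    ["--model", "/path/to/model", "--model-impl", "paddlefleet"]
)
```

With choices set, argparse itself exits with a usage error on an unrecognized value, which is one simple way to get the kind of validation the PR description mentions.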

Accuracy Tests

N/A (this PR adds the PaddleFleet inference backend; logits alignment data against a reference implementation has not been provided yet)

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Jun 2, 2026


Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 2, 2026


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-02 20:14:56

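The CI report is generated from the following code (updated every 30 minutes):
PR commit: 80e85f8 | Merge base: 4474188 (branch: develop)


1 Required jobs: 8/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 41(0) | 41 | 37 | 2 | 1 | 1 | 0 |

| Job | Error type | Confidence | Log |
| --- | --- | --- | --- |
| Approval | Approval required | High | Job |

2 Failure details

🔴 Approval: approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is completed.

Suggested action: have the job approved manually.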

@codecov-commenter

codecov-commenter commented Jun 2, 2026


Codecov Report

❌ Patch coverage is 96.91358% with 10 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3fe8f7c). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../model_executor/models/paddleformers/base_fleet.py | 96.68% | 1 Missing and 9 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7964   +/-   ##
==========================================
  Coverage           ?   67.56%           
==========================================
  Files              ?      468           
  Lines              ?    65903           
  Branches           ?    10169           
==========================================
  Hits               ?    44525           
  Misses             ?    18540           
  Partials           ?     2838           

| Flag | Coverage Δ |
| --- | --- |
| GPU | 77.67% <96.91%> (?) |
| XPU | 7.02% <0.30%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-03 13:56:29

📋 Review summary

PR overview: adds paddlefleet as a model inference backend; by replacing core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, it reuses the KV Cache and high-performance Attention
Change scope: model_executor/models/paddleformers/, model_executor/graph_optimization/, engine/args_utils.py, config.py, tests/model_executor/fallback/
Impact tags: [Models] [FDConfig] [Graph Optimization] [CI] [Engine]

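Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | fastdeploy/model_executor/models/paddleformers/base_fleet.py | assert is used for runtime validation; it is silently disabled under Python -O |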
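
The assert finding above rests on standard Python behavior: assert statements are compiled out when the interpreter runs with -O, so a check written as an assert no longer guards anything. The sketch below contrasts the two styles; the function names and the dictionary shape of forward_meta are hypothetical and are not taken from base_fleet.py.

```python
def forward_with_assert(forward_meta):
    # Under `python -O` this assert is stripped, so None reaches the next line
    # and fails later with a less helpful error.
    assert forward_meta is not None, "forward_meta is required"
    return forward_meta["step"]


def forward_with_raise(forward_meta):
    # An explicit check survives every optimization level.
    if forward_meta is None:
        raise ValueError("forward_meta is required")
    return forward_meta["step"]


step = forward_with_raise({"step": 3})

try:
    forward_with_raise(None)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Which exception type to raise is a design choice; ValueError is used here only as an example.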

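Status of earlier findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | args_utils.py help string concatenation is missing a separating space | ⚠️ Still present |
| F2 | graph_opt_backend does not support *args; may raise TypeError when CUDAGraph is enabled | 🔄 Partially fixed |
| F3 | Direct access to multi_latent_attention may raise AttributeError | ✅ Fixed |
| F4 | Version number logged in conftest.py does not match the version actually installed | ⚠️ Still present |
| F5 | graph_opt_backend(*args, **kwargs) passes *args, but GraphOptBackend.__call__ accepts only **kwargs; still raises TypeError when CUDAGraph is enabled | ⚠️ Still present |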
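
The failure mode described in F2 and F5 can be reproduced in isolation: a callable whose __call__ takes only **kwargs raises TypeError as soon as any positional argument is forwarded to it. The sketch below uses hypothetical stand-ins (KwargsOnlyBackend, forward_positional, forward_bound, and the parameter name input_ids); it is not the GraphOptBackend code, and binding positionals to names is just one possible workaround, not necessarily the fix the reviewers have in mind.

```python
class KwargsOnlyBackend:
    """Hypothetical stand-in for a callable that accepts keyword arguments only."""

    def __call__(self, **kwargs):
        return sorted(kwargs)


def forward_positional(backend, *args, **kwargs):
    # Forwards positionals unchanged; fails whenever args is non-empty.
    return backend(*args, **kwargs)


def forward_bound(backend, param_names, *args, **kwargs):
    # Bind positionals to names first so the backend only ever sees keywords.
    if len(args) > len(param_names):
        raise TypeError("too many positional arguments")
    bound = dict(zip(param_names, args))
    overlap = bound.keys() & kwargs.keys()
    if overlap:
        raise TypeError(f"duplicate arguments: {sorted(overlap)}")
    bound.update(kwargs)
    return backend(**bound)


backend = KwargsOnlyBackend()

try:
    forward_positional(backend, 1, forward_meta={})
    positional_failed = False
except TypeError:
    positional_failed = True

names_seen = forward_bound(backend, ("input_ids",), 1, forward_meta={})
```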

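📝 PR convention check

Compliant.

Overall assessment

F3 (MLA AttributeError) has been fixed, and the overall implementation logic is clear. Earlier findings F1 and F4 remain unfixed, and the core problem in F5 (GraphOptBackend.__call__ does not accept *args) remains, so enabling paddlefleet on the CUDAGraph path raises TypeError. Suggest following up, or stating explicitly in the PR that paddlefleet does not yet support CUDAGraph mode. In the new code, assert forward_meta is not None should be replaced with an explicit raise.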
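
For context on F1: Python joins adjacent string literals with no separator, so a help string split across lines loses the space between sentences unless one of the pieces carries it. The help text below is invented for illustration and is not the actual string in args_utils.py.

```python
# Hypothetical help text; the real string in args_utils.py is not shown in this PR page.
help_broken = (
    "Which model implementation to use."
    "Pass 'paddlefleet' to select the PaddleFleet backend."
)

help_fixed = (
    "Which model implementation to use. "  # trailing space separates the sentences
    "Pass 'paddlefleet' to select the PaddleFleet backend."
)
```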

@gongshaotian gongshaotian left a comment

Collaborator

LGTM for GraphOptBackend

@xiaoguoguo626807

Contributor Author

/re-run all-failed

@gongshaotian gongshaotian merged commit f161fea into PaddlePaddle:develop Jun 3, 2026
44 of 49 checks passed
@xiaoguoguo626807 xiaoguoguo626807 deleted the fleet1 branch June 3, 2026 07:46
6 participants