[Cherry-Pick][RL] Support cpu tensor broadcast (#7833) #7840
Thanks for your contribution!
CI report generated from the following code (updated every 30 minutes):
1 Task overview
1 Required task has failed and must be fixed before this PR can be merged.
2 Task status summary
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 23/27 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage below the required threshold (confidence: high)
Fix suggestion summary: add unit tests for the newly added gloo broadcast code, or request an exemption.
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-18 12:15:39
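📋 Review summary
PR overview: Replaces the RL dynamic weight broadcast signal, previously sent with broadcast_object_list (which triggers a GPU kernel plus a DtoH copy), with a pure CPU tensor broadcast over the gloo backend, eliminating unnecessary CUDA synchronization overhead.
Scope of changes: fastdeploy/rl/dynamic_weight_manager.py, fastdeploy/worker/worker_process.py
Impact tag: [RL]
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/rl/dynamic_weight_manager.py:355 | Uses Paddle's private internal API _get_group_map_by_name, which carries a version-compatibility risk; pg.process_group is accessed without a hasattr guard |
CI note:
The Check PR Template workflow failed (exit code 7) because sections such as Modifications are empty; it should pass automatically once the PR description is fixed.
📝 PR convention check
The Modifications, Usage or Command, and Accuracy Tests sections are all empty (only the template comments remain), and no Checklist item is ticked, which causes the Check PR Template CI to fail. The title format [Cherry-Pick][RL] Support cpu tensor broadcast(#7833) conforms to the cherry-pick convention ✓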
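Suggested PR description (can be copied directly):
## Motivation
When Paddle creates a communication group, the default backend is NCCL. With that backend, `paddle.distributed.broadcast` cannot broadcast a CPU tensor, and `paddle.distributed.broadcast_object_list` still launches a GPU kernel and introduces a synchronous DtoH copy. When only a CPU-side signal needs to be broadcast, a separate group can be created with the gloo backend, bypassing NCCL and the GPU entirely; no CUDA kernel or DtoH/HtoD copy shows up in nsys. gloo has lower bandwidth and higher latency than NCCL, but a single signal value is so small that the practical impact is negligible.
## Modifications
- `fastdeploy/worker/worker_process.py`:
  - In `__init__`, create a gloo-backend process group `self.gloo_group` when ranks > 1
  - `_broadcast_model_weights_signal` now uses a CPU int32 tensor with `paddle.distributed.broadcast` on the gloo group, replacing `broadcast_object_list`, which triggered a GPU kernel and a DtoH copy
  - The two call sites in `event_loop_normal` now pass `self.gloo_group`
- `fastdeploy/rl/dynamic_weight_manager.py`:
  - In `clear_parameters`, before the global `shutdown_process_group()`, remove Gloo ProcessGroups that have no `shutdown()` method from Paddle's internal registry to avoid an AttributeError
## Usage or Command
N/A
## Accuracy Tests
N/A (this change only replaces the signal broadcast implementation and does not affect model weights or accuracy)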
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
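- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The implementation approach is clear, and replacing NCCL broadcast_object_list with a gloo-backend CPU broadcast is a reasonable way to avoid GPU synchronization overhead. The main concern is the direct manipulation of Paddle's private internal API in clear_parameters; adding defensive guards is recommended to improve cross-version stability. The PR description also needs to be completed so that the CI template check passes.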
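The gloo-based signal broadcast described in the review can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `create_gloo_group` and `broadcast_cpu_signal` are hypothetical, and the sketch assumes `paddle.distributed.init_parallel_env()` has already been called on every rank by a multi-process launcher.

```python
try:
    import paddle
    import paddle.distributed as dist
except ImportError:  # lets the sketch be imported on a machine without Paddle
    paddle = None
    dist = None


def create_gloo_group():
    """Create a CPU-only gloo process group alongside the default NCCL one.

    Hypothetical helper. Assumes the parallel environment is already
    initialized. Returns None for a single process, where no broadcast
    is needed.
    """
    world_size = dist.get_world_size()
    if world_size <= 1:
        return None
    return dist.new_group(ranks=list(range(world_size)), backend="gloo")


def broadcast_cpu_signal(value: int, src: int, group) -> int:
    """Broadcast one int32 signal from global rank `src` over `group`.

    The tensor is placed on the CPU so that, with a gloo group, the
    broadcast does not go through NCCL.
    """
    signal = paddle.to_tensor([value], dtype="int32", place=paddle.CPUPlace())
    dist.broadcast(signal, src=src, group=group)
    return int(signal.item())
```

Every rank would call `broadcast_cpu_signal` with the same `src` and the group returned by `create_gloo_group`; only the value passed on the source rank matters, and all ranks receive that value as the return.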
    # before the global sweep to avoid AttributeError.
    from paddle.distributed.collective import _get_group_map_by_name

    for name, pg in list(_get_group_map_by_name().items()):
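🟡 Suggestion: uses Paddle's private internal API _get_group_map_by_name, which carries a version-compatibility risk
- _get_group_map_by_name starts with _, so it is a Paddle internal implementation detail outside the public API guarantee. A Paddle upgrade may rename it, remove it, or change its return format, leading to an ImportError or a silent logic failure.
- The next line accesses pg.process_group directly without first checking hasattr(pg, 'process_group'). If the structure of Paddle's internal Group object changes, this raises an AttributeError (the code that the comment says exists "to avoid AttributeError" is itself at risk of raising one).
Suggested defensive guard: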
    try:
        from paddle.distributed.collective import _get_group_map_by_name

        # Iterate over a snapshot so entries can be removed inside the loop.
        for name, pg in list(_get_group_map_by_name().items()):
            proc_group = getattr(pg, "process_group", None)
            # Drop groups whose process group has no shutdown() method.
            if proc_group is not None and not hasattr(proc_group, "shutdown"):
                _get_group_map_by_name().pop(name, None)
    except (ImportError, AttributeError):
        pass  # paddle version without gloo registry; safe to skip
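To see how this guard behaves without a Paddle installation, the same filtering logic can be exercised against a plain dictionary. The sketch below is an illustration only: `FakeNcclProcessGroup`, `FakeGlooProcessGroup`, `FakeGroup`, and `prune_groups_without_shutdown` are hypothetical stand-ins, not Paddle classes or functions.

```python
class FakeNcclProcessGroup:
    """Stand-in for a process group that can be shut down."""

    def shutdown(self):
        pass


class FakeGlooProcessGroup:
    """Stand-in for a gloo process group, which has no shutdown()."""


class FakeGroup:
    """Stand-in for a registry entry that wraps a process group."""

    def __init__(self, process_group):
        self.process_group = process_group


def prune_groups_without_shutdown(group_map):
    """Remove entries whose process group lacks shutdown(); return their names."""
    removed = []
    # Iterate over a snapshot so the dict can be mutated inside the loop.
    for name, group in list(group_map.items()):
        proc_group = getattr(group, "process_group", None)
        if proc_group is not None and not hasattr(proc_group, "shutdown"):
            group_map.pop(name, None)
            removed.append(name)
    return removed


registry = {
    "nccl_default": FakeGroup(FakeNcclProcessGroup()),
    "gloo_signal": FakeGroup(FakeGlooProcessGroup()),
    "legacy": object(),  # entry with no process_group attribute at all
}
removed = prune_groups_without_shutdown(registry)
print(removed)           # ['gloo_signal']
print(sorted(registry))  # ['legacy', 'nccl_default']
```

Only the gloo-like entry is removed. The entry without a `process_group` attribute is skipped rather than raising, which is the case the `getattr(..., None)` guard is meant to cover.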
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7840 +/- ##
==============================================
Coverage ? 72.45%
==============================================
Files ? 381
Lines ? 54155
Branches ? 8460
==============================================
Hits ? 39236
Misses ? 12161
Partials ? 2758
Merged commit 9894b32 into PaddlePaddle:release/2.6
Motivation
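When Paddle creates a communication group, the default backend is NCCL. With that backend, paddle.distributed.broadcast cannot broadcast a CPU tensor, and paddle.distributed.broadcast_object_list still launches a GPU kernel and introduces a synchronous DtoH copy.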
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.