
[Cpp API Compatibility] Align cuda compat #78808

Merged
SigureMo merged 15 commits into PaddlePaddle:develop from youge325:cAlign-cuda-compat
May 1, 2026

Conversation

youge325 (Contributor) commented Apr 25, 2026

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

Split from #78707

Align the CUDA compatibility semantics of torch::cuda::synchronize and c10::cuda::CUDAGuard, and fix related build and platform issues:

  1. Rewrite torch::cuda::synchronize

    • Use c10::cuda::CUDAGuard instead of the previous direct cudaDeviceSynchronize / hipDeviceSynchronize calls.
    • Match PyTorch semantics: device_index == -1 means synchronize the current device; after synchronizing an explicitly given device, the modified current device must not leak.
  2. Fix build and link issues

    • Fix the CPU-only build failure (conditional compilation via #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)).
    • Fix a Windows link error: add PADDLE_API symbol exports to the public torch::cuda APIs.
  3. Refactor c10::cuda::CUDAGuard / OptionalCUDAGuard

    • Remove the dependency on paddle::platform::CUDADeviceGuard and call phi::backends::gpu::SetDeviceId directly.
    • Explicitly restore the original device in the destructor and in the set_device, set_index, and reset paths, ensuring behavior consistent with PyTorch's CUDAGuard.
  4. Add unit tests

    • Add 4 groups of tests in ATen_CUDAContext_test.cc, verifying that:
      • torch::cuda::synchronize does not leak the current device
      • CUDAGuard correctly restores the original device after multiple switches
      • OptionalCUDAGuard::reset correctly cleans up state

Does this cause precision changes

Copilot AI review requested due to automatic review settings April 25, 2026 07:58
paddle-bot Bot commented Apr 25, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot Bot added the contributor External developers label Apr 25, 2026
Copilot AI (Contributor) left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

youge325 (Contributor Author):

/re-run all-failed

codecov-commenter commented Apr 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@4919688). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #78808   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         2           
  Lines              ?        23           
  Branches           ?         0           
===========================================
  Hits               ?        23           
  Misses             ?         0           
  Partials           ?         0           

☔ View full report in Codecov by Sentry.
youge325 (Contributor Author):

/re-run all-failed

youge325 (Contributor Author):

/re-run all-failed

youge325 (Contributor Author):

/re-run all-failed

youge325 (Contributor Author):

/re-run all-failed

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment thread paddle/phi/api/include/compat/torch/csrc/api/include/torch/cuda.cpp Outdated
Comment thread paddle/phi/api/include/compat/c10/cuda/CUDAGuard.h Outdated
Comment thread paddle/phi/api/include/compat/c10/cuda/CUDAGuard.h Outdated
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment on lines 116 to +147
@@ -129,23 +139,26 @@ struct OptionalCUDAGuard {
std::optional<Device> current_device() const { return current_device_; }

void reset() {
guard_.reset();
if (original_device_.has_value()) {
// Always restore to original_device_ to handle external device changes.
// This matches PyTorch OptionalDeviceGuard semantics.
phi::backends::gpu::SetDeviceId(
static_cast<int>(original_device_->index()));
}

Copilot AI Apr 30, 2026


OptionalCUDAGuard::~OptionalCUDAGuard() calls reset(), and reset() always restores original_device_ via SetDeviceId whenever original_device_ is set. This can cause the same unintended first-call cudaSetDevice side effects even when no device switch occurred (e.g., set_device() called with the current device). Consider restoring only if the current device differs from original_device_ and/or if the guard actually changed the device, then clear the optionals.

youge325 (Contributor Author) replied:


Thanks for the careful read. The behavior matches PyTorch's InlineDeviceGuard: PyTorch's destructor also unconditionally calls impl_.uncheckedSetDevice(original_device_), and the no-op short-circuit happens one layer down inside c10::cuda::SetDevice, which compares cur_device == device via cudaGetDevice before calling cudaSetDevice. Our compat path keeps the same shape: phi::backends::gpu::SetDeviceId performs the same cudaGetDevice-then-compare short-circuit internally, so calling reset() on a never-mutated guard does not actually trigger a cudaSetDevice. Keeping reset() always restore original_device_ is therefore the right call for parity with PyTorch.

Comment on lines 39 to +45
void synchronize(int64_t device_index) {
TORCH_CHECK(is_available(), "No CUDA GPUs are available");
auto num_gpus = cuda::device_count();
TORCH_CHECK(device_index < 0 || device_index < num_gpus,
"Device index out of range: ",
device_index);
// TODO(yongqiang) need using DeviceGuard
TORCH_CHECK(
device_index == -1 || (device_index >= 0 && device_index < num_gpus),
"Device index out of range: ",
device_index);

Copilot AI Apr 30, 2026


In CPU-only builds, torch::cuda::is_available() currently calls torch::cuda::device_count(), but device_count() throws when CUDA/HIP is not compiled in. This makes the #else branch in synchronize() unreachable and causes synchronize() to throw from device_count() instead of the intended "not compiled with CUDA"/"no GPUs" error path. Consider making device_count() return 0 (or gating is_available() / the pre-checks behind #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)), so is_available() is a non-throwing query and synchronize() reports a consistent error on CPU-only builds.

Member:

Just to confirm: this PR does not affect ABI compatibility, right?

Contributor Author:

Let me verify again; the last change touched a .cpp file, so I'll run the ABI check one more time.

Member:

"run the ABI check one more time"

Consider handling that other PR first, so this doesn't have to be done separately every time; add it as a review check. @SigureMo @BingooYang

Contributor Author:

ok

Member:

static-check passed; the related changes in this PR can be reverted now.

youge325 added 2 commits May 1, 2026 21:18
In CPU-only builds, c10::cuda::device_count() / torch::cuda::device_count()
previously threw "Cannot visit device count" via PADDLE_THROW. This made
is_available() unsafe to call and caused synchronize() to surface the
wrong error message.

Match PyTorch semantics: return 0 in CPU-only builds so that
is_available() returns false and synchronize() falls through the existing
TORCH_CHECK(is_available(), "No CUDA GPUs are available") guard. The
unreachable #else PADDLE_THROW branch in synchronize() is removed.

Adds three CPU-only regression tests:
- DeviceCountReturnsZeroInCpuOnly
- IsAvailableFalseAndNoThrowInCpuOnly
- SynchronizeReportsNoGpuMessageInCpuOnly

Addresses Copilot review comment 3168115261.
SigureMo (Member) left a comment


LGTMeow 🐾

private:
Device original_device_;
Device current_device_;
paddle::platform::CUDADeviceGuard guard_;
Member:

Remove the dependency on paddle::platform::CUDADeviceGuard and call phi::backends::gpu::SetDeviceId directly

By the way, is there some problem with Paddle's own guard?

Contributor Author:

It was found during the codex review of #78707 (review); the unit tests were not thorough enough, or rather the test design at the time failed to expose the problem.

Member:

Oh, it's that problem.

But could this be done by fixing the guard inside Paddle instead? It looks like it's simply a bug in the Paddle guard? Fixing it shouldn't risk breaking previous behavior either, right?

Contributor Author:

(screenshot of the CUDADeviceGuard implementation)

It should be fixable: every time SetDevice is needed, SetDeviceIndex gets called, so prev_id_ is overwritten repeatedly. prev_id_ should really be set only once, when the CUDADeviceGuard is constructed, and that logic should be split out. It probably never went wrong before because the only usage pattern was a one-shot CUDADeviceGuard, with no repeated SetDevice / SetDeviceIndex calls.

youge325 (Contributor Author), May 1, 2026:

#55498 — it looks like this was intentional by design: the guard does not switch back to the original device on destruction. Let me think about this some more.

Member:

Ah... if there are historical reasons it can't be reused directly, then the current approach is fine; it doesn't look like an easy problem to solve.

@SigureMo SigureMo merged commit de1a390 into PaddlePaddle:develop May 1, 2026
83 of 85 checks passed
@youge325 youge325 deleted the cAlign-cuda-compat branch May 2, 2026 03:23

Labels

contributor External developers
