[Cpp API Compatibility] Align cuda compat #78808
SigureMo merged 15 commits into PaddlePaddle:develop
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!

/re-run all-failed
Codecov Report ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
##           develop    #78808   +/-   ##
===========================================
  Coverage          ?   100.00%
===========================================
  Files             ?         2
  Lines             ?        23
  Branches          ?         0
===========================================
  Hits              ?        23
  Misses            ?         0
  Partials          ?         0
```

☔ View full report in Codecov by Sentry.
/re-run all-failed

/re-run all-failed
This reverts commit 99029cd.
/re-run all-failed

/re-run all-failed
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
```diff
@@ -129,23 +139,26 @@ struct OptionalCUDAGuard {
   std::optional<Device> current_device() const { return current_device_; }

   void reset() {
     guard_.reset();
     if (original_device_.has_value()) {
       // Always restore to original_device_ to handle external device changes.
       // This matches PyTorch OptionalDeviceGuard semantics.
       phi::backends::gpu::SetDeviceId(
           static_cast<int>(original_device_->index()));
     }
```
OptionalCUDAGuard::~OptionalCUDAGuard() calls reset(), and reset() always restores original_device_ via SetDeviceId whenever original_device_ is set. This can cause the same unintended first-call cudaSetDevice side effects even when no device switch occurred (e.g., set_device() called with the current device). Consider restoring only if the current device differs from original_device_ and/or if the guard actually changed the device, then clear the optionals.
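For reference, a minimal sketch of the conditional restore the comment proposes, assuming the member names from the diff above; this is the reviewer's suggested variant, not the code in this PR:

```cpp
// Hypothetical variant of OptionalCUDAGuard::reset(): restore only when the
// guard actually changed the device, then clear the tracked state.
void reset() {
  if (original_device_.has_value() && current_device_.has_value() &&
      current_device_->index() != original_device_->index()) {
    // Only touch the driver when a real switch happened inside the guard.
    phi::backends::gpu::SetDeviceId(
        static_cast<int>(original_device_->index()));
  }
  original_device_.reset();  // leave the guard in an inert, reusable state
  current_device_.reset();
}
```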
Thanks for the careful read. The behavior matches PyTorch's InlineDeviceGuard: PyTorch's destructor also unconditionally calls impl_.uncheckedSetDevice(original_device_), and the no-op short-circuit happens one layer down inside c10::cuda::SetDevice, which compares cur_device == device via cudaGetDevice before calling cudaSetDevice. Our compat path keeps the same shape: phi::backends::gpu::SetDeviceId performs the same cudaGetDevice-then-compare short-circuit internally, so calling reset() on a never-mutated guard does not actually trigger a cudaSetDevice. Keeping reset() always restore original_device_ is therefore the right call for parity with PyTorch.
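A simplified sketch of that get-then-compare short-circuit (an assumption about the behavior described above, with error handling omitted; the real phi::backends::gpu::SetDeviceId may differ in detail):

```cpp
#include <cuda_runtime.h>

// Restoring to a device the process is already on becomes a genuine no-op:
// the switch is gated on a cheap device query.
void SetDeviceIdSketch(int device_id) {
  int current = -1;
  cudaGetDevice(&current);   // cheap query; no context switch
  if (current == device_id) {
    return;                  // already on the target device
  }
  cudaSetDevice(device_id);  // only touch the driver when needed
}
```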
```diff
 void synchronize(int64_t device_index) {
   TORCH_CHECK(is_available(), "No CUDA GPUs are available");
   auto num_gpus = cuda::device_count();
-  TORCH_CHECK(device_index < 0 || device_index < num_gpus,
-              "Device index out of range: ",
-              device_index);
-  // TODO(yongqiang) need using DeviceGuard
+  TORCH_CHECK(
+      device_index == -1 || (device_index >= 0 && device_index < num_gpus),
+      "Device index out of range: ",
+      device_index);
```
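For context, a hedged sketch of the guard-based synchronize the PR description says replaces the direct calls; the body is an assumed shape rather than the verbatim implementation, and the CUDA-only path stands in for the CUDA/HIP variants:

```cpp
// device_index == -1 synchronizes the current device; any explicit switch
// is scoped in a guard so the current device does not leak to the caller.
void synchronize(int64_t device_index) {
  TORCH_CHECK(is_available(), "No CUDA GPUs are available");
  auto num_gpus = cuda::device_count();
  TORCH_CHECK(
      device_index == -1 || (device_index >= 0 && device_index < num_gpus),
      "Device index out of range: ",
      device_index);
  if (device_index == -1) {
    cudaDeviceSynchronize();  // current device, no state change needed
  } else {
    c10::cuda::CUDAGuard guard(static_cast<int>(device_index));
    cudaDeviceSynchronize();  // guard restores the original device on exit
  }
}
```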
In CPU-only builds, torch::cuda::is_available() currently calls torch::cuda::device_count(), but device_count() throws when CUDA/HIP is not compiled in. This makes the #else branch in synchronize() unreachable and causes synchronize() to throw from device_count() instead of the intended "not compiled with CUDA"/"no GPUs" error path. Consider making device_count() return 0 (or gating is_available() / the pre-checks behind #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)), so is_available() is a non-throwing query and synchronize() reports a consistent error on CPU-only builds.
Let me verify again; the .cpp files were changed last time, so I'll run the ABI Check one more time.
Run the ABI Check one more time.
We could consider taking care of that PR first, so this doesn't have to be done manually every time; let's add it as a review check, @SigureMo and @BingooYang.
static-check passed; the related changes in this PR can be reverted now.
In CPU-only builds, c10::cuda::device_count() / torch::cuda::device_count() previously threw "Cannot visit device count" via PADDLE_THROW. This made is_available() unsafe to call and caused synchronize() to surface the wrong error message. Match PyTorch semantics: return 0 in CPU-only builds so that is_available() returns false and synchronize() falls through the existing TORCH_CHECK(is_available(), "No CUDA GPUs are available") guard. The unreachable #else PADDLE_THROW branch in synchronize() is removed.

Adds three CPU-only regression tests:
- DeviceCountReturnsZeroInCpuOnly
- IsAvailableFalseAndNoThrowInCpuOnly
- SynchronizeReportsNoGpuMessageInCpuOnly

Addresses Copilot review comment 3168115261.
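A minimal sketch of the CPU-only fallback this commit describes, assuming the build macros named above; the helper phi::backends::gpu::GetGPUDeviceCount in the GPU branch is an assumption:

```cpp
// device_count() returns 0 in CPU-only builds instead of throwing, so
// is_available() is a non-throwing query that simply reports false.
int64_t device_count() {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  return static_cast<int64_t>(phi::backends::gpu::GetGPUDeviceCount());
#else
  return 0;  // CPU-only build: no devices, no PADDLE_THROW
#endif
}

bool is_available() { return device_count() > 0; }
```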
```cpp
 private:
  Device original_device_;
  Device current_device_;
  paddle::platform::CUDADeviceGuard guard_;
```
> Remove the dependency on paddle::platform::CUDADeviceGuard in favor of calling phi::backends::gpu::SetDeviceId directly.

By the way, what is the problem with Paddle's guard?
It was found during the codex review of #78707 (review); the unit tests were not thorough enough, or rather the test design at the time failed to expose the problem.
Oh, so it's that issue. But could this be addressed by fixing the guard inside Paddle instead? It looks like it is simply a bug in Paddle's guard, and fixing it there shouldn't break any existing behavior either, right?
Ah... if historical reasons mean the Paddle guard can't be reused directly here, then the current approach is fine; this doesn't look like an easily solvable problem.

PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
Split from #78707.
Align the CUDA compatibility semantics of torch::cuda::synchronize and c10::cuda::CUDAGuard, and fix the related compilation and platform issues:

Rewrite torch::cuda::synchronize
- Use c10::cuda::CUDAGuard instead of the previous direct cudaDeviceSynchronize / hipDeviceSynchronize calls.
- device_index == -1 means synchronize the current device; after an explicit per-device synchronization completes, the modified current device must not leak.

Fix compilation and linking issues
- Guard GPU-only code behind #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP).
- Add PADDLE_API symbol exports to the public torch::cuda APIs.

Refactor c10::cuda::CUDAGuard / OptionalCUDAGuard
- Remove the dependency on paddle::platform::CUDADeviceGuard in favor of calling phi::backends::gpu::SetDeviceId directly.
- Explicitly restore the original device on the set_device, set_index, and reset paths to keep behavior consistent with PyTorch's CUDAGuard (see the usage sketch below).
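For illustration only, a hedged usage sketch of the restore semantics this refactor targets; the device indices are arbitrary and the snippet assumes a process that starts on device 0:

```cpp
// Usage sketch (assumption, not verbatim test code): however many times the
// guard switches devices inside its scope, leaving the scope restores the
// device that was current when the guard was constructed.
{
  c10::cuda::CUDAGuard guard(1);  // switch away from the original device 0
  guard.set_index(0);             // switch again inside the same scope
  guard.set_index(1);
}  // destructor restores device 0, matching PyTorch's CUDAGuard
```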
New unit tests
- Add 4 groups of tests in ATen_CUDAContext_test.cc, verifying that:
  - torch::cuda::synchronize does not leak the current device (a gtest-style sketch follows this list)
  - CUDAGuard correctly restores the original device after multiple switches
  - OptionalCUDAGuard::reset correctly cleans up state

Does this introduce precision changes?
No
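Below is a gtest-style sketch of the first check; it is an assumption about the shape of the test, not the verbatim code in ATen_CUDAContext_test.cc, and the test name and two-device requirement are illustrative:

```cpp
#include <cuda_runtime.h>
#include <gtest/gtest.h>

// Sketch of the "synchronize does not leak the current device" regression
// test described above (hypothetical test name and body).
TEST(ATenCUDAContext, SynchronizeDoesNotLeakCurrentDevice) {
  if (torch::cuda::device_count() < 2) {
    GTEST_SKIP() << "needs at least two CUDA devices";
  }
  int before = -1;
  cudaGetDevice(&before);
  torch::cuda::synchronize(/*device_index=*/1);  // explicit per-device sync
  int after = -1;
  cudaGetDevice(&after);
  EXPECT_EQ(before, after);  // the internal device switch must not leak
}
```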