Commit 73a1aa6
fix(pd): adapting code for hardware compatibility (#5047)
This pull request updates the PaddlePaddle dependencies to nightly
builds, refactors device handling throughout the codebase to support
more flexible device selection (including XPU), and adds explicit flag
management for tests. The most significant changes are grouped below.
**Dependency Updates:**
* Updated PaddlePaddle and PaddlePaddle-GPU installation in
`.github/workflows/test_python.yml` and
`.github/workflows/test_cuda.yml` to use nightly builds
(`3.3.0.dev20251204`) from new package URLs, improving access to the
latest features and fixes.
**Device Handling Refactor:**
* Replaced usage of `paddle.device.cuda.device_count()` and related
CUDA-specific APIs with more general `paddle.device.device_count()` and
`paddle.device.empty_cache()` in multiple locations, enabling support
for devices beyond CUDA (e.g., XPU).
* Updated logic for setting and retrieving device information in
`deepmd/pd/utils/env.py` to use `paddle.device.get_device()`, ensuring
correct device assignment for both CPU and GPU/XPU scenarios.
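The device-agnostic selection described above can be illustrated with a minimal sketch. It assumes device strings of the form `cpu`, `gpu:0`, or `xpu:1`, as returned by `paddle.device.get_device()`. The helpers `parse_device` and `select_device` are hypothetical names for illustration and are not the code in `deepmd/pd/utils/env.py`; the device count is passed in as a plain integer so the example runs without Paddle installed.

```python
from typing import Optional, Tuple


def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split a Paddle-style device string into (backend, index).

    Hypothetical helper: "gpu:0" -> ("gpu", 0), "cpu" -> ("cpu", None).
    """
    backend, sep, index = device.partition(":")
    if not backend:
        raise ValueError(f"empty device string: {device!r}")
    return backend, int(index) if sep else None


def select_device(current: str, local_rank: int, device_count: int) -> str:
    """Pick a device string for this process without assuming CUDA.

    `current` is the string reported by the framework, `device_count` the
    number of accelerators of that backend (both supplied by the caller).
    """
    backend, _ = parse_device(current)
    if backend == "cpu" or device_count <= 0:
        return "cpu"
    # Map the process rank onto the available accelerators of this backend.
    return f"{backend}:{local_rank % device_count}"
```

Because the backend name is taken from the reported device string rather than hard-coded as `gpu`, the same logic covers GPU and XPU.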
**Device Compatibility Improvements:**
* Enhanced `get_generator` in `deepmd/pd/utils/utils.py` to support XPU
devices and added a warning for unsupported device types, improving
compatibility and error messaging.
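One way to structure such a dispatch is sketched below. This is not the implementation in `deepmd/pd/utils/utils.py`: the factories are string-returning stand-ins for backend-specific Paddle generators, the name `get_generator_sketch` is hypothetical, and returning `None` for an unsupported device is an assumption made for this example.

```python
import warnings
from typing import Callable, Dict, Optional

# Stand-in factories (assumption): real code would build a Paddle generator
# for each backend instead of returning a descriptive string.
GENERATOR_FACTORIES: Dict[str, Callable[[int], str]] = {
    "cpu": lambda seed: f"cpu-generator(seed={seed})",
    "gpu": lambda seed: f"gpu-generator(seed={seed})",
    "xpu": lambda seed: f"xpu-generator(seed={seed})",
}


def get_generator_sketch(device: str, seed: Optional[int]) -> Optional[str]:
    """Return a seeded generator for `device`, warning on unknown backends."""
    if seed is None:
        return None
    # "xpu:0" -> "xpu"; a bare "cpu" is left unchanged.
    backend = device.split(":", 1)[0]
    factory = GENERATOR_FACTORIES.get(backend)
    if factory is None:
        warnings.warn(
            f"Unsupported device type {backend!r}; no generator created.",
            stacklevel=2,
        )
        return None
    return factory(seed)
```

Keeping the supported backends in a table means adding another device type only requires a new entry, and unknown types produce a warning instead of failing silently.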
**Test Flag Management:**
* Added explicit management of the `FLAGS_use_stride_kernel` Paddle flag
in `source/tests/pd/test_multitask.py` to ensure proper test isolation
and restore flag values after tests.
* Set `FLAGS_use_stride_compute_kernel` environment variable to `0` in
workflow files to control kernel usage during tests.
**Distributed Training Check:**
* Improved the NCCL initialization check in the distributed training
setup to handle Paddle builds compiled without NCCL, preventing
assertion errors.
## Summary by CodeRabbit
## Release Notes
* **New Features**
  * Added support for XPU device accelerators.
* **Refactor**
  * Improved device detection and initialization to support non-CUDA
    backends and multiple device types with device-agnostic APIs.
* **Chores**
  * Updated testing workflows and dependencies.
  * Enhanced test compatibility for non-GPU environments.
1 parent 0ad7cbf commit 73a1aa6
8 files changed
Lines changed: 43 additions & 10 deletions
File tree
- .github/workflows
- deepmd/pd
  - entrypoints
  - utils
- source/tests/pd