Commit 0a94b36
Fix PinnedMemoryResource IPC NUMA ID derivation (NVIDIA#1699)
* Refactor _MemPool hierarchy: separate shared pool machinery from device-specific concerns
Move _dev_id, device_id, and peer_accessible_by from _MemPool into
DeviceMemoryResource. Eliminate _MemPoolOptions and refactor pool
initialization into freestanding cdef functions (MP_init_create_pool,
MP_init_current_pool, MP_raise_release_threshold) for cross-module
visibility. Extract __init__ bodies into inline cdef helpers (_DMR_init,
_PMR_init, _MMR_init) for consistency and shorter class definitions.
Implement device_id as -1 for PinnedMemoryResource and
ManagedMemoryResource, since they are not bound to a single device.
Made-with: Cursor
* Fix PinnedMemoryResource IPC to derive NUMA ID from active device (NVIDIA#1603)
PinnedMemoryResource(ipc_enabled=True) hardcoded host NUMA ID 0, causing
failures on multi-NUMA systems where the active device is attached to a
different NUMA node. Now derives the NUMA ID from the current device's
host_numa_id attribute, and adds an explicit numa_id option for manual
override. Removes the _check_numa_nodes warning machinery in favor of
proper NUMA node selection.
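The selection rule described above reduces to a small precedence check. The sketch below assumes the caller has already read the active device's `host_numa_id` attribute and passes it in as a plain integer; the function name `resolve_ipc_numa_id` and the `LEGACY_IPC_NUMA_ID` constant are hypothetical and exist only to contrast the old and new behavior.

```python
from typing import Optional

# What the old code used unconditionally, per the commit message.
LEGACY_IPC_NUMA_ID = 0


def resolve_ipc_numa_id(numa_id: Optional[int], device_host_numa_id: int) -> int:
    """Choose the host NUMA node for an IPC-enabled pinned pool (hypothetical helper)."""
    if numa_id is not None:
        # An explicit numa_id option always wins.
        return numa_id
    # Otherwise follow the NUMA node the active device is attached to.
    return device_host_numa_id
```

On a machine where the active device hangs off NUMA node 1, `resolve_ipc_numa_id(None, 1)` returns 1, whereas the legacy code would have used node 0; passing `numa_id=0` restores the old placement explicitly.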
Made-with: Cursor
* Add preferred_location_type option and query property to ManagedMemoryResource
Extends ManagedMemoryResourceOptions with a preferred_location_type field
("device", "host", "host_numa", or None) enabling NUMA-aware managed memory
pool placement. Adds ManagedMemoryResource.preferred_location property to
query the resolved setting. Fully backwards-compatible: existing code using
preferred_location alone continues to work unchanged.
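One way to reason about how `preferred_location_type` combines with the existing `preferred_location` field is as a resolver that maps the pair to a (kind, id) placement. This is a hypothetical sketch, not the cuda.core validation code: in particular, the legacy convention assumed when `preferred_location_type` is None (None means no preference, -1 means host, a non-negative value is a device ordinal) is an assumption made for illustration.

```python
from typing import Optional, Tuple

_VALID_TYPES = ("device", "host", "host_numa", None)


def resolve_preferred_location(
    preferred_location_type: Optional[str],
    preferred_location: Optional[int],
) -> Tuple[Optional[str], Optional[int]]:
    """Return a (kind, id) pair for managed-pool placement (hypothetical sketch)."""
    if preferred_location_type not in _VALID_TYPES:
        raise ValueError(f"invalid preferred_location_type: {preferred_location_type!r}")

    if preferred_location_type is None:
        # Backwards-compatible path: preferred_location alone decides.
        # ASSUMED convention: None = no preference, -1 = host, >= 0 = device.
        if preferred_location is None:
            return (None, None)
        if preferred_location == -1:
            return ("host", None)
        if preferred_location < 0:
            raise ValueError(f"invalid preferred_location: {preferred_location}")
        return ("device", preferred_location)

    if preferred_location_type == "host":
        # Plain host placement needs no id; this sketch ignores any value given.
        return ("host", None)

    # "device" needs a device ordinal, "host_numa" needs a NUMA node id.
    if preferred_location is None:
        raise ValueError(f"{preferred_location_type!r} requires preferred_location")
    return (preferred_location_type, preferred_location)
```

The design point the sketch captures is that the new field only disambiguates what the integer means; code that never sets `preferred_location_type` goes through the unchanged legacy branch.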
Made-with: Cursor
* Remove redundant Python-side peer access cleanup; fix peer access tests
- Remove __dealloc__ and close() override from DeviceMemoryResource
that cleared peer access before destruction. The C++ RAII deleter
already handles this for owned pools (nvbug 5698116 workaround).
For non-owned pools (default device pool), clearing peer access
on handle disposal was incorrect behavior.
- Update peer access tests to use owned pools (DeviceMemoryResourceOptions())
instead of default pools. Default pools are shared and may have stale
peer access state from prior tests, causing test failures.
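The ownership rule behind both bullets can be modeled with a fake driver table: disposing a handle to an owned pool revokes its peer access (as the C++ deleter does before destroying the pool), while disposing a handle to the shared default pool must leave its state alone. `FakeDriver` and `PoolHandle` are hypothetical stand-ins, not cuda.core classes; the leftover entry on the default pool is exactly the stale state that made the old tests order-dependent.

```python
class FakeDriver:
    """Hypothetical stand-in for the driver's per-pool peer-access table."""

    def __init__(self):
        self.peer_access = {}  # pool name -> set of peer device ids

    def set_access(self, pool, peers):
        self.peer_access[pool] = set(peers)


class PoolHandle:
    """Hypothetical RAII-style handle: cleanup only for pools it owns."""

    def __init__(self, driver, pool, owned):
        self.driver = driver
        self.pool = pool
        self.owned = owned

    def close(self):
        if self.owned:
            # Owned pool: revoke peer access before the pool is destroyed
            # (destruction itself is not modeled here).
            self.driver.set_access(self.pool, [])
        # Non-owned (default) pool: other wrappers may rely on its peer
        # access, so disposing this handle changes nothing.
```

After granting device 1 access to both a default pool and an owned pool and closing one handle to each, the owned pool ends up with no peers while the default pool still lists device 1, so a later test reusing the default pool would inherit that grant.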
Made-with: Cursor
* Fix DeviceMemoryResource.peer_accessible_by for non-owned pools
For non-owned (default/current) pools, always query the CUDA driver
for peer access state instead of caching. This ensures multiple
wrappers around the same shared pool see consistent state.
Closes NVIDIA#1720
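The difference between caching and querying shows up as soon as two wrappers share one pool. In the hypothetical model below, `DMRSketch` caches the peer set only for pools it owns and otherwise asks the fake driver every time, so a change made through one wrapper of the default pool is visible through another. The class names and the assumption that a freshly created owned pool starts with no peers are illustrative only.

```python
class FakeDriver:
    """Hypothetical stand-in for the driver's peer-access state per pool."""

    def __init__(self):
        self._access = {}

    def get_access(self, pool):
        return frozenset(self._access.get(pool, ()))

    def set_access(self, pool, peers):
        self._access[pool] = frozenset(peers)


class DMRSketch:
    """Hypothetical model of peer_accessible_by for owned vs non-owned pools."""

    def __init__(self, driver, pool, owned):
        self._driver = driver
        self._pool = pool
        self._owned = owned
        # Assumption: a newly created owned pool has no peer access yet.
        self._cached = frozenset()

    @property
    def peer_accessible_by(self):
        if self._owned:
            # Only this wrapper can change an owned pool, so the cache is reliable.
            return self._cached
        # Shared pool: another wrapper may have changed it, so ask the driver.
        return self._driver.get_access(self._pool)

    @peer_accessible_by.setter
    def peer_accessible_by(self, peers):
        self._driver.set_access(self._pool, peers)
        if self._owned:
            self._cached = frozenset(peers)
```

With two `DMRSketch(driver, "default", owned=False)` wrappers, setting `peer_accessible_by = [1]` on the first makes the second report `frozenset({1})`; a per-wrapper cache would have left the second reporting an empty set.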
Made-with: Cursor

Parent: 06e6065
14 files changed (+820, -380 lines)

File tree:
- cuda_core
- cuda/core/_memory
- docs/source/release
- tests
- memory_ipc