Commit 0a94b36

Fix PinnedMemoryResource IPC NUMA ID derivation (NVIDIA#1699)
* Refactor _MemPool hierarchy: separate shared pool machinery from device-specific concerns

  Move _dev_id, device_id, and peer_accessible_by from _MemPool into DeviceMemoryResource. Eliminate _MemPoolOptions and refactor pool initialization into freestanding cdef functions (MP_init_create_pool, MP_init_current_pool, MP_raise_release_threshold) for cross-module visibility. Extract __init__ bodies into inline cdef helpers (_DMR_init, _PMR_init, _MMR_init) for consistency and shorter class definitions. Implement device_id as -1 for PinnedMemoryResource and ManagedMemoryResource, since they are not device-bound.

  Made-with: Cursor

* Fix PinnedMemoryResource IPC to derive NUMA ID from active device (NVIDIA#1603)

  PinnedMemoryResource(ipc_enabled=True) hardcoded host NUMA ID 0, causing failures on multi-NUMA systems where the active device is attached to a different NUMA node. Now derives the NUMA ID from the current device's host_numa_id attribute, and adds an explicit numa_id option for manual override. Removes the _check_numa_nodes warning machinery in favor of proper NUMA node selection.

* Add preferred_location_type option and query property to ManagedMemoryResource

  Extends ManagedMemoryResourceOptions with a preferred_location_type field ("device", "host", "host_numa", or None), enabling NUMA-aware managed memory pool placement. Adds a ManagedMemoryResource.preferred_location property to query the resolved setting. Fully backwards-compatible: existing code using preferred_location alone continues to work unchanged.

* Remove redundant Python-side peer access cleanup; fix peer access tests

  - Remove the __dealloc__ and close() override from DeviceMemoryResource that cleared peer access before destruction. The C++ RAII deleter already handles this for owned pools (nvbug 5698116 workaround). For non-owned pools (the default device pool), clearing peer access on handle disposal was incorrect behavior.
  - Update peer access tests to use owned pools (DeviceMemoryResourceOptions()) instead of default pools. Default pools are shared and may have stale peer access state from prior tests, causing test failures.

* Fix DeviceMemoryResource.peer_accessible_by for non-owned pools

  For non-owned (default/current) pools, always query the CUDA driver for peer access state instead of caching. This ensures multiple wrappers around the same shared pool see consistent state. Closes NVIDIA#1720
1 parent 06e6065 commit 0a94b36

14 files changed: +820 −380 lines
Lines changed: 4 additions & 2 deletions

@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0
@@ -7,7 +7,9 @@ from cuda.core._memory._ipc cimport IPCDataForMR


 cdef class DeviceMemoryResource(_MemPool):
-    pass
+    cdef:
+        int _dev_id
+        object _peer_accessible_by


 cpdef DMR_mempool_get_access(DeviceMemoryResource, int)

cuda_core/cuda/core/_memory/_device_memory_resource.pyx

Lines changed: 154 additions & 23 deletions

@@ -1,25 +1,31 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0

 from __future__ import annotations

 from cuda.bindings cimport cydriver
-from cuda.core._memory._memory_pool cimport _MemPool, _MemPoolOptions
+from cuda.core._memory._memory_pool cimport (
+    _MemPool, MP_init_create_pool, MP_raise_release_threshold,
+)
 from cuda.core._memory cimport _ipc
 from cuda.core._memory._ipc cimport IPCAllocationHandle
+from cuda.core._resource_handles cimport (
+    as_cu,
+    get_device_mempool,
+)
 from cuda.core._utils.cuda_utils cimport (
     check_or_create_options,
     HANDLE_RETURN,
 )
+from cpython.mem cimport PyMem_Malloc, PyMem_Free

 from dataclasses import dataclass
 import multiprocessing
 import platform  # no-cython-lint
 import uuid

 from cuda.core._utils.cuda_utils import check_multiprocessing_start_method
-from cuda.core._resource_handles cimport as_cu

 __all__ = ['DeviceMemoryResource', 'DeviceMemoryResourceOptions']

@@ -122,27 +128,12 @@ cdef class DeviceMemoryResource(_MemPool):
     associated MMR.
     """

-    def __init__(self, device_id: Device | int, options=None):
-        from .._device import Device
-        cdef int dev_id = Device(device_id).device_id
-        cdef DeviceMemoryResourceOptions opts = check_or_create_options(
-            DeviceMemoryResourceOptions, options, "DeviceMemoryResource options",
-            keep_none=True
-        )
-        cdef _MemPoolOptions opts_base = _MemPoolOptions()
-
-        cdef bint ipc_enabled = False
-        if opts:
-            ipc_enabled = opts.ipc_enabled
-            if ipc_enabled and not _ipc.is_supported():
-                raise RuntimeError("IPC is not available on {platform.system()}")
-            opts_base._max_size = opts.max_size
-        opts_base._use_current = False
-        opts_base._ipc_enabled = ipc_enabled
-        opts_base._location = cydriver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
-        opts_base._type = cydriver.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
+    def __cinit__(self, *args, **kwargs):
+        self._dev_id = cydriver.CU_DEVICE_INVALID
+        self._peer_accessible_by = None

-        super().__init__(dev_id, opts_base)
+    def __init__(self, device_id: Device | int, options=None):
+        _DMR_init(self, device_id, options)

     def __reduce__(self):
         return DeviceMemoryResource.from_registry, (self.uuid,)
@@ -199,6 +190,7 @@ cdef class DeviceMemoryResource(_MemPool):
             _ipc.MP_from_allocation_handle(cls, alloc_handle))
         from .._device import Device
         mr._dev_id = Device(device_id).device_id
+        mr._peer_accessible_by = ()
         return mr

     def get_allocation_handle(self) -> IPCAllocationHandle:
@@ -215,6 +207,43 @@ cdef class DeviceMemoryResource(_MemPool):
             raise RuntimeError("Memory resource is not IPC-enabled")
         return self._ipc_data._alloc_handle

+    @property
+    def device_id(self) -> int:
+        """The associated device ordinal."""
+        return self._dev_id
+
+    @property
+    def peer_accessible_by(self):
+        """
+        Get or set the devices that can access allocations from this memory
+        pool. Access can be modified at any time and affects all allocations
+        from this memory pool.
+
+        Returns a tuple of sorted device IDs that currently have peer access to
+        allocations from this memory pool.
+
+        When setting, accepts a sequence of Device objects or device IDs.
+        Setting to an empty sequence revokes all peer access.
+
+        For non-owned pools (the default or current device pool), the state
+        is always queried from the driver to reflect changes made by other
+        wrappers or direct driver calls.
+
+        Examples
+        --------
+        >>> dmr = DeviceMemoryResource(0)
+        >>> dmr.peer_accessible_by = [1]  # Grant access to device 1
+        >>> assert dmr.peer_accessible_by == (1,)
+        >>> dmr.peer_accessible_by = []  # Revoke access
+        """
+        if not self._mempool_owned:
+            _DMR_query_peer_access(self)
+        return self._peer_accessible_by
+
+    @peer_accessible_by.setter
+    def peer_accessible_by(self, devices):
+        _DMR_set_peer_accessible_by(self, devices)
+
     @property
     def is_device_accessible(self) -> bool:
         """Return True. This memory resource provides device-accessible buffers."""
@@ -226,6 +255,108 @@ cdef class DeviceMemoryResource(_MemPool):
         return False


+cdef inline _DMR_query_peer_access(DeviceMemoryResource self):
+    """Query the driver for the actual peer access state of this pool."""
+    cdef int total
+    cdef cydriver.CUmemAccess_flags flags
+    cdef cydriver.CUmemLocation location
+    cdef list peers = []
+
+    with nogil:
+        HANDLE_RETURN(cydriver.cuDeviceGetCount(&total))
+
+    location.type = cydriver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
+    for dev_id in range(total):
+        if dev_id == self._dev_id:
+            continue
+        location.id = dev_id
+        with nogil:
+            HANDLE_RETURN(cydriver.cuMemPoolGetAccess(&flags, as_cu(self._h_pool), &location))
+        if flags == cydriver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE:
+            peers.append(dev_id)
+
+    self._peer_accessible_by = tuple(sorted(peers))
+
+
+cdef inline _DMR_set_peer_accessible_by(DeviceMemoryResource self, devices):
+    from .._device import Device
+
+    cdef set[int] target_ids = {Device(dev).device_id for dev in devices}
+    target_ids.discard(self._dev_id)
+    this_dev = Device(self._dev_id)
+    cdef list bad = [dev for dev in target_ids if not this_dev.can_access_peer(dev)]
+    if bad:
+        raise ValueError(f"Device {self._dev_id} cannot access peer(s): {', '.join(map(str, bad))}")
+    if not self._mempool_owned:
+        _DMR_query_peer_access(self)
+    cdef set[int] cur_ids = set(self._peer_accessible_by)
+    cdef set[int] to_add = target_ids - cur_ids
+    cdef set[int] to_rm = cur_ids - target_ids
+    cdef size_t count = len(to_add) + len(to_rm)
+    cdef cydriver.CUmemAccessDesc* access_desc = NULL
+    cdef size_t i = 0
+
+    if count > 0:
+        access_desc = <cydriver.CUmemAccessDesc*>PyMem_Malloc(count * sizeof(cydriver.CUmemAccessDesc))
+        if access_desc == NULL:
+            raise MemoryError("Failed to allocate memory for access descriptors")
+
+        try:
+            for dev_id in to_add:
+                access_desc[i].flags = cydriver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
+                access_desc[i].location.type = cydriver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
+                access_desc[i].location.id = dev_id
+                i += 1
+
+            for dev_id in to_rm:
+                access_desc[i].flags = cydriver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_NONE
+                access_desc[i].location.type = cydriver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
+                access_desc[i].location.id = dev_id
+                i += 1
+
+            with nogil:
+                HANDLE_RETURN(cydriver.cuMemPoolSetAccess(as_cu(self._h_pool), access_desc, count))
+        finally:
+            if access_desc != NULL:
+                PyMem_Free(access_desc)
+
+    self._peer_accessible_by = tuple(target_ids)
+
+
+cdef inline _DMR_init(DeviceMemoryResource self, device_id, options):
+    from .._device import Device
+    cdef int dev_id = Device(device_id).device_id
+    cdef DeviceMemoryResourceOptions opts = check_or_create_options(
+        DeviceMemoryResourceOptions, options, "DeviceMemoryResource options",
+        keep_none=True
+    )
+    cdef bint ipc_enabled = False
+    cdef size_t max_size = 0
+
+    self._dev_id = dev_id
+
+    if opts is not None:
+        ipc_enabled = opts.ipc_enabled
+        if ipc_enabled and not _ipc.is_supported():
+            raise RuntimeError(f"IPC is not available on {platform.system()}")
+        max_size = opts.max_size
+
+    if opts is None:
+        self._h_pool = get_device_mempool(dev_id)
+        self._mempool_owned = False
+        MP_raise_release_threshold(self)
+    else:
+        self._peer_accessible_by = ()
+        MP_init_create_pool(
+            self,
+            cydriver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE,
+            dev_id,
+            cydriver.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED,
+            ipc_enabled,
+            max_size,
+        )
+
+
 # Note: this is referenced in instructions to debug nvbug 5698116.
 cpdef DMR_mempool_get_access(DeviceMemoryResource dmr, int device_id):
     """

cuda_core/cuda/core/_memory/_ipc.pyx

Lines changed: 4 additions & 0 deletions

@@ -197,6 +197,10 @@ cdef _MemPool MP_from_allocation_handle(cls, alloc_handle):
     uuid = getattr(alloc_handle, 'uuid', None)  # no-cython-lint
     mr = registry.get(uuid)
     if mr is not None:
+        if not isinstance(mr, cls):
+            raise TypeError(
+                f"Registry contains a {type(mr).__name__} for uuid "
+                f"{uuid}, but {cls.__name__} was requested")
         return mr

     # Ensure we have an allocation handle. Duplicate the file descriptor, if
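The guard added above keeps a uuid collision from silently handing back a resource of the wrong class. The pattern can be shown with a minimal sketch (the `registry` dict, the class names, and `from_registry` are illustrative stand-ins, not the module's actual internals):

```python
class DeviceMR:
    """Illustrative stand-in for a device-bound memory resource."""


class PinnedMR:
    """Illustrative stand-in for a pinned-host memory resource."""


registry: dict[str, object] = {}


def from_registry(cls, uuid: str):
    """Return the cached resource for uuid, verifying its class first."""
    mr = registry.get(uuid)
    if mr is not None:
        if not isinstance(mr, cls):
            raise TypeError(
                f"Registry contains a {type(mr).__name__} for uuid "
                f"{uuid}, but {cls.__name__} was requested")
        return mr
    mr = cls()  # the real code constructs from the allocation handle here
    registry[uuid] = mr
    return mr


registry["abc"] = DeviceMR()
assert from_registry(DeviceMR, "abc") is registry["abc"]
try:
    from_registry(PinnedMR, "abc")  # same uuid, wrong class
except TypeError as exc:
    assert "DeviceMR" in str(exc)
```

Without the isinstance check, the mismatched object would be returned and fail later with a confusing attribute error instead of a clear TypeError at lookup time.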
Lines changed: 4 additions & 2 deletions

@@ -1,9 +1,11 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0

 from cuda.core._memory._memory_pool cimport _MemPool


 cdef class ManagedMemoryResource(_MemPool):
-    pass
+    cdef:
+        str _pref_loc_type
+        int _pref_loc_id
