### Is this a duplicate?

### Area

cuda.core

### Is your feature request related to a problem? Please describe.

I would like to be able to use the equivalent of `cuMemCreate`, `cuMemMap`, and friends via a `cuda.core` `MemoryResource`.

### Describe the solution you'd like

I'd like to have a `VMMAllocatedMemoryResource` which I can create on a `Device()`, for which `allocate()` will use the `cuMem*` driver APIs to create memory.
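To illustrate the shape I have in mind (purely hypothetical — `VMMAllocatedMemoryResource`, its constructor, and its arguments do not exist in `cuda.core` today; only `Device` does):

```python
# Hypothetical sketch only -- this class does not exist in cuda.core yet.
from cuda.core.experimental import Device

dev = Device(0)
dev.set_current()

# Proposed: a memory resource backed by cuMemCreate/cuMemMap rather than a mempool.
mr = VMMAllocatedMemoryResource(dev)  # name and signature are illustrative
buf = mr.allocate(1 << 20)            # would call the cuMem* driver APIs underneath
# ... use buf ...
buf.close()
```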
### Describe alternatives you've considered

Currently the only alternative is to use the bindings APIs directly. Since the `cuMem*` functions are synchronous, there's no way to fit this into the mempool-based APIs as-is (this is my current understanding, at least).
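For reference, the bindings-level sequence being wrapped by hand today looks roughly like this (a sketch assuming the cuda-python package; the `_check` and `round_up` helpers are my own, not part of the bindings):

```python
def _check(result):
    """cuda.bindings driver calls return a tuple whose first element is a CUresult."""
    err, *rest = result
    if int(err) != 0:  # CUDA_SUCCESS == 0
        raise RuntimeError(f"driver call failed: {err!r}")
    return rest[0] if len(rest) == 1 else None


def round_up(size: int, granularity: int) -> int:
    """Pad a size up to the allocation granularity required by cuMemCreate."""
    return -(-size // granularity) * granularity


def vmm_alloc(device_ordinal: int, size: int) -> int:
    """Reserve a VA range, create physical memory, map it, and enable access.

    Every call here is synchronous, which is why this sequence does not fit
    the stream-ordered mempool path as-is.
    """
    from cuda.bindings import driver  # requires the cuda-python package

    prop = driver.CUmemAllocationProp()
    prop.type = driver.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
    prop.location.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    prop.location.id = device_ordinal

    gran = _check(driver.cuMemGetAllocationGranularity(
        prop,
        driver.CUmemAllocationGranularity_flags.CU_MEM_ALLOC_GRANULARITY_MINIMUM))
    padded = round_up(size, gran)

    handle = _check(driver.cuMemCreate(padded, prop, 0))       # physical backing
    ptr = _check(driver.cuMemAddressReserve(padded, 0, 0, 0))  # virtual range
    _check(driver.cuMemMap(ptr, padded, 0, handle, 0))         # map physical -> virtual

    desc = driver.CUmemAccessDesc()
    desc.location.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    desc.location.id = device_ordinal
    desc.flags = driver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    _check(driver.cuMemSetAccess(ptr, padded, [desc], 1))      # enable RW access
    return int(ptr)
```

Teardown would then need the matching synchronous `cuMemUnmap`, `cuMemAddressFree`, and `cuMemRelease` calls, which is exactly the lifetime bookkeeping a `MemoryResource` could own.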
### Additional context

This is useful to support NVSHMEM/NCCL external buffer registration, or for more interesting cases like growing allocations without changing pointer addresses, or EGM on Grace Hopper or Grace Blackwell systems.