Python API for HOST_ACCESSIBLE OrtValue allocation#28038
ericcraw wants to merge 1 commit into microsoft:main
Conversation
Adds a memory_info= parameter to OrtValue.ortvalue_from_shape_and_type(), backed by two new C-level factory methods that look up the registered shared allocator via the full OrtMemoryInfo (including mem_type). This is required because the current shared allocator query doesn't include the memory type, making HOST_ACCESSIBLE allocators invisible to Python. UsesCpuMemory() is now used in GetPyObjFromTensor so that tensors in HOST_ACCESSIBLE memory are returned as zero-copy numpy views.
```python
    :param memory_info: An OrtMemoryInfo from an OrtEpDevice (e.g. via
        ep_device.memory_info(OrtDeviceMemoryType.HOST_ACCESSIBLE)). When provided,
        the allocator matching this memory info is used directly, which allows
        allocating HOST_ACCESSIBLE memory for zero-copy numpy interop. The
        device_type, device_id, and vendor_id parameters are ignored when
        memory_info is provided.
    """

    if memory_info is not None:
```
No Python test exercises the new memory_info= parameter or verifies that HOST_ACCESSIBLE OrtValues produce zero-copy numpy views.
Pull request overview
Adds Python-level support for allocating OrtValue tensors using an explicit OrtMemoryInfo (including mem_type) so plugin EP HOST_ACCESSIBLE shared allocators can be selected, enabling zero-copy numpy interop for those tensors.
Changes:
- Update tensor-to-numpy conversion to treat HOST_ACCESSIBLE tensors as CPU-memory-compatible via OrtDevice::UsesCpuMemory().
- Add new pybind factory methods to allocate OrtValue from shape/type using a full OrtMemoryInfo lookup.
- Extend the OrtValue.ortvalue_from_shape_and_type() Python API with an optional memory_info= parameter to route allocations through those new factories.
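The Python-side dispatch can be sketched in plain Python. This is a hedged illustration of the routing only: the returned tuples stand in for the real `C.OrtValue` pybind calls, and the fallback name `legacy_path` is hypothetical.

```python
# Sketch of the dispatch added to OrtValue.ortvalue_from_shape_and_type().
# The string "factories" returned below are stand-ins for the real C.OrtValue
# pybind methods; only the routing logic is illustrated.

def ortvalue_from_shape_and_type(shape, element_type, memory_info=None):
    if memory_info is not None:
        # Integer element types are ONNX TensorProto enum values.
        if isinstance(element_type, int):
            return ("ortvalue_from_shape_and_onnx_type_for_memory_info",
                    shape, element_type, memory_info)
        # Otherwise element_type is a numpy dtype.
        return ("ortvalue_from_shape_and_type_for_memory_info",
                shape, element_type, memory_info)
    # No memory_info: fall through to the pre-existing allocation path.
    return ("legacy_path", shape, element_type)
```

When memory_info is supplied, the element type alone decides which of the two new factories is called; device_type, device_id, and vendor_id play no part in that branch.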
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/python/onnxruntime_pybind_state.cc | Enables zero-copy numpy views for HOST_ACCESSIBLE tensors via UsesCpuMemory(). |
| onnxruntime/python/onnxruntime_pybind_ortvalue.cc | Adds OrtMemoryInfo-based OrtValue allocation factories using shared allocator lookup. |
| onnxruntime/python/onnxruntime_inference_collection.py | Exposes memory_info= on OrtValue.ortvalue_from_shape_and_type() and dispatches to new C++ factories. |
```cpp
auto& env = GetOrtEnv()->GetEnvironment();
AllocatorPtr allocator = env.GetRegisteredSharedAllocator(memory_info);

if (!allocator) {
  throw std::runtime_error("No shared allocator found for the given OrtMemoryInfo.");
}
```
The new OrtValueFromShapeAndTypeWithMemoryInfo throws a generic error when no shared allocator is found. This can be hard to diagnose (e.g., mem_type mismatch between DEFAULT vs HOST_ACCESSIBLE). Consider including key details from the requested memory_info (device type/vendor/id, device mem type, and OrtMemType) in the exception message so callers can see what was looked up.
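The suggested improvement could look roughly like the following. This is a self-contained sketch: the `MemInfo` struct and its field names are hypothetical stand-ins for the fields a real OrtMemoryInfo exposes, and only the message-building pattern is the point.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical mirror of the fields looked up from the requested OrtMemoryInfo.
struct MemInfo {
  std::string device_type;      // e.g. "GPU"
  uint32_t vendor_id;           // e.g. 0x10de
  int device_id;                // e.g. 0
  std::string device_mem_type;  // e.g. "HOST_ACCESSIBLE" vs "DEFAULT"
  std::string ort_mem_type;     // e.g. "Default"
};

// Builds an exception message that echoes what was looked up, so a
// DEFAULT-vs-HOST_ACCESSIBLE mem_type mismatch is visible to the caller.
std::string NoAllocatorMessage(const MemInfo& mi) {
  std::ostringstream oss;
  oss << "No shared allocator found for OrtMemoryInfo (device_type="
      << mi.device_type << ", vendor_id=" << mi.vendor_id
      << ", device_id=" << mi.device_id
      << ", device_mem_type=" << mi.device_mem_type
      << ", ort_mem_type=" << mi.ort_mem_type << ").";
  return oss.str();
}
```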
```python
        if memory_info is not None:
            if isinstance(element_type, int):
                return cls(
                    C.OrtValue.ortvalue_from_shape_and_onnx_type_for_memory_info(
                        shape,
                        element_type,
                        memory_info,
                    )
                )
            return cls(
                C.OrtValue.ortvalue_from_shape_and_type_for_memory_info(
                    shape,
                    element_type,
                    memory_info,
                )
            )
```
The new memory_info allocation path and the UsesCpuMemory() zero-copy numpy conversion path don’t appear to have test coverage. Adding a Python test that allocates an OrtValue using memory_info=ep_device.memory_info(OrtDeviceMemoryType.HOST_ACCESSIBLE) and validates ort_value.numpy() works (and ideally is zero-copy) would protect this behavior and prevent regressions.
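A sketch of how such a test could check the zero-copy property. The onnxruntime calls are assumptions and appear only as comments; the actual aliasing check uses plain numpy (`np.shares_memory`).

```python
# Sketch of a zero-copy check a future test could use. The onnxruntime API
# usage below is hypothetical and shown only in comments; the check itself
# is plain numpy.
import numpy as np

def views_share_memory(view: np.ndarray, base: np.ndarray) -> bool:
    """True when 'view' aliases the same buffer as 'base' (i.e. zero-copy)."""
    return np.shares_memory(view, base)

# A real test would do roughly (hypothetical API usage):
#   mi = ep_device.memory_info(OrtDeviceMemoryType.HOST_ACCESSIBLE)
#   ov = OrtValue.ortvalue_from_shape_and_type([2, 3], np.float32, memory_info=mi)
#   arr = ov.numpy()
#   arr[0, 0] = 42.0  # a mutation through the view should be visible in ov

# Demonstrate the check on a plain numpy view vs. a copy:
base = np.zeros((2, 3), dtype=np.float32)
view = base[:]
copy = base.copy()
```

`np.shares_memory(view, base)` is True for a genuine view and False for a copy, which is exactly the regression a test here would want to catch.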
```cpp
const auto device_type = device.Type();
// Create an numpy array on top of the OrtValue memory, no copy.
if (device_type == OrtDevice::CPU) {
```
Grammar: use "a numpy array" (not "an numpy array").
```diff
- // Create an numpy array on top of the OrtValue memory, no copy.
+ // Create a numpy array on top of the OrtValue memory, no copy.
```
Motivation and Context
Enables zero-copy interop between numpy and OrtValue.
This is a follow-up to #28037.