.. currentmodule:: cuda.core

cuda.core 0.7.0 Release Notes
=============================

Highlights
----------

- Introduced support for explicit graph construction. CUDA graphs can now be built programmatically by adding nodes and edges, and their topology can be modified after construction.
- Added CUDA-OpenGL interoperability support, enabling zero-copy sharing of GPU memory between CUDA compute kernels and OpenGL renderers.
- Added :class:`TensorMapDescriptor` for TMA (Tensor Memory Accelerator) bulk data movement on Hopper and newer GPUs. Descriptors are integrated with kernel argument handling, so they can be passed directly as kernel arguments.
- :class:`~utils.StridedMemoryView` now supports DLPack export, so it can be consumed through the array API's ``from_dlpack()`` function.

New features
------------

New examples
------------

Fixes and enhancements
----------------------

- Fixed managed memory buffers being misclassified as ``kDLCUDAHost`` in the DLPack device mapping. They are now correctly reported as ``kDLCUDAManaged``. (#1863)
- Fixed IPC-enabled pinned memory pools using a hardcoded NUMA node ID of 0 instead of the NUMA node closest to the active CUDA device. On multi-NUMA systems where the device is attached to a non-zero host NUMA node, this could cause pool creation or allocation failures. (#1603)
- Fixed :attr:`DeviceMemoryResource.peer_accessible_by` returning stale results when wrapping a non-owned (default) memory pool. The property now always queries the CUDA driver for non-owned pools, so multiple wrappers around the same pool see consistent state. (#1720)
- Fixed a bare ``except:`` clause used when accepting stream objects. It silently swallowed all exceptions, including ``KeyboardInterrupt`` and ``SystemExit``. Now only the expected "protocol not supported" case is caught. (#1631)
- :class:`~utils.StridedMemoryView` now validates strides at construction time, so unsupported layouts fail immediately instead of on first metadata access. (#1429)
- IPC file descriptor cleanup now uses a C++ ``std::shared_ptr`` with a POSIX deleter. This avoids cryptic errors when a :class:`DeviceMemoryResource` is destroyed during Python shutdown.
- Improved the error message raised when :class:`ManagedMemoryResource` is constructed without options on platforms that lack a default managed memory pool (e.g. WSL2). (#1617)
- Handle properties on core API objects now return ``None`` during Python shutdown instead of crashing.
- Reduced Python overhead in :class:`Program` and :class:`Linker` by moving compilation and linking operations to the C level and releasing the GIL during backend calls. This benefits workloads that create many programs or linkers, and enables concurrent compilation in multithreaded applications.
- Error enum explanations are now derived from ``cuda-bindings`` docstrings when available (``cuda-bindings`` 12.9.6+ or 13.2.0+), with frozen tables as a fallback for older versions.
- Improved optional dependency handling for NVVM and nvJitLink imports. Only genuinely missing optional modules are treated as unavailable, and unrelated import failures now surface normally. ``cuda.core`` now also depends directly on ``cuda-pathfinder``.
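
The managed-memory DLPack fix (#1863) comes down to which ``DLDeviceType`` a buffer reports. The sketch below is hypothetical and is not cuda.core's implementation. The function name and its boolean inputs are assumptions made for illustration. The enum values are the ones defined in the DLPack specification. The sketch shows one plausible way such a misclassification can arise: managed memory is accessible from both host and device, so it has to be checked before the pinned-host case.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    """Subset of DLPack's DLDeviceType enum (values as defined in dlpack.h)."""

    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAHost = 3
    kDLCUDAManaged = 13


def dlpack_device_type(is_managed: bool, is_device_accessible: bool,
                       is_host_accessible: bool) -> DLDeviceType:
    """Hypothetical mapping from buffer properties to a DLPack device type."""
    # Managed memory is both host- and device-accessible, so test it first;
    # otherwise it would fall through to the pinned-host branch below and be
    # reported as kDLCUDAHost.
    if is_managed:
        return DLDeviceType.kDLCUDAManaged
    if is_device_accessible and is_host_accessible:
        return DLDeviceType.kDLCUDAHost  # pinned (page-locked) host memory
    if is_device_accessible:
        return DLDeviceType.kDLCUDA  # ordinary device memory
    return DLDeviceType.kDLCPU  # pageable host memory


# A managed buffer is reported as kDLCUDAManaged (value 13).
print(dlpack_device_type(True, True, True).name)  # kDLCUDAManaged
```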
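
The NUMA fix (#1603) depends on knowing which host NUMA node the GPU is attached to. On Linux, sysfs exposes a ``numa_node`` file for each PCI device, which is one way to find this. The file contains ``-1`` when the platform reports no affinity. This is a different technique from whatever cuda.core uses internally, which is not shown here. The sketch assumes a Linux sysfs layout and a PCI bus ID in the ``[domain]:[bus]:[device].[function]`` hexadecimal form returned by CUDA's ``cudaDeviceGetPCIBusId``. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def host_numa_node(pci_bus_id: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Hypothetical helper: host NUMA node of a PCI device, or None if unknown.

    Linux-only sketch that reads /sys/bus/pci/devices/<address>/numa_node.
    """
    # sysfs uses lowercase hex and a 4-digit domain, while some tools print an
    # 8-digit domain (e.g. "00000000:3B:00.0"), so normalize the address first.
    domain, rest = pci_bus_id.strip().lower().split(":", 1)
    address = f"{int(domain, 16):04x}:{rest}"
    path = Path(sysfs_root, "bus", "pci", "devices", address, "numa_node")
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # no sysfs entry, or contents that are not an integer
    # -1 means the platform reports no NUMA affinity for this device.
    return node if node >= 0 else None


# On a real Linux machine (the result depends on the hardware):
# host_numa_node("0000:3b:00.0")
```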
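
The ``peer_accessible_by`` fix (#1720) addresses a common caching hazard. When each wrapper caches state that actually lives in a shared, non-owned object, the wrappers drift apart as soon as one of them changes that state. The model below is purely illustrative. ``SharedPool`` stands in for driver-side pool state and ``PoolWrapper`` for a memory-resource object. Neither is a cuda.core or CUDA driver API, and the tuple return type is an assumption of the model.

```python
class SharedPool:
    """Hypothetical stand-in for driver-side pool state (not a CUDA API)."""

    def __init__(self):
        self.peers = set()  # device IDs that currently have access to the pool


class PoolWrapper:
    """Hypothetical wrapper modelling owned vs. non-owned pools."""

    def __init__(self, pool: SharedPool, owned: bool):
        self._pool = pool
        self._owned = owned
        self._cached_peers = set(pool.peers)  # snapshot taken at construction

    @property
    def peer_accessible_by(self):
        if self._owned:
            # Model assumption: only this wrapper modifies an owned pool,
            # so its cached view stays accurate.
            return tuple(sorted(self._cached_peers))
        # A non-owned pool can be changed through other wrappers, so always
        # query the shared ("driver") state instead of trusting the cache.
        return tuple(sorted(self._pool.peers))

    @peer_accessible_by.setter
    def peer_accessible_by(self, devices):
        self._pool.peers = set(devices)
        self._cached_peers = set(devices)


default_pool = SharedPool()
a = PoolWrapper(default_pool, owned=False)
b = PoolWrapper(default_pool, owned=False)
a.peer_accessible_by = [1, 2]
print(b.peer_accessible_by)  # (1, 2): b sees the change made through a
```

If ``b`` returned its construction-time cache instead, it would still report ``()``. That stale value is the kind of result the fix eliminates.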
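
The bare ``except:`` fix (#1631) matters because ``KeyboardInterrupt`` and ``SystemExit`` derive from ``BaseException`` rather than ``Exception``. Only a bare ``except:`` or an ``except BaseException:`` catches them. The sketch below contrasts the two patterns. It models the stream protocol as a ``__cuda_stream__()`` method that returns a ``(version, handle)`` tuple; treat that shape as an assumption. The helper functions and ``ProtocolNotSupported`` are hypothetical and are not cuda.core's code.

```python
class ProtocolNotSupported(TypeError):
    """Hypothetical error: the object does not implement the stream protocol."""


def stream_handle_buggy(obj):
    # Anti-pattern: a bare except swallows *every* exception, including
    # KeyboardInterrupt and SystemExit raised while the protocol method runs.
    try:
        return obj.__cuda_stream__()[1]
    except:  # noqa: E722
        return None


def stream_handle_fixed(obj):
    # Only a missing protocol method is translated into "not supported".
    try:
        method = obj.__cuda_stream__
    except AttributeError:
        raise ProtocolNotSupported(
            f"{type(obj).__name__} does not implement __cuda_stream__"
        ) from None
    # Anything raised by the protocol method itself propagates unchanged.
    _version, handle = method()
    return handle


class GoodStream:
    def __cuda_stream__(self):
        return (0, 1234)  # (version, handle); the values are made up


class InterruptedStream:
    def __cuda_stream__(self):
        raise KeyboardInterrupt  # simulates Ctrl-C arriving during the call


print(stream_handle_buggy(InterruptedStream()))  # None: the Ctrl-C was swallowed
print(stream_handle_fixed(GoodStream()))  # 1234
```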
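
The version rule for error-enum explanations (``cuda-bindings`` 12.9.6+ or 13.2.0+) can be written out as a small check. This is a hypothetical sketch and the function name is made up. It assumes purely numeric ``major.minor.patch`` version strings, so pre-release suffixes are not handled. It also assumes that major versions after 13 keep the docstrings.

```python
def has_error_docstrings(bindings_version: str) -> bool:
    """Hypothetical check: do these cuda-bindings carry error-enum docstrings?"""
    # Pad missing components so "13.2" is treated like "13.2.0".
    parts = [int(p) for p in bindings_version.split(".")[:3]]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    if major == 12:
        return (minor, patch) >= (9, 6)
    if major == 13:
        return (minor, patch) >= (2, 0)
    # Assumption: later majors keep the docstrings; earlier ones lack them.
    return major > 13


for v in ("12.9.5", "12.9.6", "13.1.0", "13.2.0"):
    # Prints False, True, False, True in turn; False means "use the frozen table".
    print(v, has_error_docstrings(v))
```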
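
The NVVM and nvJitLink import change relies on a common Python idiom. When an import fails with ``ModuleNotFoundError``, the exception's ``name`` attribute records which module was missing. Treating the optional module as unavailable only when that name is the module itself, or one of its parent packages, lets unrelated failures surface normally. A broken transitive import is one example of such a failure. This is a minimal sketch, and ``optional_import`` is a hypothetical helper, not cuda.core's API.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import an optional module, returning None only if it is genuinely missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # The optional module itself (or a parent package) is absent:
        # treat it as an unavailable optional dependency.
        if missing and (module_name == missing
                        or module_name.startswith(missing + ".")):
            return None
        # Some *other* module is missing (e.g. a dependency of the optional
        # module): that is a real error, so let it propagate.
        raise


print(optional_import("json") is not None)  # True
print(optional_import("no_such_optional_module"))  # None
```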