.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: Apache-2.0

.. currentmodule:: cuda.core

``cuda.core`` 0.7.0 Release Notes
=================================


Highlights
----------

- Introduced support for explicit graph construction. CUDA graphs can now be
  built programmatically by adding nodes and edges, and their topology can be
  modified after construction.
- Added CUDA-OpenGL interoperability support, enabling zero-copy sharing of
  GPU memory between CUDA compute kernels and OpenGL renderers.
- Added :class:`TensorMapDescriptor` for Hopper+ TMA (Tensor Memory Accelerator)
  bulk data movement, with automatic kernel argument integration.
- :class:`~utils.StridedMemoryView` now supports DLPack export, so views can
  be consumed zero-copy via the array API ``from_dlpack()`` entry point.


New features
------------

- Added the :mod:`cuda.core.graph` public module containing
  :class:`~graph.GraphDef` for explicit graph construction, typed node
  subclasses, and supporting types. :class:`~graph.GraphBuilder` (stream
  capture) also moves into this module.
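
  As a rough illustration of the explicit-construction flow (a sketch only:
  method names such as ``add_kernel_node``, ``add_edge``, and ``complete`` are
  placeholders inferred from this description, not a verbatim API reference):

  ```python
  # Sketch of explicit graph construction. Node/edge method names are
  # illustrative placeholders, not the verbatim cuda.core API.
  from cuda.core import Device
  from cuda.core.graph import GraphDef

  dev = Device()
  dev.set_current()

  gdef = GraphDef()                  # start from an empty topology
  first = gdef.add_kernel_node(...)  # add nodes programmatically
  second = gdef.add_kernel_node(...)
  gdef.add_edge(first, second)       # 'second' depends on 'first'

  graph = gdef.complete()            # instantiate an executable graph
  graph.launch(dev.default_stream)
  ```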
| 31 | + |
| 32 | +- Added :meth:`~graph.GraphBuilder.callback` for CPU callbacks during stream |
| 33 | + capture, mirroring the existing :meth:`~graph.GraphDef.callback` API. |
| 34 | + |
| 35 | +- Added :class:`GraphicsResource` for CUDA-OpenGL interoperability. |
| 36 | + Factory classmethods :meth:`~GraphicsResource.from_gl_buffer` and |
| 37 | + :meth:`~GraphicsResource.from_gl_image` register OpenGL objects for CUDA |
| 38 | + access, and mapping returns a :class:`Buffer` for zero-copy kernel use. |
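
  A minimal sketch of the interop flow, assuming the factory/mapping shape
  described above (``gl_buffer_id`` is an OpenGL buffer object created
  elsewhere, e.g. with ``glGenBuffers``; the map/unmap call shape is
  illustrative, not a verbatim API reference):

  ```python
  # Sketch only: assumes the from_gl_buffer()/mapping shape described above.
  from cuda.core import Device, GraphicsResource

  dev = Device()
  dev.set_current()
  stream = dev.create_stream()

  resource = GraphicsResource.from_gl_buffer(gl_buffer_id)  # register once
  buf = resource.map(stream)     # returns a cuda.core Buffer, zero-copy
  # ... launch kernels that read/write buf ...
  resource.unmap(stream)         # hand the buffer back to OpenGL
  ```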

- Added :class:`TensorMapDescriptor` wrapping the CUDA driver's ``CUtensorMap``
  for Hopper+ TMA (Tensor Memory Accelerator) bulk data movement.
  :class:`~utils.StridedMemoryView` gains an :meth:`~utils.StridedMemoryView.as_tensor_map`
  method for convenient descriptor creation, with automatic dtype inference, stride
  computation, and first-class kernel argument integration.
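
  A sketch of the convenience path, assuming the ``as_tensor_map()`` shape
  described above (the ``box_shape`` argument name is a placeholder for the
  TMA tile configuration, and ``device_array``, ``stream``, ``config``, and
  ``kernel`` are assumed to exist):

  ```python
  # Sketch only: argument names are placeholders, not the verbatim API.
  from cuda.core import launch
  from cuda.core.utils import StridedMemoryView

  view = StridedMemoryView(device_array, stream_ptr=-1)  # any DLPack/CAI array
  tmap = view.as_tensor_map(box_shape=(64, 64))  # dtype/strides inferred
  launch(stream, config, kernel, tmap)  # passed like any other kernel argument
  ```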

- Added DLPack export support to :class:`~utils.StridedMemoryView` via
  ``__dlpack__`` and ``__dlpack_device__``, complementing the existing import
  path.

- Added the DLPack C exchange API (``__dlpack_c_exchange_api__``) to
  :class:`~utils.StridedMemoryView`.
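
  Because the export path follows the standard DLPack protocol, the round
  trip can be demonstrated with NumPy standing in as both producer and
  consumer (NumPy implements the same ``__dlpack__``/``__dlpack_device__``
  interface that :class:`~utils.StridedMemoryView` now exports):

  ```python
  import numpy as np

  # NumPy arrays implement the same DLPack exporter protocol that
  # StridedMemoryView gains in this release.
  producer = np.arange(6, dtype=np.float32).reshape(2, 3)
  assert hasattr(producer, "__dlpack__")
  assert hasattr(producer, "__dlpack_device__")

  # Any DLPack-aware consumer imports the buffer zero-copy.
  consumer = np.from_dlpack(producer)
  producer[0, 0] = 42.0   # mutate through the producer...
  print(consumer[0, 0])   # ...the consumer sees it: 42.0
  ```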

- Added NVRTC precompiled header (PCH) support (CUDA 12.8+).
  :class:`ProgramOptions` gains ``pch``, ``create_pch``, ``use_pch``,
  ``pch_dir``, and related options. :attr:`Program.pch_status` reports the
  PCH creation outcome, and :meth:`~Program.compile` automatically resizes the NVRTC
  PCH heap and retries when PCH creation fails due to heap exhaustion.
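
  A sketch of the intended flow, assuming the option names listed above
  (this requires an NVRTC 12.8+ toolchain, so it is illustrative rather than
  runnable here; ``code`` stands for any C++ source that shares a stable set
  of headers across compiles):

  ```python
  # Sketch only: option names come from the notes above; needs NVRTC 12.8+.
  from cuda.core import Program, ProgramOptions

  # The first compile creates the PCH; later compiles reuse it from pch_dir.
  opts = ProgramOptions(std="c++17", create_pch=True, pch_dir="./pch")
  prog = Program(code, "c++", options=opts)
  prog.compile("ptx")      # retries with a larger PCH heap on exhaustion
  print(prog.pch_status)   # reports the PCH creation outcome
  ```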

- Added NUMA-aware managed memory pool placement.
  :class:`ManagedMemoryResourceOptions` gains a ``preferred_location_type``
  option (``"device"``, ``"host"``, or ``"host_numa"``), and
  :attr:`ManagedMemoryResource.preferred_location` queries the resolved
  location. The existing ``preferred_location`` parameter remains fully
  backward compatible.

- Added NUMA-aware pinned memory pool placement.
  :class:`PinnedMemoryResourceOptions` gains a ``numa_id`` option, and
  :attr:`PinnedMemoryResource.numa_id` queries the host NUMA node ID used for
  pool placement. When ``ipc_enabled=True`` and ``numa_id`` is not set, the
  NUMA node is automatically derived from the current CUDA device.
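
  A sketch of the two placement paths, assuming the option and property names
  above (the node ID ``1`` is arbitrary, and the constructor shape is
  illustrative):

  ```python
  # Sketch only: names come from the notes above; requires a CUDA device.
  from cuda.core import Device, PinnedMemoryResource, PinnedMemoryResourceOptions

  Device().set_current()

  # Explicit placement: pin the pool to host NUMA node 1.
  mr = PinnedMemoryResource(options=PinnedMemoryResourceOptions(numa_id=1))
  print(mr.numa_id)

  # IPC-enabled pools with no numa_id derive the node from the current device.
  mr_ipc = PinnedMemoryResource(
      options=PinnedMemoryResourceOptions(ipc_enabled=True)
  )
  ```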

- Added support for CUDA 13.2.


New examples
------------

- ``gl_interop_plasma.py``: Real-time plasma effect demonstrating CUDA-OpenGL
  interoperability via :class:`GraphicsResource`.
- ``tma_tensor_map.py``: TMA bulk data movement using
  :class:`TensorMapDescriptor` on Hopper+ GPUs.


Fixes and enhancements
----------------------

- Fixed managed memory buffers being misclassified as ``kDLCUDAHost`` in DLPack
  device mapping. They are now correctly reported as ``kDLCUDAManaged``.
  (`#1863 <https://github.com/NVIDIA/cuda-python/pull/1863>`__)
- Fixed IPC-enabled pinned memory pools using a hardcoded NUMA node ID of ``0``
  instead of the NUMA node closest to the active CUDA device. On multi-NUMA
  systems where the device is attached to a non-zero host NUMA node, this could
  cause pool creation or allocation failures. (`#1603 <https://github.com/NVIDIA/cuda-python/issues/1603>`__)
- Fixed :attr:`DeviceMemoryResource.peer_accessible_by` returning stale results when wrapping
  a non-owned (default) memory pool. The property now always queries the CUDA driver for
  non-owned pools, so multiple wrappers around the same pool see consistent state. (`#1720 <https://github.com/NVIDIA/cuda-python/issues/1720>`__)
- Fixed a bare ``except`` clause in stream acceptance that silently swallowed all exceptions,
  including ``KeyboardInterrupt`` and ``SystemExit``. Only the expected "protocol not
  supported" case is now caught. (`#1631 <https://github.com/NVIDIA/cuda-python/issues/1631>`__)
- :class:`~utils.StridedMemoryView` now validates strides at construction time so unsupported
  layouts fail immediately instead of on first metadata access. (`#1429 <https://github.com/NVIDIA/cuda-python/issues/1429>`__)
- IPC file descriptor cleanup now uses a C++ ``shared_ptr`` with a POSIX deleter, avoiding
  cryptic errors when a :class:`DeviceMemoryResource` is destroyed during Python shutdown.
- Improved the error message when :class:`ManagedMemoryResource` is called without options on platforms
  that lack a default managed memory pool (e.g. WSL2). (`#1617 <https://github.com/NVIDIA/cuda-python/issues/1617>`__)
- Handle properties on core API objects now return ``None`` during Python shutdown instead of
  crashing.
- Reduced Python overhead in :class:`Program` and :class:`Linker` by moving compilation and
  linking operations to the C level and releasing the GIL during backend calls. This benefits
  workloads that create many programs or linkers, and enables concurrent compilation in
  multithreaded applications.
- Error enum explanations are now derived from ``cuda-bindings`` docstrings when available
  (bindings 12.9.6+ or 13.2.0+), with frozen tables as a fallback for older versions.
- Improved optional dependency handling for NVVM and nvJitLink imports so that only genuinely
  missing optional modules are treated as unavailable; unrelated import failures now surface
  normally, and ``cuda.core`` now depends directly on ``cuda-pathfinder``.