.. currentmodule:: cuda.core.experimental
Released on TBD
- This is the last release that officially supports Python 3.9.
- Fix for :class:`LaunchConfig` grid parameter unit conversion when thread block clusters are used.
- CUDA 11 support dropped: CUDA 11 support is no longer tested, and it may or may not work with ``cuda.bindings`` and CTK 11.x. Users are encouraged to migrate to CUDA 12.x or 13.x.
- Support for ``cuda-bindings`` (and ``cuda-python``) < 12.6.2 is dropped. Internally, ``cuda.core`` now always requires the new binding module layout. As per the ``cuda-bindings`` support policy, CUDA 12 users are encouraged to use the latest ``cuda-bindings`` 12.9.x, which is backward-compatible with all CUDA Toolkit 12.y.
- LaunchConfig grid parameter interpretation: When :attr:`LaunchConfig.cluster` is specified, the :attr:`LaunchConfig.grid` parameter now correctly represents the number of clusters instead of blocks. Previously, the grid parameter was incorrectly interpreted as blocks, causing a mismatch with the expected C++ behavior. This change ensures that ``LaunchConfig(grid=4, cluster=2, block=32)`` correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent ``cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32))``.
- When :class:`Buffer` is closed, :attr:`Buffer.handle` is now set to ``None``. It was previously set to ``0`` by accident.
- Added :attr:`Device.arch` property that returns the compute capability as a string (e.g., '75' for CC 7.5), providing a convenient alternative to manually concatenating the compute capability tuple.
- CUDA 13.x testing support through the new ``test-cu13`` dependency group.
- Stream-ordered memory allocation can now be shared on Linux via :class:`DeviceMemoryResource`.
- Added NVVM IR support to :class:`Program`. NVVM IR is now understood with ``code_type="nvvm"``.
- Added an :attr:`ObjectCode.code_type` attribute for querying the code type.
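
The corrected :attr:`LaunchConfig.grid` interpretation noted above comes down to simple arithmetic. The following is a minimal sketch in plain Python, not the ``cuda.core`` implementation: the helper ``total_blocks`` is hypothetical, and it only models one-dimensional launch configurations.

```python
from typing import Optional


def total_blocks(grid: int, cluster: Optional[int] = None) -> int:
    """Hypothetical helper: thread blocks launched for a 1-D configuration.

    Without a cluster, ``grid`` counts blocks directly. With a cluster,
    ``grid`` counts clusters and each cluster holds ``cluster`` blocks.
    """
    if cluster is None:
        return grid
    return grid * cluster


# Models LaunchConfig(grid=4, cluster=2, block=32):
# 4 clusters x 2 blocks/cluster = 8 blocks.
print(total_blocks(4, cluster=2))  # 8

# Without a cluster, the grid value is already the block count.
print(total_blocks(4))  # 4
```

Code that previously passed a block count as ``grid`` together with ``cluster`` would, under this model, need to divide that count by the cluster size to launch the same number of blocks.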
None.
- Improved :class:`DeviceMemoryResource` allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
- Improved :class:`StridedMemoryView` creation-time performance by optimizing shape and strides tuple creation using the Python/C API (addresses issue #449).
- Fixed :class:`LaunchConfig` grid unit conversion when a cluster is set (addresses issue #867).
- Fixed a bug in :meth:`GraphBuilder.add_child` where dependencies extracted from the capturing stream were passed inconsistently with the ``num_dependencies`` parameter (addresses issue #843).
- Improved :class:`Buffer` creation performance.
- Enabled :class:`MemoryResource` subclasses to accept :class:`Device` objects, in addition to previously supported device ordinals.
- Fixed a bug in :class:`Stream` and other classes where object cleanup would error during interpreter shutdown.
- :class:`StridedMemoryView` of an underlying array using the DLPack protocol will no longer leak memory.
- General performance improvement.
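
The entry above about :class:`MemoryResource` subclasses accepting either a :class:`Device` or a device ordinal implies some form of argument normalization. The sketch below shows one way this could look. It is an illustration only: ``FakeDevice`` is a stand-in class, ``resolve_device_id`` is a hypothetical helper, and neither is part of ``cuda.core``.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeDevice:
    """Stand-in for a device object; only the ordinal is modeled here."""

    device_id: int


def resolve_device_id(device: Union[FakeDevice, int]) -> int:
    """Hypothetical helper: return the ordinal for a device object or an int."""
    if isinstance(device, int):
        # Already an ordinal, pass it through unchanged.
        return device
    # Otherwise assume a device-like object exposing ``device_id``.
    return device.device_id


print(resolve_device_id(0))              # 0
print(resolve_device_id(FakeDevice(1)))  # 1
```

Normalizing once at the boundary lets the rest of a memory resource work with plain ordinals regardless of which form the caller supplied.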