.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: Apache-2.0

.. currentmodule:: cuda.core.experimental

``cuda.core`` 0.X.Y Release Notes
=================================

Released on TBD


Highlights
----------

- This is the last release that officially supports Python 3.9.
- Fix for :class:`LaunchConfig` grid parameter unit conversion when thread block clusters are used.


Breaking Changes
----------------

- CUDA 11 support dropped: CUDA 11 is no longer tested, and ``cuda.core`` may or may not work with ``cuda.bindings`` and CTK 11.x. Users are encouraged to migrate to CUDA 12.x or 13.x.
- Support for ``cuda-bindings`` (and ``cuda-python``) < 12.6.2 is dropped. Internally, ``cuda.core`` now always requires the `new binding module layout <https://nvidia.github.io/cuda-python/cuda-bindings/latest/release/12.6.1-notes.html#cuda-namespace-cleanup-with-a-new-module-layout>`_. Per the ``cuda-bindings`` `support policy <https://nvidia.github.io/cuda-python/cuda-bindings/latest/support.html>`_, CUDA 12 users are encouraged to use the latest ``cuda-bindings`` 12.9.x, which is backward-compatible with all CUDA Toolkit 12.y releases.
- Change in :class:`LaunchConfig` grid parameter interpretation: when :attr:`LaunchConfig.cluster` is specified, the :attr:`LaunchConfig.grid` parameter now correctly represents the number of clusters instead of the number of blocks. Previously, the grid parameter was incorrectly interpreted as a number of blocks, which did not match the expected C++ behavior. With this change, ``LaunchConfig(grid=4, cluster=2, block=32)`` correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent ``cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32))``.
- :class:`Buffer` objects now deallocate on the stream that was used to allocate them, instead of on the default stream. To deallocate on a different stream, pass it explicitly to the :meth:`~Buffer.close` method. Establishing proper stream ordering is the user's responsibility.
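
The new grid semantics can be checked with plain arithmetic. The following is a minimal sketch, not part of the ``cuda.core`` API: the helpers ``_as_dims`` and ``total_blocks`` are hypothetical and only illustrate how the number of launched thread blocks follows from ``grid`` and ``cluster``, assuming (as described above) that ``grid`` counts clusters whenever ``cluster`` is set.

.. code-block:: python

   from math import prod


   def _as_dims(value):
       """Hypothetical helper: normalize an int or a tuple of up to 3 ints into a 3-tuple."""
       if isinstance(value, int):
           value = (value,)
       dims = tuple(value) + (1,) * (3 - len(value))
       if len(dims) != 3:
           raise ValueError("expected an int or a tuple of at most 3 ints")
       return dims


   def total_blocks(grid, cluster=None):
       """Hypothetical helper: total thread blocks, assuming grid counts clusters when cluster is set."""
       grid_dims = _as_dims(grid)
       if cluster is None:
           # Without clusters, grid is measured in blocks.
           return prod(grid_dims)
       cluster_dims = _as_dims(cluster)
       # Each grid dimension is scaled by the matching cluster dimension.
       return prod(g * c for g, c in zip(grid_dims, cluster_dims))


   # Mirrors LaunchConfig(grid=4, cluster=2, block=32): 4 clusters x 2 blocks/cluster.
   print(total_blocks(grid=4, cluster=2))  # 8

With ``block=32``, these 8 blocks amount to 256 threads in total. Under the previous interpretation, where ``grid`` counted blocks, the same arguments yielded only 4 blocks.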


New features
------------

- Added the :attr:`Device.arch` property, which returns the compute capability as a string (e.g., ``'75'`` for compute capability 7.5), a convenient alternative to manually concatenating the compute capability tuple.
- Added CUDA 13.x testing support through a new ``test-cu13`` dependency group.
- Stream-ordered memory allocation can now be shared on Linux via :class:`DeviceMemoryResource`.
- Added NVVM IR support to :class:`Program`. NVVM IR input is selected with ``code_type="nvvm"``.
- Added an :attr:`ObjectCode.code_type` attribute for querying the code type.
- Added :class:`VirtualMemoryResource` for low-level virtual memory management on Linux.
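
As a hedged illustration of the string format that :attr:`Device.arch` replaces, the sketch below performs the manual concatenation of a ``(major, minor)`` compute capability tuple. The function ``arch_from_cc`` is hypothetical and not part of ``cuda.core``; it only shows the mapping described above, e.g. compute capability 7.5 to ``'75'``.

.. code-block:: python

   def arch_from_cc(compute_capability):
       """Hypothetical helper: concatenate a (major, minor) tuple into an arch string."""
       major, minor = compute_capability
       return f"{major}{minor}"


   print(arch_from_cc((7, 5)))   # 75
   print(arch_from_cc((12, 0)))  # 120

A string in this form is convenient for composing target names such as ``sm_75``.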


New examples
------------

None.


Fixes and enhancements
----------------------

- Improved :class:`DeviceMemoryResource` allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
- Improved :class:`StridedMemoryView` creation performance by building the shape and strides tuples with the Python/C API (addresses issue #449).
- Fixed :class:`LaunchConfig` grid unit conversion when a cluster is set (addresses issue #867).
- Fixed a bug in :meth:`GraphBuilder.add_child` where dependencies extracted from the capturing stream were passed inconsistently with the ``num_dependencies`` parameter (addresses issue #843).
- Improved :class:`Buffer` creation performance.
- Enabled :class:`MemoryResource` subclasses to accept :class:`Device` objects, in addition to previously supported device ordinals.
- Fixed a bug in :class:`Stream` and other classes where object cleanup would raise an error during interpreter shutdown.
- A :class:`StridedMemoryView` created from an underlying array via the DLPack protocol no longer leaks memory.
- General performance improvements.
- Fixed incorrect index usage in the ``vector_add`` example.