.. currentmodule:: cuda.core.experimental

cuda.core 0.X.Y Release Notes
=============================

Released on TBD

Highlights
----------

- This is the last release that officially supports Python 3.9.
- Fix for :class:`LaunchConfig` grid parameter unit conversion when thread block clusters are used.

Breaking Changes
----------------

- CUDA 11 support dropped: CUDA 11 is no longer tested, and ``cuda.core`` may or may not work with ``cuda.bindings`` and CTK 11.x. Users are encouraged to migrate to CUDA 12.x or 13.x.
- Support for ``cuda-bindings`` (and ``cuda-python``) < 12.6.2 is dropped. Internally, ``cuda.core`` now always requires the new binding module layout. As per the ``cuda-bindings`` support policy, CUDA 12 users are encouraged to use the latest ``cuda-bindings`` 12.9.x, which is backward-compatible with all CUDA Toolkit 12.y releases.
- LaunchConfig grid parameter interpretation: When :attr:`LaunchConfig.cluster` is specified, the :attr:`LaunchConfig.grid` parameter now correctly represents the number of clusters instead of blocks. Previously, the grid parameter was incorrectly interpreted as blocks, which did not match the expected C++ behavior. With this change, ``LaunchConfig(grid=4, cluster=2, block=32)`` correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent ``cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32))``.
- When a :class:`Buffer` is closed, :attr:`Buffer.handle` is now set to ``None``. Previously it was set to ``0`` by accident.
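
The new grid semantics reduce to per-dimension multiplication. The sketch below uses hypothetical helpers (``as_dim3`` and ``grid_in_blocks`` are not part of ``cuda.core``) and assumes that ``grid`` and ``cluster`` are each an int or a tuple of up to three ints, with missing dimensions padded with 1 as in CUDA's ``dim3``. With a cluster set, each grid dimension counts clusters, so it is scaled by the cluster size to get the block count.

```python
def as_dim3(value):
    """Hypothetical helper: normalize an int or a 1- to 3-tuple to (x, y, z)."""
    if isinstance(value, int):
        value = (value,)
    value = tuple(value)
    if not 1 <= len(value) <= 3:
        raise ValueError("expected an int or a tuple of 1 to 3 ints")
    # Missing trailing dimensions default to 1, as in CUDA's dim3.
    return value + (1,) * (3 - len(value))


def grid_in_blocks(grid, cluster=None):
    """Hypothetical helper: grid size in blocks for a LaunchConfig-style spec.

    Without a cluster, ``grid`` already counts blocks. With a cluster,
    ``grid`` counts clusters, so each dimension is scaled by the cluster size.
    """
    grid3 = as_dim3(grid)
    if cluster is None:
        return grid3
    cluster3 = as_dim3(cluster)
    return tuple(g * c for g, c in zip(grid3, cluster3))


# LaunchConfig(grid=4, cluster=2, block=32): 4 clusters x 2 blocks/cluster.
blocks = grid_in_blocks(4, cluster=2)
total_blocks = blocks[0] * blocks[1] * blocks[2]
total_threads = total_blocks * 32  # block=32 threads per block
print(blocks, total_blocks, total_threads)  # (8, 1, 1) 8 256
```

Multi-dimensional specs follow the same rule. For example, ``grid=(2, 3)`` with ``cluster=(2, 1)`` gives ``(4, 3, 1)`` blocks.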

New features
------------

- Added a :attr:`Device.arch` property that returns the compute capability as a string (e.g., ``'75'`` for CC 7.5), a convenient alternative to manually concatenating the compute capability tuple.
- Added CUDA 13.x testing support through a new ``test-cu13`` dependency group.
- Stream-ordered memory allocations can now be shared on Linux via :class:`DeviceMemoryResource`.
- Added NVVM IR support to :class:`Program`. NVVM IR is now accepted with ``code_type="nvvm"``.
- Added an :attr:`ObjectCode.code_type` attribute for querying the code type.
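
The string returned by :attr:`Device.arch` is the major and minor compute capability digits concatenated. The sketch below reproduces that mapping in plain Python. ``arch_from_cc`` is a hypothetical helper. The commented lines assume a CUDA-capable machine with ``cuda.core`` installed and a ``Device.compute_capability`` value exposing ``major`` and ``minor``.

```python
def arch_from_cc(major: int, minor: int) -> str:
    """Hypothetical helper: concatenate a compute capability into an arch string."""
    return f"{major}{minor}"


# On a machine with a GPU, the two approaches should agree (assumption):
#   from cuda.core.experimental import Device
#   dev = Device()
#   cc = dev.compute_capability
#   assert dev.arch == arch_from_cc(cc.major, cc.minor)

print(arch_from_cc(7, 5))           # 75
print(f"sm_{arch_from_cc(8, 6)}")   # sm_86, a typical compile-target string
```

Two-digit major versions need no special handling, because the digits are concatenated as-is. For example, CC 12.0 maps to ``'120'``.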

New examples
------------

None.

Fixes and enhancements
----------------------

- Improved :class:`DeviceMemoryResource` allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
- Improved :class:`StridedMemoryView` creation performance by building the shape and strides tuples with the Python/C API (addresses issue #449).
- Fixed :class:`LaunchConfig` grid unit conversion when a cluster is set (addresses issue #867).
- Fixed a bug in :meth:`GraphBuilder.add_child` where dependencies extracted from the capturing stream were passed inconsistently with the ``num_dependencies`` parameter (addresses issue #843).
- Made :class:`Buffer` creation faster.
- Enabled :class:`MemoryResource` subclasses to accept :class:`Device` objects in addition to the previously supported device ordinals.
- Fixed a bug in :class:`Stream` and other classes where object cleanup could raise an error during interpreter shutdown.
- A :class:`StridedMemoryView` of an underlying array using the DLPack protocol no longer leaks memory.
- General performance improvements.
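
Accepting either a :class:`Device` or a plain ordinal usually comes down to normalizing the argument once, at construction time. The sketch below shows that pattern in isolation. ``FakeDevice`` and ``to_device_id`` are hypothetical stand-ins, not ``cuda.core`` code. The only assumption about the real :class:`Device` is that it exposes an integer ``device_id`` attribute.

```python
class FakeDevice:
    """Hypothetical stand-in for cuda.core's Device, exposing device_id."""

    def __init__(self, device_id: int):
        self.device_id = device_id


def to_device_id(device_or_ordinal) -> int:
    """Hypothetical helper: accept a Device-like object or an int ordinal."""
    # Plain ints are taken as ordinals; bool is excluded even though it subclasses int.
    if isinstance(device_or_ordinal, int) and not isinstance(device_or_ordinal, bool):
        return device_or_ordinal
    # Otherwise fall back to duck typing on a device_id attribute.
    device_id = getattr(device_or_ordinal, "device_id", None)
    if isinstance(device_id, int):
        return device_id
    raise TypeError(
        f"expected a Device or an int ordinal, got {type(device_or_ordinal).__name__}"
    )


print(to_device_id(0))              # 0
print(to_device_id(FakeDevice(1)))  # 1
```

Normalizing at the boundary keeps the rest of a resource's code working with a single integer ordinal, whichever form the caller passed.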