Skip to content

Latest commit

 

History

History
89 lines (58 loc) · 3.49 KB

File metadata and controls

89 lines (58 loc) · 3.49 KB
.. currentmodule:: cuda.core

cuda.core 0.6.0 Release Notes

Highlights

  • Added the cuda.core.system module for NVML-based system and device queries.
  • Several :class:`~utils.StridedMemoryView` improvements, including bfloat16 dlpack support and numpy array interoperability.
  • Improved support for Python object protocols across core API classes.
  • Performance improvements through Cythonization and reduced Python overhead.

Breaking Changes

  • Building cuda.core from source now requires cuda-bindings >= 12.9.0, due to Cython-level dependencies on the NVVM bindings (cynvvm). Pre-built wheels are unaffected. The previous minimum was 12.8.0.

New features

  • Added the cuda.core.system module for NVML-based system and device queries, including device attributes, clocks, temperatures, fans, events, and PCI information.
  • :class:`~utils.StridedMemoryView` improvements:
    • Added from_array_interface constructor for creating views from numpy arrays.
    • Improved structured dtype array support.
    • Added bfloat16 dlpack support when the optional ml_dtypes package is installed.
  • Added public access to default CUDA streams via module-level constants LEGACY_DEFAULT_STREAM and PER_THREAD_DEFAULT_STREAM, replacing the previous workaround of using Stream.from_handle(0).
  • Added :meth:`Kernel.from_handle` for wrapping an existing CUfunction handle into a :class:`Kernel` object, enabling interoperability with foreign CUDA handles.
  • Added __eq__, __hash__, __weakref__, and __repr__ support for core API classes including :class:`Buffer`, :class:`LaunchConfig`, :class:`Kernel`, :class:`ObjectCode`, :class:`Stream`, and :class:`Event`.
  • Added NVVM extra_sources and use_libdevice options to :class:`ProgramOptions` for multi-module NVVM compilation and automatic libdevice loading.
  • Added CUDA version compatibility check at import time to detect mismatches between cuda.core and the installed cuda-bindings version.
  • Program.compile() now automatically resizes the NVRTC PCH heap and retries when precompiled header creation fails due to heap exhaustion. The pch_status property reports the PCH creation outcome ("created", "not_attempted", "failed", or None).

Fixes and enhancements

  • Eliminated spurious CUDA driver errors during interpreter shutdown by ensuring resources are destroyed in the correct order.
  • Fixed a bug preventing weak references to core API objects.
  • Fixed zero-sized allocations in legacy memory resources, which previously failed on certain platforms.
  • Improved performance by Cythonizing :class:`Program` and :class:`ObjectCode` internals.
  • Reduced :class:`~utils.StridedMemoryView` construction overhead.
  • __hash__ and __eq__ on core API classes no longer require a CUDA context.
  • Device attribute queries now gracefully handle unsupported attributes on older CUDA drivers, returning sensible defaults instead of raising errors.
  • Added a warning when :class:`ManagedMemoryResource` is created on platforms without concurrent managed access support.
  • Reduced wheel and installed package sizes by excluding Cython source files and build artifacts from distribution packages.
  • Slightly improved typing support.