.. currentmodule:: cuda.core

- Added the ``cuda.core.system`` module for NVML-based system and device queries.
- Several :class:`~utils.StridedMemoryView` improvements, including bfloat16 DLPack support and NumPy array interoperability.
- Improved support for Python object protocols across core API classes.
- Performance improvements through Cythonization and reduced Python overhead.
- Building ``cuda.core`` from source now requires ``cuda-bindings`` >= 12.9.0, due to Cython-level dependencies on the NVVM bindings (``cynvvm``). Pre-built wheels are unaffected. The previous minimum was 12.8.0.
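
Before attempting a source build, the installed ``cuda-bindings`` version can be compared against the new minimum. The following is a minimal sketch using only the standard library; ``parse_release`` and ``can_build_from_source`` are hypothetical helper names, not part of ``cuda.core``, and the parsing ignores pre-release and post-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum cuda-bindings version for source builds, as stated above.
MINIMUM = (12, 9, 0)


def parse_release(text):
    """Return the leading numeric components of a version string as a 3-tuple.

    Hypothetical helper: '12.9.0.post1' -> (12, 9, 0), '12.9' -> (12, 9, 0).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions so that '12.9' compares equal to '12.9.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def can_build_from_source(installed):
    """True if the given cuda-bindings version meets the source-build minimum."""
    return parse_release(installed) >= MINIMUM


try:
    installed = version("cuda-bindings")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("cuda-bindings is not installed")
else:
    print(installed, "ok" if can_build_from_source(installed) else "too old")
```

Users who install pre-built wheels do not need this check.
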
- Added the ``cuda.core.system`` module for NVML-based system and device queries, including device attributes, clocks, temperatures, fans, events, and PCI information.
- :class:`~utils.StridedMemoryView` improvements:

  - Added ``from_array_interface`` constructor for creating views from NumPy arrays.
  - Improved structured dtype array support.
  - Added bfloat16 DLPack support when the optional ``ml_dtypes`` package is installed.

- Added
- Added public access to default CUDA streams via module-level constants ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``, replacing the previous workaround of using ``Stream.from_handle(0)``.
- Added :meth:`Kernel.from_handle` for wrapping an existing ``CUfunction`` handle into a :class:`Kernel` object, enabling interoperability with foreign CUDA handles.
- Added ``__eq__``, ``__hash__``, ``__weakref__``, and ``__repr__`` support for core API classes including :class:`Buffer`, :class:`LaunchConfig`, :class:`Kernel`, :class:`ObjectCode`, :class:`Stream`, and :class:`Event`.
- Added NVVM ``extra_sources`` and ``use_libdevice`` options to :class:`ProgramOptions` for multi-module NVVM compilation and automatic libdevice loading.
- Added CUDA version compatibility check at import time to detect mismatches between ``cuda.core`` and the installed ``cuda-bindings`` version.
- ``Program.compile()`` now automatically resizes the NVRTC PCH heap and retries when precompiled header creation fails due to heap exhaustion. The ``pch_status`` property reports the PCH creation outcome (``"created"``, ``"not_attempted"``, ``"failed"``, or ``None``).
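
The resize-and-retry behavior described for ``Program.compile()`` follows a common pattern that can be sketched in plain Python. This is an illustration only, not the ``cuda.core`` implementation: ``HeapExhausted``, ``compile_with_pch_retry``, and ``fake_compile`` are hypothetical names, and doubling the heap on each failure is an assumed growth policy.

```python
class HeapExhausted(Exception):
    """Hypothetical stand-in for a PCH heap-exhaustion failure."""


def compile_with_pch_retry(compile_once, heap_size, max_retries=3):
    """Call compile_once(heap_size); on exhaustion, grow the heap and retry.

    Returns (status, heap_size), where status is "created" or "failed".
    Doubling the heap is an assumption made for this sketch.
    """
    for _ in range(max_retries + 1):
        try:
            compile_once(heap_size)
            return "created", heap_size
        except HeapExhausted:
            heap_size *= 2  # grow the heap before the next attempt
    return "failed", heap_size


def fake_compile(heap_size):
    # Pretend the precompiled header needs 1 MiB of heap.
    if heap_size < 1024 * 1024:
        raise HeapExhausted(heap_size)


status, final_size = compile_with_pch_retry(fake_compile, 256 * 1024)
print(status, final_size)
```

With ``cuda.core`` itself no such wrapper is needed; callers only inspect ``pch_status`` after compiling.
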
- Eliminated spurious CUDA driver errors during interpreter shutdown by ensuring resources are destroyed in the correct order.
- Fixed a bug preventing weak references to core API objects.
- Fixed zero-sized allocations in legacy memory resources, which previously failed on certain platforms.
- Improved performance by Cythonizing :class:`Program` and :class:`ObjectCode` internals.
- Reduced :class:`~utils.StridedMemoryView` construction overhead.
- ``__hash__`` and ``__eq__`` on core API classes no longer require a CUDA context.
- Device attribute queries now gracefully handle unsupported attributes on older CUDA drivers, returning sensible defaults instead of raising errors.
- Added a warning when :class:`ManagedMemoryResource` is created on platforms without concurrent managed access support.
- Reduced wheel and installed package sizes by excluding Cython source files and build artifacts from distribution packages.
- Slightly improved typing support.
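
To illustrate what the object-protocol changes above enable (equality, hashing, weak references, and a readable ``repr``), the sketch below uses a hypothetical ``Handle`` class as a stand-in. It is not a ``cuda.core`` class and does not reflect how the core API classes are implemented; it only shows the Python-level behavior these protocols provide.

```python
import weakref


class Handle:
    """Hypothetical wrapper around an integer handle; not a cuda.core class."""

    # Declaring __weakref__ in __slots__ is what allows weak references here.
    __slots__ = ("_value", "__weakref__")

    def __init__(self, value):
        self._value = int(value)

    def __eq__(self, other):
        if not isinstance(other, Handle):
            return NotImplemented
        return self._value == other._value

    def __hash__(self):
        # Hash only the stored value, so no external state is consulted.
        return hash((type(self), self._value))

    def __repr__(self):
        return f"Handle({self._value:#x})"


a, b = Handle(0x10), Handle(0x10)

# Equal handles collapse to one entry in a set.
unique = {a, b, Handle(0x20)}

# Weak references let caches hold objects without keeping them alive.
cache = weakref.WeakValueDictionary()
cache["default"] = a
ref = weakref.ref(a)

print(a == b, len(unique), repr(a))
```
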