.. currentmodule:: cuda.core

cuda.core 1.0.0 Release Notes
=============================

Highlights
----------

- TBD

New features
------------

- Added the :mod:`cuda.core.checkpoint` module for CUDA process checkpointing.
  It supports querying a process's checkpoint state as a string; locking,
  checkpointing, restoring, and unlocking a process; and remapping GPU UUIDs
  on restore. (#1343)
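The lock/checkpoint/restore/unlock operations form a small state machine. The sketch below is a pure-Python model, assuming the state transitions documented for the CUDA driver's ``cuCheckpointProcess*`` functions (lock: running to locked; checkpoint: locked to checkpointed; restore: checkpointed to locked; unlock: locked to running). It does not call :mod:`cuda.core.checkpoint`; the class name, the ``apply`` method, and the lowercase state strings are hypothetical and may differ from what the module actually exposes or returns.

```python
# Hypothetical, pure-Python model of the CUDA process checkpoint lifecycle.
# It does NOT call cuda.core; the state names loosely mirror the driver's
# CU_PROCESS_STATE_* values, and the class/method names are illustrative only.

# Allowed transitions: operation -> (required current state, resulting state)
_TRANSITIONS = {
    "lock": ("running", "locked"),
    "checkpoint": ("locked", "checkpointed"),
    "restore": ("checkpointed", "locked"),
    "unlock": ("locked", "running"),
}


class CheckpointLifecycleModel:
    """Tracks the modeled state of one process as operations are applied."""

    def __init__(self):
        self.state = "running"

    def apply(self, op):
        required, result = _TRANSITIONS[op]
        if self.state != required:
            # Reject out-of-order operations instead of silently ignoring them.
            raise RuntimeError(f"cannot {op} a process in state {self.state!r}")
        self.state = result
        return self.state


model = CheckpointLifecycleModel()
history = [model.apply(op) for op in ("lock", "checkpoint", "restore", "unlock")]
print(history)  # ['locked', 'checkpointed', 'locked', 'running']
```

Modeling the transitions this way makes the ordering constraint explicit: for example, applying ``unlock`` to a running process raises instead of succeeding. The driver also defines a failed state, which this sketch omits.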

Fixes and enhancements
----------------------

- :class:`~utils.StridedMemoryView` now has a fast path for ``torch.Tensor``
  objects that uses PyTorch's AOT Inductor (AOTI) stable C ABI. When a
  ``torch.Tensor`` is passed to any ``from_*`` classmethod (``from_dlpack``,
  ``from_cuda_array_interface``, ``from_array_interface``, or
  ``from_any_interface``), the tensor metadata is read directly from the
  underlying C struct, bypassing the overhead of the DLPack and CUDA Array
  Interface protocols. This makes :class:`~utils.StridedMemoryView`
  construction from PyTorch tensors roughly 7-20x faster, depending on whether
  stream ordering is required. CUDA stream ordering between PyTorch's current
  stream and the consumer stream is established, matching the DLPack
  synchronization contract. The fast path requires PyTorch 2.3 or later. (#749)
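The stream-ordering requirement can be illustrated with a pure-Python model of the producer-side decision under the DLPack ``__dlpack__(stream=...)`` convention for CUDA, where ``None`` means the legacy default stream (``1``) and ``-1`` means the consumer asks for no synchronization. Streams are plain integers and no GPU work happens; the function name ``plan_stream_ordering`` and the record-event-then-wait strategy are assumptions for illustration, not a description of how :class:`~utils.StridedMemoryView` or PyTorch implement it.

```python
# Hypothetical, pure-Python model of the DLPack stream-synchronization rule
# for CUDA. Streams are plain integers; no GPU work is performed and no
# cuda.core or PyTorch API is called. Special consumer values follow the
# DLPack / array API __dlpack__(stream=...) convention for CUDA:
#   None -> legacy default stream (1); -1 -> consumer requests no synchronization.
LEGACY_DEFAULT_STREAM = 1
NO_SYNC = -1


def plan_stream_ordering(producer_stream, consumer_stream):
    """Return the ordering actions a producer would take before handing data over."""
    if consumer_stream is None:
        consumer_stream = LEGACY_DEFAULT_STREAM
    if consumer_stream == NO_SYNC:
        return []  # the consumer takes responsibility for ordering
    if consumer_stream == producer_stream:
        return []  # same stream: the work is already ordered
    # Different streams: record an event after the producer's pending work and
    # make the consumer stream wait on it (the host thread does not block).
    return [
        ("record_event", producer_stream),
        ("stream_wait_event", consumer_stream),
    ]


print(plan_stream_ordering(producer_stream=7, consumer_stream=7))  # []
print(plan_stream_ordering(producer_stream=7, consumer_stream=9))
# [('record_event', 7), ('stream_wait_event', 9)]
print(plan_stream_ordering(producer_stream=7, consumer_stream=None))
# [('record_event', 7), ('stream_wait_event', 1)]
```

Making the consumer stream wait on an event, rather than synchronizing the host, keeps the handoff asynchronous; when both sides already use the same stream, no extra work is needed, which is one reason construction is cheaper when stream ordering is not required.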