.. currentmodule:: cuda.core

cuda.core 0.6.0 Release Notes
=============================

New features
------------

- Added public access to the default CUDA streams via the module-level constants ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``.

  Users can now access the default streams directly from the ``cuda.core`` namespace:

  .. code-block:: python

     from cuda.core import LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM

     # Block the host until all work queued on the legacy default stream
     # completes. The legacy default stream synchronizes with all blocking streams.
     LEGACY_DEFAULT_STREAM.sync()

     # Block the host until all work queued on the calling thread's
     # per-thread default stream completes.
     PER_THREAD_DEFAULT_STREAM.sync()

  The legacy default stream implicitly synchronizes with all blocking streams in the same CUDA context, which guarantees strict ordering but can limit concurrency. The per-thread default stream is local to the calling thread and does not implicitly synchronize with other streams (except the legacy default stream, if both are used), which enables concurrent execution in multi-threaded applications.

  This replaces the previous undocumented workaround of calling ``Stream.from_handle(0)`` to obtain the legacy default stream.

- Added the :func:`~cuda.core.utils.make_aligned_dtype` utility for creating structured NumPy dtypes with GPU-compatible alignment. Field offsets and the total ``itemsize`` are recomputed so that each field is naturally aligned and the structure size is a multiple of the largest member alignment. An explicit alignment can also be requested; it is stored in the dtype's metadata under the key ``"__cuda_alignment__"``. (Resolves :issue:`734`.)
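
The layout rule described above can be illustrated with plain NumPy. The following is a minimal sketch, not the implementation of ``make_aligned_dtype``: the helper ``aligned_struct_dtype`` and its ``alignment`` parameter are hypothetical, only scalar field types are handled, and the sketch assumes that an explicit alignment acts as a lower bound on the overall struct alignment.

.. code-block:: python

   import numpy as np


   def aligned_struct_dtype(fields, alignment=None):
       """Hypothetical helper (not the cuda.core API): build a structured dtype
       whose fields are naturally aligned and padded like a C struct.

       ``fields`` is a list of ``(name, dtype-like)`` pairs of scalar types.
       """
       names, formats, offsets = [], [], []
       offset = 0
       max_align = 1
       for name, fmt in fields:
           dt = np.dtype(fmt)
           # Round the running offset up to this field's natural alignment.
           offset = (offset + dt.alignment - 1) // dt.alignment * dt.alignment
           names.append(name)
           formats.append(dt)
           offsets.append(offset)
           offset += dt.itemsize
           max_align = max(max_align, dt.alignment)
       if alignment is not None:
           # Assumption: an explicit alignment raises the struct alignment.
           max_align = max(max_align, alignment)
       # Pad the total size to a multiple of the struct alignment so that every
       # element of an array of this dtype stays aligned.
       itemsize = (offset + max_align - 1) // max_align * max_align
       spec = {"names": names, "formats": formats,
               "offsets": offsets, "itemsize": itemsize}
       if alignment is None:
           return np.dtype(spec)
       # Record the requested alignment under the metadata key named above.
       return np.dtype(spec, metadata={"__cuda_alignment__": alignment})


   # Usage: an int8, float64, int32 record.
   record = aligned_struct_dtype(
       [("flag", np.int8), ("value", np.float64), ("count", np.int32)]
   )
   print({name: record.fields[name][1] for name in record.names}, record.itemsize)

On a typical 64-bit platform the usage lines print ``{'flag': 0, 'value': 8, 'count': 16} 24``: one byte for ``flag``, seven bytes of padding so that ``value`` starts on an 8-byte boundary, and four bytes of tail padding after ``count`` so that consecutive array elements stay 8-byte aligned. NumPy's own ``np.dtype(..., align=True)`` applies the same natural-alignment rule to field offsets.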

Fixes and enhancements
----------------------

None.