cuda.core: require explicit stream for stream-scheduling APIs (#2001)
Removes the implicit fallback to default_stream() (or NULL) on APIs that
schedule work on a stream. `stream` is now a required keyword-only
argument, and `Stream_accept(None)` now raises TypeError.
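The new calling convention can be modelled in plain Python as below. This is a minimal sketch, not cuda.core code. `Stream`, `stream_accept` and `ToyMemoryResource` are hypothetical stand-ins. The real `Stream_accept` may accept other stream-like objects, while this toy only checks for its own class. The sketch shows three calls that all raise TypeError: omitting `stream`, passing it positionally, and passing None.

```python
class Stream:
    """Hypothetical stand-in for a CUDA stream handle (not cuda.core.Stream)."""

    def __init__(self, name):
        self.name = name


def stream_accept(stream):
    """Hypothetical validator: reject None instead of falling back to a default stream."""
    if stream is None:
        raise TypeError("stream is required; pass an explicit Stream")
    if not isinstance(stream, Stream):
        raise TypeError(f"expected a Stream, got {type(stream).__name__}")
    return stream


class ToyMemoryResource:
    """Hypothetical resource whose allocate() mirrors the keyword-only signature."""

    def allocate(self, size, *, stream):
        # `stream` is keyword-only with no default, so omitting it or passing
        # it positionally is rejected by Python itself before this body runs.
        s = stream_accept(stream)
        return (size, s.name)
```

With this shape, `mr.allocate(64, stream=s)` succeeds. `mr.allocate(64)` and `mr.allocate(64, s)` fail at the call site, and `mr.allocate(64, stream=None)` fails in the validator.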
Affected APIs:
- MemoryResource.allocate / deallocate and overrides on
DeviceMemoryResource, PinnedMemoryResource, ManagedMemoryResource,
LegacyPinnedMemoryResource, GraphMemoryResource.
- Device.allocate.
- GraphicsResource.map.
- KernelOccupancy.max_potential_cluster_size / max_active_clusters.
- Graph.launch (stream was previously positional).
Stream_accept is promoted to cpdef so the pure-Python legacy/sync
resources can call it.
Also fixes a latent bug uncovered by this change: the C++ MR
deallocation callback in Buffer's GC path was calling
`mr.deallocate(ptr, size, stream)` positionally, which would fail with
the new keyword-only signature for every garbage-collected
DeviceMemoryResource/GraphMemoryResource buffer. Switched to
`stream=stream`.
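The failure mode of the old deallocation callback can be reproduced at the Python level with the sketch below. The real callback lives in C++ and calls into the resource, so this models only the shape of that call. `ToyDeviceMemoryResource`, `gc_dealloc_buggy` and `gc_dealloc_fixed` are hypothetical names. Passing `stream` as a third positional argument raises TypeError against a keyword-only signature, while `stream=stream` succeeds.

```python
class Stream:
    """Hypothetical stand-in for a CUDA stream handle."""

    def __init__(self, name):
        self.name = name


class ToyDeviceMemoryResource:
    """Hypothetical resource with the new keyword-only deallocate signature."""

    def __init__(self):
        self.freed = []  # records (ptr, size, stream name) for each deallocation

    def deallocate(self, ptr, size, *, stream):
        self.freed.append((ptr, size, stream.name))


def gc_dealloc_buggy(mr, ptr, size, stream):
    # Old call shape: stream passed positionally -> TypeError
    # ("takes 3 positional arguments but 4 were given").
    mr.deallocate(ptr, size, stream)


def gc_dealloc_fixed(mr, ptr, size, stream):
    # Fixed call shape: stream passed by keyword.
    mr.deallocate(ptr, size, stream=stream)
```

Because the GC path runs for every collected buffer, the positional form would have broken deallocation for all such buffers, not just for an occasional edge case.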
VirtualMemoryResource is exempt because cuMemCreate / cuMemMap are
synchronous and not stream-ordered; it now accepts (and validates) an
optional stream instead of rejecting any non-None value.
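The exemption can be sketched as an optional, validated stream parameter. This is an illustrative stand-in only. `ToyVirtualMemoryResource` and `validate_optional_stream` are hypothetical, and the `bytearray` merely stands in for the synchronous create/map work. None is accepted because nothing is stream-ordered, a real stream is accepted and validated, and anything else is rejected.

```python
class Stream:
    """Hypothetical stand-in for a CUDA stream handle."""

    def __init__(self, name):
        self.name = name


def validate_optional_stream(stream):
    """Hypothetical check: None is allowed; otherwise it must be a Stream."""
    if stream is None:
        return None
    if not isinstance(stream, Stream):
        raise TypeError(f"expected a Stream or None, got {type(stream).__name__}")
    return stream


class ToyVirtualMemoryResource:
    """Hypothetical resource whose allocation work is synchronous."""

    def allocate(self, size, *, stream=None):
        validate_optional_stream(stream)
        # Stand-in for synchronous create/map calls: nothing is enqueued on
        # `stream`, so the validated stream is not used for scheduling.
        return bytearray(size)
```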
Buffer.from_ipc_descriptor is also exempt: its stream argument only seeds
the deallocation stream stored in the handle (no work is scheduled),
matching the optional-stream signature of Buffer.close(stream=None).
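The difference between seeding a stream and scheduling work on it can be sketched as follows. All names here are hypothetical (`ToyBuffer`, the `(ptr, size)` tuple used as a descriptor, and the `enqueued` log). The fallback in `close()` is an assumption made for illustration: an explicit stream wins, otherwise the stored one is used. The point of the sketch is that `from_ipc_descriptor` only records the stream and enqueues nothing.

```python
class Stream:
    """Hypothetical stream stand-in that logs work scheduled on it."""

    def __init__(self, name):
        self.name = name
        self.enqueued = []


class ToyBuffer:
    def __init__(self, ptr, size, dealloc_stream):
        self.ptr, self.size = ptr, size
        self._dealloc_stream = dealloc_stream  # only stored, never used here

    @classmethod
    def from_ipc_descriptor(cls, descriptor, stream=None):
        # `stream` merely seeds the stored deallocation stream; no work is scheduled.
        ptr, size = descriptor
        return cls(ptr, size, stream)

    def close(self, stream=None):
        # Assumption: an explicit stream overrides the one stored at creation.
        target = stream if stream is not None else self._dealloc_stream
        if target is not None:
            target.enqueued.append(("free", self.ptr))
        self.ptr = None
```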
Tests, examples, and the v1.0.0 release note are updated accordingly.
Co-authored-by: Cursor <cursoragent@cursor.com>