Commit 50c19d0
cuda.core: require explicit stream for stream-scheduling APIs (#2020)
* cuda.core: require explicit stream for stream-scheduling APIs (#2001)
Removes the implicit fallback to default_stream() (or NULL) on APIs that
schedule work on a stream. `stream` is now a required keyword-only
argument; `Stream_accept(None)` raises TypeError.
Affected APIs:
- MemoryResource.allocate / deallocate and overrides on
DeviceMemoryResource, PinnedMemoryResource, ManagedMemoryResource,
LegacyPinnedMemoryResource, GraphMemoryResource.
- Device.allocate.
- GraphicsResource.map.
- KernelOccupancy.max_potential_cluster_size / max_active_clusters.
- Graph.launch (stream was previously positional).
Stream_accept is promoted to cpdef so the pure-Python legacy/sync
resources can call it.
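
The rule above can be sketched in plain Python. This is an illustration only: `Stream`, `stream_accept`, and `ToyMemoryResource` are hypothetical stand-ins written for this sketch, not the real cuda.core classes or the Cython `Stream_accept`.

```python
class Stream:
    """Hypothetical stand-in for a stream object (not cuda.core's Stream)."""


def stream_accept(stream):
    # Hypothetical analogue of the validation described above:
    # None is rejected instead of silently falling back to a default stream.
    if stream is None:
        raise TypeError("stream must be passed explicitly; None is not accepted")
    if not isinstance(stream, Stream):
        raise TypeError(f"expected a Stream, got {type(stream).__name__}")
    return stream


class ToyMemoryResource:
    # The bare `*` makes `stream` keyword-only; the missing default makes it required.
    def allocate(self, size, *, stream):
        stream = stream_accept(stream)
        return ("buffer", size, stream)
```

With this shape, `mr.allocate(16, stream=s)` works, while `mr.allocate(16)`, `mr.allocate(16, s)`, and `mr.allocate(16, stream=None)` each raise `TypeError`.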
Also fixes a latent bug uncovered while doing this: the C++ MR
deallocation callback in Buffer's GC path was calling
`mr.deallocate(ptr, size, stream)` positionally, which would fail with
the new keyword-only signature for every garbage-collected
DeviceMemoryResource/GraphMemoryResource buffer. Switched to
`stream=stream`.
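
A minimal sketch of why the positional call breaks once the signature is keyword-only. The function names here are hypothetical and stand in for the C++ callback path and `mr.deallocate`; they are not the actual cuda.core code.

```python
def deallocate(ptr, size, *, stream):
    # Stand-in for an MR deallocate with the new keyword-only signature.
    return (ptr, size, stream)


def callback_positional(ptr, size, stream):
    # Old call shape: the third positional argument is no longer accepted.
    return deallocate(ptr, size, stream)


def callback_keyword(ptr, size, stream):
    # Fixed call shape: the stream is passed by keyword.
    return deallocate(ptr, size, stream=stream)
```

Calling `callback_positional(0x1000, 64, "s")` raises `TypeError`, while `callback_keyword(0x1000, 64, "s")` succeeds.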
VirtualMemoryResource is exempt because cuMemCreate / cuMemMap are
synchronous and not stream-ordered; it now accepts (and validates) an
optional stream instead of rejecting any non-None value.
Buffer.from_ipc_descriptor is also exempt: stream there only seeds the
deallocation stream stored in the handle (no work is scheduled), the
same shape as Buffer.close(stream=None).
Tests, examples, and the v1.0.0 release note are updated accordingly.
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: also require explicit stream for Buffer.from_ipc_descriptor (#2001)
Buffer.from_ipc_descriptor previously fell back to default_stream() when
stream=None. That fallback is exactly the implicit-fallback pattern issue
#2001 removes (the chosen stream depends on global state, not the call
site), so it does not belong in the same exemption category as
Buffer.close(stream=None) / GraphicsResource.unmap(stream=None) which
genuinely reuse an existing stream.
stream is now keyword-only and required. Internal validation goes through
Stream_accept like the other tightened APIs. Tests and the v1.0.0 release
note updated accordingly.
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: align deallocate signatures and revert Graph.launch (#2001)
- Make `deallocate` keyword-only on the synchronous resources
(`LegacyPinnedMemoryResource`, `_SynchronousMemoryResource`,
`VirtualMemoryResource`) so every memory-resource API obeys the
kw-only rule, with `stream=None` as the default since these resources
do not actually use the stream.
- Revert `Graph.launch` to take `stream` positionally. It is the same
shape as the kernel `launch(stream, config, kernel, *args)` API
(already exempt in the issue) and shouldn't be the odd one out.
- Tighten `VirtualMemoryResource.deallocate` docstring to match
`allocate`.
- Mark unused lambda args in `test_pass_object` as `_stream` to silence
ARG005.
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: tighten test mocks and add Stream_accept(None) test (#2001)
Review follow-ups:
- Tighten the test-only `MemoryResource` subclasses (`DummyDeviceMemoryResource`,
`DummyHostMemoryResource`, `DummyPinnedMemoryResource`,
`DummyUnifiedMemoryResource`, `TrackingMR`, `StreamCaptureMR`) to match the
new public API: `allocate(self, size, *, stream)` and
`deallocate(self, ptr, size, *, stream)` with no default. Previously the
mocks accepted `stream=None` positionally, which let tests bypass the new
explicit-stream policy.
- Update the affected helper functions and call sites in `test_memory.py` to
pass `stream=device.default_stream` explicitly. Fix the
`super().deallocate(ptr, size, stream)` positional call in
`test_mr_deallocate_receives_stream` to use `stream=stream`.
- Update `helpers/buffers.py` similarly (`make_scratch_buffer`, `PatternGen`).
- Add a direct test for the centralized `Stream_accept(None)` -> `TypeError`
behavior in `test_stream.py`.
- Tighten the release note for `Buffer.from_ipc_descriptor`: lead with the
removal of the silent fallback to the default stream rather than the
positional-to-keyword shift.
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: fix Buffer pickle path broken by kw-only stream (#2001)
`Buffer._reduce_helper` (the pickle/unpickle factory) previously called
`Buffer.from_ipc_descriptor(mr, ipc_descriptor)` without a stream and
relied on the implicit `default_stream()` fallback inside
`Buffer_from_ipc_descriptor`. Making `from_ipc_descriptor`'s stream a
required keyword-only argument broke this code path, causing every
multiprocessing IPC test that pickles a `Buffer` (test_send_buffers,
test_memory_ipc, test_event_ipc, test_serialize, test_workerpool, ...)
to fail in the child process with:
TypeError: from_ipc_descriptor() needs keyword-only argument stream
Fix: pass `default_stream()` explicitly from `_reduce_helper`. The
parent process's stream isn't portable across processes, so the pickle
path cannot thread an explicit stream through. The receiver can still
override the deallocation stream via `buffer.close(stream=...)`.
The user-facing rule still holds: callers of `Buffer.from_ipc_descriptor`
must pass an explicit stream.
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: relax kw-only TypeError regex for Cython funcs (#2001)
Cython-generated functions raise
"FUNC() needs keyword-only argument stream"
while pure-Python functions raise
"FUNC() missing 1 required keyword-only argument: 'stream'"
The new tests for `Kernel.occupancy.max_potential_cluster_size`,
`Kernel.occupancy.max_active_clusters`, and `GraphicsResource.map`
were matching only the CPython phrasing and failed against the
Cython forms. Loosen the regex to `keyword-only argument`, which
matches both.
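
The two phrasings can be compared directly. In this sketch the Cython-style message is copied from the text above as a literal string (no Cython function is actually called), and `pure_python_api` is a hypothetical function used only to produce the CPython message.

```python
import re


def pure_python_api(*, stream):
    return stream


# Phrasing quoted above for Cython-generated functions (literal string, not generated here).
CYTHON_STYLE_MESSAGE = "FUNC() needs keyword-only argument stream"

# Obtain the real CPython phrasing by omitting the required keyword.
try:
    pure_python_api()
except TypeError as exc:
    cpython_message = str(exc)

STRICT = re.compile(r"missing 1 required keyword-only argument")
LOOSE = re.compile(r"keyword-only argument")

strict_matches = [bool(STRICT.search(m)) for m in (cpython_message, CYTHON_STYLE_MESSAGE)]
loose_matches = [bool(LOOSE.search(m)) for m in (cpython_message, CYTHON_STYLE_MESSAGE)]
```

Here `strict_matches` is `[True, False]` and `loose_matches` is `[True, True]`, which is why the looser pattern is the one that works for both kinds of function.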
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: review fixes for #2001 (graph_update.py + _legacy.py)
- examples/graph_update.py: use the dedicated `stream` created at the
top of the example for the pinned allocation, instead of
`device.default_stream`. Better model for users (Leo).
- _memory/_legacy.py: route the user-supplied `stream` through
`Stream_accept` in `LegacyPinnedMemoryResource.deallocate` and
`_SynchronousMemoryResource.deallocate` so a non-`Stream` argument
raises the clean `TypeError` from `Stream_accept` instead of an
`AttributeError` from `.sync()` (matches the validation the matching
`allocate` methods already do).
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: drop unused stream= kwargs from sync MR call sites (#2001)
Synchronous memory resources (`LegacyPinnedMemoryResource`,
`_SynchronousMemoryResource`, the various test mocks `DummyDeviceMR`,
`DummyHostMR`, `DummyPinnedMR`, `DummyUnifiedMR`, `NullMemoryResource`,
`TrackingMR`, `StreamCaptureMR`) take a stream argument purely for
interface conformance with stream-ordered MRs but never use it.
Forcing every caller to manufacture a stream just to discard it adds
ceremony and a misleading model.
Switch these MRs' allocate/deallocate signatures to keyword-only
`stream=None` (validated via `Stream_accept` when provided), and drop
the now-unused `stream=...` kwargs from ~35 call sites across
examples, tests, and helpers. Also drop the `device` parameter from
`buffer_initialization` and `buffer_close` test helpers (no longer
needed) and remove leftover Device-setup boilerplate from the
NullMemoryResource dlpack-failure tests.
The user-facing rule is unchanged for the genuinely stream-ordered
APIs (`DeviceMemoryResource`, `PinnedMemoryResource`,
`ManagedMemoryResource`, `GraphMemoryResource`, `Device.allocate`,
`Buffer.from_ipc_descriptor`, etc.): stream remains required and
keyword-only. The release note is updated to reflect the sync-MR
exemption (folding `LegacyPinnedMemoryResource` in alongside
`VirtualMemoryResource`).
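
The sync-MR exemption can be sketched as follows. `Stream`, `stream_accept`, and `ToySyncMemoryResource` are hypothetical stand-ins for illustration; the real resources and `Stream_accept` may differ in detail.

```python
class Stream:
    """Hypothetical stand-in for a stream object (not cuda.core's Stream)."""


def stream_accept(stream):
    # Hypothetical validator: only called when a stream was actually supplied.
    if not isinstance(stream, Stream):
        raise TypeError(f"expected a Stream, got {type(stream).__name__}")
    return stream


class ToySyncMemoryResource:
    """Synchronous resource: takes `stream` for interface conformance only."""

    def allocate(self, size, *, stream=None):
        if stream is not None:
            stream_accept(stream)  # validated when provided, then ignored
        return bytearray(size)

    def deallocate(self, ptr, size, *, stream=None):
        if stream is not None:
            stream_accept(stream)
        # Nothing is scheduled on the stream; the buffer is simply dropped.
```

Callers can write `mr.allocate(8)` with no stream, yet a wrong type such as `mr.allocate(8, stream="oops")` still raises `TypeError`, and the argument stays keyword-only.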
Co-authored-by: Cursor <cursoragent@cursor.com>
* cuda.core: fix C++ teardown leak when buffer has no attached stream (#2001)
Issue: the C++ ``shared_ptr`` deleter for a buffer's device-pointer
handle invokes ``MemoryResource.deallocate`` via ``_mr_dealloc_callback``.
The handle's deallocation stream is set separately via
``set_deallocation_stream``; if it was never set (e.g. buffers minted via
``Buffer.from_handle(ptr, size, mr=mr)`` from DLPack import, IPC import,
or third-party adapters), the callback would pass ``stream=None`` to
``mr.deallocate``. After the strict-stream changes for #2001, the
stream-ordered MR overrides reject ``stream=None`` via ``Stream_accept``
and raise ``TypeError``. The ``noexcept`` callback catches the exception,
prints a warning to stderr, and returns -- silently **leaking** the
underlying CUDA allocation (and any associated IPC handles).
Fix: when ``h_stream`` is empty in ``_mr_dealloc_callback``, fall back
to ``default_stream()`` instead of ``None``. The C++ teardown path is
the unique legitimate "no-stream-context" caller (no Python frame from
which to obtain a stream), so this is the one place where an implicit
default-stream fallback is necessary; everywhere else the policy
remains "stream is required and must be passed explicitly".
Add ``test_mr_dealloc_callback_falls_back_to_default_stream`` covering
the regression: a strict stream-ordered mock MR is used to back a
``Buffer.from_handle`` (no attached stream), and the test asserts that
``deallocate`` is invoked with the default stream rather than failing
with ``TypeError`` and leaking.
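
The fallback described above, sketched in plain Python. All names here (`default_stream`, `StrictMR`, `dealloc_callback`) are hypothetical stand-ins for this illustration; the real callback lives in Cython/C++ and is not reproduced.

```python
class Stream:
    """Hypothetical stand-in for a stream object (not cuda.core's Stream)."""


_DEFAULT_STREAM = Stream()


def default_stream():
    # Hypothetical accessor standing in for the default-stream lookup.
    return _DEFAULT_STREAM


class StrictMR:
    """Stream-ordered mock: rejects anything that is not a Stream."""

    def __init__(self):
        self.seen = []

    def deallocate(self, ptr, size, *, stream):
        if not isinstance(stream, Stream):
            raise TypeError("stream is required")
        self.seen.append(stream)


def dealloc_callback(mr, ptr, size, attached_stream=None):
    # Teardown path: if no deallocation stream was ever attached, fall back
    # to the default stream rather than passing None and failing.
    stream = attached_stream if attached_stream is not None else default_stream()
    mr.deallocate(ptr, size, stream=stream)
```

With no attached stream, `dealloc_callback(mr, 0x1000, 64)` records the default stream in `mr.seen` rather than raising; an attached stream, when present, is used unchanged.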
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 598a966 commit 50c19d0
36 files changed
Lines changed: 396 additions & 250 deletions
File tree
- cuda_core
- cuda/core
- _memory
- graph
- docs/source/release
- examples
- tests
- graph
- helpers
- memory_ipc