Commit f29d7fd
cudax/stf: migrate internal/ launch + host_launch_scope from cuda_safe_call to cuda_try (#9249)
* cudax/stf: migrate internal/ launch + host_launch_scope to cuda_try
Third internal/ slice, covering the kernel/host launch scopes and their
shared event-timing pattern.
- Convert eligible calls to the templated cuda_try<F> form: cudaLaunchKernelExC,
cudaGraphAddKernelNode (out-param -> ref), cudaGraphKernelNodeSetAttribute,
cudaFreeAsync, cudaEventRecord (start), cudaGraphAddHostNode (out-param -> ref),
cudaLaunchHostFunc.
- cudaEventCreate and cudaMallocAsync stay in the runtime-status form: both are
overload sets (cuda_runtime.h flags overload / templated wrapper), so
cuda_try<F> cannot name them.
- Event timing's end record/synchronize/elapsed run inside the noexcept
SCOPE(exit) body, so they keep cuda_safe_call: a CUDA error there should
abort rather than throw through the guard (which would std::terminate).
- The two stream-path cudaLaunchHostFunc enqueues now get a SCOPE(fail) that
deletes the heap callback args (resolved / wrapper) if the enqueue throws --
the callback only takes ownership once the enqueue succeeds, so this closes
the leak the new throw path would otherwise introduce. The graph-path host
nodes are already covered because their args are owned by a ctx resource added
before the node is created.
Pre-existing and left as-is: the timing events created here are never
cudaEventDestroy'd (a leak in the calibration path, unrelated to this change).
* cudax/stf: use cuda_try<cudaEventCreateWithFlags> for timing event creation
cudaEventCreate is an overload set (cuda_runtime.h adds a flags overload),
so it cannot be named by the templated cuda_try<F> form. Use the
non-overloaded cudaEventCreateWithFlags with cudaEventDefault instead,
which is exactly what cudaEventCreate(&e) does internally, so behavior is
unchanged while keeping the templated form.
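
Why an overload set cannot be named by a templated checked-call form can be illustrated with a minimal sketch. Everything below is hypothetical: `checked_call`, `fake_status`, `fake_event` and the `fake_event_create*` functions are stand-ins invented for illustration, not the cudax `cuda_try` implementation or the CUDA runtime. It assumes C++17 (`template <auto F>`).

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Stand-in status type and event type; these are NOT the CUDA runtime.
enum fake_status { fake_success = 0, fake_error = 1 };

struct fake_event { unsigned flags; };

// Non-overloaded entry point: the name denotes exactly one function,
// so it can be passed as a non-type template argument.
inline fake_status fake_event_create_with_flags(fake_event* out, unsigned flags)
{
  if (out == nullptr) { return fake_error; }
  out->flags = flags;
  return fake_success;
}

// Overload set: two functions share one name.
inline fake_status fake_event_create(fake_event* out)
{
  return fake_event_create_with_flags(out, 0u);
}
inline fake_status fake_event_create(fake_event* out, unsigned flags)
{
  return fake_event_create_with_flags(out, flags);
}

// Hypothetical checked-call helper: F is a non-type template parameter.
// It calls F and throws if the returned status is not success.
template <auto F, class... Args>
void checked_call(Args&&... args)
{
  if (F(std::forward<Args>(args)...) != fake_success)
  {
    throw std::runtime_error("checked_call: API call failed");
  }
}

// Creates an event through the non-overloaded entry point.
inline unsigned create_default_event()
{
  fake_event e{42u};
  checked_call<fake_event_create_with_flags>(&e, 0u);
  // checked_call<fake_event_create>(&e);
  // ^ rejected by the compiler: the name denotes an overload set, so the
  //   type of the `auto` template parameter cannot be deduced.
  return e.flags;
}
```

Under these assumptions, switching to the single non-overloaded function keeps the templated form usable without casting the overloaded name to a specific function-pointer type.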
* cudax/stf: fix host_launch callback-arg ownership ordering
The host_launch callback args (resolved / wrapper) are heap-allocated and
guarded by SCOPE(fail) { delete ...; }. Transfer of ownership to the
graph-path ctx resource was happening in the wrong order:
- Untyped path: `resolved` was set to nullptr right after add_resource,
but it is also used as the host node's userData. That made the graph
node receive a null userData, so the callback dereferenced null on the
success path.
- Typed path: add_resource ran before cudaGraphAddHostNode, so a throw
from the node creation would delete `wrapper` twice (SCOPE(fail) plus
the resource's release_in_callback).
Fix both by creating the host node first (while resolved/wrapper is still
a valid userData), then handing ownership to the ctx resource, then
nulling the pointer once at the end to disarm SCOPE(fail). On a throw
before that point the resource has not been added, so SCOPE(fail) is the
sole owner and frees the args exactly once.
* cudax/stf: own host_launch callback args with unique_ptr
Replace the raw new + SCOPE(fail){delete} + manual nulling design for the
host_launch callback arguments with std::unique_ptr. The args are borrowed
via .get() for the host node userData / cudaLaunchHostFunc argument and the
ctx resource, and ownership is handed off with .release() once the node has
been created (graph) or the launch has been enqueued (stream). On a throw
before that point the unique_ptr frees the args; afterwards the ctx resource
(graph) or the callback (stream) owns and frees them. Adds <memory>.
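
The borrow-then-release handoff described above can be sketched as follows. This is an illustration under stated assumptions, not the STF code: `callback_args`, `fake_enqueue`, `run_callback`, `submit` and the counting deleter are hypothetical names, and `fake_enqueue` stands in for an enqueue call that throws on failure.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical heap-allocated callback arguments.
struct callback_args { int payload; };

// Counts frees so the ownership handoff can be observed.
inline int& freed_count()
{
  static int n = 0;
  return n;
}

struct counting_delete
{
  void operator()(callback_args* p) const noexcept
  {
    ++freed_count();
    delete p;
  }
};

using args_ptr = std::unique_ptr<callback_args, counting_delete>;

// Hypothetical enqueue: throws on failure; on success the callee
// (modelled by `slot`) becomes the owner of `user_data`.
inline void fake_enqueue(callback_args* user_data, bool fail, callback_args*& slot)
{
  if (fail) { throw std::runtime_error("enqueue failed"); }
  slot = user_data;
}

// Models the callback running later and freeing its own arguments.
inline void run_callback(callback_args*& slot)
{
  counting_delete{}(slot);
  slot = nullptr;
}

inline void submit(bool fail, callback_args*& slot)
{
  args_ptr args(new callback_args{7});
  fake_enqueue(args.get(), fail, slot); // borrow only; ownership stays here
  args.release();                       // enqueue succeeded: hand ownership off
  // If fake_enqueue throws, release() is never reached and the unique_ptr
  // frees the arguments exactly once during unwinding.
}
```

The point of the ordering is that `release()` is the single place where ownership changes hands, so there is no window in which both the guard and the new owner would free the same pointer.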
* cudax/stf: free launch temp device memory via SCOPE(exit)
launch_impl allocates a temporary device buffer (cudaMallocAsync) and used to
free it only after cuda_launcher returned. Now that cuda_launcher throws on error (via
cuda_try<cudaLaunchKernelExC>), the trailing cudaFreeAsync was skipped on a
throw, leaking the buffer. Free it from a SCOPE(exit) placed right after the
allocation so it runs on both normal and exceptional exit. cuda_safe_call is
used inside the noexcept SCOPE(exit) body.
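
A minimal sketch of the cleanup-on-every-exit pattern, assuming C++17. The `scope_exit` class, `fake_pool` and `launch_with_temp` are hypothetical stand-ins written for illustration; they are not the STF SCOPE(exit) macro or the real allocation calls.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical scope guard: runs the stored callable when the scope is left,
// whether by normal return or by an exception unwinding the stack.
template <class F>
class scope_exit
{
public:
  explicit scope_exit(F f) : f_(std::move(f)) {}
  scope_exit(const scope_exit&)            = delete;
  scope_exit& operator=(const scope_exit&) = delete;
  // Destructors are implicitly noexcept, so the callable must not throw;
  // an escaping exception here would call std::terminate.
  ~scope_exit() { f_(); }

private:
  F f_;
};

// Stand-in for a device memory pool: tracks live temporary buffers.
struct fake_pool { int live = 0; };

inline void launch_with_temp(fake_pool& pool, bool launch_fails)
{
  ++pool.live; // stand-in for allocating the temporary buffer
  // Guard is placed right after the allocation so every later exit frees it.
  scope_exit guard([&pool]() noexcept { --pool.live; });

  if (launch_fails)
  {
    throw std::runtime_error("launch failed"); // guard still frees the buffer
  }
}
```

Because the guard body runs in a destructor, it has to use a non-throwing error check, which matches the choice above to keep the aborting form inside the exit body.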
* cudax/stf: check cudaGetDevice in launch timing path
The cudaGetDevice call in the timing branch was unchecked. Use the
templated cuda_try<cudaGetDevice> form so a failure is reported.
* [STF] Initialize host_launch_scope timing events to nullptr
Match launch.cuh and satisfy GCC -Wmaybe-uninitialized: if cuda_try throws
before both events are created, SCOPE(exit) still runs with record_time set.
1 parent b29b61a commit f29d7fd
2 files changed
Lines changed: 105 additions & 70 deletions
Lines changed: 65 additions & 46 deletions
Lines changed: 40 additions & 24 deletions