Commit 3199107

fix(core): retain cuLinkCreate optionValues array for CUlinkState lifetime
The CUDA driver docs state: "optionValues must remain valid for the life of the CUlinkState if output options are used." The driver writes log-fill sizes (outputs) back into the optionValues slots for CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES and CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES.

Linker_init previously declared c_jit_keys/c_jit_values as local cdef vector[...] variables on the stack of Linker_init. They were destroyed when the function returned, leaving the driver writing through dangling pointers during subsequent cuLinkAddData/cuLinkComplete/cuLinkDestroy calls.

This bug was always latent. It became reachable with the per-instance backend dispatch (CTK 12.9.1 runners now select the driver linker when paired with a driver 13 install), and it manifested only on driver 13, as heap corruption that killed the next NVRTC or link call.

Promote the two arrays to cdef class fields declared after _culink_handle in the pxd. Cython's tp_dealloc destroys C++ fields in pxd declaration order, so the vectors are destroyed after the shared_ptr deleter runs cuLinkDestroy.

The cuda.bindings high-level wrapper (driver.cuLinkCreate) already handles this by attaching a keepalive to CUlinkState; cuda.core's low-level cydriver.cuLinkCreate path did not.

Also drop the now-unused void_ptr ctypedef.
1 parent c25f5f9 commit 3199107
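The lifetime rule in the commit message can be modeled in plain Python with ctypes. In this minimal sketch, `FakeLinkState` is a hypothetical stand-in for `CUlinkState`: like the driver, it records only the raw addresses of the key and value arrays, and later writes log-fill sizes back into the value slots. The option constants are placeholders, not the real `CUjit_option` enumerators, and keys are stored as `c_int` on the assumption that the enum is int-sized. Because `Linker` keeps both arrays as instance attributes, the write-back lands in live memory.

```python
import ctypes

# Placeholder option keys (hypothetical); real values come from the CUjit_option enum.
INFO_LOG_BUFFER_SIZE_BYTES = 4
ERROR_LOG_BUFFER_SIZE_BYTES = 6


class FakeLinkState:
    """Hypothetical CUlinkState stand-in: holds raw addresses, not Python references."""

    def __init__(self, num_opts, keys_addr, values_addr):
        self.num_opts = num_opts
        self.keys_addr = keys_addr
        self.values_addr = values_addr

    def complete(self, info_fill, error_fill):
        # Reinterpret the caller's memory, as the driver does with optionValues.
        keys = (ctypes.c_int * self.num_opts).from_address(self.keys_addr)
        values = (ctypes.c_void_p * self.num_opts).from_address(self.values_addr)
        for i in range(self.num_opts):
            # Output options: overwrite the input size with the bytes actually filled.
            if keys[i] == INFO_LOG_BUFFER_SIZE_BYTES:
                values[i] = info_fill
            elif keys[i] == ERROR_LOG_BUFFER_SIZE_BYTES:
                values[i] = error_fill


class Linker:
    def __init__(self, options):
        n = len(options)
        # Instance attributes, so the arrays live as long as the Linker does.
        # As locals, they would be freed when __init__ returned, leaving
        # FakeLinkState holding dangling addresses.
        self._jit_keys = (ctypes.c_int * n)(*(key for key, _ in options))
        self._jit_values = (ctypes.c_void_p * n)(*(value for _, value in options))
        self._state = FakeLinkState(
            n, ctypes.addressof(self._jit_keys), ctypes.addressof(self._jit_values)
        )

    def complete(self, info_fill, error_fill):
        self._state.complete(info_fill, error_fill)
        return list(self._jit_values)


linker = Linker([(INFO_LOG_BUFFER_SIZE_BYTES, 1024), (ERROR_LOG_BUFFER_SIZE_BYTES, 1024)])
print(linker.complete(37, 12))  # [37, 12]
```

The input slots start as buffer capacities (1024) and end as fill sizes, which is exactly why the driver needs the value array to stay valid after `cuLinkCreate` returns.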

File tree

2 files changed, +33 -9 lines changed

cuda_core/cuda/core/_linker.pxd

Lines changed: 12 additions & 0 deletions
```diff
@@ -2,13 +2,25 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
+from libcpp.vector cimport vector
+
+from cuda.bindings cimport cydriver
+
 from ._resource_handles cimport NvJitLinkHandle, CuLinkHandle
 
 
 cdef class Linker:
     cdef:
         NvJitLinkHandle _nvjitlink_handle
         CuLinkHandle _culink_handle
+        # _drv_jit_keys/_drv_jit_values are the C arrays handed to cuLinkCreate.
+        # The driver retains a reference to the optionValues array for the life
+        # of the CUlinkState (it writes back log-size outputs into its slots),
+        # so these must live past cuLinkCreate and outlive cuLinkDestroy.
+        # Declared after _culink_handle so their C++ destructors run AFTER
+        # cuLinkDestroy executes during tp_dealloc.
+        vector[cydriver.CUjit_option] _drv_jit_keys
+        vector[void*] _drv_jit_values
         bint _use_nvjitlink
         object _drv_log_bufs # formatted_options list (driver); None for nvjitlink
         str _info_log # decoded log; None until link() or pre-link get_*_log()
```
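The pxd comment relies on field declaration order to sequence teardown: `cuLinkDestroy` first, then the arrays. Note that this is the opposite of plain C++, where members are destroyed in reverse declaration order; the commit depends on Cython's `tp_dealloc` behavior instead. The ordering also only helps if the shared_ptr deleter actually fires at that point, i.e. the Linker holds the last reference to the handle. The hedged sketch below makes the same sequence explicit in Python with `contextlib.ExitStack`, which unwinds callbacks last-in, first-out; `LinkerTeardownModel` and its methods are hypothetical names for illustration only.

```python
from contextlib import ExitStack


class LinkerTeardownModel:
    """Hypothetical model of the teardown order the pxd field order enforces."""

    def __init__(self):
        self.events = []
        self._jit_values = [1024, 1024]  # stand-in for the optionValues vector
        self._teardown = ExitStack()
        # ExitStack runs callbacks last-in, first-out: register the array
        # release first so that it runs after the link state is destroyed.
        self._teardown.callback(self._release_jit_arrays)
        self._teardown.callback(self._destroy_link_state)

    def _destroy_link_state(self):
        # cuLinkDestroy may still touch optionValues, so it must be alive here.
        if self._jit_values is None:
            raise RuntimeError("optionValues released before cuLinkDestroy")
        self.events.append("cuLinkDestroy")

    def _release_jit_arrays(self):
        self._jit_values = None
        self.events.append("release optionValues")

    def close(self):
        self._teardown.close()
```

Swapping the two `callback` registrations reproduces the bug class: the release runs first and `_destroy_link_state` raises.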

cuda_core/cuda/core/_linker.pyx

Lines changed: 21 additions & 9 deletions
```diff
@@ -42,7 +42,6 @@ from cuda.core._utils.cuda_utils import (
 from cuda.core._utils.version import driver_version
 
 ctypedef const char* const_char_ptr
-ctypedef void* void_ptr
 
 __all__ = ["Linker", "LinkerOptions"]
 
@@ -460,8 +459,8 @@ cdef inline int Linker_init(Linker self, tuple object_codes, object options) exc
     cdef cydriver.CUlinkState c_raw_culink
     cdef Py_ssize_t c_num_opts, i
     cdef vector[const_char_ptr] c_str_opts
-    cdef vector[cydriver.CUjit_option] c_jit_keys
-    cdef vector[void_ptr] c_jit_values
+    cdef cydriver.CUjit_option* c_drv_jit_keys_ptr
+    cdef void** c_drv_jit_values_ptr
 
     self._options = options = check_or_create_options(LinkerOptions, options, "Linker options")
 
@@ -501,19 +500,32 @@ cdef inline int Linker_init(Linker self, tuple object_codes, object options) exc
         # the driver writes into via raw pointers during linking operations.
         self._drv_log_bufs = formatted_options
         c_num_opts = len(option_keys)
-        c_jit_keys.resize(c_num_opts)
-        c_jit_values.resize(c_num_opts)
+        # Store the option key/value arrays as instance members so they outlive
+        # the cuLinkCreate call. CUDA driver docs require optionValues to
+        # remain valid for the life of the CUlinkState when output options are
+        # used (the driver writes log-fill sizes back into the array). The
+        # pxd declaration order ensures these vectors are destroyed AFTER
+        # _culink_handle -- i.e. after cuLinkDestroy has run.
+        self._drv_jit_keys.resize(c_num_opts)
+        self._drv_jit_values.resize(c_num_opts)
         for i in range(c_num_opts):
-            c_jit_keys[i] = <cydriver.CUjit_option><int>option_keys[i]
+            self._drv_jit_keys[i] = <cydriver.CUjit_option><int>option_keys[i]
             val = formatted_options[i]
             if isinstance(val, bytearray):
-                c_jit_values[i] = <void*>PyByteArray_AS_STRING(val)
+                self._drv_jit_values[i] = <void*>PyByteArray_AS_STRING(val)
             else:
-                c_jit_values[i] = <void*><intptr_t>int(val)
+                self._drv_jit_values[i] = <void*><intptr_t>int(val)
+        # Capture the vector data() pointers before entering nogil to keep
+        # the nogil region free of any attribute access on self.
+        c_drv_jit_keys_ptr = self._drv_jit_keys.data()
+        c_drv_jit_values_ptr = self._drv_jit_values.data()
         try:
             with nogil:
                 HANDLE_RETURN(cydriver.cuLinkCreate(
-                    <unsigned int>c_num_opts, c_jit_keys.data(), c_jit_values.data(), &c_raw_culink))
+                    <unsigned int>c_num_opts,
+                    c_drv_jit_keys_ptr,
+                    c_drv_jit_values_ptr,
+                    &c_raw_culink))
         except CUDAError as e:
             Linker_annotate_error_log(self, e)
             raise
```
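The key/value packing loop in this hunk can be mirrored in Python with ctypes. In this hedged sketch, the hypothetical helper `pack_jit_options` builds an int-sized key array (an assumption about `CUjit_option`) and a `void*` value array. A `bytearray` value becomes the address of its internal buffer, the role `PyByteArray_AS_STRING` plays in the Cython code, and any other value is stored as an integer inside the pointer slot, matching the `<void*><intptr_t>int(val)` cast. The caller must keep both returned arrays, and the bytearrays themselves, alive for the life of the link state; in `Linker_init` those jobs fall to `_drv_jit_keys`/`_drv_jit_values` and `_drv_log_bufs`.

```python
import ctypes


def pack_jit_options(option_keys, formatted_options):
    """Hypothetical Python mirror of the key/value packing loop in Linker_init."""
    n = len(option_keys)
    keys = (ctypes.c_int * n)(*option_keys)  # assumes an int-sized CUjit_option
    values = (ctypes.c_void_p * n)()         # zero-initialized void* slots
    for i in range(n):
        val = formatted_options[i]
        if isinstance(val, bytearray):
            # Address of the bytearray's internal buffer (a log buffer to be filled).
            view = (ctypes.c_char * len(val)).from_buffer(val)
            values[i] = ctypes.addressof(view)
        else:
            # Scalar options travel by value inside the pointer-sized slot.
            values[i] = int(val)
    return keys, values
```

Returning the arrays, rather than only their addresses, is the Python analogue of storing them on `self`: whoever holds the return value controls how long the memory stays valid.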
