From da321a9c7759b49aeabc969fa51b88a975ee3281 Mon Sep 17 00:00:00 2001
From: Cedric AUGONNET
Date: Thu, 9 Apr 2026 14:17:23 +0200
Subject: [PATCH 1/2] Support interop with active green contexts

Teach the CUDA runtime to borrow externally activated
green-context-derived execution contexts while continuing to reject
ordinary non-primary contexts, and scope loaded kernels to the active
execution context so resets do not reuse stale handles. Protect
destructive reset/close paths from touching borrowed contexts, add
CUDA 13 interop coverage, and document the supported phase-1 behavior.

Made-with: Cursor
---
 GREEN_CONTEXT_SUPPORT.md                     | 111 +++++++++
 numba_cuda/numba/cuda/api.py                 |   2 +
 numba_cuda/numba/cuda/codegen.py             |  18 +-
 numba_cuda/numba/cuda/cuda_paths.py          |  19 +-
 numba_cuda/numba/cuda/cudadrv/devices.py     |  51 +++-
 numba_cuda/numba/cuda/cudadrv/driver.py      |  50 +++-
 numba_cuda/numba/cuda/cudadrv/nvrtc.py       |   8 +-
 .../cuda/tests/cudadrv/test_context_stack.py | 218 +++++++++++++++++-
 8 files changed, 448 insertions(+), 29 deletions(-)
 create mode 100644 GREEN_CONTEXT_SUPPORT.md

diff --git a/GREEN_CONTEXT_SUPPORT.md b/GREEN_CONTEXT_SUPPORT.md
new file mode 100644
index 000000000..6ecd94c39
--- /dev/null
+++ b/GREEN_CONTEXT_SUPPORT.md
@@ -0,0 +1,111 @@
+# Green Context Support in `numba-cuda`
+
+This note summarizes the current state of green-context support in this repository, the implemented scope, and the remaining limitations.
+
+## Current Status
+
+`numba-cuda` now supports a limited, interop-first form of green-context support.
+
+The supported model is:
+
+- external code creates a CUDA green context,
+- external code converts it to a `CUcontext` with `cuCtxFromGreenCtx()`,
+- external code makes that context current,
+- Numba borrows the active context and operates inside it.
+
+This is intentionally narrower than full first-class green-context support.
+
+## What Works In Phase 1
+
+When a green-context-derived `CUcontext` is already active:
+
+- `cuda.current_context()` can return a Numba context wrapper for it,
+- `@cuda.require_context` APIs can use it,
+- `cuda.device_array()` works in the active green context,
+- `@cuda.jit` kernels can be loaded and launched in that context,
+- `cuda.external_stream()` can wrap a stream created for that green context,
+- CUDA Array Interface import and stream synchronization work in that context.
+
+The implementation also keeps loaded CUDA functions context-aware so that a kernel loaded in one execution context is not reused incorrectly in another execution context on the same device.
+
+## What Is Still Rejected
+
+This change does not make all non-primary contexts valid.
+
+The following is still intentionally rejected:
+
+- ordinary non-primary contexts created with APIs such as `cuCtxCreate()`,
+- any non-primary context that is not recognized as a green context.
+
+The historical error:
+
+```text
+RuntimeError: Numba cannot operate on non-primary CUDA context
+```
+
+still applies to those unsupported contexts.
+
+## Ownership Model
+
+Green-context-derived contexts are treated as borrowed, externally managed contexts.
+
+In practice this means:
+
+- Numba can use the active green context,
+- Numba does not claim ownership of creating or destroying that context,
+- Numba does not store it as the device primary context,
+- destructive subsystem reset is blocked while borrowed contexts are still live.
+
+This prevents `cuda.close()` or `devices.reset()` from accidentally resetting or releasing state that was not created and owned by Numba.
+
+## Context And Cache Behavior
+
+The implementation now distinguishes between:
+
+- the device primary context, and
+- borrowed green-context-derived execution contexts.
+
+Loaded CUDA function handles are cached by execution-context identity instead of only by `device.id`.
+
+Context reset also advances a context-generation key so that handles tied to unloaded modules are not reused after `Context.reset()`.
+
+The cached cubin path also recreates a fresh `ObjectCode` wrapper per load so that unloading one module does not leave later loads with a stale object handle.
+
+## Explicit Phase 1 Limits
+
+The following are still out of scope:
+
+- public APIs to create green contexts from Numba,
+- public APIs to select green contexts through `cuda.gpus[...]`,
+- changing `cuda.select_device()` to choose green contexts,
+- broad support for arbitrary multi-context-per-device workflows,
+- multithreaded use of the same green context,
+- treating all non-primary contexts as equivalent to green contexts.
+
+`cuda.gpus[...]` and `cuda.select_device()` remain primary-context APIs.
+
+## Practical Constraints
+
+The current design is still mostly device-centric:
+
+- each device still has one retained primary context managed by Numba,
+- borrowed green contexts are attached by handle only when already active,
+- reset and shutdown semantics are conservative whenever external ownership is involved.
+
+This keeps the implementation compatible with the existing primary-context model while adding a narrow green-context interop path.
+
+## Remaining Work For Broader Support
+
+Full green-context support would still require additional work, including:
+
+1. Public APIs for creating and managing green contexts.
+2. A clearer execution-context abstraction across more subsystems.
+3. Broader auditing of context-sensitive caches and resource ownership.
+4. Defined behavior for multiple execution contexts on one device beyond the interop path.
+5. Clear multithreading rules for borrowed green contexts.
+
+## Bottom Line
+
+Green-context interop is now supported when the green-context-derived `CUcontext` is created and activated externally first.
+
+Full first-class green-context management is not implemented yet, and non-green non-primary contexts remain unsupported.
diff --git a/numba_cuda/numba/cuda/api.py b/numba_cuda/numba/cuda/api.py
index 31688d607..ba3ff4231 100644
--- a/numba_cuda/numba/cuda/api.py
+++ b/numba_cuda/numba/cuda/api.py
@@ -511,6 +511,8 @@ def close():
     Explicitly clears all contexts in the current thread, and destroys all
     contexts if the current thread is the main thread.
     """
+    devices.require_resettable()
+
     # Must clear memsys object in case it has been used already
     from .memory_management import rtsys
 
diff --git a/numba_cuda/numba/cuda/codegen.py b/numba_cuda/numba/cuda/codegen.py
index 85770e83e..465e36dee 100644
--- a/numba_cuda/numba/cuda/codegen.py
+++ b/numba_cuda/numba/cuda/codegen.py
@@ -182,11 +182,11 @@ def __init__(
         self._ptx_cache = {}
         # Maps CC -> LTO-IR
         self._ltoir_cache = {}
-        # Maps CC -> cubin
+        # Maps CC -> cubin bytes
         self._cubin_cache = {}
         # Maps CC -> linker info output for cubin
         self._linkerinfo_cache = {}
-        # Maps Device numeric ID -> cufunc
+        # Maps execution-context cache key -> cufunc
         self._cufunc_cache = {}
 
         self._max_registers = max_registers
@@ -321,9 +321,9 @@ def _link_all(self, linker, cc, ignore_nonlto=False):
 
     def get_cubin(self, cc=None):
         cc = self._ensure_cc(cc)
-        cubin = self._cubin_cache.get(cc, None)
-        if cubin:
-            return cubin
+        cubin_code = self._cubin_cache.get(cc, None)
+        if cubin_code is not None:
+            return driver.ObjectCode.from_cubin(cubin_code, name=self._name)
 
         if self._lto and config.DUMP_ASSEMBLY:
             ptx = self.get_lto_ptx(cc=cc)
@@ -338,7 +338,7 @@ def get_cubin(self, cc=None):
         self._link_all(linker, cc, ignore_nonlto=False)
 
         cubin = linker.complete()
-        self._cubin_cache[cc] = cubin
+        self._cubin_cache[cc] = cubin.code
         self._linkerinfo_cache[cc] = linker.info_log
 
         return cubin
@@ -352,9 +352,9 @@ def get_cufunc(self):
             raise RuntimeError(msg)
 
         ctx = devices.get_context()
-        device = ctx.device
+        cache_key = ctx.cache_key
 
-        cufunc = self._cufunc_cache.get(device.id, None)
+        cufunc = self._cufunc_cache.get(cache_key, None)
         if cufunc:
             return cufunc
         cubin = self.get_cubin()
@@ -366,7 +366,7 @@ def get_cufunc(self):
         cufunc = module.get_function(self._entry_name)
 
         # Populate caches
-        self._cufunc_cache[device.id] = cufunc
+        self._cufunc_cache[cache_key] = cufunc
 
         return cufunc
 
diff --git a/numba_cuda/numba/cuda/cuda_paths.py b/numba_cuda/numba/cuda/cuda_paths.py
index 1c7f8e827..d23dc9a21 100644
--- a/numba_cuda/numba/cuda/cuda_paths.py
+++ b/numba_cuda/numba/cuda/cuda_paths.py
@@ -14,6 +14,9 @@
 from contextlib import contextmanager
 
 _env_path_tuple = namedtuple("_env_path_tuple", ["by", "info"])
+_header_dir_info = namedtuple(
+    "_header_dir_info", ["found_via", "abs_path"]
+)
 
 SEARCH_PRIORITY = [
     "Conda environment",
@@ -82,6 +85,20 @@ def _find_valid_path(options):
     return "", None
 
 
+def _locate_nvidia_header_directory(name):
+    locator = getattr(pathfinder, "locate_nvidia_header_directory", None)
+    if locator is None:
+        locator = getattr(pathfinder, "find_nvidia_header_directory", None)
+    if locator is None:
+        return None
+    located = locator(name)
+    if located is None:
+        return None
+    if hasattr(located, "abs_path") and hasattr(located, "found_via"):
+        return located
+    return _header_dir_info("cuda.pathfinder", located)
+
+
 def _get_libdevice_path_decision():
     options = _build_options(
         [
@@ -425,7 +442,7 @@ def get_current_cuda_target_name():
 
 def _get_include_dir():
     """Find the root include directory."""
-    located_header_dir = pathfinder.locate_nvidia_header_directory("cudart")
+    located_header_dir = _locate_nvidia_header_directory("cudart")
     if located_header_dir is not None:
         if not os.path.exists(
             os.path.join(
diff --git a/numba_cuda/numba/cuda/cudadrv/devices.py b/numba_cuda/numba/cuda/cudadrv/devices.py
index a36994380..acbe942db 100644
--- a/numba_cuda/numba/cuda/cudadrv/devices.py
+++ b/numba_cuda/numba/cuda/cudadrv/devices.py
@@ -18,7 +18,7 @@
 import threading
 from contextlib import contextmanager
 
-from .driver import driver
+from .driver import driver, is_green_context_handle
 
 
 class _DeviceList:
@@ -91,8 +91,8 @@ def __str__(self):
 class _Runtime:
     """Emulate the CUDA runtime context management.
 
-    It owns all Devices and Contexts.
-    Keeps at most one Context per Device
+    It owns all Devices and primary Contexts, and borrows active external
+    contexts by handle when needed.
     """
 
     def __init__(self):
@@ -130,8 +130,8 @@ def ensure_context(self):
 
     def get_or_create_context(self, devnum):
         """Returns the primary context and push+create it if needed
-        for *devnum*. If *devnum* is None, use the active CUDA context (must
-        be primary) or create a new one with ``devnum=0``.
+        for *devnum*. If *devnum* is None, use the active CUDA context or
+        create a new one with ``devnum=0``.
         """
         if devnum is None:
             attached_ctx = self._get_attached_context()
@@ -155,10 +155,15 @@ def _get_or_create_context_uncached(self, devnum):
             if not ac:
                 return self._activate_context_for(0)
             else:
-                # Get primary context for the active device
-                ctx = self.gpus[ac.devnum].get_primary_context()
-                # Is active context the primary context?
-                if ctx.handle != ac.context_handle:
+                gpu = self.gpus[ac.devnum]
+                primary_ctx = gpu.get_primary_context()
+                if primary_ctx.handle == ac.context_handle:
+                    ctx = primary_ctx
+                elif is_green_context_handle(ac.context_handle):
+                    ctx = gpu.get_or_create_borrowed_context(
+                        ac.context_handle
+                    )
+                else:
                     raise RuntimeError(
                         "Numba cannot operate on non-primary"
                         f" CUDA context {int(ac.context_handle):x}"
@@ -190,6 +195,7 @@ def reset(self):
         """Clear all contexts in the thread. Destroy the context if and only
         if we are in the main thread.
         """
+        self._ensure_resettable()
         # Pop all active context.
         while driver.pop_active_context() is not None:
             pass
@@ -203,6 +209,28 @@ def _destroy_all_contexts(self):
         for gpu in self.gpus:
             gpu.reset()
 
+    def _ensure_resettable(self):
+        if self._has_borrowed_contexts():
+            raise RuntimeError(
+                "Cannot reset CUDA subsystem while borrowed CUDA contexts "
+                "are still live"
+            )
+
+        with driver.get_active_context() as ac:
+            if not ac:
+                return
+
+            gpu = self.gpus[ac.devnum]
+            primary_ctx = gpu.get_primary_context()
+            if primary_ctx.handle != ac.context_handle:
+                raise RuntimeError(
+                    "Cannot reset CUDA subsystem while a non-primary CUDA "
+                    "context is active"
+                )
+
+    def _has_borrowed_contexts(self):
+        return any(gpu.has_borrowed_contexts() for gpu in self.gpus)
+
 
 _runtime = _Runtime()
 
@@ -245,3 +273,8 @@ def reset():
 
     """
     _runtime.reset()
+
+
+def require_resettable():
+    """Raise if a destructive reset would touch externally-managed contexts."""
+    _runtime._ensure_resettable()
diff --git a/numba_cuda/numba/cuda/cudadrv/driver.py b/numba_cuda/numba/cuda/cudadrv/driver.py
index a126ff96d..4a0dc7334 100644
--- a/numba_cuda/numba/cuda/cudadrv/driver.py
+++ b/numba_cuda/numba/cuda/cudadrv/driver.py
@@ -463,6 +463,23 @@ def __bool__(self):
 driver = Driver()
 
 
+def is_green_context_handle(handle):
+    if not hasattr(binding, "CUgreenCtx") or not hasattr(
+        binding, "cuGreenCtxGetId"
+    ):
+        return False
+
+    green_handle = binding.CUgreenCtx(int(handle))
+    try:
+        driver.cuGreenCtxGetId(green_handle)
+    except CudaAPIError as e:
+        if e.code == binding.CUresult.CUDA_ERROR_INVALID_CONTEXT:
+            return False
+        raise
+    else:
+        return True
+
+
 class Device:
     """
     The device object owns the CUDA contexts. This is owned by the driver
This is owned by the driver @@ -491,6 +508,7 @@ def __init__(self, devnum: int) -> None: self.name = self._dev.name self.uuid = f"GPU-{self._dev.uuid}" self.primary_context = None + self._borrowed_contexts = weakref.WeakValueDictionary() def get_device_identity(self): return { @@ -534,17 +552,30 @@ def get_primary_context(self): f"{self} has compute capability < {MIN_REQUIRED_CC}" ) - self._dev.set_current() - if CUDA_CORE_GT_0_6: - ctx_handle = self._dev.context.handle - else: - ctx_handle = self._dev.context._handle + ctx_handle = driver.cuDevicePrimaryCtxRetain(self.id) self.primary_context = ctx = Context( weakref.proxy(self), ctx_handle, ) return ctx + def get_or_create_borrowed_context(self, handle): + handle_value = int(handle) + + if (ctx := self.primary_context) is not None: + if int(ctx.handle) == handle_value: + return ctx + + if (ctx := self._borrowed_contexts.get(handle_value)) is not None: + return ctx + + ctx = Context(weakref.proxy(self), handle, borrowed=True) + self._borrowed_contexts[handle_value] = ctx + return ctx + + def has_borrowed_contexts(self): + return bool(self._borrowed_contexts) + def release_primary_context(self): """ Release reference to primary context if it has been retained. @@ -1054,9 +1085,11 @@ class Context: Contexts should not be constructed directly by user code. """ - def __init__(self, device, handle): + def __init__(self, device, handle, borrowed=False): self.device = device self.handle = handle + self.borrowed = borrowed + self._generation = 0 self.allocations = utils.UniqueDict() self.deallocations = _PendingDeallocs() _ensure_memory_manager() @@ -1069,6 +1102,7 @@ def reset(self): """ Clean up all owned resources in this context. 
""" + self._generation += 1 # Free owned resources _logger.info("reset context of device %s", self.device.id) self.memory_manager.reset() @@ -1144,6 +1178,10 @@ def prepare_for_use(self): """ self.memory_manager.initialize() + @property + def cache_key(self): + return int(self.handle), self._generation + def push(self): """ Pushes this context on the current CPU Thread. diff --git a/numba_cuda/numba/cuda/cudadrv/nvrtc.py b/numba_cuda/numba/cuda/cudadrv/nvrtc.py index 8eb881b6c..1fde818b2 100644 --- a/numba_cuda/numba/cuda/cudadrv/nvrtc.py +++ b/numba_cuda/numba/cuda/cudadrv/nvrtc.py @@ -5,9 +5,11 @@ CCSupportError, ) from numba.cuda import config -from numba.cuda.cuda_paths import get_cuda_paths +from numba.cuda.cuda_paths import ( + get_cuda_paths, + _locate_nvidia_header_directory, +) from numba.cuda.utils import _readenv -from cuda import pathfinder import os import warnings import functools @@ -115,7 +117,7 @@ def compile(src, name, cc, ltoir=False, lineinfo=False, debug=False): elif nvrtc_ver_major == 13: numba_include = f"{os.path.join(numba_cuda_path, 'include', '13')}" - cccl_found_header_dir = pathfinder.locate_nvidia_header_directory("cccl") + cccl_found_header_dir = _locate_nvidia_header_directory("cccl") if cccl_found_header_dir is not None: # TODO: Not every kernel needs cccl, so it shouldn't # be added to the include path for every kernel. 
diff --git a/numba_cuda/numba/cuda/tests/cudadrv/test_context_stack.py b/numba_cuda/numba/cuda/tests/cudadrv/test_context_stack.py
index 9304b25ec..74275d054 100644
--- a/numba_cuda/numba/cuda/tests/cudadrv/test_context_stack.py
+++ b/numba_cuda/numba/cuda/tests/cudadrv/test_context_stack.py
@@ -2,9 +2,18 @@
 # SPDX-License-Identifier: BSD-2-Clause
 
 import numbers
+from contextlib import contextmanager
+from unittest.mock import patch
+
+import numpy as np
 
 from numba import cuda
-from numba.cuda.testing import unittest, CUDATestCase, skip_on_cudasim
+from numba.cuda.testing import (
+    unittest,
+    CUDATestCase,
+    ForeignArray,
+    skip_on_cudasim,
+)
 from numba.cuda.cudadrv import driver
 
 
@@ -155,5 +164,212 @@ def foo(a):
         self.test_attached_primary(do)
 
 
+@skip_on_cudasim("CUDA HW required")
+class TestGreenContextInterop(CUDATestCase):
+    def tearDown(self):
+        super().tearDown()
+        with driver.driver.get_active_context() as ac:
+            if ac:
+                cuda.current_context().reset()
+
+    def _require_green_context_support(self):
+        if driver.driver.get_version() < (13, 0):
+            self.skipTest("CUDA 13+ required for green contexts")
+
+        required = (
+            "CUgreenCtxCreate_flags",
+            "cuDeviceGetDevResource",
+            "cuDevResourceGenerateDesc",
+            "cuGreenCtxCreate",
+            "cuCtxFromGreenCtx",
+            "cuGreenCtxStreamCreate",
+            "cuGreenCtxDestroy",
+            "cuStreamGetGreenCtx",
+        )
+        missing = [
+            name for name in required if not hasattr(driver.binding, name)
+        ]
+        if missing:
+            self.skipTest(
+                "Green context bindings are unavailable: "
+                + ", ".join(sorted(missing))
+            )
+
+    @contextmanager
+    def green_context(self):
+        self._require_green_context_support()
+
+        the_driver = driver.driver
+        binding = driver.binding
+        dev = binding.CUdevice(0)
+        green_ctx = None
+        ctx_handle = None
+        stream = None
+
+        try:
+            resource = the_driver.cuDeviceGetDevResource(
+                dev, binding.CUdevResourceType.CU_DEV_RESOURCE_TYPE_SM
+            )
+            desc = the_driver.cuDevResourceGenerateDesc([resource], 1)
+            green_ctx = the_driver.cuGreenCtxCreate(
+                desc,
+                dev,
+                binding.CUgreenCtxCreate_flags.CU_GREEN_CTX_DEFAULT_STREAM.value,
+            )
+            ctx_handle = the_driver.cuCtxFromGreenCtx(green_ctx)
+            stream = the_driver.cuGreenCtxStreamCreate(
+                green_ctx,
+                binding.CUstream_flags.CU_STREAM_NON_BLOCKING.value,
+                0,
+            )
+            the_driver.cuCtxPushCurrent(ctx_handle)
+        except driver.CudaAPIError as e:
+            if green_ctx is not None:
+                the_driver.cuGreenCtxDestroy(green_ctx)
+            self.skipTest(f"Green contexts are unavailable: {e}")
+
+        try:
+            yield ctx_handle, stream
+        finally:
+            if stream is not None:
+                the_driver.cuStreamDestroy(stream)
+            if ctx_handle is not None:
+                popped = the_driver.cuCtxPopCurrent()
+                self.assertEqual(int(popped), int(ctx_handle))
+            if green_ctx is not None:
+                the_driver.cuGreenCtxDestroy(green_ctx)
+
+    def test_attached_green_context(self):
+        with self.green_context() as (ctx_handle, _):
+            my_ctx = cuda.current_context()
+            self.assertEqual(int(my_ctx.handle), int(ctx_handle))
+            self.assertTrue(my_ctx.borrowed)
+
+    def test_cudajit_in_attached_green_context(self):
+        with self.green_context() as (_, stream_handle):
+            stream = cuda.external_stream(int(stream_handle))
+
+            @cuda.jit
+            def fill(a):
+                i = cuda.grid(1)
+                if i < a.size:
+                    a[i] = i
+
+            a = cuda.device_array(10, dtype=np.int32)
+            fill[1, 10, stream](a)
+            stream.synchronize()
+
+            np.testing.assert_array_equal(
+                a.copy_to_host(), np.arange(10, dtype=np.int32)
+            )
+
+    def test_cuda_array_interface_sync_in_green_context(self):
+        with self.green_context() as (_, stream_handle):
+            stream = cuda.external_stream(int(stream_handle))
+            foreign = ForeignArray(
+                cuda.device_array(10, dtype=np.int32, stream=stream)
+            )
+
+            @cuda.jit
+            def touch(arr):
+                i = cuda.grid(1)
+                if i < arr.size:
+                    arr[i] = i + 1
+
+            with patch.object(
+                cuda.cudadrv.driver.Stream, "synchronize", return_value=None
+            ) as mock_sync:
+                imported = cuda.as_cuda_array(foreign)
+
+            self.assertTrue(imported.stream.external)
+            self.assertEqual(int(imported.stream.handle), int(stream_handle))
+            mock_sync.assert_called_once_with()
+
+            with patch.object(
+                cuda.cudadrv.driver.Stream, "synchronize", return_value=None
+            ) as mock_sync:
+                touch[1, 10](foreign)
+
+            mock_sync.assert_called_once_with()
+
+    def test_cufunc_cache_is_context_specific(self):
+        from numba import types
+
+        sig = (types.int32[::1],)
+
+        @cuda.jit(sig)
+        def fill(a):
+            i = cuda.grid(1)
+            if i < a.size:
+                a[i] = i
+
+        primary_ctx = cuda.current_context()
+        primary = cuda.device_array(10, dtype=np.int32)
+        fill[1, 10](primary)
+        primary_key = primary_ctx.cache_key
+
+        with self.green_context() as (_, _stream_handle):
+            green_ctx = cuda.current_context()
+            green = cuda.device_array(10, dtype=np.int32)
+            fill[1, 10](green)
+            np.testing.assert_array_equal(
+                green.copy_to_host(), np.arange(10, dtype=np.int32)
+            )
+            green_key = green_ctx.cache_key
+
+        fill[1, 10](primary)
+        np.testing.assert_array_equal(
+            primary.copy_to_host(), np.arange(10, dtype=np.int32)
+        )
+
+        cufunc_cache = fill.overloads[sig]._codelibrary._cufunc_cache
+        self.assertIn(primary_key, cufunc_cache)
+        self.assertIn(green_key, cufunc_cache)
+        self.assertEqual(len(cufunc_cache), 2)
+
+    def test_borrowed_context_reset_reloads_modules(self):
+        with self.green_context():
+            @cuda.jit
+            def fill(a):
+                i = cuda.grid(1)
+                if i < a.size:
+                    a[i] = i
+
+            ctx = cuda.current_context()
+            before = ctx.cache_key
+
+            first = cuda.device_array(10, dtype=np.int32)
+            fill[1, 10](first)
+            np.testing.assert_array_equal(
+                first.copy_to_host(), np.arange(10, dtype=np.int32)
+            )
+            overload = next(iter(fill.overloads.values()))
+
+            ctx.reset()
+            after = ctx.cache_key
+            self.assertNotEqual(before, after)
+
+            second = cuda.device_array(10, dtype=np.int32)
+            fill[1, 10](second)
+            np.testing.assert_array_equal(
+                second.copy_to_host(), np.arange(10, dtype=np.int32)
+            )
+
+            cufunc_cache = overload._codelibrary._cufunc_cache
+            self.assertIn(before, cufunc_cache)
+            self.assertIn(after, cufunc_cache)
+
+    def test_close_rejected_in_borrowed_context(self):
+        with self.green_context():
+            cuda.current_context()
+            with self.assertRaises(RuntimeError) as raises:
+                cuda.close()
+
+            self.assertIn(
+                "borrowed CUDA contexts are still live",
+                str(raises.exception),
+            )
+
+
 if __name__ == "__main__":
     unittest.main()

From fb1885b00ecf1dc121bc164f8e1faababc1ae809 Mon Sep 17 00:00:00 2001
From: Cedric AUGONNET
Date: Thu, 9 Apr 2026 14:21:14 +0200
Subject: [PATCH 2/2] Remove temporary green context support note

Keep the branch focused on the implementation and tests by dropping the
temporary support summary from the committed changes.

Made-with: Cursor
---
 GREEN_CONTEXT_SUPPORT.md | 111 ---------------------------------------
 1 file changed, 111 deletions(-)
 delete mode 100644 GREEN_CONTEXT_SUPPORT.md

diff --git a/GREEN_CONTEXT_SUPPORT.md b/GREEN_CONTEXT_SUPPORT.md
deleted file mode 100644
index 6ecd94c39..000000000
--- a/GREEN_CONTEXT_SUPPORT.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# Green Context Support in `numba-cuda`
-
-This note summarizes the current state of green-context support in this repository, the implemented scope, and the remaining limitations.
-
-## Current Status
-
-`numba-cuda` now supports a limited, interop-first form of green-context support.
-
-The supported model is:
-
-- external code creates a CUDA green context,
-- external code converts it to a `CUcontext` with `cuCtxFromGreenCtx()`,
-- external code makes that context current,
-- Numba borrows the active context and operates inside it.
-
-This is intentionally narrower than full first-class green-context support.
-
-## What Works In Phase 1
-
-When a green-context-derived `CUcontext` is already active:
-
-- `cuda.current_context()` can return a Numba context wrapper for it,
-- `@cuda.require_context` APIs can use it,
-- `cuda.device_array()` works in the active green context,
-- `@cuda.jit` kernels can be loaded and launched in that context,
-- `cuda.external_stream()` can wrap a stream created for that green context,
-- CUDA Array Interface import and stream synchronization work in that context.
-
-The implementation also keeps loaded CUDA functions context-aware so that a kernel loaded in one execution context is not reused incorrectly in another execution context on the same device.
-
-## What Is Still Rejected
-
-This change does not make all non-primary contexts valid.
-
-The following is still intentionally rejected:
-
-- ordinary non-primary contexts created with APIs such as `cuCtxCreate()`,
-- any non-primary context that is not recognized as a green context.
-
-The historical error:
-
-```text
-RuntimeError: Numba cannot operate on non-primary CUDA context
-```
-
-still applies to those unsupported contexts.
-
-## Ownership Model
-
-Green-context-derived contexts are treated as borrowed, externally managed contexts.
-
-In practice this means:
-
-- Numba can use the active green context,
-- Numba does not claim ownership of creating or destroying that context,
-- Numba does not store it as the device primary context,
-- destructive subsystem reset is blocked while borrowed contexts are still live.
-
-This prevents `cuda.close()` or `devices.reset()` from accidentally resetting or releasing state that was not created and owned by Numba.
-
-## Context And Cache Behavior
-
-The implementation now distinguishes between:
-
-- the device primary context, and
-- borrowed green-context-derived execution contexts.
-
-Loaded CUDA function handles are cached by execution-context identity instead of only by `device.id`.
-
-Context reset also advances a context-generation key so that handles tied to unloaded modules are not reused after `Context.reset()`.
-
-The cached cubin path also recreates a fresh `ObjectCode` wrapper per load so that unloading one module does not leave later loads with a stale object handle.
-
-## Explicit Phase 1 Limits
-
-The following are still out of scope:
-
-- public APIs to create green contexts from Numba,
-- public APIs to select green contexts through `cuda.gpus[...]`,
-- changing `cuda.select_device()` to choose green contexts,
-- broad support for arbitrary multi-context-per-device workflows,
-- multithreaded use of the same green context,
-- treating all non-primary contexts as equivalent to green contexts.
-
-`cuda.gpus[...]` and `cuda.select_device()` remain primary-context APIs.
-
-## Practical Constraints
-
-The current design is still mostly device-centric:
-
-- each device still has one retained primary context managed by Numba,
-- borrowed green contexts are attached by handle only when already active,
-- reset and shutdown semantics are conservative whenever external ownership is involved.
-
-This keeps the implementation compatible with the existing primary-context model while adding a narrow green-context interop path.
-
-## Remaining Work For Broader Support
-
-Full green-context support would still require additional work, including:
-
-1. Public APIs for creating and managing green contexts.
-2. A clearer execution-context abstraction across more subsystems.
-3. Broader auditing of context-sensitive caches and resource ownership.
-4. Defined behavior for multiple execution contexts on one device beyond the interop path.
-5. Clear multithreading rules for borrowed green contexts.
-
-## Bottom Line
-
-Green-context interop is now supported when the green-context-derived `CUcontext` is created and activated externally first.
-
-Full first-class green-context management is not implemented yet, and non-green non-primary contexts remain unsupported.
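In the first patch, `Device._borrowed_contexts` is a `weakref.WeakValueDictionary`, and `_Runtime._ensure_resettable()` refuses to reset while any device still holds entries in it. The reduced model below shows how that guard interacts with object lifetime. The names `BorrowedCtx`, `FakeDevice`, and `ensure_resettable` are hypothetical stand-ins. The model also omits the patch's second check, which rejects a reset when the active context is not the primary one.

```python
import gc
import weakref


class BorrowedCtx:
    """Hypothetical stand-in for a borrowed Context wrapper."""

    def __init__(self, handle):
        self.handle = handle


class FakeDevice:
    def __init__(self):
        # Weak values: an entry vanishes once nothing else holds the wrapper.
        self._borrowed_contexts = weakref.WeakValueDictionary()

    def get_or_create_borrowed_context(self, handle):
        ctx = self._borrowed_contexts.get(int(handle))
        if ctx is None:
            ctx = BorrowedCtx(handle)
            self._borrowed_contexts[int(handle)] = ctx
        return ctx

    def has_borrowed_contexts(self):
        return bool(self._borrowed_contexts)


def ensure_resettable(gpus):
    # First half of the patch's _ensure_resettable(); message copied from it.
    if any(gpu.has_borrowed_contexts() for gpu in gpus):
        raise RuntimeError(
            "Cannot reset CUDA subsystem while borrowed CUDA contexts "
            "are still live"
        )


gpu = FakeDevice()
ctx = gpu.get_or_create_borrowed_context(0x2000)
assert gpu.get_or_create_borrowed_context(0x2000) is ctx  # wrapper is reused

try:
    ensure_resettable([gpu])
except RuntimeError as exc:
    print(f"blocked: {exc}")

del ctx  # drop the last strong reference
gc.collect()  # not needed on CPython, harmless elsewhere
ensure_resettable([gpu])  # no longer raises
print("reset allowed")
```

Because the registry holds weak references, the guard tracks Python object lifetime rather than the lifetime of the CUDA green context. It keeps raising while any reference to a borrowed wrapper survives, for example one cached by user code, and it stops as soon as the last reference is dropped.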