Add green context support #1976

Draft
leofang wants to merge 6 commits into NVIDIA:main from leofang:leof/green-ctx-v1

@leofang (Member) commented Apr 25, 2026

Close #1563. Close #112.

Summary

Add green context support to cuda.core v1.0 — the push-model API for querying device resources, splitting SMs, and creating/using green contexts.

Design

See the companion design doc for full rationale. Key decisions:

  • Unified Context type — no user-visible GreenContext subclass. A single Context wraps either a primary CUcontext or a CUgreenCtx + derived CUcontext. ctx.is_green distinguishes them. Inspired by the CUDA runtime's execution-context (EC) abstraction.
  • dev.resources namespace: DeviceResources groups hardware resource queries (dev.resources.sm, dev.resources.workqueue). Follows the existing "plural = namespace" pattern (dev.properties, kernel.attributes).
  • SMResourceOptions with SoA (structure-of-arrays) broadcasting: a single dataclass for SMResource.split(). Scalar fields broadcast across all groups; count drives the group count. count=None means discovery mode (translated to smCount=0 internally).
  • Merged workqueue types: WorkqueueResource merges CU_DEV_RESOURCE_TYPE_WORKQUEUE_CONFIG and CU_DEV_RESOURCE_TYPE_WORKQUEUE under one user-facing class. Option values are strings (e.g. sharing_scope="green_ctx_balanced").
  • ContextOptions(resources=[...]) passed to dev.create_context(): resource descriptor generation and cuGreenCtxCreate are internal. The user passes pre-split resource objects.
  • ctx.close() does not manage the context stack — the user must swap out via dev.set_current(prev) before closing. Closing a current context raises RuntimeError.
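The SoA broadcasting rule for SMResourceOptions can be modeled in plain Python. This is a hypothetical sketch, not cuda.core code: SMResourceOptionsSketch and broadcast_groups are illustrative names, and treating a scalar count as a single group is an assumption. It only shows how scalar fields fan out to every group, how a length mismatch is rejected, and how count=None becomes smCount=0.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

IntOrSeq = Union[int, Sequence[int], None]


@dataclass
class SMResourceOptionsSketch:
    """Hypothetical stand-in for SMResourceOptions; field names follow the PR."""
    count: Union[Optional[int], Sequence[Optional[int]]] = None
    coscheduled_sm_count: IntOrSeq = None
    preferred_coscheduled_sm_count: IntOrSeq = None


def broadcast_groups(opts: SMResourceOptionsSketch) -> list:
    """Expand SoA options into one dict per group (illustration only)."""
    # count drives the group count: a sequence gives one group per entry;
    # a scalar (including None) is assumed here to mean a single group.
    if isinstance(opts.count, (list, tuple)):
        counts = list(opts.count)
    else:
        counts = [opts.count]
    n = len(counts)
    for c in counts:
        if c is not None and c < 0:
            raise ValueError("count must be non-negative")

    def expand(name, value):
        if isinstance(value, (list, tuple)):
            if len(value) != n:
                raise ValueError(f"{name} has {len(value)} entries, expected {n}")
            return list(value)
        return [value] * n  # scalar broadcasts to every group

    cosched = expand("coscheduled_sm_count", opts.coscheduled_sm_count)
    preferred = expand("preferred_coscheduled_sm_count",
                       opts.preferred_coscheduled_sm_count)
    return [
        {
            # count=None is discovery mode, sent to the driver as smCount=0
            "sm_count": 0 if c is None else c,
            "coscheduled_sm_count": cs,
            "preferred_coscheduled_sm_count": p,
        }
        for c, cs, p in zip(counts, cosched, preferred)
    ]
```

For example, count=[8, 16] with coscheduled_sm_count=2 yields two groups that both carry coscheduled_sm_count=2, while a one-element coscheduled_sm_count list would be rejected as a scalar/Sequence mismatch.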

New public API

  • Device.resources: a DeviceResources namespace (.sm, .workqueue)
  • SMResource — properties: sm_count, min_partition_size, coscheduled_alignment, flags, handle; method: split(options, *, dry_run=False)
  • SMResourceOptions: count, coscheduled_sm_count, preferred_coscheduled_sm_count
  • WorkqueueResource — method: configure(options)
  • WorkqueueResourceOptions: sharing_scope
  • ContextOptions.resources — accepts Sequence[SMResource | WorkqueueResource]
  • Context.is_green — bool property
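The ctx.close() rule from the design list (swap out with dev.set_current(prev) first, then close) can be illustrated with stand-in classes. FakeDevice and FakeContext are hypothetical and never touch the driver; the sketch assumes set_current returns the previously current context, as the prev variable in the design notes suggests, and create_context ignores its resources argument.

```python
class FakeContext:
    """Hypothetical stand-in for a unified Context (primary or green)."""

    def __init__(self, device, is_green):
        self._device = device
        self.is_green = is_green
        self.closed = False

    def close(self):
        # Closing the current context is an error; close() never pops or
        # swaps the context stack on the caller's behalf.
        if self._device.current is self:
            raise RuntimeError("context is current; call set_current(prev) first")
        self.closed = True


class FakeDevice:
    """Hypothetical device that tracks a single current context."""

    def __init__(self):
        self.primary = FakeContext(self, is_green=False)
        self.current = self.primary

    def create_context(self, resources):
        # Descriptor generation and green-context creation would happen
        # internally; this sketch only records that the result is green.
        return FakeContext(self, is_green=True)

    def set_current(self, ctx):
        # Returns the previously current context so callers can restore it.
        prev, self.current = self.current, ctx
        return prev


dev = FakeDevice()
green = dev.create_context(resources=["sm_group_0"])
prev = dev.set_current(green)      # green is now current
try:
    green.close()                  # refused: still current
except RuntimeError as exc:
    print("close refused:", exc)
dev.set_current(prev)              # swap out first
green.close()                      # now closing succeeds
```

Keeping close() free of stack manipulation means the caller always decides which context becomes current next.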

Implementation details

C++ handle layer (resource_handles.hpp/cpp):

  • GreenCtxHandle (shared_ptr<const CUgreenCtx>) — owning handle; destructor calls cuGreenCtxDestroy.
  • ContextBox gains a GreenCtxHandle field so the derived CUcontext keeps the green ctx alive. get_context_green_ctx() provides reverse lookup.
  • create_green_ctx_handle() combines cuDevResourceGenerateDesc + cuGreenCtxCreate in one call — the descriptor is transient (no DevResourceDescHandle needed since CUDA has no explicit destroy for it).
  • context_registry / stream_registry (HandleRegistry) deduplicate handles by raw CUDA pointer, enabling identity-preserving set_current swaps.
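The deduplication idea behind context_registry / stream_registry can be sketched in Python: looking up the same raw pointer twice yields the same wrapper object, which is what makes set_current swaps identity-preserving. HandleRegistrySketch and CtxHandle are hypothetical names; the real HandleRegistry lives in the C++ layer, and using weak references so the registry never keeps a handle alive on its own is an assumption of this sketch.

```python
import weakref


class CtxHandle:
    """Hypothetical owning wrapper around a raw CUcontext pointer value."""

    def __init__(self, raw_ptr: int):
        self.raw_ptr = raw_ptr


class HandleRegistrySketch:
    """Maps raw pointer values to their existing wrapper objects."""

    def __init__(self):
        # Weak values (an assumption): entries vanish once no one holds them.
        self._by_ptr = weakref.WeakValueDictionary()

    def get_or_create(self, raw_ptr: int, factory):
        obj = self._by_ptr.get(raw_ptr)
        if obj is None:
            obj = factory(raw_ptr)
            self._by_ptr[raw_ptr] = obj
        return obj
```

Two lookups of the same pointer return the same CtxHandle, so a Python-side "is" comparison survives a round trip through set_current.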

Bug fix — stream context tracking:

  • StreamBox now carries a ContextHandle dependency, populated at creation time.
  • get_stream_context() returns it without a driver call.
  • Stream._from_handle and Stream_ensure_ctx prefer the registry-backed handle before falling back to cuStreamGetCtx. This fixes a latent issue where streams created in a green context would lose their context association after a set_current swap.
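A minimal model of the stream-context fix, using the hypothetical names StreamBoxSketch and get_stream_context_sketch: the context recorded at stream creation is returned directly, and the driver query (a callable standing in for cuStreamGetCtx) runs only as a fallback.

```python
from typing import Callable, Optional


class StreamBoxSketch:
    """Hypothetical model of a stream record that carries its context."""

    def __init__(self, raw_stream: int, ctx_handle: Optional[object] = None):
        self.raw_stream = raw_stream
        # Populated when the stream is created through the tracked path.
        self.ctx_handle = ctx_handle


def get_stream_context_sketch(box: StreamBoxSketch,
                              query_driver: Callable[[int], object]) -> object:
    """Return the stream's context, preferring the recorded handle.

    query_driver stands in for a cuStreamGetCtx call and is only invoked
    when no context was recorded for this stream.
    """
    if box.ctx_handle is not None:
        return box.ctx_handle  # no driver call needed
    return query_driver(box.raw_stream)
```

Because the recorded handle comes from the registry rather than a fresh driver query, a stream created in a green context keeps returning that same context object after a set_current swap.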

Version guards:

  • Compile-time: IF CUDA_CORE_BUILD_MAJOR >= 13 gates cuDevSmResourceSplit (the general/structured form).
  • Runtime: cy_driver_version() >= (12, 4, 0) for all green ctx APIs; >= (13, 1, 0) for structured splits.
  • CUDA 12.x fallback: cuDevSmResourceSplitByCount for basic (homogeneous) splits. Per-group coscheduled_sm_count and heterogeneous counts require 13.1+ and raise NotImplementedError on 12.x.
  • Green ctx function pointers loaded via _get_optional_driver_fn — graceful NULL when bindings lack the symbol.
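The version guards amount to a small decision table. The function below is a hypothetical sketch of that dispatch (choose_split_path and its parameters are illustrative, not cuda.core internals); it returns the name of the driver entry point the PR description says each configuration uses.

```python
from typing import Sequence, Tuple


def choose_split_path(driver_version: Tuple[int, int, int],
                      built_major: int,
                      counts: Sequence[int],
                      has_coscheduled: bool) -> str:
    """Pick an SM split entry point under the stated version guards.

    driver_version:  (major, minor, patch), like cy_driver_version().
    built_major:     CUDA major version the package was compiled against.
    counts:          requested SM count per group.
    has_coscheduled: whether any group sets coscheduled_sm_count.
    """
    if driver_version < (12, 4, 0):
        raise NotImplementedError("green contexts need a CUDA 12.4+ driver")
    # Structured splits need both the 13.x build and a 13.1+ driver.
    if built_major >= 13 and driver_version >= (13, 1, 0):
        return "cuDevSmResourceSplit"
    # Fallback: only homogeneous splits without per-group coscheduling.
    if has_coscheduled or len(set(counts)) > 1:
        raise NotImplementedError(
            "heterogeneous or coscheduled splits need CUDA 13.1+")
    return "cuDevSmResourceSplitByCount"
```

Python compares version tuples element by element, so (12, 8, 0) < (13, 1, 0) holds as expected.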

Test coverage

27 tests in test_green_context.py, organized into pytest fixtures and test classes:

  • Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError → skip), green_ctx_active (push/pop with try/finally), fill_kernel
  • _use_green_ctx context manager for safe push/pop in all tests — prevents context stack leaks on failure
  • TestSMResourceQuery — properties, arch constraints (pre-Hopper vs Hopper+)
  • TestWorkqueueResource — query, configure valid/invalid
  • TestSMResourceSplitValidation — scalar/Sequence mismatch, negative count, dry-run blocked
  • TestSMResourceSplit — single/two-group splits with arch-aligned counts, discovery mode, alignment, dry-run parity
  • TestGreenContextLifecycle: is_green, identity-preserving swap, stream/event context tracking, close-while-current guard
  • TestGreenContextKernelLaunch — compile + launch + host-verify in green ctx, two independent green contexts with different fill values, SM + workqueue combined
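The _use_green_ctx helper follows the standard contextmanager push/restore pattern. This is a hypothetical analogue, not the test file's code; StubDevice stands in for a Device whose set_current(ctx) returns the previously current context.

```python
from contextlib import contextmanager


class StubDevice:
    """Hypothetical device: set_current returns the previous context."""

    def __init__(self):
        self.current = "primary"

    def set_current(self, ctx):
        prev, self.current = self.current, ctx
        return prev


@contextmanager
def use_ctx(device, ctx):
    """Make ctx current, then always restore the previous context.

    The finally block runs even when the body raises, so a failing test
    cannot leave ctx on the context stack.
    """
    prev = device.set_current(ctx)
    try:
        yield ctx
    finally:
        device.set_current(prev)
```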

Validation

CUDA_HOME=... pip install -e . --no-build-isolation
python -m pytest tests/test_green_context.py -v             # 26 passed, 1 skipped (arch)
python -m pytest tests/test_device.py tests/test_stream.py tests/test_event.py tests/test_context.py -v  # no regressions

-- Leo's bot

@copy-pr-bot (bot, Contributor) commented Apr 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label Apr 25, 2026
@leofang leofang changed the title Add cuda.core green context v1 API Add green context support Apr 25, 2026
@leofang leofang added P0 High priority - Must do! feature New feature or request labels Apr 25, 2026
@leofang leofang self-assigned this Apr 25, 2026
@leofang leofang added this to the cuda.core v1.0.0 milestone Apr 25, 2026
Restructure tests into fixtures + classes with full resource cleanup:
- Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError skip),
  green_ctx_active (with try/finally restore), fill_kernel
- _use_green_ctx context manager for safe push/pop in all tests
- TestSMResourceQuery: properties, arch constraints per CC
- TestSMResourceSplit: single/two-group splits, discovery, alignment,
  dry-run vs real parity
- TestGreenContextKernelLaunch: compile + launch + verify in green ctx,
  two independent green contexts, SM + workqueue combined

All set_current calls are paired with restore in finally blocks to
prevent context stack leaks on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang (Member, Author) commented Apr 25, 2026

/ok to test ac5c0fc

Labels

cuda.core Everything related to the cuda.core module feature New feature or request P0 High priority - Must do!

Development

Successfully merging this pull request may close these issues.

  • GreenContext: Support allocating SMs
  • [EPIC] Support green contexts
