Overview
CUDA 13 introduced green contexts (cuGreenCtxCreate / cuCtxFromGreenCtx), a lightweight alternative to traditional contexts that can be restricted to a subset of device resources. This lets the developer, for example, select SMs from distinct spatial partitions of the GPU and target them via CUDA stream operations, kernel launches, etc. Unlike the current token-based rate limiter in HAMi-core, which is backed by an NVML-polling watcher thread, the SM allocation in green contexts is driver-enforced, with no polling or per-launch coordination overhead.
Proposed Design
A new USE_GREEN_CTX=1 environment variable gates the feature. When set, the cuCtxCreate hook (already in place) computes target_sms = round(total_sms * CUDA_DEVICE_SM_LIMIT / 100), creates a green context scoped to that count, and returns the derived CUcontext to the caller. The existing CUDA_DEVICE_SM_LIMIT environment variable is reused as-is. If cuGreenCtxCreate fails (e.g., on a driver older than CUDA 13.1), the hook returns the error explicitly rather than silently falling back. Default behavior is unchanged: without USE_GREEN_CTX=1, everything works exactly as before.
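The gating check and the SM-count arithmetic described above can be sketched in plain C, with no CUDA calls. This is a minimal illustration, not the actual hook code. The helper names green_ctx_enabled and compute_target_sms are hypothetical, and two behaviors are assumptions rather than part of the proposal: clamping the result to at least one SM, and treating a limit outside 1..99 as "no limit". The driver may also apply its own minimum and alignment constraints to the requested SM count, which this sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the feature is on only when USE_GREEN_CTX is exactly "1". */
int green_ctx_enabled(void) {
    const char *value = getenv("USE_GREEN_CTX");
    return value != NULL && strcmp(value, "1") == 0;
}

/*
 * Hypothetical helper: target_sms = round(total_sms * limit / 100),
 * using integer round-half-up so no floating point is needed.
 * Assumption: a limit outside 1..99 means "no limit" (all SMs).
 * Assumption: the result is clamped to at least 1 SM.
 */
int compute_target_sms(int total_sms, int sm_limit_percent) {
    assert(total_sms > 0);
    if (sm_limit_percent <= 0 || sm_limit_percent >= 100) {
        return total_sms;
    }
    int target = (total_sms * sm_limit_percent + 50) / 100;
    if (target < 1) {
        target = 1;
    }
    return target;
}
```

For example, with 132 SMs and CUDA_DEVICE_SM_LIMIT=50, compute_target_sms(132, 50) yields 66. In the hook, this value would be the SM count requested when building the green context, and only when green_ctx_enabled() returns 1.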