CI: allow specifying custom driver versions in test matrix (#2176)

leofang · web-flow · commit 88b55cb64a7e · 2026-06-09T17:57:15.000Z
* CI: allow specifying custom driver versions in test matrix

Extends the DRIVER field in ci/test-matrix.yml beyond 'latest'/'earliest' to accept an explicit version string (e.g. '580.65.06'). For Linux, ci/tools/install_gpu_driver.sh (adapted from nv-gha-runners/vm-images PR #256) swaps the driver in-job via nsenter when the row uses a custom version; for Windows, ci/tools/install_gpu_driver.ps1 is split into install + configure_driver_mode, with the install step gated on the DRIVER value and the mode step always running.

The matrix row is routed to a 'latest' runner image when the DRIVER is a custom version (the install scripts perform the swap themselves). Container privileges on Linux (--privileged --pid=host) are added only on rows with a custom DRIVER. Custom DRIVER + FLAVOR=wsl is rejected eagerly in the compute-matrix step.

Two existing nightly-numba-cuda rows exercise the new path:
- Linux amd64 / 13.3.0 / l4 -> 580.65.06
- Windows amd64 / 13.3.0 / l4 -> 610.47

Closes #293
Closes #1265

* CI: fix Linux driver nsenter re-exec, swap Windows version, enable ci.yml dispatch

- install_gpu_driver.sh: pipe the script body to the host-side bash via stdin (bash -s < "$0") instead of re-execing "$0". The script lives in the GH workspace mount (container-only), so the relative path doesn't resolve after nsenter switches the mount namespace. The < "$0" fd is opened before nsenter and survives the flip.
- test-matrix.yml: Windows nightly-numba-cuda row 610.47 -> 596.36 (610.47 isn't published on the CDN; install hit 404).
- ci.yml: add workflow_dispatch: trigger so the pipeline can be re-run manually. The existing should-skip / detect-changes gates already handle non-PR events.

* CI: move 'Ensure GPU is working' after 'Install GPU driver' on Linux

So nvidia-smi validates the post-install driver state on custom-DRIVER rows. Windows test-wheel + coverage already use Install -> Configure -> Ensure; this brings the Linux test-wheel job into line.

* CI: flip two PR-matrix Linux rows to DRIVER=610.43.02

Exercises the custom-driver install path on every PR (not just nightly). Both rows are amd64 / 13.3.0 / local-CTK, on l4 and rtxpro6000 -- both in the 'open' kernel-module flavor (only Volta needs 'legacy').

* CI: restart nvidia-persistenced on Linux; poll nvidia-smi on Windows

Linux: After install_gpu_driver.sh stops nvidia-persistenced and the apt purge removes the package, the .run installer reinstalls the systemd service but leaves it stopped. cuda.core's test_persistence_mode_enabled fails with NVML_ERROR_UNKNOWN on driver 610.43.02 when the daemon is not running; explicitly start it again at the end of host_install().

Windows: configure_driver_mode.ps1's trailing 'Start-Sleep -Seconds 5' is not enough on slower-coming-back-up multi-GPU rows (observed: 2x H100 MCDM). Replace it with a poll-until-success loop on nvidia-smi with a 60s deadline, matching the runner-team nvgha-driver.ps1 pattern. Previously masked because every Windows row used to run the full install pipeline; with custom-DRIVER plumbing, latest/earliest rows skip the install and the cycle is no longer preceded by warm-up time.

* CI: re-enable persistence mode after Linux driver swap

Runner-latest L4 images come up with Persistence-M=On (set somewhere in the runner team's image setup, not in cuda-python). Our .run install leaves it Off, which breaks cuda.core's test_persistence_mode_enabled on driver 610.43.02 -- the test calls device.is_persistence_mode_enabled = False on a device that already reports False, and 610.43.02 returns NVML_ERROR_UNKNOWN for that no-op set.

Restore the runner baseline by calling `nvidia-smi -pm 1` at the end of host_install() (sets the kernel persistence flag directly via NVML). Also daemon-reload + start nvidia-persistenced.service best-effort so tools that look for the daemon find it; `set -x` around this trailing block so the next run's log confirms which lines fired.

* CI: preserve SUID bit when refreshing container nvidia binaries

refresh_container_libs() used 'cp -f --remove-destination' (verbatim from the runner team's nvgha-driver), which without -p/--preserve strips the SUID/SGID bits on the destination. /usr/bin/nvidia-modprobe ships 4755 and NVML's state-changing calls (e.g. nvmlDeviceSetPersistenceMode) route through it; once SUID is gone the container-side call returns NVML_ERROR_UNKNOWN, which is what cuda.core's test_persistence_mode_enabled was hitting.

Add a stat diagnostic line at the end of refresh_container_libs() so the next CI log records nvidia-modprobe's post-refresh mode.

* CI: exec nvidia-persistenced directly after Linux driver swap

The `--silent --no-questions` .run installer drops /usr/bin/nvidia-persistenced but does not reliably install a usable systemd unit, so `systemctl start nvidia-persistenced.service` was a no-op (verified in CI logs: `+ true` after the start). With the daemon down, the /run/nvidia-persistenced/socket bind-mounted into the test container is stale, and NVML state-changing calls (e.g. nvmlDeviceSetPersistenceMode) made by root inside the container return NVML_ERROR_UNKNOWN -- which is what cuda.core's test_persistence_mode_enabled has been failing on.

Verified on ComputeLab with the same driver (610.43.02), same GPU arch (Ada L40S), root in container: with the daemon up, the SET call returns NVML_SUCCESS; with the daemon down it returns UnknownError.

Fix: exec /usr/bin/nvidia-persistenced directly. The binary self-daemonizes and creates the socket on its own. (Same latent gap exists in nv-gha-runners/vm-images' nvgha-driver; will flag upstream.)

* CI: pass --user root to nvidia-persistenced after Linux driver swap

nvidia-persistenced defaults to `--user nvidia-persistenced`, which our apt-purge of `nvidia-compute-utils-*` removed. Without that user the daemon's setuid(3) post-fork fails and the process exits silently -- the `nvidia-smi -pm 1` right after sees Persistence-M briefly On (daemon held it), then it flips back to Off (daemon gone), and the test container's NVML SET call later returns NVML_ERROR_UNKNOWN.

Pass --user root so the daemon doesn't depend on a user account that the purge deleted. Also add a `pgrep nvidia-persistenced` + `ls -la /run/nvidia-persistenced/` diagnostic so the next CI log proves the daemon is alive when the test starts.

* CI: add fast-feedback probe-driver-swap job (workflow_dispatch only)

Allocates one L4 GPU + privileged container, runs install_gpu_driver.sh with DRIVER=610.43.02, then drives nvmlDeviceSetPersistenceMode via raw ctypes -- the exact NVML call that cuda.core's test_persistence_mode_enabled exercises. Exits 1 on NVML_ERROR_UNKNOWN so the smoke test fails loudly when the install path leaves the daemon dead. Total runtime ~5 min vs ~30 min for the full test matrix.

Triggered by workflow_dispatch only -- this is an opt-in debugging job, not regular PR or nightly traffic.

* CI: drop workflow_dispatch gate on probe-driver-swap so it runs on every PR

* CI: stop refresh_container_libs from clobbering /run/nvidia-persistenced

refresh_container_libs() walks /proc/self/mountinfo for entries containing 'nvidia' or 'libcuda'. /run/nvidia-persistenced/socket matches that pattern and was being umount'd + cp'd over -- which breaks the container's view of the daemon's IPC socket (the container ends up with a 0-link unlinked socket inode instead of the live host one). Without a working socket, NVML state-changing calls inside the container return NVML_ERROR_UNKNOWN -- which is exactly what cuda.core's test_persistence_mode_enabled was hitting.

Restrict the refresh to /usr/(bin|lib) so it only touches the actual binaries + shared libraries that change version with the driver swap. /dev/nvidia*, /proc/driver/nvidia, /run/nvidia-*, /tmp/nvidia-mps are all left as the toolkit set them up.

Same latent gap exists in nv-gha-runners/vm-images' nvgha-driver; their CUDA-runtime validation workload never queries the daemon socket so they haven't surfaced it.

* CI: take down nvidia-persistenced via pkill, not systemctl

The packaged nvidia-persistenced.service has `RuntimeDirectory=nvidia-persistenced`, which makes systemd `unlink()` /run/nvidia-persistenced/ when the unit stops. The container has that directory bind-mounted from the host as of container-start time. When systemd removes the inode and our subsequent `/usr/bin/nvidia-persistenced --user root` call re-creates it, the container's bind mount is stranded on the deleted inode -- its /run/nvidia-persistenced/socket shows up with link count 0 and NVML state-changing calls return NVML_ERROR_UNKNOWN.

`pkill -TERM nvidia-persistenced` sends SIGTERM directly to the daemon, which exits cleanly without involving systemd's RuntimeDirectory cleanup. The host dir keeps its inode across the swap; the container's bind mount stays valid; the new daemon's socket is visible to in-container NVML clients.

* CI: re-bind /run/nvidia-persistenced into container after driver swap

The container's bind mount of /run/nvidia-persistenced/ is taken at container-start time and pinned to the host directory's then-current inode. Across the install the host directory gets recreated under a fresh inode (the daemon's shutdown + restart cycle replaces it), and the container is stranded on the deleted inode -- socket file shows up with link count 0 inside the container, NVML state-changing calls return NVML_ERROR_UNKNOWN.

After refresh_container_libs, umount the stale bind, mkdir the local mount point if missing, and re-bind from /proc/1/root/run/nvidia-persistenced (the host's current view via the privileged container's host-pid-ns access). CAP_SYS_ADMIN required, which custom-DRIVER rows already grant via --privileged --pid=host.

* CI: drop install_gpu_driver.sh experiments that turned out non-load-bearing

- Revert `pkill -TERM nvidia-persistenced` to `systemctl stop`; pkill alone didn't prevent the host dir's inode from flipping, the re-bind of /run/nvidia-persistenced/ is what restores the container's view.
- Drop `nvidia-smi -pm 1`; the test exercises NVML's set call, which succeeds once the daemon socket is reachable regardless of current Persistence-M state.
- Trim `set -x` blocks and `pgrep`/`ls -la`/`stat` diagnostics that served their purpose during debugging.

Keeps the load-bearing changes (nsenter bash -s, /usr/(bin|lib) refresh filter, exec nvidia-persistenced --user root, the /run/nvidia-persistenced re-bind, cp --preserve=mode) and brings the diff against Justin's nvgha-driver back down to the strict minimum.

* Revert: remove the probe-driver-swap fast-feedback job

Added in a3f1573 for fast iteration on install_gpu_driver.sh; no longer needed now that the script has stabilized.

* CI: address Mike's review comments on PR 2176

- ci.yml: `workflow_dispatch:` -> `workflow_dispatch: {}` so the empty mapping reads as intentional rather than ambiguous YAML.
- test-wheel-linux.yml: declare `util-linux` in `Install dependencies` instead of running a second apt-get inline; util-linux ships in ubuntu:22.04 by default so this is mostly belt-and-suspenders, but it removes the redundant apt-get call.
- install_gpu_driver.sh: drop `2>/dev/null` on `systemctl stop` so real errors surface (`|| true` keeps the script non-fatal). The redirect was inherited verbatim from nv-gha-runners/vm-images PR 256 with no specific need.
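
Note on the `bash -s < "$0"` fix above: the point is that a file descriptor opened while a path still resolves keeps working after the path stops resolving. The sketch below illustrates only that property, without nsenter or any privileges. It is not taken from install_gpu_driver.sh; the function name `demo_stdin_survives_path_loss` and the temporary file are hypothetical, and deleting the file stands in for the mount-namespace switch.

```shell
#!/usr/bin/env bash
# Demonstration only (hypothetical helper, not part of the repository).
demo_stdin_survives_path_loss() {
    local tmp script
    tmp="$(mktemp -d)"
    script="$tmp/body.sh"
    printf 'echo "ran from stdin"\n' > "$script"

    # Open the script on fd 3 while its path still resolves.
    exec 3< "$script"

    # Make the path unreachable. In CI the cause is the mount-namespace
    # switch done by nsenter; here the file is simply deleted.
    rm -rf "$tmp"

    # A fresh bash reads the script body from the already-open descriptor.
    bash -s <&3

    # Close fd 3 again.
    exec 3<&-
}
```

Calling `demo_stdin_survives_path_loss` runs the script body even though its path no longer exists, which is the same reason the redirect opened before nsenter keeps working on the host side.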
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -24,6 +24,7 @@ on:
   schedule:
     # every 24 hours at midnight UTC
     - cron: "0 0 * * *"
+  workflow_dispatch: {}
 
 jobs:
   ci-vars:
diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml
@@ -281,13 +281,15 @@ jobs:
         uses: nv-gha-runners/setup-proxy-cache@main
         continue-on-error: true
 
-      - name: Update driver
+      # DRIVER above is 'latest' so install_gpu_driver.ps1 is intentionally
+      # skipped (it errors on latest/earliest); configure_driver_mode.ps1
+      # still runs to put the pre-installed driver into TCC mode.
+      - name: Configure driver mode
         shell: powershell
         env:
           DRIVER_MODE: "TCC"
-          GPU_TYPE: "a100"
         run: |
-          ci/tools/install_gpu_driver.ps1
+          ci/tools/configure_driver_mode.ps1
 
       - name: Ensure GPU is working
         run: |
diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml
@@ -85,8 +85,13 @@ jobs:
           # Read base matrix from YAML file for the specific architecture
           TEST_MATRIX=$(yq -o json ".linux[\"${MATRIX_TYPE}\"] | map(select(.ARCH == \"${ARCH}\"))" ci/test-matrix.yml)
 
-          # Apply matrix filter and wrap in include structure
-          MATRIX=$(echo "$TEST_MATRIX" | jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end')
+          # Apply matrix filter; reject custom DRIVER + FLAVOR=wsl (the
+          # in-container driver swap doesn't work under WSL); add a
+          # RUNNER_DRIVER field that maps any custom version back to
+          # 'latest' (the install script swaps the driver itself, so we
+          # need to land on the runner that ships with the most recent
+          # pre-installed driver); wrap in include structure.
+          MATRIX=$(echo "$TEST_MATRIX" | jq -c '${{ inputs.matrix_filter }} | if any(.[]; .DRIVER != "latest" and .DRIVER != "earliest" and .FLAVOR == "wsl") then "Error: custom DRIVER is not supported with FLAVOR=wsl\n" | halt_error(1) else . end | map(. + {RUNNER_DRIVER: (if .DRIVER == "latest" or .DRIVER == "earliest" then .DRIVER else "latest" end)}) | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end')
 
           echo "MATRIX=${MATRIX}" | tee --append "${GITHUB_OUTPUT}"
 
@@ -101,23 +106,23 @@ jobs:
     strategy:
       fail-fast: false
       matrix: ${{ fromJSON(needs.compute-matrix.outputs.MATRIX) }}
-    runs-on: "${{ matrix.FLAVOR || 'linux' }}-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-${{ matrix.GPU_COUNT }}"
+    runs-on: "${{ matrix.FLAVOR || 'linux' }}-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.RUNNER_DRIVER }}-${{ matrix.GPU_COUNT }}"
     # TODO: remove continue-on-error once 3.15 is officially supported
     continue-on-error: ${{ startsWith(matrix.PY_VER, '3.15') }}
     # The build stage could fail but we want the CI to keep moving.
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
     # Our self-hosted runners require a container
     # TODO: use a different (nvidia?) container
     container:
-      options: -u root --security-opt seccomp=unconfined --shm-size 16g
+      # Custom-DRIVER rows need --privileged --pid=host so install_gpu_driver.sh
+      # can nsenter to the host for the install + refresh the toolkit bind mounts
+      # back inside the container. Stock options for latest/earliest rows.
+      options: ${{ ((matrix.DRIVER == 'latest' || matrix.DRIVER == 'earliest') && '-u root --security-opt seccomp=unconfined --shm-size 16g') || '-u root --security-opt seccomp=unconfined --shm-size 16g --privileged --pid=host' }}
       image: ubuntu:22.04
       env:
         NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
         PIP_CACHE_DIR: "/tmp/pip-cache"
     steps:
-      - name: Ensure GPU is working
-        run: nvidia-smi
-
       - name: Checkout ${{ github.event.repository.name }}
         uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10  # v6.0.3
 
@@ -129,10 +134,22 @@ jobs:
         uses: ./.github/actions/install_unix_deps
         continue-on-error: false
         with:
-          # for artifact fetching, graphics libs, g++ required for cffi in example
-          dependencies: "jq wget libgl1 libegl1 g++"
+          # for artifact fetching, graphics libs, g++ required for cffi in
+          # example; util-linux for `nsenter` (custom-DRIVER rows re-exec
+          # install_gpu_driver.sh onto the host through nsenter)
+          dependencies: "jq wget libgl1 libegl1 g++ util-linux"
           dependent_exes: "jq wget"
 
+      - name: Install GPU driver
+        if: ${{ matrix.DRIVER != 'latest' && matrix.DRIVER != 'earliest' }}
+        env:
+          DRIVER: ${{ matrix.DRIVER }}
+          GPU_TYPE: ${{ matrix.GPU }}
+        run: ./ci/tools/install_gpu_driver.sh
+
+      - name: Ensure GPU is working
+        run: nvidia-smi
+
       - name: Set environment variables
         env:
           BUILD_CUDA_VER: ${{ inputs.build-ctk-ver }}
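
Note on the routing rule in test-wheel-linux.yml: the jq filter in the compute-matrix step maps DRIVER to a RUNNER_DRIVER label and rejects custom versions under WSL. The same rule can be restated as a small bash function for local experimentation. This is a sketch under assumptions: `runner_driver_for` is a hypothetical name, the workflow itself uses jq, and the default flavor of `linux` mirrors the `matrix.FLAVOR || 'linux'` fallback in `runs-on`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): restates the routing
# rule from the jq filter in plain bash.
# Usage: runner_driver_for <DRIVER> [FLAVOR]
runner_driver_for() {
    local driver="$1" flavor="${2:-linux}"
    case "$driver" in
        latest|earliest)
            # Pre-installed drivers: the runner label uses DRIVER unchanged.
            printf '%s\n' "$driver"
            ;;
        *)
            # Custom version: the in-job driver swap is not supported on WSL.
            if [ "$flavor" = "wsl" ]; then
                echo "Error: custom DRIVER is not supported with FLAVOR=wsl" >&2
                return 1
            fi
            # The install script swaps the driver, so route to 'latest'.
            printf 'latest\n'
            ;;
    esac
}
```

For example, `runner_driver_for 610.43.02` prints `latest`, while `runner_driver_for 610.43.02 wsl` fails with the same error text the workflow emits.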
diff --git a/.github/workflows/test-wheel-windows.yml b/.github/workflows/test-wheel-windows.yml
@@ -81,8 +81,11 @@ jobs:
           # Read base matrix from YAML file for the specific architecture
           TEST_MATRIX=$(yq -o json ".windows[\"${MATRIX_TYPE}\"] | map(select(.ARCH == \"${ARCH}\"))" ci/test-matrix.yml)
 
-          # Apply matrix filter and wrap in include structure
-          MATRIX=$(echo "$TEST_MATRIX" | jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end')
+          # Apply matrix filter; add a RUNNER_DRIVER field that maps any
+          # custom DRIVER version back to 'latest' (install_gpu_driver.ps1
+          # swaps the driver itself, so the runner must be the one that
+          # ships the most recent pre-installed driver); wrap in include.
+          MATRIX=$(echo "$TEST_MATRIX" | jq -c '${{ inputs.matrix_filter }} | map(. + {RUNNER_DRIVER: (if .DRIVER == "latest" or .DRIVER == "earliest" then .DRIVER else "latest" end)}) | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end')
 
           echo "MATRIX=${MATRIX}" | tee --append "${GITHUB_OUTPUT}"
 
@@ -97,7 +100,7 @@ jobs:
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
     # TODO: remove continue-on-error once 3.15 is officially supported
     continue-on-error: ${{ startsWith(matrix.PY_VER, '3.15') }}
-    runs-on: "windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-${{ matrix.GPU_COUNT }}"
+    runs-on: "windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.RUNNER_DRIVER }}-${{ matrix.GPU_COUNT }}"
     steps:
       - name: Checkout ${{ github.event.repository.name }}
         uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10  # v6.0.3
@@ -108,13 +111,20 @@ jobs:
         with:
           enable-apt: true
 
-      - name: Update driver
+      - name: Install GPU driver
+        if: ${{ matrix.DRIVER != 'latest' && matrix.DRIVER != 'earliest' }}
         env:
-          DRIVER_MODE: ${{ matrix.DRIVER_MODE }}
+          DRIVER: ${{ matrix.DRIVER }}
           GPU_TYPE: ${{ matrix.GPU }}
         run: |
           ci/tools/install_gpu_driver.ps1
 
+      - name: Configure driver mode
+        env:
+          DRIVER_MODE: ${{ matrix.DRIVER_MODE }}
+        run: |
+          ci/tools/configure_driver_mode.ps1
+
       - name: Ensure GPU is working
         run: |
           nvidia-smi
diff --git a/ci/test-matrix.yml b/ci/test-matrix.yml
@@ -13,7 +13,16 @@
 # Windows entries also include DRIVER_MODE.
 #
 # Notes:
+# - DRIVER accepts:
+#     * 'latest'   - use the runner's pre-installed latest driver (no install step)
+#     * 'earliest' - use the runner's pre-installed earliest driver (no install step)
+#     * a version string (e.g. '580.65.06')
+#                  - install that version via ci/tools/install_gpu_driver.sh (Linux)
+#                    or ci/tools/install_gpu_driver.ps1 (Windows) at the start of the
+#                    job. The matrix row is routed to the 'latest' runner image (the
+#                    install scripts swap the driver themselves).
 # - DRIVER: 'earliest' does not work with CUDA 12.9.1
+# - DRIVER: a custom version is not supported with FLAVOR=wsl on Linux.
 
 linux:
   pull-request:
@@ -29,10 +38,10 @@ linux:
     - { ARCH: 'amd64', PY_VER: '3.12',  CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.13',  CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 'v100',       GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.13',  CUDA_VER: '13.0.2', LOCAL_CTK: '1', GPU: 'rtxpro6000', GPU_COUNT: '1', DRIVER: 'latest' }
-    - { ARCH: 'amd64', PY_VER: '3.13',  CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'rtxpro6000', GPU_COUNT: '1', DRIVER: 'latest' }
+    - { ARCH: 'amd64', PY_VER: '3.13',  CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'rtxpro6000', GPU_COUNT: '1', DRIVER: '610.43.02' }
     - { ARCH: 'amd64', PY_VER: '3.14',  CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 't4',         GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.14',  CUDA_VER: '13.0.2', LOCAL_CTK: '1', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest' }
-    - { ARCH: 'amd64', PY_VER: '3.14',  CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest' }
+    - { ARCH: 'amd64', PY_VER: '3.14',  CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'l4',         GPU_COUNT: '1', DRIVER: '610.43.02' }
     - { ARCH: 'amd64', PY_VER: '3.14t', CUDA_VER: '12.9.1', LOCAL_CTK: '1', GPU: 't4',         GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.14t', CUDA_VER: '13.0.2', LOCAL_CTK: '1', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest' }
     - { ARCH: 'amd64', PY_VER: '3.14t', CUDA_VER: '13.3.0', LOCAL_CTK: '1', GPU: 'l4',         GPU_COUNT: '1', DRIVER: 'latest' }
@@ -77,7 +86,7 @@ linux:
     - { MODE: 'nightly-pytorch',    ARCH: 'arm64', PY_VER: '3.12', CUDA_VER: '13.0.2', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest', TORCH_VER: '2.9.1',  TORCH_CUDA: 'cu130' }
     # nightly-numba-cuda
     - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest' }
-    - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest' }
+    - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: '580.65.06' }
     - { MODE: 'nightly-numba-cuda', ARCH: 'arm64', PY_VER: '3.12', CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest' }
     - { MODE: 'nightly-numba-cuda', ARCH: 'arm64', PY_VER: '3.12', CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest' }
     # nightly-standard (arm64 l4×2 — nightly-only per runner team request)
@@ -116,4 +125,4 @@ windows:
     - { MODE: 'nightly-pytorch',    ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.0.2', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest', DRIVER_MODE: 'TCC', TORCH_VER: '2.9.1',  TORCH_CUDA: 'cu130' }
     # nightly-numba-cuda
     - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '12.9.1', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest', DRIVER_MODE: 'TCC' }
-    - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: 'latest', DRIVER_MODE: 'TCC' }
+    - { MODE: 'nightly-numba-cuda', ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '13.3.0', LOCAL_CTK: '0', GPU: 'l4', GPU_COUNT: '1', DRIVER: '596.36',  DRIVER_MODE: 'TCC' }
diff --git a/ci/tools/configure_driver_mode.ps1 b/ci/tools/configure_driver_mode.ps1
@@ -0,0 +1,58 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+# configure_driver_mode.ps1 -- set the NVIDIA driver mode on a Windows CI
+# runner and cycle the display devices so the new mode takes effect
+# without rebooting. Always runs (whether or not install_gpu_driver.ps1
+# just ran). When install_gpu_driver.ps1 has run, this single device
+# cycle also activates the freshly-installed driver.
+#
+# Inputs (env):
+#   DRIVER_MODE  One of WDDM, TCC, MCDM.
+
+function Set-DriverMode {
+
+    # Map matrix DRIVER_MODE to nvidia-smi -fdm code.
+    # This assumes we have the prior knowledge on which GPU can use which mode.
+    $driver_mode = $env:DRIVER_MODE
+    if ($driver_mode -eq "WDDM") {
+        Write-Output "Setting driver mode to WDDM..."
+        nvidia-smi -fdm 0
+    } elseif ($driver_mode -eq "TCC") {
+        Write-Output "Setting driver mode to TCC..."
+        nvidia-smi -fdm 1
+    } elseif ($driver_mode -eq "MCDM") {
+        Write-Output "Setting driver mode to MCDM..."
+        nvidia-smi -fdm 2
+    } else {
+        Write-Output "Unknown driver mode: $driver_mode"
+        exit 1
+    }
+
+    # Only restart NVIDIA display adapters, not other display devices (e.g. QEMU VGA)
+    $nvidia_devices = Get-PnpDevice -Class Display -FriendlyName "NVIDIA*"
+    foreach ($device in $nvidia_devices) {
+        Write-Output "Restarting device: $($device.FriendlyName) ($($device.InstanceId))"
+        pnputil /disable-device "$($device.InstanceId)"
+        pnputil /enable-device "$($device.InstanceId)"
+    }
+
+    # Poll nvidia-smi until NVML can initialize, or give up after ~60s.
+    # A fixed sleep is not enough on slower-coming-back-up multi-GPU rows
+    # (e.g. 2x H100 MCDM) where pnputil enable returns before NVML is
+    # ready. Pattern borrowed from the runner-team `nvgha-driver.ps1`.
+    Write-Output "Waiting for nvidia-smi/NVML to come back up after device cycle..."
+    $deadline = (Get-Date).AddSeconds(60)
+    do {
+        Start-Sleep -Seconds 2
+        & nvidia-smi.exe 2>&1 | Out-Null
+    } while ($LASTEXITCODE -ne 0 -and (Get-Date) -lt $deadline)
+    if ($LASTEXITCODE -ne 0) {
+        Write-Error "nvidia-smi did not return cleanly within 60s of the device cycle"
+        exit 1
+    }
+}
+
+# Run the functions
+Set-DriverMode
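
Note on the polling loop in configure_driver_mode.ps1: the poll-until-success pattern with a deadline can also be written in bash, which may help when comparing against the Linux side of the pipeline. This is a sketch, not code from the repository: `wait_until_ok` is a hypothetical helper, and unlike the PowerShell loop it probes once before the first sleep.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Usage: wait_until_ok <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the deadline has passed.
wait_until_ok() {
    local timeout="$1" interval="$2"
    shift 2
    # SECONDS is bash's built-in counter of seconds since shell start.
    local deadline=$(( SECONDS + timeout ))
    while true; do
        # Probe; discard output, only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Give up once the deadline is reached.
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
}
```

A call such as `wait_until_ok 60 2 nvidia-smi` would mirror the 60s deadline and 2s interval used in the PowerShell script.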
diff --git a/ci/tools/install_gpu_driver.ps1 b/ci/tools/install_gpu_driver.ps1
@@ -1,13 +1,30 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # SPDX-License-Identifier: Apache-2.0
+#
+# install_gpu_driver.ps1 -- install a specific NVIDIA driver version on a
+# Windows CI runner. Driver-mode selection and the post-install device
+# power-cycle are the responsibility of configure_driver_mode.ps1, which
+# the workflow runs immediately after this script (or by itself when
+# DRIVER is 'latest'/'earliest' and the runner already brings up the
+# right driver).
+#
+# Inputs (env):
+#   DRIVER    Driver version, e.g. "610.47". Must NOT be 'latest' or
+#             'earliest' -- those are runner-pre-installed and the
+#             workflow is expected to skip this script for them.
+#   GPU_TYPE  Lower-case GPU label from the matrix (e.g. "l4", "rtx4090").
+#             Selects the data-center vs desktop installer variant.
 
 # Install the driver
 function Install-Driver {
 
-    # Set the correct URL, filename, and arguments to the installer
-    # This driver is picked to support Windows 11 & CUDA 13.0
-    $version = '581.15'
+    # Driver version is plumbed from the matrix via the DRIVER env var.
+    $version = $env:DRIVER
+    if (-not $version -or $version -eq 'latest' -or $version -eq 'earliest') {
+        Write-Error "DRIVER env var must be a specific version string (e.g. '610.47'); got '$version'."
+        exit 1
+    }
 
     # Get GPU type from environment variable
     $gpu_type = $env:GPU_TYPE
@@ -54,33 +71,7 @@ function Install-Driver {
     # Install the file with the specified path from earlier
     Write-Output 'Running the driver installer...'
     Start-Process -FilePath $filepath -ArgumentList $install_args -Wait
-    Write-Output 'Done!'
-
-    # Handle driver mode configuration
-    # This assumes we have the prior knowledge on which GPU can use which mode.
-    $driver_mode = $env:DRIVER_MODE
-    if ($driver_mode -eq "WDDM") {
-        Write-Output "Setting driver mode to WDDM..."
-        nvidia-smi -fdm 0
-    } elseif ($driver_mode -eq "TCC") {
-        Write-Output "Setting driver mode to TCC..."
-        nvidia-smi -fdm 1
-    } elseif ($driver_mode -eq "MCDM") {
-        Write-Output "Setting driver mode to MCDM..."
-        nvidia-smi -fdm 2
-    } else {
-        Write-Output "Unknown driver mode: $driver_mode"
-        exit 1
-    }
-    # Only restart NVIDIA display adapters, not other display devices (e.g. QEMU VGA)
-    $nvidia_devices = Get-PnpDevice -Class Display -FriendlyName "NVIDIA*"
-    foreach ($device in $nvidia_devices) {
-        Write-Output "Restarting device: $($device.FriendlyName) ($($device.InstanceId))"
-        pnputil /disable-device "$($device.InstanceId)"
-        pnputil /enable-device "$($device.InstanceId)"
-    }
-    # Give it a minute to settle:
-    Start-Sleep -Seconds 5
+    Write-Output 'Install complete; driver mode + device cycle handled by configure_driver_mode.ps1.'
 }
 
 # Run the functions
diff --git a/ci/tools/install_gpu_driver.sh b/ci/tools/install_gpu_driver.sh