Commit 3508f75

kkraus14, Copilot, leofang, pre-commit-ci[bot], and cryos authored
Use CTK 12.9.1 for cuda-bindings 12.9.x (#955)
* bump all CI jobs to CUDA 12.9.1

* CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration (#889)

  * Initial plan
  * Consolidate test matrices from workflows into ci/test-matrix.json
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * fix
  * Revert eed0b71 and change Windows DRIVER from latest-1 to latest
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Add proxy cache setup to Windows workflow for self-hosted runners
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Remove Git for Windows and gh CLI installation steps, add T4 GPU support to Windows matrix
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Set fetch-depth: 1 for checkout steps and favor L4/T4 over A100 GPUs for Windows testing
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration
    Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  * [pre-commit.ci] auto code formatting
  * Revert Win runner name change for now

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  Co-authored-by: Leo Fang <leof@nvidia.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* forgot to add windows

* rerun codegen with 12.9.1 and update result/error explanations

* First stab at the filter for CUDA < 13 in CI

* Get data from the top-level array

* Use the map function on select output

* CI: Move to self-hosted Windows GPU runners

  Migrate the Windows testing to use the new NV GHA runners. Cherry-pick #958.

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: Leo Fang <leof@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Marcus D. Hanwell <mhanwell@nvidia.com>
1 parent cca1dbf commit 3508f75

27 files changed

Lines changed: 192 additions & 125 deletions

.github/workflows/ci.yml

Lines changed: 2 additions & 0 deletions
@@ -70,6 +70,7 @@ jobs:
       build-type: pull-request
       host-platform: ${{ matrix.host-platform }}
       build-ctk-ver: ${{ needs.ci-vars.outputs.CUDA_BUILD_VER }}
+      matrix_filter: "map(select([.CUDA_VER // empty | split(\".\")[] | tonumber] as $v | ($v[0] < 13)))"
 
   test-windows:
     strategy:
@@ -90,6 +91,7 @@ jobs:
       build-type: pull-request
       host-platform: ${{ matrix.host-platform }}
       build-ctk-ver: ${{ needs.ci-vars.outputs.CUDA_BUILD_VER }}
+      matrix_filter: "map(select([.CUDA_VER // empty | split(\".\")[] | tonumber] as $v | ($v[0] < 13)))"
 
   # doc:
   # name: Docs
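
To make the new `matrix_filter` input easier to read, here is a rough Python equivalent of the jq expression. It is only a sketch: the function name `cuda_major_below` and the sample entries are hypothetical, and it assumes jq's usual ordering, in which an entry without `CUDA_VER` yields `null < 13`, which is true, so the entry is kept.

```python
def cuda_major_below(entries, limit=13):
    """Keep matrix entries whose CUDA_VER major version is below `limit`.

    Mirrors: map(select([.CUDA_VER // empty | split(".")[] | tonumber] as $v | ($v[0] < 13)))
    """
    kept = []
    for entry in entries:
        version = entry.get("CUDA_VER")
        if version is None:
            # In jq, `// empty` leaves $v as [], so $v[0] is null and null < 13 holds.
            kept.append(entry)
            continue
        parts = [int(piece) for piece in version.split(".")]
        if parts[0] < limit:
            kept.append(entry)
    return kept


# Hypothetical sample entries, not taken from ci/test-matrix.json.
sample = [
    {"PY_VER": "3.13", "CUDA_VER": "12.9.1"},
    {"PY_VER": "3.13", "CUDA_VER": "13.0.0"},
    {"PY_VER": "3.12"},
]
filtered = cuda_major_below(sample)
```

With this sample, only the `13.0.0` entry is dropped, which matches the commit's stated goal of restricting these CI jobs to CUDA < 13.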

.github/workflows/test-wheel-linux.yml

Lines changed: 21 additions & 52 deletions
@@ -34,74 +34,43 @@ jobs:
     outputs:
       MATRIX: ${{ steps.compute-matrix.outputs.MATRIX }}
     steps:
+      - name: Checkout ${{ github.event.repository.name }}
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+        with:
+          fetch-depth: 1
+
       - name: Validate Test Type
         run: |
           if [[ "$BUILD_TYPE" != "pull-request" ]] && [[ "$BUILD_TYPE" != "nightly" ]] && [[ "$BUILD_TYPE" != "branch" ]]; then
             echo "Invalid build type! Must be one of 'nightly', 'pull-request', or 'branch'."
             exit 1
           fi
+
       - name: Compute Python Test Matrix
         id: compute-matrix
         run: |
-          # Set a default GPU based upon architecture.
-          gpu="l4"
-          if [[ "${ARCH}" == "arm64" ]]; then
-            gpu="a100"
-          fi
-          # Add a special entry for the H100 runner on amd64.
-          special_runner=""
-          if [[ "${ARCH}" == "amd64" ]]; then
-            special_runner="- { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: 'H100', DRIVER: 'latest' }"
-          fi
-
-          # Please keep the matrices sorted in ascending order by the following:
-          #
-          # [PY_VER, CUDA_VER, LOCAL_CTK, GPU, DRIVER]
-          #
-          # Note that DRIVER: `earliest` does not work with CUDA 12.9.0 and LOCAL_CTK: 0 does not work with CUDA 12.0.1.
-          #
-          export MATRICES="
-            pull-request:
-              - { ARCH: ${ARCH}, PY_VER: '3.9', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.9', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.10', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.11', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'earliest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              ${special_runner}
-            nightly:
-              - { ARCH: ${ARCH}, PY_VER: '3.9', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.9', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.9', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.10', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.10', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.10', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.11', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.11', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.11', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.0.1', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.9.0', LOCAL_CTK: '0', GPU: ${gpu}, DRIVER: 'latest' }
-              - { ARCH: ${ARCH}, PY_VER: '3.13', CUDA_VER: '12.9.0', LOCAL_CTK: '1', GPU: ${gpu}, DRIVER: 'latest' }
-              ${special_runner}
-          "
-
           # Use the nightly matrix for branch tests
           MATRIX_TYPE="${BUILD_TYPE}"
           if [[ "${MATRIX_TYPE}" == "branch" ]]; then
             MATRIX_TYPE="nightly"
           fi
-          export MATRIX_TYPE
-          TEST_MATRIX=$(yq -n 'env(MATRICES) | .[strenv(MATRIX_TYPE)]')
-          export TEST_MATRIX
+
+          # Read base matrix from JSON file for the specific architecture
+          TEST_MATRIX=$(jq --arg arch "$ARCH" --arg matrix_type "$MATRIX_TYPE" '
+            .linux[$matrix_type] |
+            map(select(.ARCH == $arch))
+          ' ci/test-matrix.json)
+
+          # Add special runner for amd64 if applicable
+          if [[ "${ARCH}" == "amd64" ]]; then
+            SPECIAL_RUNNERS=$(jq '
+              .linux.special_runners.amd64
+            ' ci/test-matrix.json)
+            TEST_MATRIX=$(jq --argjson special "$SPECIAL_RUNNERS" '. + $special' <<< "$TEST_MATRIX")
+          fi
 
           MATRIX="$(
-            yq -n -o json 'env(TEST_MATRIX)' | \
-              jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end'
+            jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end' <<< "$TEST_MATRIX"
           )"
 
           echo "MATRIX=${MATRIX}" | tee --append "${GITHUB_OUTPUT}"
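
The new matrix computation can be sketched in Python to show its order of operations: pick the matrix type, select entries by architecture, append the amd64 special runners, apply the caller's filter, and fail on an empty result. This is an illustration only. `compute_matrix` and `SAMPLE` are hypothetical; the key layout (`linux`, `special_runners`, `amd64`) follows the jq paths in the diff, but the entry values are made up because ci/test-matrix.json itself is not shown here.

```python
import json

# Hypothetical stand-in for ci/test-matrix.json (values are illustrative only).
SAMPLE = {
    "linux": {
        "pull-request": [],
        "nightly": [
            {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "12.9.1",
             "LOCAL_CTK": "1", "GPU": "l4", "DRIVER": "latest"},
            {"ARCH": "arm64", "PY_VER": "3.13", "CUDA_VER": "12.9.1",
             "LOCAL_CTK": "1", "GPU": "a100", "DRIVER": "latest"},
        ],
        "special_runners": {
            "amd64": [
                {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "12.9.1",
                 "LOCAL_CTK": "1", "GPU": "H100", "DRIVER": "latest"},
            ],
        },
    },
}


def compute_matrix(data, platform, build_type, arch, matrix_filter=lambda m: m):
    """Return the compact JSON string that the workflow writes to GITHUB_OUTPUT."""
    # Branch builds reuse the nightly matrix.
    matrix_type = "nightly" if build_type == "branch" else build_type
    # Equivalent of: .linux[$matrix_type] | map(select(.ARCH == $arch))
    entries = [e for e in data[platform][matrix_type] if e.get("ARCH") == arch]
    # Only the Linux workflow appends special runners, and only for amd64.
    if platform == "linux" and arch == "amd64":
        entries = entries + data["linux"]["special_runners"]["amd64"]
    entries = matrix_filter(entries)
    if not entries:
        # The workflow uses jq's halt_error(1) here.
        raise ValueError("Error: Empty matrix")
    return json.dumps({"include": entries}, separators=(",", ":"))


result = json.loads(compute_matrix(SAMPLE, "linux", "branch", "amd64"))
```

Because the filter runs after the special runners are appended, a `matrix_filter` passed from ci.yml applies to the H100 entry as well.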

.github/workflows/test-wheel-windows.yml

Lines changed: 19 additions & 50 deletions
@@ -32,6 +32,11 @@ jobs:
     outputs:
       MATRIX: ${{ steps.compute-matrix.outputs.MATRIX }}
     steps:
+      - name: Checkout ${{ github.event.repository.name }}
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+        with:
+          fetch-depth: 1
+
       - name: Validate Test Type
         run: |
           if [[ "$BUILD_TYPE" != "pull-request" ]] && [[ "$BUILD_TYPE" != "nightly" ]] && [[ "$BUILD_TYPE" != "branch" ]]; then
@@ -41,71 +46,50 @@ jobs:
       - name: Compute Python Test Matrix
         id: compute-matrix
         run: |
-          # Please keep the matrices sorted in ascending order by the following:
-          #
-          # [PY_VER, CUDA_VER, LOCAL_CTK]
-          #
-          export MATRICES="
-            pull-request:
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '0' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '1' }
-            nightly:
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '0' }
-              - { ARCH: ${ARCH}, PY_VER: '3.12', CUDA_VER: '12.9.0', LOCAL_CTK: '1' }
-          "
-
           # Use the nightly matrix for branch tests
           MATRIX_TYPE="${BUILD_TYPE}"
           if [[ "${MATRIX_TYPE}" == "branch" ]]; then
             MATRIX_TYPE="nightly"
           fi
-          export MATRIX_TYPE
-          TEST_MATRIX=$(yq -n 'env(MATRICES) | .[strenv(MATRIX_TYPE)]')
-          export TEST_MATRIX
+
+          # Read base matrix from JSON file for the specific architecture
+          TEST_MATRIX=$(jq --arg arch "$ARCH" --arg matrix_type "$MATRIX_TYPE" '
+            .windows[$matrix_type] |
+            map(select(.ARCH == $arch))
+          ' ci/test-matrix.json)
 
           MATRIX="$(
-            yq -n -o json 'env(TEST_MATRIX)' | \
-              jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end'
+            jq -c '${{ inputs.matrix_filter }} | if (. | length) > 0 then {include: .} else "Error: Empty matrix\n" | halt_error(1) end' <<< "$TEST_MATRIX"
           )"
 
           echo "MATRIX=${MATRIX}" | tee --append "${GITHUB_OUTPUT}"
 
   test:
-    name: py${{ matrix.PY_VER }}, ${{ matrix.CUDA_VER }}, ${{ (matrix.LOCAL_CTK == '1' && 'local') || 'wheels' }}
+    name: py${{ matrix.PY_VER }}, ${{ matrix.CUDA_VER }}, ${{ (matrix.LOCAL_CTK == '1' && 'local') || 'wheels' }}, GPU ${{ matrix.GPU }}
     # The build stage could fail but we want the CI to keep moving.
     needs: compute-matrix
     strategy:
       fail-fast: false
       matrix: ${{ fromJSON(needs.compute-matrix.outputs.MATRIX) }}
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
-    runs-on: 'cuda-python-windows-gpu-github'
+    runs-on: "windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"
     steps:
       - name: Checkout ${{ github.event.repository.name }}
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
         with:
           fetch-depth: 0
 
+      - name: Setup proxy cache
+        uses: nv-gha-runners/setup-proxy-cache@main
+        continue-on-error: true
+
       - name: Update driver
         run: |
           .github/workflows/install_gpu_driver.ps1
 
       - name: Ensure GPU is working
         run: nvidia-smi
 
-      - name: Install Git for Windows
-        # the GPU runner image does not have Git Bash pre-installed...
-        env:
-          # doesn't seem there's an easy way to avoid hard-coding it?
-          GFW_EXE_URL: https://github.com/git-for-windows/git/releases/download/v2.49.0.windows.1/PortableGit-2.49.0-64-bit.7z.exe
-        run: |
-          Invoke-WebRequest -Uri "$env:GFW_EXE_URL" -OutFile "PortableGit.7z.exe"
-          # Self-extracting, see https://gitforwindows.org/zip-archives-extracting-the-released-archives.html
-          Start-Process .\PortableGit.7z.exe -Wait -Verbose -ArgumentList '-y -gm2'
-          ls -l PortableGit
-          echo "$((Get-Location).Path)\\PortableGit\\bin" >> $env:GITHUB_PATH
-          $env:Path += ";$((Get-Location).Path)\\PortableGit\\bin"
-          bash --version
-
       - name: Set environment variables
         env:
           BUILD_CUDA_VER: ${{ inputs.build-ctk-ver }}
@@ -131,21 +115,6 @@ jobs:
           name: ${{ env.CUDA_BINDINGS_ARTIFACT_NAME }}
           path: ${{ env.CUDA_BINDINGS_ARTIFACTS_DIR }}
 
-      - name: Install gh cli
-        # the GPU runner image does not have gh pre-installed...
-        env:
-          # doesn't seem there's an easy way to avoid hard-coding it?
-          GH_MSI_URL: https://github.com/cli/cli/releases/download/v2.67.0/gh_2.67.0_windows_amd64.msi
-        run: |
-          Invoke-WebRequest -Uri "$env:GH_MSI_URL" -OutFile "gh_installer.msi"
-          Start-Process msiexec.exe -Wait -Verbose -ArgumentList '/i "gh_installer.msi" /qn'
-          $GH_POSSIBLE_PATHS = "C:\\Program Files\\GitHub CLI", "C:\\Program Files (x86)\\GitHub CLI"
-          foreach ($p in $GH_POSSIBLE_PATHS) {
-            echo "$p" >> $env:GITHUB_PATH
-            $env:Path += ";$p"
-          }
-          gh --version
-
       - name: Download cuda-pathfinder build artifacts from main branch
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
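
In the Windows workflow, the runner label and the job name are now derived from the matrix entry instead of being hard-coded. A small Python sketch of how the two templates expand is below. The helper names and the sample entry are hypothetical; only the string templates come from the diff.

```python
def windows_runner_label(entry):
    """Expand: windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"""
    return f"windows-{entry['ARCH']}-gpu-{entry['GPU']}-{entry['DRIVER']}-1"


def windows_job_name(entry):
    """Expand the `name:` template, including the new trailing GPU field."""
    # (matrix.LOCAL_CTK == '1' && 'local') || 'wheels'
    source = "local" if entry["LOCAL_CTK"] == "1" else "wheels"
    return f"py{entry['PY_VER']}, {entry['CUDA_VER']}, {source}, GPU {entry['GPU']}"


# Hypothetical matrix entry; the actual values live in ci/test-matrix.json.
entry = {"ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.9.1",
         "LOCAL_CTK": "0", "GPU": "t4", "DRIVER": "latest"}
label = windows_runner_label(entry)
name = windows_job_name(entry)
```

One consequence of this scheme is that every Windows entry in the JSON matrix needs `GPU` and `DRIVER` fields, since a missing field would produce a label that presumably matches no runner.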
