
Commit c0a10cb

ci(cuda): pin newest arch (sm_120 Blackwell) for the fast validation build
Change the fast-path CUDA_ARCH from 90 to 120 (the newest CUDA 13.2 compute capability, consumer Blackwell / RTX 50xx) per request. Only affects the fast single-arch validation build (PR/push); publish runs still build the full arch set. Bump as newer GPU generations ship.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LjWiKSyNzqqpobSKYRiew5
1 parent: 698258d

2 files changed

Lines changed: 5 additions & 2 deletions


.github/workflows/publish.yml

Lines changed: 3 additions & 1 deletion
@@ -191,7 +191,9 @@ jobs:
 # reaches Central is always built with the full set. CI has no GPU, so the fast path pins a
 # fixed CUDA_ARCH ('native' would fail at configure). '0' => full (release-safe), '1' => fast.
 CUDA_FAST_BUILD: ${{ inputs.publish_to_central && '0' || '1' }}
-CUDA_ARCH: '90'
+# Newest CUDA 13.2 architecture: sm_120 (consumer Blackwell / RTX 50xx). Only used on the
+# fast validation path; bump as newer GPU generations ship. Releases ignore it (full set).
+CUDA_ARCH: '120'
 DOCKCROSS_ARGS: "-e SCCACHE_WEBDAV_ENDPOINT -e SCCACHE_WEBDAV_TOKEN -e USE_CACHE -e SCCACHE_LOG -e SCCACHE_ERROR_LOG -e RUST_BACKTRACE -e CUDA_FAST_BUILD -e CUDA_ARCH"
 steps:
 - uses: actions/checkout@v6
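The `CUDA_FAST_BUILD` line uses the GitHub Actions `&&`/`||` pair as a ternary. `a && b` yields `b` when `a` is truthy (else `a`), and `a || b` yields `a` when truthy (else `b`). The Python below is a minimal model of that evaluation, assuming the falsy set from the Actions expression docs (`false`, `0`, `-0`, `''`, `null`) with any non-empty string, including `'0'`, treated as truthy. `is_truthy`, `expr_and`, `expr_or` and `cuda_fast_build` are hypothetical helpers, not part of this repo or of any Actions API. It shows why a publish run gets `'0'` (full arch set) while a PR or push, where `inputs.publish_to_central` is false or unset, falls through to `'1'` (fast).

```python
"""Minimal model of the GitHub Actions `cond && 'a' || 'b'` idiom (an approximation)."""

from typing import Any


def is_truthy(value: Any) -> bool:
    # Assumed falsy set, per the Actions expression docs: false, 0, -0, '', null.
    # Any non-empty string, including '0', is treated as truthy.
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True


def expr_and(left: Any, right: Any) -> Any:
    # `left && right` yields left when left is falsy, otherwise right.
    return right if is_truthy(left) else left


def expr_or(left: Any, right: Any) -> Any:
    # `left || right` yields left when left is truthy, otherwise right.
    return left if is_truthy(left) else right


def cuda_fast_build(publish_to_central: Any) -> str:
    # Models: ${{ inputs.publish_to_central && '0' || '1' }}
    return expr_or(expr_and(publish_to_central, "0"), "1")


if __name__ == "__main__":
    for value in (True, False, None):
        print(f"publish_to_central={value!r} -> CUDA_FAST_BUILD={cuda_fast_build(value)!r}")
```

The idiom only works because the truthy-branch value `'0'` is itself truthy as a non-empty string; a falsy value in that position (such as `''`) would always fall through to the `||` branch.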

CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -71,7 +71,8 @@ publishing to Central** — it is wired as `CUDA_FAST_BUILD: ${{ inputs.publish_
 (`'0'`=full, `'1'`=fast). Because the `publish-snapshot`/`publish-release` jobs require
 `publish_to_central`, **every artifact that reaches Central is built with the full arch set** while
 ordinary PR/push CI stays fast. CI has no GPU, so the fast path pins a fixed `CUDA_ARCH` (default
-`90` in the job env) — `native` would fail at configure. Both `CUDA_FAST_BUILD` and `CUDA_ARCH` are
+`120` — the newest CUDA 13.2 arch, sm_120 / consumer Blackwell — in the job env) — `native`
+would fail at configure. Both `CUDA_FAST_BUILD` and `CUDA_ARCH` are
 forwarded into the dockcross container via `DOCKCROSS_ARGS` `-e`. To cache the nvcc kernels too you
 would add `-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache` (gated behind the same probe), but sccache's nvcc
 caching is unreliable — the arch knob is the better lever and is what this repo ships.
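The build script that consumes the two forwarded variables is not part of this commit, so the following is only a hypothetical sketch of how they might map onto CMake's real `CMAKE_CUDA_ARCHITECTURES` cache variable. `cuda_arch_flag` is an illustrative name, and `FULL_ARCH_SET` is a placeholder, not the repo's actual full list. The fast path (`'1'`) pins the single `CUDA_ARCH` and rejects `native`, which on a GPU-less runner would fail at configure. The full path (`'0'`, or unset, treated here as release-safe) ignores `CUDA_ARCH` entirely, matching the "Releases ignore it (full set)" comment.

```python
"""Hypothetical mapping of CUDA_FAST_BUILD / CUDA_ARCH onto a CMake configure flag."""

import os
from typing import Mapping, Sequence

# Placeholder only: the real full arch set lives in the repo's build scripts, not in this commit.
FULL_ARCH_SET: Sequence[str] = ("80", "90", "120")


def cuda_arch_flag(env: Mapping[str, str], full_set: Sequence[str] = FULL_ARCH_SET) -> str:
    """Return a -DCMAKE_CUDA_ARCHITECTURES=... argument for the given environment."""
    fast = env.get("CUDA_FAST_BUILD", "0") == "1"
    if not fast:
        # '0' or unset: release-safe path; CUDA_ARCH is ignored and every arch is built.
        archs = ";".join(full_set)
    else:
        arch = env.get("CUDA_ARCH", "").strip()
        if not arch:
            raise ValueError("fast build requires CUDA_ARCH to be set")
        if arch == "native":
            # 'native' asks CMake to detect the local GPU; a GPU-less CI runner has none.
            raise ValueError("CUDA_ARCH=native cannot work on a GPU-less CI runner")
        archs = arch
    return f"-DCMAKE_CUDA_ARCHITECTURES={archs}"


if __name__ == "__main__":
    print(cuda_arch_flag(os.environ))
```

Treating an unset `CUDA_FAST_BUILD` as the full build is a deliberate choice in this sketch: a misconfigured job then costs build time rather than shipping an artifact with too few architectures.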
