
Build MSBuild CUDA for the same GPU architectures as CMake #6359

Merged
Fedr merged 3 commits into master from msbuild-cuda-architectures on Jul 2, 2026

Conversation

@Fedr Fedr commented Jul 2, 2026


The MSBuild build of MRCuda never set CodeGeneration, so it fell back to the default from NVIDIA's CUDA <ver>.props: compute_52,sm_52 on CUDA 11.4/12.0 and compute_75,sm_75 on CUDA 13.2. The shipped binaries carried native SASS for that single old architecture plus its PTX, so every modern GPU JIT-compiled the PTX at first kernel load (startup delay) and then ran code not tuned for its architecture.

This PR makes MSBuild target the same GPU architectures as the CMake build (cmake/Modules/ConfigureCuda.cmake). The change lives entirely in platform.props, so any CUDA project that imports it (and chains %(AdditionalOptions) in its CudaCompile blocks, as MRCuda.vcxproj does) inherits the architectures automatically — including CUDA projects in dependent repositories; MRCuda.vcxproj itself is untouched.

platform.props defines MRCudaGencode from MRCudaVersion and injects it via a CudaCompile item definition. NVIDIA's default CodeGeneration (defined unconditionally in the CUDA props, which are imported after platform.props, so it cannot be overridden from there) is left in place: it always supplies the oldest architecture of the corresponding CMake set, and MRCudaGencode adds the remaining ones plus PTX for the newest:

| CUDA (toolset) | From default CodeGeneration | Added by MRCudaGencode |
|---|---|---|
| 11.4 (v142) | SASS 52 + PTX 52 | SASS 60, 61, 70, 75 + PTX 75 |
| 12.0 (v143) | SASS 52 + PTX 52 | SASS 60, 61, 70, 75, 86, 89 + PTX 89 |
| 13.2 (v145) | SASS 75 + PTX 75 | SASS 86, 89, 120 + PTX 120 |
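
To make the table concrete, here is a minimal Python sketch of how the "added by MRCudaGencode" column could translate into nvcc `-gencode` options. The `-gencode arch=compute_XX,code=sm_XX` form (SASS) and `code=compute_XX` form (PTX) are standard nvcc syntax. The dictionary, the function name `extra_gencode_flags`, and the version-string keys are hypothetical and only mirror the table; they are not the actual contents of platform.props.

```python
# Architectures added on top of NVIDIA's default CodeGeneration, per the table.
# Names and key format are illustrative assumptions, not taken from platform.props.
EXTRA_ARCHS = {
    "11.4": {"sass": [60, 61, 70, 75], "ptx": 75},
    "12.0": {"sass": [60, 61, 70, 75, 86, 89], "ptx": 89},
    "13.2": {"sass": [86, 89, 120], "ptx": 120},
}


def extra_gencode_flags(cuda_version: str) -> str:
    """Return nvcc -gencode options for the extra architectures, or '' if unknown."""
    entry = EXTRA_ARCHS.get(cuda_version)
    if entry is None:
        # No matching branch: add nothing and keep the stock behavior.
        return ""
    # One native SASS image per listed architecture.
    flags = [f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in entry["sass"]]
    # code=compute_XX embeds PTX, here only for the newest architecture.
    ptx = entry["ptx"]
    flags.append(f"-gencode arch=compute_{ptx},code=compute_{ptx}")
    return " ".join(flags)


if __name__ == "__main__":
    for version in EXTRA_ARCHS:
        print(version, "->", extra_gencode_flags(version))
```

In this sketch an unrecognized version yields an empty string, so only the default CodeGeneration would apply.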

The only difference from the CMake fatbins is one extra embedded PTX for that oldest architecture (the driver always JIT-picks the newest compatible PTX, so behavior is identical). A custom MRCudaVersion (via CustomMRPlatform.props) that matches no branch leaves MRCudaGencode empty and keeps the stock behavior.

Verified locally on the v143/CUDA 12.0 and v145/CUDA 13.2 paths: cuobjdump on the produced objects shows the full CMake set (e.g. 7 SASS cubins + sm_52/sm_89 PTX for CUDA 12.0). The previous revision of this PR (same architectures via explicit flags in MRCuda.vcxproj) passed full CI on all six MSBuild jobs including msvc-2019/CUDA 11.4.
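
The cuobjdump check can also be scripted. The following is a minimal sketch, assuming the listing printed by `cuobjdump --list-elf` and `cuobjdump --list-ptx` names each embedded image with a suffix like `.sm_89.cubin` or `.sm_89.ptx`. The sample listing, the file name `example`, and the function name `embedded_archs` are hypothetical, and real output may be formatted differently.

```python
import re

# Matches image names such as "example.3.sm_75.cubin" or "example.1.sm_52.ptx".
_ENTRY = re.compile(r"\.sm_(\d+)\.(cubin|ptx)\b")


def embedded_archs(listing: str) -> tuple[set[int], set[int]]:
    """Return (SASS architectures, PTX architectures) found in a cuobjdump listing."""
    sass: set[int] = set()
    ptx: set[int] = set()
    for arch, kind in _ENTRY.findall(listing):
        (sass if kind == "cubin" else ptx).add(int(arch))
    return sass, ptx


# Hypothetical listing for the CUDA 12.0 case described in the PR.
SAMPLE = "\n".join(
    [f"ELF file {i}: example.{i}.sm_{cc}.cubin"
     for i, cc in enumerate([52, 60, 61, 70, 75, 86, 89], start=1)]
    + ["PTX file 1: example.1.sm_52.ptx", "PTX file 2: example.2.sm_89.ptx"]
)

if __name__ == "__main__":
    sass, ptx = embedded_archs(SAMPLE)
    print("SASS:", sorted(sass))
    print("PTX:", sorted(ptx))
```

Comparing the two returned sets against the expected row of the table gives a quick pass/fail check per object file.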

Cost, measured on MRCudaFastWindingNumber.cu with CUDA 12.0: compile time 3.8 s → 10.9 s per .cu file (the arch-independent host compile and CUDA front-end run once; only PTX/SASS generation multiplies), object size 115 KB → ~460 KB. In exchange, there is no JIT delay at first use, and kernels are tuned for the architecture on all supported GPUs, same as the CMake-built packages.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Fedr Fedr added the full-ci run all steps label Jul 2, 2026
@Fedr Fedr merged commit 715d3fe into master Jul 2, 2026
47 checks passed
@Fedr Fedr deleted the msbuild-cuda-architectures branch July 2, 2026 17:23