Support clang as the CUDA compiler and add clang-CUDA CI coverage #12165

Merged
RAMitchell merged 17 commits into dmlc:master from RAMitchell:codex/clang-cuda-upstream on Apr 20, 2026

@RAMitchell
Member

Summary

  • support configuring XGBoost with clang as the CUDA compiler for CUDA builds
  • update the GPUTreeShap submodule to the merged upstream clang-CUDA compatibility fix
  • add a compile-only CI job that builds CUDA with clang 21.1.8
  • add a clang-tidy smoke job that runs directly from a clang-generated CUDA compilation database

Motivation

The existing CUDA build path is NVCC-specific, which makes native clang-tidy -p difficult to use correctly for CUDA translation units. This change adds an explicit clang-CUDA build path so we can validate compiler compatibility directly and run clang-tidy from an actual clang-generated compile database instead of reconstructing commands.

Testing

  • bash -n ops/pipeline/build-cuda-clang.sh ops/pipeline/run-clang-tidy-clang-cuda.sh
  • PATH=/home/nfs/rorym/anaconda3/envs/xgboost/bin:$PATH XGBOOST_SKIP_CLANG_INSTALL=1 XGBOOST_CLANG_PREFIX=/home/nfs/rorym/anaconda3/envs/xgboost XGBOOST_CMAKE_PREFIX_PATH=/home/nfs/rorym/anaconda3/envs/xgboost XGBOOST_NCCL_INCLUDE_DIR=/home/nfs/rorym/anaconda3/envs/xgboost/include XGBOOST_NCCL_ROOT=/home/nfs/rorym/anaconda3/envs/xgboost XGBOOST_GPU_COMPUTE_VER=70 XGBOOST_TIDY_JOBS=2 ops/pipeline/run-clang-tidy-clang-cuda.sh --build-dir build-clang-tidy-ci

Notes

  • the new CI job is compile-only for now
  • the clang-tidy smoke job currently checks src/common/timer.cc and src/predictor/interpretability/shap.cu

Contributor

Copilot AI left a comment

Pull request overview

Adds first-class support for building XGBoost’s CUDA code with clang as the CUDA compiler, and wires up CI to compile the clang-CUDA configuration and run a clang-tidy smoke check from a clang-generated compilation database.

Changes:

  • Extend CMake CUDA configuration/flag handling to support both NVCC and clang-CUDA.
  • Update a few CUDA/device utilities to be clang-CUDA compatible (device annotations, byteswap intrinsics, bitfield ops).
  • Add CI scripts and GitHub Actions jobs for clang-CUDA compile-only coverage and a clang-tidy smoke job based on the clang-CUDA compile DB.
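To make the "device annotations" and "bitfield ops" items concrete, here is a minimal sketch of the general pattern, not the code in this PR. It assumes only what the LLVM CUDA documentation states: both NVCC and clang define `__CUDACC__` for CUDA sources, clang additionally defines `__clang__` and `__CUDA__`, and NVCC defines `__NVCC__`. The names `SKETCH_HOST_DEVICE`, `SKETCH_CUDA_COMPILER`, and `SketchBits` are hypothetical and do not appear in XGBoost.

```cpp
#include <cassert>
#include <cstdint>

// Both NVCC and clang define __CUDACC__ when compiling CUDA sources, so one
// macro can carry the annotations and expand to nothing in a host-only build.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Tell the two CUDA compilers apart: clang sets __clang__ and __CUDA__
// together in CUDA mode, while NVCC sets __NVCC__.
#if defined(__clang__) && defined(__CUDA__)
#define SKETCH_CUDA_COMPILER "clang-cuda"
#elif defined(__NVCC__)
#define SKETCH_CUDA_COMPILER "nvcc"
#else
#define SKETCH_CUDA_COMPILER "host-only"
#endif

// Hypothetical one-word bit view (NOT the container in src/common/bitfield.h).
// Every member carries the same annotation so host and device callers agree.
struct SketchBits {
  std::uint32_t word{0};

  SKETCH_HOST_DEVICE void Set(int pos) { word |= (std::uint32_t{1} << pos); }
  SKETCH_HOST_DEVICE void Clear(int pos) { word &= ~(std::uint32_t{1} << pos); }
  SKETCH_HOST_DEVICE bool Check(int pos) const {
    return ((word >> pos) & 1u) != 0;
  }
};

// Reports which branch of the detection above was taken.
inline const char* SketchCompilerName() { return SKETCH_CUDA_COMPILER; }

// Host-side usage example.
inline void SketchDemo() {
  SketchBits bits;
  bits.Set(5);
  assert(bits.Check(5));
  bits.Clear(5);
  assert(!bits.Check(5));
}
```

Keeping the annotation in a single macro means the same header compiles unchanged with a plain host compiler, which is convenient when tools such as clang-tidy parse it outside a CUDA build.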

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/common/quantile.cu | Adjusts device/host annotation on a Thrust transform lambda for clang-CUDA compatibility. |
| src/common/bitfield.h | Refactors device/host annotations and operators to compile cleanly under clang-CUDA. |
| include/xgboost/byteswap.h | Switches CUDA-device byteswap to clang builtins when compiling with clang-CUDA. |
| cmake/Utils.cmake | Splits CUDA flag logic by CUDA compiler (NVIDIA vs Clang) and adds helper for host flag wrapping. |
| CMakeLists.txt | Uses toolkit version when available for CUDA minimum-version enforcement (clang-CUDA friendly). |
| ops/pipeline/build-cuda-clang.sh | New script to configure/build CUDA with clang as the CUDA compiler (CI-focused). |
| ops/pipeline/run-clang-tidy-clang-cuda.sh | New script to generate a clang-CUDA compile DB and run clang-tidy against it. |
| .github/workflows/main.yml | Adds a compile-only “build-cuda-clang” CI job. |
| .github/workflows/lint.yml | Adds a clang-tidy smoke job that runs from the clang-CUDA compilation database. |
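The byteswap.h entry can be illustrated with a short sketch of selecting a compiler builtin per toolchain. This is an assumption about the general approach, not the contents of include/xgboost/byteswap.h: `SketchByteSwap32`, `SketchByteSwap64`, and `SKETCH_HD` are hypothetical names, and the choice to avoid the builtin under NVCC is a conservative guess rather than something stated in the PR. `__builtin_bswap32` and `__builtin_bswap64` are real builtins in both clang and GCC.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Use the builtin wherever clang is the compiler (host code or clang-CUDA
// device code) and for GCC host builds; otherwise fall back to shifts.
// Assumption: under NVCC the portable path is taken to stay conservative.
SKETCH_HD inline std::uint32_t SketchByteSwap32(std::uint32_t v) {
#if defined(__clang__) || (defined(__GNUC__) && !defined(__NVCC__))
  return __builtin_bswap32(v);
#else
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
#endif
}

SKETCH_HD inline std::uint64_t SketchByteSwap64(std::uint64_t v) {
#if defined(__clang__) || (defined(__GNUC__) && !defined(__NVCC__))
  return __builtin_bswap64(v);
#else
  // Swap each 32-bit half, then exchange the halves.
  const std::uint64_t lo = SketchByteSwap32(static_cast<std::uint32_t>(v));
  const std::uint64_t hi = SketchByteSwap32(static_cast<std::uint32_t>(v >> 32));
  return (lo << 32) | hi;
#endif
}
```

For example, `SketchByteSwap32(0x11223344u)` yields `0x44332211u`, and applying either function twice returns the original value.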

Comment thread ops/pipeline/run-clang-tidy-clang-cuda.sh Outdated
Comment thread src/common/bitfield.h Outdated
Comment thread CMakeLists.txt
@hcho3
Collaborator

hcho3 commented Apr 15, 2026

@RAMitchell Does this have the potential for speeding up the clang-tidy job? This is one of the longest running jobs in the CI right now.

@RAMitchell
Member Author

I am hoping so :)

RAMitchell marked this pull request as ready for review on April 17, 2026 08:09
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.


Comment thread ops/pipeline/build-cuda-clang.sh Outdated
Comment thread ops/pipeline/run-clang-tidy-clang-cuda.sh Outdated
Comment thread ops/pipeline/build-cuda-clang.sh
RAMitchell force-pushed the codex/clang-cuda-upstream branch from 0763e10 to fdea029 on April 18, 2026 11:59
RAMitchell merged commit e441821 into dmlc:master on Apr 20, 2026
81 checks passed
4 participants