build: googletest 1.15.2->1.17.0 + opt-in CUDA_FAST_BUILD single-arch dev knob

claude · claude · commit df71f9ca26f2 · 2026-06-20T14:07:10.000Z
googletest: bump the BUILD_TESTING-only FetchContent from v1.15.2 to v1.17.0. It is used only by jllama_test's C++ unit tests, not by the shipped library, and is not coupled to llama.cpp. There is no constraint behind the tag; it is simply the latest stable release, and CLAUDE.md now says to bump it periodically.

CUDA_FAST_BUILD: add an opt-in, default-OFF env knob to build_cuda_linux.sh that builds CUDA for a single architecture (default 'native', override with CUDA_ARCH=<cc>) instead of the full release arch set, to speed up local iteration. Default and CI/release behaviour are unchanged (full arch set), so released jars keep full GPU coverage. sccache does not cache the nvcc .cu kernels (its nvcc support is limited), so building fewer archs is the real CUDA build-time lever; the rationale is documented in CLAUDE.md and inline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LjWiKSyNzqqpobSKYRiew5
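To confirm that a given build really carries the full arch set (i.e. was not produced with CUDA_FAST_BUILD), one option is to list the cubins embedded in the native library with `cuobjdump --list-elf` from the CUDA toolkit. The sketch below is a hypothetical helper, `list_sass_archs`, that reduces such a listing to its distinct `sm_NN` targets. It assumes each listed cubin name contains its `sm_NN` tag, and it does not see PTX-only (`compute_NN`) entries.

```bash
# Hypothetical helper (not part of this repo): read a `cuobjdump --list-elf`
# listing on stdin and print each distinct SASS target (sm_75, sm_90a, ...) once.
# Assumption: every cubin name in the listing embeds its sm_NN tag.
list_sass_archs() {
  # -o prints only the matched tag; the optional a/f covers arch-specific targets.
  grep -oE 'sm_[0-9]+[af]?' | sort -u
}
```

Usage, with a placeholder path: `cuobjdump --list-elf path/to/libjllama.so | list_sass_archs`. A full release build should print several targets; a CUDA_FAST_BUILD build typically prints one.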
diff --git a/.github/build_cuda_linux.sh b/.github/build_cuda_linux.sh
@@ -15,4 +15,26 @@ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute
 
 sudo dnf install -y cuda-toolkit-13-2
 
-exec .github/build.sh $@ -DGGML_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc
+# CUDA target architectures — build-speed knob.
+#
+# Default (CUDA_FAST_BUILD unset): we do NOT pass CMAKE_CUDA_ARCHITECTURES, so ggml/llama.cpp
+# compiles its full default arch set. That is exactly what release artifacts must ship (every
+# supported GPU generation) and is the slow part of this ~70 min job: nvcc recompiles each .cu
+# kernel once per architecture. sccache caches the gcc C/C++ TUs but NOT the nvcc .cu kernels
+# (sccache's nvcc support is limited/experimental), so the per-arch nvcc passes dominate even
+# with the cache on — which is why this knob exists as the real CUDA build-time lever.
+#
+# Dev fast build (CUDA_FAST_BUILD=1): compile for a SINGLE architecture instead of the full
+# set, removing most of the nvcc time. Defaults to `native` (the build machine's own GPU —
+# needs a GPU present at configure time); override with CUDA_ARCH, e.g. CUDA_ARCH=90. This is
+# a MANUAL local-dev knob only: CI and release never set it, because an artifact built this
+# way runs on a single GPU generation. (Direct-cmake equivalent: -DCMAKE_CUDA_ARCHITECTURES=native.)
+CUDA_ARCH_ARGS=""
+case "${CUDA_FAST_BUILD:-}" in
+  1 | true | TRUE | yes | on)
+    CUDA_ARCH_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH:-native}"
+    echo "build_cuda_linux.sh: CUDA_FAST_BUILD set -> ${CUDA_ARCH_ARGS} (DEV ONLY — not release-distributable)"
+    ;;
+esac
+
+exec .github/build.sh $@ -DGGML_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc $CUDA_ARCH_ARGS
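The knob's resolution can be exercised without nvcc or a GPU by lifting the same `case` into a function. `cuda_arch_args` is a hypothetical name for that extraction (the script itself inlines the logic into `CUDA_ARCH_ARGS`); it prints the extra CMake argument, or nothing when the knob is off.

```bash
# Hypothetical extraction of the CUDA_FAST_BUILD case from build_cuda_linux.sh:
# prints the extra CMake flag for a single-arch dev build, or nothing (meaning the
# full release arch set) when the knob is unset or not one of the accepted spellings.
cuda_arch_args() {
  case "${CUDA_FAST_BUILD:-}" in
    1 | true | TRUE | yes | on)
      # CUDA_ARCH overrides the default `native` (the build machine's own GPU).
      printf '%s\n' "-DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH:-native}"
      ;;
  esac
}
```

The match is exact and case-sensitive: `CUDA_FAST_BUILD=On`, `=YES` or `=0` all fall through to the full, slow build. For a dev-only knob that is the safe failure mode, since a typo can never produce a single-arch artifact by accident.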
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -38,6 +38,37 @@ git add .github/build_cuda_linux.sh pom.xml CLAUDE.md
 git commit -m "Upgrade CUDA from 13.2 to 13.3"
 ```
 
+### Fast local CUDA builds (`CUDA_FAST_BUILD`) — single-arch speed knob
+
+The CUDA artifact must ship kernels for **every supported GPU generation**, so the default
+build — and every CI/release build — compiles the **full `CMAKE_CUDA_ARCHITECTURES` set** that
+ggml/llama.cpp selects. nvcc recompiles each `.cu` kernel once per architecture, which is the
+dominant cost of the ~70 min CUDA job. **`sccache` does not help here:** it caches the gcc
+C/C++ TUs but not the nvcc `.cu` kernels (sccache's nvcc support is limited/experimental), so
+the per-arch nvcc passes remain even with the cache on. The one reliable lever to cut that time
+is to build **fewer architectures**.
+
+`build_cuda_linux.sh` therefore honors an **opt-in** env knob — default **off** (full arch set,
+release-safe):
+
+```bash
+# Full release build (default): all archs — slow, runs on every GPU generation.
+.github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+
+# Fast local dev build: one arch only. Defaults to `native` (the build machine's own GPU;
+# needs a GPU present at configure time). Override with CUDA_ARCH=<cc>, e.g. CUDA_ARCH=90.
+CUDA_FAST_BUILD=1 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+CUDA_FAST_BUILD=1 CUDA_ARCH=90 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+# Direct-cmake equivalent: cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
+```
+
+**Why a separate, off-by-default flag (never enable it in CI/release):** an artifact built with
+`CUDA_FAST_BUILD` runs on only the single GPU generation it was compiled for. The flag exists
+purely to speed up **local iteration**; the CI CUDA job leaves it unset, so released jars keep
+full arch coverage. To cache the nvcc kernels too you would add
+`-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache` (gated behind the same probe), but sccache's nvcc
+caching is unreliable — the arch knob is the better lever and is what this repo ships.
+
 ## Android minimum API level
 
 Current Android minimum API level: **28** (Android 9.0 Pie)
@@ -735,6 +766,12 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9682`.
 
+**GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
+by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
+llama.cpp pin or the bundled nlohmann/json. There is **no constraint behind the exact tag**; it
+is just the latest stable at the time it was last touched. Bump it from time to time (nothing
+auto-tracks it), pairing the bump with a green `C++ Tests` CI run.
+
 ```
 build/_deps/llama.cpp-src/tools/server/   ← server-task.h, server-common.h, etc.
 build/_deps/llama.cpp-src/include/        ← llama.h, llama-cpp.h
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -383,7 +383,10 @@ if(BUILD_TESTING)
     FetchContent_Declare(
         googletest
         GIT_REPOSITORY https://github.com/google/googletest.git
-        GIT_TAG        v1.15.2
+        # No constraint behind this exact tag — GoogleTest is only used by this repo's own
+        # C++ unit tests (jllama_test), not by the shipped library and not tied to llama.cpp.
+        # It is just "latest stable at the time"; bump it from time to time (see CLAUDE.md).
+        GIT_TAG        v1.17.0
     )
     # Keep GTest on the same CRT as the rest of the project.
     # OFF means GTest respects CMAKE_MSVC_RUNTIME_LIBRARY (static /MT here).
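Since nothing auto-tracks the GoogleTest tag, each periodic bump starts with finding the newest stable release. A minimal sketch, assuming upstream keeps tagging releases as `vMAJOR.MINOR.PATCH` (as v1.15.2 and v1.17.0 are): the hypothetical helper `latest_gtest_tag` filters `git ls-remote --tags` output down to those tags and picks the highest by version sort.

```bash
# Hypothetical helper: read `git ls-remote --tags` output ("<sha>\t<ref>") on stdin
# and print the highest plain vMAJOR.MINOR.PATCH tag. Peeled entries
# (refs/tags/vX.Y.Z^{}) and non-semver tags (e.g. release-1.12.1) are dropped.
# Requires a `sort` that supports -V (version sort), e.g. GNU coreutils.
latest_gtest_tag() {
  awk '{ print $2 }' |
    sed -n 's#^refs/tags/\(v[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)$#\1#p' |
    sort -V |
    tail -n 1
}
```

Usage: `git ls-remote --tags https://github.com/google/googletest.git | latest_gtest_tag`. If it prints something newer than the pinned `GIT_TAG`, update the tag and pair the change with a green `C++ Tests` CI run. Version sort matters here: a plain lexical sort would rank v1.9.0 above v1.17.0.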