Commit df71f9c

build: googletest 1.15.2->1.17.0 + opt-in CUDA_FAST_BUILD single-arch dev knob
googletest: bump the BUILD_TESTING-only FetchContent (used only by jllama_test's C++ unit tests, not the shipped library and not coupled to llama.cpp) from v1.15.2 to v1.17.0. There is no constraint behind the tag — it is just latest-stable; CLAUDE.md now says to bump it periodically.

CUDA_FAST_BUILD: add an opt-in, default-OFF env knob to build_cuda_linux.sh that builds CUDA for a single architecture (default 'native', override CUDA_ARCH=<cc>) instead of the full release arch set, to speed up local iteration. Default + CI/release behaviour is unchanged (full arch set), so released jars keep full GPU coverage. nvcc .cu kernels are not sccache-cached (limited support), so fewer archs is the real CUDA build-time lever; rationale documented in CLAUDE.md and inline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LjWiKSyNzqqpobSKYRiew5
1 parent 625d743 commit df71f9c

3 files changed

Lines changed: 64 additions & 2 deletions

.github/build_cuda_linux.sh

Lines changed: 23 additions & 1 deletion
@@ -15,4 +15,26 @@ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute
 
 sudo dnf install -y cuda-toolkit-13-2
 
-exec .github/build.sh $@ -DGGML_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc
+# CUDA target architectures — build-speed knob.
+#
+# Default (CUDA_FAST_BUILD unset): we do NOT pass CMAKE_CUDA_ARCHITECTURES, so ggml/llama.cpp
+# compiles its full default arch set. That is exactly what release artifacts must ship (every
+# supported GPU generation) and is the slow part of this ~70 min job: nvcc recompiles each .cu
+# kernel once per architecture. sccache caches the gcc C/C++ TUs but NOT the nvcc .cu kernels
+# (sccache's nvcc support is limited/experimental), so the per-arch nvcc passes dominate even
+# with the cache on — which is why this knob exists as the real CUDA build-time lever.
+#
+# Dev fast build (CUDA_FAST_BUILD=1): compile for a SINGLE architecture instead of the full
+# set, removing most of the nvcc time. Defaults to `native` (the build machine's own GPU —
+# needs a GPU present at configure time); override with CUDA_ARCH, e.g. CUDA_ARCH=90. This is
+# a MANUAL local-dev knob only: CI and release never set it, because an artifact built this
+# way runs on a single GPU generation. (Direct-cmake equivalent: -DCMAKE_CUDA_ARCHITECTURES=native.)
+CUDA_ARCH_ARGS=""
+case "${CUDA_FAST_BUILD:-}" in
+  1 | true | TRUE | yes | on)
+    CUDA_ARCH_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH:-native}"
+    echo "build_cuda_linux.sh: CUDA_FAST_BUILD set -> ${CUDA_ARCH_ARGS} (DEV ONLY — not release-distributable)"
+    ;;
+esac
+
+exec .github/build.sh $@ -DGGML_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc $CUDA_ARCH_ARGS
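
The flag resolution in the hunk above can be exercised without running a CUDA build. The following is a minimal sketch, not part of the commit: `resolve_cuda_arch_args` is a hypothetical helper that takes the values of `CUDA_FAST_BUILD` and `CUDA_ARCH` as positional arguments and prints the CMake flag the script would append. It assumes the accepted spellings are exactly the ones listed in the `case` pattern, which makes the match case-sensitive apart from `TRUE`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): mirrors the case statement from
# build_cuda_linux.sh so the resolved flag can be inspected in isolation.
# $1 = value of CUDA_FAST_BUILD (may be empty), $2 = value of CUDA_ARCH (may be empty).
resolve_cuda_arch_args() {
  local fast="${1:-}" arch="${2:-}"
  case "$fast" in
    1 | true | TRUE | yes | on)
      # Fast build: a single architecture, defaulting to `native`.
      printf '%s\n' "-DCMAKE_CUDA_ARCHITECTURES=${arch:-native}"
      ;;
    *)
      # Anything else, including unset: print nothing, so the full arch set is built.
      ;;
  esac
}

resolve_cuda_arch_args ""        # prints nothing
resolve_cuda_arch_args 1         # prints -DCMAKE_CUDA_ARCHITECTURES=native
resolve_cuda_arch_args yes 90    # prints -DCMAKE_CUDA_ARCHITECTURES=90
resolve_cuda_arch_args YES 90    # prints nothing: "YES" is not an accepted spelling
```

One consequence worth noting: a value such as `YES` or `0` silently falls through to the full-arch build, which is the release-safe direction to fail in.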

CLAUDE.md

Lines changed: 37 additions & 0 deletions
@@ -38,6 +38,37 @@ git add .github/build_cuda_linux.sh pom.xml CLAUDE.md
 git commit -m "Upgrade CUDA from 13.2 to 13.3"
 ```
 
+### Fast local CUDA builds (`CUDA_FAST_BUILD`) — single-arch speed knob
+
+The CUDA artifact must ship kernels for **every supported GPU generation**, so the default
+build — and every CI/release build — compiles the **full `CMAKE_CUDA_ARCHITECTURES` set** that
+ggml/llama.cpp selects. nvcc recompiles each `.cu` kernel once per architecture, which is the
+dominant cost of the ~70 min CUDA job. **`sccache` does not help here:** it caches the gcc
+C/C++ TUs but not the nvcc `.cu` kernels (sccache's nvcc support is limited/experimental), so
+the per-arch nvcc passes remain even with the cache on. The one reliable lever to cut that time
+is to build **fewer architectures**.
+
+`build_cuda_linux.sh` therefore honors an **opt-in** env knob — default **off** (full arch set,
+release-safe):
+
+```bash
+# Full release build (default): all archs — slow, runs on every GPU generation.
+.github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+
+# Fast local dev build: one arch only. Defaults to `native` (the build machine's own GPU;
+# needs a GPU present at configure time). Override with CUDA_ARCH=<cc>, e.g. CUDA_ARCH=90.
+CUDA_FAST_BUILD=1 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+CUDA_FAST_BUILD=1 CUDA_ARCH=90 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"
+# Direct-cmake equivalent: cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
+```
+
+**Why a separate, off-by-default flag (never enable it in CI/release):** an artifact built with
+`CUDA_FAST_BUILD` runs on only the single GPU generation it was compiled for. The flag exists
+purely to speed up **local iteration**; the CI CUDA job leaves it unset, so released jars keep
+full arch coverage. To cache the nvcc kernels too you would add
+`-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache` (gated behind the same probe), but sccache's nvcc
+caching is unreliable — the arch knob is the better lever and is what this repo ships.
+
 ## Android minimum API level
 
 Current Android minimum API level: **28** (Android 9.0 Pie)
@@ -735,6 +766,12 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9682`.
 
+**GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
+by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
+llama.cpp pin or the bundled nlohmann/json. There is **no constraint behind the exact tag**; it
+is just the latest stable at the time it was last touched. Bump it from time to time (nothing
+auto-tracks it), pairing the bump with a green `C++ Tests` CI run.
+
 ```
 build/_deps/llama.cpp-src/tools/server/ ← server-task.h, server-common.h, etc.
 build/_deps/llama.cpp-src/include/ ← llama.h, llama-cpp.h

CMakeLists.txt

Lines changed: 4 additions & 1 deletion
@@ -383,7 +383,10 @@ if(BUILD_TESTING)
     FetchContent_Declare(
         googletest
         GIT_REPOSITORY https://github.com/google/googletest.git
-        GIT_TAG v1.15.2
+        # No constraint behind this exact tag — GoogleTest is only used by this repo's own
+        # C++ unit tests (jllama_test), not by the shipped library and not tied to llama.cpp.
+        # It is just "latest stable at the time"; bump it from time to time (see CLAUDE.md).
+        GIT_TAG v1.17.0
     )
     # Keep GTest on the same CRT as the rest of the project.
     # OFF means GTest respects CMAKE_MSVC_RUNTIME_LIBRARY (static /MT here).
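
Since nothing auto-tracks the GoogleTest tag, a periodic bump needs some way to find the newest release tag. The sketch below is one possible approach and is not part of the commit: `latest_release_tag` is a hypothetical helper that reads tag names on stdin and prints the highest `vMAJOR.MINOR.PATCH` one. It assumes a `sort` that supports `-V` (version sort), as GNU coreutils does, and it ignores tags in any other naming scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): reads tag names on stdin and prints
# the highest tag of the form vMAJOR.MINOR.PATCH.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
latest_release_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Offline example with an illustrative, fixed list of tag names:
printf '%s\n' v1.15.2 v1.17.0 v1.16.0 not-a-release | latest_release_tag   # prints v1.17.0

# Against the real repository (needs network access, so left commented out):
# git ls-remote --tags --refs https://github.com/google/googletest.git \
#   | sed 's#.*refs/tags/##' | latest_release_tag
```

The result is only a candidate for `GIT_TAG`; per the CLAUDE.md note in this commit, a bump should still be paired with a green `C++ Tests` CI run.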
