Commit f1a28eb

Upgrade llama.cpp from b9789 to b9803
Bump the pinned llama.cpp tag and document the upstream API churn.

Version bump:
- CMakeLists.txt: FetchContent GIT_TAG + TTS-generator LLAMA_TAG b9789 -> b9803
- README.md: badge + release link
- CLAUDE.md: current-version, build-recipe, and sccache/test references

Verification:
- Both patches (0001 win32-arg-parse-embed-guard, 0002 server load-progress callback) re-verified to apply cleanly against the actual b9803 sources via `git apply --check` (github.com git-clone is blocked in this sandbox, so a full FetchContent build runs in CI). Patch 0001's common_params_parse target region is byte-identical to b9789; b9803's arg.cpp churn is confined to the common_models_handler rewrite and set_examples tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (tts.cpp unchanged apart from the main()-only parse flip).
- No project C++ TU references any removed symbol (common_params_handle_models, common_download_model, common_skip_download_exception, skip_download, preset_only) and none includes download.h directly, so the download-pipeline rewrite and download.h's new hf-cache.h include stay inside upstream-compiled files. The mtmd CLI-gating change is beneficial (the project links the mtmd library, not the now-skipped CLI executables).

Doc-only Java follow-up: the b9803 removal of common_skip_download_exception and the skip_download option made the ModelFlag.SKIP_DOWNLOAD / SkipDownloadFailureTranslator / ModelUnavailableException javadoc stale. The comments were corrected; runtime behaviour is unchanged because --skip-download was never a registered upstream arg, so the heuristic still keys on the parse-failure message.

Appended b9789->b9803 rows to docs/history/llama-cpp-breaking-changes.md (download refactor, common.h changes, mtmd CLI gating, EAGLE-3/download subcommand new features, CUDA/OpenCL/model-internal changes, patch re-verification).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015iMgeCXHE9UNu359GbFyXj
1 parent 212634e commit f1a28eb

7 files changed

Lines changed: 40 additions & 27 deletions

CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**

 ## Upgrading CUDA Version

@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9803 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
 && ( cd dist && find . -type f -not -path './_gzip/*' \
 | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
 as the repo secret **`DEPOT_TOKEN`**.

-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9803`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -957,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"

 #### Upstream source location (in CMake build tree)

-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9803`.

 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9789
+GIT_TAG b9803
 PATCH_COMMAND ${CMAKE_COMMAND}
 -DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 -DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
 COMMAND ${CMAKE_COMMAND}
 -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
 -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
--DLLAMA_TAG=b9789
+-DLLAMA_TAG=b9803
 -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
 RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)
+[![llama.cpp b9803](https://img.shields.io/badge/llama.cpp-%23b9803-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9803)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

docs/history/llama-cpp-breaking-changes.md

Lines changed: 6 additions & 0 deletions
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9739–b9789 | `common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema &#x2192; GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
 | b9739–b9789 | `tools/server/server-context.cpp` (**patch target**, regression) | `server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
 | b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run &mdash; the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required &mdash; **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these — verified `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` &#x2192; zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java doc-only follow-up:** `ModelFlag.SKIP_DOWNLOAD` / `SkipDownloadFailureTranslator` / `ModelUnavailableException` javadoc referenced the now-removed `common_skip_download_exception`; the comments were corrected (the feature's runtime behaviour is unchanged — `--skip-download` was never a registered upstream arg, so it still produces a parse-failure the heuristic translates) |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); `common_params::skip_download` field **removed**; new `LLAMA_EXAMPLE_DOWNLOAD` enumerator appended before `LLAMA_EXAMPLE_COUNT`. None surfaced by `ModelParameters`; consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral** — beneficial (less build time); the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New feature** — EAGLE-3 speculative decoding (`--spec-type draft-eagle3`): a small one-layer draft transformer that reads the target model's hidden states for higher acceptance; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. Could later feed an inference-parameter setter (`--spec-type`). No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; CUDA set only affects the `cuda13-linux-x86-64` classifier, OpenCL only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803 via `git apply --check` over the actual b9803 sources fetched from `raw.githubusercontent.com` (github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run — exit 0 for both patches). Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 arg.cpp churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (upstream `tts.cpp` unchanged in this range apart from patch 0001's main()-only parse flip). Full build + `ctest` to be confirmed by the CI pipeline |

src/main/java/net/ladenthin/llama/args/ModelFlag.java

Lines changed: 7 additions & 5 deletions
@@ -119,11 +119,13 @@ public enum ModelFlag {
 * Skip any model file download — only validation is performed. Useful for air-gapped or
 * pre-staged-model deployments where any outbound network call is a failure mode.
 *
-* <p>When this flag is set and the configured model file is missing or invalid (e.g. ETag
-* mismatch), upstream throws {@code common_skip_download_exception} during arg parsing,
-* which is caught inside {@code common_params_parse_ex} and surfaces as a {@code false}
-* return; the Java layer translates that combined signal into a typed
-* {@link net.ladenthin.llama.exception.ModelUnavailableException}.</p>
+* <p>{@code --skip-download} is not a registered upstream argument, so passing it makes
+* upstream arg parsing fail and {@code common_params_parse} return {@code false}; the Java
+* layer translates that parse-failure signal (combined with this flag) into a typed
+* {@link net.ladenthin.llama.exception.ModelUnavailableException}. (Earlier llama.cpp builds
+* raised a {@code common_skip_download_exception} here; that type and the {@code skip_download}
+* download option were removed in b9803, but the heuristic is unaffected — it keys on the
+* parse-failure message, not the C++ exception.)</p>
 */
 SKIP_DOWNLOAD("--skip-download");

src/main/java/net/ladenthin/llama/exception/ModelUnavailableException.java

Lines changed: 9 additions & 5 deletions
@@ -14,11 +14,15 @@
 * forbidden to.
 *
 * <p>Lets air-gapped / pre-staged-model deployments distinguish &quot;model file
-* absent&quot; from generic configuration errors. Upstream raises
-* {@code common_skip_download_exception} which is caught inside
-* {@code common_params_parse_ex} and surfaces as a {@code false} return; the
-* Java layer combines that with the {@code SKIP_DOWNLOAD} flag to recognise the
-* skip-download case and translate it to this typed exception.</p>
+* absent&quot; from generic configuration errors. The {@code --skip-download}
+* flag is not a registered upstream argument, so upstream arg parsing fails and
+* {@code common_params_parse} returns {@code false}; the Java layer combines that
+* parse-failure signal with the {@code SKIP_DOWNLOAD} flag to recognise the
+* skip-download case and translate it to this typed exception. (Earlier llama.cpp
+* builds raised a {@code common_skip_download_exception} for this; that type was
+* removed in b9803 together with the {@code skip_download} download option, but
+* the Java-side heuristic is unaffected because it keys on the parse-failure
+* message, not the C++ exception.)</p>
 */
 public class ModelUnavailableException extends LlamaException {

src/main/java/net/ladenthin/llama/loader/SkipDownloadFailureTranslator.java

Lines changed: 11 additions & 10 deletions
@@ -22,16 +22,17 @@
 *
 * <h2>Why a heuristic and not a direct exception catch</h2>
 *
-* <p>Upstream raises {@code common_skip_download_exception} inside
-* {@code common_download_file_single} when {@code --skip-download} is set and
-* the file is missing or has a stale ETag. However that exception is caught
-* INSIDE upstream's own {@code common_params_parse_ex} (at
-* {@code common/arg.cpp:476}) and surfaces only as a {@code false} return
-* from {@code common_params_parse}. The JNI layer reports the {@code false}
-* return as a generic {@link net.ladenthin.llama.exception.LlamaException} with the message
-* {@value #LOAD_PARSE_FAILED_MESSAGE}. The Java layer therefore cannot catch
-* the C++ exception directly and instead recognises the combined signal:
-* {@code SKIP_DOWNLOAD} flag set + JNI message matches.</p>
+* <p>{@code --skip-download} is not a registered upstream argument, so passing
+* it makes upstream arg parsing fail and {@code common_params_parse} return
+* {@code false}. The JNI layer reports that {@code false} return as a generic
+* {@link net.ladenthin.llama.exception.LlamaException} with the message
+* {@value #LOAD_PARSE_FAILED_MESSAGE}. The Java layer recognises the combined
+* signal: {@code SKIP_DOWNLOAD} flag set + JNI message matches. (Earlier
+* llama.cpp builds raised a {@code common_skip_download_exception} inside
+* {@code common_download_file_single} for this case, caught within upstream's own
+* {@code common_params_parse_ex}; that type and the {@code skip_download} option
+* were removed in b9803, but the heuristic is unaffected because it keys on the
+* parse-failure message rather than the C++ exception.)</p>
 */
 public final class SkipDownloadFailureTranslator {

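The combined-signal heuristic described in the `SkipDownloadFailureTranslator` javadoc (`SKIP_DOWNLOAD` flag set + JNI message matches) might look roughly like the sketch below. This is an illustration only, not the project's code: the class `SkipDownloadHeuristicSketch`, the nested `Flag` enum, the nested exception types, the `translate` method, and the placeholder value of `LOAD_PARSE_FAILED_MESSAGE` are all hypothetical stand-ins, and the use of an exact `equals` match on the message is an assumption, since the commit does not show how the real translator compares it.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, self-contained model of the combined-signal heuristic. */
final class SkipDownloadHeuristicSketch {

    /** Stand-in for the project's ModelFlag enum; only what the sketch needs. */
    enum Flag { SKIP_DOWNLOAD, OTHER }

    /** Placeholder text: the real constant's value is not shown in this commit. */
    static final String LOAD_PARSE_FAILED_MESSAGE = "parse failed (placeholder)";

    /** Stand-in for the generic exception the JNI layer reports. */
    static class LlamaException extends RuntimeException {
        LlamaException(String message) { super(message); }
        LlamaException(String message, Throwable cause) { super(message, cause); }
    }

    /** Stand-in for the typed exception callers can catch separately. */
    static final class ModelUnavailableException extends LlamaException {
        ModelUnavailableException(String message, Throwable cause) { super(message, cause); }
    }

    /**
     * Returns a ModelUnavailableException only when BOTH signals are present:
     * the caller asked to skip downloads and the failure is the generic
     * parse-failure message. Any other failure is returned unchanged.
     */
    static LlamaException translate(Set<Flag> flags, LlamaException failure) {
        boolean skipRequested = flags.contains(Flag.SKIP_DOWNLOAD);
        // Assumption: exact match; the real translator may compare differently.
        boolean parseFailed = LOAD_PARSE_FAILED_MESSAGE.equals(failure.getMessage());
        if (skipRequested && parseFailed) {
            return new ModelUnavailableException("model unavailable and download was skipped", failure);
        }
        return failure;
    }

    public static void main(String[] args) {
        LlamaException parseFailure = new LlamaException(LOAD_PARSE_FAILED_MESSAGE);
        // Flag set + parse failure: prints ModelUnavailableException
        System.out.println(translate(EnumSet.of(Flag.SKIP_DOWNLOAD), parseFailure)
                .getClass().getSimpleName());
        // No flag: the generic failure passes through, prints LlamaException
        System.out.println(translate(EnumSet.noneOf(Flag.class), parseFailure)
                .getClass().getSimpleName());
    }
}
```

Because the sketch keys only on the flag and the message, it needs no knowledge of any C++ exception type, which is consistent with the commit's claim that removing `common_skip_download_exception` upstream leaves runtime behaviour unchanged.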