Upgrade llama.cpp from b9789 to b9803

claude · claude · commit f1a28eb131d8 · 2026-06-26T04:01:15.000Z
Bump the pinned llama.cpp tag and document the upstream API churn.

Version bump:
- CMakeLists.txt: FetchContent GIT_TAG + TTS-generator LLAMA_TAG
  b9789 -> b9803
- README.md: badge + release link
- CLAUDE.md: current-version, build-recipe, and sccache/test references

Verification:
- Both patches (0001 win32-arg-parse-embed-guard, 0002 server
  load-progress callback) re-verified to apply cleanly against the
  actual b9803 sources via `git apply --check` (github.com git-clone is
  blocked in this sandbox, so a full FetchContent build runs in CI).
  Patch 0001's common_params_parse target region is byte-identical to
  b9789; b9803's arg.cpp churn is confined to the common_models_handler
  rewrite and set_examples tags, which don't overlap the patched hunks.
  OuteTTS generator anchors hold (tts.cpp unchanged apart from the
  main()-only parse flip).
- No project C++ TU references any removed symbol
  (common_params_handle_models, common_download_model,
  common_skip_download_exception, skip_download, preset_only) and none
  includes download.h directly, so the download-pipeline rewrite and
  download.h's new hf-cache.h include stay inside upstream-compiled
  files.
- The mtmd CLI-gating change is beneficial (the project links the mtmd
  library, not the now-skipped CLI executables).

Doc-only Java follow-up: the b9803 removal of
common_skip_download_exception and the skip_download option made the
ModelFlag.SKIP_DOWNLOAD / SkipDownloadFailureTranslator /
ModelUnavailableException javadoc stale. The comments were corrected;
runtime behaviour is unchanged because --skip-download was never a
registered upstream arg, so the heuristic still keys on the
parse-failure message.

Appended b9789->b9803 rows to docs/history/llama-cpp-breaking-changes.md
(download refactor, common.h changes, mtmd CLI gating, EAGLE-3/download
subcommand new features, CUDA/OpenCL/model-internal changes, patch
re-verification).
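
For reviewers unfamiliar with the heuristic mentioned in the doc-only Java follow-up, the "flag set + parse-failure message matches" check can be sketched roughly as follows. This is an illustrative sketch only, not the project's SkipDownloadFailureTranslator: the class name, the Flag enum, the ModelUnavailable type, the exact-match comparison, and the PARSE_FAILED_MESSAGE text are all hypothetical stand-ins (the real LOAD_PARSE_FAILED_MESSAGE value is not shown in this commit).

```java
import java.util.EnumSet;
import java.util.Set;

public class SkipDownloadHeuristicSketch {

    /** Hypothetical stand-in for the project's ModelFlag enum. */
    public enum Flag { SKIP_DOWNLOAD, VERBOSE }

    /** Hypothetical placeholder; the real message text is an assumption. */
    public static final String PARSE_FAILED_MESSAGE = "failed to parse model parameters";

    /** Hypothetical stand-in for ModelUnavailableException. */
    public static final class ModelUnavailable extends RuntimeException {
        public ModelUnavailable(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Returns a typed exception only when BOTH signals are present:
     * the skip-download flag was requested and the native layer reported
     * the generic parse-failure message. Otherwise the failure is
     * returned unchanged.
     */
    public static RuntimeException translate(Set<Flag> flags, RuntimeException failure) {
        boolean flagSet = flags.contains(Flag.SKIP_DOWNLOAD);
        boolean messageMatches = PARSE_FAILED_MESSAGE.equals(failure.getMessage());
        if (flagSet && messageMatches) {
            return new ModelUnavailable("model file unavailable and downloads are disabled", failure);
        }
        return failure;
    }

    public static void main(String[] args) {
        RuntimeException parseFailure = new RuntimeException(PARSE_FAILED_MESSAGE);
        // Flag set + matching message: translated to the typed exception.
        System.out.println(translate(EnumSet.of(Flag.SKIP_DOWNLOAD), parseFailure).getClass().getSimpleName());
        // Flag absent: the original failure passes through untouched.
        System.out.println(translate(EnumSet.noneOf(Flag.class), parseFailure).getClass().getSimpleName());
    }
}
```

Because the check depends only on the Java-visible message and the caller's own flag, removing a C++ exception type upstream does not affect it, which is why this commit only needed javadoc changes.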

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015iMgeCXHE9UNu359GbFyXj
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**
 
 ## Upgrading CUDA Version
 
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9803 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
   && ( cd dist && find . -type f -not -path './_gzip/*' \
        | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
   as the repo secret **`DEPOT_TOKEN`**.
 
-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9803`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -957,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9803`.
 
 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9789
+	GIT_TAG        b9803
 	PATCH_COMMAND  ${CMAKE_COMMAND}
 		-DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 		-DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
     COMMAND ${CMAKE_COMMAND}
         -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
         -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
-        -DLLAMA_TAG=b9789
+        -DLLAMA_TAG=b9803
         -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
     RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )
diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 **Build:**  
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)  
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)  
-[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)  
+[![llama.cpp b9803](https://img.shields.io/badge/llama.cpp-%23b9803-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9803)  
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)  
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)  
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)  
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9739–b9789 | `common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema &#x2192; GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
 | b9739–b9789 | `tools/server/server-context.cpp` (**patch target**, regression) | `server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
 | b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run &mdash; the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required &mdash; **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these — verified `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` &#x2192; zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java doc-only follow-up:** `ModelFlag.SKIP_DOWNLOAD` / `SkipDownloadFailureTranslator` / `ModelUnavailableException` javadoc referenced the now-removed `common_skip_download_exception`; the comments were corrected (the feature's runtime behaviour is unchanged — `--skip-download` was never a registered upstream arg, so it still produces a parse-failure the heuristic translates) |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); `common_params::skip_download` field **removed**; new `LLAMA_EXAMPLE_DOWNLOAD` enumerator appended before `LLAMA_EXAMPLE_COUNT`. None surfaced by `ModelParameters`; consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral** — beneficial (less build time); the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New feature** — EAGLE-3 speculative decoding (`--spec-type draft-eagle3`): a small one-layer draft transformer that reads the target model's hidden states for higher acceptance; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. Could later feed an inference-parameter setter (`--spec-type`). No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; CUDA set only affects the `cuda13-linux-x86-64` classifier, OpenCL only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803: `git apply --check` exited 0 for both patches over the actual b9803 sources fetched from `raw.githubusercontent.com` (github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run). Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 arg.cpp churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (upstream `tts.cpp` unchanged in this range apart from patch 0001's main()-only parse flip). Full build + `ctest` to be confirmed by the CI pipeline |
diff --git a/src/main/java/net/ladenthin/llama/args/ModelFlag.java b/src/main/java/net/ladenthin/llama/args/ModelFlag.java
@@ -119,11 +119,13 @@ public enum ModelFlag {
      * Skip any model file download — only validation is performed. Useful for air-gapped or
      * pre-staged-model deployments where any outbound network call is a failure mode.
      *
-     * <p>When this flag is set and the configured model file is missing or invalid (e.g. ETag
-     * mismatch), upstream throws {@code common_skip_download_exception} during arg parsing,
-     * which is caught inside {@code common_params_parse_ex} and surfaces as a {@code false}
-     * return; the Java layer translates that combined signal into a typed
-     * {@link net.ladenthin.llama.exception.ModelUnavailableException}.</p>
+     * <p>{@code --skip-download} is not a registered upstream argument, so passing it makes
+     * upstream arg parsing fail and {@code common_params_parse} return {@code false}; the Java
+     * layer translates that parse-failure signal (combined with this flag) into a typed
+     * {@link net.ladenthin.llama.exception.ModelUnavailableException}. (Earlier llama.cpp builds
+     * raised a {@code common_skip_download_exception} here; that type and the {@code skip_download}
+     * download option were removed in b9803, but the heuristic is unaffected — it keys on the
+     * parse-failure message, not the C++ exception.)</p>
      */
     SKIP_DOWNLOAD("--skip-download");
 
diff --git a/src/main/java/net/ladenthin/llama/exception/ModelUnavailableException.java b/src/main/java/net/ladenthin/llama/exception/ModelUnavailableException.java
@@ -14,11 +14,15 @@
  * forbidden to.
  *
  * <p>Lets air-gapped / pre-staged-model deployments distinguish &quot;model file
- * absent&quot; from generic configuration errors. Upstream raises
- * {@code common_skip_download_exception} which is caught inside
- * {@code common_params_parse_ex} and surfaces as a {@code false} return; the
- * Java layer combines that with the {@code SKIP_DOWNLOAD} flag to recognise the
- * skip-download case and translate it to this typed exception.</p>
+ * absent&quot; from generic configuration errors. The {@code --skip-download}
+ * flag is not a registered upstream argument, so upstream arg parsing fails and
+ * {@code common_params_parse} returns {@code false}; the Java layer combines that
+ * parse-failure signal with the {@code SKIP_DOWNLOAD} flag to recognise the
+ * skip-download case and translate it to this typed exception. (Earlier llama.cpp
+ * builds raised a {@code common_skip_download_exception} for this; that type was
+ * removed in b9803 together with the {@code skip_download} download option, but
+ * the Java-side heuristic is unaffected because it keys on the parse-failure
+ * message, not the C++ exception.)</p>
  */
 public class ModelUnavailableException extends LlamaException {
 
diff --git a/src/main/java/net/ladenthin/llama/loader/SkipDownloadFailureTranslator.java b/src/main/java/net/ladenthin/llama/loader/SkipDownloadFailureTranslator.java
@@ -22,16 +22,17 @@
  *
  * <h2>Why a heuristic and not a direct exception catch</h2>
  *
- * <p>Upstream raises {@code common_skip_download_exception} inside
- * {@code common_download_file_single} when {@code --skip-download} is set and
- * the file is missing or has a stale ETag. However that exception is caught
- * INSIDE upstream's own {@code common_params_parse_ex} (at
- * {@code common/arg.cpp:476}) and surfaces only as a {@code false} return
- * from {@code common_params_parse}. The JNI layer reports the {@code false}
- * return as a generic {@link net.ladenthin.llama.exception.LlamaException} with the message
- * {@value #LOAD_PARSE_FAILED_MESSAGE}. The Java layer therefore cannot catch
- * the C++ exception directly and instead recognises the combined signal:
- * {@code SKIP_DOWNLOAD} flag set + JNI message matches.</p>
+ * <p>{@code --skip-download} is not a registered upstream argument, so passing
+ * it makes upstream arg parsing fail and {@code common_params_parse} return
+ * {@code false}. The JNI layer reports that {@code false} return as a generic
+ * {@link net.ladenthin.llama.exception.LlamaException} with the message
+ * {@value #LOAD_PARSE_FAILED_MESSAGE}. The Java layer recognises the combined
+ * signal: {@code SKIP_DOWNLOAD} flag set + JNI message matches. (Earlier
+ * llama.cpp builds raised a {@code common_skip_download_exception} inside
+ * {@code common_download_file_single} for this case, caught within upstream's own
+ * {@code common_params_parse_ex}; that type and the {@code skip_download} option
+ * were removed in b9803, but the heuristic is unaffected because it keys on the
+ * parse-failure message rather than the C++ exception.)</p>
  */
 public final class SkipDownloadFailureTranslator {