Bump the pinned llama.cpp tag and document the upstream API churn.
Version bump:
- CMakeLists.txt: FetchContent GIT_TAG + TTS-generator LLAMA_TAG b9789 -> b9803
- README.md: badge + release link
- CLAUDE.md: current-version, build-recipe, and sccache/test references
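A bump like this is easy to leave half-done, so a quick scan for leftover references to the old tag can help. The following is a minimal sketch, not part of the commit: the function name `find_stale_tag` is hypothetical, and it assumes the three files listed above should no longer mention `b9789` anywhere (the history document is deliberately excluded, since it references old tags by design).

```bash
#!/usr/bin/env bash
# Sketch only: find_stale_tag is a hypothetical helper, not a project script.

# find_stale_tag OLD_TAG FILE...
# Prints one line per FILE that still contains OLD_TAG as a whole word.
# Returns 0 when no file mentions OLD_TAG, 1 otherwise.
find_stale_tag() {
    local old_tag=$1
    shift
    local stale=0
    local f
    for f in "$@"; do
        # -q: no output, -w: whole-word match (b9789 must not match b97890),
        # -F: treat the tag as a literal string, not a regex.
        if grep -qwF -- "$old_tag" "$f"; then
            printf 'stale reference to %s in %s\n' "$old_tag" "$f"
            stale=1
        fi
    done
    return "$stale"
}
```

Run from the repository root, a call such as `find_stale_tag b9789 CMakeLists.txt README.md CLAUDE.md` would print nothing and return 0 if the bump were complete under the assumption above.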
Verification:
- Both patches (0001 win32-arg-parse-embed-guard, 0002 server load-progress
callback) re-verified to apply cleanly against the actual b9803 sources via
`git apply --check` (github.com git-clone is blocked in this sandbox, so a
full FetchContent build runs in CI). Patch 0001's common_params_parse target
region is byte-identical to b9789; b9803's arg.cpp churn is confined to the
common_models_handler rewrite and set_examples tags, which don't overlap the
patched hunks. OuteTTS generator anchors hold (tts.cpp unchanged apart from
the main()-only parse flip).
- No project C++ TU references any removed symbol (common_params_handle_models,
common_download_model, common_skip_download_exception, skip_download,
preset_only) and none includes download.h directly, so the download-pipeline
rewrite and download.h's new hf-cache.h include stay inside upstream-compiled
files. The mtmd CLI-gating change shortens the build without affecting the
project, which links the mtmd library and not the now-skipped CLI executables.
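The two checks described above can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not the script that was actually run: the function names `check_patches` and `scan_removed_symbols` are hypothetical, and the source directory passed to `check_patches` is whatever location the b9803 sources were unpacked to. The symbol list is the one named in this commit message, written as an extended regex.

```bash
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# check_patches SRC_DIR PATCH...
# Dry-runs every PATCH against SRC_DIR without modifying any file.
# Returns 0 only if all patches would apply cleanly.
check_patches() {
    local src_dir=$1
    shift
    local failed=0
    local p abs
    for p in "$@"; do
        # git -C changes directory before running, so make the patch path
        # absolute first; otherwise a relative path would no longer resolve.
        abs=$(cd "$(dirname -- "$p")" && pwd)/$(basename -- "$p")
        # --check reports whether the patch applies but writes nothing.
        if git -C "$src_dir" apply --check "$abs"; then
            printf 'ok: %s\n' "$p"
        else
            printf 'FAILED: %s\n' "$p"
            failed=1
        fi
    done
    return "$failed"
}

# scan_removed_symbols DIR...
# Returns 0 when none of the removed upstream symbols occur under DIR...,
# 1 when at least one match is found (matches are printed), and
# 2 when grep itself failed (for example, a directory does not exist).
scan_removed_symbols() {
    local pattern='common_params_handle_models|common_download_model|common_skip_download|skip_download|preset_only'
    local rc=0
    grep -rnE -- "$pattern" "$@" || rc=$?
    case "$rc" in
        0) return 1 ;;   # grep found a match: a removed symbol is referenced
        1) return 0 ;;   # grep found nothing: the tree is clean
        *) return 2 ;;   # grep reported an error
    esac
}
```

From the repository root, usage would look like `check_patches /path/to/b9803-sources patches/*.patch` followed by `scan_removed_symbols src/main/cpp src/test/cpp`. Distinguishing grep's "no match" exit status from its error status keeps a mistyped directory from being reported as a clean scan.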
Doc-only Java follow-up: the b9803 removal of common_skip_download_exception
and the skip_download option made the ModelFlag.SKIP_DOWNLOAD /
SkipDownloadFailureTranslator / ModelUnavailableException javadoc stale. The
comments were corrected; runtime behaviour is unchanged because --skip-download
was never a registered upstream arg, so the heuristic still keys on the
parse-failure message.
Appended b9789->b9803 rows to docs/history/llama-cpp-breaking-changes.md
(download refactor, common.h changes, mtmd CLI gating, EAGLE-3/download
subcommand new features, CUDA/OpenCL/model-internal changes, patch
re-verification).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015iMgeCXHE9UNu359GbFyXj
CLAUDE.md (4 additions, 4 deletions)
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**
 
 ## Upgrading CUDA Version
 
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
docs/history/llama-cpp-breaking-changes.md (6 additions, 0 deletions)
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| b9739–b9789 |`common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema → GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
| b9739–b9789 |`tools/server/server-context.cpp` (**patch target**, regression) |`server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
| b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run — the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required — **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` were **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type were **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these: `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` returns zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java doc-only follow-up:** `ModelFlag.SKIP_DOWNLOAD` / `SkipDownloadFailureTranslator` / `ModelUnavailableException` javadoc referenced the now-removed `common_skip_download_exception`; the comments were corrected (the feature's runtime behaviour is unchanged: `--skip-download` was never a registered upstream arg, so it still produces a parse failure that the heuristic translates) |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); `common_params::skip_download` field **removed**; new `LLAMA_EXAMPLE_DOWNLOAD` enumerator appended before `LLAMA_EXAMPLE_COUNT`. None surfaced by `ModelParameters`; consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral**, which is beneficial (less build time); the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New feature:** EAGLE-3 speculative decoding (`--spec-type draft-eagle3`), a small one-layer draft transformer that reads the target model's hidden states for higher acceptance; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. Could later feed an inference-parameter setter (`--spec-type`). No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; the CUDA changes only affect the `cuda13-linux-x86-64` classifier, the OpenCL change only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803 via `git apply --check` (exit 0 for both patches) over the actual b9803 sources fetched from `raw.githubusercontent.com`; github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run. Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 arg.cpp churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (upstream `tts.cpp` unchanged in this range apart from patch 0001's main()-only parse flip). Full build + `ctest` to be confirmed by the CI pipeline |