Commit 9a9ac4f

Upgrade llama.cpp from b9739 to b9789
Bump the pinned llama.cpp tag and refresh the Windows argv patch for the upgraded source. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units; no project C++ source edits were required.

- CMakeLists.txt: GIT_TAG + LLAMA_TAG b9739 -> b9789.
- README.md / CLAUDE.md / publish.yml / TODO.md: version badge, pinned-version notes, WebUI clone example, aarch64 GCC rationale.
- patches/0001-win32-arg-parse-embed-guard.patch: refreshed for b9789. Upstream replaced the original #24779 argv override with the count-guard form (if utf8.buf.size() == argc), which is exactly the variant that breaks the Windows server-integration tests, so the patch still drops it entirely and keeps "(void) utf8;". Re-verified to apply and reverse-apply cleanly (idempotent) against b9789 common/arg.cpp.
- docs/history/llama-cpp-breaking-changes.md: new b9739-b9789 rows (json-partial.{h,cpp} removed -> peg-parser; chat.h message-span restructure; server-task n_before_user -> message_spans; new llama_model_n_layer_nextn; mtmd/clip progress_callback; server-models child-process download refactor).

Verified locally on Linux x86_64 (GCC 13.3): cmake configure passes the fail-loud OuteTTS extraction and refreshed-patch anchor checks against b9789, the full Release build links libjllama.so + jllama_test with zero warnings on any project translation unit, and ctest reports 454/454 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
1 parent 7633baf commit 9a9ac4f

7 files changed

Lines changed: 30 additions & 18 deletions

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ jobs:
 # Native ARM64 build on GitHub's free arm64 runner, mirroring upstream llama.cpp's
 # `ubuntu-cpu` aarch64 release job (ubuntu-24.04-arm + GCC 14). Replaces the former dockcross
 # `linux-arm64-lts` cross-compile (GCC 8.5, glibc 2.17), which can no longer compile llama.cpp
-# b9739 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
+# b9789 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
 # unit suite (ctest) on real ARM hardware for the first time (the cross build ran no tests).
 # Trade-off: the glibc floor rises 2.17 -> ~2.39, the same envelope upstream's own ARM binaries
 # require. GGML_NATIVE=OFF keeps the artifact portable across ARMv8 CPU generations (no
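
The workflow comment above names C++17 class template argument deduction (CTAD) in a `new`-expression as the construct the old GCC 8.5 toolchain rejects. As a minimal illustration of that language feature only (this is not the upstream llama.cpp code, and `make_pair_on_heap` and `demo_ctad` are hypothetical names), a sketch might look like:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// CTAD in a new-expression: the template arguments of std::pair are deduced
// from the constructor arguments, so <int, double> is never spelled out.
// Hypothetical helper, for illustration only.
inline std::unique_ptr<std::pair<int, double>> make_pair_on_heap() {
    auto * raw = new std::pair(1, 2.5);  // deduced as std::pair<int, double>
    static_assert(std::is_same_v<decltype(raw), std::pair<int, double> *>);
    return std::unique_ptr<std::pair<int, double>>(raw);
}

// Usage: the deduced object behaves like an explicitly typed pair.
inline void demo_ctad() {
    auto p = make_pair_on_heap();
    assert(p->first == 1);
    assert(p->second == 2.5);
}
```

Which CTAD forms a given compiler accepts depends on its version; the GCC >= 12 floor is the commit's own finding and is not re-derived here.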

CLAUDE.md

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9739**
+Current llama.cpp pinned version: **b9789**

 ## Upgrading CUDA Version

@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9739 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
 && ( cd dist && find . -type f -not -path './_gzip/*' \
 | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
 as the repo secret **`DEPOT_TOKEN`**.

-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9739`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -382,7 +382,7 @@ Current patches:

 | Patch | Fixes |
 |-------|-------|
-| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — an earlier count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
+| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): `common_params_parse` replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — a count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing, and **upstream b9789 now ships exactly that count-guard form** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), so the patch still drops it rather than adopting upstream's guard. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |

 ## OuteTTS build-time extraction (`cmake/generate-tts-upstream.cmake`)

@@ -888,7 +888,7 @@ now **"Build and Test Linux aarch64"**) builds **natively on `ubuntu-24.04-arm`*
 llama.cpp's own `ubuntu-cpu` aarch64 release job (`ubuntu-24.04-arm` + **GCC 14**).

 **Why it moved off dockcross.** The old `dockcross/linux-arm64-lts` image ships **GCC 8.5 / glibc
-2.17**; llama.cpp **b9739** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
+2.17**; llama.cpp **b9789** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
 stopped compiling. Upstream solved the same problem by building natively on `ubuntu-24.04-arm` with
 GCC 14 and ships a **glibc ≈ 2.39** ARM binary with no old-glibc compatibility layer. This repo now
 does the same: the aarch64 artifact's **glibc floor rises 2.17 → ~2.39** — the same envelope
@@ -956,7 +956,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"

 #### Upstream source location (in CMake build tree)

-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9739`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.

 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9739
+GIT_TAG b9789
 PATCH_COMMAND ${CMAKE_COMMAND}
 -DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 -DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
 COMMAND ${CMAKE_COMMAND}
 -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
 -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
--DLLAMA_TAG=b9739
+-DLLAMA_TAG=b9789
 -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
 RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9739](https://img.shields.io/badge/llama.cpp-%23b9739-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9739)
+[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

TODO.md

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ primary goal: agentic tool-calling with Qwen):
 What remains is manual validation against the actual editor clients — point Copilot's Ollama provider /
 a Custom Endpoint, Claude Code, and a Responses client at the running server — since a server-side
 round-trip confirms the wire shapes but not each client's own parser.
-- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9739`) includes the Gemma 4
+- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9789`) includes the Gemma 4
 tool-call parser fixes; if not, bump per the upgrade procedure.
 - **NativeServer — wire upstream `server.cpp` routes to JNI (in progress; scaffold landed `dd264b2`).**
 The upstream HTTP transport (`tools/server/server-http.cpp` + the cpp-httplib backend) is already

docs/history/llama-cpp-breaking-changes.md

Lines changed: 11 additions & 0 deletions
@@ -373,3 +373,14 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9682–b9739 | `ggml/src/ggml-sycl/` | New conv2d, conv2d_dw, conv2d_transpose, conv3d SYCL ops; Q1_0 quantization support. Internal SYCL backend, no project changes required |
 | b9682–b9739 | `ggml/src/ggml-cuda/` | New `col2im_1d` CUDA op. Internal CUDA backend, no project changes required |
 | b9682–b9739 | `ggml/src/ggml-metal/` | ROPE_BACK Metal support; concat kernel extended to additional types. Internal Metal backend, no project changes required |
+| b9739–b9789 | `common/json-partial.{h,cpp}` (removed) + `common/peg-parser.{h,cpp}` + `common/chat.cpp` | The standalone partial-JSON parser was **deleted** (`json-partial.h`/`.cpp`, &minus;363 lines) and its incremental-JSON handling folded into the PEG parser (`peg-parser.cpp` +194/&minus;81). Partial JSON during streaming tool-call parsing is now produced by `peg-parser` instead of `common_json_parse`. Project never included `json-partial.h` &mdash; verified `grep -rn "json-partial\|common_json_parse" src/main/cpp src/test/cpp` &#x2192; zero matches. All consumers stay inside upstream-compiled `chat.cpp`. No project source changes required |
+| b9739–b9789 | `common/chat.h` + `common/chat.cpp` | Message-span types restructured: new `enum common_chat_role` (+ `common_chat_role_from_string`/`_to_string`); `common_chat_msg_span::role` and `common_chat_msg_delimiter::role` changed `std::string` &#x2192; `common_chat_role`; new container structs `common_chat_msg_spans` / `common_chat_msg_delimiters` (the latter with `tokenize()`/`split()`/`to_json()`); `common_chat_params::message_spans` (vector) &#x2192; `message_delimiters`; free function `common_chat_split_by_role()` **removed**, replaced by `common_chat_msg_delimiters_parse()`. `common_chat_msg_diff` (used by `test_server.cpp`) is **unchanged**. Project references none of the changed span/delimiter symbols &mdash; verified `grep -rn "message_spans\|common_chat_split_by_role\|common_chat_msg_span\|common_chat_msg_delimiter" src/main/cpp src/test/cpp` &#x2192; zero matches. Routing happens inside upstream-compiled `chat.cpp` / `server-*.cpp`. No project source changes required |
+| b9739–b9789 | `tools/server/server-task.h` + `server-context.cpp` + `server-common.{h,cpp}` | Context-checkpointing reworked from a precomputed offset to message spans: `task_params::n_before_user` (int32) **removed**, replaced by `task_params::message_spans` (`common_chat_msg_spans`); new `server_tokens::find_message_spans(const common_chat_msg_delimiters &)` helper. `test_server.cpp` asserts against `task_params::to_json()` but never references `n_before_user` &mdash; verified `grep -rn "n_before_user\|message_spans" src/test/cpp` &#x2192; zero matches, so it compiles and passes unchanged. Consumed inside upstream-compiled `server-context.cpp` linked into `jllama`. No project source changes required |
+| b9739–b9789 | `include/llama.h` | **New API** `llama_model_n_layer_nextn(const llama_model *)` &mdash; returns the number of NextN/MTP layers (additive; the surrounding accessor block was otherwise only column-realigned). Not called by project; could back a future introspection accessor. No project source changes required |
+| b9739–b9789 | `common/common.h` | `common_params::checkpoint_min_step` default raised `256` &#x2192; `8192` (minimum spacing between context checkpoints). Tuning default consumed inside upstream-compiled `server-context.cpp`; not surfaced by `ModelParameters`. No project source changes required |
+| b9739–b9789 | `common/arg.h` + `common/arg.cpp` + `common/download.h` | `common_params_handle_models()` gained a 3rd parameter &mdash; a `common_params_handle_models_params` struct (`{ common_download_callback*, bool preset_only }`) for router-mode preset-only downloads; `arg.h` now `#include`s `download.h`; new `common_download_opts::preset_only`. Project does not call `common_params_handle_models()` directly (arg parsing happens upstream) &mdash; `grep -rn common_params_handle_models src/` &#x2192; zero matches. No project source changes required |
+| b9739–b9789 | `common/arg.cpp` (**patch target**) | Upstream's Windows `common_params_parse` argv handling changed again: the unconditional `argc/argv = make_utf8_argv()` override (the original #24779 regression) became a **count-guard** `if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`. That count-guard is exactly the variant this project already found breaks its Windows server-integration tests (argv length coincides with `java.exe`'s), so **`patches/0001-win32-arg-parse-embed-guard.patch` was refreshed** to drop the new form and keep `(void) utf8;` (caller's UTF-8 argv always used). The patch was re-verified to apply cleanly **and** reverse-cleanly (idempotency) against b9789 `common/arg.cpp` |
+| b9739–b9789 | `tools/mtmd/mtmd.h` + `tools/mtmd/clip.h` + `clip.cpp` + `mtmd.cpp` | **New feature** &mdash; multimodal model-load progress: new `mtmd_progress_callback` typedef + `progress_callback` / `progress_callback_user_data` fields on `mtmd_context_params` and `clip_context_params` (additive, appended to the structs; returning `false` aborts the load). Project does not aggregate-init either struct (`grep -rn mtmd_context_params src/` &#x2192; zero matches) so the new fields are harmless; could later feed a Java `LoadProgressCallback` for vision models. No project source changes required |
+| b9739–b9789 | `tools/server/server-models.{h,cpp}` + `server-context.h` | Multi-model router refactor: model downloading moved into a dedicated child-process mode (`enum server_child_mode`, `server_models::load(name, load_options)`, `server_child::run_download()`; old `server_models::download()` removed); `SERVER_STATE_DOWNLOADING` re-enabled in `server_state`. Project links `server-models.cpp` but does not drive the router (`grep -rn "server_models\|SERVER_CHILD_MODE" src/` &#x2192; zero matches). Compiles into `jllama` unchanged. No project source changes required |
+| b9739–b9789 | `ggml/src/ggml-{hexagon,vulkan,sycl,opencl,webgpu,cuda}/` + shaders | Backend-internal work only: Hexagon HTP matmul kernels re-tiled (`hmx-matmul-ops.c` &#x2192; `hmx-mm-kernels-tiled.h`); Vulkan gains a `conv3d_mm` shader + `get_rows_back` and folds the elementwise unary shaders (`clamp`/`cos`/`sin`/`sqrt`/`square`/`leaky_relu`.comp removed) into `unary.comp`; SYCL element-wise / conv3d additions; OpenCL Adreno norm/gemv tweaks; WebGPU `mul_mat_vec` refactor. No API surface visible to `jllama.cpp`; the OpenCL set only affects the `opencl-android-aarch64` classifier. No project source changes required |
+| b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + the patch hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run &mdash; the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
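
The `tools/mtmd` row above describes progress-callback fields appended to `mtmd_context_params` / `clip_context_params`, where a `false` return aborts the load. A minimal sketch of that pattern follows. It uses hypothetical stand-in names (`demo_params`, `demo_default_params`, `demo_load`, `stop_at_half`) rather than the real mtmd API, and it assumes callers start from a defaults function instead of positional aggregate initialization:

```cpp
#include <cassert>

// Hypothetical miniature of a params struct; NOT the real mtmd_context_params.
using progress_cb = bool (*)(float progress, void * user_data);

struct demo_params {
    bool use_gpu;
    int  n_threads;
    // Fields appended in a later version (assumed layout, for illustration).
    progress_cb progress_callback;
    void *      progress_callback_user_data;
};

// Defaults function: callers that start here get null values for new fields
// without touching their own code.
inline demo_params demo_default_params() {
    demo_params p{};          // value-initialize: callback and user data are null
    p.use_gpu   = true;
    p.n_threads = 4;
    return p;
}

// Loader stand-in: reports progress after each step and aborts when the
// callback returns false. Returns true when the load ran to completion.
inline bool demo_load(const demo_params & p, int steps) {
    for (int i = 1; i <= steps; ++i) {
        const float progress = static_cast<float>(i) / static_cast<float>(steps);
        if (p.progress_callback &&
            !p.progress_callback(progress, p.progress_callback_user_data)) {
            return false;     // caller asked to abort
        }
    }
    return true;
}

// Example callback: counts invocations and aborts once progress reaches 50%.
inline bool stop_at_half(float progress, void * user_data) {
    *static_cast<int *>(user_data) += 1;
    return progress < 0.5f;
}

// Usage: an old-style caller is unaffected; a new-style caller can abort.
inline void demo_progress() {
    demo_params p = demo_default_params();
    assert(demo_load(p, 4));              // no callback set: load completes
    int calls = 0;
    p.progress_callback           = stop_at_half;
    p.progress_callback_user_data = &calls;
    assert(!demo_load(p, 4));             // aborted by the callback
}
```

Because the defaults function fills the new members, callers that never mention them keep compiling and behave as before, which is the situation the row describes for this project.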

patches/0001-win32-arg-parse-embed-guard.patch

Lines changed: 8 additions & 7 deletions
@@ -1,12 +1,12 @@
 diff --git a/common/arg.cpp b/common/arg.cpp
 --- a/common/arg.cpp
 +++ b/common/arg.cpp
-@@ -924,10 +924,17 @@ bool common_params_parse(int argc, char ** argv, common_params & params, llama_e
+@@ -933,10 +933,18 @@ bool common_params_parse(int argc, char ** argv, common_params & params, llama_e
 bool common_params_parse(int argc, char ** argv, common_params & params, llama_example ex, void(*print_usage)(int, char **)) {
 #ifdef _WIN32
 auto utf8 = make_utf8_argv();
-- if (!utf8.ptrs.empty()) {
-- argc = static_cast<int>(utf8.buf.size());
+- // repair argv only when it matches the process command line
+- if (static_cast<int>(utf8.buf.size()) == argc) {
 - argv = utf8.ptrs.data();
 - }
 + // java-llama.cpp patch (PR #248): upstream (llama.cpp #24779) replaced the caller's argv with
@@ -15,10 +15,11 @@ diff --git a/common/arg.cpp b/common/arg.cpp
 + // UTF-8 argv (GetStringUTFChars), and adopting GetCommandLineW discarded it -> common_params_parse
 + // parsed java.exe's command line and failed with "Failed to parse model parameters". We keep the
 + // make_utf8_argv() call (so it stays referenced -> -Wunused-function-clean) but do NOT adopt its
-+ // result, so the caller's already-UTF-8 argv is always used. This is deterministic: a count-guard
-+ // (only override when the re-derived arg count equals argc) collided on the server-integration
-+ // tests whose argv length happened to equal java.exe's, so they kept failing. The upstream PR
-+ // can instead expose an opt-out / a common_params_parse_argv that preserves the standalone fix.
++ // result, so the caller's already-UTF-8 argv is always used. This is deterministic: upstream's
++ // count-guard form here (only override when the re-derived arg count equals argc) collided on the
++ // server-integration tests whose argv length happened to equal java.exe's, so they kept failing.
++ // The upstream PR can instead expose an opt-out / a common_params_parse_argv that preserves the
++ // standalone tools' UTF-8 fix.
 + (void) utf8;
 #endif

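
The patch comment explains why the count-guard is not enough: when the process command line happens to have as many arguments as the caller's argv, the guard still swaps in the wrong arguments. A minimal, portable sketch of that collision follows. It assumes vectors of strings in place of `char ** argv` and the `make_utf8_argv()` result, and every name in it (`Args`, `choose_count_guard`, `choose_patched`, `has_model_flag`, `demo_collision`) and every argument value is hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for an argument vector; the real code works on char ** argv.
using Args = std::vector<std::string>;

// Count-guard form: adopt the process command line whenever its length
// equals the caller's argc. A length match says nothing about the content.
inline Args choose_count_guard(const Args & caller, const Args & process_cmdline) {
    return process_cmdline.size() == caller.size() ? process_cmdline : caller;
}

// Patched form: never adopt the process command line; the embedded caller's
// argv is always used.
inline Args choose_patched(const Args & caller, const Args & /*process_cmdline*/) {
    return caller;
}

// True when the chosen arguments still carry the --model flag.
inline bool has_model_flag(const Args & args) {
    for (const auto & a : args) {
        if (a == "--model") {
            return true;
        }
    }
    return false;
}

// Usage: both vectors have three entries, so the count-guard collides.
inline void demo_collision() {
    const Args caller  = {"jllama", "--model", "m.gguf"};   // built by the JNI layer
    const Args process = {"java.exe", "-jar", "app.jar"};   // what the OS reports
    assert(!has_model_flag(choose_count_guard(caller, process)));  // flag lost
    assert(has_model_flag(choose_patched(caller, process)));       // flag kept
}
```

With differing lengths the count-guard happens to keep the caller's argv, which is why the failure only showed up on tests whose argument count coincided with `java.exe`'s; dropping the override removes that dependence on length.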
