Bump the pinned llama.cpp tag and refresh the Windows argv patch for the
upgraded source. Every upstream breaking change in this range is absorbed
inside upstream-compiled translation units; no project C++ source edits
were required.

- CMakeLists.txt: GIT_TAG + LLAMA_TAG b9739 -> b9789.
- README.md / CLAUDE.md / publish.yml / TODO.md: version badge,
  pinned-version notes, WebUI clone example, aarch64 GCC rationale.
- patches/0001-win32-arg-parse-embed-guard.patch: refreshed for b9789.
  Upstream replaced the original #24779 argv override with the count-guard
  form (if utf8.buf.size() == argc), which is exactly the variant that
  breaks the Windows server-integration tests, so the patch still drops it
  entirely and keeps "(void) utf8;". Re-verified to apply and reverse-apply
  cleanly (idempotent) against b9789 common/arg.cpp.
- docs/history/llama-cpp-breaking-changes.md: new b9739-b9789 rows
  (json-partial.{h,cpp} removed -> peg-parser; chat.h message-span
  restructure; server-task n_before_user -> message_spans;
  new llama_model_n_layer_nextn; mtmd/clip progress_callback;
  server-models child-process download refactor).

Verified locally on Linux x86_64 (GCC 13.3): cmake configure passes the
fail-loud OuteTTS extraction and refreshed-patch anchor checks against
b9789, the full Release build links libjllama.so + jllama_test with zero
warnings on any project translation unit, and ctest reports 454/454
passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
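
The count-guard collision described in the commit message can be illustrated with a small model. This is only a sketch under stated assumptions: `Args`, `select_count_guard`, and `select_patched` are hypothetical names invented here, plain string vectors stand in for the Windows command line that upstream re-derives, and none of this is the actual code in `common/arg.cpp`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an argument vector; the real code works on
// argc/argv and a UTF-8 buffer re-derived from the process command line.
using Args = std::vector<std::string>;

// Models the count-guard form: the process command line replaces the
// caller's argv only when both have the same number of entries.
Args select_count_guard(const Args & caller, const Args & process) {
    if (process.size() == caller.size()) {
        return process;  // override taken, caller's arguments are lost
    }
    return caller;
}

// Models the patched behaviour: the process command line is still
// computed (kept referenced) but never adopted.
Args select_patched(const Args & caller, const Args & process) {
    (void) process;
    return caller;
}
```

With a caller argv of `{"jllama", "--model", "m.gguf"}` and a host command line of `{"java.exe", "-jar", "app.jar"}`, the counts coincide, so the guarded variant returns the host's arguments and `--model` disappears. The patched variant returns the caller's arguments regardless of length, which is why it behaves deterministically in this model.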
CLAUDE.md: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9739**
+Current llama.cpp pinned version: **b9789**

## Upgrading CUDA Version
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
ships no UI):
```bash
# needs node/npm + network; embed.cpp is plain C++17 (no npm)
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
-`SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
as the repo secret **`DEPOT_TOKEN`**.

-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9739`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
*changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -382,7 +382,7 @@ Current patches:
| Patch | Fixes |
|-------|-------|
-
|`0001-win32-arg-parse-embed-guard.patch`| Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — an earlier count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
+
| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): `common_params_parse` replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — a count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing, and **upstream b9789 now ships exactly that count-guard form** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), so the patch still drops it rather than adopting upstream's guard. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
docs/history/llama-cpp-breaking-changes.md
| b9682–b9739 |`ggml/src/ggml-cuda/`| New `col2im_1d` CUDA op. Internal CUDA backend, no project changes required |
| b9682–b9739 |`ggml/src/ggml-metal/`| ROPE_BACK Metal support; concat kernel extended to additional types. Internal Metal backend, no project changes required |
+
| b9739–b9789 |`common/json-partial.{h,cpp}` (removed) + `common/peg-parser.{h,cpp}` + `common/chat.cpp`| The standalone partial-JSON parser was **deleted** (`json-partial.h`/`.cpp`, −363 lines) and its incremental-JSON handling folded into the PEG parser (`peg-parser.cpp` +194/−81). Partial JSON during streaming tool-call parsing is now produced by `peg-parser` instead of `common_json_parse`. Project never included `json-partial.h`— verified `grep -rn "json-partial\|common_json_parse" src/main/cpp src/test/cpp`→ zero matches. All consumers stay inside upstream-compiled `chat.cpp`. No project source changes required |
+
| b9739–b9789 |`common/chat.h` + `common/chat.cpp`| Message-span types restructured: new `enum common_chat_role` (+ `common_chat_role_from_string`/`_to_string`); `common_chat_msg_span::role` and `common_chat_msg_delimiter::role` changed `std::string`→`common_chat_role`; new container structs `common_chat_msg_spans` / `common_chat_msg_delimiters` (the latter with `tokenize()`/`split()`/`to_json()`); `common_chat_params::message_spans` (vector) →`message_delimiters`; free function `common_chat_split_by_role()`**removed**, replaced by `common_chat_msg_delimiters_parse()`. `common_chat_msg_diff` (used by `test_server.cpp`) is **unchanged**. Project references none of the changed span/delimiter symbols — verified `grep -rn "message_spans\|common_chat_split_by_role\|common_chat_msg_span\|common_chat_msg_delimiter" src/main/cpp src/test/cpp`→ zero matches. Routing happens inside upstream-compiled `chat.cpp` / `server-*.cpp`. No project source changes required |
+
| b9739–b9789 |`tools/server/server-task.h` + `server-context.cpp` + `server-common.{h,cpp}`| Context-checkpointing reworked from a precomputed offset to message spans: `task_params::n_before_user` (int32) **removed**, replaced by `task_params::message_spans` (`common_chat_msg_spans`); new `server_tokens::find_message_spans(const common_chat_msg_delimiters &)` helper. `test_server.cpp` asserts against `task_params::to_json()` but never references `n_before_user`— verified `grep -rn "n_before_user\|message_spans" src/test/cpp`→ zero matches, so it compiles and passes unchanged. Consumed inside upstream-compiled `server-context.cpp` linked into `jllama`. No project source changes required |
+
| b9739–b9789 |`include/llama.h`|**New API**`llama_model_n_layer_nextn(const llama_model *)`— returns the number of NextN/MTP layers (additive; the surrounding accessor block was otherwise only column-realigned). Not called by project; could back a future introspection accessor. No project source changes required |
+
| b9739–b9789 |`common/common.h`|`common_params::checkpoint_min_step` default raised `256`→`8192` (minimum spacing between context checkpoints). Tuning default consumed inside upstream-compiled `server-context.cpp`; not surfaced by `ModelParameters`. No project source changes required |
+
| b9739–b9789 |`common/arg.h` + `common/arg.cpp` + `common/download.h`|`common_params_handle_models()` gained a 3rd parameter — a `common_params_handle_models_params` struct (`{ common_download_callback*, bool preset_only }`) for router-mode preset-only downloads; `arg.h` now `#include`s `download.h`; new `common_download_opts::preset_only`. Project does not call `common_params_handle_models()` directly (arg parsing happens upstream) —`grep -rn common_params_handle_models src/`→ zero matches. No project source changes required |
+
| b9739–b9789 |`common/arg.cpp` (**patch target**) | Upstream's Windows `common_params_parse` argv handling changed again: the unconditional `argc/argv = make_utf8_argv()` override (the original #24779 regression) became a **count-guard**`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`. That count-guard is exactly the variant this project already found breaks its Windows server-integration tests (argv length coincides with `java.exe`'s), so **`patches/0001-win32-arg-parse-embed-guard.patch` was refreshed** to drop the new form and keep `(void) utf8;` (caller's UTF-8 argv always used). The patch was re-verified to apply cleanly **and** reverse-cleanly (idempotency) against b9789 `common/arg.cpp`|
+
| b9739–b9789 |`tools/mtmd/mtmd.h` + `tools/mtmd/clip.h` + `clip.cpp` + `mtmd.cpp`|**New feature**— multimodal model-load progress: new `mtmd_progress_callback` typedef + `progress_callback` / `progress_callback_user_data` fields on `mtmd_context_params` and `clip_context_params` (additive, appended to the structs; returning `false` aborts the load). Project does not aggregate-init either struct (`grep -rn mtmd_context_params src/`→ zero matches) so the new fields are harmless; could later feed a Java `LoadProgressCallback` for vision models. No project source changes required |
+
| b9739–b9789 |`tools/server/server-models.{h,cpp}` + `server-context.h`| Multi-model router refactor: model downloading moved into a dedicated child-process mode (`enum server_child_mode`, `server_models::load(name, load_options)`, `server_child::run_download()`; old `server_models::download()` removed); `SERVER_STATE_DOWNLOADING` re-enabled in `server_state`. Project links `server-models.cpp` but does not drive the router (`grep -rn "server_models\|SERVER_CHILD_MODE" src/`→ zero matches). Compiles into `jllama` unchanged. No project source changes required |
+
| b9739–b9789 |`ggml/src/ggml-{hexagon,vulkan,sycl,opencl,webgpu,cuda}/` + shaders | Backend-internal work only: Hexagon HTP matmul kernels re-tiled (`hmx-matmul-ops.c`→`hmx-mm-kernels-tiled.h`); Vulkan gains a `conv3d_mm` shader + `get_rows_back` and folds the elementwise unary shaders (`clamp`/`cos`/`sin`/`sqrt`/`square`/`leaky_relu`.comp removed) into `unary.comp`; SYCL element-wise / conv3d additions; OpenCL Adreno norm/gemv tweaks; WebGPU `mul_mat_vec` refactor. No API surface visible to `jllama.cpp`; the OpenCL set only affects the `opencl-android-aarch64` classifier. No project source changes required |
+
| b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + the patch hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run — the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |