Commit 212634e

Merge pull request #271 from bernardladenthin/claude/keen-babbage-yjmgwh
Upgrade llama.cpp from b9739 to b9789 and refresh Windows patch
2 parents 7633baf + aecdde0 commit 212634e

9 files changed

Lines changed: 578 additions & 35 deletions

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ jobs:
 # Native ARM64 build on GitHub's free arm64 runner, mirroring upstream llama.cpp's
 # `ubuntu-cpu` aarch64 release job (ubuntu-24.04-arm + GCC 14). Replaces the former dockcross
 # `linux-arm64-lts` cross-compile (GCC 8.5, glibc 2.17), which can no longer compile llama.cpp
-# b9739 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
+# b9789 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
 # unit suite (ctest) on real ARM hardware for the first time (the cross build ran no tests).
 # Trade-off: the glibc floor rises 2.17 -> ~2.39, the same envelope upstream's own ARM binaries
 # require. GGML_NATIVE=OFF keeps the artifact portable across ARMv8 CPU generations (no

CLAUDE.md

Lines changed: 7 additions & 6 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9739**
+Current llama.cpp pinned version: **b9789**

 ## Upgrading CUDA Version

@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9739 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
 && ( cd dist && find . -type f -not -path './_gzip/*' \
 | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
 as the repo secret **`DEPOT_TOKEN`**.

-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9739`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -382,7 +382,8 @@ Current patches:

 | Patch | Fixes |
 |-------|-------|
-| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — an earlier count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
+| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): on Windows `common_params_parse` re-derived argv from the **process** command line (`GetCommandLineW`) and adopted it, so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". b9789 narrowed the unconditional override to a **count-guard** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), but that is exactly the variant the project already found breaks its Windows server-integration tests (when the embedded argv length coincides with `java.exe`'s). The patch carries the **complete upstream change** (so it can be submitted to llama.cpp verbatim and then dropped here): **(1)** `common_params_parse` parses **exactly the argv it is given** (no `GetCommandLineW` magic) and a new `common_params_parse_main()` wrapper holds the UTF-8 recovery for the standalone tools' `main()` (`common/arg.{cpp,h}`); **(2)** the **~34 standalone `main()` call sites** (every `common_params_parse(argc, argv, …)` across `tools/*`, `examples/*` and the `tests/*` programs) flip to `common_params_parse_main()`; **(3)** a `tests/test-arg-parser.cpp` regression case pins that `common_params_parse` honors a caller-supplied argv. The embedded caller (`jllama.cpp`) keeps calling `common_params_parse` and is never overridden. **Our subproject build compiles only the `arg.{cpp,h}` core** — `LLAMA_BUILD_TOOLS`/`LLAMA_BUILD_TESTS` are OFF for a FetchContent subproject — so the flips + test are applied-but-not-compiled here; they were validated via a one-off `-DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON` build (the new test compiles and its asserts pass; `test-arg-parser`'s only red there is the live `ggml.ai` download check, which is sandbox-network, not the patch). Because it spans **37 files** it must be refreshed on every llama.cpp bump (the applier fails loud). |
+| `0002-server-preserve-caller-load-progress-callback.patch` | Load-progress-callback regression introduced in llama.cpp **b9789**: `server_context::load_model` (`tools/server/server-context.cpp`) now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params`, clobbering any callback the embedding caller already set. libjllama's `LoadProgressCallback` feature wires `common_params.load_progress_callback` to a JNI trampoline *before* calling `load_model`, so the bump silently killed it — `LoadProgressCallbackTest` saw zero progress updates and the abort-on-`false` path never threw. The patch guards the assignment with `if (params_base.load_progress_callback == nullptr)`, so the server installs its own reporter **only when the caller hasn't** — a caller-supplied callback survives and fires during load. Standalone `llama-server` (no caller callback, so the field is null) is unaffected. Same JNI-vs-standalone divergence class as `0001`. |

 ## OuteTTS build-time extraction (`cmake/generate-tts-upstream.cmake`)

@@ -888,7 +889,7 @@ now **"Build and Test Linux aarch64"**) builds **natively on `ubuntu-24.04-arm`*
 llama.cpp's own `ubuntu-cpu` aarch64 release job (`ubuntu-24.04-arm` + **GCC 14**).

 **Why it moved off dockcross.** The old `dockcross/linux-arm64-lts` image ships **GCC 8.5 / glibc
-2.17**; llama.cpp **b9739** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
+2.17**; llama.cpp **b9789** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
 stopped compiling. Upstream solved the same problem by building natively on `ubuntu-24.04-arm` with
 GCC 14 and ships a **glibc ≈ 2.39** ARM binary with no old-glibc compatibility layer. This repo now
 does the same: the aarch64 artifact's **glibc floor rises 2.17 → ~2.39** — the same envelope
@@ -956,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"

 #### Upstream source location (in CMake build tree)

-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9739`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.

 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
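
Review note on `0002-server-preserve-caller-load-progress-callback.patch`: the guard described in the patch table can be illustrated with a minimal, self-contained sketch. Everything below is a hypothetical stand-in (`params_sketch`, `server_reporter`, `record_progress`, `load_model_sketch`), not the real llama.cpp declarations; only the `== nullptr` guard itself comes from the patch description, and the three simulated progress values are an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a C-style progress callback: return false to abort.
using progress_callback = bool (*)(float progress, void * user_data);

// Hypothetical stand-in for the two callback fields on the params struct.
struct params_sketch {
    progress_callback load_progress_callback           = nullptr;
    void *            load_progress_callback_user_data = nullptr;
};

// Stand-in for the server's own load-progress reporter.
inline bool server_reporter(float /*progress*/, void * /*user_data*/) {
    return true;
}

// Caller-supplied callback: records every progress value it receives.
inline bool record_progress(float progress, void * user_data) {
    assert(user_data != nullptr);
    static_cast<std::vector<float> *>(user_data)->push_back(progress);
    return true;
}

// Sketch of the guarded assignment followed by a simulated load.
// Returns false if the active callback asked to abort.
inline bool load_model_sketch(params_sketch & params) {
    // Install the server reporter only when the caller has not set a callback.
    if (params.load_progress_callback == nullptr) {
        params.load_progress_callback           = server_reporter;
        params.load_progress_callback_user_data = nullptr;
    }
    // Simulated load: report three progress steps through the active callback.
    for (float p : {0.0f, 0.5f, 1.0f}) {
        if (!params.load_progress_callback(p, params.load_progress_callback_user_data)) {
            return false;
        }
    }
    return true;
}
```

With the guard in place, an embedding caller that sets `record_progress` before calling `load_model_sketch` still receives every update, while a caller that leaves the field null gets the server reporter, which matches the standalone behavior the patch description says is unaffected.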

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9739
+GIT_TAG b9789
 PATCH_COMMAND ${CMAKE_COMMAND}
 -DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 -DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
 COMMAND ${CMAKE_COMMAND}
 -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
 -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
--DLLAMA_TAG=b9739
+-DLLAMA_TAG=b9789
 -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
 RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9739](https://img.shields.io/badge/llama.cpp-%23b9739-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9739)
+[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

TODO.md

Lines changed: 13 additions & 5 deletions
@@ -164,7 +164,7 @@ primary goal: agentic tool-calling with Qwen):
 What remains is manual validation against the actual editor clients — point Copilot's Ollama provider /
 a Custom Endpoint, Claude Code, and a Responses client at the running server — since a server-side
 round-trip confirms the wire shapes but not each client's own parser.
-- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9739`) includes the Gemma 4
+- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9789`) includes the Gemma 4
 tool-call parser fixes; if not, bump per the upgrade procedure.
 - **NativeServer — wire upstream `server.cpp` routes to JNI (in progress; scaffold landed `dd264b2`).**
 The upstream HTTP transport (`tools/server/server-http.cpp` + the cpp-httplib backend) is already
@@ -238,10 +238,18 @@ Windows Java tests, but **collided** on the 4 server-integration setups (`OpenAi
 `OpenAiServerToolCalling*`, `MultimodalIntegrationTest`, `OpenAiCompatServerIntegrationTest`) whose
 argv length happened to equal `java.exe`'s, so they kept failing with the same parse error. The patch
 was changed to **fix option 2** (drop the override entirely for our build — a JNI library is never the
-process, so the override is pure liability), which is deterministic. Still worth upstreaming as an
-opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix, so the patch can
-eventually be dropped; until then it must be re-verified on each llama.cpp bump (the applier fails loud
-if it no longer applies).
+process, so the override is pure liability), which is deterministic. **As of the b9789 bump the patch
+was reshaped into the clean opt-in form intended for upstreaming (fix option 3's core):**
+`common_params_parse` now parses exactly the argv it is given, and a new `common_params_parse_main()`
+wrapper carries the `GetCommandLineW` UTF-8 recovery that the standalone tools' `main()` opt into.
+**The patch now carries the full upstream change (37 files):** the ~34 `common_params_parse(argc, argv,
+…)` call sites across `tools/*`, `examples/*` and the `tests/*` programs flip to
+`common_params_parse_main()`, plus a `tests/test-arg-parser.cpp` regression case. Embedded callers stay
+on `common_params_parse`. Our subproject build compiles only the `arg.{cpp,h}` core
+(`LLAMA_BUILD_TOOLS`/`TESTS` OFF), so the flips + test are validated via a one-off tools+tests build
+(the new test's asserts pass; `test-arg-parser`'s only red is the live `ggml.ai` download check, which
+is sandbox-network). The 37-file patch must be re-verified on each llama.cpp bump (the applier fails
+loud). Submit it to llama.cpp and drop the local copy once merged.

 **Symptom.** On **Windows x86_64 only**, every Java test that loads a real model fails in
 `LlamaModel.loadModel` (native) with `LlamaException: "Failed to parse model parameters"`
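
Review note on the count-guard collision: the reason a length comparison cannot tell an embedded caller from a standalone process can be shown with a small sketch. The names below (`args`, `pick_argv_count_guard`, `pick_argv_as_given`) and the sample argument vectors are hypothetical and only model the selection logic described in this diff; they are not the upstream functions, and the real code works on `argc`/`argv` rather than `std::vector<std::string>`.

```cpp
#include <cassert>
#include <string>
#include <vector>

using args = std::vector<std::string>;

// Count-guard variant (sketch): adopt the re-derived process command line
// whenever its length happens to equal the caller's argument count.
inline const args & pick_argv_count_guard(const args & caller, const args & process) {
    assert(!caller.empty());
    return process.size() == caller.size() ? process : caller;
}

// Embedded entry point after the patch (sketch): parse exactly the argv
// that was passed in and never consult the process command line.
inline const args & pick_argv_as_given(const args & caller, const args & /*process*/) {
    assert(!caller.empty());
    return caller;
}
```

When the JNI caller's argv and the `java.exe` command line have the same length, the count-guard picks the process arguments and the `--model` flag is lost; with a different length it falls back to the caller by coincidence. Parsing the argv as given removes that dependence on argument count, which is why the diff describes it as deterministic.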
