bernardladenthin
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
Lines changed: 1 addition & 1 deletion
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 7 additions & 6 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 2 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/TODO.md b/TODO.md
Lines changed: 13 additions & 5 deletions
@@ -264,7 +264,7 @@ jobs:
     # Native ARM64 build on GitHub's free arm64 runner, mirroring upstream llama.cpp's
     # `ubuntu-cpu` aarch64 release job (ubuntu-24.04-arm + GCC 14). Replaces the former dockcross
     # `linux-arm64-lts` cross-compile (GCC 8.5, glibc 2.17), which can no longer compile llama.cpp
-    # b9739 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
+    # b9789 — its C++17 CTAD-in-`new` needs GCC >= 12. Building natively also lets us run the C++
     # unit suite (ctest) on real ARM hardware for the first time (the cross build ran no tests).
     # Trade-off: the glibc floor rises 2.17 -> ~2.39, the same envelope upstream's own ARM binaries
     # require. GGML_NATIVE=OFF keeps the artifact portable across ARMv8 CPU generations (no
 
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9739**
+Current llama.cpp pinned version: **b9789**
 
 ## Upgrading CUDA Version
 
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9739 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
   && ( cd dist && find . -type f -not -path './_gzip/*' \
        | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
   as the repo secret **`DEPOT_TOKEN`**.
 
-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9739`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -382,7 +382,8 @@ Current patches:
 
 | Patch | Fixes |
 |-------|-------|
-| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch **drops the override for our build** (keeps the `make_utf8_argv()` call referenced so there's no `-Wunused-function`, but never adopts its result), so the caller's already-UTF-8 argv is always used. This is **deterministic** — an earlier count-guard variant (only override when the re-derived arg count equals `argc`) collided on the server-integration tests whose argv length happened to equal `java.exe`'s and kept them failing. The upstream PR can instead expose an opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix. |
+| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): on Windows `common_params_parse` re-derived argv from the **process** command line (`GetCommandLineW`) and adopted it, so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". b9789 narrowed the unconditional override to a **count-guard** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), but that is exactly the variant the project already found breaks its Windows server-integration tests (when the embedded argv length coincides with `java.exe`'s). The patch carries the **complete upstream change** (so it can be submitted to llama.cpp verbatim and then dropped here): **(1)** `common_params_parse` parses **exactly the argv it is given** (no `GetCommandLineW` magic) and a new `common_params_parse_main()` wrapper holds the UTF-8 recovery for the standalone tools' `main()` (`common/arg.{cpp,h}`); **(2)** the **~34 standalone `main()` call sites** (every `common_params_parse(argc, argv, …)` across `tools/*`, `examples/*` and the `tests/*` programs) flip to `common_params_parse_main()`; **(3)** a `tests/test-arg-parser.cpp` regression case pins that `common_params_parse` honors a caller-supplied argv. The embedded caller (`jllama.cpp`) keeps calling `common_params_parse` and is never overridden. **Our subproject build compiles only the `arg.{cpp,h}` core** — `LLAMA_BUILD_TOOLS`/`LLAMA_BUILD_TESTS` are OFF for a FetchContent subproject — so the flips + test are applied-but-not-compiled here; they were validated via a one-off `-DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON` build (the new test compiles and its asserts pass; `test-arg-parser`'s only red there is the live `ggml.ai` download check, which is sandbox-network, not the patch). Because it spans **37 files** it must be refreshed on every llama.cpp bump (the applier fails loud). |
+| `0002-server-preserve-caller-load-progress-callback.patch` | Load-progress-callback regression introduced in llama.cpp **b9789**: `server_context::load_model` (`tools/server/server-context.cpp`) now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params`, clobbering any callback the embedding caller already set. libjllama's `LoadProgressCallback` feature wires `common_params.load_progress_callback` to a JNI trampoline *before* calling `load_model`, so the bump silently killed it — `LoadProgressCallbackTest` saw zero progress updates and the abort-on-`false` path never threw. The patch guards the assignment with `if (params_base.load_progress_callback == nullptr)`, so the server installs its own reporter **only when the caller hasn't** — a caller-supplied callback survives and fires during load. Standalone `llama-server` (no caller callback, so the field is null) is unaffected. Same JNI-vs-standalone divergence class as `0001`. |
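
The null guard described for `0002` can be illustrated with a minimal, self-contained sketch. Only the field name `load_progress_callback` and the `== nullptr` check come from the table row above; `params_sketch`, `server_reporter`, `caller_trampoline`, `install_unguarded`, `install_guarded`, and `simulate_load` are hypothetical stand-ins and not llama.cpp or libjllama identifiers. The real callback signature may differ.

```cpp
// Hypothetical callback type; the real llama.cpp signature may differ.
using progress_cb = bool (*)(float progress, void * user_data);

// Hypothetical stand-in for the parameter struct that carries the callback.
struct params_sketch {
    progress_cb load_progress_callback  = nullptr;
    void *      load_progress_user_data = nullptr;
};

static int g_server_calls = 0; // how often the server's own reporter fired
static int g_caller_calls = 0; // how often the embedding caller's callback fired

static bool server_reporter(float, void *)   { ++g_server_calls; return true; }
static bool caller_trampoline(float, void *) { ++g_caller_calls; return true; }

// Behaviour the table row attributes to b9789: always overwrite the field.
static void install_unguarded(params_sketch & p) {
    p.load_progress_callback = server_reporter;
}

// Behaviour the patch is described as restoring: install the server reporter
// only when the caller has not already set a callback.
static void install_guarded(params_sketch & p) {
    if (p.load_progress_callback == nullptr) {
        p.load_progress_callback = server_reporter;
    }
}

// Stand-in for a model load that reports progress exactly once.
static bool simulate_load(const params_sketch & p) {
    if (p.load_progress_callback == nullptr) {
        return true; // nothing to report to
    }
    return p.load_progress_callback(1.0f, p.load_progress_user_data);
}
```

With `install_guarded`, a struct whose callback was preset to `caller_trampoline` still reaches the caller during `simulate_load`, while a default-constructed struct (the standalone case) ends up with `server_reporter`. With `install_unguarded`, the preset callback is replaced and never fires.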
 
 ## OuteTTS build-time extraction (`cmake/generate-tts-upstream.cmake`)
 
@@ -888,7 +889,7 @@ now **"Build and Test Linux aarch64"**) builds **natively on `ubuntu-24.04-arm`*
 llama.cpp's own `ubuntu-cpu` aarch64 release job (`ubuntu-24.04-arm` + **GCC 14**).
 
 **Why it moved off dockcross.** The old `dockcross/linux-arm64-lts` image ships **GCC 8.5 / glibc
-2.17**; llama.cpp **b9739** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
+2.17**; llama.cpp **b9789** uses C++17 CTAD-in-`new`, which needs **GCC ≥ 12**, so the cross build
 stopped compiling. Upstream solved the same problem by building natively on `ubuntu-24.04-arm` with
 GCC 14 and ships a **glibc ≈ 2.39** ARM binary with no old-glibc compatibility layer. This repo now
 does the same: the aarch64 artifact's **glibc floor rises 2.17 → ~2.39** — the same envelope
@@ -956,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9739`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.
 
 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
 
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9739
+	GIT_TAG        b9789
 	PATCH_COMMAND  ${CMAKE_COMMAND}
 		-DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 		-DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
     COMMAND ${CMAKE_COMMAND}
         -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
         -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
-        -DLLAMA_TAG=b9739
+        -DLLAMA_TAG=b9789
         -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
     RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )
 
@@ -7,7 +7,7 @@
 **Build:**  
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)  
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)  
-[![llama.cpp b9739](https://img.shields.io/badge/llama.cpp-%23b9739-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9739)  
+[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)  
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)  
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)  
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)  
 
@@ -164,7 +164,7 @@ primary goal: agentic tool-calling with Qwen):
   What remains is manual validation against the actual editor clients — point Copilot's Ollama provider /
   a Custom Endpoint, Claude Code, and a Responses client at the running server — since a server-side
   round-trip confirms the wire shapes but not each client's own parser.
-- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9739`) includes the Gemma 4
+- **Gemma 4 tool-calling validation.** Confirm the pinned llama.cpp (`b9789`) includes the Gemma 4
   tool-call parser fixes; if not, bump per the upgrade procedure.
 - **NativeServer — wire upstream `server.cpp` routes to JNI (in progress; scaffold landed `dd264b2`).**
   The upstream HTTP transport (`tools/server/server-http.cpp` + the cpp-httplib backend) is already
@@ -238,10 +238,18 @@ Windows Java tests, but **collided** on the 4 server-integration setups (`OpenAi
 `OpenAiServerToolCalling*`, `MultimodalIntegrationTest`, `OpenAiCompatServerIntegrationTest`) whose
 argv length happened to equal `java.exe`'s, so they kept failing with the same parse error. The patch
 was changed to **fix option 2** (drop the override entirely for our build — a JNI library is never the
-process, so the override is pure liability), which is deterministic. Still worth upstreaming as an
-opt-out / `common_params_parse_argv` that preserves the standalone tools' UTF-8 fix, so the patch can
-eventually be dropped; until then it must be re-verified on each llama.cpp bump (the applier fails loud
-if it no longer applies).
+process, so the override is pure liability), which is deterministic. **As of the b9789 bump the patch
+was reshaped into the clean opt-in form intended for upstreaming (fix option 3's core):**
+`common_params_parse` now parses exactly the argv it is given, and a new `common_params_parse_main()`
+wrapper carries the `GetCommandLineW` UTF-8 recovery that the standalone tools' `main()` opt into.
+**The patch now carries the full upstream change (37 files):** the ~34 `common_params_parse(argc, argv,
+…)` call sites across `tools/*`, `examples/*` and the `tests/*` programs flip to
+`common_params_parse_main()`, plus a `tests/test-arg-parser.cpp` regression case. Embedded callers stay
+on `common_params_parse`. Our subproject build compiles only the `arg.{cpp,h}` core
+(`LLAMA_BUILD_TOOLS`/`TESTS` OFF), so the flips + test are validated via a one-off tools+tests build
+(the new test's asserts pass; `test-arg-parser`'s only red is the live `ggml.ai` download check, which
+is sandbox-network). The 37-file patch must be re-verified on each llama.cpp bump (the applier fails
+loud). Submit it to llama.cpp and drop the local copy once merged.
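
The difference between the count-guard and the opt-in split described above can be sketched in isolation. This is an illustrative model under stated assumptions, not the patched `common/arg.cpp`: `argv_t`, `has_model_arg`, `effective_argv_count_guard`, `effective_argv_core`, and `effective_argv_main` are hypothetical names, and the process command line is passed in as a plain vector instead of being read via `GetCommandLineW`. What the real `common_params_parse_main()` does internally is not shown in this diff, so the wrapper below simply assumes it prefers the process command line.

```cpp
#include <string>
#include <vector>

using argv_t = std::vector<std::string>;

// True if the argument list still contains a "--model" flag.
static bool has_model_arg(const argv_t & args) {
    for (const std::string & a : args) {
        if (a == "--model") {
            return true;
        }
    }
    return false;
}

// Count-guard variant: adopt the process command line only when its length
// equals the caller's argc. Under JNI the process is java.exe, so an equal
// length is a coincidence and not evidence that both describe the same args.
static argv_t effective_argv_count_guard(const argv_t & caller, const argv_t & process) {
    return process.size() == caller.size() ? process : caller;
}

// Opt-in split, core half: parse exactly the argv that was passed in.
static argv_t effective_argv_core(const argv_t & caller) {
    return caller;
}

// Opt-in split, wrapper half: only a standalone main() calls this, where the
// process command line and the caller's argv describe the same invocation.
// Assumption: the wrapper prefers the process command line when it is non-empty.
static argv_t effective_argv_main(const argv_t & caller, const argv_t & process) {
    return process.empty() ? caller : process;
}
```

For example, with a caller argv of `{"jllama", "--model", "m.gguf"}` and a process command line of `{"java.exe", "-jar", "app.jar"}`, both lists have three entries, so `effective_argv_count_guard` returns the `java.exe` list and `--model` is lost. `effective_argv_core` never looks at the process command line, so the embedded caller's arguments survive regardless of length.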
 
 **Symptom.** On **Windows x86_64 only**, every Java test that loads a real model fails in
 `LlamaModel.loadModel` (native) with `LlamaException: "Failed to parse model parameters"`