build: add patches/ mechanism + fix Windows JNI arg parse (llama.cpp #24779)

claude · claude · commit 1d875b1e122f · 2026-06-20T20:03:03.000Z
Introduce a generic source-patch mechanism for the FetchContent'd llama.cpp tree so fixes apply
to every C++ build (all CI jobs + local) from one place:

- patches/ : drop *.patch / *.diff here (applied in filename order)
- cmake/apply-llama-patches.cmake : cross-platform (cmake -P), idempotent
  (git apply --reverse --check skips already-applied), fail-loud (a stale patch aborts
  configure so it can't be silently dropped from a release build)
- CMakeLists.txt : wired as the llama.cpp FetchContent PATCH_COMMAND

First patch, 0001-win32-arg-parse-embed-guard.patch, fixes the b9739 Windows JNI regression
from upstream #24779: common_params_parse unconditionally replaced the caller's argv with the
process command line (GetCommandLineW), so an embedded java.exe lost its --model args ->
"Failed to parse model parameters". The guard adopts the process command line only when the
re-derived arg count equals argc: true for the standalone llama-* tools (UTF-8 CLI fix
preserved), false for a JVM host (our already-UTF-8 argv kept).

Verified the patch applies cleanly to b9739 and the applier is idempotent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
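
For readers who want the count guard in isolation, here is a minimal, portable sketch of the decision it makes. It is not the llama.cpp code: `ArgView`, `select_args` and `self_check` are hypothetical names, and plain string vectors stand in for the caller's argv and for the arguments re-derived from the process command line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the argument list the parser ends up using.
struct ArgView {
    std::vector<std::string> args;
    bool from_process;  // true when the process command line was adopted
};

// Count guard: adopt the process-derived arguments only when there are
// exactly as many of them as the caller passed; otherwise keep the
// caller's arguments untouched.
ArgView select_args(const std::vector<std::string> & caller_args,
                    const std::vector<std::string> & process_args) {
    if (!process_args.empty() && process_args.size() == caller_args.size()) {
        return {process_args, true};
    }
    return {caller_args, false};
}

// Exercises the two cases named in the commit message (values are made up).
void self_check() {
    // Standalone tool: the caller's argv is the process's own argv.
    const std::vector<std::string> tool = {"llama-cli", "--model", "m.gguf"};
    assert(select_args(tool, tool).from_process);

    // Embedded caller: the process is java.exe with a different arg count.
    const std::vector<std::string> jni = {"jllama", "--model", "m.gguf"};
    const std::vector<std::string> jvm = {"java.exe", "-cp", "app.jar", "Main", "-Xmx4g"};
    const ArgView kept = select_args(jni, jvm);
    assert(!kept.from_process);
    assert(kept.args == jni);
}
```

Because the guard compares counts only, a host process whose command line happens to have the same number of arguments as the caller's argv would still be adopted under this sketch.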
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -331,9 +331,35 @@ siblings; why (and why the `DEPOT_TOKEN` org secret and the README "Build cache
 are kept jllama-only) is explained in the cross-repo status under "Deliberate non-parity":
 [`../workspace/crossrepostatus.md`](../workspace/crossrepostatus.md).
 
+## Local llama.cpp source patches (`patches/`)
+
+The fetched llama.cpp source is patched before it compiles, via a generic mechanism:
+
+- **`patches/`** (repo root) — drop any number of `*.patch` / `*.diff` files here. They are applied
+  in **filename order** (use a numeric prefix, e.g. `0001-`, `0002-`), so keep them independent or
+  ordered. Each must be a `git apply`-compatible unified diff with paths relative to the llama.cpp
+  source root (`a/common/arg.cpp` / `b/common/arg.cpp`, i.e. `-p1`).
+- **`cmake/apply-llama-patches.cmake`** — the applier. Cross-platform (`cmake -P`, so identical on
+  Linux/macOS/Windows), **idempotent** (`git apply --reverse --check` skips already-applied patches
+  so a reconfigure never double-applies) and **fail-loud** (a patch that no longer applies aborts
+  the configure — a stale patch can't be silently dropped from a release build).
+- **`CMakeLists.txt`** — wired as the llama.cpp `FetchContent_Declare(... PATCH_COMMAND ...)`, so it
+  runs for **every** C++ build (all CI jobs *and* local `cmake -B build`) from one place — no
+  per-build-step plumbing.
+
+**On a llama.cpp version bump, every patch must still apply** — if a bump shifts the patched code,
+the configure fails with a "does not apply cleanly" error; refresh the diff against the new source
+and recommit. Treat `patches/` as part of the upgrade checklist below.
+
+Current patches:
+
+| Patch | Fixes |
+|-------|-------|
+| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (b9739): `common_params_parse` unconditionally replaced the caller's argv with the process command line (`GetCommandLineW`), so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". The patch guards the override to fire **only when the re-derived arg count equals `argc`** — true for the standalone `llama-*` tools (their UTF-8 CLI fix is preserved), false for a JVM host (our already-UTF-8 argv is kept). This is also the shape to PR upstream. |
+
 ## Upgrading/Downgrading llama.cpp Version
 
-To change the llama.cpp version, update the following **three** files:
+To change the llama.cpp version, update the following **three** files (and re-verify `patches/`):
 
 1. **CMakeLists.txt** — the `GIT_TAG` line for llama.cpp: `GIT_TAG        b8831`
 2. **README.md** — the badge and link line with the version number
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -136,10 +136,18 @@ set(GGML_AVX512  OFF CACHE BOOL "" FORCE)
 set(LLAMA_BUILD_UI OFF CACHE BOOL "" FORCE)
 # b9284 flipped LLAMA_BUILD_APP default to ON; we don't build the unified binary
 set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
+# Local source patches for the fetched llama.cpp tree. Every patches/*.patch|*.diff is applied
+# (sorted, idempotently, fail-loud) by cmake/apply-llama-patches.cmake — see that file's header.
+# This runs for every C++ build (all CI jobs + local) from one place. <SOURCE_DIR> is substituted
+# by FetchContent/ExternalProject to the fetched llama.cpp source root.
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
 	GIT_TAG        b9739
+	PATCH_COMMAND  ${CMAKE_COMMAND}
+		-DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
+		-DLLAMA_SRC=<SOURCE_DIR>
+		-P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/apply-llama-patches.cmake
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/TODO.md b/TODO.md
@@ -137,8 +137,13 @@ proving Ninja Multi-Config + MSVC works on the same tree). The two builds produc
 
 ### Known regression (b9739) — Windows JNI: `common_params_parse` ignores caller argv
 
-**Status: root-caused, fix deferred.** Surfaced while bringing PR #248 green (the b9739 build fixes let
-the Windows Java jobs run to completion and exposed this).
+**Status: FIXED via local source patch (`patches/0001-win32-arg-parse-embed-guard.patch`).** Surfaced
+while bringing PR #248 green (the b9739 build fixes let the Windows Java jobs run to completion and
+exposed this). Resolved by **fix option 1 below** — the count-guard — applied through the generic
+`patches/` mechanism (see CLAUDE.md "Local llama.cpp source patches"), so it covers every C++ build
+and re-applies on each clean build. Still worth upstreaming (the guard, or a `common_params_parse_argv`
+companion) so the patch can eventually be dropped; until then it must be re-verified on each llama.cpp
+bump (the applier fails loud if it no longer applies).
 
 **Symptom.** On **Windows x86_64 only**, every Java test that loads a real model fails in
 `LlamaModel.loadModel` (native) with `LlamaException: "Failed to parse model parameters"`
diff --git a/cmake/apply-llama-patches.cmake b/cmake/apply-llama-patches.cmake
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: MIT
+#
+# apply-llama-patches.cmake — applies every patch in the repo-root `patches/` directory to the
+# llama.cpp source tree fetched by FetchContent. Wired as the llama.cpp `PATCH_COMMAND` in the
+# top-level CMakeLists.txt, so it runs for EVERY C++ build (all CI jobs + local) from one place,
+# rather than per-build-step.
+#
+# Design:
+#   * Cross-platform: invoked via `cmake -P`, so it behaves identically on Linux, macOS and
+#     Windows (the dockcross/native/MSVC jobs all call the same code path).
+#   * Every `patches/*.patch` and `patches/*.diff` is applied, sorted by filename (so a numeric
+#     prefix like 0001-, 0002- defines a deterministic order).
+#   * Idempotent: `git apply --reverse --check` detects an already-applied patch and skips it, so
+#     a CMake reconfigure over an already-patched source tree does not fail.
+#   * Fail-loud: a patch that no longer applies (e.g. after a llama.cpp version bump shifts the
+#     context) aborts the configure with a clear message, so a stale patch can never be silently
+#     dropped from a release build.
+#
+# Invoked as:
+#   cmake -DPATCH_DIR=<repo>/patches -DLLAMA_SRC=<fetched-src> -P cmake/apply-llama-patches.cmake
+
+if(NOT DEFINED PATCH_DIR OR NOT DEFINED LLAMA_SRC)
+    message(FATAL_ERROR "apply-llama-patches: both PATCH_DIR and LLAMA_SRC must be defined")
+endif()
+
+find_program(GIT_EXECUTABLE NAMES git)
+if(NOT GIT_EXECUTABLE)
+    message(FATAL_ERROR "apply-llama-patches: 'git' not found on PATH (required to apply patches)")
+endif()
+
+file(GLOB patch_files "${PATCH_DIR}/*.patch" "${PATCH_DIR}/*.diff")
+list(SORT patch_files)
+
+if(NOT patch_files)
+    message(STATUS "apply-llama-patches: no patches in ${PATCH_DIR} (nothing to apply)")
+    return()
+endif()
+
+foreach(patch IN LISTS patch_files)
+    get_filename_component(patch_name "${patch}" NAME)
+
+    # Already applied? A successful reverse-apply check means the change is present already.
+    execute_process(
+        COMMAND "${GIT_EXECUTABLE}" -C "${LLAMA_SRC}" apply --reverse --check "${patch}"
+        RESULT_VARIABLE reverse_rc
+        OUTPUT_QUIET ERROR_QUIET)
+    if(reverse_rc EQUAL 0)
+        message(STATUS "apply-llama-patches: ${patch_name} already applied — skipping")
+        continue()
+    endif()
+
+    # Not applied yet — confirm it applies cleanly before touching the tree.
+    execute_process(
+        COMMAND "${GIT_EXECUTABLE}" -C "${LLAMA_SRC}" apply --check "${patch}"
+        RESULT_VARIABLE check_rc
+        OUTPUT_QUIET ERROR_VARIABLE check_err)
+    if(NOT check_rc EQUAL 0)
+        message(FATAL_ERROR
+            "apply-llama-patches: ${patch_name} does not apply cleanly to ${LLAMA_SRC}.\n"
+            "  A llama.cpp version bump probably shifted the patched code; refresh the patch "
+            "against the new source and recommit it.\n  git apply reported: ${check_err}")
+    endif()
+
+    execute_process(
+        COMMAND "${GIT_EXECUTABLE}" -C "${LLAMA_SRC}" apply "${patch}"
+        RESULT_VARIABLE apply_rc)
+    if(NOT apply_rc EQUAL 0)
+        message(FATAL_ERROR "apply-llama-patches: failed to apply ${patch_name}")
+    endif()
+    message(STATUS "apply-llama-patches: applied ${patch_name}")
+endforeach()
diff --git a/patches/0001-win32-arg-parse-embed-guard.patch b/patches/0001-win32-arg-parse-embed-guard.patch
@@ -0,0 +1,19 @@
+diff --git a/common/arg.cpp b/common/arg.cpp
+--- a/common/arg.cpp
++++ b/common/arg.cpp
+@@ -924,7 +924,14 @@ bool common_params_parse(int argc, char ** argv, common_params & params, llama_e
+ bool common_params_parse(int argc, char ** argv, common_params & params, llama_example ex, void(*print_usage)(int, char **)) {
+ #ifdef _WIN32
+     auto utf8 = make_utf8_argv();
+-    if (!utf8.ptrs.empty()) {
++    // java-llama.cpp patch (PR #248): only adopt the process command line (GetCommandLineW) when
++    // the caller actually passed THIS process's own argv -- i.e. the re-derived argument count
++    // matches argc. For the standalone llama-* tools that is always true, so their UTF-8 CLI fix
++    // (upstream llama.cpp #24779) is preserved. For an embedded JNI caller the process is java.exe
++    // (many more args), so the counts differ and our already-UTF-8 argv (from GetStringUTFChars)
++    // is kept instead of being silently discarded -- which otherwise makes common_params_parse_ex
++    // parse java.exe's command line and fail with "Failed to parse model parameters".
++    if (!utf8.ptrs.empty() && static_cast<int>(utf8.buf.size()) == argc) {
+         argc = static_cast<int>(utf8.buf.size());
+         argv = utf8.ptrs.data();
+     }