Make patch 0001 the complete upstream #24779 fix (core + flips + test)

claude · claude · commit aecdde0eccd6 · 2026-06-25T16:44:45.000Z
Expand patches/0001-win32-arg-parse-embed-guard.patch from the arg.cpp/arg.h core into the full, submittable upstream change so it can be sent to llama.cpp verbatim and then dropped here: - common/arg.cpp + common/arg.h: common_params_parse() parses exactly the argv it is given; new common_params_parse_main() wrapper carries the Windows GetCommandLineW UTF-8 recovery (#24779) for the standalone tools. - ~34 standalone main() call sites across tools/*, examples/* and the tests/* programs flip common_params_parse(argc, argv, ...) -> common_params_parse_main(argc, argv, ...). - tests/test-arg-parser.cpp: regression case asserting common_params_parse honors a caller-supplied argv (the embedded/JNI contract). Our build compiles llama.cpp as a subproject (LLAMA_BUILD_TOOLS/TESTS OFF), so only the arg.{cpp,h} core is compiled here -- the flips + test are applied but not built in normal CI, and our embedded path (jllama.cpp -> common_params_parse) is behaviorally identical to before. Validated the flips + test via a one-off -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON build: the new test compiles and its asserts pass, and a flipped program (test-thread-safety) builds; test-arg-parser's only failure is its live ggml.ai download assertion (sandbox network, not the patch). Full patch applies + reverse-applies cleanly against b9789 (37 files). Docs updated (CLAUDE.md patches table, breaking-changes row, TODO.md). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -382,7 +382,7 @@ Current patches:
 
 | Patch | Fixes |
 |-------|-------|
-| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): on Windows `common_params_parse` re-derived argv from the **process** command line (`GetCommandLineW`) and adopted it, so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". b9789 narrowed the unconditional override to a **count-guard** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), but that is exactly the variant the project already found breaks its Windows server-integration tests (when the embedded argv length coincides with `java.exe`'s). The patch implements the clean **opt-in** fix intended for upstreaming: `common_params_parse` now parses **exactly the argv it is given** (no `GetCommandLineW` magic), and a new `common_params_parse_main()` wrapper carries the UTF-8 recovery for the standalone tools' `main()`. The embedded caller (`jllama.cpp`) calls `common_params_parse` directly and is never overridden. The local patch deliberately does **not** flip the ~20 standalone `main()` call sites (we don't ship those tools, and a 20-file patch would be bump-fragile) — the full upstream PR adds those flips so the standalone tools keep their UTF-8 fix, after which this local patch can be dropped. Re-verify on each llama.cpp bump (the applier fails loud). |
+| `0001-win32-arg-parse-embed-guard.patch` | Windows JNI regression from llama.cpp **#24779** (introduced b9739): on Windows `common_params_parse` re-derived argv from the **process** command line (`GetCommandLineW`) and adopted it, so an embedded/JNI caller (`java.exe`) lost its `--model …` args → "Failed to parse model parameters". b9789 narrowed the unconditional override to a **count-guard** (`if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }`), but that is exactly the variant the project already found breaks its Windows server-integration tests (when the embedded argv length coincides with `java.exe`'s). The patch carries the **complete upstream change** (so it can be submitted to llama.cpp verbatim and then dropped here): **(1)** `common_params_parse` parses **exactly the argv it is given** (no `GetCommandLineW` magic) and a new `common_params_parse_main()` wrapper holds the UTF-8 recovery for the standalone tools' `main()` (`common/arg.{cpp,h}`); **(2)** the **~34 standalone `main()` call sites** (every `common_params_parse(argc, argv, …)` across `tools/*`, `examples/*` and the `tests/*` programs) flip to `common_params_parse_main()`; **(3)** a `tests/test-arg-parser.cpp` regression case pins that `common_params_parse` honors a caller-supplied argv. The embedded caller (`jllama.cpp`) keeps calling `common_params_parse` and is never overridden. **Our subproject build compiles only the `arg.{cpp,h}` core** — `LLAMA_BUILD_TOOLS`/`LLAMA_BUILD_TESTS` are OFF for a FetchContent subproject — so the flips + test are applied-but-not-compiled here; they were validated via a one-off `-DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON` build (the new test compiles and its asserts pass; `test-arg-parser`'s only red there is the live `ggml.ai` download check, which is sandbox-network, not the patch). Because it spans **37 files** it must be refreshed on every llama.cpp bump (the applier fails loud). |
 | `0002-server-preserve-caller-load-progress-callback.patch` | Load-progress-callback regression introduced in llama.cpp **b9789**: `server_context::load_model` (`tools/server/server-context.cpp`) now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params`, clobbering any callback the embedding caller already set. libjllama's `LoadProgressCallback` feature wires `common_params.load_progress_callback` to a JNI trampoline *before* calling `load_model`, so the bump silently killed it — `LoadProgressCallbackTest` saw zero progress updates and the abort-on-`false` path never threw. The patch guards the assignment with `if (params_base.load_progress_callback == nullptr)`, so the server installs its own reporter **only when the caller hasn't** — a caller-supplied callback survives and fires during load. Standalone `llama-server` (no caller callback, so the field is null) is unaffected. Same JNI-vs-standalone divergence class as `0001`. |
 
 ## OuteTTS build-time extraction (`cmake/generate-tts-upstream.cmake`)
diff --git a/TODO.md b/TODO.md
@@ -242,10 +242,14 @@ process, so the override is pure liability), which is deterministic. **As of the
 was reshaped into the clean opt-in form intended for upstreaming (fix option 3's core):**
 `common_params_parse` now parses exactly the argv it is given, and a new `common_params_parse_main()`
 wrapper carries the `GetCommandLineW` UTF-8 recovery that the standalone tools' `main()` opt into.
-Embedded callers stay on `common_params_parse` and are never overridden. The local patch leaves the
-~20 standalone `main()` call-site flips to the upstream PR (we don't ship those tools and a 20-file
-patch would be bump-fragile); once that PR lands the local patch can be dropped. Until then it must be
-re-verified on each llama.cpp bump (the applier fails loud if it no longer applies).
+**The patch now carries the full upstream change (37 files):** the ~34 `common_params_parse(argc, argv,
+…)` call sites across `tools/*`, `examples/*` and the `tests/*` programs flip to
+`common_params_parse_main()`, plus a `tests/test-arg-parser.cpp` regression case. Embedded callers stay
+on `common_params_parse`. Our subproject build compiles only the `arg.{cpp,h}` core
+(`LLAMA_BUILD_TOOLS`/`TESTS` OFF), so the flips + test are validated via a one-off tools+tests build
+(the new test's asserts pass; `test-arg-parser`'s only red is the live `ggml.ai` download check, which
+is sandbox-network). The 37-file patch must be re-verified on each llama.cpp bump (the applier fails
+loud). Submit it to llama.cpp and drop the local copy once merged.
 
 **Symptom.** On **Windows x86_64 only**, every Java test that loads a real model fails in
 `LlamaModel.loadModel` (native) with `LlamaException: "Failed to parse model parameters"`
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
@@ -379,7 +379,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9739–b9789 | `include/llama.h` | **New API** `llama_model_n_layer_nextn(const llama_model *)` &mdash; returns the number of NextN/MTP layers (additive; the surrounding accessor block was otherwise only column-realigned). Not called by project; could back a future introspection accessor. No project source changes required |
 | b9739–b9789 | `common/common.h` | `common_params::checkpoint_min_step` default raised `256` &#x2192; `8192` (minimum spacing between context checkpoints). Tuning default consumed inside upstream-compiled `server-context.cpp`; not surfaced by `ModelParameters`. No project source changes required |
 | b9739–b9789 | `common/arg.h` + `common/arg.cpp` + `common/download.h` | `common_params_handle_models()` gained a 3rd parameter &mdash; a `common_params_handle_models_params` struct (`{ common_download_callback*, bool preset_only }`) for router-mode preset-only downloads; `arg.h` now `#include`s `download.h`; new `common_download_opts::preset_only`. Project does not call `common_params_handle_models()` directly (arg parsing happens upstream) &mdash; `grep -rn common_params_handle_models src/` &#x2192; zero matches. No project source changes required |
-| b9739–b9789 | `common/arg.cpp` + `common/arg.h` (**patch target**) | Upstream's Windows `common_params_parse` argv handling changed again: the unconditional `argc/argv = make_utf8_argv()` override (the original #24779 regression) became a **count-guard** `if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }` — exactly the variant the project already found breaks its Windows server-integration tests (embedded argv length coincides with `java.exe`'s). **`patches/0001-win32-arg-parse-embed-guard.patch` was reshaped into the clean opt-in form intended for upstreaming:** `common_params_parse` now parses exactly the argv it is given, and a new `common_params_parse_main()` wrapper holds the `GetCommandLineW` UTF-8 recovery for standalone tools' `main()`. The embedded JNI caller (which already calls `common_params_parse` directly) is respected by default. Re-verified to apply + reverse-apply cleanly against b9789, compile clean, ctest 454/454. The ~20 standalone `main()` call-site flips are left to the upstream PR, not the local patch |
+| b9739–b9789 | `common/arg.cpp` + `common/arg.h` + ~34 `tools/*`,`examples/*`,`tests/*` mains + `tests/test-arg-parser.cpp` (**patch target**) | Upstream's Windows `common_params_parse` argv handling changed again: the unconditional `argc/argv = make_utf8_argv()` override (the original #24779 regression) became a **count-guard** `if (static_cast<int>(utf8.buf.size()) == argc) { argv = utf8.ptrs.data(); }` — exactly the variant the project already found breaks its Windows server-integration tests (embedded argv length coincides with `java.exe`'s). **`patches/0001-win32-arg-parse-embed-guard.patch` now carries the complete upstream fix (37 files):** `common_params_parse` parses exactly the argv it is given; a new `common_params_parse_main()` wrapper holds the `GetCommandLineW` recovery; the ~34 standalone `main()` call sites flip to it; and a `tests/test-arg-parser.cpp` case pins the contract. The embedded JNI caller stays on `common_params_parse` and is respected. Our subproject build compiles only the `arg.{cpp,h}` core (`LLAMA_BUILD_TOOLS`/`TESTS` OFF, so our 454-test suite is unchanged); the flips + test were validated via a one-off tools+tests build (new test passes; `test-arg-parser`'s only failure is the live `ggml.ai` download check — sandbox network). 37-file patch — refresh on every bump |
 | b9739–b9789 | `tools/mtmd/mtmd.h` + `tools/mtmd/clip.h` + `clip.cpp` + `mtmd.cpp` | **New feature** &mdash; multimodal model-load progress: new `mtmd_progress_callback` typedef + `progress_callback` / `progress_callback_user_data` fields on `mtmd_context_params` and `clip_context_params` (additive, appended to the structs; returning `false` aborts the load). Project does not aggregate-init either struct (`grep -rn mtmd_context_params src/` &#x2192; zero matches) so the new fields are harmless; could later feed a Java `LoadProgressCallback` for vision models. No project source changes required |
 | b9739–b9789 | `tools/server/server-models.{h,cpp}` + `server-context.h` | Multi-model router refactor: model downloading moved into a dedicated child-process mode (`enum server_child_mode`, `server_models::load(name, load_options)`, `server_child::run_download()`; old `server_models::download()` removed); `SERVER_STATE_DOWNLOADING` re-enabled in `server_state`. Project links `server-models.cpp` but does not drive the router (`grep -rn "server_models\|SERVER_CHILD_MODE" src/` &#x2192; zero matches). Compiles into `jllama` unchanged. No project source changes required |
 | b9739–b9789 | `ggml/src/ggml-{hexagon,vulkan,sycl,opencl,webgpu,cuda}/` + shaders | Backend-internal work only: Hexagon HTP matmul kernels re-tiled (`hmx-matmul-ops.c` &#x2192; `hmx-mm-kernels-tiled.h`); Vulkan gains a `conv3d_mm` shader + `get_rows_back` and folds the elementwise unary shaders (`clamp`/`cos`/`sin`/`sqrt`/`square`/`leaky_relu`.comp removed) into `unary.comp`; SYCL element-wise / conv3d additions; OpenCL Adreno norm/gemv tweaks; WebGPU `mul_mat_vec` refactor. No API surface visible to `jllama.cpp`; the OpenCL set only affects the `opencl-android-aarch64` classifier. No project source changes required |
diff --git a/patches/0001-win32-arg-parse-embed-guard.patch b/patches/0001-win32-arg-parse-embed-guard.patch