bernardladenthin
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 4 additions & 4 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 2 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/TODO.md b/TODO.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/feature-investigation-similar-projects.md b/docs/feature-investigation-similar-projects.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
Lines changed: 6 additions & 0 deletions
diff --git a/src/main/java/net/ladenthin/llama/LlamaModel.java b/src/main/java/net/ladenthin/llama/LlamaModel.java
Lines changed: 12 additions & 18 deletions
diff --git a/src/main/java/net/ladenthin/llama/args/ModelFlag.java b/src/main/java/net/ladenthin/llama/args/ModelFlag.java
Lines changed: 11 additions & 8 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**
 
 ## Upgrading CUDA Version
 
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9803 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
   && ( cd dist && find . -type f -not -path './_gzip/*' \
        | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
   as the repo secret **`DEPOT_TOKEN`**.
 
-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9803`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -957,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9803`.
 
 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
 
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9789
+	GIT_TAG        b9803
 	PATCH_COMMAND  ${CMAKE_COMMAND}
 		-DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 		-DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
     COMMAND ${CMAKE_COMMAND}
         -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
         -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
-        -DLLAMA_TAG=b9789
+        -DLLAMA_TAG=b9803
         -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
     RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )
 
@@ -7,7 +7,7 @@
 **Build:**  
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)  
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)  
-[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)  
+[![llama.cpp b9803](https://img.shields.io/badge/llama.cpp-%23b9803-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9803)  
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)  
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)  
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)  
 
@@ -497,7 +497,7 @@ Foundation):
   `ChatRequest`).
 - **Loader** (internal, NOT exported): `loader` (LlamaLoader, OSInfo,
   ProcessRunner, NativeLibraryPermissionSetter, Java8CompatibilityHelper,
-  SkipDownloadFailureTranslator, LlamaSystemProperties).
+  OfflineModelGuard, LlamaSystemProperties).
 - **Api** (root): LlamaModel, Session, LlamaIterable, LlamaIterator.
 
 Cycle-breaking moves: `TimingsLogger` root→`json`, `ParameterJsonSerializer`
@@ -545,7 +545,7 @@ keeps `loader` internal. All 11 ArchUnit rules green; `javadoc:jar` clean.
 - **Banned-API enforcement** — Maven Enforcer (`8baae0c`), ArchUnit `System.exit` / `new Random` / `Thread.sleep` (`329d764`), `sun.*` / `com.sun.*` / `jdk.internal.*` (`e6069da`).
 - **ArchUnit public-fields-final** — `7b6667d`.
 - **LogCaptor smoke test** — `LoggingSmokeTest` (`3cedc6e`).
-- **Expose `common_params::skip_download`** — `ModelFlag.SKIP_DOWNLOAD` + `ModelParameters.setSkipDownload(boolean)` + `hasFlag` helper + new public `ModelUnavailableException` (extends now-public `LlamaException`) + Java-side heuristic translator. 7 unit tests in `LlamaModelSkipDownloadTest`. No JNI rebuild required.
+- **Offline / air-gapped model loading** — `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)` + `hasFlag` helper + public `ModelUnavailableException` (extends now-public `LlamaException`) + deterministic pre-check `OfflineModelGuard`. Unit tests in `LlamaModelOfflineTest`. No JNI rebuild required. *(Originally shipped as `SKIP_DOWNLOAD`/`setSkipDownload` over a parse-failure heuristic; reworked when llama.cpp b9803 removed `common_params::skip_download` and `common_skip_download_exception` — `--skip-download` was never a registered upstream arg, so it never actually skipped a download. `--offline` is the real upstream flag with the intended load-from-cache semantics.)*
 - **`LlamaSystemProperties` registry cleanup** — `getLibName()` deleted (`6bb63e1` upstream forensic trace); `OSInfo.getArchName()` now routes through `LlamaSystemProperties.getOsinfoArchitecture()` (`3ae6c81`).
 - **Abstract the Java and test writing guidelines to a workspace-level shared layer.** Workspace version chain at [`../workspace/guides/src/CODE_WRITING_GUIDE-8.md`](../workspace/guides/src/CODE_WRITING_GUIDE-8.md) and [`../workspace/guides/test/TEST_WRITING_GUIDE-8.md`](../workspace/guides/test/TEST_WRITING_GUIDE-8.md); canonical TDD skill at [`../workspace/.claude/skills/java-tdd-guide/SKILL.md`](../workspace/.claude/skills/java-tdd-guide/SKILL.md).
 - **Standardised CLAUDE.md template** — [`../workspace/templates/CLAUDE.md.template`](../workspace/templates/CLAUDE.md.template).
@@ -38,7 +38,7 @@ The following are confirmed present in `java-llama.cpp` as of this survey — fl
 
 | Capability | Status |
 |---|---|
-| `setSkipDownload(boolean)` + typed `ModelUnavailableException` | ✅ (commit `37754d4`) |
+| `setOffline(boolean)` (was `setSkipDownload`) + typed `ModelUnavailableException` | ✅ (commit `37754d4`) |
 | Reasoning-format toggle, reasoning-budget tokens | ✅ (`InferenceParameters#setReasoningFormat` etc.) |
 | Tool calls + custom chat templates | ✅ |
 | Speculative draft model | ✅ |
 
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9739–b9789 | `common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema &#x2192; GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
 | b9739–b9789 | `tools/server/server-context.cpp` (**patch target**, regression) | `server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
 | b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run &mdash; the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required &mdash; **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these — verified `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` &#x2192; zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java API follow-up (behavioural):** this removal exposed that the project's `ModelFlag.SKIP_DOWNLOAD` (`--skip-download`) was never a registered upstream arg — it only ever forced a parse failure that `SkipDownloadFailureTranslator` mapped to `ModelUnavailableException`, and it could never load a *present* model. It was **replaced** with the real upstream `--offline` flag: `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)`; the heuristic translator was replaced by a deterministic pre-check `OfflineModelGuard` (throws `ModelUnavailableException` when `--offline` is set and the configured local `--model` file is absent, before the native call); `LlamaModelSkipDownloadTest` → `LlamaModelOfflineTest`. `ModelUnavailableException` is retained. Pure-Java change, no JNI rebuild |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); `common_params::skip_download` field **removed**; new `LLAMA_EXAMPLE_DOWNLOAD` enumerator appended before `LLAMA_EXAMPLE_COUNT`. None surfaced by `ModelParameters`; consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral** — beneficial (less build time); the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New feature** — EAGLE-3 speculative decoding (`--spec-type draft-eagle3`): a small one-layer draft transformer that reads the target model's hidden states for higher acceptance; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. Could later feed an inference-parameter setter (`--spec-type`). No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; CUDA set only affects the `cuda13-linux-x86-64` classifier, OpenCL only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803 via `git apply --check` over the actual b9803 sources fetched from `raw.githubusercontent.com` (github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run — exit 0 for both patches). Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 arg.cpp churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (upstream `tts.cpp` unchanged in this range apart from patch 0001's main()-only parse flip). Full build + `ctest` to be confirmed by the CI pipeline |
@@ -25,7 +25,7 @@
 import net.ladenthin.llama.json.CompletionResponseParser;
 import net.ladenthin.llama.json.RerankResponseParser;
 import net.ladenthin.llama.loader.LlamaLoader;
-import net.ladenthin.llama.loader.SkipDownloadFailureTranslator;
+import net.ladenthin.llama.loader.OfflineModelGuard;
 import net.ladenthin.llama.parameters.ChatRequest;
 import net.ladenthin.llama.parameters.InferenceParameters;
 import net.ladenthin.llama.parameters.ModelParameters;
@@ -85,21 +85,18 @@ public class LlamaModel implements AutoCloseable {
      * </ul>
      *
      * @param parameters the set of options
-     * @throws net.ladenthin.llama.exception.ModelUnavailableException if {@link net.ladenthin.llama.parameters.ModelParameters#setSkipDownload(boolean)
-     *                                   setSkipDownload(true)} (or
-     *                                   {@link net.ladenthin.llama.args.ModelFlag#SKIP_DOWNLOAD})
-     *                                   is set and the configured model file is missing or invalid
+     * @throws net.ladenthin.llama.exception.ModelUnavailableException if {@link net.ladenthin.llama.parameters.ModelParameters#setOffline(boolean)
+     *                                   setOffline(true)} (or
+     *                                   {@link net.ladenthin.llama.args.ModelFlag#OFFLINE})
+     *                                   is set and the configured local model file does not exist
      * @throws net.ladenthin.llama.exception.LlamaException            for any other load failure
      */
     // loadModel is a native method; it does not call back into Java with this,
     // so the @UnderInitialization receiver warning is a CF false positive.
     @SuppressWarnings("method.invocation")
     public LlamaModel(ModelParameters parameters) {
-        try {
-            loadModel(parameters.toArray());
-        } catch (LlamaException e) {
-            throw SkipDownloadFailureTranslator.translate(parameters, e);
-        }
+        OfflineModelGuard.check(parameters);
+        loadModel(parameters.toArray());
     }
 
     /**
@@ -117,14 +114,11 @@ public LlamaModel(ModelParameters parameters) {
     // false positive.
     @SuppressWarnings("method.invocation")
     public LlamaModel(ModelParameters parameters, LoadProgressCallback progress) {
-        try {
-            if (progress == null) {
-                loadModel(parameters.toArray());
-            } else {
-                loadModelWithProgress(parameters.toArray(), progress);
-            }
-        } catch (LlamaException e) {
-            throw SkipDownloadFailureTranslator.translate(parameters, e);
+        OfflineModelGuard.check(parameters);
+        if (progress == null) {
+            loadModel(parameters.toArray());
+        } else {
+            loadModelWithProgress(parameters.toArray(), progress);
         }
     }
 
 
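Reviewer note: the `LlamaModel.java` hunks above replace catch-and-translate error handling with a pre-check that runs before the native `loadModel` call. The body of `OfflineModelGuard` is not part of this diff, so the following is only a minimal sketch of how such a deterministic pre-check could look. `OfflineGuardSketch`, `ModelUnavailableSketchException`, and the assumption that the arguments arrive as a flat list of CLI tokens are all hypothetical stand-ins, not the project's actual API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical stand-in for the project's offline pre-check (not the real OfflineModelGuard). */
final class OfflineGuardSketch {

    /** Hypothetical stand-in for the project's ModelUnavailableException. */
    public static final class ModelUnavailableSketchException extends RuntimeException {
        public ModelUnavailableSketchException(String message) {
            super(message);
        }
    }

    private OfflineGuardSketch() {
        // static helper, not instantiable
    }

    /**
     * Throws when "--offline" is present and the path following "--model" does not exist.
     * Returns silently when offline mode is not requested or no local model path is given.
     * Assumption: arguments are a flat list of CLI tokens such as ["--offline", "--model", "x.gguf"].
     */
    public static void check(List<String> cliArgs) {
        if (!cliArgs.contains("--offline")) {
            return; // online mode: a missing file may still be downloaded
        }
        int modelIndex = cliArgs.indexOf("--model");
        if (modelIndex < 0 || modelIndex + 1 >= cliArgs.size()) {
            return; // no local path configured; leave the decision to the native loader
        }
        Path modelPath = Paths.get(cliArgs.get(modelIndex + 1));
        if (!Files.exists(modelPath)) {
            throw new ModelUnavailableSketchException(
                    "offline mode: model file not found: " + modelPath);
        }
    }
}
```

Unlike the removed parse-failure heuristic, a check of this shape depends only on the arguments and the file system, so it can fail fast with a typed exception before any native code runs, and it does not interfere with loading a model that is already present.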
@@ -116,16 +116,19 @@ public enum ModelFlag {
     NO_MMPROJ_OFFLOAD("--no-mmproj-offload"),
 
     /**
-     * Skip any model file download — only validation is performed. Useful for air-gapped or
-     * pre-staged-model deployments where any outbound network call is a failure mode.
+     * Run fully offline — never make an outbound network request to download a model. Useful for
+     * air-gapped or pre-staged-model deployments where any outbound call is itself a failure mode.
      *
-     * <p>When this flag is set and the configured model file is missing or invalid (e.g. ETag
-     * mismatch), upstream throws {@code common_skip_download_exception} during arg parsing,
-     * which is caught inside {@code common_params_parse_ex} and surfaces as a {@code false}
-     * return; the Java layer translates that combined signal into a typed
-     * {@link net.ladenthin.llama.exception.ModelUnavailableException}.</p>
+     * <p>Maps to the upstream {@code --offline} flag ({@code common_params::offline}), which the
+     * model-download pipeline honors by skipping all download tasks: a model already present on
+     * disk (or in the Hugging Face cache) loads normally, while a missing one fails instead of
+     * being fetched. When a local model path is configured via
+     * {@link net.ladenthin.llama.parameters.ModelParameters#setModel(String)} and that file does
+     * not exist, the loader reports a typed
+     * {@link net.ladenthin.llama.exception.ModelUnavailableException} so callers can distinguish an
+     * air-gapped miss from a genuine misconfiguration.</p>
      */
-    SKIP_DOWNLOAD("--skip-download");
+    OFFLINE("--offline");
 
     private final String cliFlag;