Commit 964086a

Merge pull request #272 from bernardladenthin/claude/affectionate-hamilton-xnatjf
Replace --skip-download with --offline flag (llama.cpp b9803)
2 parents 212634e + 5276469 commit 964086a

16 files changed

Lines changed: 245 additions & 236 deletions

CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**
 
 ## Upgrading CUDA Version

@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9789 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9803 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
 && ( cd dist && find . -type f -not -path './_gzip/*' \
 | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -275,7 +275,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
 as the repo secret **`DEPOT_TOKEN`**.
 
-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9789`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9803`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -957,7 +957,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9789`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9803`.
 
 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -143,7 +143,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9789
+GIT_TAG b9803
 PATCH_COMMAND ${CMAKE_COMMAND}
 -DPATCH_DIR=${CMAKE_CURRENT_SOURCE_DIR}/patches
 -DLLAMA_SRC=<SOURCE_DIR>
@@ -166,7 +166,7 @@ execute_process(
 COMMAND ${CMAKE_COMMAND}
 -DTTS_SRC=${llama.cpp_SOURCE_DIR}/tools/tts/tts.cpp
 -DOUT_CPP=${JLLAMA_TTS_GEN_CPP}
--DLLAMA_TAG=b9789
+-DLLAMA_TAG=b9803
 -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/generate-tts-upstream.cmake
 RESULT_VARIABLE JLLAMA_TTS_GEN_RESULT
 )

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9789](https://img.shields.io/badge/llama.cpp-%23b9789-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9789)
+[![llama.cpp b9803](https://img.shields.io/badge/llama.cpp-%23b9803-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9803)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

TODO.md

Lines changed: 2 additions & 2 deletions
@@ -497,7 +497,7 @@ Foundation):
 `ChatRequest`).
 - **Loader** (internal, NOT exported): `loader` (LlamaLoader, OSInfo,
 ProcessRunner, NativeLibraryPermissionSetter, Java8CompatibilityHelper,
-SkipDownloadFailureTranslator, LlamaSystemProperties).
+OfflineModelGuard, LlamaSystemProperties).
 - **Api** (root): LlamaModel, Session, LlamaIterable, LlamaIterator.
 
 Cycle-breaking moves: `TimingsLogger` root→`json`, `ParameterJsonSerializer`
@@ -545,7 +545,7 @@ keeps `loader` internal. All 11 ArchUnit rules green; `javadoc:jar` clean.
 - **Banned-API enforcement** — Maven Enforcer (`8baae0c`), ArchUnit `System.exit` / `new Random` / `Thread.sleep` (`329d764`), `sun.*` / `com.sun.*` / `jdk.internal.*` (`e6069da`).
 - **ArchUnit public-fields-final** — `7b6667d`.
 - **LogCaptor smoke test** — `LoggingSmokeTest` (`3cedc6e`).
-- **Expose `common_params::skip_download`** — `ModelFlag.SKIP_DOWNLOAD` + `ModelParameters.setSkipDownload(boolean)` + `hasFlag` helper + new public `ModelUnavailableException` (extends now-public `LlamaException`) + Java-side heuristic translator. 7 unit tests in `LlamaModelSkipDownloadTest`. No JNI rebuild required.
+- **Offline / air-gapped model loading** — `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)` + `hasFlag` helper + public `ModelUnavailableException` (extends now-public `LlamaException`) + deterministic pre-check `OfflineModelGuard`. Unit tests in `LlamaModelOfflineTest`. No JNI rebuild required. *(Originally shipped as `SKIP_DOWNLOAD`/`setSkipDownload` over a parse-failure heuristic; reworked when llama.cpp b9803 removed `common_params::skip_download` and `common_skip_download_exception` — `--skip-download` was never a registered upstream arg, so it never actually skipped a download. `--offline` is the real upstream flag with the intended load-from-cache semantics.)*
 - **`LlamaSystemProperties` registry cleanup** — `getLibName()` deleted (`6bb63e1` upstream forensic trace); `OSInfo.getArchName()` now routes through `LlamaSystemProperties.getOsinfoArchitecture()` (`3ae6c81`).
 - **Abstract the Java and test writing guidelines to a workspace-level shared layer.** Workspace version chain at [`../workspace/guides/src/CODE_WRITING_GUIDE-8.md`](../workspace/guides/src/CODE_WRITING_GUIDE-8.md) and [`../workspace/guides/test/TEST_WRITING_GUIDE-8.md`](../workspace/guides/test/TEST_WRITING_GUIDE-8.md); canonical TDD skill at [`../workspace/.claude/skills/java-tdd-guide/SKILL.md`](../workspace/.claude/skills/java-tdd-guide/SKILL.md).
 - **Standardised CLAUDE.md template** — [`../workspace/templates/CLAUDE.md.template`](../workspace/templates/CLAUDE.md.template).

docs/feature-investigation-similar-projects.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ The following are confirmed present in `java-llama.cpp` as of this survey — fl
 
 | Capability | Status |
 |---|---|
-| `setSkipDownload(boolean)` + typed `ModelUnavailableException` | ✅ (commit `37754d4`) |
+| `setOffline(boolean)` (was `setSkipDownload`) + typed `ModelUnavailableException` | ✅ (commit `37754d4`) |
 | Reasoning-format toggle, reasoning-budget tokens | ✅ (`InferenceParameters#setReasoningFormat` etc.) |
 | Tool calls + custom chat templates ||
 | Speculative draft model ||

docs/history/llama-cpp-breaking-changes.md

Lines changed: 6 additions & 0 deletions
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9739–b9789 | `common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema &#x2192; GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
 | b9739–b9789 | `tools/server/server-context.cpp` (**patch target**, regression) | `server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
 | b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run &mdash; the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required &mdash; **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these — verified `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` &#x2192; zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java API follow-up (behavioural):** this removal exposed that the project's `ModelFlag.SKIP_DOWNLOAD` (`--skip-download`) was never a registered upstream arg — it only ever forced a parse failure that `SkipDownloadFailureTranslator` mapped to `ModelUnavailableException`, and it could never load a *present* model. It was **replaced** with the real upstream `--offline` flag: `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)`; the heuristic translator was replaced by a deterministic pre-check `OfflineModelGuard` (throws `ModelUnavailableException` when `--offline` is set and the configured local `--model` file is absent, before the native call); `LlamaModelSkipDownloadTest` → `LlamaModelOfflineTest`. `ModelUnavailableException` is retained. Pure-Java change, no JNI rebuild |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); `common_params::skip_download` field **removed**; new `LLAMA_EXAMPLE_DOWNLOAD` enumerator appended before `LLAMA_EXAMPLE_COUNT`. None surfaced by `ModelParameters`; consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral** — beneficial (less build time); the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New feature** — EAGLE-3 speculative decoding (`--spec-type draft-eagle3`): a small one-layer draft transformer that reads the target model's hidden states for higher acceptance; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. Could later feed an inference-parameter setter (`--spec-type`). No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; CUDA set only affects the `cuda13-linux-x86-64` classifier, OpenCL only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803 via `git apply --check` over the actual b9803 sources fetched from `raw.githubusercontent.com` (github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run — exit 0 for both patches). Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 arg.cpp churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which don't overlap the patched hunks. OuteTTS generator anchors hold (upstream `tts.cpp` unchanged in this range apart from patch 0001's main()-only parse flip). Full build + `ctest` to be confirmed by the CI pipeline |

src/main/java/net/ladenthin/llama/LlamaModel.java

Lines changed: 12 additions & 18 deletions
@@ -25,7 +25,7 @@
 import net.ladenthin.llama.json.CompletionResponseParser;
 import net.ladenthin.llama.json.RerankResponseParser;
 import net.ladenthin.llama.loader.LlamaLoader;
-import net.ladenthin.llama.loader.SkipDownloadFailureTranslator;
+import net.ladenthin.llama.loader.OfflineModelGuard;
 import net.ladenthin.llama.parameters.ChatRequest;
 import net.ladenthin.llama.parameters.InferenceParameters;
 import net.ladenthin.llama.parameters.ModelParameters;
@@ -85,21 +85,18 @@ public class LlamaModel implements AutoCloseable {
      * </ul>
      *
      * @param parameters the set of options
-     * @throws net.ladenthin.llama.exception.ModelUnavailableException if {@link net.ladenthin.llama.parameters.ModelParameters#setSkipDownload(boolean)
-     * setSkipDownload(true)} (or
-     * {@link net.ladenthin.llama.args.ModelFlag#SKIP_DOWNLOAD})
-     * is set and the configured model file is missing or invalid
+     * @throws net.ladenthin.llama.exception.ModelUnavailableException if {@link net.ladenthin.llama.parameters.ModelParameters#setOffline(boolean)
+     * setOffline(true)} (or
+     * {@link net.ladenthin.llama.args.ModelFlag#OFFLINE})
+     * is set and the configured local model file does not exist
      * @throws net.ladenthin.llama.exception.LlamaException for any other load failure
      */
     // loadModel is a native method; it does not call back into Java with this,
     // so the @UnderInitialization receiver warning is a CF false positive.
     @SuppressWarnings("method.invocation")
     public LlamaModel(ModelParameters parameters) {
-        try {
-            loadModel(parameters.toArray());
-        } catch (LlamaException e) {
-            throw SkipDownloadFailureTranslator.translate(parameters, e);
-        }
+        OfflineModelGuard.check(parameters);
+        loadModel(parameters.toArray());
     }
 
     /**
@@ -117,14 +114,11 @@ public LlamaModel(ModelParameters parameters) {
     // false positive.
     @SuppressWarnings("method.invocation")
     public LlamaModel(ModelParameters parameters, LoadProgressCallback progress) {
-        try {
-            if (progress == null) {
-                loadModel(parameters.toArray());
-            } else {
-                loadModelWithProgress(parameters.toArray(), progress);
-            }
-        } catch (LlamaException e) {
-            throw SkipDownloadFailureTranslator.translate(parameters, e);
+        OfflineModelGuard.check(parameters);
+        if (progress == null) {
+            loadModel(parameters.toArray());
+        } else {
+            loadModelWithProgress(parameters.toArray(), progress);
         }
     }
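
The body of `OfflineModelGuard` is not part of this excerpt. The following is only a minimal sketch of the pre-check described in the changelog row above (throw when `--offline` is set and the configured local `--model` file is absent, before any native call). The class name `OfflineGuardSketch`, the nested exception, and the choice to inspect a flat argument list instead of a `ModelParameters` object are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OfflineGuardSketch {

    /** Hypothetical stand-in for the project's ModelUnavailableException. */
    static class ModelUnavailableException extends RuntimeException {
        ModelUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Throws ModelUnavailableException when "--offline" is present and the
     * path given after "--model" does not exist. Does nothing otherwise.
     */
    static void check(List<String> args) {
        if (!args.contains("--offline")) {
            return; // online mode: nothing to pre-check in this sketch
        }
        int modelIndex = args.indexOf("--model");
        if (modelIndex < 0 || modelIndex + 1 >= args.size()) {
            return; // no local model path configured: leave the decision to the loader
        }
        Path model = Paths.get(args.get(modelIndex + 1));
        if (!Files.exists(model)) {
            throw new ModelUnavailableException(
                    "offline mode is set and the model file does not exist: " + model);
        }
    }

    public static void main(String[] args) {
        try {
            check(Arrays.asList("--offline", "--model", "/nonexistent/dir/model.gguf"));
            System.out.println("pre-check passed");
        } catch (ModelUnavailableException e) {
            System.out.println("unavailable: " + e.getMessage());
        }
    }
}
```

Because the check runs before the load and depends only on the flag and the file system, its outcome does not rely on interpreting a native parse failure after the fact, which is the property the changelog calls deterministic.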

src/main/java/net/ladenthin/llama/args/ModelFlag.java

Lines changed: 11 additions & 8 deletions
@@ -116,16 +116,19 @@ public enum ModelFlag {
     NO_MMPROJ_OFFLOAD("--no-mmproj-offload"),
 
     /**
-     * Skip any model file download — only validation is performed. Useful for air-gapped or
-     * pre-staged-model deployments where any outbound network call is a failure mode.
+     * Run fully offline — never make an outbound network request to download a model. Useful for
+     * air-gapped or pre-staged-model deployments where any outbound call is itself a failure mode.
      *
-     * <p>When this flag is set and the configured model file is missing or invalid (e.g. ETag
-     * mismatch), upstream throws {@code common_skip_download_exception} during arg parsing,
-     * which is caught inside {@code common_params_parse_ex} and surfaces as a {@code false}
-     * return; the Java layer translates that combined signal into a typed
-     * {@link net.ladenthin.llama.exception.ModelUnavailableException}.</p>
+     * <p>Maps to the upstream {@code --offline} flag ({@code common_params::offline}), which the
+     * model-download pipeline honors by skipping all download tasks: a model already present on
+     * disk (or in the Hugging Face cache) loads normally, while a missing one fails instead of
+     * being fetched. When a local model path is configured via
+     * {@link net.ladenthin.llama.parameters.ModelParameters#setModel(String)} and that file does
+     * not exist, the loader reports a typed
+     * {@link net.ladenthin.llama.exception.ModelUnavailableException} so callers can distinguish an
+     * air-gapped miss from a genuine misconfiguration.</p>
      */
-    SKIP_DOWNLOAD("--skip-download");
+    OFFLINE("--offline");
 
     private final String cliFlag;
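
The Javadoc above says the typed exception lets callers tell an air-gapped miss apart from a misconfiguration. A caller-side sketch of that distinction follows. Both exception classes and the `load` stub are hypothetical stand-ins declared locally so the example compiles without the library; the only detail taken from the diff is that `ModelUnavailableException` extends `LlamaException`. The stub does not model downloads.

```java
public class OfflineCallerSketch {

    /** Hypothetical stand-in for the project's LlamaException. */
    static class LlamaException extends RuntimeException {
        LlamaException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in; the diff states that it extends LlamaException. */
    static class ModelUnavailableException extends LlamaException {
        ModelUnavailableException(String message) {
            super(message);
        }
    }

    /** Hypothetical stub standing in for constructing a LlamaModel. */
    static void load(boolean offline, boolean modelPresent, boolean argsValid) {
        if (offline && !modelPresent) {
            throw new ModelUnavailableException("offline and model file absent");
        }
        if (!argsValid) {
            throw new LlamaException("invalid arguments");
        }
    }

    /** Catches the subclass first so the two failure kinds get different handling. */
    static String describe(boolean offline, boolean modelPresent, boolean argsValid) {
        try {
            load(offline, modelPresent, argsValid);
            return "loaded";
        } catch (ModelUnavailableException e) {
            return "stage the model file";
        } catch (LlamaException e) {
            return "fix the configuration";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(true, false, true)); // prints "stage the model file"
        System.out.println(describe(true, true, false)); // prints "fix the configuration"
        System.out.println(describe(true, true, true));  // prints "loaded"
    }
}
```

The order of the `catch` clauses matters: Java requires the more specific `ModelUnavailableException` handler to come before the `LlamaException` handler, otherwise the code does not compile.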
