CLAUDE.md: 4 additions, 4 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
-Current llama.cpp pinned version: **b9789**
+Current llama.cpp pinned version: **b9803**
## Upgrading CUDA Version
@@ -241,7 +241,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
ships no UI):
```bash
# needs node/npm + network; embed.cpp is plain C++17 (no npm)
-- **Expose `common_params::skip_download`** — `ModelFlag.SKIP_DOWNLOAD` + `ModelParameters.setSkipDownload(boolean)` + `hasFlag` helper + new public `ModelUnavailableException` (extends now-public `LlamaException`) + Java-side heuristic translator. 7 unit tests in `LlamaModelSkipDownloadTest`. No JNI rebuild required.
+- **Offline / air-gapped model loading** — `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)` + `hasFlag` helper + public `ModelUnavailableException` (extends now-public `LlamaException`) + deterministic pre-check `OfflineModelGuard`. Unit tests in `LlamaModelOfflineTest`. No JNI rebuild required. *(Originally shipped as `SKIP_DOWNLOAD`/`setSkipDownload` over a parse-failure heuristic; reworked when llama.cpp b9803 removed `common_params::skip_download` and `common_skip_download_exception` — `--skip-download` was never a registered upstream arg, so it never actually skipped a download. `--offline` is the real upstream flag with the intended load-from-cache semantics.)*
- **`LlamaSystemProperties` registry cleanup** — `getLibName()` deleted (`6bb63e1` upstream forensic trace); `OSInfo.getArchName()` now routes through `LlamaSystemProperties.getOsinfoArchitecture()` (`3ae6c81`).
- **Abstract the Java and test writing guidelines to a workspace-level shared layer.** Workspace version chain at [`../workspace/guides/src/CODE_WRITING_GUIDE-8.md`](../workspace/guides/src/CODE_WRITING_GUIDE-8.md) and [`../workspace/guides/test/TEST_WRITING_GUIDE-8.md`](../workspace/guides/test/TEST_WRITING_GUIDE-8.md); canonical TDD skill at [`../workspace/.claude/skills/java-tdd-guide/SKILL.md`](../workspace/.claude/skills/java-tdd-guide/SKILL.md).
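The deterministic pre-check that replaces the old heuristic translator (`OfflineModelGuard`, which per the b9789–b9803 log entry throws `ModelUnavailableException` when `--offline` is set and the configured local `--model` file is absent, before the native call) might look roughly like the sketch below. It is a hypothetical, self-contained illustration, not the project's code: it assumes the parameters are visible as a flat CLI-style argument list such as `[--model, path, --offline]`, and the class name `OfflineGuardSketch`, the `hasFlag`/`valueOf`/`check` helpers and the stand-in `ModelUnavailableException` (a plain `RuntimeException` here) are all illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of an offline pre-check; not the project's OfflineModelGuard. */
public final class OfflineGuardSketch {

    /** Hypothetical stand-in for the project's public ModelUnavailableException. */
    static final class ModelUnavailableException extends RuntimeException {
        ModelUnavailableException(String message) {
            super(message);
        }
    }

    /** Returns true if the bare flag appears anywhere in the CLI-style argument list. */
    static boolean hasFlag(List<String> args, String flag) {
        return args.contains(flag);
    }

    /** Returns the value that follows an option such as "--model", or null if there is none. */
    static String valueOf(List<String> args, String option) {
        int i = args.indexOf(option);
        return (i >= 0 && i + 1 < args.size()) ? args.get(i + 1) : null;
    }

    /**
     * Fails fast, before any native call, when offline mode is requested
     * but the configured local model file is not present on disk.
     */
    static void check(List<String> args) {
        if (!hasFlag(args, "--offline")) {
            return; // not offline: nothing to pre-check
        }
        String model = valueOf(args, "--model");
        // Only a configured local --model path is checked; other model sources are left alone.
        if (model != null && !Files.isRegularFile(Path.of(model))) {
            throw new ModelUnavailableException(
                    "--offline is set but the local model file is missing: " + model);
        }
    }

    public static void main(String[] args) {
        check(List.of("--model", "/definitely/missing.gguf")); // not offline, so no check
        try {
            check(List.of("--model", "/definitely/missing.gguf", "--offline"));
        } catch (ModelUnavailableException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking on the Java side keeps the failure deterministic: the caller gets a typed exception from a file-system test instead of inferring it from a native parse error.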
docs/history/llama-cpp-breaking-changes.md: 6 additions, 0 deletions
@@ -386,3 +386,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| b9739–b9789 |`common/json-schema-to-grammar.cpp` (Java-test impact) | The JSON-schema → GBNF serializer changed where it emits the `space` whitespace rule: a closing object is now `… )? space "}"` (was `… )? "}" space`) and a root-level `string` rule no longer appends a trailing `space` (`string ::= "\"" char* "\""`, was `… "\"" space`). Functionally equivalent (leading- vs trailing-whitespace placement) but byte-different, so the pinned expectation in `LlamaModelTest.testJsonSchemaToGrammar` was updated to the b9789 output. `LlamaModel.jsonSchemaToGrammar` is a pure JNI call (no model), so this failed on every platform's Java-test job; the new expectation was verified locally against the built b9789 `libjllama`. Test-data change only |
| b9739–b9789 |`tools/server/server-context.cpp` (**patch target**, regression) |`server_context::load_model` now **unconditionally** installs the server's own load-progress reporter on `params_base.load_progress_callback` immediately before `common_init_from_params` (b9739 called `common_init_from_params(params_base)` with no such assignment). This clobbered libjllama's `LoadProgressCallback` JNI trampoline (set on `common_params.load_progress_callback` before `load_model`), so `LoadProgressCallbackTest` observed zero progress updates and the abort-on-`false` path stopped throwing. Fixed by new **`patches/0002-server-preserve-caller-load-progress-callback.patch`**, which guards the install behind `if (params_base.load_progress_callback == nullptr)` so a caller-supplied callback survives (standalone `llama-server` keeps its reporter — the field is null there). Re-verified to apply + reverse-apply cleanly against b9789 and to compile clean (ctest still 454/454) |
| b9739–b9789 | upstream build / verification | Local build with `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3; sources were pre-staged from release tarballs + both patches hand-applied because this sandbox blocks `github.com` git-clone, so `FetchContent`'s git path and `PATCH_COMMAND` could not run — the published CI pipeline uses the normal git FetchContent path). `cmake -B build -DBUILD_TESTING=ON` configures cleanly (the OuteTTS build-time extraction **and** the refreshed Windows patch both pass their fail-loud anchor checks against b9789), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **454/454 tests passing**. Every upstream breaking change in this range is absorbed inside upstream-compiled translation units, so no project C++ *source* edits were required — **but** PR CI's model-backed Java suite (which the restricted sandbox cannot run) surfaced two project-side fixes captured in the two rows above: the `json-schema-to-grammar` test-expectation update and the `load_progress_callback` server regression (`patches/0002`) |
+| b9789–b9803 | `common/arg.{cpp,h}` + `common/download.{cpp,h}` + `common/common.h` (model-download refactor) | The model-download pipeline was rewritten: `common_params_handle_models()` / `common_params_handle_models_params` / `common_download_model()` / `common_download_model_result` **removed** and replaced by a two-phase `common_models_handler` API (`common_models_handler_init()` builds the HF plan + opts; `common_models_handler_apply()` runs the parallel `common_download_task` list); `common_download_opts::skip_download` / `::preset_only` and the whole `common_skip_download_exception` type **removed**; new `common_download_get_hf_plan()` / `common_download_run_tasks()` / `common_download_get_all_parts()`; `download.h` now `#include`s `hf-cache.h`. Project C++ references none of these — verified `grep -rn "common_params_handle_models\|common_download_model\|common_skip_download\|skip_download\|preset_only" src/main/cpp src/test/cpp` → zero matches, and no project TU includes `download.h` directly. All consumers (arg parsing, `server-models.cpp`, `llama-bench.cpp`) are upstream-compiled. No project C++ source changes required. **Java API follow-up (behavioural):** this removal exposed that the project's `ModelFlag.SKIP_DOWNLOAD` (`--skip-download`) was never a registered upstream arg — it only ever forced a parse failure that `SkipDownloadFailureTranslator` mapped to `ModelUnavailableException`, and it could never load a *present* model. It was **replaced** with the real upstream `--offline` flag: `ModelFlag.OFFLINE` + `ModelParameters.setOffline(boolean)`; the heuristic translator was replaced by a deterministic pre-check `OfflineModelGuard` (throws `ModelUnavailableException` when `--offline` is set and the configured local `--model` file is absent, before the native call); `LlamaModelSkipDownloadTest` → `LlamaModelOfflineTest`. `ModelUnavailableException` is retained. Pure-Java change, no JNI rebuild |
+| b9789–b9803 | `common/common.h` | `common_params_model` gained `bool empty()` and `get_name()` became `const` (additive); the `common_params::skip_download` field was **removed**; a new `LLAMA_EXAMPLE_DOWNLOAD` enumerator was appended before `LLAMA_EXAMPLE_COUNT`. None of these is surfaced by `ModelParameters`; all are consumed inside upstream-compiled TUs. No project source changes required |
+| b9789–b9803 | `CMakeLists.txt` + `tools/mtmd/CMakeLists.txt` | New top-level `LLAMA_BUILD_MTMD` option for standalone library-only mtmd builds; the mtmd **CLI** executables (`llama-llava-cli`, `llama-gemma3-cli`, `llama-minicpmv-cli`, `llama-qwen2vl-cli`, `llama-mtmd-cli`, `llama-mtmd-debug`) are now gated behind `if (LLAMA_BUILD_TOOLS)`. The project adds `tools/mtmd` directly with `LLAMA_BUILD_TOOLS=OFF`, so after this bump those CLI executables are **no longer built as collateral**, which shortens the build; the `mtmd` **library** target the project links still builds via the `if (TARGET mtmd)` block above the gate. No project source changes required |
+| b9789–b9803 | `common/arg.cpp` + `docs/speculative.md` | **New features:** EAGLE-3 speculative decoding (`--spec-type draft-eagle3`), which uses a small one-layer draft transformer that reads the target model's hidden states to raise the draft acceptance rate; plus a new standalone `llama download` / `llama get` subcommand (`app/download.cpp`, `LLAMA_EXAMPLE_DOWNLOAD`) and a `--mtp` download flag. Server-level CLI; not surfaced by `ModelParameters`/`InferenceParameters`. `--spec-type` could later back an inference-parameter setter. No project source changes required |
+| b9789–b9803 | `ggml/src/ggml-cuda/{binbcast,cpy}.cu` + `ggml-opencl` + `src/llama-model.{cpp,h}` + `src/models/lfm2.cpp` | Backend/model-internal only: CUDA `binbcast`/`cpy` kernels reworked for >INT_MAX index safety (int→uint32/int64 widening + overflow guards); OpenCL flushes the profiling batch on context teardown; new `LLM_TYPE_230M` mapped for LFM2 (`n_ff == 2560`). No API surface visible to `jllama.cpp`; the CUDA changes affect only the `cuda13-linux-x86-64` classifier and the OpenCL change only the `opencl-android-aarch64` classifier. No project source changes required |
+| b9789–b9803 | upstream verification (sandbox) | Both `patches/0001-win32-arg-parse-embed-guard.patch` (37 files) and `patches/0002-server-preserve-caller-load-progress-callback.patch` re-verified to apply cleanly against b9803: `git apply --check` exits 0 for both patches when run over the actual b9803 sources fetched from `raw.githubusercontent.com` (github.com git-clone is blocked in this sandbox, so a full `FetchContent` build could not run). Patch 0001's `common_params_parse` target region is byte-identical to b9789; the b9803 `arg.cpp` churn is confined to the `common_models_handler` rewrite and `set_examples` tags, which do not overlap the patched hunks. The OuteTTS generator anchors still hold (upstream `tts.cpp` is unchanged in this range apart from patch 0001's `main()`-only parse flip). Full build and `ctest` are still to be confirmed by the CI pipeline |
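The `patches/0002` fix follows an "install the default only if the slot is empty" rule: the server's reporter goes on `params_base.load_progress_callback` only when that field is `nullptr`. The Java sketch below models that rule purely for illustration; it is not upstream or project code, and `CallbackGuardSketch`, `Params`, `ProgressCallback` and `installDefaultReporter` are hypothetical names, with a `false` return standing in for the abort-on-`false` path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java model of the patch-0002 guard; not project or upstream code. */
public final class CallbackGuardSketch {

    /** Shape of a load-progress callback: returning false requests an abort. */
    @FunctionalInterface
    interface ProgressCallback {
        boolean onProgress(float progress);
    }

    /** Stand-in for the single callback slot on the native params struct. */
    static final class Params {
        ProgressCallback loadProgressCallback; // null means "the caller supplied none"
    }

    /** Installs the server's default reporter only when the slot is still empty. */
    static void installDefaultReporter(Params params, List<String> serverLog) {
        if (params.loadProgressCallback == null) { // the guard that patch 0002 adds
            params.loadProgressCallback = p -> {
                serverLog.add("server progress " + p);
                return true;
            };
        }
    }

    public static void main(String[] args) {
        List<String> serverLog = new ArrayList<>();
        List<Float> seenByCaller = new ArrayList<>();

        // Embedded case (like libjllama): the caller installs its own callback first.
        Params embedded = new Params();
        embedded.loadProgressCallback = p -> {
            seenByCaller.add(p);
            return true;
        };
        installDefaultReporter(embedded, serverLog);
        embedded.loadProgressCallback.onProgress(0.5f);

        // Standalone-server case: the slot is empty, so the default reporter is installed.
        Params standalone = new Params();
        installDefaultReporter(standalone, serverLog);
        standalone.loadProgressCallback.onProgress(1.0f);

        System.out.println(seenByCaller); // [0.5]
        System.out.println(serverLog);    // [server progress 1.0]
    }
}
```

The embedded caller keeps its callback, including its ability to abort, while a standalone server, whose slot starts out empty, still gets the default reporter.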