Breaking changes fixed:
- server-schema.cpp added to CMakeLists.txt target_sources (new upstream
  file extracted from server-task.cpp; server-context.cpp now depends on it)
- #include "server-schema.h" added to jllama.cpp and test_server.cpp
- server_task::params_from_json_cmpl() → server_schema::eval_llama_cmpl_schema()
  in jllama.cpp:populate_completion_task and test_server.cpp:parse_params
Non-breaking upstream changes absorbed automatically:
- common_params_model::name → get_name() (not referenced in project C++)
- webui/webui_mcp_proxy/webui_config_json fields removed from common_params
- server_state enum: SERVER_STATE_LOADING_MODEL→LOADING, new SLEEPING value
- on_sleeping_changed → set_state_callback / server_state_callback_t
- cpp-httplib vendor bump v0.47.0 → v0.48.0
New upstream features (available for future Java API exposure):
- common_speculative_get_state/set_state: Eagle3 checkpoint save/restore
- common_download_remove: cached model deletion
- --agent flag: all tools + MCP CORS proxy in one step
- API key file #-comment support (auto-applied for existing setApiKeyFile users)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XjHW4CFNEcj4sB8KksJ4LB
CLAUDE.md: 9 additions & 9 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9682**
+Current llama.cpp pinned version: **b9739**

## Upgrading CUDA Version
@@ -193,7 +193,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
ships no UI):
```bash
# needs node/npm + network; embed.cpp is plain C++17 (no npm)
docs/history/llama-cpp-breaking-changes.md: 12 additions & 0 deletions
@@ -361,3 +361,15 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| b9642–b9682 |`common/speculative.{h,cpp}`| Speculative decoding now accumulates per-draft-position acceptance statistics and adds an Eagle3 backend-sampling path (the draft model samples on the compute backend). `common_speculative_*` is compiled into `common` and reached only through the upstream server's speculative slot; the project's C++ references no `speculative`/`draft` symbol. No project source changes required. **New feature:** per-position draft-acceptance metrics — could surface as speculative-decoding telemetry in a future Java API |
| b9642–b9682 |`tools/server/server-context.cpp`| Server slot refactored so an `mtmd` (multimodal) prompt can feed a speculative draft model: image/media chunks are routed through the new `mtmd_helper_decode_image_chunk` callback before drafting. Compiled directly into `jllama` (the project builds `server-context/queue/task/models`), but the change is internal to the slot state machine and binds no new/renamed symbol; verified that `jllama.cpp` and the `*_helpers.hpp` headers call none of the touched functions. No project source changes required |
| b9642–b9682 |`ggml/src/ggml-*` backends, `tools/` (incl. `llama-bench --offline`), conda-forge packaging, `docs/`, `.github/`| Routine backend kernel updates and tooling/docs/CI tweaks (a new `llama-bench --offline` flag, conda-forge recipe notes). None are compiled into `jllama` beyond the already-built CPU/CUDA/Metal/OpenCL backends, and none change a symbol the project binds. No project changes required |
+| b9682–b9739 |`tools/server/server-schema.{h,cpp}` (new) + `tools/server/server-task.{h,cpp}`| **Build-breaking.** `server_task::params_from_json_cmpl()` MOVED to `server_schema::eval_llama_cmpl_schema()` in new `server-schema.h`/`server-schema.cpp`. **Required project changes**: (1) add `server-schema.cpp` to the `target_sources(jllama ...)` block in `CMakeLists.txt`; (2) add `#include "server-schema.h"` in `src/main/cpp/jllama.cpp` and `src/test/cpp/test_server.cpp`; (3) update the call sites in `jllama.cpp:203` and `test_server.cpp:1722` from `server_task::params_from_json_cmpl(...)` to `server_schema::eval_llama_cmpl_schema(...)`|
+| b9682–b9739 |`common/common.h` (`common_params_model`) |`common_params_model::name` field REMOVED; replaced by `get_name()` method. Not referenced in project source (model name is read from `server_context_meta::model_name`, populated upstream) — no project source changes required |
+| b9682–b9739 |`common/common.h` (`common_params`) |`webui`, `webui_mcp_proxy`, `webui_config_json` fields REMOVED (deprecated aliases; replaced by `ui`/`ui_mcp_proxy`/`ui_config_json` introduced in b9172). Project never references these fields directly — no project source changes required |
+| b9682–b9739 |`tools/server/server-models.h` + `server-models.cpp`|`server_state` enum: `SERVER_STATE_LOADING_MODEL` renamed to `SERVER_STATE_LOADING`; new `SERVER_STATE_SLEEPING` added. `on_sleeping_changed` callback replaced by `set_state_callback` with `server_state_callback_t` type. None of these symbols is referenced in `jllama.cpp` — no project source changes required |
+| b9682–b9739 |`vendor/cpp-httplib/httplib.{h,cpp}`| cpp-httplib bumped from v0.47.0 to v0.48.0. Compiled automatically via FetchContent — no project source changes required |
+| b9682–b9739 |`common/speculative.{h,cpp}`| New `common_speculative_get_state()` / `common_speculative_set_state()` Eagle3 state checkpointing APIs; `common_prompt_checkpoint::data_spec` field added for Eagle3 speculative draft state stash. Additive; compiled into upstream `common`; project does not call these functions — no project source changes required. **New feature:** Eagle3 speculative decoding state save/restore — could expose later |
+| b9682–b9739 |`common/download.h` + `common/download.cpp`| New `common_download_remove()` function for deleting cached model files. Additive; project does not call it — no project source changes required. **New feature:** could be exposed as `LlamaModel.deleteCachedModel(String path)`|
+| b9682–b9739 |`common/arg.cpp`| New `--agent` flag that enables all tools + MCP CORS proxy in one step. Server-level CLI flag; not referenced by `ModelParameters` — no project source changes required. **New feature:** consider `ModelParameters.setAgent(boolean)`|
+| b9682–b9739 |`common/arg.cpp` + `tools/server/server-http.cpp`| API key file: lines starting with `#` are now treated as comments and ignored. Behaviour fix for existing `ModelParameters.setApiKeyFile(String)` users — upgrade picks it up automatically, no source changes required |
+| b9682–b9739 |`ggml/src/ggml-cuda/`| New `col2im_1d` CUDA op. Internal CUDA backend, no project changes required |
+| b9682–b9739 |`ggml/src/ggml-metal/`| ROPE_BACK Metal support; concat kernel extended to additional types. Internal Metal backend, no project changes required |
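
The API key file row above describes a behaviour change (lines starting with `#` are ignored) without showing what that looks like. The following is a minimal sketch of comment-aware key-file parsing, written only to illustrate the rule. It is not the upstream implementation in `common/arg.cpp`: the helper names `trim` and `read_api_keys` are hypothetical, and skipping blank lines and trimming surrounding whitespace before the `#` check are assumptions of this sketch rather than documented upstream behaviour.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from llama.cpp): strips leading and trailing
// whitespace, including a trailing '\r' left by CRLF line endings.
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper (not from llama.cpp): reads one API key per line.
// Assumption of this sketch: blank lines are skipped, and a line counts as
// a comment when its first non-whitespace character is '#'.
static std::vector<std::string> read_api_keys(std::istream & in) {
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        const std::string key = trim(line);
        if (key.empty() || key[0] == '#') {
            continue; // blank line or comment: not a key
        }
        keys.push_back(key);
    }
    return keys;
}
```

Under these assumptions, a key file can carry annotations such as `# rotated 2024-Q3` next to the keys themselves, and only the non-comment lines are returned as keys.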