Commit 1a43caa

Upgrade llama.cpp from b9682 to b9739
Breaking changes fixed:
- server-schema.cpp added to CMakeLists.txt target_sources (new upstream file extracted from server-task.cpp; server-context.cpp now depends on it)
- #include "server-schema.h" added to jllama.cpp and test_server.cpp
- server_task::params_from_json_cmpl() → server_schema::eval_llama_cmpl_schema() in jllama.cpp:populate_completion_task and test_server.cpp:parse_params

Non-breaking upstream changes absorbed automatically:
- common_params_model::name → get_name() (not referenced in project C++)
- webui/webui_mcp_proxy/webui_config_json fields removed from common_params
- server_state enum: SERVER_STATE_LOADING_MODEL→LOADING, new SLEEPING value
- on_sleeping_changed → set_state_callback / server_state_callback_t
- cpp-httplib vendor bump v0.47.0 → v0.48.0

New upstream features (available for future Java API exposure):
- common_speculative_get_state/set_state: Eagle3 checkpoint save/restore
- common_download_remove: cached model deletion
- --agent flag: all tools + MCP CORS proxy in one step
- API key file #-comment support (auto-applied for existing setApiKeyFile users)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XjHW4CFNEcj4sB8KksJ4LB
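
As an illustration of the last feature named in the commit message (API key file `#`-comment support), the sketch below shows how such a file could be read. It is not upstream's implementation: `read_api_keys` and `read_api_keys_from_text` are hypothetical names, and skipping blank lines and stripping a trailing carriage return are assumptions that go beyond what the commit states.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an upstream function): collect API keys from a
// key file, one key per line, ignoring lines that start with '#'.
std::vector<std::string> read_api_keys(std::istream &in) {
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        // Assumption: blank lines carry no key.
        if (line.empty()) {
            continue;
        }
        // The rule described in this commit: '#' at line start marks a comment.
        if (line.front() == '#') {
            continue;
        }
        keys.push_back(line);
    }
    return keys;
}

// Convenience wrapper for in-memory text, handy in unit tests.
std::vector<std::string> read_api_keys_from_text(const std::string &text) {
    std::istringstream in(text);
    return read_api_keys(in);
}
```

Given a file containing `# staging keys`, `sk-alpha`, a blank line, `#sk-disabled` and `sk-beta` (all placeholder values), the helper returns only `sk-alpha` and `sk-beta`.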
1 parent ee8b035 commit 1a43caa

7 files changed

Lines changed: 29 additions & 14 deletions

CLAUDE.md

Lines changed: 9 additions & 9 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9682**
+Current llama.cpp pinned version: **b9739**

 ## Upgrading CUDA Version

@@ -193,7 +193,7 @@ needs no extra step here, `build-webui` re-reads the tag and rebuilds the matchi
 ships no UI):
 ```bash
 # needs node/npm + network; embed.cpp is plain C++17 (no npm)
-git clone --depth 1 --branch b9682 https://github.com/ggml-org/llama.cpp /tmp/lc
+git clone --depth 1 --branch b9739 https://github.com/ggml-org/llama.cpp /tmp/lc
 ( cd /tmp/lc/tools/ui && npm ci && npm run build \
 && ( cd dist && find . -type f -not -path './_gzip/*' \
 | while read -r f; do mkdir -p "_gzip/$(dirname "$f")"; gzip -9 -c "$f" > "_gzip/$f"; done ) \
@@ -227,7 +227,7 @@ plus a cache token are present, `build.sh` adds
 - `SCCACHE_WEBDAV_TOKEN: ${{ secrets.DEPOT_TOKEN }}` — a Depot **organization** token, stored
 as the repo secret **`DEPOT_TOKEN`**.

-Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9682`), the
+Because `sccache` is **content-addressed** and llama.cpp is pinned (`GIT_TAG b9739`), the
 ~280 upstream object files are byte-identical every run, so a warm cache recompiles only the
 *changed* files. Depot's cache is **shared across all branches** (unlike GitHub's
 per-branch `actions/cache`), so every branch builds incrementally; a `b<nnnn>` version bump
@@ -765,7 +765,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 | File | Tests | Scope |
 |------|-------|-------|
 | `src/test/cpp/test_utils.cpp` | 156 | Upstream helpers: `server_tokens`, `server_grammar_trigger`, `gen_tool_call_id`, `json_value`, `json_get_nested_values`, UTF-8 helpers, `format_response_rerank`, `format_embeddings_response_oaicompat`, `oaicompat_completion_params_parse`, `oaicompat_chat_params_parse`, `are_lora_equal`, `strip_flag_from_argv`, `token_piece_value`, `json_is_array_and_contains_numbers`, `format_oai_sse`, `format_oai_resp_sse`, `format_anthropic_sse` |
-| `src/test/cpp/test_server.cpp` | 188 | Upstream result types: `result_timings`, `task_params::to_json()` (incl. `dry_sequence_breakers`, `preserved_tokens`, `timings_per_token`), `completion_token_output`, `server_task_result_cmpl_partial` (non-oaicompat + `to_json_oaicompat` + logprobs + `to_json_oaicompat_chat` + `to_json_anthropic` + dispatcher), `server_task_result_cmpl_final` (non-oaicompat + `to_json_oaicompat` + `to_json_oaicompat_chat` + `to_json_oaicompat_chat_stream` + `to_json_anthropic` + `to_json_anthropic_stream` + tool_calls + dispatcher), `server_task_result_embd`, `server_task_result_rerank`, `server_task_result_metrics`, `server_task_result_slot_save_load`, `server_task_result_slot_erase`, `server_task_result_apply_lora`, `server_task_result_error`, `format_error_response`, `server_task::need_sampling()`, `server_task::n_tokens()`, `server_task::params_from_json_cmpl()` (parsing pipeline + grammar routing + error paths), `response_fields` projection |
+| `src/test/cpp/test_server.cpp` | 188 | Upstream result types: `result_timings`, `task_params::to_json()` (incl. `dry_sequence_breakers`, `preserved_tokens`, `timings_per_token`), `completion_token_output`, `server_task_result_cmpl_partial` (non-oaicompat + `to_json_oaicompat` + logprobs + `to_json_oaicompat_chat` + `to_json_anthropic` + dispatcher), `server_task_result_cmpl_final` (non-oaicompat + `to_json_oaicompat` + `to_json_oaicompat_chat` + `to_json_oaicompat_chat_stream` + `to_json_anthropic` + `to_json_anthropic_stream` + tool_calls + dispatcher), `server_task_result_embd`, `server_task_result_rerank`, `server_task_result_metrics`, `server_task_result_slot_save_load`, `server_task_result_slot_erase`, `server_task_result_apply_lora`, `server_task_result_error`, `format_error_response`, `server_task::need_sampling()`, `server_task::n_tokens()`, `server_schema::eval_llama_cmpl_schema()` (parsing pipeline + grammar routing + error paths), `response_fields` projection |
 | `src/test/cpp/test_json_helpers.cpp` | 47 | All functions in `json_helpers.hpp`: `get_result_error_message`, `results_to_json`, `rerank_results_to_json`, `parse_encoding_format`, `extract_embedding_prompt`, `is_infill_request`, `parse_slot_prompt_similarity`, `parse_positive_int_config`, `wrap_stream_chunk` |
 | `src/test/cpp/test_log_helpers.cpp` | 13 | All functions in `log_helpers.hpp`: `log_level_name`, `format_log_as_json` |
 | `src/test/cpp/test_jni_helpers.cpp` | 41 | All functions in `jni_helpers.hpp` using a zero-filled `JNINativeInterface_` mock |
@@ -774,7 +774,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"

 #### Upstream source location (in CMake build tree)

-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9682`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9739`.

 **GoogleTest** is a separate `BUILD_TESTING`-only FetchContent (`GIT_TAG v1.17.0`), used solely
 by the `jllama_test` C++ unit-test binary — not by the shipped library, and not coupled to the
@@ -877,9 +877,9 @@ f.timings.prompt_n = 5;
 EXPECT_TRUE(j.contains("timings"));
 ```

-**3. Parameter parsing (`params_from_json_cmpl`) without a model**
+**3. Parameter parsing (`eval_llama_cmpl_schema`) without a model**

-`server_task::params_from_json_cmpl(vocab, params_base, n_ctx_slot, logit_bias_eog, data)`
+`server_schema::eval_llama_cmpl_schema(vocab, params_base, n_ctx_slot, logit_bias_eog, data)`
 can be called with `nullptr` vocab **if the JSON does not trigger grammar/preserved_tokens
 tokenisation** (those are the only vocab-dependent paths). This lets us test the full
 parsing pipeline including error throws:
@@ -891,12 +891,12 @@ const int n_ctx = 512;

 // test: repeat_last_n=-1 is expanded to n_ctx_slot
 json data = {{"repeat_last_n", -1}};
-auto p = server_task::params_from_json_cmpl(nullptr, params_base, n_ctx, no_bias, data);
+auto p = server_schema::eval_llama_cmpl_schema(nullptr, params_base, n_ctx, no_bias, data);
 EXPECT_EQ(p.sampling.penalty_last_n, n_ctx);

 // test: invalid value throws std::runtime_error
 json bad = {{"dry_sequence_breakers", json::array()}}; // empty → error
-EXPECT_THROW(server_task::params_from_json_cmpl(nullptr, params_base, n_ctx, no_bias, bad),
+EXPECT_THROW(server_schema::eval_llama_cmpl_schema(nullptr, params_base, n_ctx, no_bias, bad),
 std::runtime_error);
 ```

CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -139,7 +139,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9682
+GIT_TAG b9739
 )
 FetchContent_MakeAvailable(llama.cpp)

@@ -270,6 +270,7 @@ target_sources(jllama PRIVATE
 ${llama.cpp_SOURCE_DIR}/tools/server/server-context.cpp
 ${llama.cpp_SOURCE_DIR}/tools/server/server-queue.cpp
 ${llama.cpp_SOURCE_DIR}/tools/server/server-task.cpp
+${llama.cpp_SOURCE_DIR}/tools/server/server-schema.cpp
 )
 if(NOT ANDROID_ABI AND NOT OS_NAME MATCHES "Android")
 target_sources(jllama PRIVATE

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9682](https://img.shields.io/badge/llama.cpp-%23b9682-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9682)
+[![llama.cpp b9739](https://img.shields.io/badge/llama.cpp-%23b9739-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9739)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

docs/history/llama-cpp-breaking-changes.md

Lines changed: 12 additions & 0 deletions
@@ -361,3 +361,15 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9642–b9682 | `common/speculative.{h,cpp}` | Speculative decoding now accumulates per-draft-position acceptance statistics and adds an Eagle3 backend-sampling path (the draft model samples on the compute backend). `common_speculative_*` is compiled into `common` and reached only through the upstream server's speculative slot; the project's C++ references no `speculative`/`draft` symbol. No project source changes required. **New feature:** per-position draft-acceptance metrics — could surface as speculative-decoding telemetry in a future Java API |
 | b9642–b9682 | `tools/server/server-context.cpp` | Server slot refactored so an `mtmd` (multimodal) prompt can feed a speculative draft model: image/media chunks are routed through the new `mtmd_helper_decode_image_chunk` callback before drafting. Compiled directly into `jllama` (the project builds `server-context/queue/task/models`), but the change is internal to the slot state machine and binds no new/renamed symbol; verified that `jllama.cpp` and the `*_helpers.hpp` headers call none of the touched functions. No project source changes required |
 | b9642–b9682 | `ggml/src/ggml-*` backends, `tools/` (incl. `llama-bench --offline`), conda-forge packaging, `docs/`, `.github/` | Routine backend kernel updates and tooling/docs/CI tweaks (a new `llama-bench --offline` flag, conda-forge recipe notes). None are compiled into `jllama` beyond the already-built CPU/CUDA/Metal/OpenCL backends, and none change a symbol the project binds. No project changes required |
+| b9682–b9739 | `tools/server/server-schema.{h,cpp}` (new) + `tools/server/server-task.{h,cpp}` | **Build-breaking.** `server_task::params_from_json_cmpl()` MOVED to `server_schema::eval_llama_cmpl_schema()` in new `server-schema.h`/`server-schema.cpp`. **Required project changes**: (1) add `server-schema.cpp` to the `target_sources(jllama ...)` block in `CMakeLists.txt`; (2) add `#include "server-schema.h"` in `src/main/cpp/jllama.cpp` and `src/test/cpp/test_server.cpp`; (3) update the call sites in `jllama.cpp:203` and `test_server.cpp:1722` from `server_task::params_from_json_cmpl(...)` to `server_schema::eval_llama_cmpl_schema(...)` |
+| b9682–b9739 | `common/common.h` (`common_params_model`) | `common_params_model::name` field REMOVED; replaced by `get_name()` method. Not referenced in project source (model name is read from `server_context_meta::model_name`, populated upstream) — no project source changes required |
+| b9682–b9739 | `common/common.h` (`common_params`) | `webui`, `webui_mcp_proxy`, `webui_config_json` fields REMOVED (deprecated aliases; replaced by `ui`/`ui_mcp_proxy`/`ui_config_json` introduced in b9172). Project never references these fields directly — no project source changes required |
+| b9682–b9739 | `tools/server/server-models.h` + `server-models.cpp` | `server_state` enum: `SERVER_STATE_LOADING_MODEL` renamed to `SERVER_STATE_LOADING`; new `SERVER_STATE_SLEEPING` added. `on_sleeping_changed` callback replaced by `set_state_callback` with `server_state_callback_t` type. None are referenced in `jllama.cpp` — no project source changes required |
+| b9682–b9739 | `vendor/cpp-httplib/httplib.{h,cpp}` | cpp-httplib bumped from v0.47.0 to v0.48.0. Compiled automatically via FetchContent — no project source changes required |
+| b9682–b9739 | `common/speculative.{h,cpp}` | New `common_speculative_get_state()` / `common_speculative_set_state()` Eagle3 state checkpointing APIs; `common_prompt_checkpoint::data_spec` field added for Eagle3 speculative draft state stash. Additive; compiled into upstream `common`; project does not call these functions — no project source changes required. **New feature:** Eagle3 speculative decoding state save/restore — could expose later |
+| b9682–b9739 | `common/download.h` + `common/download.cpp` | New `common_download_remove()` function for deleting cached model files. Additive; project does not call it — no project source changes required. **New feature:** could be exposed as `LlamaModel.deleteCachedModel(String path)` |
+| b9682–b9739 | `common/arg.cpp` | New `--agent` flag that enables all tools + MCP CORS proxy in one step. Server-level CLI flag; not referenced by `ModelParameters` — no project source changes required. **New feature:** consider `ModelParameters.setAgent(boolean)` |
+| b9682–b9739 | `common/arg.cpp` + `tools/server/server-http.cpp` | API key file: lines starting with `#` are now treated as comments and ignored. Behaviour fix for existing `ModelParameters.setApiKeyFile(String)` users — upgrade picks it up automatically, no source changes required |
+| b9682–b9739 | `ggml/src/ggml-sycl/` | New conv2d, conv2d_dw, conv2d_transpose, conv3d SYCL ops; Q1_0 quantization support. Internal SYCL backend, no project changes required |
+| b9682–b9739 | `ggml/src/ggml-cuda/` | New `col2im_1d` CUDA op. Internal CUDA backend, no project changes required |
+| b9682–b9739 | `ggml/src/ggml-metal/` | ROPE_BACK Metal support; concat kernel extended to additional types. Internal Metal backend, no project changes required |

src/main/cpp/jllama.cpp

Lines changed: 2 additions & 1 deletion
@@ -14,6 +14,7 @@
 #include "server-context.h"
 #include "server-queue.h"
 #include "server-task.h"
+#include "server-schema.h"
 #include "server-common.h"
 #include "server-chat.h"
 #include "utils.hpp"
@@ -200,7 +201,7 @@ static void populate_completion_task(server_task &task, jllama_context *jctx, in
 if (!tokenized_prompts.empty()) {
 task.tokens = std::move(tokenized_prompts[0]);
 }
-task.params = server_task::params_from_json_cmpl(jctx->vocab, jctx->params, n_ctx_slot, logit_bias_eog, data);
+task.params = server_schema::eval_llama_cmpl_schema(jctx->vocab, jctx->params, n_ctx_slot, logit_bias_eog, data);
 }

 [[nodiscard]] static jint dispatch_streaming_completion(JNIEnv *env, jllama_context *jctx, const json &data,

src/main/cpp/jni_helpers.hpp

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ struct jllama_context {
 llama_model *vocab_only_model = nullptr;

 // Saved copy of common_params used to load the model.
-// Required by server_task::params_from_json_cmpl which takes common_params&.
+// Required by server_schema::eval_llama_cmpl_schema which takes common_params&.
 common_params params;

 // Per-streaming-task response readers, keyed by task id.

src/test/cpp/test_server.cpp

Lines changed: 2 additions & 1 deletion
@@ -23,6 +23,7 @@
 #include "server-context.h"
 #include "server-queue.h"
 #include "server-task.h"
+#include "server-schema.h"
 #include "server-common.h"
 #include "server-chat.h"
 #include "utils.hpp"
@@ -1719,7 +1720,7 @@ namespace {
 task_params parse_params(const json &data, int n_ctx = 512) {
 common_params params_base;
 std::vector<llama_logit_bias> no_bias;
-return server_task::params_from_json_cmpl(nullptr, params_base, n_ctx, no_bias, data);
+return server_schema::eval_llama_cmpl_schema(nullptr, params_base, n_ctx, no_bias, data);
 }
 } // namespace
