Commit 6c7322e

Upgrade llama.cpp from b9172 to b9198
No source changes required: the JNI layer does not reference any of the renamed, deprecated, or newly added upstream symbols (verified via grep for webui, build_info, llama_memory_seq_rm, speculative.type, n_rs_seq, ctx_type and embd_pre_norm). The deprecated LLAMA_BUILD_WEBUI cache variable still forwards to the new LLAMA_BUILD_UI through the upstream backward-compat shim, so this project's existing CMake override continues to work unchanged.
https://claude.ai/code/session_01GPq7jMmz2dYSpBsWgsUf86
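The grep check described in the commit message can be sketched in C++17 as a literal substring scan, the equivalent of `grep -F`. This is a minimal illustration, not the tooling the commit actually used: `SymbolHit`, `scan_text`, `scan_tree` and `kWatchedSymbols` are hypothetical names, and limiting the scan to `.cpp`, `.h` and `.hpp` files is an assumption about where the JNI sources live.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One occurrence of a watched symbol (hypothetical helper type).
struct SymbolHit {
    std::string file;
    int line;            // 1-based line number
    std::string symbol;
};

// Symbols listed in the commit message for the b9172 -> b9198 upgrade.
const std::vector<std::string> kWatchedSymbols = {
    "webui", "build_info", "llama_memory_seq_rm", "speculative.type",
    "n_rs_seq", "ctx_type", "embd_pre_norm"};

// Case-sensitive literal substring search (like `grep -F`) over one file's text.
std::vector<SymbolHit> scan_text(const std::string& file, const std::string& text,
                                 const std::vector<std::string>& symbols) {
    std::vector<SymbolHit> hits;
    std::istringstream in(text);
    std::string line;
    int line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        for (const auto& sym : symbols) {
            if (line.find(sym) != std::string::npos) {
                hits.push_back({file, line_no, sym});
            }
        }
    }
    return hits;
}

// Recursively scan every C/C++ source or header below `root`.
std::vector<SymbolHit> scan_tree(const std::filesystem::path& root,
                                 const std::vector<std::string>& symbols) {
    std::vector<SymbolHit> hits;
    for (const auto& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        const std::string ext = entry.path().extension().string();
        if (ext != ".cpp" && ext != ".h" && ext != ".hpp") continue;
        std::ifstream f(entry.path());
        std::stringstream buf;
        buf << f.rdbuf();
        auto file_hits = scan_text(entry.path().string(), buf.str(), symbols);
        hits.insert(hits.end(), file_hits.begin(), file_hits.end());
    }
    return hits;
}
```

Called as `scan_tree("src/main/cpp", kWatchedSymbols)` (the directory is a placeholder), an empty result matches the commit's finding, and any hit marks code to review before bumping the tag. A substring scan can over-report, for example `ctx_type` inside an unrelated identifier, which errs on the side of extra review.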
1 parent c213b8d commit 6c7322e

3 files changed

Lines changed: 15 additions & 3 deletions


CLAUDE.md

Lines changed: 13 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9172**
+Current llama.cpp pinned version: **b9198**

## Upgrading CUDA Version

@@ -281,6 +281,18 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
| ~b9151–b9172 | `common/reasoning-budget.cpp` | `common_reasoning_budget_clone` rewritten to use `llama_sampler_init` properly; pure bug fix, no API change, no project changes required |
| ~b9151–b9172 | `ggml/src/ggml-cuda/fattn-mma-f16.cuh` + `mma.cuh` | AMD RDNA3 WMMA flash attention support; new `DATA_LAYOUT_I_MAJOR_SCRAMBLED`, `tile<16,16,half2,I_MAJOR_SCRAMBLED>`, extended config tables; internal CUDA backend, no project changes required |
| ~b9151–b9172 | `tools/server/server-chat.cpp` | Non-function Responses API tools now silently skipped (`continue`) instead of throwing; server behavior fix, no Java API change required |
+| ~b9172–b9198 | project `CMakeLists.txt` | Option `LLAMA_BUILD_WEBUI` renamed to `LLAMA_BUILD_UI` (and `LLAMA_USE_PREBUILT_WEBUI` → `LLAMA_USE_PREBUILT_UI`); upstream keeps a backward-compat shim that forwards the old cache variable with a `DEPRECATION` message, so this project's `set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)` still works unchanged |
+| ~b9172–b9198 | `common/common.h` | `common_params::webui` / `webui_mcp_proxy` / `webui_config_json` deprecated in favour of `ui` / `ui_mcp_proxy` / `ui_config_json`; both pairs of fields are kept and synced by `common/arg.cpp`, compiled upstream sources unaffected; new `common_params::ctx_type` and `cparams.n_rs_seq` fields added (default `LLAMA_CONTEXT_TYPE_DEFAULT` / `0`), additive |
+| ~b9172–b9198 | `common/common.cpp` + `common.h` | `common_params_print_info` gained optional `print_devices` parameter (default `true`); upstream `tools/server/server.cpp` passes `!is_router_server` to skip GPU enumeration on the router process; this project does not compile `server.cpp`, no impact |
+| ~b9172–b9198 | `common/speculative.h` + `speculative.cpp` | New enum value `COMMON_SPECULATIVE_TYPE_DRAFT_MTP` (count is now 9); new `common_speculative_need_embd()` API; MTP draft implementation added (`common_speculative_state_draft_mtp`); `--spec-type draft-mtp` CLI flag added in `common/arg.cpp`; additive, no project changes (could be exposed later as a `ModelParameters` enhancement) |
+| ~b9172–b9198 | `include/llama.h` | New `enum llama_context_type { LLAMA_CONTEXT_TYPE_DEFAULT, LLAMA_CONTEXT_TYPE_MTP }`; new `llama_context_params::n_rs_seq` (recurrent-state snapshots per seq for rollback) and `ctx_type` fields; new `llama_n_rs_seq()` accessor; all additive, default-zero, no project impact |
+| ~b9172–b9198 | `src/llama-ext.h` (new) + `src/llama-context.cpp` | New pre-norm embedding extraction path: `llama_set_embeddings_pre_norm` / `llama_get_embeddings_pre_norm[_ith]` APIs and an `embd_pre_norm` output buffer in `llama_context`; used by the MTP draft loop only, additive |
+| ~b9172–b9198 | `src/llama-memory-recurrent.cpp` | Recurrent-state rollback support: per-seq `rs_idx` snapshot index and `set_rs_idx()` helper; tensors widened to `(1 + n_rs_seq)` groups; `seq_rm` now rolls back via snapshot when within `n_rs_seq` bounds. Backwards-compatible when `n_rs_seq == 0` (this project's default), no project changes |
+| ~b9172–b9198 | `tools/server/server-context.cpp` | Embedding endpoint default now reads `params.embd_normalize` (was hard-coded `2`); compiled upstream, no project changes |
+| ~b9172–b9198 | `tools/server/CMakeLists.txt` + new `tools/ui/CMakeLists.txt` | WebUI asset wiring moved into a new `llama-ui` static library; `tools/server` now links `llama-ui`; project does not build the `llama-server` binary (only compiles `server-context.cpp` / `server-queue.cpp` / `server-task.cpp` / `server-models.cpp` directly into `jllama`), so no impact. HF bucket name renamed `LLAMA_WEBUI_HF_BUCKET` → `LLAMA_UI_HF_BUCKET` (old name still honoured) |
+| ~b9172–b9198 | `vendor/cpp-httplib/httplib.{h,cpp}` | Bumped to v0.45.0: RFC 9112 §6 message-body framing — requests without `Content-Length` / `Transfer-Encoding` no longer scan for stray body bytes on persistent connections (fixes #2450 keep-alive misframing); X-Forwarded-For parser falls back to the connection remote address when the header is empty/malformed; compiled automatically, no project changes |
+| ~b9172–b9198 | `ggml/CMakeLists.txt` | GGML version bumped 0.11.1 → 0.12.0; no project changes |
+| ~b9172–b9198 | `ggml/src/ggml.c` + `ggml-cuda/gated_delta_net.cu` + `ggml-metal/ggml-metal.metal` + `ggml-vulkan/vulkan-shaders/gated_delta_net.comp` | `ggml_gated_delta_net` state tensor reshaped from 2D `(S_v*S_v*H, n_seqs)` to 3D `(S_v*S_v*H, K, n_seqs)` where `K` is the snapshot slot count (`K=1` is final-state-only, `K>1` keeps last `min(n_tokens, K)` per-token snapshots); internal Qwen3.5 / Qwen3-Next recurrent-attention kernel, no project changes |

## Build Commands

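The `src/llama-memory-recurrent.cpp` row in the CLAUDE.md table above describes recurrent-state rollback only in terms of field names. The toy C++ model below illustrates the idea under stated assumptions: each sequence keeps its live state plus the states from before each of its last `n_rs_seq` tokens, so removing up to `n_rs_seq` trailing tokens restores a saved state instead of failing. `RecurrentSeq`, `push` and `remove_last` are hypothetical names, not llama.cpp APIs, and the sketch uses a `std::deque` rather than the widened tensors and per-seq `rs_idx` index the table describes.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Stand-in for one sequence's recurrent state (hypothetical; upstream keeps
// these in ggml tensors).
using State = std::vector<float>;

// Toy model of per-sequence recurrent-state rollback (hypothetical API).
// Besides the live state it keeps the state from *before* each of the last
// n_rs_seq tokens, so at most 1 + n_rs_seq states are held per sequence.
class RecurrentSeq {
public:
    RecurrentSeq(State initial, std::size_t n_rs_seq)
        : live_(std::move(initial)), n_rs_seq_(n_rs_seq) {}

    // Advance by one token; `next` is the state after processing it.
    void push(const State& next) {
        if (n_rs_seq_ > 0) {
            snapshots_.push_back(live_);  // state before this token
            if (snapshots_.size() > n_rs_seq_) {
                snapshots_.pop_front();   // drop the oldest snapshot
            }
        }
        live_ = next;
        ++n_tokens_;
    }

    // Remove the last `count` tokens. Returns false when `count` exceeds the
    // retained snapshots; with n_rs_seq == 0 every non-empty removal fails.
    bool remove_last(std::size_t count) {
        if (count == 0) return true;
        if (count > snapshots_.size()) return false;
        const std::size_t keep = snapshots_.size() - count;
        live_ = snapshots_[keep];  // state before the earliest removed token
        snapshots_.erase(snapshots_.begin() + static_cast<std::ptrdiff_t>(keep),
                         snapshots_.end());
        n_tokens_ -= count;
        return true;
    }

    const State& state() const { return live_; }
    std::size_t n_tokens() const { return n_tokens_; }

private:
    State live_;
    std::size_t n_rs_seq_;
    std::size_t n_tokens_ = 0;
    std::deque<State> snapshots_;
};
```

With `n_rs_seq == 0` the deque stays empty, so any non-empty removal returns `false`; this is how the sketch models the `n_rs_seq == 0` case that the table calls backwards-compatible. For `n_rs_seq > 0`, the live state plus at most `n_rs_seq` snapshots gives the `1 + n_rs_seq` slots per sequence mentioned in the table.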
CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
llama.cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9172
+GIT_TAG b9198
)
FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
**Build:**
![Java 11+](https://img.shields.io/badge/Java-11%2B-informational)
![JUnit](https://img.shields.io/badge/tested%20with-JUnit4-yellow)
-[![llama.cpp b9172](https://img.shields.io/badge/llama.cpp-%23b9172-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9172)
+[![llama.cpp b9198](https://img.shields.io/badge/llama.cpp-%23b9198-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9198)
[![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
[![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)
