From a01bae17e986726eab72df5046588dd261b26587 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 13 Jun 2026 13:56:04 +0000
Subject: [PATCH 1/2] Upgrade llama.cpp from b9555 to b9621
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

No project source changes required — all upstream API breaks in this range (ggml_gated_delta_net state-tensor reshape, common_get_device_memory_data return-type change, mtmd_helper_bitmap_* return-type change, llm_graph_result::set_outputs signature change) are absorbed inside upstream-compiled translation units. New upstream features in this range (EAGLE3 speculative decoding, video input pipeline, mtmd_batch_max_tokens, path_prompts_log_dir, ggml_col2im_1d op) are noted in the breaking-changes doc as candidates for future Java API exposure.

https://claude.ai/code/session_016jPq9MLePa3eXjxiLLStwi
---
 CLAUDE.md | 2 +-
 CMakeLists.txt | 2 +-
 README.md | 2 +-
 docs/history/llama-cpp-breaking-changes.md | 14 ++++++++++++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 484d1934..a85fa756 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
-Current llama.cpp pinned version: **b9555**
+Current llama.cpp pinned version: **b9621**
 ## Upgrading CUDA Version
diff --git a/CMakeLists.txt b/CMakeLists.txt
index aeb12b4f..2d68461f 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
     llama.cpp
     GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-    GIT_TAG b9555
+    GIT_TAG b9621
 )
 FetchContent_MakeAvailable(llama.cpp)
diff --git a/README.md b/README.md
index 954a62a1..f5e857a3 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9555](https://img.shields.io/badge/llama.cpp-%23b9555-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9555)
+[![llama.cpp b9621](https://img.shields.io/badge/llama.cpp-%23b9621-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9621)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
index a6a321cd..994622a2 100644
--- a/docs/history/llama-cpp-breaking-changes.md
+++ b/docs/history/llama-cpp-breaking-changes.md
@@ -326,3 +326,17 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9549–b9553 | `conversion/mistral.py` + `convert_hf_to_gguf.py` | Python conversion-script robustness only: `hparams["llama_4_scaling"]` and `"moe" in hparams` replaced with `hparams.get(...)` / `is not None` guards so a present-but-null key no longer crashes conversion. Python tooling, not part of the JNI build. No impact |
 | ~b9549–b9553 | upstream build / verification | Local build with `GIT_TAG b9553` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly, `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **440/440 tests passing** (435 prior + 5 new `Samplers_*` tests). The sole breaking change in this range (the `common_sampler_types_from_names` signature) is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
 | ~b9553–b9555 | `.devops/intel.Dockerfile` + `ggml/src/ggml-metal/ggml-metal-device.cpp` + `tests/test-backend-ops.cpp` | Tiny maintenance bump — **no API change and no new feature**. (1) `intel.Dockerfile`: Intel GPU userspace driver pins bumped (IGC `v2.20.5`→`v2.34.4`, compute-runtime `25.40.35563.10`→`26.18.38308.1`, IGDGMM `22.8.2`→`22.10.0`) with the old multi-GPU-safe versions commented out; upstream's own Docker image only — this project ships its own `publish.yml` and does not consume `.devops/`. No impact. (2) `ggml-metal-device.cpp`: bugfix to the Metal im2col pipeline selector — the standard-vs-`_ext` kernel choice now keys off the actual conv-kernel footprint (`KH*KW`, with `KH = is_2D ? ne01 : 1`, `KW = ne00`) instead of the raw `ne00*ne01` product, fixing kernel selection for 1-D convolutions. Backend-internal Metal TU compiled via FetchContent; no API surface visible to `jllama.cpp`, and only affects the macOS/Metal backend at runtime. (3) `tests/test-backend-ops.cpp`: one extra `test_im2col` case (`{3000,384,1,1}` / `{3,384,384,1}`) added — upstream test only, not linked into the JNI build. **No project source changes required; no new Java-API-exposable feature.** Build verification deferred to CI (`publish.yml`) / a developer host as usual |
+| ~b9555–b9621 | `ggml/include/ggml.h` + `ggml/src/ggml.c` + `ggml/src/ggml-cuda/gated_delta_net.cu` + `ggml/src/ggml-metal/ggml-metal.metal` + `ggml/src/ggml-vulkan/vulkan-shaders/gated_delta_net.comp` | `ggml_gated_delta_net` state tensor reshaped again: the 3D `(S_v*S_v*H, K, n_seqs)` layout is now the 4D `[S_v, S_v, H, n_seqs]` with an explicit `int64_t K` seventh parameter (snapshot count, K=1 is final-state-only). Signature: `ggml_gated_delta_net(ctx, q, k, v, g, beta, state, K)` (was 6-argument). Snapshot-slot ordering also flipped to most-recent-first. Internal Qwen3.5 / Qwen3-Next recurrent-attention kernel; project does not call `ggml_gated_delta_net` directly — no project source changes required |
+| ~b9555–b9621 | `ggml/include/ggml.h` | New `ggml_col2im_1d(ctx, a, s0, oc, p0)` function and `GGML_OP_COL2IM_1D` enum value added; `GGML_OP_COUNT` incremented 96 → 97. Additive; not called by project — no project source changes required |
+| ~b9555–b9621 | `common/fit.h` + `tools/server/server-context.cpp` | `common_get_device_memory_data()` return type changed: now returns `common_device_memory_data_vec` (typedef for `std::vector`). New `common_device_memory_data` struct carries `.total`, `.free`, `.model`, `.context`, `.compute` fields directly (previously the caller reached them via `.mb.model` etc.). `fit.h` also dropped its `#include "ggml-backend.h"` and `#include "../src/llama-ext.h"` lines (those types are no longer needed at the header level). Consumed exclusively in upstream-compiled `server-context.cpp` (field-accessor update from `.mb.model` → `.model` etc. was applied upstream); project does not include `fit.h` or call `common_get_device_memory_data()` directly — no project source changes required |
+| ~b9555–b9621 | `tools/mtmd/mtmd-helper.h` + `tools/mtmd/mtmd-helper.cpp` + `tools/server/server-common.cpp` | `mtmd_helper_bitmap_init_from_file()` and `mtmd_helper_bitmap_init_from_buf()` return type changed: both now return `mtmd_helper_bitmap_wrapper` struct (contains `bitmap` + `video_ctx` fields) instead of `mtmd_bitmap*`. All call sites updated in upstream `server-common.cpp`. Project does not call these functions from `src/main/cpp/` (verified via grep: zero matches) — no project source changes required |
+| ~b9555–b9621 | `tools/mtmd/mtmd-helper.h` + `tools/mtmd/mtmd-helper.cpp` | New video pipeline: `mtmd_helper_video_context`, `mtmd_helper_video_*` API family (init/free/decode), ffmpeg-based frame extraction. New `--video` CLI flag in `common/arg.cpp`; new `input_video` content type in `server-common.cpp`. Multimodal helper additions flow through the upstream-compiled `mtmd-helper.cpp` and `server-common.cpp`; project does not reference any `mtmd_helper_video_*` symbol — no project source changes required. Could be exposed in a future Java API as `InferenceParameters.setVideoPath(String)` |
+| ~b9555–b9621 | `common/common.h` | New `common_params` fields: `path_prompts_log_dir` (prompt-logging output directory, string) and `mtmd_batch_max_tokens` (multimodal batch token limit, default 1024). Both additive with harmless defaults. Not surfaced by `ModelParameters` today — could be added in a future enhancement. No project source changes required |
+| ~b9555–b9621 | `src/llama-ext.h` | New EAGLE3 speculative-decoding support APIs: `llama_set_embeddings_layer_inp(ctx, lid, value)`, `llama_get_embeddings_layer_inp(ctx, lid)`, `llama_model_target_layer_ids(model)` → `const int32_t*`, `llama_model_target_layer_ids_n(model)` → `uint32_t`. New `LLM_ARCH_EAGLE3` model architecture; new `llama_model_eagle3` struct in upstream model sources. EAGLE3 enables full encoder+decoder graph implementation for speculative decoding. All consumed inside upstream-compiled `speculative.cpp` and model TUs; project does not reference any of these symbols — no project source changes required. Could be exposed later as a speculative-decoding backend type in `ModelParameters` |
+| ~b9555–b9621 | `src/llama-graph.h` + `src/llama-graph.cpp` | `llm_graph_result::set_outputs()` signature changed: now takes a `const llm_graph_params &` parameter (was no-parameter). New `t_layer_inp` vector added to `llm_graph_result` for layer-input embedding extraction (used by EAGLE3). Internal graph-building API; not called from project sources — no project source changes required |
+| ~b9555–b9621 | `src/llama-context.cpp` | `llama_context` now initializes `embeddings_layer_inp` storage for EAGLE3 layer-input extraction; `n_outputs_max` is forced to `n_batch` when `llama_model_has_encoder()` returns true (encoder models always need all outputs). Internal context lifecycle; no project sources reference these fields — no project source changes required |
+| ~b9555–b9621 | `vendor/cpp-httplib/httplib.h` + `httplib.cpp` | cpp-httplib bumped to v0.47.0. Compiled automatically via FetchContent — no project source changes required |
+| ~b9555–b9621 | `ggml/src/ggml-cuda/ggml-cuda.cu` | `ggml_concat` on CUDA now handles F16, BF16, I8, I16, I32, I64 element types in addition to F32; `active_count` tracking added to CUDA context to prevent memory leak from lazy `cudaMemGetInfo` context creation. Internal CUDA backend, no project changes required |
+| ~b9555–b9621 | `ggml/src/ggml-vulkan/` + Vulkan shaders | New `VK_VALVE_shader_mixed_float_dot_product` extension support for F16→F32 fused dot products (`dot2_f16`) in flash attention and GEMM matmul. Internal Vulkan backend, no project changes required |
+| ~b9555–b9621 | `ggml/src/ggml-opencl/` + OpenCL kernels | New Q5_0 and Q5_1 GEMM/GEMV noshuffle kernels for Qualcomm Adreno GPUs. Internal OpenCL backend (affects `opencl-android-aarch64` classifier build only); no project source changes required |
+| ~b9555–b9621 | `ggml/src/ggml-cuda/ssm-scan.cu` | Added `__syncthreads()` before the final reduction stage to prevent shared-memory race conditions on multi-warp SSM scan. Bug fix, internal CUDA backend, no project changes required |

From 770afe071ffbee6aa83cbf117c69fe214cbd223c Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 13 Jun 2026 14:01:15 +0000
Subject: [PATCH 2/2] Add REUSE SPDX headers to three markdown docs

TODO.md, docs/feature-investigation-similar-projects.md, and docs/history/llama-cpp-breaking-changes.md were missing SPDX-FileCopyrightText / SPDX-License-Identifier tags, causing the REUSE compliance check to report 3 non-compliant files (211/214).

https://claude.ai/code/session_016jPq9MLePa3eXjxiLLStwi
---
 TODO.md | 6 ++++++
 docs/feature-investigation-similar-projects.md | 6 ++++++
 docs/history/llama-cpp-breaking-changes.md | 6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/TODO.md b/TODO.md
index 4cb2f7be..f802f0a8 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,3 +1,9 @@
+
+
 # TODO — java-llama.cpp
 Open work items for this repo. Cross-cutting tracking lives in
diff --git a/docs/feature-investigation-similar-projects.md b/docs/feature-investigation-similar-projects.md
index 748f747a..3e031bb3 100644
--- a/docs/feature-investigation-similar-projects.md
+++ b/docs/feature-investigation-similar-projects.md
@@ -1,3 +1,9 @@
+
+
 # Feature Investigation — ideas from pure-Java sibling runtimes and `llamacpp4j`
 Comparison sources (all surveyed in one pass for this document):
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
index 994622a2..aa8f4f10 100644
--- a/docs/history/llama-cpp-breaking-changes.md
+++ b/docs/history/llama-cpp-breaking-changes.md
@@ -1,3 +1,9 @@
+
+
 # llama.cpp upstream breaking changes — version-range changelog
 Per-version-range record of upstream API breaks observed in the b5022 → latest range, what the affected upstream files are, and the project-side fix (or "no project changes required" when the break stayed inside an upstream-compiled translation unit).
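
Several b9555–b9621 rows in PATCH 1/2 rest on the claim that project sources never reference a changed upstream symbol (one row cites "verified via grep: zero matches" for `src/main/cpp/`). The sketch below shows one way such a check could be scripted. It is an illustration under assumptions, not the procedure the series actually used: the helper name `find_symbol_uses`, the file-suffix filter, and the choice of Python are all hypothetical, and the symbol list is only what the table rows name.

```python
import re
from pathlib import Path

# Symbols whose signature or return type changed in b9555..b9621,
# as named in the breaking-changes table rows (list assembled by hand).
CHANGED_SYMBOLS = [
    "ggml_gated_delta_net",
    "common_get_device_memory_data",
    "mtmd_helper_bitmap_init_from_file",
    "mtmd_helper_bitmap_init_from_buf",
    "set_outputs",
]

# Assumption: only C/C++ sources and headers are worth scanning.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}


def find_symbol_uses(root, symbols=CHANGED_SYMBOLS):
    """Return (path, line_number, symbol) for each whole-word match under root."""
    # \b keeps longer identifiers that merely contain a symbol from matching.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, symbols)) + r")\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((str(path), line_number, match.group(1)))
    return hits
```

Under these assumptions, `find_symbol_uses("src/main/cpp")` returning an empty list would correspond to the "zero matches" result the table reports; any hit would mark a call site to review before bumping `GIT_TAG`.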