No project source changes required — all upstream API breaks in this
range (ggml_gated_delta_net state-tensor reshape, common_get_device_memory_data
return-type change, mtmd_helper_bitmap_* return-type change,
llm_graph_result::set_outputs signature change) are absorbed inside
upstream-compiled translation units.
New upstream features in this range (EAGLE3 speculative decoding,
video input pipeline, mtmd_batch_max_tokens, path_prompts_log_dir,
ggml_col2im_1d op) are noted in the breaking-changes doc as
candidates for future Java API exposure.
https://claude.ai/code/session_016jPq9MLePa3eXjxiLLStwi
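To make the `common_get_device_memory_data` return-type change concrete, here is a minimal caller-side sketch. It is not upstream code: the types below are stand-ins with a `_sketch` suffix, the field names are taken from the breaking-changes entry in this commit, and the field type (`std::size_t`) and both helper functions are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the new per-device record. Field names follow the changelog
// entry; the field type (std::size_t) is an assumption.
struct device_memory_data_sketch {
    std::size_t total;
    std::size_t free;
    std::size_t model;
    std::size_t context;
    std::size_t compute;
};

// Mirrors the documented typedef: a vector of per-device records.
using device_memory_data_vec_sketch = std::vector<device_memory_data_sketch>;

// Hypothetical helper: memory one device would need. After the change the
// fields are read directly (d.model); before it, callers went through a
// nested member (d.mb.model and so on).
std::size_t projected_use(const device_memory_data_sketch & d) {
    return d.model + d.context + d.compute;
}

// Hypothetical helper: true when every device has enough free memory.
bool fits_on_all_devices(const device_memory_data_vec_sketch & devices) {
    for (const auto & d : devices) {
        if (projected_use(d) > d.free) {
            return false;
        }
    }
    return true;
}
```

A caller that only ever touched the nested accessors would need nothing beyond dropping the `.mb` hop, which matches the note that the upstream `server-context.cpp` update was a field-accessor change.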
CLAUDE.md
1 addition & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
docs/history/llama-cpp-breaking-changes.md
14 additions & 0 deletions
@@ -326,3 +326,17 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
|~b9549–b9553 |`conversion/mistral.py` + `convert_hf_to_gguf.py`| Python conversion-script robustness only: `hparams["llama_4_scaling"]` and `"moe" in hparams` replaced with `hparams.get(...)` / `is not None` guards so a present-but-null key no longer crashes conversion. Python tooling, not part of the JNI build. No impact |
|~b9549–b9553 | upstream build / verification | Local build with `GIT_TAG b9553` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly, `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **440/440 tests passing** (435 prior + 5 new `Samplers_*` tests). The sole breaking change in this range (the `common_sampler_types_from_names` signature) is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
| ~b9553–b9555 | `.devops/intel.Dockerfile` + `ggml/src/ggml-metal/ggml-metal-device.cpp` + `tests/test-backend-ops.cpp` | Tiny maintenance bump — **no API change and no new feature**. (1) `intel.Dockerfile`: Intel GPU userspace driver pins bumped (IGC `v2.20.5`→`v2.34.4`, compute-runtime `25.40.35563.10`→`26.18.38308.1`, IGDGMM `22.8.2`→`22.10.0`) with the old multi-GPU-safe versions commented out; upstream's own Docker image only — this project ships its own `publish.yml` and does not consume `.devops/`. No impact. (2) `ggml-metal-device.cpp`: bugfix to the Metal im2col pipeline selector — the standard-vs-`_ext` kernel choice now keys off the actual conv-kernel footprint (`KH*KW`, with `KH = is_2D ? ne01 : 1`, `KW = ne00`) instead of the raw `ne00*ne01` product, fixing kernel selection for 1-D convolutions. Backend-internal Metal TU compiled via FetchContent; no API surface visible to `jllama.cpp`, and only affects the macOS/Metal backend at runtime. (3) `tests/test-backend-ops.cpp`: one extra `test_im2col` case (`{3000,384,1,1}` / `{3,384,384,1}`) added — upstream test only, not linked into the JNI build. **No project source changes required; no new Java-API-exposable feature.** Build verification deferred to CI (`publish.yml`) / a developer host as usual |
+
|~b9555–b9621 |`ggml/include/ggml.h` + `ggml/src/ggml.c` + `ggml/src/ggml-cuda/gated_delta_net.cu` + `ggml/src/ggml-metal/ggml-metal.metal` + `ggml/src/ggml-vulkan/vulkan-shaders/gated_delta_net.comp`|`ggml_gated_delta_net` state tensor reshaped again: the 3D `(S_v*S_v*H, K, n_seqs)` layout is now the 4D `[S_v, S_v, H, n_seqs]` with an explicit `int64_t K` seventh parameter (snapshot count, K=1 is final-state-only). Signature: `ggml_gated_delta_net(ctx, q, k, v, g, beta, state, K)` (was 6-argument). Snapshot-slot ordering also flipped to most-recent-first. Internal Qwen3.5 / Qwen3-Next recurrent-attention kernel; project does not call `ggml_gated_delta_net` directly — no project source changes required |
+
|~b9555–b9621 |`ggml/include/ggml.h`| New `ggml_col2im_1d(ctx, a, s0, oc, p0)` function and `GGML_OP_COL2IM_1D` enum value added; `GGML_OP_COUNT` incremented 96 → 97. Additive; not called by project — no project source changes required |
+
|~b9555–b9621 |`common/fit.h` + `tools/server/server-context.cpp`|`common_get_device_memory_data()` return type changed: now returns `common_device_memory_data_vec` (typedef for `std::vector<common_device_memory_data>`). New `common_device_memory_data` struct carries `.total`, `.free`, `.model`, `.context`, `.compute` fields directly (previously the caller reached them via `.mb.model` etc.). `fit.h` also dropped its `#include "ggml-backend.h"` and `#include "../src/llama-ext.h"` lines (those types are no longer needed at the header level). Consumed exclusively in upstream-compiled `server-context.cpp` (field-accessor update from `.mb.model` → `.model` etc. was applied upstream); project does not include `fit.h` or call `common_get_device_memory_data()` directly — no project source changes required |
+
|~b9555–b9621 |`tools/mtmd/mtmd-helper.h` + `tools/mtmd/mtmd-helper.cpp` + `tools/server/server-common.cpp`|`mtmd_helper_bitmap_init_from_file()` and `mtmd_helper_bitmap_init_from_buf()` return type changed: both now return `mtmd_helper_bitmap_wrapper` struct (contains `bitmap` + `video_ctx` fields) instead of `mtmd_bitmap*`. All call sites updated in upstream `server-common.cpp`. Project does not call these functions from `src/main/cpp/` (verified via grep: zero matches) — no project source changes required |
+
|~b9555–b9621 |`tools/mtmd/mtmd-helper.h` + `tools/mtmd/mtmd-helper.cpp`| New video pipeline: `mtmd_helper_video_context`, `mtmd_helper_video_*` API family (init/free/decode), ffmpeg-based frame extraction. New `--video` CLI flag in `common/arg.cpp`; new `input_video` content type in `server-common.cpp`. Multimodal helper additions flow through the upstream-compiled `mtmd-helper.cpp` and `server-common.cpp`; project does not reference any `mtmd_helper_video_*` symbol — no project source changes required. Could be exposed in a future Java API as `InferenceParameters.setVideoPath(String)`|
+
|~b9555–b9621 |`common/common.h`| New `common_params` fields: `path_prompts_log_dir` (prompt-logging output directory, string) and `mtmd_batch_max_tokens` (multimodal batch token limit, default 1024). Both additive with harmless defaults. Not surfaced by `ModelParameters` today — could be added in a future enhancement. No project source changes required |
+
|~b9555–b9621 |`src/llama-ext.h`| New EAGLE3 speculative-decoding support APIs: `llama_set_embeddings_layer_inp(ctx, lid, value)`, `llama_get_embeddings_layer_inp(ctx, lid)`, `llama_model_target_layer_ids(model)` → `const int32_t*`, `llama_model_target_layer_ids_n(model)` → `uint32_t`. New `LLM_ARCH_EAGLE3` model architecture; new `llama_model_eagle3` struct in upstream model sources. The EAGLE3 support includes a full encoder+decoder graph implementation for speculative decoding. All of these are consumed inside upstream-compiled `speculative.cpp` and model TUs, and the project does not reference any of the symbols, so no project source changes are required. Could be exposed later as a speculative-decoding backend type in `ModelParameters`|
+
|~b9555–b9621 |`src/llama-graph.h` + `src/llama-graph.cpp`|`llm_graph_result::set_outputs()` signature changed: it now takes a `const llm_graph_params &` parameter (previously it took none). New `t_layer_inp` vector added to `llm_graph_result` for layer-input embedding extraction (used by EAGLE3). Internal graph-building API that is not called from project sources; no project source changes required |
+
|~b9555–b9621 |`src/llama-context.cpp`|`llama_context` now initializes `embeddings_layer_inp` storage for EAGLE3 layer-input extraction; `n_outputs_max` is forced to `n_batch` when `llama_model_has_encoder()` returns true (encoder models always need all outputs). Internal context lifecycle; no project sources reference these fields — no project source changes required |
+
|~b9555–b9621 |`vendor/cpp-httplib/httplib.h` + `httplib.cpp`| cpp-httplib bumped to v0.47.0. Compiled automatically via FetchContent — no project source changes required |
+
|~b9555–b9621 |`ggml/src/ggml-cuda/ggml-cuda.cu`|`ggml_concat` on CUDA now handles F16, BF16, I8, I16, I32, I64 element types in addition to F32; `active_count` tracking added to CUDA context to prevent memory leak from lazy `cudaMemGetInfo` context creation. Internal CUDA backend, no project changes required |
+
|~b9555–b9621 |`ggml/src/ggml-vulkan/` + Vulkan shaders | New `VK_VALVE_shader_mixed_float_dot_product` extension support for F16→F32 fused dot products (`dot2_f16`) in flash attention and GEMM matmul. Internal Vulkan backend, no project changes required |
+
|~b9555–b9621 |`ggml/src/ggml-opencl/` + OpenCL kernels | New Q5_0 and Q5_1 GEMM/GEMV noshuffle kernels for Qualcomm Adreno GPUs. Internal OpenCL backend (affects `opencl-android-aarch64` classifier build only); no project source changes required |
+
|~b9555–b9621 |`ggml/src/ggml-cuda/ssm-scan.cu`| Added `__syncthreads()` before the final reduction stage to prevent shared-memory race conditions on multi-warp SSM scan. Bug fix, internal CUDA backend, no project changes required |
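The Metal im2col selector fix recorded in the ~b9553–b9555 row is easier to see with numbers. The sketch below is not the upstream Metal code: the two function names are hypothetical, and it only reproduces the arithmetic stated in that row (`KH = is_2D ? ne01 : 1`, `KW = ne00`). It also assumes that the second shape in the new test case, `{3,384,384,1}`, is the convolution kernel, so `ne00 = 3` and `ne01 = 384`.

```cpp
#include <cstdint>

// Quantity the selector keyed off before the fix: the raw product of the
// first two kernel dimensions, regardless of whether the conv is 1-D or 2-D.
int64_t raw_product(int64_t ne00, int64_t ne01) {
    return ne00 * ne01;
}

// Quantity used after the fix: the actual conv-kernel footprint KH*KW.
// For a 1-D convolution the second dimension is not a kernel height,
// so KH collapses to 1.
int64_t kernel_footprint(int64_t ne00, int64_t ne01, bool is_2D) {
    const int64_t KH = is_2D ? ne01 : 1;
    const int64_t KW = ne00;
    return KH * KW;
}
```

Under that assumption, the 1-D case gives a raw product of 3 * 384 = 1152 but a footprint of only 3, which is why the old rule could pick the wrong kernel variant for 1-D convolutions. For a 2-D kernel the two values coincide, so 2-D selection is unaffected.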