Commit cf2d2b3

Upgrade llama.cpp from b9264 to b9279
All upstream changes in this range are additive or internal to llama.cpp. The two files this project compiles from upstream (server-context.cpp, server-models.cpp) receive only additive changes: new slot-info JSON fields, a destructor reorder, and a no-op LLAMA_APP_CMD env-var hook. No project source changes required. Verified: cmake build clean, all 417 C++ tests pass.
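
The destructor reorder mentioned above comes down to a teardown-ordering rule: release whatever holds a non-owning back-reference before releasing the object it points into. The following is a minimal sketch of that rule only. `TeardownLog`, `TargetContext`, `DraftState` and `ServerLike` are hypothetical stand-ins invented for illustration and are not llama.cpp types; the real change lives in `server_context_impl::destroy()`.

```cpp
#include <memory>
#include <string>
#include <vector>

// Shared bookkeeping so the sketch can observe teardown order without
// ever dereferencing a dangling pointer. Hypothetical, not a llama.cpp type.
struct TeardownLog {
    bool target_alive = false;
    std::vector<std::string> events;
};

// Stand-in for the target context that owns the main resources.
struct TargetContext {
    TeardownLog& log;
    explicit TargetContext(TeardownLog& l) : log(l) { log.target_alive = true; }
    ~TargetContext() {
        log.target_alive = false;
        log.events.push_back("target freed");
    }
};

// Stand-in for draft/speculative state holding a back-reference into the target.
struct DraftState {
    TeardownLog& log;
    const TargetContext* target;  // non-owning back-reference
    DraftState(TeardownLog& l, const TargetContext* t) : log(l), target(t) {}
    ~DraftState() {
        // A real destructor might touch *target here, which is only safe
        // while the target still exists. The sketch just records the state.
        log.events.push_back(log.target_alive ? "draft freed (target alive)"
                                              : "draft freed (target already gone)");
    }
};

struct ServerLike {
    std::unique_ptr<TargetContext> target;  // declared first, constructed first
    std::unique_ptr<DraftState> draft;

    explicit ServerLike(TeardownLog& log)
        : target(std::make_unique<TargetContext>(log)),
          draft(std::make_unique<DraftState>(log, target.get())) {}

    // Dependents first, owner last.
    void destroy() {
        draft.reset();
        target.reset();
    }
};

// Runs one construct/destroy cycle and returns the recorded teardown order.
std::vector<std::string> teardown_events() {
    TeardownLog log;
    {
        ServerLike server(log);
        server.destroy();
    }
    return log.events;
}
```

Swapping the two `reset()` calls in `destroy()` would make the draft's destructor run after the target is gone, which is the kind of ordering the upstream fix is described as avoiding.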
1 parent 674314c commit cf2d2b3

3 files changed

Lines changed: 15 additions & 3 deletions

CLAUDE.md

Lines changed: 13 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9264**
+Current llama.cpp pinned version: **b9279**
 
 ## Upgrading CUDA Version

@@ -345,6 +345,18 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9245–b9264 | `conversion/hunyuan.py`, `gguf-py/gguf/constants.py`, `gguf-py/gguf/tensor_mapping.py` | HunyuanOCR / HunyuanVL unified in conversion: `VisionProjectorType.HUNYUANOCR` removed; `HunYuanVLForConditionalGeneration` registers a single `HunyuanVLVisionModel` + `HunyuanVLTextModel`; `vit.perceive.*` tensor mappings now only mention `HunyuanVL`. Python tooling, not compiled by project |
 | ~b9245–b9264 | `CMakeLists.txt` (upstream) | New `LLAMA_BUILD_APP` option (default OFF); deprecation shims for `LLAMA_BUILD_WEBUI`/`LLAMA_USE_PREBUILT_WEBUI` → `LLAMA_BUILD_UI`/`LLAMA_USE_PREBUILT_UI` preserved. Project's `set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)` still works unchanged |
 | ~b9245–b9264 | `.devops/*.Dockerfile`, `.github/workflows/build-and-test-snapdragon.yml`, `scripts/snapdragon/`, `docs/backend/snapdragon/`, `tools/cli/README.md`, `tools/server/README.md`, `tools/mtmd/tests/` | Docker images add `conversion/` dir; snapdragon toolchain bumped v0.3 → v0.6 with `+dotprod+i8mm`; mtmd test rewritten to use CER/chrF metrics; doc-only updates. Not compiled by project |
+| ~b9264–b9279 | `tools/server/server-context.cpp` | Slot-info JSON adds three additive fields (`n_prompt_tokens`, `n_prompt_tokens_processed`, `n_prompt_tokens_cache`) on each in-flight task; `server_context_impl::destroy()` now resets `spec` / `ctx_dft` / `model_dft` BEFORE `llama_init.reset()` to avoid use-after-free when a draft model holds back-references into the target context. Compiled directly into jllama from upstream — no project source changes required |
+| ~b9264–b9279 | `tools/server/server-models.cpp` | Adds `#include <cstdlib>` and a `LLAMA_APP_CMD` env-var lookup in `server_model_meta::update_args()` to re-inject the unified-binary subcommand into router-spawned child argv. Env var is only set by the new `llama-app` binary (which this project does not build), so the lookup harmlessly returns null and the code path is a no-op. Compiled upstream-as-is, no project changes |
+| ~b9264–b9279 | `src/llama-vocab.cpp` | New `hybriddna` BPE tokenizer model (DNA k-mer tokenization with `<dna>…</dna>` tag handling, k=6, OOV fallback) registered as a BPE variant; reached only when GGUF metadata declares `tokenizer.model = "hybriddna"`. Adds a virtual destructor + virtual `tokenize()` to `llm_tokenizer_bpe_session` and a `llm_tokenizer_hybriddna_session` subclass; existing BPE callers unchanged. Additive, no project changes |
+| ~b9264–b9279 | `src/llama-graph.cpp` | `llm_graph_input_attn_kv_iswa::set_input()` / `can_reuse()` now guard the base and SWA tensor accesses behind `if (self_k_idxs && self_k_idxs->buffer)` / `if (self_k_idxs_swa && self_k_idxs_swa->buffer)`. Fixes crashes on models with only-SWA or only-non-SWA attention layers. Internal, no project impact |
+| ~b9264–b9279 | `src/models/qwen35.cpp` + `src/models/qwen35moe.cpp` | MTP draft sub-graph now builds an `inp_out_ids` input and applies `ggml_get_rows(cur, inp_out_ids)` just before the head norm, so only the requested output rows are projected. Bug fix for MTP draft path; internal, no project changes |
+| ~b9264–b9279 | `ggml/src/ggml-backend.cpp` | `ggml_backend_tensor_get_2d()` fast-path condition fixed: now checks `iface.get_tensor_2d == NULL` (was incorrectly checking `set_tensor_2d`), so multi-copy gets correctly fall back to the per-copy loop when the backend lacks `get_tensor_2d`. Bug fix, no project changes |
+| ~b9264–b9279 | `ggml/src/ggml-vulkan/` (`ggml-vulkan.cpp`, new `vulkan-shaders/snake.comp`, `vulkan-shaders-gen.cpp`) | New Vulkan Snake activation fusion: detects the 5-op chain `MUL → SIN → SQR → MUL → ADD` (matching CUDA b9094 introduction) and dispatches a single fused `snake_{f32,f16,bf16}` kernel `y = x + sin(a*x)^2 * inv_b`. New `ggml_vk_can_fuse_snake()` validates contiguity, 2D shape, and broadcast operands `[1, C, 1, 1]`. Internal Vulkan backend, no project changes |
+| ~b9264–b9279 | `ggml/src/ggml-metal/ggml-metal-ops.cpp` + `ggml-metal.metal` | `kernel_concat` / `kernel_set` now batch multiple small rows into one threadgroup (`nrptg = min(256/ne0, ne1)`, capped at 256 threads/group) to improve small-row throughput; `kernel_concat` gains an early-return bounds check. Internal Metal backend, no project changes |
+| ~b9264–b9279 | `ggml/src/ggml-hexagon/` (`ggml-hexagon.cpp`, `htp/ssm-conv.c`, `htp/rope-ops.c`) | SSM_CONV HVX kernel rewritten with VTCM-staged 32×32 fp32 in-register transpose and per-thread tiling (1 MiB VTCM budget); strictly-contiguous gate replaced with byte-stride checks (`nb[0]==sizeof(float)` and `nb[1]==ne[0]*sizeof(float)`); `rope_cache_init` / `mrope_cache_init` marked `__attribute__((noinline))` to reduce code-bloat on Hexagon. Internal Qualcomm DSP backend, no project changes |
+| ~b9264–b9279 | `examples/save-load-state/` removed, `tests/test-save-load-state.cpp` added; `tools/{batched-bench,fit-params,quantize,perplexity}/CMakeLists.txt` | The `llama-save-load-state` example binary was removed and re-homed as a CTest target; the four remaining standalone tools were each split into a `*-impl` static library + a thin `main.cpp` wrapper (mirroring the b9245 split of cli/completion/llama-bench/server), with the entry-point renamed to `llama_batched_bench` / `llama_fit_params` / `llama_quantize` / `llama_perplexity` to satisfy `-Wmissing-declarations`. Project does not compile any of these `.cpp` files (only `server-context.cpp`, `server-queue.cpp`, `server-task.cpp`, `server-models.cpp` — see `CMakeLists.txt`), so no impact |
+| ~b9264–b9279 | `app/` (`CMakeLists.txt`, `llama.cpp`) | `llama-app` unified binary gains four new subcommands (`batched-bench`, `fit-params`, `quantize`, `perplexity`) and sets `LLAMA_APP_CMD` in the env before dispatching so that the router can re-inject the subcommand into spawned child argv. Guarded by `LLAMA_BUILD_APP=OFF` default — project doesn't enable it, no impact |
+| ~b9264–b9279 | `conversion/base.py` + `conversion/llama.py` | New `_set_vocab_hybriddna()` Python helper that emits a `gpt2`-style BPE vocab tagged as `tokenizer.model = "hybriddna"`; `LlamaModel.set_vocab()` dispatches to it when `tokenizer_config.json` declares `"tokenizer_class": "HybridDNATokenizer"`; `add_prefix_space` handling moved earlier in the same method. Conversion tooling only, not compiled by project |
 
 ## Build Commands

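The Vulkan row in the table above describes fusing the 5-op chain `MUL → SIN → SQR → MUL → ADD` into one kernel computing `y = x + sin(a*x)^2 * inv_b`. As a rough CPU-side illustration of why the two forms are equivalent, the sketch below applies the chain step by step and then in a single pass. It assumes scalar `a` and `inv_b` for brevity (the upstream kernel is described as taking per-channel broadcast operands), and the function names `snake_unfused` and `snake_fused` are hypothetical, not ggml APIs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused reference: five element-wise passes, one per op in the chain.
// Scalar a / inv_b is a simplifying assumption of this sketch.
std::vector<float> snake_unfused(const std::vector<float>& x, float a, float inv_b) {
    std::vector<float> t(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = a * x[i];         // MUL
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = std::sin(t[i]);   // SIN
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = t[i] * t[i];      // SQR
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = t[i] * inv_b;     // MUL
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = x[i] + t[i];      // ADD
    return t;
}

// Fused form: one pass per element, no intermediate buffers between ops.
std::vector<float> snake_fused(const std::vector<float>& x, float a, float inv_b) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float s = std::sin(a * x[i]);
        y[i] = x[i] + s * s * inv_b;
    }
    return y;
}
```

The fused version reads each input once and writes each output once, which is the usual motivation for this kind of fusion; the two functions should agree up to floating-point rounding.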
CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
     llama.cpp
     GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-    GIT_TAG b9264
+    GIT_TAG b9279
 )
 FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 11+](https://img.shields.io/badge/Java-11%2B-informational)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit4-yellow)
-[![llama.cpp b9264](https://img.shields.io/badge/llama.cpp-%23b9264-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9264)
+[![llama.cpp b9279](https://img.shields.io/badge/llama.cpp-%23b9279-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9279)
 [![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
 [![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)
