Commit 6c8641b

Upgrade llama.cpp from b9279 to b9284
b9284 flipped LLAMA_BUILD_APP default to ON; pin it OFF explicitly so the unified binary is not configured when consumed via FetchContent. Other upstream changes (tool *-impl libraries switched to default type, hybriddna k-mer marker fix, CUDA PDL arch gate) are internal and require no project changes. https://claude.ai/code/session_01JrzzN8oBCjasZMQ6M1aXUc
1 parent fc55802 commit 6c8641b
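The pin described in the commit message relies on CMake's rule that `option()` leaves an existing cache entry alone. The following is a minimal, self-contained sketch of that behaviour, not the project's actual build file: the project name `pin_demo`, the option description string, and the `FATAL_ERROR` guard are assumptions added for illustration, and the `option()` call only stands in for what the fetched llama.cpp `CMakeLists.txt` is assumed to do.

```cmake
cmake_minimum_required(VERSION 3.14)
project(pin_demo LANGUAGES NONE)  # hypothetical project name

# Pin the option in the cache BEFORE the dependency's CMakeLists.txt runs.
# option() does not overwrite an existing cache entry, so the upstream
# default (ON as of b9284, per this commit) is never applied.
set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)

# Stand-in for the dependency declaring its option with the new default.
# The description string is an assumption, not copied from upstream.
option(LLAMA_BUILD_APP "build the unified binary" ON)

# Hypothetical guard, not part of this commit: fail at configure time
# if something overrides the pin.
if(LLAMA_BUILD_APP)
    message(FATAL_ERROR "LLAMA_BUILD_APP was expected to stay OFF")
endif()

message(STATUS "LLAMA_BUILD_APP=${LLAMA_BUILD_APP}")
```

Configured as a standalone `CMakeLists.txt`, this should report `LLAMA_BUILD_APP=OFF`. The ordering matters: the `set(... CACHE ... FORCE)` line has to come before `FetchContent_MakeAvailable`, which is where the commit places it.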

3 files changed

Lines changed: 9 additions & 3 deletions

CLAUDE.md

Lines changed: 5 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9279**
+Current llama.cpp pinned version: **b9284**

## Upgrading CUDA Version

@@ -357,6 +357,10 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
| ~b9264–b9279 | `examples/save-load-state/` removed, `tests/test-save-load-state.cpp` added; `tools/{batched-bench,fit-params,quantize,perplexity}/CMakeLists.txt` | The `llama-save-load-state` example binary was removed and re-homed as a CTest target; the four remaining standalone tools were each split into a `*-impl` static library + a thin `main.cpp` wrapper (mirroring the b9245 split of cli/completion/llama-bench/server), with the entry-point renamed to `llama_batched_bench` / `llama_fit_params` / `llama_quantize` / `llama_perplexity` to satisfy `-Wmissing-declarations`. Project does not compile any of these `.cpp` files (only `server-context.cpp`, `server-queue.cpp`, `server-task.cpp`, `server-models.cpp` — see `CMakeLists.txt`), so no impact |
| ~b9264–b9279 | `app/` (`CMakeLists.txt`, `llama.cpp`) | `llama-app` unified binary gains four new subcommands (`batched-bench`, `fit-params`, `quantize`, `perplexity`) and sets `LLAMA_APP_CMD` in the env before dispatching so that the router can re-inject the subcommand into spawned child argv. Guarded by `LLAMA_BUILD_APP=OFF` default — project doesn't enable it, no impact |
| ~b9264–b9279 | `conversion/base.py` + `conversion/llama.py` | New `_set_vocab_hybriddna()` Python helper that emits a `gpt2`-style BPE vocab tagged as `tokenizer.model = "hybriddna"`; `LlamaModel.set_vocab()` dispatches to it when `tokenizer_config.json` declares `"tokenizer_class": "HybridDNATokenizer"`; `add_prefix_space` handling moved earlier in the same method. Conversion tooling only, not compiled by project |
+| ~b9279–b9284 | upstream `CMakeLists.txt` | `LLAMA_BUILD_APP` default flipped `OFF` → `ON`. Project's `LLAMA_BUILD_TOOLS` is OFF (FetchContent, `LLAMA_STANDALONE=OFF`), so `tools/`-dependent app targets are not configured; nevertheless `CMakeLists.txt:108` now explicitly forces `set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)` to keep the cache pinned across upgrades |
+| ~b9279–b9284 | `tools/{batched-bench,cli,completion,fit-params,llama-bench,perplexity,quantize,server}/CMakeLists.txt` | Each `*-impl` target switched from `add_library(... STATIC ...)` to default library type (becomes SHARED when `BUILD_SHARED_LIBS=ON`); added `WINDOWS_EXPORT_ALL_SYMBOLS ON` and conditional `install(TARGETS ... LIBRARY)` under `LLAMA_TOOLS_INSTALL`. Project doesn't enable `LLAMA_BUILD_TOOLS`, so none of these targets are configured — no impact |
+| ~b9279–b9284 | `src/llama-vocab.cpp` + `conversion/base.py` | HybridDNA tokenizer fix: k-mers are now stored in `token_to_id` with a reserved `\xee\x80\x80` (U+E000) suffix to disambiguate them from identical base-vocab BPE tokens (e.g. `CCCCCC`); the suffix is stripped from `id_to_token` text after vocab load. Pure tokenizer internals, not exposed via JNI — no project changes required |
+| ~b9279–b9284 | `ggml/src/ggml-cuda/common.cuh` | PDL-launch gating now uses `ggml_cuda_highest_compiled_arch(cc) >= GGML_CUDA_CC_HOPPER` instead of the raw device cc — fixes false negatives when running on a Hopper device with a binary compiled for an older arch. Internal CUDA backend, no project changes required |
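As a rough illustration of the `*-impl` library-type change noted in the table, the sketch below shows how an `add_library()` call without an explicit `STATIC` keyword follows `BUILD_SHARED_LIBS`. It is not copied from upstream: the target `demo-impl`, the generated source file, and the `DEMO_TOOLS_INSTALL` option (a stand-in for `LLAMA_TOOLS_INSTALL`) are hypothetical names, and the install destinations are assumptions.

```cmake
cmake_minimum_required(VERSION 3.14)
project(libtype_demo LANGUAGES C)  # hypothetical project name

# Generate a trivial source file so the sketch is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/demo_impl.c"
     "int demo_entry(void) { return 0; }\n")

# No STATIC/SHARED keyword: the library type follows BUILD_SHARED_LIBS
# (static when it is OFF or unset, shared when it is ON).
add_library(demo-impl "${CMAKE_CURRENT_BINARY_DIR}/demo_impl.c")

# Only affects shared libraries built on Windows: export all symbols
# without requiring explicit export annotations in the source.
set_target_properties(demo-impl PROPERTIES WINDOWS_EXPORT_ALL_SYMBOLS ON)

# Hypothetical stand-in for LLAMA_TOOLS_INSTALL.
option(DEMO_TOOLS_INSTALL "install the demo impl library" OFF)
if(DEMO_TOOLS_INSTALL)
    install(TARGETS demo-impl
            LIBRARY DESTINATION lib
            ARCHIVE DESTINATION lib
            RUNTIME DESTINATION bin)
endif()

# Report which type was actually chosen.
get_target_property(demo_type demo-impl TYPE)
message(STATUS "demo-impl type: ${demo_type}")
```

Configuring once with `-DBUILD_SHARED_LIBS=OFF` and once with `-DBUILD_SHARED_LIBS=ON` should report `STATIC_LIBRARY` and `SHARED_LIBRARY` respectively. None of this is reached in the project's build, since per the table the tool targets are not configured at all.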

## Build Commands

CMakeLists.txt

Lines changed: 3 additions & 1 deletion
@@ -105,10 +105,12 @@ set(GGML_FMA ON CACHE BOOL "" FORCE)
set(GGML_F16C ON CACHE BOOL "" FORCE)
set(GGML_AVX512 OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)
+# b9284 flipped LLAMA_BUILD_APP default to ON; we don't build the unified binary
+set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
llama.cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9279
+GIT_TAG b9284
)
FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
**Build:**
![Java 11+](https://img.shields.io/badge/Java-11%2B-informational)
![JUnit](https://img.shields.io/badge/tested%20with-JUnit4-yellow)
-[![llama.cpp b9279](https://img.shields.io/badge/llama.cpp-%23b9279-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9279)
+[![llama.cpp b9284](https://img.shields.io/badge/llama.cpp-%23b9284-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9284)
[![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
[![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)
