Merge pull request #203 from bernardladenthin/claude/charming-gauss-9l007

bernardladenthin · web-flow · commit 982240c9a6b8 · 2026-05-31T13:38:55.000+02:00
Upgrade llama.cpp from b9437 to b9442
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9437**
+Current llama.cpp pinned version: **b9442**
 
 ## Upgrading CUDA Version
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9437
+	GIT_TAG        b9442
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@
 [![Lincheck](https://img.shields.io/badge/tested%20with-Lincheck-7F52FF)](https://github.com/JetBrains/lincheck)  
 [![vmlens](https://img.shields.io/badge/tested%20with-vmlens-ff6f00)](https://vmlens.com)  
 [![JMH](https://img.shields.io/badge/benchmarked%20with-JMH-25A162)](https://openjdk.org/projects/code-tools/jmh/)  
-[![llama.cpp b9437](https://img.shields.io/badge/llama.cpp-%23b9437-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9437)  
+[![llama.cpp b9442](https://img.shields.io/badge/llama.cpp-%23b9442-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9442)  
 [![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)  
 [![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)  
 
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
@@ -279,3 +279,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9354–b9437 | `vendor/cpp-httplib/` | Bumped to v0.46.0: adds `Client::set_no_proxy(std::vector&lt;std::string&gt;)` with full hostname-suffix and IPv4/IPv6 CIDR matching; `Server::ThreadPool` constructor is exception-safe (already in v0.45.0); `Client::set_proxy()` now disconnects the held socket immediately so a later proxy change cannot reuse the old TLS session. Compiled automatically, no project changes required |
 | ~b9354–b9437 | `common/arg.cpp` (additive flags) | New `--spec-draft-backend-sampling` / `--no-spec-draft-backend-sampling` (env `LLAMA_ARG_SPEC_DRAFT_BACKEND_SAMPLING`) and `--skip-download` (mapped to `common_params::skip_download`). Both default-on / default-off in a way that preserves current Java behaviour. Consider exposing as `ModelParameters.setSpecDraftBackendSampling(boolean)` and `setSkipDownload(boolean)` in a follow-up &mdash; tracked under Open TODOs |
 | ~b9354–b9437 | `ggml/src/ggml-cuda/common.cuh` | `GGML_CUDA_USE_PDL` gating tightened: for MSVC, now requires CTK &#x2265; 12.3 (was 11.8) due to a compiler bug in the older Windows CUDA toolchains. Project's only CUDA build is Linux (dockcross, CUDA 13.2) so the MSVC gate has no CI impact; Windows CI builds CPU-only |
+| ~b9437–b9442 | `src/llama-vocab.{h,cpp}` + `src/llama-arch.{h,cpp}` | New `LLAMA_VOCAB_PRE_TYPE_WHITESPACE = 53` and `llm_tokenizer_whitespace_session` (used by jina-v2-base-zh embeddings); new "whitespace" tokenizer_model routed as `LLAMA_VOCAB_TYPE_BPE`; new `LLM_KV_TOKENIZER_NORMALIZER_LOWERCASE` key (`tokenizer.ggml.normalizer.lowercase`) read into `llama_vocab::impl::normalizer_lowercase`; new public accessor `llama_vocab::get_normalizer_lowercase()`. All additive &mdash; existing tokenizers untouched; new whitespace + lowercase normalizer is consumed automatically when loading a GGUF that sets these vocabulary keys, no project source or Java API changes required |
+| ~b9437–b9442 | `src/llama.cpp` | `llama_prepare_model_devices()` iGPU collection now appends only the FIRST `GGML_BACKEND_DEVICE_TYPE_IGPU` device (prevents duplicate iGPU registration on multi-iGPU hosts). Behavioural fix, single-line caller in `jllama.cpp` unchanged, no project source changes required |
+| ~b9437–b9442 | `tools/ui/embed.cpp` + `tools/ui/src/...` (Svelte) | Webasset embedder tightened printf format specifiers (`%lu` &#x2192; `%zu` and `PRIx64`); UI settings split `custom` into `customJson` + `customCss`; runtime CSS injection via `<svelte:head>`. Project does not ship the upstream UI, no impact |
+| ~b9437–b9442 | `gguf-py/`, `conversion/` (Python) | New `_set_vocab_whitespace()` helper and `add_normalizer_lowercase()` GGUF writer for the new whitespace tokenizer + lowercase normalizer keys (mirrors the vocab additions above); jina-v2 Roberta-tokenizer path now branches to whitespace when `tokenizer.json` declares a `Whitespace` pre-tokenizer. Python-side only, no impact on the Java/JNI build |
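
The bump above touches the same tag in three places (`CLAUDE.md`, `CMakeLists.txt`, and the README badge), so a small consistency check may help catch a partial upgrade. The following is a minimal sketch, not part of this repository: the helper names `pinned_tags` and `is_consistent` and the regular expressions are assumptions derived only from the lines shown in this diff.

```python
import re

# Hypothetical patterns, one per file touched by this bump.
# They mirror the lines visible in the diff and may need adjusting.
PIN_PATTERNS = {
    "CLAUDE.md": re.compile(r"pinned version: \*\*(b\d+)\*\*"),
    "CMakeLists.txt": re.compile(r"GIT_TAG\s+(b\d+)"),
    "README.md": re.compile(r"llama\.cpp/releases/tag/(b\d+)"),
}


def pinned_tags(contents):
    """Map each file name to the llama.cpp tag found in its text, or None."""
    tags = {}
    for name, pattern in PIN_PATTERNS.items():
        match = pattern.search(contents.get(name, ""))
        tags[name] = match.group(1) if match else None
    return tags


def is_consistent(tags):
    """True when every file yielded a tag and all tags are identical."""
    values = set(tags.values())
    return None not in values and len(values) == 1


# Sample input copied from the '+' lines of this diff.
sample = {
    "CLAUDE.md": "Current llama.cpp pinned version: **b9442**",
    "CMakeLists.txt": "\tGIT_TAG        b9442",
    "README.md": "(https://github.com/ggml-org/llama.cpp/releases/tag/b9442)",
}
```

In a real checkout the dictionary would be filled by reading the three files from disk; the sketch works on strings so it can be run without the repository.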
