Merge pull request #125 from bernardladenthin/claude/update-b9106-compatibility-SPPw1

bernardladenthin · web-flow · commit 62274cab841a · 2026-05-11T18:00:30.000+02:00
Upgrade llama.cpp to b9106
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9103**
+Current llama.cpp pinned version: **b9106**
 
 ## Upgrading CUDA Version
 
@@ -250,6 +250,9 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9094–b9102 | `src/llama-model.cpp` | `ggml/src/ggml-virtgpu/ggml-backend-device.cpp` gains `#include <mutex>` for `std::once_flag`; internal backend fix, no project changes required |
 | ~b9094–b9102 | `vendor/cpp-httplib/httplib.cpp` + `httplib.h` | Security fix: chunk-size parsing replaced `strtoul` with manual hex-digit scanning to prevent overflow and reject invalid chunk extensions; version bumped to 0.43.4; compiled automatically, no project changes required |
 | ~b9102–b9103 | `vendor/cpp-httplib/httplib.cpp` + `httplib.h` | cpp-httplib bumped to v0.44.0: (1) RFC 9110 §5.5 compliance — header field values are no longer percent-decoded by the recipient in `parse_header`; `Location`/`Referer` special-casing removed; callers that need URI-component decoding must call `decode_uri_component()` explicitly; (2) `ThreadPool` constructor is now exception-safe — if thread creation fails partway through, already-started workers are signalled to exit and joined before rethrowing, preventing `std::terminate` from joinable threads in the destructor; compiled automatically, no project changes required |
+| ~b9103–b9106 | `ggml/src/ggml-vulkan/ggml-vulkan.cpp` + Vulkan shaders | Vulkan flash attention refactored: `pipeline_flash_attn_f32_f16` changed from a per-type array of maps to a single map; mixed K/V quant types (e.g. Q4_0 K + F16 V) now supported on all Vulkan FA paths (scalar, cm1, cm2) rather than coopmat2 only; per-type SPIR-V variants replaced by two generic modules (`flash_attn_f32_f16` and `flash_attn_f32_f16_int8`) that select K/V type at runtime via `FaTypeK`/`FaTypeV` spec constants; new `flash_attn_dequant.glsl` contains aliased SSBO views and an uber `dequantize4()` switch; the K/V type mismatch guard removed from `ggml_backend_vk_device_supports_op`; internal Vulkan backend refactor, no project changes required |
+| ~b9103–b9106 | `ggml/src/ggml-cuda/argsort.cu` | Added `#include <cuda/iterator>` for CCCL ≥ 3.1 strided-iterator path; internal CUDA backend, no project changes required |
+| ~b9103–b9106 | `convert_hf_to_gguf.py` | Mistral Medium 3.5 mmproj support: `n_embd_text` now reads `"dim"` key instead of `"hidden_dim"`; negative `img_break_tok_id` placeholders resolved from `tekken.json` or `tokenizer.json`; conversion tool only, no project changes required |
 
 ## Build Commands
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -97,7 +97,7 @@ set(GGML_AVX512  OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9103
+	GIT_TAG        b9106
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
-[![llama.cpp b9103](https://img.shields.io/badge/llama.cpp-%23b9103-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9103)
+[![llama.cpp b9106](https://img.shields.io/badge/llama.cpp-%23b9106-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9106)
 [![Snapshot](https://img.shields.io/badge/snapshot-latest-informational)](https://github.com/bernardladenthin/java-llama.cpp/releases/tag/snapshot)
 
 # Java Bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp)

Original file line number	Diff line number	Diff line change
`@@ -97,7 +97,7 @@ set(GGML_AVX512 OFF CACHE BOOL "" FORCE)`
`97`	`97`	`FetchContent_Declare(`
`98`	`98`	`llama.cpp`
`99`	`99`	`GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git`
`100`		`- GIT_TAG b9103`
	`100`	`+ GIT_TAG b9106`
`101`	`101`	`)`
`102`	`102`	`FetchContent_MakeAvailable(llama.cpp)`
`103`	`103`