Upgrade llama.cpp from b9637 to b9642

claude · claude · commit f21ebd5d69c4 · 2026-06-15T11:30:44.000Z
No project source changes required — the b9637..b9642 range only touches
the CUDA/WebGPU backends, the Python LoRA converter, an upstream backend-ops
test, and the WebUI. None of the headers consumed by jllama.cpp / server-*
/ utils.hpp changed.

- CMakeLists.txt: GIT_TAG b9637 -> b9642
- README.md: badge + release link
- CLAUDE.md: pinned-version line + FetchContent note
- docs/history: appended b9637..b9642 breaking-changes rows
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9637**
+Current llama.cpp pinned version: **b9642**
 
 ## Upgrading CUDA Version
 
@@ -590,7 +590,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9637`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9642`.
 
 ```
 build/_deps/llama.cpp-src/tools/server/   ← server-task.h, server-common.h, etc.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -139,7 +139,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9637
+	GIT_TAG        b9642
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 **Build:**  
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)  
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)  
-[![llama.cpp b9637](https://img.shields.io/badge/llama.cpp-%23b9637-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9637)  
+[![llama.cpp b9642](https://img.shields.io/badge/llama.cpp-%23b9642-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9642)  
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)  
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)  
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)  
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
@@ -352,3 +352,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9621–b9637 | `ggml/src/ggml-vulkan/` + shaders | Unary shaders consolidated into one templated `unary.comp`; new `EXPM1` Vulkan op; GLU push-constants reworked (per-dim strides + misalign offsets); fastdiv `L` values byte-packed to stay under the 128B push-constant limit. Internal Vulkan backend — the project builds CPU/CUDA/Metal/OpenCL only, never Vulkan. No project changes required |
 | b9621–b9637 | `tools/server/server-http.cpp`, `tools/ui/`, `scripts/ui-assets.cmake` | Optional gzip-compressed WebUI asset serving (`LLAMA_UI_GZIP`, `llama_ui_use_gzip()`). The project compiles `server-context/queue/task/models` but not `server-http.cpp` or `tools/ui`, so the HTTP/WebUI layer is absent from `jllama`. No project changes required |
 | b9621–b9637 | `tools/cli/cli.cpp`, `.devops/*.Dockerfile`, `.github/`, `conversion/`, `convert_hf_to_gguf_update.py`, `gguf-py/`, `models/templates/Cohere2MoE.jinja`, `docs/`, `tests/` | CLI preserved-token wiring, Docker image `docker.io/` prefixes, CI labeler/release tweaks, Python GGUF converters, the new model template asset, doc typos, and upstream tests. None are compiled into `jllama` or shipped by the project. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-cuda/ggml-cuda.cu` | `ggml_backend_cuda_device_supports_op` for `GGML_OP_REPEAT` tightened: the supported-types check changed from a blocklist (`!= I32 && != I16`) to an allowlist (`== F32 \|\| == F16`), because the CUDA REPEAT path only implements F32/F16 and other types asserted at runtime. Internal CUDA backend; the project switches on no op-support enum and never calls this. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl` | WebGPU matmul shared-memory dequant templates rewritten: legacy/k-quant `#elif` chains converted to independent `#if defined(...)` blocks, and the i-quant (super-block 256) IQ1/IQ2/IQ3/IQ4 paths reworked to process `NQ` quants per thread with vectorized `store_shmem_iquants`/`create_iq_gw4` helpers. Internal WebGPU backend — the project builds CPU/CUDA/Metal/OpenCL only, never WebGPU. No project changes required |
+| b9637–b9642 | `tools/ui/`, `tools/ui/src/lib/utils/heic-to-jpeg.ts` (new) | WebUI gains a "render thinking as Markdown" display setting and client-side HEIC/HEIF image upload support (lazy CDN-loaded `heic-to` decoder → JPEG). The project compiles `server-context/queue/task/models` but not `tools/ui`, so the WebUI is absent from `jllama`. No project changes required |
+| b9637–b9642 | `convert_lora_to_gguf.py`, `tests/test-backend-ops.cpp` | LoRA converter now resolves the base-model architecture via `get_model_architecture(hparams, ModelType.TEXT)` instead of hand-reading `text_config`/`architectures`; a `GGML_TYPE_BF16` `test_repeat` case was added to the backend-ops test. Python tooling and an upstream test — neither is compiled into `jllama`. No project changes required |
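
The CUDA `GGML_OP_REPEAT` row above describes a switch from a blocklist to an allowlist. A minimal sketch of why that matters for a type such as BF16 follows. The `TensorType` enum and both function names are hypothetical stand-ins for illustration, not the actual ggml definitions or the upstream code.

```cpp
// Hypothetical stand-in for ggml's tensor type enum (illustration only).
enum class TensorType { F32, F16, BF16, I16, I32 };

// Blocklist form: rejects only the types named explicitly, so any type
// nobody thought to list (here BF16) is reported as supported.
bool repeat_supported_blocklist(TensorType t) {
    return t != TensorType::I32 && t != TensorType::I16;
}

// Allowlist form: accepts only the types the kernel is said to implement,
// so unlisted types are reported as unsupported instead of failing later.
bool repeat_supported_allowlist(TensorType t) {
    return t == TensorType::F32 || t == TensorType::F16;
}
```

Under these assumptions, BF16 passes the blocklist check but fails the allowlist check, which matches the row's note that types other than F32/F16 previously asserted at runtime.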
