Commit f21ebd5

Upgrade llama.cpp from b9637 to b9642
No project source changes required: the b9637..b9642 range only touches the CUDA/WebGPU backends, the Python LoRA converter, an upstream backend-ops test, and the WebUI. None of the headers consumed by jllama.cpp / server-* / utils.hpp changed.

- CMakeLists.txt: GIT_TAG b9637 -> b9642
- README.md: badge + release link
- CLAUDE.md: pinned-version line + FetchContent note
- docs/history: appended b9637..b9642 breaking-changes rows
1 parent f543cb8 commit f21ebd5

4 files changed

Lines changed: 8 additions & 4 deletions

CLAUDE.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9637**
+Current llama.cpp pinned version: **b9642**
 
 ## Upgrading CUDA Version
 
@@ -590,7 +590,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9637`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9642`.
 
 ```
 build/_deps/llama.cpp-src/tools/server/ ← server-task.h, server-common.h, etc.

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9637
+GIT_TAG b9642
 )
 FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9637](https://img.shields.io/badge/llama.cpp-%23b9637-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9637)
+[![llama.cpp b9642](https://img.shields.io/badge/llama.cpp-%23b9642-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9642)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

docs/history/llama-cpp-breaking-changes.md

Lines changed: 4 additions & 0 deletions
@@ -352,3 +352,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9621–b9637 | `ggml/src/ggml-vulkan/` + shaders | Unary shaders consolidated into one templated `unary.comp`; new `EXPM1` Vulkan op; GLU push-constants reworked (per-dim strides + misalign offsets); fastdiv `L` values byte-packed to stay under the 128B push-constant limit. Internal Vulkan backend — the project builds CPU/CUDA/Metal/OpenCL only, never Vulkan. No project changes required |
 | b9621–b9637 | `tools/server/server-http.cpp`, `tools/ui/`, `scripts/ui-assets.cmake` | Optional gzip-compressed WebUI asset serving (`LLAMA_UI_GZIP`, `llama_ui_use_gzip()`). The project compiles `server-context/queue/task/models` but not `server-http.cpp` or `tools/ui`, so the HTTP/WebUI layer is absent from `jllama`. No project changes required |
 | b9621–b9637 | `tools/cli/cli.cpp`, `.devops/*.Dockerfile`, `.github/`, `conversion/`, `convert_hf_to_gguf_update.py`, `gguf-py/`, `models/templates/Cohere2MoE.jinja`, `docs/`, `tests/` | CLI preserved-token wiring, Docker image `docker.io/` prefixes, CI labeler/release tweaks, Python GGUF converters, the new model template asset, doc typos, and upstream tests. None are compiled into `jllama` or shipped by the project. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-cuda/ggml-cuda.cu` | `ggml_backend_cuda_device_supports_op` for `GGML_OP_REPEAT` tightened: the supported-types check changed from a blocklist (`!= I32 && != I16`) to an allowlist (`== F32 \|\| == F16`), because the CUDA REPEAT path only implements F32/F16 and other types asserted at runtime. Internal CUDA backend; the project switches on no op-support enum and never calls this. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl` | WebGPU matmul shared-memory dequant templates rewritten: legacy/k-quant `#elif` chains converted to independent `#if defined(...)` blocks, and the i-quant (super-block 256) IQ1/IQ2/IQ3/IQ4 paths reworked to process `NQ` quants per thread with vectorized `store_shmem_iquants`/`create_iq_gw4` helpers. Internal WebGPU backend — the project builds CPU/CUDA/Metal/OpenCL only, never WebGPU. No project changes required |
+| b9637–b9642 | `tools/ui/`, `tools/ui/src/lib/utils/heic-to-jpeg.ts` (new) | WebUI gains a "render thinking as Markdown" display setting and client-side HEIC/HEIF image upload support (lazy CDN-loaded `heic-to` decoder → JPEG). The project compiles `server-context/queue/task/models` but not `tools/ui`, so the WebUI is absent from `jllama`. No project changes required |
+| b9637–b9642 | `convert_lora_to_gguf.py`, `tests/test-backend-ops.cpp` | LoRA converter now resolves the base-model architecture via `get_model_architecture(hparams, ModelType.TEXT)` instead of hand-reading `text_config`/`architectures`; a `GGML_TYPE_BF16` `test_repeat` case was added to the backend-ops test. Python tooling and an upstream test — neither is compiled into `jllama`. No project changes required |
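
The blocklist-to-allowlist change described in the `ggml-cuda.cu` row can be illustrated with a minimal sketch. This is not the upstream code: `TensorType`, `repeat_supported_blocklist`, and `repeat_supported_allowlist` are hypothetical stand-ins that only mirror the two conditions quoted in the row (`!= I32 && != I16` versus `== F32 || == F16`). It shows why a type such as BF16, which is neither blocked nor implemented, would be reported as supported under the old shape of the check and rejected under the new one.

```cpp
#include <cassert>

// Hypothetical stand-in for the tensor type enum; values are arbitrary and
// only the type names mentioned in the table rows are included.
enum class TensorType { F32, F16, BF16, I16, I32 };

// Old shape of the check (blocklist): every type except I32/I16 is reported
// as supported, including types the kernel does not implement.
constexpr bool repeat_supported_blocklist(TensorType t) {
    return t != TensorType::I32 && t != TensorType::I16;
}

// New shape of the check (allowlist): only F32/F16 are reported as supported.
constexpr bool repeat_supported_allowlist(TensorType t) {
    return t == TensorType::F32 || t == TensorType::F16;
}

// BF16 is the case where the two predicates disagree.
static_assert(repeat_supported_blocklist(TensorType::BF16),
              "blocklist lets BF16 through");
static_assert(!repeat_supported_allowlist(TensorType::BF16),
              "allowlist rejects BF16");

// Runtime spot-check of the types on which both predicates agree.
inline void check_repeat_examples() {
    assert(repeat_supported_allowlist(TensorType::F32));
    assert(repeat_supported_allowlist(TensorType::F16));
    assert(!repeat_supported_allowlist(TensorType::I32));
    assert(!repeat_supported_blocklist(TensorType::I16));
}
```

With an allowlist, a caller that asks whether the op is supported gets `false` for an unimplemented type and can choose another path, instead of reaching a runtime assertion inside the kernel.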
