Commit acf7052

Upgrade llama.cpp from b9621 to b9637
No breaking API changes: none of the project's include surface (common.h, chat.h, speculative.h, mtmd.h, llama-cpp.h, arg.h, llama.h, download.h) is touched. The upgrade is purely additive.

New capabilities gained automatically (no project code needed):
- Cohere2 MoE ("North Code") model arch (MoE + MTP/NextN) with a dedicated chat parser, auto-detected via the existing specialized-template path.
- Jinja chat-template engine fixes (count/d/e filter aliases, negative-step slicing, empty-separator split guard, empty-old_str replace).

Vulkan unary-shader consolidation + EXPM1, WebUI gzip serving, and CLI/Docker/CI/Python-converter changes are all in TUs the project does not compile or ship.

Verified: CMake configures cleanly against b9637 (ggml 0.15.1, CPU backend). docs/history/llama-cpp-breaking-changes.md gains the b9621-b9637 rows.

https://claude.ai/code/session_01EQJCrQGmxCBf8WTCDuFE3X
1 parent 69a7ab0 commit acf7052

4 files changed

Lines changed: 9 additions & 3 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9621**
+Current llama.cpp pinned version: **b9637**

 ## Upgrading CUDA Version

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9621
+GIT_TAG b9637
 )
 FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9621](https://img.shields.io/badge/llama.cpp-%23b9621-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9621)
+[![llama.cpp b9637](https://img.shields.io/badge/llama.cpp-%23b9637-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9637)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)

docs/history/llama-cpp-breaking-changes.md

Lines changed: 6 additions & 0 deletions
@@ -346,3 +346,9 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9555–b9621 | `ggml/src/ggml-vulkan/` + Vulkan shaders | New `VK_VALVE_shader_mixed_float_dot_product` extension support for F16→F32 fused dot products (`dot2_f16`) in flash attention and GEMM matmul. Internal Vulkan backend, no project changes required |
 | ~b9555–b9621 | `ggml/src/ggml-opencl/` + OpenCL kernels | New Q5_0 and Q5_1 GEMM/GEMV noshuffle kernels for Qualcomm Adreno GPUs. Internal OpenCL backend (affects `opencl-android-aarch64` classifier build only); no project source changes required |
 | ~b9555–b9621 | `ggml/src/ggml-cuda/ssm-scan.cu` | Added `__syncthreads()` before the final reduction stage to prevent shared-memory race conditions on multi-warp SSM scan. Bug fix, internal CUDA backend, no project changes required |
+| b9621–b9637 | `common/chat.cpp` | New Cohere2 MoE ("North Code") chat parser `common_chat_params_init_cohere2moe` + auto-detection (template containing `<\|START_TEXT\|>` and `<\|START_ACTION\|>`). Purely additive: compiled in the `chat.cpp` TU and reached through the existing specialized-template path, so the project's `oaicompat_chat_params_parse` picks it up automatically. No project source changes required. **New feature:** Cohere2 MoE reasoning + JSON tool-call chat support |
+| b9621–b9637 | `common/jinja/runtime.cpp`, `common/jinja/value.cpp` | Jinja chat-template engine fixes: filter aliases `count`→`length`, `d`→`default`, `e`→`escape`; negative-step slice start/stop defaults; `split` raises on empty separator; `replace('', x)` now expands between every char. Compiled into `common`; improves chat-template compatibility automatically. No project source changes required |
+| b9621–b9637 | `src/llama-arch.{h,cpp}`, `src/models/cohere2moe.cpp` (new), `src/models/models.h`, `src/llama-model.cpp`, `src/llama-model-saver.cpp`, `src/llama-vocab.cpp` | New `LLM_ARCH_COHERE2MOE` architecture (MoE + MTP/NextN) with `llama_model_cohere2moe`; `cohere2moe` tokenizer pre-type (maps to `LLAMA_VOCAB_PRE_TYPE_TINY_AYA`); Cohere2 dense path gains `ffn_*_s` NVFP4 scale tensors; tied-NVFP4-`output` assert relaxed to allow sidecar LM-head scales. Additive enum/struct internal to libllama; the project includes `llama.h`, not `llama-arch.h`/`models.h`, and switches on no arch enum. No project source changes required. **New feature:** loads North-Mini-Code GGUFs |
+| b9621–b9637 | `ggml/src/ggml-vulkan/` + shaders | Unary shaders consolidated into one templated `unary.comp`; new `EXPM1` Vulkan op; GLU push-constants reworked (per-dim strides + misalign offsets); fastdiv `L` values byte-packed to stay under the 128B push-constant limit. Internal Vulkan backend; the project builds CPU/CUDA/Metal/OpenCL only, never Vulkan. No project changes required |
+| b9621–b9637 | `tools/server/server-http.cpp`, `tools/ui/`, `scripts/ui-assets.cmake` | Optional gzip-compressed WebUI asset serving (`LLAMA_UI_GZIP`, `llama_ui_use_gzip()`). The project compiles `server-context/queue/task/models` but not `server-http.cpp` or `tools/ui`, so the HTTP/WebUI layer is absent from `jllama`. No project changes required |
+| b9621–b9637 | `tools/cli/cli.cpp`, `.devops/*.Dockerfile`, `.github/`, `conversion/`, `convert_hf_to_gguf_update.py`, `gguf-py/`, `models/templates/Cohere2MoE.jinja`, `docs/`, `tests/` | CLI preserved-token wiring, Docker image `docker.io/` prefixes, CI labeler/release tweaks, Python GGUF converters, the new model template asset, doc typos, and upstream tests. None are compiled into `jllama` or shipped by the project. No project changes required |
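
Reviewer note on the `common/jinja` row: the fixes listed there line up with what the Python Jinja2 implementation does, since it delegates these operations to ordinary Python string and sequence methods. The sketch below shows those reference semantics in plain Python. It does not call llama.cpp, the helper names are hypothetical, and it is only an assumption that the upstream C++ engine now matches these results exactly (for example at the start and end of the string in the empty-`replace` case).

```python
# Reference semantics only: plain Python, no llama.cpp or Jinja import.
# Helper names are hypothetical and exist purely for illustration.

# Alias -> canonical filter name, as listed in the history row.
FILTER_ALIASES = {"count": "length", "d": "default", "e": "escape"}


def reverse_slice(items):
    # Negative step with omitted start/stop: start defaults to the last
    # element and stop to "before the first", so the sequence is reversed.
    return items[::-1]


def replace_empty(text, new):
    # Replacing the empty string inserts `new` at every character boundary.
    return text.replace("", new)


def split_empty_raises(text):
    # Splitting on an empty separator is an error, not a silent no-op.
    try:
        text.split("")
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(reverse_slice([1, 2, 3, 4]))   # [4, 3, 2, 1]
    print(replace_empty("abc", "-"))     # -a-b-c-
    print(split_empty_raises("a,b"))     # True
```

A chat template that relies on `| count`, `[::-1]`, or `replace('', x)` can be compared against these results when validating the new pin.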
