Commit 9bb4b66

Upgrade llama.cpp from b9333 to b9354
No breaking API changes requiring project source modifications.
New additions compiled automatically via FetchContent:
- New Talkie model architecture (LLM_ARCH_TALKIE, NEOX rope, logit scale)
- New LLAMA_VOCAB_PRE_TYPE_MINICPM5 tokenizer pre-type
- Mistral3 NVFP4 scale tensor bug fix in build_ffn/build_moe_ffn
- Server HTTP: https:// prefix when SSL enabled (listening_address fix)
- SYCL virtual memory pool (GGML_SYCL_ENABLE_VMM)
- CUDA FWHT graceful fallback (bool return instead of ABORT)
- Vulkan conv2d cm1 cooperative matrix support
- WebGPU MMVQ mat-vec path using packed_4x8_integer_dot_product

https://claude.ai/code/session_011iZwreRR2WrGzK4WN6oo98
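The `listening_address` fix listed in the commit message can be pictured with a minimal sketch. This is not the upstream implementation: `http_context_sketch` and `make_listening_address` are hypothetical names introduced for illustration, and only the `is_ssl` field name is taken from this commit's CLAUDE.md notes. The point is simply that the reported scheme follows the SSL setting instead of being hard-coded to `http://`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the relevant part of the server HTTP context.
// Only the `is_ssl` field name comes from the commit notes.
struct http_context_sketch {
    bool is_ssl = false;
};

// Build the address string reported once the server is listening.
// Before the fix the scheme was always "http://"; here it follows is_ssl.
std::string make_listening_address(const http_context_sketch & ctx,
                                   const std::string & host, int port) {
    assert(port > 0 && port <= 65535); // sanity check for this sketch only
    const char * scheme = ctx.is_ssl ? "https://" : "http://";
    return std::string(scheme) + host + ":" + std::to_string(port);
}
```

Called with `is_ssl` left at its default, the helper yields an `http://` address; setting `is_ssl = true` switches the prefix to `https://` with no other change at the call site.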
1 parent 1092625 commit 9bb4b66

3 files changed

Lines changed: 11 additions & 3 deletions

CLAUDE.md

Lines changed: 9 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9333**
+Current llama.cpp pinned version: **b9354**

 ## Upgrading CUDA Version

@@ -434,6 +434,14 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9305–b9333 | `src/llama-arch.cpp` | `LLM_TENSOR_FFN_LATENT_DOWN` and `LLM_TENSOR_FFN_LATENT_UP` probe op changed from `GGML_OP_MUL` to `GGML_OP_MUL_MAT`; fixes Nemotron 3 Super latent projections not staying on GPU (buft probe must use `MUL_MAT` to keep them there); internal upstream fix, no project changes required |
 | ~b9305–b9333 | `vendor/cpp-httplib/httplib.{h,cpp}` | Bumped to v0.45.1: `close_socket`, `shutdown_socket`, `Server::stop` marked `noexcept`; macOS Keychain cert loading migrated from deprecated `SecTrustCopyAnchorCertificates` to `SecTrustSettingsCopyCertificates` (all three trust domains: system, admin, user); `CPPHTTPLIB_USE_CERTS_FROM_MACOSX_KEYCHAIN` now restricted to `TARGET_OS_OSX` only with compile-time `#error` on iOS/tvOS/watchOS; compiled automatically, no project changes required |
 | ~b9305–b9333 | `common/common.h` | New `string_lcs(std::string_view a, std::string_view b)` function (longest common substring via DP); additive, not used by project directly |
+| ~b9333–b9354 | `src/models/talkie.cpp` (new) + `src/llama-arch.h/cpp` + `src/llama-model.cpp` + `src/llama-vocab.cpp/h` | New Talkie model architecture (`LLM_ARCH_TALKIE`); uses NEOX rope type; embedding skip connections via `out_scale`; per-head Q gain via `attn_q_norm`; logit scale; new `LLAMA_VOCAB_PRE_TYPE_MINICPM5 = 52` ("minicpm5" pre-type with `ignore_merges = true`); "talkie" tokenizer_pre mapped to GPT4O; `Gemma4ForCausalLM` registered as Gemma4 in HF conversion map; all additive, no project source changes required |
+| ~b9333–b9354 | `src/models/mistral3.cpp` | Dense FFN now passes `ffn_up_s`/`ffn_gate_s`/`ffn_down_s` instead of `nullptr`; MoE passes `ffn_up_exps_s`/`ffn_gate_exps_s`/`ffn_down_exps_s` to `build_moe_ffn`; bug fix for NVFP4 Mistral3/Mistral-MoE models; upstream only, no project changes required |
+| ~b9333–b9354 | `tools/server/server-http.h` + `server-http.cpp` | `bool is_ssl = false` field added to `server_http_context`; `listening_address` now uses `https://` prefix when SSL is configured (was always `http://`); compiled from upstream, no project changes required |
+| ~b9333–b9354 | `ggml/src/ggml-sycl/ggml-sycl.cpp` | Virtual memory pool (`ggml_sycl_pool_vmm`) implemented when `SYCL_EXT_ONEAPI_VIRTUAL_MEM` is available; `GGML_SYCL_ENABLE_VMM` env var (default `1`) controls it; `DEBUG_SYCL_MALLOC` compile flag for verbose allocation logging; `vmm_granularity` field in `sycl_device_info`; internal SYCL backend, no project changes required |
+| ~b9333–b9354 | `ggml/src/ggml-cuda/fwht.cu` + `fwht.cuh` | `ggml_cuda_op_fwht` return type changed `void` → `bool`; returns `false` for non-contiguous tensors or unsupported N values instead of calling `GGML_ABORT`; caller in `ggml-cuda.cu` now skips FWHT gracefully; internal CUDA backend, no project changes required |
+| ~b9333–b9354 | `ggml/src/ggml-vulkan/ggml-vulkan.cpp` + `conv2d_mm.comp` | Cooperative matrix 1 (cm1) path for conv2d; new `CONV_SHAPE_64x128` tile size; `aligned` spec constant skips bounds checks when K/CRS/NPQ are tile-aligned; `csh_store` stages cm2/cm1 output through shared memory for coalesced global stores; internal Vulkan backend, no project changes required |
+| ~b9333–b9354 | `ggml/src/ggml-webgpu/` | New MMVQ path for mat-vec using `packed_4x8_integer_dot_product`; legacy `mul_mat.wgsl` removed (replaced by register-tile path); new `quantize_q8.wgsl` and `mul_mat_vec_q_acc.tmpl`; vendor and dot-product capability detection at init; `q8_1.m` renamed to `q8_1.s` in WGSL struct; internal WebGPU backend, no project changes required |
+| ~b9333–b9354 | upstream CI (`.github/workflows/`) | CANN and SYCL builds disabled to save Actions resources; macOS builds moved to `build-apple.yml`; cache keys prefixed with `cache-gha-`; `[no release]` commit message token skips release pipeline; no project changes required |

 ## Build Commands

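The FWHT row above describes a change in error-handling style: the operation reports `false` for inputs it cannot handle, and the caller skips it instead of the process aborting. A rough CPU-only sketch of that pattern follows. It is not the CUDA code from upstream. `fwht_in_place` and `fwht_batch` are hypothetical names, and treating "supported" as "non-zero power-of-two length" is an assumption made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a Fast Walsh-Hadamard Transform op.
// Assumption for this sketch: only non-zero power-of-two lengths are supported.
// Returns false (and leaves x untouched) instead of aborting on other sizes.
bool fwht_in_place(std::vector<float> & x) {
    const std::size_t n = x.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        return false; // unsupported size: report it and let the caller decide
    }
    // Standard unnormalized butterfly: combine pairs at distance len.
    for (std::size_t len = 1; len < n; len <<= 1) {
        for (std::size_t i = 0; i < n; i += 2 * len) {
            for (std::size_t j = i; j < i + len; ++j) {
                const float a = x[j];
                const float b = x[j + len];
                x[j]       = a + b;
                x[j + len] = a - b;
            }
        }
    }
    return true;
}

// Caller-side pattern: transform what can be transformed, skip the rest.
// Returns how many rows were actually transformed.
std::size_t fwht_batch(std::vector<std::vector<float>> & rows) {
    std::size_t transformed = 0;
    for (auto & row : rows) {
        if (fwht_in_place(row)) {
            ++transformed; // a false return is a skip, not a fatal error
        }
    }
    return transformed;
}
```

The design choice being illustrated is that the decision about what to do with an unsupported input moves from the operation to its caller, which can fall back or skip without taking the whole process down.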
CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9333
+GIT_TAG b9354
 )
 FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 [![ArchUnit](https://img.shields.io/badge/tested%20with-ArchUnit-c71a36)](https://www.archunit.org)
 [![SpotBugs](https://img.shields.io/badge/analyzed%20with-SpotBugs-3b5998)](https://spotbugs.github.io)
 [![JMH](https://img.shields.io/badge/benchmarked%20with-JMH-brightgreen)](https://github.com/openjdk/jmh)
-[![llama.cpp b9333](https://img.shields.io/badge/llama.cpp-%23b9333-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9333)
+[![llama.cpp b9354](https://img.shields.io/badge/llama.cpp-%23b9354-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9354)
 [![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
 [![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)
