Upgrade llama.cpp from b9150 to b9151

claude · claude · commit aa3346841993 · 2026-05-14T18:04:31.000Z
Key changes in b9151:

- New LOG_TRC macro (trace level between INFO=3 and DEBUG=5)
- New common_params_print_info() consolidates build/device/system info logging; replace the two-line LOG_INF pattern in jllama.cpp with a single call
- common_init() now defaults log prefix and timestamps to true (opt-out via --no-log-prefix / --no-log-timestamps)
- New SLT_TRC / SRV_TRC server macros; many verbose server messages demoted from INF to TRC (less noise at default verbosity)
- server_slot gains periodic in-flight throughput printing (print_timings_tg/pp)

All 417 C++ unit tests pass.

https://claude.ai/code/session_01FFt37e3FpbpFbT7oaPSbLB
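
To illustrate why demoting messages from INF to TRC reduces noise, here is a minimal sketch of level-based filtering. Only the numeric values INFO=3, TRACE=4 and DEBUG=5 come from this commit; the `sketch_*` names, the "emit when level <= threshold" rule, and the threshold values used below are assumptions for illustration and are not taken from llama.cpp's `common/log.h`.

```cpp
#include <string>
#include <utility>

// Hypothetical mirror of the level values quoted in the commit message.
// The identifiers are illustrative, not the upstream macro names.
enum sketch_log_level : int {
    SKETCH_LEVEL_INFO  = 3,
    SKETCH_LEVEL_TRACE = 4, // new in b9151, sits between INFO and DEBUG
    SKETCH_LEVEL_DEBUG = 5, // previously 4 according to the CLAUDE.md notes
};

// Assumed filtering rule: a message is emitted when its level does not
// exceed the configured verbosity threshold.
inline bool sketch_should_log(int level, int verbosity_threshold) {
    return level <= verbosity_threshold;
}

// Returns the space-separated tags that would be visible at a given threshold.
inline std::string sketch_visible_levels(int verbosity_threshold) {
    const std::pair<int, const char *> levels[] = {
        {SKETCH_LEVEL_INFO,  "INF"},
        {SKETCH_LEVEL_TRACE, "TRC"},
        {SKETCH_LEVEL_DEBUG, "DBG"},
    };
    std::string out;
    for (const auto & entry : levels) {
        if (sketch_should_log(entry.first, verbosity_threshold)) {
            if (!out.empty()) {
                out += ' ';
            }
            out += entry.second;
        }
    }
    return out;
}
```

Under these assumptions, a threshold of 3 shows only INF messages, so anything moved to TRC disappears from the output until the threshold is raised to 4 or higher.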
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9150**
+Current llama.cpp pinned version: **b9151**
 
 ## Upgrading CUDA Version
 
@@ -268,6 +268,13 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9145–b9150 | `ggml/src/ggml-vulkan/ggml-vulkan.cpp` | Bug fix: `mul_mat_l_int[i]` / `mul_mat_m_int[i]` / `mul_mat_s_int[i]` / `mul_mat_id_l_int[i]` / `mul_mat_id_m_int[i]` / `mul_mat_id_s_int[i]` were unconditionally set to `true` instead of mirroring the actual device pipeline capabilities from `mul_mat_l[i]` etc.; now properly initialized; internal Vulkan backend bug fix, no project changes required |
 | ~b9145–b9150 | `src/unicode.cpp` | New `unicode_regex_split_custom_qwen35()` function registered for the Qwen 3.5 tokenizer regex pattern; uses `[\p{L}\p{M}]+` letter-plus-combining-mark runs vs. Qwen2's `\p{L}+`; additive internal tokenizer change, no project changes required |
 | ~b9145–b9150 | `ggml/src/ggml-cpu/ggml-cpu-riscv64-spacemit/` | SpaceMIT RISC-V IME backend major refactor: IME2 kernels, expanded quantization (Q2_K, Q3_K, Q6_K, Q8_0, Q5_0, Q5_1, Q5_K, MXFP4), TCM (Tightly Coupled Memory) pool; new source files `ime2_kernels.cpp`, `ime_env.cpp`, `repack.cpp`, `rvv_kernels.cpp`, `spine_mem_pool.cpp`; guarded by `GGML_CPU_RISCV64_SPACEMIT` build flag; no project changes required |
+| ~b9150–b9151 | `common/log.h` | New `LOG_TRC` macro added at `LOG_LEVEL_TRACE = 4` (between INFO=3 and DEBUG=5); `LOG_LEVEL_DEBUG` bumped from 4 to 5; new `LOG_TRCV` verbosity variant; additive, no project changes required |
+| ~b9150–b9151 | `common/common.h` + `common/common.cpp` | New `common_params_print_info(const common_params &)` function: prints verbosity level, per-device memory (name, total, free), and system info at `LOG_INF` level; replaces the two-line pattern `LOG_INF("build_info: %s\n", llama_build_info()); LOG_INF("%s\n", common_params_get_system_info(params).c_str());` — updated in `jllama.cpp` |
+| ~b9150–b9151 | `common/common.cpp` | `common_init()` now unconditionally calls `common_log_set_prefix(…, true)` and `common_log_set_timestamps(…, true)` before setting the log callback; log output will always include prefix and timestamps unless explicitly disabled with `--no-log-prefix` / `--no-log-timestamps` |
+| ~b9150–b9151 | `common/arg.cpp` | `--log-prefix` and `--log-timestamps` now also accept negated forms `--no-log-prefix` / `--no-log-timestamps` (lambda receives a `bool value`); backing env vars renamed `LLAMA_LOG_PREFIX` → `LLAMA_ARG_LOG_PREFIX` and `LLAMA_LOG_TIMESTAMPS` → `LLAMA_ARG_LOG_TIMESTAMPS`; Java layer does not expose these, so no project changes required |
+| ~b9150–b9151 | `tools/server/server-common.h` | New `SLT_TRC` and `SRV_TRC` macros (emit at `LOG_TRC` level); additive, no project changes required |
+| ~b9150–b9151 | `tools/server/server-context.cpp` | New `server_slot::t_print_last` field + `print_timings_tg()` / `print_timings_pp()` methods: emit periodic in-flight token-generation and prompt-processing throughput to `SLT_INF` (throttled to ≥100 decoded tokens and ≥3 s interval); `server_context_impl` constructor now calls `mtmd_helper_log_set` unconditionally (was guarded by `!is_resume`); many `SLT_INF`/`SRV_WRN` downgraded to `SLT_TRC`/`SRV_INF`; compiled from upstream, no project JNI changes required |
+| ~b9150–b9151 | `tools/server/server-task.cpp` | Several `SRV_WRN` calls downgraded to `SRV_INF`; one `SRV_WRN` upgraded to `SRV_ERR` for failed state restore; compiled from upstream, no project changes required |
 
 ## Build Commands
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -107,7 +107,7 @@ set(GGML_AVX512  OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9150
+	GIT_TAG        b9151
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
-[![llama.cpp b9150](https://img.shields.io/badge/llama.cpp-%23b9150-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9150)
+[![llama.cpp b9151](https://img.shields.io/badge/llama.cpp-%23b9151-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9151)
 [![Maven Central](https://img.shields.io/maven-central/v/net.ladenthin/llama)](https://central.sonatype.com/artifact/net.ladenthin/llama)
 [![Snapshot](https://img.shields.io/badge/snapshot-latest-informational)](https://central.sonatype.com/repository/maven-snapshots/net/ladenthin/llama/)
 
diff --git a/src/main/cpp/jllama.cpp b/src/main/cpp/jllama.cpp
@@ -666,8 +666,7 @@ JNIEXPORT void JNICALL Java_net_ladenthin_llama_LlamaModel_loadModel(JNIEnv *env
 
     llama_numa_init(params.numa);
 
-    LOG_INF("build_info: %s\n", llama_build_info());
-    LOG_INF("%s\n", common_params_get_system_info(params).c_str());
+    common_params_print_info(params);
 
     // Resolve the auto sentinel before loading the model.
     if (params.n_parallel <= N_PARALLEL_AUTO) {
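
The CLAUDE.md entry for `tools/server/server-context.cpp` describes the new in-flight throughput reports as throttled to at least 100 decoded tokens and an interval of at least 3 s. The sketch below shows one way such a throttle could work. It is not the upstream implementation: the struct name, the millisecond clock passed in by the caller, and the reading that "100 tokens" means total tokens decoded so far are all assumptions; only the two thresholds and the idea of a "last printed" timestamp (`t_print_last` in the notes) come from the commit.

```cpp
#include <cstdint>

// Hypothetical throttle for periodic progress reports.
struct sketch_throughput_throttle {
    // Time of the last report in milliseconds (loosely modeled on t_print_last).
    int64_t t_print_last_ms = 0;

    static constexpr int     min_tokens      = 100;  // from the commit notes
    static constexpr int64_t min_interval_ms = 3000; // 3 s, from the commit notes

    // Returns true when a report should be printed now, and records the time.
    bool should_print(int n_decoded, int64_t now_ms) {
        if (n_decoded < min_tokens) {
            return false; // too few tokens for a meaningful rate
        }
        if (now_ms - t_print_last_ms < min_interval_ms) {
            return false; // last report was too recent
        }
        t_print_last_ms = now_ms;
        return true;
    }
};
```

A caller would invoke `should_print` once per decode step with the current token count and a monotonic timestamp, and emit the throughput line only when it returns true.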
