Commit 278ba4f

Upgrade llama.cpp from b9297 to b9305
- Bump GIT_TAG, README badge, CLAUDE.md.
- Rename project-side cache pin from LLAMA_BUILD_WEBUI to LLAMA_BUILD_UI: the top-level CMakeLists no longer forwards the deprecated name (the shim survived only in tools/ui/CMakeLists.txt, which is not configured in FetchContent mode).
- No C++ source changes required. server-context.cpp now includes common/fit.h to estimate draft/MTP VRAM when fit_params is on; the include resolves via the existing llama-common include path and the feature is purely additive.

Local: 435/435 ctest pass.
1 parent c07ab4a commit 278ba4f

3 files changed

Lines changed: 19 additions & 4 deletions

CLAUDE.md

Lines changed: 12 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9297**
+Current llama.cpp pinned version: **b9305**

 ## Upgrading CUDA Version

@@ -408,6 +408,17 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9284–b9297 | `ggml/src/ggml-vulkan/CMakeLists.txt` | `find_package(SPIRV-Headers)` switched to `CONFIG REQUIRED` and adds `$ENV{VULKAN_SDK}` to `CMAKE_PREFIX_PATH`; fixes detection when SPIRV-Headers ships only the CMake-config files (no FindSPIRV-Headers.cmake). Internal Vulkan build config, no project changes required |
 | ~b9284–b9297 | `ggml/src/ggml-zendnn/` (`CMakeLists.txt`, `ggml-zendnn.cpp`) | ZenDNN bumped to ZenDNN-2026-WW19; Q8_0 weight support added for matmul and matmul_id paths via dynamic quantization (S8 compute, BF16 scales); ZenDNN matmul/matmul_id now handles `GGML_TYPE_Q8_0` with FP32 src1 directly without F32→Q8_0 conversion. Internal AMD ZenDNN backend, no project changes required |
 | ~b9284–b9297 | `tools/perplexity/perplexity.cpp` | `log_probs.resize(n_ctx * nv)` widened to `size_t(n_ctx) * nv` to avoid 32-bit overflow on large context sizes. Standalone tool not compiled by project, no impact |
+| ~b9297–b9305 | upstream `CMakeLists.txt` | Top-level backward-compat shims that forwarded `LLAMA_BUILD_WEBUI` → `LLAMA_BUILD_UI` and `LLAMA_USE_PREBUILT_WEBUI` → `LLAMA_USE_PREBUILT_UI` were REMOVED (they now live only in `tools/ui/CMakeLists.txt`). **Java impact**: project's `set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)` no longer hits the shim at top level. `tools/ui` is not configured in FetchContent mode (`LLAMA_BUILD_TOOLS=OFF`), so the old setting was inert in practice, but the project's `CMakeLists.txt:107` was renamed to `set(LLAMA_BUILD_UI OFF CACHE BOOL "" FORCE)` for clarity and to defend against future flips of `LLAMA_BUILD_UI` default |
+| ~b9297–b9305 | `common/common.h` | `LLAMA_UI_DEFAULT_ENABLED` macro removed; `common_params::ui` default is now unconditionally `true`. Not referenced by project, no changes required |
+| ~b9297–b9305 | `common/fit.{h,cpp}` | `common_get_device_memory_data()` made non-static and exported from `fit.h` (was a file-local helper). `fit.h` now also pulls in `ggml-backend.h`, `llama.h`, and `../src/llama-ext.h`. Used by upstream `tools/server/server-context.cpp` (compiled directly into jllama). The `#include "../src/llama-ext.h"` resolves relative to fit.h's location (`common/../src/llama-ext.h`), so no extra include paths are required. No project source changes |
+| ~b9297–b9305 | `tools/server/server-context.cpp` | New `#include "fit.h"` and a new draft/MTP memory measurement block: when `params_base.fit_params` is set AND the speculative config includes a draft model or `COMMON_SPECULATIVE_TYPE_DRAFT_MTP`, `common_get_device_memory_data()` is called against the draft model (or a copy of the target params with `LLAMA_CONTEXT_TYPE_MTP` for MTP) and the resulting per-device `model + context + compute` bytes are added to `params_base.fit_params_target` before the target context is fitted. Compiled directly into jllama from upstream; behaviour is additive and only triggers for speculative-decoding setups. `ModelParameters.setFit(boolean)` defaults to `on`, so this kicks in automatically when a user configures a draft model; no Java-side wiring required |
+| ~b9297–b9305 | `tools/server/server-context.cpp` | `[mtmd] estimated memory usage of mmproj` log line reworded to `estimated worst-case memory usage`; log only, no behavioural change |
+| ~b9297–b9305 | `tools/server/server-http.cpp` | UI serving path migrated from per-asset extern arrays (`index_html`, `bundle_js`, …) and the `LLAMA_BUILD_UI` macro to a runtime `llama_ui_find_asset()` lookup gated on the new `LLAMA_UI_HAS_ASSETS` macro generated by the new `llama-ui-embed` host tool. Project does NOT compile `server-http.cpp` (only `server-context.cpp`/`server-queue.cpp`/`server-task.cpp`/`server-models.cpp`), no impact |
+| ~b9297–b9305 | `tools/ui/` (`CMakeLists.txt`, new `embed.cpp`, new `sources.cmake`, new `scripts/ui-assets.cmake`, removed `scripts/ui-download.cmake` + `scripts/xxd.cmake`, removed `ui.cpp`+`ui.h`) | Full UI build pipeline rewrite: `xxd.cmake`+`ui-download.cmake` replaced by a host-compiled `llama-ui-embed` C++ tool that generates `ui.cpp`/`ui.h` (declaring a `g_assets[]` table and `llama_ui_find_asset()` lookup, plus `LLAMA_UI_HAS_ASSETS` macro) from arbitrary asset files; new `scripts/ui-assets.cmake` orchestrates asset provisioning with a clearer priority (pre-built `tools/ui/dist` → npm build → HF Bucket); `tools/ui` is now an `add_custom_target` always re-run per build. The deprecation shims for `LLAMA_BUILD_WEBUI`/`LLAMA_USE_PREBUILT_WEBUI`/`LLAMA_WEBUI_HF_BUCKET` moved here from the top-level `CMakeLists.txt`. Project does not build the UI (`LLAMA_BUILD_TOOLS=OFF` in FetchContent mode), no impact |
+| ~b9297–b9305 | `ggml/include/ggml-alloc.h` | Comment-only API documentation update for `ggml_backend_alloc_ctx_tensors_from_buft`. No project changes required |
+| ~b9297–b9305 | `ggml/src/ggml-backend-meta.cpp` | Bug fix for zero-sized split tensor slices: `set_tensor`/`get_tensor`/`set_tensor_async`/`get_tensor_async` paths now `continue` when `chunk_size_j == 0`; `ggml_backend_meta_alloc_ctx_tensors_from_buft` now allocates a dummy buffer when all tensors in a context are zero-sized (was returning `NULL` and asserting); `ggml_backend_buft_alloc_buffer` result now `GGML_ASSERT`ed non-null. Internal backend code, no project changes required |
+| ~b9297–b9305 | `ggml/src/ggml-hexagon/htp/hmx-flash-attn-ops.c` | `hvx_vec_splat_f16(hvx_vec_get_f16(...))` round-trip replaced with `hvx_vec_repl_f16(...)` which stays in the vector domain via `vdelta` (avoids store/reload through scalar). Internal Hexagon DSP backend optimization, no project changes required |
+| ~b9297–b9305 | `ggml/src/ggml-opencl/ggml-opencl.cpp` | `GGML_OPENCL_PROFILING` batching fix: when `profiling_info` reaches 2048 entries the batch is now flushed into a persistent `profiling_results` vector (events released, durations populated) instead of accumulating until shutdown. Also fixes missing `]` closing the JSON array in `cl_trace.json`. Profile-only code (`GGML_OPENCL_PROFILING` is off by default), no project changes required |

 ## Build Commands

CMakeLists.txt

Lines changed: 6 additions & 2 deletions
@@ -104,13 +104,17 @@ endif()
 set(GGML_FMA ON CACHE BOOL "" FORCE)
 set(GGML_F16C ON CACHE BOOL "" FORCE)
 set(GGML_AVX512 OFF CACHE BOOL "" FORCE)
-set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)
+# b9305 removed the top-level LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI shim; set the
+# new name directly. (The old name no longer forwards at top level; the shim
+# survives in tools/ui/CMakeLists.txt but that subdir is not configured in
+# FetchContent mode, so the old setting would be inert anyway.)
+set(LLAMA_BUILD_UI OFF CACHE BOOL "" FORCE)
 # b9284 flipped LLAMA_BUILD_APP default to ON; we don't build the unified binary
 set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 llama.cpp
 GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9297
+GIT_TAG b9305
 )
 FetchContent_MakeAvailable(llama.cpp)
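
Review note: the rename in this hunk pins the new option name, but a build directory configured before b9305 may still carry a cache entry under the old name. A minimal sketch of one way to surface that, assuming the guard is placed before `FetchContent_MakeAvailable(llama.cpp)`; the guard and its warning text are hypothetical and not part of this commit:

```cmake
# Hypothetical hardening, not part of commit 278ba4f.
# If a stale entry under the pre-b9305 name is still defined, say so and drop
# it from the cache, since the upstream top-level CMakeLists.txt no longer
# forwards it to LLAMA_BUILD_UI.
if(DEFINED LLAMA_BUILD_WEBUI)
    message(WARNING
        "LLAMA_BUILD_WEBUI is no longer forwarded by the llama.cpp top-level "
        "CMakeLists.txt; LLAMA_BUILD_UI is pinned instead.")
    unset(LLAMA_BUILD_WEBUI CACHE)
endif()

# Same pin as in the commit: keep the UI build off under the new name.
set(LLAMA_BUILD_UI OFF CACHE BOOL "" FORCE)
```

Since `tools/ui` is not configured in FetchContent mode, the guard would be purely diagnostic; it only makes a leftover setting visible instead of silently inert.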
README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 11+](https://img.shields.io/badge/Java-11%2B-informational)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit4-yellow)
-[![llama.cpp b9297](https://img.shields.io/badge/llama.cpp-%23b9297-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9297)
+[![llama.cpp b9305](https://img.shields.io/badge/llama.cpp-%23b9305-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9305)
 [![Publish](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/publish.yml)
 [![CodeQL](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml/badge.svg)](https://github.com/bernardladenthin/java-llama.cpp/actions/workflows/codeql.yml)
