diff --git a/CLAUDE.md b/CLAUDE.md
index 9f14c9a2..f610ee90 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9637**
+Current llama.cpp pinned version: **b9642**
 
 ## Upgrading CUDA Version
 
@@ -590,7 +590,7 @@ ctest --test-dir build --output-on-failure -R "ResultsToJson"
 
 #### Upstream source location (in CMake build tree)
 
-llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9637`.
+llama.cpp is fetched via CMake FetchContent, pinned to `GIT_TAG b9642`.
 
 ```
 build/_deps/llama.cpp-src/tools/server/ ← server-task.h, server-common.h, etc.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index df6f10d8..351075a4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -139,7 +139,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
     llama.cpp
     GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-    GIT_TAG b9637
+    GIT_TAG b9642
 )
 FetchContent_MakeAvailable(llama.cpp)
 
diff --git a/README.md b/README.md
index caafe9d0..dd52badd 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 **Build:**
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9637](https://img.shields.io/badge/llama.cpp-%23b9637-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9637)
+[![llama.cpp b9642](https://img.shields.io/badge/llama.cpp-%23b9642-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9642)
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)
diff --git a/docs/history/llama-cpp-breaking-changes.md b/docs/history/llama-cpp-breaking-changes.md
index ff663035..bb1e8ef6 100644
--- a/docs/history/llama-cpp-breaking-changes.md
+++ b/docs/history/llama-cpp-breaking-changes.md
@@ -352,3 +352,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | b9621–b9637 | `ggml/src/ggml-vulkan/` + shaders | Unary shaders consolidated into one templated `unary.comp`; new `EXPM1` Vulkan op; GLU push-constants reworked (per-dim strides + misalign offsets); fastdiv `L` values byte-packed to stay under the 128B push-constant limit. Internal Vulkan backend — the project builds CPU/CUDA/Metal/OpenCL only, never Vulkan. No project changes required |
 | b9621–b9637 | `tools/server/server-http.cpp`, `tools/ui/`, `scripts/ui-assets.cmake` | Optional gzip-compressed WebUI asset serving (`LLAMA_UI_GZIP`, `llama_ui_use_gzip()`). The project compiles `server-context/queue/task/models` but not `server-http.cpp` or `tools/ui`, so the HTTP/WebUI layer is absent from `jllama`. No project changes required |
 | b9621–b9637 | `tools/cli/cli.cpp`, `.devops/*.Dockerfile`, `.github/`, `conversion/`, `convert_hf_to_gguf_update.py`, `gguf-py/`, `models/templates/Cohere2MoE.jinja`, `docs/`, `tests/` | CLI preserved-token wiring, Docker image `docker.io/` prefixes, CI labeler/release tweaks, Python GGUF converters, the new model template asset, doc typos, and upstream tests. None are compiled into `jllama` or shipped by the project. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-cuda/ggml-cuda.cu` | `ggml_backend_cuda_device_supports_op` for `GGML_OP_REPEAT` tightened: the supported-types check changed from a blocklist (`!= I32 && != I16`) to an allowlist (`== F32 \|\| == F16`), because the CUDA REPEAT path only implements F32/F16 and other types asserted at runtime. Internal CUDA backend; the project switches on no op-support enum and never calls this. No project changes required |
+| b9637–b9642 | `ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl` | WebGPU matmul shared-memory dequant templates rewritten: legacy/k-quant `#elif` chains converted to independent `#if defined(...)` blocks, and the i-quant (super-block 256) IQ1/IQ2/IQ3/IQ4 paths reworked to process `NQ` quants per thread with vectorized `store_shmem_iquants`/`create_iq_gw4` helpers. Internal WebGPU backend — the project builds CPU/CUDA/Metal/OpenCL only, never WebGPU. No project changes required |
+| b9637–b9642 | `tools/ui/`, `tools/ui/src/lib/utils/heic-to-jpeg.ts` (new) | WebUI gains a "render thinking as Markdown" display setting and client-side HEIC/HEIF image upload support (lazy CDN-loaded `heic-to` decoder → JPEG). The project compiles `server-context/queue/task/models` but not `tools/ui`, so the WebUI is absent from `jllama`. No project changes required |
+| b9637–b9642 | `convert_lora_to_gguf.py`, `tests/test-backend-ops.cpp` | LoRA converter now resolves the base-model architecture via `get_model_architecture(hparams, ModelType.TEXT)` instead of hand-reading `text_config`/`architectures`; a `GGML_TYPE_BF16` `test_repeat` case was added to the backend-ops test. Python tooling and an upstream test — neither is compiled into `jllama`. No project changes required |