CLAUDE.md: 19 additions, 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9222**
+Current llama.cpp pinned version: **b9245**
 
 ## Upgrading CUDA Version
 
@@ -303,6 +303,24 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 |~b9198–b9219 |`ggml/src/ggml-sycl/ggml-sycl.cpp` + `vecdotq.hpp`| SYCL GEMM now falls back to direct MKL for small problems (gemm_flops < 256³); Q6_K dot product refactored to a single scalar fast-path helper `vec_dot_q6_K_q8_1_impl_mmvq_scalar`; internal SYCL backend, no project changes |
 |~b9219–b9222 |`ggml/src/ggml-hexagon/` + `htp/pad-ops.c` (new) + `htp/unary-ops.c`| Hexagon HTP backend gains `GGML_OP_PAD` (HVX + optional VTCM/DMA double-buffered, both zero-pad and circular-pad variants) and `GGML_OP_TRI` (HVX-vectorised triangular masking) support; new `HTP_OP_PAD` / `HTP_OP_TRI` opcodes; internal Qualcomm DSP backend, no project changes |
 |~b9219–b9222 |`.devops/*.Dockerfile` + `.github/workflows/docker.yml`| OCI image labels (`org.opencontainers.image.*`) added via `BUILD_DATE`/`APP_VERSION`/`APP_REVISION` build args; new `skip_s390x` workflow_dispatch input; manifest annotations on `docker buildx imagetools create`; upstream packaging/CI only, no project changes |
+|~b9222–b9245 |`common/common.h` + `common.cpp`|`common_init_result(common_params &, bool model_only = false)` and `common_init_from_params(common_params &, bool model_only = false)` gain an optional `model_only` flag that skips context/sampler/lora/warmup setup and returns only the loaded model. Additive with default value; no project call sites in `src/main/cpp/`, no source changes required |
+|~b9222–b9245 |`common/common.h`|`common_params_speculative_draft` defaults retuned: `n_max` 16→3, `p_min` 0.75f→0.0f. Defaults only; Java `ModelParameters` sets these explicitly via JSON, so behaviour is unchanged for this project |
+|~b9222–b9245 |`common/speculative.{h,cpp}`|`common_speculative_impl::accept()` (virtual) gains a third `bool is_other` parameter; `common_speculative_accept()` now broadcasts the accepted-token count to every registered impl (with `is_other=true` for impls that did not generate the draft). `common_speculative_impl_ngram_map_k` ctor signature simplified (no longer takes `common_params_speculative`). Each impl also logs new `LOG_INF` startup banners. Internal to upstream-compiled `server-context.cpp`; no project call sites |
+|~b9222–b9245 |`common/arg.cpp` + `common/common.cpp` + `tools/fit-params/fit-params.cpp`|`--verbosity` levels relabeled: level `4` now means "trace (more info)" and level `5` means "debug"; `LOG_LEVEL_DEBUG` constant value moved from `4` to `5`. Direct `params.verbosity >= 4` comparisons in upstream `common.cpp` and `fit-params.cpp` replaced with `>= LOG_LEVEL_DEBUG`. Project does not reference `LOG_LEVEL_DEBUG` or numeric verbosity thresholds in `src/main/cpp/`; no source changes required |
+|~b9222–b9245 |`common/arg.cpp`|`--spec-type` duplicate-arg DEPRECATED warning suppressed (the flag legitimately accepts repeated values to form the comma-list). Behaviour-only |
+|~b9222–b9245 |`common/ngram-map.cpp`| One per-draft `LOG_INF` downgraded to `LOG_DBG`. Log-level only |
+|~b9222–b9245 |`src/llama-graph.h`|`llm_graph_params::operator==` adds a third disjunct so ubatches with both `token` and `embd` arrays present compare equal (graph reuse fix for MTP pre-norm path). Internal |
+|~b9222–b9245 |`src/llama-memory-recurrent.{h,cpp}` + `src/llama-memory-hybrid.cpp` + `src/llama-memory-hybrid-iswa.cpp`|`init_batch()` now forces sequential split (`split_seq`) instead of equal split when `n_rs_seq > 0` (recurrent-state rollback is incompatible with equal splits). Internal upstream model code, no project impact |
+|~b9222–b9245 |`src/models/delta-net-base.cpp` + `src/models/models.h` + `src/models/qwen35.cpp`|`llm_build_delta_net_base::keep_rs()` helper removed; conv-state and recurrent-attn paths reworked to read `cparams.n_rs_seq` directly and loop `K = n_rs_seq + 1` snapshot slots. Comment fix in `qwen35.cpp` MTP layer index. All internal upstream model code |
+|~b9222–b9245 |`tools/server/server-context.cpp`|`pos_min_thold` lowered by one (`pos_next - n_swa` → `pos_next - n_swa - 1`); checkpoint trigger guard relaxed from `n_past < slot.prompt.n_tokens()` to `<=`; per-slot `print_timings_pp`/`print_timings_tg` lines split into separate `SLT_INF` calls; new `graphs reused` and `draft acceptance` lines; `n_draft_total` log moved from `SLT_CNT` to `SLT_INF`. Compiled as-is from upstream; no project changes |
+|~b9222–b9245 |`ggml/src/ggml-cuda/mmvq.cu`|`calc_nwarps` table tweak: Q6_K returns 2 warps (was grouped with the 8-warp tier). Internal CUDA backend |
+|~b9222–b9245 |`ggml/src/ggml-hexagon/` (`htp/rope-ops.c`, `htp/unary-ops.c`, `htp-ops.h`, `main.c`, `ggml-hexagon.cpp`) | New `HTP_OP_NORM` opcode (mean+variance norm); `rope-ops.c` adds MROPE / IMROPE position-id support via new `mrope_cache_init()`. Internal Qualcomm DSP backend |
+|~b9222–b9245 |`ggml/src/ggml-rpc/ggml-rpc.cpp`|`last_graph_uid` field moved from `ggml_backend_rpc_context` (per-backend) into `ggml_backend_rpc_device_context` (per-device) so multiple backends sharing a device reuse cached graphs. Internal RPC backend |
+|~b9222–b9245 |`ggml/src/ggml-sycl/ggml-sycl.cpp`| New `GGML_SYCL_USE_ASYNC_MEM_OP` env (default `1`) decouples async USM alloc/free from the graph path. Internal SYCL backend |
+|~b9222–b9245 |`convert_hf_to_gguf.py`, `convert_lora_to_gguf.py`, `examples/save-load-state/save-load-state.cpp`, `examples/llama-eval/*`, `tools/cli/README.md`, `tools/server/README.md`, `docs/speculative.md`, `docs/backend/SYCL.md`| Doc/example/tooling updates only. Not compiled by this project |
+|~b9222–b9245 |`tools/ui/*`| WebUI source reorganisation (enum file renames `*.ts` → `*.enums.ts`, new chat components, Tailwind plugin imports). Project sets `LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE` in `CMakeLists.txt`, so the UI is never built; no impact |
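The `model_only` flag in the `common/common.h` + `common.cpp` row changes only control flow. A minimal sketch of that flow, assuming nothing about upstream's actual types: `init_result_sketch` and `init_sketch` are hypothetical names, and the real `common_init_from_params()` returns upstream's own result object rather than these booleans.

```cpp
#include <cassert>

// Hypothetical stand-in for the init result; upstream's real type holds the
// model, context, sampler and LoRA adapters, not flags.
struct init_result_sketch {
    bool model_loaded    = false;
    bool context_created = false;
    bool sampler_created = false;
    bool lora_applied    = false;
    bool warmup_done     = false;
};

// Mirrors only the behaviour described in the changelog: with
// model_only = true, everything after model loading is skipped.
init_result_sketch init_sketch(bool model_only = false) {
    init_result_sketch r;
    r.model_loaded = true;          // the model is always loaded
    if (model_only) {
        return r;                   // early return: no context/sampler/lora/warmup
    }
    r.context_created = true;
    r.sampler_created = true;
    r.lora_applied    = true;
    r.warmup_done     = true;
    return r;
}
```

Because the flag defaults to `false`, an existing call such as `init_sketch()` compiles and behaves as before, which is why the row treats the change as additive.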
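The `common/speculative.{h,cpp}` row describes a broadcast: every registered impl hears the accepted-token count, and only the impl that generated the draft sees `is_other == false`. The sketch below models just that rule. `spec_impl`, `recording_impl` and `broadcast_accept` are hypothetical, and the upstream `accept()` virtual has two further parameters that are collapsed here into a single `n_accepted` count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical base class; only the trailing `is_other` flag is taken from
// the changelog, the rest of upstream's accept() signature is not reproduced.
struct spec_impl {
    virtual ~spec_impl() = default;
    virtual void accept(int n_accepted, bool is_other) = 0;
};

// Records the last call so the broadcast can be inspected.
struct recording_impl : spec_impl {
    int  last_n        = -1;
    bool last_is_other = false;
    void accept(int n_accepted, bool is_other) override {
        last_n        = n_accepted;
        last_is_other = is_other;
    }
};

// Every registered impl is told the accepted count; only the drafter
// (index `drafter`) receives is_other == false.
void broadcast_accept(std::vector<std::unique_ptr<spec_impl>> & impls,
                      std::size_t drafter, int n_accepted) {
    assert(drafter < impls.size());
    for (std::size_t i = 0; i < impls.size(); ++i) {
        impls[i]->accept(n_accepted, i != drafter);
    }
}
```

Broadcasting to non-drafting impls lets each one keep its own state in sync with the accepted tokens, even when a different impl produced the draft.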
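The verbosity relabeling in the `common/arg.cpp` + `common/common.cpp` + `tools/fit-params/fit-params.cpp` row explains why upstream replaced `>= 4` comparisons with `>= LOG_LEVEL_DEBUG`. A minimal sketch with hypothetical local constants (`kLevelTrace`, `kLevelDebug`) standing in for upstream's definitions:

```cpp
#include <cassert>

// Hypothetical local stand-ins for the relabeled levels:
// after the change, 4 = "trace (more info)" and 5 = "debug".
constexpr int kLevelTrace = 4;
constexpr int kLevelDebug = 5;  // mirrors the new value of LOG_LEVEL_DEBUG

// Pre-change style: a hard-coded literal that used to mean "debug".
bool debug_logging_literal(int verbosity) { return verbosity >= 4; }

// Post-change style: compare against the named constant.
bool debug_logging_named(int verbosity) { return verbosity >= kLevelDebug; }
```

Under the new labels, a literal `>= 4` check would switch on debug-only output at the trace level; comparing against the named constant keeps the threshold correct if the value moves again.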
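The `src/llama-graph.h` row adds a third disjunct to `llm_graph_params::operator==`. The sketch below assumes the first two disjuncts test that neither side carries token ids, or that neither side carries embeddings; that shape is an assumption, and `ubatch_inputs`, `inputs_match_before` and `inputs_match_after` are hypothetical names. It shows why a ubatch carrying both arrays could never match even an identical ubatch before the change.

```cpp
#include <cassert>

// Hypothetical, reduced view of a ubatch: only whether the token-id and
// embedding arrays are present.
struct ubatch_inputs {
    const int   * token;
    const float * embd;
};

// Assumed shape of the check before the change: two disjuncts.
bool inputs_match_before(const ubatch_inputs & a, const ubatch_inputs & b) {
    return (!a.token && !b.token) ||   // neither side carries token ids
           (!a.embd  && !b.embd);      // neither side carries embeddings
}

// After the change: a third disjunct accepts pairs that both carry both arrays.
bool inputs_match_after(const ubatch_inputs & a, const ubatch_inputs & b) {
    return inputs_match_before(a, b) ||
           (a.token && a.embd && b.token && b.embd);
}
```

Without the third disjunct, every ubatch on a path that supplies both tokens and embeddings (the MTP pre-norm path named in the row) would fail the comparison, so its graph would never be reused.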
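Two rows revolve around `n_rs_seq`: recurrent and hybrid memory force a sequential split when it is positive, and the delta-net paths loop over `K = n_rs_seq + 1` snapshot slots. A minimal sketch of both rules; `ubatch_split`, `choose_split` and `n_snapshot_slots` are hypothetical, and falling back to an equal split when `n_rs_seq == 0` is an assumption (upstream's real choice may depend on other batch flags).

```cpp
#include <cassert>

enum class ubatch_split { equal, seq };

// Rule from the llama-memory-recurrent/hybrid row: rollback (n_rs_seq > 0)
// is incompatible with equal splits, so a sequential split is forced.
// The equal-split fallback for n_rs_seq == 0 is an assumption.
ubatch_split choose_split(int n_rs_seq) {
    assert(n_rs_seq >= 0);
    return n_rs_seq > 0 ? ubatch_split::seq : ubatch_split::equal;
}

// Rule from the delta-net-base row: K = n_rs_seq + 1 snapshot slots,
// i.e. the current state plus one slot per rollback position.
int n_snapshot_slots(int n_rs_seq) {
    assert(n_rs_seq >= 0);
    return n_rs_seq + 1;
}
```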
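The two arithmetic changes in the `tools/server/server-context.cpp` row are easiest to see with numbers. A sketch that reproduces only the quoted expressions; the helper names are hypothetical:

```cpp
#include <cassert>

// pos_min_thold before and after the change (expressions quoted in the row).
int pos_min_thold_old(int pos_next, int n_swa) { return pos_next - n_swa; }
int pos_min_thold_new(int pos_next, int n_swa) { return pos_next - n_swa - 1; }

// Checkpoint trigger guard before (<) and after (<=) the change;
// n_prompt_tokens stands for slot.prompt.n_tokens().
bool checkpoint_guard_old(int n_past, int n_prompt_tokens) { return n_past <  n_prompt_tokens; }
bool checkpoint_guard_new(int n_past, int n_prompt_tokens) { return n_past <= n_prompt_tokens; }
```

With `pos_next = 1000` and `n_swa = 512`, the threshold moves from 488 to 487. The relaxed guard differs only at the boundary `n_past == slot.prompt.n_tokens()`, where it now passes.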
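The `ggml/src/ggml-rpc/ggml-rpc.cpp` row moves `last_graph_uid` from the per-backend context to the per-device context. A hypothetical model (`rpc_device_sketch`, `rpc_backend_sketch` and `submit()` are invented for illustration; the real RPC backend does much more) showing how two backends on one device now see each other's last graph:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device context: after the change, the last seen graph uid
// lives here, so every backend on the device shares it.
struct rpc_device_sketch {
    uint64_t last_graph_uid = 0;
};

// Hypothetical backend context pointing at a (possibly shared) device.
struct rpc_backend_sketch {
    rpc_device_sketch * dev;

    // Returns true when the device's last graph has the same uid,
    // i.e. the cached graph can be reused.
    bool submit(uint64_t graph_uid) {
        const bool reused = dev->last_graph_uid == graph_uid;
        dev->last_graph_uid = graph_uid;
        return reused;
    }
};
```

With the field kept per backend instead, the second backend's first `submit(42)` would miss the cache even though the device had just run that graph.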
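The `ggml/src/ggml-sycl/ggml-sycl.cpp` row introduces the `GGML_SYCL_USE_ASYNC_MEM_OP` environment variable with a default of `1`. A minimal sketch of reading such a switch; only the variable name and default come from the table, while the function name and the parsing rules (unset or empty means enabled, `0` or non-numeric means disabled) are assumptions.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical reader for GGML_SYCL_USE_ASYNC_MEM_OP. Only the name and the
// default of 1 come from the changelog; the parsing rules are assumed.
bool sycl_async_mem_op_enabled() {
    const char * v = std::getenv("GGML_SYCL_USE_ASYNC_MEM_OP");
    if (v == nullptr || *v == '\0') {
        return true;                 // unset or empty: default of 1
    }
    return std::atoi(v) != 0;        // non-zero integer enables; "0" or non-numeric disables
}
```

Under this sketch's assumed parsing, launching with `GGML_SYCL_USE_ASYNC_MEM_OP=0` would turn the switch off, while leaving it unset keeps the default.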