CLAUDE.md (6 additions & 2 deletions)
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
docs/history/llama-cpp-breaking-changes.md (5 additions & 0 deletions)
@@ -356,3 +356,8 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| b9637–b9642 | `ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl` | WebGPU matmul shared-memory dequant templates rewritten: legacy/k-quant `#elif` chains converted to independent `#if defined(...)` blocks, and the i-quant (super-block 256) IQ1/IQ2/IQ3/IQ4 paths reworked to process `NQ` quants per thread with vectorized `store_shmem_iquants`/`create_iq_gw4` helpers. Internal WebGPU backend: the project builds CPU/CUDA/Metal/OpenCL only, never WebGPU. No project changes required |
| b9637–b9642 | `tools/ui/`, `tools/ui/src/lib/utils/heic-to-jpeg.ts` (new) | WebUI gains a "render thinking as Markdown" display setting and client-side HEIC/HEIF image upload support (lazy CDN-loaded `heic-to` decoder → JPEG). The project compiles `server-context/queue/task/models` but not `tools/ui`, so the WebUI is absent from `jllama`. No project changes required |
| b9637–b9642 | `convert_lora_to_gguf.py`, `tests/test-backend-ops.cpp` | LoRA converter now resolves the base-model architecture via `get_model_architecture(hparams, ModelType.TEXT)` instead of hand-reading `text_config`/`architectures`; a `GGML_TYPE_BF16` `test_repeat` case was added to the backend-ops test. Python tooling and an upstream test; neither is compiled into `jllama`. No project changes required |
+| b9642–b9682 | `tools/mtmd/mtmd-helper.h` + `tools/mtmd/mtmd-helper.cpp` | `mtmd_helper_decode_image_chunk` gained two parameters (a post-decode callback plus its `user_data`) so callers can hook each decoded multimodal chunk; the standalone `process_chunk` helper was removed and folded into `mtmd_helper_eval_chunk_single`. Consumed only inside the upstream-compiled `mtmd-helper.cpp` / `server-context.cpp`; the project's hand-written C++ references no `mtmd_*`/`process_chunk` symbol (zero matches in `src/main/cpp`). No project source changes required. **New feature:** the post-decode callback enables multimodal speculative-draft decoding, exposable later as a vision + draft-model Java path |
+| b9642–b9682 | `common/common.cpp` (`build_lora_mm_id`) | The LoRA multimodal id-embedding builder gained a `w_s` scale-weight argument for per-adapter scaling. Internal to the upstream-compiled `common` library; the project never calls it. No project source changes required |
+| b9642–b9682 | `common/speculative.{h,cpp}` | Speculative decoding now accumulates per-draft-position acceptance statistics and adds an Eagle3 backend-sampling path (the draft model samples on the compute backend). `common_speculative_*` is compiled into `common` and reached only through the upstream server's speculative slot; the project's C++ references no `speculative`/`draft` symbol. No project source changes required. **New feature:** per-position draft-acceptance metrics, which could surface as speculative-decoding telemetry in a future Java API |
+| b9642–b9682 | `tools/server/server-context.cpp` | Server slot refactored so an `mtmd` (multimodal) prompt can feed a speculative draft model: image/media chunks are routed through the new `mtmd_helper_decode_image_chunk` callback before drafting. Compiled directly into `jllama` (the project builds `server-context/queue/task/models`), but the change is internal to the slot state machine and binds no new/renamed symbol; verified that `jllama.cpp` and the `*_helpers.hpp` headers call none of the touched functions. No project source changes required |
+| b9642–b9682 | `ggml/src/ggml-*` backends, `tools/` (incl. `llama-bench --offline`), conda-forge packaging, `docs/`, `.github/` | Routine backend kernel updates and tooling/docs/CI tweaks (a new `llama-bench --offline` flag, conda-forge recipe notes). None are compiled into `jllama` beyond the already-built CPU/CUDA/Metal/OpenCL backends, and none change a symbol the project binds. No project changes required |
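
The "zero matches in `src/main/cpp`" checks recorded in the rows above can be reproduced with a plain text scan. The following is a minimal sketch, not the project's actual tooling: the helper names `find_symbols` and `scan_tree`, the pattern table, and the file suffixes are all assumptions chosen for illustration. It matches text line by line, so it also reports symbols that appear only in comments or strings.

```python
import re
from pathlib import Path

# Hypothetical pattern table built from the symbols named in the rows above.
SYMBOL_PATTERNS = {
    "mtmd_*": re.compile(r"\bmtmd_\w+"),
    "process_chunk": re.compile(r"\bprocess_chunk\b"),
    "build_lora_mm_id": re.compile(r"\bbuild_lora_mm_id\b"),
    "common_speculative_*": re.compile(r"\bcommon_speculative_\w+"),
}


def find_symbols(text, patterns=SYMBOL_PATTERNS):
    """Return {label: [(line_number, stripped_line), ...]} for each pattern that matches."""
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in patterns.items():
            if pattern.search(line):
                hits.setdefault(label, []).append((lineno, line.strip()))
    return hits


def scan_tree(root, suffixes=(".cpp", ".hpp", ".h")):
    """Scan C++ sources under root; return {path: hits} for files with at least one match."""
    root = Path(root)
    if not root.is_dir():
        # Fail loudly so a wrong path is not mistaken for "zero matches".
        raise NotADirectoryError(root)
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = find_symbols(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Assumed to run from the repository root.
    report = scan_tree("src/main/cpp")
    for path, hits in report.items():
        for label, lines in hits.items():
            for lineno, line in lines:
                print(f"{path}:{lineno}: [{label}] {line}")
    if not report:
        print("no matches")
```

An empty report corresponds to the "No project source changes required" conclusion for those rows; any hit would need a manual look to decide whether it is a real call site.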