Commit 3b01985

Upgrade llama.cpp from b9106 to b9134
Breaking changes handled:
- Remove ModelParameters.setCtxSizeDraft(): --spec-draft-ctx-size CLI flag was removed in b9134 and now throws std::invalid_argument at parse time
- Fix SlotParamsToJson test: task_params::to_json() renamed field "speculative.type" → "speculative.types" (now serialises a vector)

Version bumps: CMakeLists.txt GIT_TAG, README.md badge/link, CLAUDE.md pinned version + new breaking-change table rows for b9106→b9134.

https://claude.ai/code/session_01GYjU7CXHB6QLbxrbZcQWCn
1 parent a412480 commit 3b01985

5 files changed

Lines changed: 11 additions & 17 deletions

CLAUDE.md

Lines changed: 7 additions & 1 deletion
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9106**
+Current llama.cpp pinned version: **b9134**

 ## Upgrading CUDA Version

@@ -253,6 +253,12 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
 | ~b9103–b9106 | `ggml/src/ggml-vulkan/ggml-vulkan.cpp` + Vulkan shaders | Vulkan flash attention refactored: `pipeline_flash_attn_f32_f16` changed from a per-type array of maps to a single map; mixed K/V quant types (e.g. Q4_0 K + F16 V) now supported on all Vulkan FA paths (scalar, cm1, cm2) rather than coopmat2 only; per-type SPIR-V variants replaced by two generic modules (`flash_attn_f32_f16` and `flash_attn_f32_f16_int8`) that select K/V type at runtime via `FaTypeK`/`FaTypeV` spec constants; new `flash_attn_dequant.glsl` contains aliased SSBO views and an uber `dequantize4()` switch; the K/V type mismatch guard removed from `ggml_backend_vk_device_supports_op`; internal Vulkan backend refactor, no project changes required |
 | ~b9103–b9106 | `ggml/src/ggml-cuda/argsort.cu` | Added `#include <cuda/iterator>` for CCCL ≥ 3.1 strided-iterator path; internal CUDA backend, no project changes required |
 | ~b9103–b9106 | `convert_hf_to_gguf.py` | Mistral Medium 3.5 mmproj support: `n_embd_text` now reads `"dim"` key instead of `"hidden_dim"`; negative `img_break_tok_id` placeholders resolved from `tekken.json` or `tokenizer.json`; conversion tool only, no project changes required |
+| ~b9106–b9134 | `common/arg.cpp` | CLI option `--spec-draft-ctx-size` / `-cd` / `--ctx-size-draft` REMOVED — throws `std::invalid_argument` at parse time; `ModelParameters.setCtxSizeDraft()` removed; no replacement (context size now managed internally by speculative engine) |
+| ~b9106–b9134 | `common/arg.cpp` | CLI option `--spec-draft-replace` / `--spec-replace` REMOVED — throws `std::invalid_argument` at parse time; no corresponding Java method existed |
+| ~b9106–b9134 | `common/speculative.h` | Full redesign: `common_speculative_type` enum values renamed `DRAFT`&#x2192;`DRAFT_SIMPLE`, `EAGLE3`&#x2192;`DRAFT_EAGLE3`; `common_params_speculative.type` (single enum) &#x2192; `.types` (vector); `common_speculative_n_max()` / `common_speculative_n_min()` REMOVED; new `common_speculative_init(params, n_seq)` no longer takes ctx; new `common_speculative_begin(spec, seq_id, prompt)`, `common_speculative_draft(spec)`, `common_speculative_accept(spec, seq_id, n)`, `common_speculative_process(spec, batch)` signatures; `common_speculative_draft_params` struct added; server sources compiled directly, no project JNI changes required |
+| ~b9106–b9134 | `common/common.h` | New `common_prompt_checkpoint` struct (contains `data_tgt` + `data_dft`) replaces the old `server_prompt_checkpoint` in `server-task.h`; compiled from upstream server sources, no project JNI changes required |
+| ~b9106–b9134 | `tools/server/server-task.cpp` | `task_params::to_json()` renamed field `"speculative.type"` &#x2192; `"speculative.types"` (now serialises the vector); test `SlotParamsToJson.SpeculativeFields_Present` updated accordingly |
+| ~b9106–b9134 | `include/llama.h` | New `LLAMA_STATE_SEQ_FLAGS_NONE = 0` macro added; additive, no project changes required |

 ## Build Commands

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ set(GGML_AVX512 OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
     llama.cpp
     GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-    GIT_TAG        b9106
+    GIT_TAG        b9134
 )
 FetchContent_MakeAvailable(llama.cpp)

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
-[![llama.cpp b9106](https://img.shields.io/badge/llama.cpp-%23b9106-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9106)
+[![llama.cpp b9134](https://img.shields.io/badge/llama.cpp-%23b9134-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9134)
 [![Maven Central](https://img.shields.io/maven-central/v/net.ladenthin/llama)](https://central.sonatype.com/artifact/net.ladenthin/llama)
 [![Snapshot](https://img.shields.io/badge/snapshot-latest-informational)](https://central.sonatype.com/repository/maven-snapshots/net/ladenthin/llama/)
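Review note: this upgrade changes the same version string in three files (CMakeLists.txt, README.md, CLAUDE.md), so a small consistency check could catch a missed bump. The following is a minimal sketch and not part of the commit. The class `PinnedVersionCheck` and its regular expressions are hypothetical, and they assume the three lines keep the shapes shown in this diff.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not in the repository): checks that the three places
// touched by this commit all name the same llama.cpp build tag.
final class PinnedVersionCheck {
    // Assumed line shapes, taken from the hunks in this diff.
    private static final Pattern CMAKE_TAG = Pattern.compile("GIT_TAG\\s+(b\\d+)");
    private static final Pattern README_LINK = Pattern.compile("releases/tag/(b\\d+)");
    private static final Pattern CLAUDE_PIN =
            Pattern.compile("pinned version: \\*\\*(b\\d+)\\*\\*");

    // Returns the first captured build tag, or null when the pattern is absent.
    static String extract(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    // True only when all three files contain a tag and the tags are identical.
    static boolean isConsistent(String cmakeLists, String readme, String claudeMd) {
        String cmake = extract(CMAKE_TAG, cmakeLists);
        String badge = extract(README_LINK, readme);
        String pin = extract(CLAUDE_PIN, claudeMd);
        return cmake != null && cmake.equals(badge) && cmake.equals(pin);
    }
}
```

Passing the full text of each file to `isConsistent` returns `false` if, for example, CMakeLists.txt still says b9106 while the other two say b9134.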

src/main/java/net/ladenthin/llama/ModelParameters.java

Lines changed: 0 additions & 11 deletions
@@ -1263,17 +1263,6 @@ public ModelParameters setDraftPMin(float draftPMin) {
         return this;
     }

-    /**
-     * Set the size of the prompt context for the draft model.
-     *
-     * @param ctxSizeDraft the prompt context size for the draft model
-     * @return this builder
-     */
-    public ModelParameters setCtxSizeDraft(int ctxSizeDraft) {
-        parameters.put("--spec-draft-ctx-size", String.valueOf(ctxSizeDraft));
-        return this;
-    }
-
     /**
      * Set the comma-separated list of devices to use for offloading the draft model.
      *
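Review note: callers that build their own flag maps may still carry the removed options, and per the commit message these now fail at parse time with `std::invalid_argument`. As a minimal sketch, assuming parameters are held as a flag-to-value map like the `parameters.put(...)` call in the removed method, a pre-flight check could report them before they reach the native parser. The class `RemovedFlagGuard` is hypothetical and not part of this commit; the flag names come from the CLAUDE.md rows added above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check (not in the repository).
final class RemovedFlagGuard {
    // Flags listed as REMOVED for b9106 -> b9134 in the CLAUDE.md table.
    private static final Set<String> REMOVED_FLAGS = new HashSet<>(Arrays.asList(
            "--spec-draft-ctx-size", "-cd", "--ctx-size-draft",
            "--spec-draft-replace", "--spec-replace"));

    // Returns the removed flags present in the map, in the map's iteration order.
    static List<String> findRemovedFlags(Map<String, String> parameters) {
        List<String> found = new ArrayList<>();
        for (String flag : parameters.keySet()) {
            if (REMOVED_FLAGS.contains(flag)) {
                found.add(flag);
            }
        }
        return found;
    }
}
```

A caller could log or reject a configuration when `findRemovedFlags` returns a non-empty list, which gives a clearer message than a native exception raised during argument parsing.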

src/test/cpp/test_server.cpp

Lines changed: 2 additions & 3 deletions
@@ -243,9 +243,8 @@ TEST(SlotParamsToJson, SpeculativeFields_Present) {
     task_params p;
     const json j = p.to_json();

-    // b8962: only speculative.type is serialised; n_max/n_min/p_min are
-    // input-only (consumed by params_from_json_cmpl, not emitted by to_json)
-    EXPECT_TRUE(j.contains("speculative.type"));
+    // b9134: field renamed speculative.type → speculative.types (now a vector)
+    EXPECT_TRUE(j.contains("speculative.types"));
 }

 TEST(SlotParamsToJson, GrammarTriggers_IsArrayByDefault) {
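Review note: Java code that inspects the serialised slot parameters may need to tolerate both the old scalar key and the new vector key during the transition. The sketch below assumes the JSON has already been parsed into a `Map<String, Object>` by whatever JSON library the caller uses. The class `SpeculativeTypesReader` is hypothetical, and the element type of the new array is left as `Object` because this diff does not show how the values are encoded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical reader (not in the repository) for already-parsed slot params.
final class SpeculativeTypesReader {
    // Prefers the b9134 key "speculative.types" (an array); falls back to the
    // pre-b9134 key "speculative.type" (a single value) wrapped in a list.
    static List<Object> readSpeculativeTypes(Map<String, Object> slotParams) {
        Object current = slotParams.get("speculative.types");
        if (current instanceof List) {
            return new ArrayList<>((List<?>) current);
        }
        Object legacy = slotParams.get("speculative.type");
        if (legacy != null) {
            return new ArrayList<>(Collections.singletonList(legacy));
        }
        return new ArrayList<>(); // neither key present
    }
}
```

Normalising to a list at the boundary keeps the rest of the caller's code independent of which llama.cpp build produced the JSON.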
