Key changes in b9102:
- Internal CUDA AllReduce pipeline (no NCCL required, works on Windows/PCIe)
- SYCL IM2COL_3D support for Intel GPU backend
- Bug fix: backend sampling now correctly tracks cur_p.selected for n_probs
- Bug fix: post_sampling_probs now works with backend sampling
- n_vocab loading moved to per-model load_arch_hparams() (internal refactor)
- httplib 0.43.4: chunk-size security fix (manual hex parsing vs strtoul)
- ggml version patch 0.11.0 → 0.11.1
No project-level JNI or Java changes required.
https://claude.ai/code/session_01QopdxqEvbkhiaaBRqBzgzc
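
For readers unfamiliar with the `cur_p.selected` fix, the idea can be sketched roughly as follows. This is not the upstream `common_sampler_sample` code: `token_data`, `token_data_array`, and `select_backend_token` are simplified, hypothetical stand-ins, and the sketch assumes the fix amounts to locating the backend-chosen token inside the full candidate list so that `selected` indexes real candidate data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for llama.cpp's candidate types (assumed for illustration).
struct token_data {
    std::int32_t id;    // token id
    float        logit; // raw logit
    float        p;     // probability
};

struct token_data_array {
    std::vector<token_data> data;
    std::int64_t            selected; // index into data, or -1 if none
};

// Hypothetical helper: record which entry of the full candidate list matches
// the token chosen by the backend sampler. Returns false if it is not present.
bool select_backend_token(token_data_array & cur_p, std::int32_t token_id) {
    for (std::size_t i = 0; i < cur_p.data.size(); ++i) {
        if (cur_p.data[i].id == token_id) {
            cur_p.selected = static_cast<std::int64_t>(i);
            return true;
        }
    }
    cur_p.selected = -1; // token not among the candidates
    return false;
}
```

With `selected` pointing into the complete list, code that later reports the top `n_probs` candidates can read the chosen token's entry alongside its neighbours, which a one-element array cannot provide.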
CLAUDE.md (10 additions & 1 deletion)
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9094**
+Current llama.cpp pinned version: **b9102**

## Upgrading CUDA Version
@@ -240,6 +240,15 @@ Also review the project `CMakeLists.txt` for build-system-level breaks (e.g. ren
|~b9071–b9094 |`tools/server/server-models.h` + `server.cpp`| Router child→parent model info propagation: new `CMD_CHILD_TO_ROUTER_INFO` command; `setup_child_server()` gains `const json & model_info` parameter; new `update_loaded_info()` method; `server_model_meta` gains `loaded_info` field; all internally consistent across compiled upstream sources, no project changes required |
|~b9071–b9094 |`common/reasoning-budget.cpp`| Forced token logit no longer set to `+INFINITY`; only competing tokens set to `-INFINITY`; internal sampler behavior change, no project changes required |
+|~b9094–b9102 |`ggml/src/ggml-sycl/ggml-sycl.cpp` + `im2col.cpp` + `im2col.hpp`| New `ggml_sycl_im2col_3d` function; `GGML_OP_IM2COL_3D` now supported on Intel GPU via SYCL; 2D im2col kernel rewritten with tile-based `IC_KH_KW` thread decomposition; new `SYCL_IM2COL_BLOCK_SIZE 256`; additive, no project changes required |
+|~b9094–b9102 |`ggml/CMakeLists.txt`| GGML version patch bumped 0.11.0 → 0.11.1; no project changes required |
+|~b9094–b9102 |`common/sampling.cpp`| Bug fix in `common_sampler_sample`: `set_logits` now called at the top before backend-sampling check; backend sampling token-selection now scans all of `cur_p.data` to find matching token (instead of artificial 1-element array), fixing `cur_p.selected` for downstream `n_probs`; post-sampling probabilities now work correctly with backend sampling |
+|~b9094–b9102 |`tools/server/server-context.cpp`|`need_logits` renamed to `need_pre_sample_logits`; only set when `n_probs > 0 && !post_sampling_probs`; backend sampling now works with `post_sampling_probs`; 0.0-probability tokens filtered from `result.probs`; compiled from upstream, no project JNI changes required |
+|~b9094–b9102 |`src/llama-model.cpp`|`n_vocab` loading moved from `llama_model_base::load_hparams()` to per-model `load_arch_hparams()` (e.g. `src/models/deepseek2.cpp`, `src/models/llama.cpp`); internal model-loading refactor, no project changes required |
+|~b9094–b9102 |`ggml/src/ggml-virtgpu/ggml-backend-device.cpp`| Gains `#include <mutex>` for `std::once_flag`; internal backend fix, no project changes required |
+|~b9094–b9102 |`vendor/cpp-httplib/httplib.cpp` + `httplib.h`| Security fix: chunk-size parsing replaced `strtoul` with manual hex-digit scanning to prevent overflow and reject invalid chunk extensions; version bumped to 0.43.4; compiled automatically, no project changes required |
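
The chunk-size fix in the last row can be pictured with a minimal sketch. It is not the httplib implementation: `parse_chunk_size` is a hypothetical helper, and treating everything after a `;` as an acceptable extension is a simplifying assumption. The point is the contrast with `strtoul`, which skips leading whitespace, accepts a sign and a `0x` prefix, and clamps to `ULONG_MAX` on overflow instead of failing.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: parse the hex size at the start of a chunk-size line.
// Returns false on empty input, overflow, or an unexpected trailing character.
bool parse_chunk_size(const std::string & line, std::uint64_t & out) {
    const std::uint64_t max_before_shift =
        std::numeric_limits<std::uint64_t>::max() >> 4;

    std::uint64_t value = 0;
    std::size_t   i     = 0;
    for (; i < line.size(); ++i) {
        const char c = line[i];
        unsigned   digit;
        if (c >= '0' && c <= '9') {
            digit = static_cast<unsigned>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = static_cast<unsigned>(c - 'a') + 10;
        } else if (c >= 'A' && c <= 'F') {
            digit = static_cast<unsigned>(c - 'A') + 10;
        } else {
            break; // first non-hex character ends the size field
        }
        if (value > max_before_shift) {
            return false; // another digit would overflow 64 bits
        }
        value = (value << 4) | digit;
    }

    if (i == 0) {
        return false; // no hex digits at all (covers "", "-1", " 10")
    }
    if (i != line.size() && line[i] != ';') {
        return false; // only a chunk extension may follow the size
    }
    out = value;
    return true;
}
```

Under these assumptions `"1a"` and `"FF;name=val"` parse to 26 and 255, while `"0x10"`, `"-1"`, and a 17-digit value are rejected outright.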