Commit 6b7503d

Merge pull request #274 from bernardladenthin/claude/inference-parameters-dry-sampling-j2tbm0
Upgrade llama.cpp to b9829; add DRY sampling parameters
2 parents b9ad93a + f3f5d38 commit 6b7503d

19 files changed

Lines changed: 608 additions & 153 deletions

.github/validate-models.bat

Lines changed: 7 additions & 5 deletions
@@ -9,12 +9,14 @@ REM GGUF files start with magic bytes: 0x47 0x47 0x55 0x46 ("GGUF")
 
 setlocal enabledelayedexpansion
 
-set "MODELS=models\codellama-7b.Q2_K.gguf" "models\jina-reranker-v1-tiny-en-Q4_0.gguf" "models\AMD-Llama-135m-code.Q2_K.gguf" "models\Qwen3-0.6B-Q4_K_M.gguf" "models\Qwen2.5-1.5B-Instruct-Q4_K_M.gguf"
+REM Every CI Java test job (incl. Windows) now downloads the full model set before
+REM validating and runs the embedding / vision / TTS integration tests, so all of
+REM these are REQUIRED (a missing one is a hard failure, not a silent self-skip).
+set "MODELS=models\codellama-7b.Q2_K.gguf" "models\jina-reranker-v1-tiny-en-Q4_0.gguf" "models\AMD-Llama-135m-code.Q2_K.gguf" "models\Qwen3-0.6B-Q4_K_M.gguf" "models\Qwen2.5-1.5B-Instruct-Q4_K_M.gguf" "models\nomic-embed-text-v1.5.f16.gguf" "models\SmolVLM-500M-Instruct-Q8_0.gguf" "models\mmproj-SmolVLM-500M-Instruct-Q8_0.gguf" "models\OuteTTS-0.2-500M-Q4_K_M.gguf" "models\WavTokenizer-Large-75-F16.gguf"
 
-REM Vision GGUFs are validated only when present (the Windows job downloads
-REM them too, but the validation step must not fail when a future job opts
-REM out of the vision matrix).
-set "OPTIONAL_MODELS=models\SmolVLM-500M-Instruct-Q8_0.gguf" "models\mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
+REM No optional models remain (the audio-input model has no CI download and its
+REM test self-skips). Left empty so the optional loop below is a no-op.
+set "OPTIONAL_MODELS="
 
 echo Validating required model files...
 for %%M in (%MODELS%) do (
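The batch script's hunk header notes that GGUF files start with the magic bytes 0x47 0x47 0x55 0x46 ("GGUF"). As a hedged illustration of that kind of header check, here is a minimal Java sketch. `GgufMagic`, `hasGgufMagic`, and `isGguf` are hypothetical names, and the sketch is not a port of `validate-models.bat`, whose validation loop body is not shown in this hunk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: checks whether a file starts with the 4-byte GGUF magic. */
public final class GgufMagic {
    // "GGUF" in ASCII: 0x47 0x47 0x55 0x46, as stated in the validate-models.bat header.
    private static final byte[] MAGIC = {0x47, 0x47, 0x55, 0x46};

    private GgufMagic() {}

    /** Returns true when the first four bytes of {@code header} equal the GGUF magic. */
    public static boolean hasGgufMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Reads at most four bytes from {@code file} and compares them to the GGUF magic. */
    public static boolean isGguf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(MAGIC.length); // Java 11+
            return hasGgufMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Path.of(arg);
            // A missing file or a wrong header are both reported as INVALID.
            boolean ok = Files.isRegularFile(p) && isGguf(p);
            System.out.println(arg + ": " + (ok ? "OK" : "INVALID"));
        }
    }
}
```

It can be run with the Java 11+ single-file launcher, for example `java GgufMagic.java models/Qwen3-0.6B-Q4_K_M.gguf`.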

.github/validate-models.sh

Lines changed: 12 additions & 7 deletions
@@ -10,26 +10,31 @@
 
 set -e
 
+# Every CI Java test job (Linux + all macOS + all Windows) now downloads the full
+# model set before validating, and runs the embedding / vision / TTS integration
+# tests with their properties set — so all of these are REQUIRED, not optional. A
+# missing model is a hard failure here (it would otherwise let an integration test
+# silently self-skip). See .github/workflows/publish.yml.
 MODELS=(
 "models/codellama-7b.Q2_K.gguf"
 "models/jina-reranker-v1-tiny-en-Q4_0.gguf"
 "models/AMD-Llama-135m-code.Q2_K.gguf"
 "models/Qwen3-0.6B-Q4_K_M.gguf"
 "models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf"
-)
-
-# Optional GGUFs validated only when present so jobs that do not download
-# them (e.g. cross-compile smoke runs) still pass. The vision test image is
-# committed to src/test/resources/images/test-image.jpg and is not validated
-# here — its presence is asserted directly by MultimodalIntegrationTest.
-OPTIONAL_MODELS=(
 "models/nomic-embed-text-v1.5.f16.gguf"
 "models/SmolVLM-500M-Instruct-Q8_0.gguf"
 "models/mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
 "models/OuteTTS-0.2-500M-Q4_K_M.gguf"
 "models/WavTokenizer-Large-75-F16.gguf"
 )
 
+# Optional GGUFs validated only when present. The vision test image is committed to
+# src/test/resources/images/test-image.jpg and is not validated here — its presence
+# is asserted directly by MultimodalIntegrationTest. The audio-input model
+# (AudioInputIntegrationTest) has no committed clip and no CI download, so that test
+# self-skips and its model is intentionally not listed here.
+OPTIONAL_MODELS=()
+
 validate_gguf() {
 local model="$1"
 local required="$2"
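After this change both scripts apply the same policy: every path in `MODELS` must be present and pass the GGUF check, while `OPTIONAL_MODELS` (now empty) is checked only for files that exist. A minimal Java restatement of that policy might look like this, assuming the existence and GGUF checks are supplied as predicates. `ModelValidation.validate` and its message strings are hypothetical and do not reproduce the scripts' actual output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the required/optional model policy in validate-models.sh. */
public final class ModelValidation {
    private ModelValidation() {}

    /**
     * Returns failure messages. A required model fails when it is missing or not a
     * valid GGUF; an optional model is checked only when it exists, so a missing
     * optional model is skipped instead of reported.
     */
    public static List<String> validate(List<String> required, List<String> optional,
                                        Predicate<String> exists, Predicate<String> validGguf) {
        List<String> failures = new ArrayList<>();
        for (String model : required) {
            if (!exists.test(model)) {
                failures.add("missing required model: " + model); // hard failure
            } else if (!validGguf.test(model)) {
                failures.add("invalid GGUF: " + model);
            }
        }
        for (String model : optional) {
            // Only present optional models are validated.
            if (exists.test(model) && !validGguf.test(model)) {
                failures.add("invalid GGUF: " + model);
            }
        }
        return failures;
    }
}
```

Passing the checks as predicates keeps the policy testable without touching the filesystem; in real use they would wrap a file-existence test and a GGUF magic-byte check.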

.github/workflows/publish.yml

Lines changed: 61 additions & 16 deletions
Large diffs are not rendered by default.

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,7 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 - End-to-end vision input across blocking, typed `ChatRequest`, streaming, and OpenAI-compatible request mapping; real-model tests verify that distinct red and blue images produce the correct semantic answers.
 - Explicit `setMmprojAuto(boolean)` and `setMmprojOffload(boolean)` controls, including the upstream `--no-mmproj-auto` and `--no-mmproj-offload` flags.
 - Per-request KV controls: `InferenceParameters.withSlotId(int)` and `withCacheReuse(int)`.
+- Per-request DRY sampling to `InferenceParameters` (`dry_multiplier`/`dry_base`/`dry_allowed_length`/`dry_penalty_last_n`/`dry_sequence_breakers`).
 - Typed cache observability through `Usage.getCachedTokens()`, `Usage.getProcessedPromptTokens()`, `SlotMetrics`, and `ServerMetrics.getSlotMetrics()`.
 - Authenticated JSON `GET /metrics` and `GET /slots` endpoints on the embedded server.
 
@@ -27,9 +28,12 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 - README license badge corrected from "Apache 2.0" to "MIT" (matches `LICENSE` file and `pom.xml`).
 - `pom.xml` SCM URL: `tree/master``tree/main` (default branch renamed).
 - Upgraded llama.cpp from b9151 to b9172.
+- Upgraded llama.cpp from b9803 to b9829. Compiles the new upstream `server-stream.cpp` (resumable-streaming SSE replay buffer) into `libjllama`, required because `server-context`/`server-http`/`server-models` now reference its symbols; refreshed `patches/0001` for the `tests/test-export-graph-ops.cpp` rename and the `server.cpp` GC-init context shift.
+- `configureParallelInference` now applies `slot_prompt_similarity` live via `server_context::set_slot_prompt_similarity()` (upstream PR ggml-org/llama.cpp#22393, carried as `patches/0003` until merged), instead of validating it and discarding the value.
 - Extracted the `chatWithTools` agent loop into `ToolCallingAgent`; tool-result errors (unknown tool / handler exception) are now JSON-serialized so tool names containing special characters remain valid JSON.
 
 ### Fixed
+- Per-request `reasoning_budget_tokens` is now honored (via `patches/0004`, upstream PR ggml-org/llama.cpp#23116): `reasoning_budget_tokens=0` suppresses thinking. `ReasoningBudgetTest` now asserts the suppression directly (the previous test that pinned the unfixed-bug behavior was removed).
 - Preserved decoded image buffers across the JNI chat boundary and submitted media requests through llama.cpp's upstream multimodal task path instead of silently tokenizing them as text-only prompts.
 - Preserved multipart image content when using the typed `ChatRequest` serializer.
 - The standalone OpenAI-compatible server now advertises vision only when the loaded model confirms usable vision support.
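The new DRY fields on `InferenceParameters` map to llama.cpp's DRY ("Don't Repeat Yourself") sampler. Roughly, a candidate token that would extend a sequence already seen in the recent context has its logit reduced by `dry_multiplier * dry_base^(L - dry_allowed_length)`, where `L` is the length of the repeat and the penalty applies only when `L >= dry_allowed_length`; `dry_penalty_last_n` bounds how far back to look, and `dry_sequence_breakers` stop a match. The Java sketch below is a naive illustration over integer token ids, assuming breakers are given as token ids (llama.cpp takes them as strings and uses a more efficient matcher) and assuming llama.cpp's convention that a negative window means the whole context and 0 disables DRY. `DrySketch` and its methods are hypothetical and are not part of the `InferenceParameters` API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified DRY penalty over integer token ids (naive O(n^2) search). */
public final class DrySketch {
    private DrySketch() {}

    /** Penalty for a token that would extend a repeat of {@code matchLength} tokens. */
    public static double penalty(double multiplier, double base, int allowedLength, int matchLength) {
        if (matchLength < allowedLength) {
            return 0.0; // short repeats are tolerated
        }
        return multiplier * Math.pow(base, matchLength - allowedLength);
    }

    /**
     * Maps each penalized candidate token id to the amount to subtract from its logit.
     * Assumption: penaltyLastN < 0 scans the whole history, 0 disables DRY.
     */
    public static Map<Integer, Double> penalties(List<Integer> history, double multiplier,
            double base, int allowedLength, int penaltyLastN, Set<Integer> breakers) {
        Map<Integer, Double> result = new HashMap<>();
        if (multiplier == 0.0 || base < 1.0 || penaltyLastN == 0 || history.isEmpty()) {
            return result; // DRY disabled or nothing to compare against
        }
        int start = penaltyLastN < 0 ? 0 : Math.max(0, history.size() - penaltyLastN);
        List<Integer> t = history.subList(start, history.size());
        int n = t.size();
        int last = t.get(n - 1);
        if (breakers.contains(last)) {
            return result; // a breaker at the end means no repeat is in progress
        }
        Map<Integer, Integer> matchLengths = new HashMap<>();
        for (int i = 0; i < n - 1; i++) {
            if (t.get(i) != last) {
                continue; // only earlier occurrences of the last token can start a match
            }
            int next = t.get(i + 1); // token that followed the earlier occurrence
            if (breakers.contains(next)) {
                continue;
            }
            int matchLength = 1;
            // Extend backwards while both sequences agree and no breaker is crossed.
            while (i - matchLength >= 0) {
                int previous = t.get(n - 1 - matchLength);
                if (t.get(i - matchLength) != previous || breakers.contains(previous)) {
                    break;
                }
                matchLength++;
            }
            matchLengths.merge(next, matchLength, Integer::max); // keep the longest match
        }
        for (Map.Entry<Integer, Integer> e : matchLengths.entrySet()) {
            if (e.getValue() >= allowedLength) {
                result.put(e.getKey(), penalty(multiplier, base, allowedLength, e.getValue()));
            }
        }
        return result;
    }
}
```

With the history `[1, 2, 3, 1, 2]`, the tail `1, 2` already occurred followed by `3`, so token `3` would extend a two-token repeat; with the illustrative values multiplier 0.8, base 1.75 and allowed length 2, it receives a penalty of 0.8 * 1.75^0 = 0.8.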
