Prepare 5.0.3 release: drop -SNAPSHOT, finalize CHANGELOG

- pom.xml: 5.0.3-SNAPSHOT -> 5.0.3
- README.md: release dependency examples 5.0.2 -> 5.0.3
  (snapshot example stays 5.0.3-SNAPSHOT)
- CHANGELOG.md: close [Unreleased] into [5.0.3] - 2026-06-29 and
  backfill the previously-missing [5.0.2] - 2026-06-08 section
  (reconstructed from the v5.0.2 tag), plus compare links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01URUX3HiqQ1wzJnT8qn8c8E
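The two mechanical steps named in the commit message (dropping the `-SNAPSHOT` qualifier and closing `[Unreleased]` into a dated release heading) can be sketched as below. This is only an illustration: the class `ReleasePrep` and the methods `releaseVersion` and `closeUnreleased` are hypothetical names, not part of this repository or its release tooling, and the sketch assumes the changelog marks pending entries with a literal `## [Unreleased]` heading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: illustrates the release-prep steps only.
final class ReleasePrep {

    /** Drops a trailing "-SNAPSHOT" qualifier; other versions are returned unchanged. */
    static String releaseVersion(String version) {
        String suffix = "-SNAPSHOT";
        return version.endsWith(suffix)
                ? version.substring(0, version.length() - suffix.length())
                : version;
    }

    /**
     * Inserts a blank line and a dated release heading right after "## [Unreleased]",
     * so the entries that followed it now sit under the new release.
     */
    static List<String> closeUnreleased(List<String> lines, String version, String date) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line);
            if (line.trim().equals("## [Unreleased]")) {
                out.add("");
                out.add("## [" + version + "] - " + date);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String version = releaseVersion("5.0.3-SNAPSHOT");
        List<String> changelog = List.of("## [Unreleased]", "", "### Added", "- Example entry.");
        closeUnreleased(changelog, version, "2026-06-29").forEach(System.out::println);
    }
}
```

Leaving `## [Unreleased]` in place as an empty section mirrors the first hunk of the diff, where the new `## [5.0.3] - 2026-06-29` heading is added beneath it rather than replacing it.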
CHANGELOG.md: 19 additions & 9 deletions
@@ -9,10 +9,9 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 
 ## [Unreleased]
 
+## [5.0.3] - 2026-06-29
+
 ### Added
-- `CODE_OF_CONDUCT.md` (Contributor Covenant 2.0).
-- `docs/RELEASE.md` capturing the maintainer-facing release procedure (moved out of CHANGELOG).
-- OpenSSF Best Practices badge (project 12862) on README.
 - OpenAI-compatible `parallel_tool_calls` support: `ChatRequest.withParallelToolCalls(Boolean)` / `getParallelToolCalls()`, `InferenceParameters.withParallelToolCalls(boolean)`, and pass-through in the `/v1/chat/completions` server mapper.
 - Real-model tool-calling integration tests for blocking and streaming required tool calls (`ToolCallingIntegrationTest`, Qwen2.5-1.5B-Instruct), wired into CI and `validate-models`.
 - End-to-end vision input across blocking, typed `ChatRequest`, streaming, and OpenAI-compatible request mapping; real-model tests verify that distinct red and blue images produce the correct semantic answers.
@@ -22,13 +21,10 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 - Typed cache observability through `Usage.getCachedTokens()`, `Usage.getProcessedPromptTokens()`, `SlotMetrics`, and `ServerMetrics.getSlotMetrics()`.
 - Authenticated JSON `GET /metrics` and `GET /slots` endpoints on the embedded server.
+- Windows GPU native classifiers: `cuda13-windows-x86-64`, `vulkan-windows-x86-64`, and `opencl-windows-x86-64`; the default Windows CPU JAR flipped to the Ninja Multi-Config generator with an `msvc-windows` classifier preserving the Visual Studio build.
 
 ### Changed
-- Unified `CONTRIBUTING.md` and `SECURITY.md` structure with sibling repositories in the project family.
-- Reconciled Java baseline to **11+** across `pom.xml`, README badge, `CLAUDE.md`, and `CONTRIBUTING.md`.
-- README license badge corrected from "Apache 2.0" to "MIT" (matches `LICENSE` file and `pom.xml`).
 - Upgraded llama.cpp from b9172 to b9803 across multiple incremental upgrades.
 - Upgraded llama.cpp from b9803 to b9829. Compiles the new upstream `server-stream.cpp` (resumable-streaming SSE replay buffer) into `libjllama`, required because `server-context`/`server-http`/`server-models` now reference its symbols; refreshed `patches/0001` for the `tests/test-export-graph-ops.cpp` rename and the `server.cpp` GC-init context shift.
 - Upgraded llama.cpp from b9829 to b9839. Pure version bump — no project source changes: all four patches (`0001`–`0004`) apply unchanged against b9839, and every upstream change in the range is absorbed inside upstream-compiled translation units. Brings DFlash block-diffusion speculative decoding (`--spec-type draft-dflash`), the MiniCPM5 XML tool-call chat template, a server `--reasoning-preserve` flag (preserve reasoning trace across the full history when the template supports it), and Jinja `min`/`max` array filters; removes the now-unused `common/regex-partial.{cpp,h}` (partial-regex matching is fully inside the PEG parser), which the project never referenced.
 - Upgraded llama.cpp from b9839 to b9840. Pure version bump — no project source changes: the range is entirely the new **DeepSeek-V4** architecture (new `deepseek4` arch + dedicated `llama-kv-cache-dsv4` cache, `sqrtsoftplus` MoE gating, hyper-connection/compressor hparams + tensors, conversion scripts and embedded chat template), all absorbed inside upstream-compiled `libllama` and the Python converters. Upstream's `src/CMakeLists.txt` adds the new `llama-kv-cache-dsv4.cpp` itself (built via FetchContent). All four patches (`0001`–`0004`) apply unchanged; the project binds none of the new symbols.
@@ -43,9 +39,21 @@ from version 5.0.0 onward. Pre-fork releases (`1.x`–`4.2.0`) were authored by
 - `Session` now pins every inference request to its configured slot, so generation and slot save/restore/erase target the same KV state.
 - Cached-token usage is preserved through typed Java responses and OpenAI Responses/Anthropic blocking and streaming adapters.
 
+## [5.0.2] - 2026-06-08
+
 ### Added
+- `CODE_OF_CONDUCT.md` (Contributor Covenant 2.0).
+- `docs/RELEASE.md` capturing the maintainer-facing release procedure (moved out of CHANGELOG).
+- OpenSSF Best Practices badge (project 12862) on README.
 - Reasoning-budget tests (Qwen3-0.6B).
 
+### Changed
+- Unified `CONTRIBUTING.md` and `SECURITY.md` structure with sibling repositories in the project family.
+- Reconciled Java baseline to **11+** across `pom.xml`, README badge, `CLAUDE.md`, and `CONTRIBUTING.md`.
+- README license badge corrected from "Apache 2.0" to "MIT" (matches `LICENSE` file and `pom.xml`).
@@ -110,6 +118,8 @@ Releases `1.1.1` through `4.2.0` were authored by [@kherud](https://github.com/k
 
 For an architecture-level diff between the pre-fork baseline (`49be664`) and the first 5.0.0 candidate (`24918e4`), see [`docs/history/49be664_24918e4.md`](docs/history/49be664_24918e4.md). For the server-fork-deletion refactor that culminated in 5.0.0, see [`docs/history/REFACTORING.md`](docs/history/REFACTORING.md). For the chat-completion integration that landed in 5.0.0, see [`docs/history/CHAT_INTEGRATION_SUMMARY.md`](docs/history/CHAT_INTEGRATION_SUMMARY.md).