Commit 92a4e1f
Add async APIs, cancellation support, and typed metrics accessors (#188)
* Add typed Usage / Timings / ServerMetrics accessors (§2.5)
Introduces Usage, Timings, and ServerMetrics value classes plus
LlamaModel.getMetricsTyped() so callers no longer need to parse the
raw JSON from getMetrics() by hand. Mirrors the existing ModelMeta
pattern. 15 unit tests, no native or JNI changes.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
* Add per-request setJsonSchema + completeAsJson<T> helpers (§2.8)
InferenceParameters gains setJsonSchema(String) mirroring the existing
ModelParameters setter. LlamaModel.completeAsJson<T> sets the schema,
runs complete(), and deserializes the result via Jackson, throwing a
LlamaException if the model output is not valid JSON for the target
type. No JNI changes — the native server already accepts json_schema
in slot params.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
* Add typed logprobs to LlamaOutput (§2.9)
New TokenLogprob record carries token text, id, raw prob/logprob, and
the nested top_probs/top_logprobs alternatives. LlamaOutput.logprobs
is populated by CompletionResponseParser.parseLogprobs from the same
completion_probabilities array that already feeds the flat
probabilities map. Existing constructor stays as a delegator so all
prior callers keep working with logprobs defaulting to empty.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
* Document iterator cancel semantics + idempotency regression (§2.7)
LlamaIterator.cancel() / close() were already wired correctly via the
existing JNI cancelCompletion → erase_reader path, so this is purely a
docs + test pass:
- Clarify in LlamaIterator javadoc that the underlying llama.cpp slot
may continue to its natural stop after cancel(), while the reader is
released immediately and next() stops yielding.
- Document close() idempotency (post-natural-stop, post-cancel,
double-close all safe).
- Add try-with-resources example to LlamaModel.generate javadoc.
- Add testIteratorCloseIdempotent in LlamaModelTest covering both the
drained-then-closed and cancelled-then-closed paths and confirming
the model is still usable afterwards.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
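The close() idempotency contract described above can be illustrated with a small stand-alone sketch. The Reader class and its release counter are hypothetical stand-ins, not the real LlamaIterator; the only point is that cancel-then-close and double-close release the underlying resource at most once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentCloseSketch {

    /** Hypothetical stand-in for an iterator that owns a native reader. */
    static final class Reader implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private final AtomicInteger releases = new AtomicInteger(0);

        /** Cancelling is modelled as closing: both release the reader at most once. */
        void cancel() {
            close();
        }

        @Override
        public void close() {
            // Only the first caller wins the compare-and-set and performs the release.
            if (closed.compareAndSet(false, true)) {
                releases.incrementAndGet(); // real code would release the native reader here
            }
        }

        int releaseCount() {
            return releases.get();
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader();
        try (Reader r = reader) {
            r.cancel(); // cancelled-then-closed path
        }
        reader.close(); // a further close is a no-op
        System.out.println("releases = " + reader.releaseCount());
    }
}
```

Guarding the release with a compare-and-set rather than a plain boolean keeps the sketch safe even if close() is reached from two threads.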
* Add CancellationToken + complete(params, token) overload (§2.10)
CancellationToken wraps an AtomicInteger task id and a model
reference. LlamaModel.complete(params, token) runs the streaming
inference path internally, binds the token, accumulates text, and
returns early when token.cancel() is invoked from another thread.
The token is reset on return so it is reusable across calls.
No JNI changes: reuses the existing cancelCompletion native method
(which erases the JNI reader; the upstream slot completes naturally).
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
* Add CompletableFuture async wrappers for complete/chatComplete (§2.3)
LlamaModel gains completeAsync, chatCompleteAsync, and
chatCompleteTextAsync — thin wrappers that dispatch the existing
blocking methods through ForkJoinPool.commonPool(). The
completeAsync(params, token) overload bridges future.cancel(true) to
CancellationToken.cancel() so cancellation propagates into the
inference loop.
Reactive Flow.Publisher streaming (M-effort) is intentionally deferred
to a follow-up; this PR delivers only the S-effort portion of §2.3.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
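A minimal sketch of the future-to-token bridge described above, assuming the token can be modelled as an AtomicBoolean flag. The supplyCancellable helper and its signature are hypothetical; the real completeAsync overload may be wired differently. The sketch relies on the fact that CompletableFuture.cancel() does not interrupt the worker, so the cancellation has to be forwarded explicitly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

final class AsyncCancelBridgeSketch {

    /** Runs a blocking call on the common pool and forwards future cancellation to the flag. */
    static <T> CompletableFuture<T> supplyCancellable(
            Function<AtomicBoolean, T> blockingCall, AtomicBoolean cancelFlag) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(
                () -> blockingCall.apply(cancelFlag), ForkJoinPool.commonPool());
        // cancel() completes the future exceptionally, which triggers this callback.
        future.whenComplete((value, error) -> {
            if (future.isCancelled()) {
                cancelFlag.set(true);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        // Hypothetical inference loop: spins until the flag is observed.
        CompletableFuture<String> future = supplyCancellable(f -> {
            long tokens = 0;
            while (!f.get()) {
                tokens++;
                Thread.yield();
            }
            return "stopped after " + tokens + " tokens";
        }, flag);
        future.cancel(true);
        System.out.println("cancelled=" + future.isCancelled() + " flag=" + flag.get());
    }
}
```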
* Add Session multi-turn helper + ChatMessage value type (§2.6)
Session is a thin wrapper over LlamaModel: it owns a slot id, an
accumulating user/assistant transcript, and an optional system
message and parameter customizer. send(userMessage) appends both
sides of the turn and runs chatCompleteText with the full history.
stream(userMessage) returns a LlamaIterable for streamed replies;
commitStreamedReply records the assistant turn once the caller has
accumulated the text.
save/restore delegate to existing LlamaModel.saveSlot/restoreSlot.
close() erases the slot's KV cache.
Single-threaded use only in this pass — per-session locking is the
M-effort follow-up. ChatMessage is the minimal value type for the
transcript; will be reused by ChatResponse when §2.2 lands.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
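The send() flow above (append the user turn, run the model on the full history, append the assistant turn) can be sketched as follows. The model is replaced by a plain Function, the system message is stored as the first transcript entry for simplicity, and slot handling is omitted; all of these are assumptions of the sketch, not the real Session implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class SessionSketch {

    /** Minimal role/content pair, standing in for ChatMessage. */
    static final class ChatMessage {
        final String role;
        final String content;

        ChatMessage(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Single-threaded transcript holder; the model is a hypothetical function over the history. */
    static final class Session {
        private final List<ChatMessage> transcript = new ArrayList<>();
        private final Function<List<ChatMessage>, String> model;

        Session(String systemMessage, Function<List<ChatMessage>, String> model) {
            this.model = model;
            if (systemMessage != null) {
                transcript.add(new ChatMessage("system", systemMessage));
            }
        }

        /** Appends the user turn, runs the model on the full history, appends the reply. */
        String send(String userMessage) {
            transcript.add(new ChatMessage("user", userMessage));
            String reply = model.apply(Collections.unmodifiableList(transcript));
            transcript.add(new ChatMessage("assistant", reply));
            return reply;
        }

        List<ChatMessage> history() {
            return Collections.unmodifiableList(transcript);
        }
    }

    public static void main(String[] args) {
        // The fake model just reports how many messages it was shown.
        Session session = new Session("Be brief.", history -> "seen " + history.size());
        System.out.println(session.send("Hello"));
        System.out.println(session.send("And again"));
    }
}
```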
* Fix CI VM crash: make CancellationToken cooperative-only
Cross-thread cancel raced with the JNI receive loop: cancel() called
cancelCompletion() from another thread, which erased the underlying
server_response_reader unique_ptr while the main thread held a raw
pointer to it and was blocked inside rd->next(). On the next token
this dereferenced freed memory and aborted with std::system_error,
crashing the test JVM (exit 134).
Fix: cancel() now sets a volatile flag only. The inference loop in
complete(params, token) checks the flag between tokens and, when set,
calls cancelCompletion from the same thread that just returned from
receiveCompletionJson — safe because no concurrent access remains.
Cancellation latency is now up to one token interval (tens to a few
hundred ms on CPU) rather than immediate. Documented in the
CancellationToken javadoc.
Tests:
- LlamaModelTest#testCompleteWithCancellationToken: budget relaxed
from 5s to 30s (was tight even on the happy path).
- LlamaModelTest#testCompleteAsyncCancelPropagates: drop the brittle
poll on token.isCancelled() (the worker resets the token on return
before the assertion sees it); sleep for cancel propagation and
verify the model is still usable.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
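The cooperative scheme described in this fix can be sketched without any JNI. The Token class and the complete helper below are hypothetical stand-ins: cancel() only flips a volatile flag, and the loop that owns the token source checks it between tokens, so any cleanup runs on the thread that was reading.

```java
import java.util.Arrays;
import java.util.Iterator;

final class CooperativeCancelSketch {

    /** Hypothetical stand-in for CancellationToken: cancel() only sets a flag. */
    static final class Token {
        private volatile boolean cancelled;

        void cancel() {
            cancelled = true;
        }

        boolean isCancelled() {
            return cancelled;
        }

        void reset() {
            cancelled = false;
        }
    }

    /** Accumulates tokens until the source is drained or the flag is observed between tokens. */
    static String complete(Iterator<String> tokens, Token token) {
        StringBuilder text = new StringBuilder();
        try {
            while (tokens.hasNext()) {
                if (token.isCancelled()) {
                    break; // real code would call cancelCompletion here, on this same thread
                }
                text.append(tokens.next());
            }
            return text.toString();
        } finally {
            token.reset(); // reusable across calls
        }
    }

    public static void main(String[] args) {
        Token token = new Token();
        Iterator<String> inner = Arrays.asList("a", "b", "c", "d").iterator();
        // Simulate another thread cancelling after the second token has been produced.
        Iterator<String> source = new Iterator<String>() {
            private int produced;

            @Override
            public boolean hasNext() {
                return inner.hasNext();
            }

            @Override
            public String next() {
                if (++produced == 2) {
                    token.cancel();
                }
                return inner.next();
            }
        };
        System.out.println(complete(source, token));
    }
}
```

Because the flag is only read between tokens, the worst-case delay is the time to produce one token, which matches the latency trade-off noted in the commit.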
* Fix javadoc build error + reduce warnings on new classes
The release packaging job (mvn package, release profile) runs
maven-javadoc-plugin's attach-javadocs which treats Javadoc tool
errors as build failures. PR #188 introduced one such error:
TokenLogprob.java had a </p> with no matching <p> (the prose was
already enclosed by an outer <p>...</p>, and the inner </p> was
stray).
Fix the error and bring my new public APIs up to a clean shape:
- TokenLogprob: rebalance the <p>/</p> HTML and add @return / @param
to public getters and constructor.
- Timings, Usage, ServerMetrics, ChatMessage, CancellationToken,
Session, LlamaOutput: add @return / @param tags with a leading
one-line description (the "no main description" warning fires on
bare /** @return ... */ blocks).
- LlamaModel: restore the doc comment for complete(params, token)
that was accidentally stripped during an earlier edit, and add one
for getMetricsTyped(); remove a stray orphan doc block.
Local verification:
mvn clean javadoc:jar -DskipTests=true -Dgpg.skip=true
mvn -P release -Dmaven.test.skip=true -Dgpg.skip=true package
Both: BUILD SUCCESS (was: BUILD FAILURE, 1 error, 100 warnings).
60 warnings remain, all from pre-existing files outside this PR.
Document the verification command and the failure categories
(errors vs warnings) in CLAUDE.md under "Javadoc — must build
cleanly before mvn package".
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
* Add LoadProgressCallback for model-load progress (#113)
Exposes llama.cpp's llama_model_params.progress_callback as a Java
functional interface. New constructor:
new LlamaModel(parameters, progress -> { ... return true; });
The callback receives a float in [0.0, 1.0] on the loader thread
(same thread that called the constructor) and may return false to
abort, in which case the constructor throws LlamaException.
JNI: extracts the existing loadModel body into load_model_impl,
adds a trampoline that forwards float progress to a Java
LoadProgressCallback.onProgress(float)Z via CallBooleanMethod.
Trampoline state lives on the loader stack — bounded lifetime is
the single load call.
Two native entry points share the implementation:
loadModel(String[]) — unchanged signature
loadModelWithProgress(String[], LoadProgressCallback)
Tests in LoadProgressCallbackTest (model-gated): non-decreasing
progress in [0,1] reaching ~1.0, returning false aborts with
LlamaException, null callback overload delegates to plain loadModel.
All 435 C++ unit tests still pass. mvn javadoc:jar BUILD SUCCESS.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
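The callback contract above (a float in [0.0, 1.0] delivered on the loading thread, with false meaning abort) can be sketched in pure Java. The load method is a hypothetical loader loop, and IllegalStateException stands in for LlamaException; only the onProgress(float) shape comes from the commit text.

```java
import java.util.ArrayList;
import java.util.List;

final class LoadProgressSketch {

    /** Same shape as the callback described above: return false to abort the load. */
    @FunctionalInterface
    interface LoadProgressCallback {
        boolean onProgress(float progress);
    }

    /** Hypothetical loader: reports progress after each step on the calling thread. */
    static void load(int steps, LoadProgressCallback callback) {
        for (int i = 1; i <= steps; i++) {
            float progress = (float) i / steps;
            // A null callback behaves like the plain, progress-free load.
            if (callback != null && !callback.onProgress(progress)) {
                throw new IllegalStateException("load aborted at progress " + progress);
            }
        }
    }

    public static void main(String[] args) {
        List<Float> seen = new ArrayList<>();
        load(4, progress -> {
            seen.add(progress);
            return true;
        });
        System.out.println("progress values: " + seen);

        try {
            load(4, progress -> progress < 0.5f); // abort once half-way
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```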
* Add typed ChatRequest/ChatResponse + tool calling + agent loop (§2.2)
New typed chat API on top of the existing handleChatCompletions JNI
path — no native changes.
Value types:
- ChatChoice, ChatResponse — choices array, Usage, Timings, raw JSON
- ToolCall, ToolDefinition — OAI-shaped tool wire types
- ChatMessage (extended) — tool_call_id + tool_calls support, with
toolResult() and assistantToolCalls() factory methods (backwards-
compatible 2-arg constructor kept for Session and existing tests)
- ToolHandler — functional interface for tool callbacks
- ChatRequest — builder with messages, tools, tool_choice,
maxToolRounds, and an InferenceParameters customizer
InferenceParameters: new setMessagesJson(String), setToolsJson(String),
setToolChoice(String) for verbatim JSON injection from ChatRequest.
LlamaModel:
- chat(ChatRequest) → ChatResponse
Serializes the request (auto-enables use_jinja when tools present),
calls chatComplete, parses the OAI JSON into ChatResponse via the
extended ChatResponseParser.parseResponse.
- chatWithTools(ChatRequest, Map<String, ToolHandler>) → ChatResponse
Agent loop: per round, calls chat(); if the assistant returned
tool_calls, invokes each handler (capturing exceptions as
{"error":...} tool results so the loop continues), appends the
assistant turn and tool-result turns to the request, and loops up to
ChatRequest.maxToolRounds (default 8). Unknown tool names produce a
{"error":"unknown tool: <name>"} result.
ChatResponseParser: new parseResponse() and tool-call/choice parsers;
handles both string-shaped and object-shaped tool_calls.arguments
(upstream variants emit either shape).
Tests:
- ChatResponseTest (7 new unit tests, model-free): plain reply, tool
calls with string arguments, object-shaped arguments, malformed
input, ChatRequest serialization round-trip.
- LlamaModelTest: testTypedChat and testChatWithToolsLoopShortCircuits
(model-gated).
mvn javadoc:jar BUILD SUCCESS (0 errors, 60 warnings — same as before,
none from new files).
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
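The agent loop described above can be sketched as follows. Everything here is a simplified assumption: the transcript is a list of strings rather than ChatMessage objects, the model is a Function, scriptedModel is a fake used only for the demo, and the error JSON is built without escaping. The real chatWithTools may count rounds or shape messages differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ToolLoopSketch {

    static final class ToolCall {
        final String id;
        final String name;
        final String arguments;

        ToolCall(String id, String name, String arguments) {
            this.id = id;
            this.name = name;
            this.arguments = arguments;
        }
    }

    static final class Reply {
        final String content;
        final List<ToolCall> toolCalls;

        Reply(String content, List<ToolCall> toolCalls) {
            this.content = content;
            this.toolCalls = toolCalls;
        }
    }

    interface ToolHandler {
        String handle(String argumentsJson) throws Exception;
    }

    /** Calls the model, runs requested tools, appends their results, and repeats up to the cap. */
    static Reply chatWithTools(Function<List<String>, Reply> chat, List<String> transcript,
                               Map<String, ToolHandler> handlers, int maxToolRounds) {
        Reply reply = chat.apply(transcript);
        for (int round = 0; round < maxToolRounds && !reply.toolCalls.isEmpty(); round++) {
            transcript.add("assistant:tool_calls=" + reply.toolCalls.size());
            for (ToolCall call : reply.toolCalls) {
                transcript.add("tool[" + call.id + "]:" + runTool(call, handlers));
            }
            reply = chat.apply(transcript);
        }
        return reply;
    }

    /** Unknown tools and handler exceptions become error results so the loop can continue. */
    static String runTool(ToolCall call, Map<String, ToolHandler> handlers) {
        ToolHandler handler = handlers.get(call.name);
        if (handler == null) {
            return "{\"error\":\"unknown tool: " + call.name + "\"}";
        }
        try {
            return handler.handle(call.arguments);
        } catch (Exception e) {
            return "{\"error\":\"" + e.getMessage() + "\"}"; // naive: message is not JSON-escaped
        }
    }

    /** Fake model for the demo: asks for the "add" tool once, then answers from its result. */
    static Reply scriptedModel(List<String> transcript) {
        String last = transcript.get(transcript.size() - 1);
        if (last.startsWith("tool[")) {
            String result = last.substring(last.indexOf(':') + 1);
            return new Reply("The answer is " + result, Collections.emptyList());
        }
        return new Reply(null,
                Collections.singletonList(new ToolCall("call-1", "add", "{\"a\":2,\"b\":2}")));
    }

    public static void main(String[] args) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user:what is 2+2?");
        Map<String, ToolHandler> handlers = new HashMap<>();
        handlers.put("add", arguments -> "4");
        Reply reply = chatWithTools(ToolLoopSketch::scriptedModel, transcript, handlers, 8);
        System.out.println(reply.content);
    }
}
```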
* Add completeWithStats() for typed Usage/Timings/logprobs on plain completion
complete() returned only the generated text, while chat() already
exposed Usage/Timings/TokenLogprob via ChatResponse. This commit
parity-fills the plain completion path:
- New CompletionResult value type (text + Usage + Timings +
List<TokenLogprob> + StopReason + raw JSON).
- New LlamaModel.completeWithStats(InferenceParameters) calling the
existing non-streaming JNI path and parsing the response via a new
CompletionResponseParser.parseCompletionResult.
- Maps the non-OAI completion fields: content -> text,
tokens_evaluated -> Usage.promptTokens, tokens_predicted ->
Usage.completionTokens, timings sub-object -> Timings,
completion_probabilities -> List<TokenLogprob>, stop_type ->
StopReason.
complete() (the String-returning overload) is unchanged for
backwards compatibility.
5 unit tests in CompletionResultTest (model-free): full response,
missing-fields defaults, stop reason mapping (eos / limit / word),
malformed input. mvn javadoc:jar BUILD SUCCESS, no new warnings.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
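The field mapping listed above can be shown on an already-parsed response. This sketch uses a plain Map in place of a Jackson tree, and the Result class, the StopReason constants, and the missing-field defaults (empty text, zero counts, UNKNOWN) are assumptions for illustration; only the JSON key names and the eos / limit / word values come from the commit text.

```java
import java.util.HashMap;
import java.util.Map;

final class CompletionMappingSketch {

    enum StopReason { EOS, LIMIT, WORD, UNKNOWN }

    /** Hypothetical, trimmed-down stand-in for CompletionResult. */
    static final class Result {
        final String text;
        final int promptTokens;
        final int completionTokens;
        final StopReason stopReason;

        Result(String text, int promptTokens, int completionTokens, StopReason stopReason) {
            this.text = text;
            this.promptTokens = promptTokens;
            this.completionTokens = completionTokens;
            this.stopReason = stopReason;
        }
    }

    /** Maps the non-OAI completion fields onto typed values, defaulting missing fields. */
    static Result fromFields(Map<String, Object> json) {
        Object content = json.get("content");
        return new Result(
                content instanceof String ? (String) content : "",
                intField(json, "tokens_evaluated"),  // -> prompt tokens
                intField(json, "tokens_predicted"),  // -> completion tokens
                stopReason(json.get("stop_type")));
    }

    private static int intField(Map<String, Object> json, String key) {
        Object value = json.get(key);
        return value instanceof Number ? ((Number) value).intValue() : 0;
    }

    private static StopReason stopReason(Object stopType) {
        if ("eos".equals(stopType)) {
            return StopReason.EOS;
        }
        if ("limit".equals(stopType)) {
            return StopReason.LIMIT;
        }
        if ("word".equals(stopType)) {
            return StopReason.WORD;
        }
        return StopReason.UNKNOWN;
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("content", "hello");
        json.put("tokens_evaluated", 7);
        json.put("tokens_predicted", 3);
        json.put("stop_type", "eos");
        Result result = fromFields(json);
        System.out.println(result.text + " " + result.promptTokens + "+"
                + result.completionTokens + " " + result.stopReason);
    }
}
```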
* Add completeBatch / chatBatch parallel dispatch (§2.4)
Three new methods on LlamaModel that hand a list of requests to the
native scheduler at once and collect results in input order:
- completeBatch(List<InferenceParameters>) -> List<String>
- completeBatchWithStats(List<InferenceParameters>) -> List<CompletionResult>
- chatBatch(List<ChatRequest>) -> List<ChatResponse>
Implementation reuses the existing CompletableFuture wrappers
(completeAsync, supplyAsync(() -> completeWithStats/chat)) and
joins them all in input order. The native worker thread runs the
upstream slot scheduler, which dispatches tasks across however many
slots ModelParameters.setParallel(N) was configured with. With the
default N=1 the batch still works correctly, just sequentially.
No JNI changes — the upstream scheduler already supports parallel
slot execution; this surfaces it as a typed Java API.
Three model-gated tests in LlamaModelTest exercise the order-preserving
contract and per-result Usage population.
mvn javadoc:jar BUILD SUCCESS, no new warnings.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
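The order-preserving contract above follows from submitting every request first and then joining the futures in input order. A minimal sketch, with dispatchAll as a hypothetical generic helper and an arbitrary Function standing in for the blocking model call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class BatchDispatchSketch {

    /** Submits all inputs up front, then joins in input order so results line up with inputs. */
    static <I, O> List<O> dispatchAll(List<I> inputs, Function<I, O> blockingCall) {
        List<CompletableFuture<O>> futures = new ArrayList<>(inputs.size());
        for (I input : inputs) {
            futures.add(CompletableFuture.supplyAsync(() -> blockingCall.apply(input)));
        }
        List<O> results = new ArrayList<>(inputs.size());
        for (CompletableFuture<O> future : futures) {
            results.add(future.join()); // completion order may differ; join order does not
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> prompts = Arrays.asList("alpha", "beta", "gamma");
        // Hypothetical "model": upper-cases the prompt.
        System.out.println(dispatchAll(prompts, prompt -> prompt.toUpperCase()));
    }
}
```

If any call throws, join() rethrows it wrapped in a CompletionException, so one failed request fails the whole batch in this sketch.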
* Add LlamaPublisher reactive-streams token publisher (§2.3 follow-up)
Backpressure-aware Publisher<LlamaOutput> on top of the existing
streaming iterator. Reactor / RxJava / Kotlin coroutines all bridge
to the Reactive Streams interface natively, so consumers wrap with
Flux.from(...) / Flowable.fromPublisher(...) / asFlow() in one line.
LlamaPublisher:
- Single-subscriber (second subscribe signals onError per RS spec).
- Each subscribe starts a dedicated emitter daemon thread.
- Demand honoured via AtomicLong + monitor: emitter blocks while
demand == 0 and only calls iterator.next() when demand > 0.
- request(n <= 0) signals onError with IllegalArgumentException per
reactive-streams §3.9.
- cancel() closes the underlying iterator (cooperative, same path as
LlamaIterator.close); idempotent.
- onComplete fires on stop token, onError on any throwable from the
iterator path.
LlamaModel:
- streamPublisher(InferenceParameters) and
streamChatPublisher(InferenceParameters) factories.
Dependency: adds org.reactivestreams:reactive-streams 1.0.4 (~5 KB,
Java 8 compatible) to pom.xml.
Tests in LlamaPublisherTest:
- nullSubscriberThrows (model-free).
- backpressureAndCancel, singleSubscriberContract,
invalidRequestSignalsError (model-gated).
mvn javadoc:jar BUILD SUCCESS, no new warnings.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
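The demand handling above (an AtomicLong plus a monitor, with the emitter pulling from the iterator only when demand is positive) can be sketched without the Reactive Streams interfaces. DemandGate and emit are hypothetical names; the sketch also throws on request(n <= 0) where a real Subscription would signal onError instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class DemandGateSketch {

    /** Tracks outstanding demand; the emitter blocks while it is zero. */
    static final class DemandGate {
        private final AtomicLong demand = new AtomicLong(0);
        private final Object monitor = new Object();
        private volatile boolean cancelled;

        void request(long n) {
            if (n <= 0) {
                throw new IllegalArgumentException("request must be positive, got " + n);
            }
            // Saturate at Long.MAX_VALUE instead of overflowing.
            demand.accumulateAndGet(n, (current, added) -> {
                long sum = current + added;
                return sum < 0 ? Long.MAX_VALUE : sum;
            });
            synchronized (monitor) {
                monitor.notifyAll();
            }
        }

        void cancel() {
            cancelled = true;
            synchronized (monitor) {
                monitor.notifyAll();
            }
        }

        /** Waits for one unit of demand; returns false if cancelled or interrupted. */
        boolean awaitPermit() {
            synchronized (monitor) {
                while (demand.get() == 0 && !cancelled) {
                    try {
                        monitor.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                if (cancelled) {
                    return false;
                }
                demand.decrementAndGet();
                return true;
            }
        }
    }

    /** Pulls from the source only after a permit is granted, one element per permit. */
    static <T> void emit(Iterator<T> source, DemandGate gate, Consumer<T> sink) {
        while (source.hasNext()) {
            if (!gate.awaitPermit()) {
                return;
            }
            sink.accept(source.next());
        }
    }

    public static void main(String[] args) {
        DemandGate gate = new DemandGate();
        List<String> received = new ArrayList<>();
        gate.request(3);
        emit(Arrays.asList("a", "b", "c", "d").iterator(), gate, token -> {
            received.add(token);
            if (received.size() == 2) {
                gate.cancel(); // subscriber loses interest after two tokens
            }
        });
        System.out.println(received);
    }
}
```

In the publisher described above the emit loop would run on its own daemon thread; the demo runs it inline and cancels from the sink so that it terminates deterministically.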
* Annotate docs with shipped/open status for the §2 feature inventory
The kotlin-comparison doc and the open-issues doc were both stale
after PR #188 shipped 11 features. Bring them up to date in place
rather than introducing a separate changelog:
docs/feature-investigation-llama-stack-client-kotlin.md
- §1 capability matrix gets new rows for everything that landed
(typed chat + tools, async wrappers, reactive Publisher, batch
dispatch, Usage/Timings, Session, completeAsJson, TokenLogprob,
CancellationToken, LoadProgressCallback).
- New §1.1 status legend (SHIPPED / PARTIAL / OPEN).
- Each §2.x section now starts with a Status: line summarising what
shipped, with commit refs into this PR. §2.2/2.3/2.4/2.5/2.7/2.8/2.9
marked SHIPPED. §2.6 PARTIAL (locking deferred).
§2.10 PARTIAL — cooperative cancel shipped; immediate cancel
needs a new server-side JNI primitive (M-effort follow-up).
§2.1 OPEN (multimodal image API).
docs/history/49be664_open_issues.md
- #113 updated STILL POSSIBLE -> FIXED in PR #188 commit 70df324,
with a note that the richer payload (file name, bytes, weights
flag) is intentionally not exposed because the upstream
llama_model_params.progress_callback emits only a float.
No code changes, no test impact.
https://claude.ai/code/session_01R4ZrEy3ptJDLuUgUKuM4Gy
---------
Co-authored-by: Claude <noreply@anthropic.com>
1 parent e96984c commit 92a4e1f
39 files changed
Lines changed: 3527 additions & 9 deletions
File tree
- docs
  - history
- src
  - main
    - cpp
    - java/net/ladenthin/llama
      - json
  - test/java/net/ladenthin/llama