Skip to content

Commit 705f980

Browse files
committed
Fix §2.10 cancel hang: post CANCEL without touching waiting_task_ids
The §2.10 commit (99d1c89) routed CancellationToken.cancel() through server_response_reader::stop(), which also calls queue_results.remove_waiting_task_ids(id_tasks). When called from a thread other than the inference thread (the whole point of immediate cancel), this races fatally with the inference thread blocked inside rd->next() -> queue_results.recv_with_timeout(id_tasks, 1s): 1. cancel() (thread B) removes the task id from waiting_task_ids. 2. Worker processes the queued CANCEL, releases the slot, posts the slot's final stop result via server_response::send(). 3. send() iterates waiting_task_ids to decide whether to enqueue. The id is gone, so the result is silently dropped and no notify_all fires (server-queue.cpp:319-332). 4. recv_with_timeout keeps timing out every 1 s. next()'s should_stop is hard-coded false, so it loops forever. 5. The JVM is hung inside receiveCompletionJson. CI surefire never returns (observed on Ubuntu and Windows). The HTTP server path can call stop() safely only because by the time it runs the HTTP handler has already returned and nothing is recv-ing on those task ids. Our cancel is asynchronous and breaks that assumption. Fix - jllama.cpp Java_net_ladenthin_llama_LlamaModel_queueCancel: post the SERVER_TASK_TYPE_CANCEL task directly through the reader's public queue_tasks reference. Do NOT call reader->stop(). The waiting task id stays in queue_results, so the slot's stop result reaches send(), is enqueued, notify_all fires, recv wakes up, next() returns the stop result, and the Java receive loop exits naturally on out.stop == true. - LlamaModel.complete(params, token): the cooperative-branch queueCancel must NOT break the loop. Post the cancel once (guarded by a cancelPosted flag) and keep calling receiveCompletionJson until the natural stop result arrives. Breaking immediately would orphan the reader in jctx->readers (no one would consume the stop result and erase it from the map until LlamaModel.close()). - CancellationToken javadoc updated to record the post-without-stop invariant and the CI-hang regression that motivated it. Verified locally - cmake --build build --target jllama: BUILD SUCCESS. - nm -D libjllama.so shows Java_net_ladenthin_llama_LlamaModel_queueCancel exported with plain C linkage (no _Z mangling regression). - mvn surefire:test on the affected unit suites (CancellationTokenTest, ContentPartTest, MultimodalMessagesTest, SessionConcurrencyTest): 32 tests pass, 1 skipped (no model). - mvn javadoc:jar: BUILD SUCCESS.
1 parent bf157e2 commit 705f980

3 files changed

Lines changed: 47 additions & 20 deletions

File tree

src/main/cpp/jllama.cpp

Lines changed: 23 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -996,19 +996,34 @@ JNIEXPORT void JNICALL Java_net_ladenthin_llama_LlamaModel_cancelCompletion(JNIE
996996
}
997997

998998
// Post a SERVER_TASK_TYPE_CANCEL message to the upstream task queue without
999-
// freeing the reader. Safe to call from any thread: server_response_reader::stop()
1000-
// posts via server_queue (internally mutex-locked) and only flips an internal
1001-
// boolean; it does NOT destroy the reader. A concurrently-blocked rd->next() on
1002-
// another thread will observe the cancel naturally via its results queue. The
1003-
// reader is later removed from jctx->readers by the normal stop-result code path
1004-
// in receiveCompletionJson (line ~826). stop() is idempotent, so subsequent
1005-
// erase_reader() destructor calls are safe.
999+
// touching the reader's waiting_task_ids registration. Safe to call from any
1000+
// thread: server_queue::post is mutex-locked internally and we only touch
1001+
// jctx->readers under readers_mutex.
1002+
//
1003+
// Why NOT call reader->stop() here: stop() ALSO calls
1004+
// queue_results.remove_waiting_task_ids(id_tasks). If the inference thread is
1005+
// concurrently blocked inside rd->next() -> queue_results.recv_with_timeout()
1006+
// polling for this same task id, removing it from waiting_task_ids causes the
1007+
// worker's later send() of the slot's stop result to be silently dropped (see
1008+
// server-queue.cpp server_response::send line ~319, which iterates
1009+
// waiting_task_ids and returns without enqueueing if the id is missing). recv
1010+
// then never wakes up, the polling loop spins on its 1 s timeout forever, and
1011+
// the JVM hangs. The HTTP server path can call stop() safely only because by
1012+
// then its request handler has already returned and nothing is recv-ing.
1013+
//
1014+
// Posting CANCEL directly through the reader's public queue_tasks reference
1015+
// keeps the waiting-task-id alive so the slot's natural stop result reaches
1016+
// rd->next(). The Java receive loop sees out.stop=true and exits, at which
1017+
// point receiveCompletionJson erases the reader on the inference thread (see
1018+
// the is_stop() branch around line 826).
10061019
JNIEXPORT void JNICALL Java_net_ladenthin_llama_LlamaModel_queueCancel(JNIEnv *env, jobject obj, jint id_task) {
10071020
REQUIRE_SERVER_CONTEXT();
10081021
std::lock_guard<std::mutex> lk(jctx->readers_mutex);
10091022
auto it = jctx->readers.find(id_task);
10101023
if (it != jctx->readers.end() && it->second) {
1011-
it->second->stop();
1024+
server_task task(SERVER_TASK_TYPE_CANCEL);
1025+
task.id_target = id_task;
1026+
it->second->queue_tasks.post(std::move(task), /*front=*/true);
10121027
}
10131028
}
10141029

src/main/java/net/ladenthin/llama/CancellationToken.java

Lines changed: 12 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -26,11 +26,18 @@
2626
* </ol>
2727
* <p>
2828
* The reader-backed buffer is intentionally <em>not</em> freed by
29-
* {@link #cancel()} — that was the use-after-free root cause of the previous
30-
* mid-token attempt (a concurrent {@code rd-&gt;next()} held a raw pointer into
31-
* the erased {@code unique_ptr}). The new path only enqueues a cancel message
32-
* and leaves the reader alive; the normal stop-result code path in
33-
* {@code receiveCompletionJson} cleans it up.
29+
* {@link #cancel()} &#x2014; that was the use-after-free root cause of the
30+
* previous mid-token attempt (a concurrent {@code rd-&gt;next()} held a raw
31+
* pointer into the erased {@code unique_ptr}). The native {@code queueCancel}
32+
* primitive posts the {@code SERVER_TASK_TYPE_CANCEL} task to the upstream
33+
* queue directly and does <em>not</em> touch the reader's
34+
* {@code waiting_task_ids} registration. That ordering is critical: removing
35+
* the registration would cause the worker's later {@code send()} of the slot's
36+
* stop result to be silently dropped, which would in turn leave the inference
37+
* thread's polling {@code recv_with_timeout} loop spinning forever (this was
38+
* observed as a CI hang after the first attempt at §2.10). The reader is
39+
* cleaned up by the normal stop-result code path in
40+
* {@code receiveCompletionJson} once the natural stop arrives.
3441
* </p>
3542
* <p>
3643
* A token may be reused across calls but must be used by only one inference at a

src/main/java/net/ladenthin/llama/LlamaModel.java

Lines changed: 12 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -273,16 +273,21 @@ public String complete(InferenceParameters parameters, CancellationToken token)
273273
int taskId = requestCompletion(parameters.toString());
274274
token.register(this, taskId);
275275
StringBuilder sb = new StringBuilder();
276+
boolean cancelPosted = false;
276277
try {
277278
while (true) {
278-
if (token.isCancelled()) {
279-
// Cooperative branch: cancel() may have flipped the flag before the
280-
// register() call landed, so the cross-thread queueCancel could have
281-
// no-op'd. Posting one here from the loop thread itself guarantees
282-
// the upstream worker sees the cancel even in that race. The reader
283-
// is not freed; the natural stop-result path cleans it up.
279+
if (!cancelPosted && token.isCancelled()) {
280+
// Cooperative fallback. CancellationToken.cancel() normally posts
281+
// queueCancel from the cancelling thread; this branch only fires
282+
// when cancel() ran before token.register(...) landed (the
283+
// not-yet-bound race). Post the cancel once and KEEP RECEIVING —
284+
// do not break here. The slot will see the queued CANCEL on its
285+
// next worker iteration and post its natural stop result, which
286+
// rd->next() can only receive while this task id is still in
287+
// waiting_task_ids. Breaking here would orphan the reader in
288+
// jctx->readers until LlamaModel.close().
284289
queueCancel(taskId);
285-
break;
290+
cancelPosted = true;
286291
}
287292
String json = receiveCompletionJson(taskId);
288293
LlamaOutput out = completionParser.parse(json);

0 commit comments

Comments
 (0)