Fix §2.10 cancel hang: post CANCEL without touching waiting_task_ids

claude · claude · commit 705f980524a6 · 2026-05-23T13:38:00.000Z
The §2.10 commit (99d1c89) routed CancellationToken.cancel() through server_response_reader::stop(), which also calls queue_results.remove_waiting_task_ids(id_tasks). When stop() is called from a thread other than the inference thread (the whole point of immediate cancel), it races fatally with the inference thread blocked inside rd->next() -> queue_results.recv_with_timeout(id_tasks, 1s):

1. cancel() (thread B) removes the task id from waiting_task_ids.
2. The worker processes the queued CANCEL, releases the slot, and posts the slot's final stop result via server_response::send().
3. send() iterates waiting_task_ids to decide whether to enqueue. The id is gone, so the result is silently dropped and no notify_all fires (server-queue.cpp:319-332).
4. recv_with_timeout keeps timing out every 1 s. next()'s should_stop is hard-coded to false, so it loops forever.
5. The JVM hangs inside receiveCompletionJson, and CI surefire never returns (observed on Ubuntu and Windows).

The HTTP server path can call stop() safely only because, by the time it runs, the HTTP handler has already returned and nothing is recv-ing on those task ids. Our cancel is asynchronous and breaks that assumption.

Fix

- jllama.cpp Java_net_ladenthin_llama_LlamaModel_queueCancel: post the SERVER_TASK_TYPE_CANCEL task directly through the reader's public queue_tasks reference. Do NOT call reader->stop(). The task id stays registered in queue_results, so the slot's stop result reaches send(), is enqueued, notify_all fires, recv wakes up, next() returns the stop result, and the Java receive loop exits naturally on out.stop == true.
- LlamaModel.complete(params, token): the cooperative-branch queueCancel must NOT break the loop. Post the cancel once (guarded by a cancelPosted flag) and keep calling receiveCompletionJson until the natural stop result arrives. Breaking immediately would orphan the reader in jctx->readers: no one would consume the stop result and erase the reader from the map until LlamaModel.close().
- CancellationToken javadoc updated to record the post-without-stop invariant and the CI-hang regression that motivated it.

Verified locally

- cmake --build build --target jllama: BUILD SUCCESS.
- nm -D libjllama.so shows Java_net_ladenthin_llama_LlamaModel_queueCancel exported with plain C linkage (no _Z mangling regression).
- mvn surefire:test on the affected unit suites (CancellationTokenTest, ContentPartTest, MultimodalMessagesTest, SessionConcurrencyTest): 32 tests pass, 1 skipped (no model).
- mvn javadoc:jar: BUILD SUCCESS.
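
For reviewers, a minimal single-threaded C++ sketch of steps 1-3. ResponseQueue, Result and stop_result_delivered are hypothetical names, not the upstream server_response API. The sketch assumes only the behavior described above: send() enqueues and calls notify_all only while the id is still in waiting_task_ids. Polling is capped at three 10 ms attempts instead of next()'s unbounded 1 s loop, so the dropped-result case returns false rather than hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <unordered_set>

// Hypothetical stand-in for upstream server_response: only the pieces relevant
// to the race are modelled (waiting_task_ids, send, recv_with_timeout).
struct Result {
    int id_task;
    bool is_stop;
};

class ResponseQueue {
public:
    void add_waiting_task_id(int id) {
        std::lock_guard<std::mutex> lk(mutex_);
        waiting_task_ids_.insert(id);
    }

    void remove_waiting_task_id(int id) {
        std::lock_guard<std::mutex> lk(mutex_);
        waiting_task_ids_.erase(id);
    }

    // Enqueue only if someone is still registered for this id; otherwise the
    // result is dropped and no waiter is notified.
    void send(Result r) {
        std::lock_guard<std::mutex> lk(mutex_);
        if (waiting_task_ids_.count(r.id_task) == 0) {
            return;  // silently dropped, no notify_all
        }
        results_.push_back(r);
        cv_.notify_all();
    }

    // Wait up to `timeout` for a result addressed to `id`.
    std::optional<Result> recv_with_timeout(int id, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mutex_);
        auto has_result = [&] { return find_result(id) != results_.end(); };
        if (!cv_.wait_for(lk, timeout, has_result)) {
            return std::nullopt;  // timed out with nothing for this id
        }
        auto it = find_result(id);
        assert(it != results_.end());
        Result r = *it;
        results_.erase(it);
        return r;
    }

private:
    std::deque<Result>::iterator find_result(int id) {
        for (auto it = results_.begin(); it != results_.end(); ++it) {
            if (it->id_task == id) return it;
        }
        return results_.end();
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::unordered_set<int> waiting_task_ids_;
    std::deque<Result> results_;
};

// Replays steps 1-3 in order on one thread, then polls like next() would, but
// with a bounded number of attempts. Returns true if the stop result arrived.
bool stop_result_delivered(bool cancel_removes_waiting_id) {
    ResponseQueue q;
    const int id_task = 42;
    q.add_waiting_task_id(id_task);         // reader registered the task
    if (cancel_removes_waiting_id) {
        q.remove_waiting_task_id(id_task);  // old path: stop() from thread B
    }
    q.send({id_task, /*is_stop=*/true});    // worker posts the slot's stop result
    for (int attempt = 0; attempt < 3; ++attempt) {
        auto r = q.recv_with_timeout(id_task, std::chrono::milliseconds(10));
        if (r && r->is_stop) return true;
    }
    return false;
}
```

stop_result_delivered(false) models the new queueCancel path (the id stays registered) and returns true; stop_result_delivered(true) models the old stop() path and returns false, which in the real next() loop would be an endless series of 1 s timeouts.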
diff --git a/src/main/cpp/jllama.cpp b/src/main/cpp/jllama.cpp
@@ -996,19 +996,34 @@ JNIEXPORT void JNICALL Java_net_ladenthin_llama_LlamaModel_cancelCompletion(JNIE
 }
 
 // Post a SERVER_TASK_TYPE_CANCEL message to the upstream task queue without
-// freeing the reader. Safe to call from any thread: server_response_reader::stop()
-// posts via server_queue (internally mutex-locked) and only flips an internal
-// boolean; it does NOT destroy the reader. A concurrently-blocked rd->next() on
-// another thread will observe the cancel naturally via its results queue. The
-// reader is later removed from jctx->readers by the normal stop-result code path
-// in receiveCompletionJson (line ~826). stop() is idempotent, so subsequent
-// erase_reader() destructor calls are safe.
+// touching the reader's waiting_task_ids registration. Safe to call from any
+// thread: server_queue::post is mutex-locked internally and we only touch
+// jctx->readers under readers_mutex.
+//
+// Why NOT call reader->stop() here: stop() ALSO calls
+// queue_results.remove_waiting_task_ids(id_tasks). If the inference thread is
+// concurrently blocked inside rd->next() -> queue_results.recv_with_timeout()
+// polling for this same task id, removing it from waiting_task_ids causes the
+// worker's later send() of the slot's stop result to be silently dropped (see
+// server-queue.cpp server_response::send line ~319, which iterates
+// waiting_task_ids and returns without enqueueing if the id is missing). recv
+// then never wakes up, the polling loop spins on its 1 s timeout forever, and
+// the JVM hangs. The HTTP server path can call stop() safely only because by
+// then its request handler has already returned and nothing is recv-ing.
+//
+// Posting CANCEL directly through the reader's public queue_tasks reference
+// keeps the task id registered in waiting_task_ids, so the slot's natural stop result reaches
+// rd->next(). The Java receive loop sees out.stop=true and exits, at which
+// point receiveCompletionJson erases the reader on the inference thread (see
+// the is_stop() branch around line 826).
 JNIEXPORT void JNICALL Java_net_ladenthin_llama_LlamaModel_queueCancel(JNIEnv *env, jobject obj, jint id_task) {
     REQUIRE_SERVER_CONTEXT();
     std::lock_guard<std::mutex> lk(jctx->readers_mutex);
     auto it = jctx->readers.find(id_task);
     if (it != jctx->readers.end() && it->second) {
-        it->second->stop();
+        server_task task(SERVER_TASK_TYPE_CANCEL);
+        task.id_target = id_task;
+        it->second->queue_tasks.post(std::move(task), /*front=*/true);
     }
 }
 
diff --git a/src/main/java/net/ladenthin/llama/CancellationToken.java b/src/main/java/net/ladenthin/llama/CancellationToken.java
@@ -26,11 +26,18 @@
  * </ol>
  * <p>
  * The reader-backed buffer is intentionally <em>not</em> freed by
- * {@link #cancel()} — that was the use-after-free root cause of the previous
- * mid-token attempt (a concurrent {@code rd-&gt;next()} held a raw pointer into
- * the erased {@code unique_ptr}). The new path only enqueues a cancel message
- * and leaves the reader alive; the normal stop-result code path in
- * {@code receiveCompletionJson} cleans it up.
+ * {@link #cancel()} &#x2014; that was the use-after-free root cause of the
+ * previous mid-token attempt (a concurrent {@code rd-&gt;next()} held a raw
+ * pointer into the erased {@code unique_ptr}). The native {@code queueCancel}
+ * primitive posts the {@code SERVER_TASK_TYPE_CANCEL} task to the upstream
+ * queue directly and does <em>not</em> touch the reader's
+ * {@code waiting_task_ids} registration. That ordering is critical: removing
+ * the registration would cause the worker's later {@code send()} of the slot's
+ * stop result to be silently dropped, which would in turn leave the inference
+ * thread's polling {@code recv_with_timeout} loop spinning forever (this was
+ * observed as a CI hang after the first attempt at §2.10). The reader is
+ * cleaned up by the normal stop-result code path in
+ * {@code receiveCompletionJson} once the natural stop arrives.
  * </p>
  * <p>
  * A token may be reused across calls but must be used by only one inference at a
diff --git a/src/main/java/net/ladenthin/llama/LlamaModel.java b/src/main/java/net/ladenthin/llama/LlamaModel.java
@@ -273,16 +273,21 @@ public String complete(InferenceParameters parameters, CancellationToken token)
 		int taskId = requestCompletion(parameters.toString());
 		token.register(this, taskId);
 		StringBuilder sb = new StringBuilder();
+		boolean cancelPosted = false;
 		try {
 			while (true) {
-				if (token.isCancelled()) {
-					// Cooperative branch: cancel() may have flipped the flag before the
-					// register() call landed, so the cross-thread queueCancel could have
-					// no-op'd. Posting one here from the loop thread itself guarantees
-					// the upstream worker sees the cancel even in that race. The reader
-					// is not freed; the natural stop-result path cleans it up.
+				if (!cancelPosted && token.isCancelled()) {
+					// Cooperative fallback. CancellationToken.cancel() normally posts
+					// queueCancel from the cancelling thread; this branch only fires
+					// when cancel() ran before token.register(...) landed (the
+					// not-yet-bound race). Post the cancel once and KEEP RECEIVING —
+					// do not break here. The slot will see the queued CANCEL on its
+					// next worker iteration and post its natural stop result, which
+					// rd->next() can only receive while this task id is still in
+					// waiting_task_ids. Breaking here would orphan the reader in
+					// jctx->readers until LlamaModel.close().
 					queueCancel(taskId);
-					break;
+					cancelPosted = true;
 				}
 				String json = receiveCompletionJson(taskId);
 				LlamaOutput out = completionParser.parse(json);