Commit 705f980
Fix §2.10 cancel hang: post CANCEL without touching waiting_task_ids
The §2.10 commit (99d1c89) routed CancellationToken.cancel() through
server_response_reader::stop(), which also calls
queue_results.remove_waiting_task_ids(id_tasks). When called from a
thread other than the inference thread (the whole point of immediate
cancel), this races fatally with the inference thread blocked inside
rd->next() -> queue_results.recv_with_timeout(id_tasks, 1s):
1. cancel() (thread B) removes the task id from waiting_task_ids.
2. Worker processes the queued CANCEL, releases the slot, posts the
slot's final stop result via server_response::send().
3. send() iterates waiting_task_ids to decide whether to enqueue. The
id is gone, so the result is silently dropped and no notify_all
fires (server-queue.cpp:319-332).
4. recv_with_timeout keeps timing out every 1 s. next()'s should_stop
is hard-coded false, so it loops forever.
5. The JVM is hung inside receiveCompletionJson. CI surefire never
returns (observed on Ubuntu and Windows).
The HTTP server path can call stop() safely only because by the time it
runs the HTTP handler has already returned and nothing is recv-ing on
those task ids. Our cancel is asynchronous and breaks that assumption.
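The drop described in steps 1-4 can be reproduced with a toy, single-threaded model of the result queue. This is only an illustrative sketch: ToyResultQueue, ToyResult, and receiver_sees_stop are hypothetical names invented here, not the real server_response code, and the blocking recv_with_timeout is reduced to a single non-blocking attempt.

```cpp
#include <cassert>
#include <deque>
#include <unordered_set>

// Hypothetical, simplified stand-in for a result posted by the worker.
struct ToyResult {
    int  id_task;
    bool stop;
};

// Hypothetical, simplified stand-in for the server's result queue.
class ToyResultQueue {
public:
    void add_waiting_task_id(int id)    { waiting_.insert(id); }
    void remove_waiting_task_id(int id) { waiting_.erase(id); }

    // Models step 3: a result is enqueued only while some receiver is
    // still registered for its task id; otherwise it is dropped.
    bool send(const ToyResult & r) {
        if (waiting_.count(r.id_task) == 0) {
            return false; // dropped, and a real queue would not notify
        }
        results_.push_back(r);
        return true;
    }

    // Non-blocking stand-in for one recv_with_timeout attempt.
    // Returning false corresponds to a timeout.
    bool try_recv(int id, ToyResult & out) {
        for (auto it = results_.begin(); it != results_.end(); ++it) {
            if (it->id_task == id) {
                out = *it;
                results_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::unordered_set<int> waiting_;
    std::deque<ToyResult>   results_;
};

// Returns true if the receiver for task 7 ever sees the stop result.
bool receiver_sees_stop(bool cancel_removes_waiting_id) {
    ToyResultQueue q;
    q.add_waiting_task_id(7);            // receiver is registered
    if (cancel_removes_waiting_id) {
        q.remove_waiting_task_id(7);     // step 1: what stop() did
    }
    const bool enqueued = q.send({7, true}); // steps 2-3: worker posts stop
    assert(enqueued == !cancel_removes_waiting_id);

    ToyResult r{0, false};
    const bool got = q.try_recv(7, r);   // step 4: one receive attempt
    return got && r.stop;
}
```

In this model, receiver_sees_stop(true) returns false: the stop result is discarded before the receiver can observe it, which is the state from which the real next() loop never recovers.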
Fix
- jllama.cpp Java_net_ladenthin_llama_LlamaModel_queueCancel: post the
SERVER_TASK_TYPE_CANCEL task directly through the reader's public
queue_tasks reference. Do NOT call reader->stop(). The waiting task
id stays in queue_results, so the slot's stop result reaches send(),
is enqueued, notify_all fires, recv wakes up, next() returns the
stop result, and the Java receive loop exits naturally on
out.stop == true.
- LlamaModel.complete(params, token): in the cooperative branch, calling
queueCancel must NOT break out of the receive loop. Post the cancel once
(guarded by a cancelPosted flag) and keep calling receiveCompletionJson
until the natural stop result arrives. Breaking immediately would orphan
the reader in jctx->readers: nothing would consume the stop result and
erase the reader from the map until LlamaModel.close().
- CancellationToken javadoc updated to record the post-without-stop
invariant and the CI-hang regression that motivated it.
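The post-once-then-drain pattern from the second bullet can be sketched as follows. The real loop lives in Java (LlamaModel.complete). This C++ model only illustrates the control flow, and ToyReader, ToyChunk, LoopOutcome, and complete_with_cancel are hypothetical names. It assumes that a few already-produced chunks may still arrive after the cancel is posted.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one streamed chunk from the native side.
struct ToyChunk {
    std::string text;
    bool        stop;
};

// Hypothetical stand-in for the native reader plus worker.
class ToyReader {
public:
    ToyReader(int total_chunks, int in_flight_after_cancel)
        : remaining_(total_chunks), in_flight_(in_flight_after_cancel) {}

    // Analogue of queueCancel: records the request and never blocks.
    void post_cancel() { ++cancels_posted_; cancel_pending_ = true; }

    // Analogue of receiveCompletionJson: after a cancel, any in-flight
    // chunks are delivered first, then the final stop result.
    ToyChunk receive() {
        if (cancel_pending_) {
            if (in_flight_ > 0) { --in_flight_; return {"tok", false}; }
            finished_ = true;
            return {"", true};
        }
        if (remaining_ == 0) { finished_ = true; return {"", true}; }
        --remaining_;
        return {"tok", false};
    }

    int  cancels_posted() const { return cancels_posted_; }
    bool finished()       const { return finished_; }

private:
    int  remaining_;
    int  in_flight_;
    int  cancels_posted_ = 0;
    bool cancel_pending_ = false;
    bool finished_       = false;
};

struct LoopOutcome {
    int  chunks_seen;
    int  cancels_posted;
    bool saw_stop;
};

// The token is treated as cancelled once cancel_after_chunks chunks
// have been consumed.
LoopOutcome complete_with_cancel(ToyReader & reader, int cancel_after_chunks) {
    assert(cancel_after_chunks >= 0);
    LoopOutcome out{0, 0, false};
    bool cancel_posted = false;
    for (;;) {
        const bool token_cancelled = out.chunks_seen >= cancel_after_chunks;
        if (token_cancelled && !cancel_posted) {
            reader.post_cancel();
            cancel_posted = true;  // post once; do NOT break here
        }
        const ToyChunk c = reader.receive();
        if (c.stop) { out.saw_stop = true; break; } // natural exit
        ++out.chunks_seen;
    }
    out.cancels_posted = reader.cancels_posted();
    return out;
}
```

Because the loop exits only on the stop chunk, the reader in this model always reaches its finished state, which corresponds to the stop result being consumed and the reader being erased rather than left orphaned.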
Verified locally
- cmake --build build --target jllama: BUILD SUCCESS.
- nm -D libjllama.so shows Java_net_ladenthin_llama_LlamaModel_queueCancel
exported with plain C linkage (no _Z mangling regression).
- mvn surefire:test on the affected unit suites
(CancellationTokenTest, ContentPartTest, MultimodalMessagesTest,
SessionConcurrencyTest): 32 tests pass, 1 skipped (no model).
- mvn javadoc:jar: BUILD SUCCESS.
1 parent bf157e2 commit 705f980
3 files changed
Lines changed: 47 additions & 20 deletions
File 1 of 3: @@ -996,19 +996,34 @@ (23 additions, 8 deletions; file name and diff body not captured)
File 2 of 3: @@ -26,11 +26,18 @@ (12 additions, 5 deletions; file name and diff body not captured)
File 3 of 3: @@ -273,16 +273,21 @@ (12 additions, 7 deletions; file name and diff body not captured)