[52265] slot update_slots: id 2 | task 72394 | n_tokens = 64619, memory_seq_rm [64619, end)
[52265] slot update_slots: id 2 | task 72394 | 8192 tokens since last checkpoint at 54384, creating new checkpoint during processing at position 66666
[52265] slot update_slots: id 2 | task 72394 | prompt processing progress, n_tokens = 66666, batch.n_tokens = 2048, progress = 0.943730
[52265] slot update_slots: id 2 | task 72394 | created context checkpoint 10 of 512 (pos_min = 64618, pos_max = 64618, n_tokens = 64619, size = 186.329 MiB)
[52265] slot update_slots: id 2 | task 72394 | n_tokens = 66666, memory_seq_rm [66666, end)
[52265] slot update_slots: id 2 | task 72394 | prompt processing progress, n_tokens = 68593, batch.n_tokens = 1928, progress = 0.971008
[52265] slot update_slots: id 2 | task 72394 | n_tokens = 68593, memory_seq_rm [68593, end)
[52265] slot update_slots: id 2 | task 72394 | prompt processing progress, n_tokens = 70637, batch.n_tokens = 2045, progress = 0.999943
[52265] slot update_slots: id 2 | task 72394 | created context checkpoint 11 of 512 (pos_min = 68592, pos_max = 68592, n_tokens = 68593, size = 186.329 MiB)
[52265] slot update_slots: id 2 | task 72394 | n_tokens = 70637, memory_seq_rm [70637, end)
[52265] reasoning-budget: activated, budget=2147483647 tokens
[52265] slot init_sampler: id 2 | task 72394 | init sampler, took 6.73 ms, tokens: text = 70641, total = 70641
[52265] slot update_slots: id 2 | task 72394 | prompt processing done, n_tokens = 70641, batch.n_tokens = 5
[52265] slot update_slots: id 2 | task 72394 | created context checkpoint 12 of 512 (pos_min = 70636, pos_max = 70636, n_tokens = 70637, size = 186.329 MiB)
[52265] srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages 192.168.0.61 200
srv operator(): http client error: Connection handling canceled
[52265] srv stop: cancel task, id_task = 72394
[52265] slot release: id 2 | task 72394 | stop processing: n_tokens = 74571, truncated = 0
srv operator(): http client error: Connection handling canceled
[52265] srv stop: cancel task, id_task = 72393
[52265] slot release: id 0 | task 72393 | stop processing: n_tokens = 4299, truncated = 0
[52265] srv update_slots: all slots are idle
^C[52265] srv operator(): operator(): cleaning up before exit...
srv operator(): operator(): cleaning up before exit...
Name and Version
3bd9aa1
Operating systems
Linux
GGML backends
CUDA
Hardware
EPYC 9274F + 3x RTX PRO 4000 Blackwell
Models
Qwen 3.5 397B/122B (both in mxfp4 from unsloth)
Problem description & steps to reproduce
After recent changes (within the last 1-2 days) I see strange looping issues. I use Claude Code, and on the latest release of llama.cpp, after 1-4 skill or tool calls the model gets stuck in a token generation loop. (This can be seen in nvtop, for example, since Claude Code does not show the message in streaming mode.) I'm not sure whether it is a thinking loop or the model is generating something else. I rolled back to commit 506200c and the issue is gone. I will try to bisect to the particular commit, but it will take quite some time, as loading the model takes about 20 minutes for me (mmap is off for better prompt processing performance).
First Bad Commit
No response
Relevant log output
The logs look normal: just prompt processing and token generation.
Logs