Eval bug: Generation loop for Qwen 3.5 397B / 122B after recent changes #21622

@drrros

Description

Name and Version

3bd9aa1

Operating systems

Linux

GGML backends

CUDA

Hardware

EPYC 9274F + 3x RTX 4000 Pro Blackwell

Models

Qwen 3.5 397B / 122B (both in MXFP4 from unsloth)

Problem description & steps to reproduce

After recent changes (within the last 1-2 days) I see a strange looping issue. I use Claude Code, and on the latest release of llama.cpp the model gets stuck in a token generation loop after 1-4 skill or tool calls. (This can be seen in nvtop, for example, since Claude Code does not show the message in streaming mode.) I'm not sure whether it is a thinking loop or whether it is generating something else. After rolling back to commit 506200c the issue is gone. I will try to bisect to the particular commit, but that will take quite some time, since loading the model takes about 20 minutes for me (mmap is off for better prompt processing performance).
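To give a rough idea of why bisecting is slow here, the sketch below estimates the cost under stated assumptions. A binary search over N candidate commits needs about ceil(log2(N)) test runs, and each run costs at least one model load (about 20 minutes, as reported above). The commit count of 60 is purely hypothetical, since the number of commits between 506200c and 3bd9aa1 is not stated in this report; build time and the time needed to trigger the loop are not included.

```python
def bisect_steps(n_commits: int) -> int:
    """Approximate number of test runs a binary search needs for n_commits candidates."""
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (n_commits - 1).bit_length()


def bisect_minutes(n_commits: int, load_minutes: float = 20.0) -> float:
    """Lower bound on wall-clock time: one model load per bisect step."""
    return bisect_steps(n_commits) * load_minutes


if __name__ == "__main__":
    n = 60  # hypothetical commit count, NOT taken from the report
    print(f"{bisect_steps(n)} steps, at least {bisect_minutes(n):.0f} minutes of model loading")
```

Under these assumptions, even a small commit range means a couple of hours spent on model loading alone.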

First Bad Commit

No response

Relevant log output

The logs look normal: just prompt processing and generation.

Logs
[52265] slot update_slots: id  2 | task 72394 | n_tokens = 64619, memory_seq_rm [64619, end)
[52265] slot update_slots: id  2 | task 72394 | 8192 tokens since last checkpoint at 54384, creating new checkpoint during processing at position 66666
[52265] slot update_slots: id  2 | task 72394 | prompt processing progress, n_tokens = 66666, batch.n_tokens = 2048, progress = 0.943730
[52265] slot update_slots: id  2 | task 72394 | created context checkpoint 10 of 512 (pos_min = 64618, pos_max = 64618, n_tokens = 64619, size = 186.329 MiB)
[52265] slot update_slots: id  2 | task 72394 | n_tokens = 66666, memory_seq_rm [66666, end)
[52265] slot update_slots: id  2 | task 72394 | prompt processing progress, n_tokens = 68593, batch.n_tokens = 1928, progress = 0.971008
[52265] slot update_slots: id  2 | task 72394 | n_tokens = 68593, memory_seq_rm [68593, end)
[52265] slot update_slots: id  2 | task 72394 | prompt processing progress, n_tokens = 70637, batch.n_tokens = 2045, progress = 0.999943
[52265] slot update_slots: id  2 | task 72394 | created context checkpoint 11 of 512 (pos_min = 68592, pos_max = 68592, n_tokens = 68593, size = 186.329 MiB)
[52265] slot update_slots: id  2 | task 72394 | n_tokens = 70637, memory_seq_rm [70637, end)
[52265] reasoning-budget: activated, budget=2147483647 tokens
[52265] slot init_sampler: id  2 | task 72394 | init sampler, took 6.73 ms, tokens: text = 70641, total = 70641
[52265] slot update_slots: id  2 | task 72394 | prompt processing done, n_tokens = 70641, batch.n_tokens = 5
[52265] slot update_slots: id  2 | task 72394 | created context checkpoint 12 of 512 (pos_min = 70636, pos_max = 70636, n_tokens = 70637, size = 186.329 MiB)
[52265] srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  log_server_r: done request: POST /v1/messages 192.168.0.61 200
srv    operator(): http client error: Connection handling canceled
[52265] srv          stop: cancel task, id_task = 72394
[52265] slot      release: id  2 | task 72394 | stop processing: n_tokens = 74571, truncated = 0
srv    operator(): http client error: Connection handling canceled
[52265] srv          stop: cancel task, id_task = 72393
[52265] slot      release: id  0 | task 72393 | stop processing: n_tokens = 4299, truncated = 0
[52265] srv  update_slots: all slots are idle
^C[52265] srv    operator(): operator(): cleaning up before exit...
srv    operator(): operator(): cleaning up before exit...
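One way to read the log above is to compare the token count when prompt processing finished with the count when the task was cancelled. The sketch below does that for the two relevant lines, assuming that `n_tokens` in the `stop processing` line counts prompt plus generated tokens; that interpretation is an assumption, and `generated_tokens` is a hypothetical helper, not part of llama.cpp.

```python
import re

# Patterns for the two log line shapes quoted in this report.
PROMPT_DONE = re.compile(r"task (\d+) \| prompt processing done, n_tokens = (\d+)")
STOPPED = re.compile(r"task (\d+) \| stop processing: n_tokens = (\d+)")


def generated_tokens(log_text: str) -> dict:
    """Map task id -> tokens produced between 'prompt processing done' and 'stop processing'."""
    prompt_len = {}
    generated = {}
    for line in log_text.splitlines():
        m = PROMPT_DONE.search(line)
        if m:
            prompt_len[m.group(1)] = int(m.group(2))
            continue
        m = STOPPED.search(line)
        # Tasks whose prompt length is not in the excerpt are skipped.
        if m and m.group(1) in prompt_len:
            generated[m.group(1)] = int(m.group(2)) - prompt_len[m.group(1)]
    return generated


SAMPLE = """\
[52265] slot update_slots: id  2 | task 72394 | prompt processing done, n_tokens = 70641, batch.n_tokens = 5
[52265] slot      release: id  2 | task 72394 | stop processing: n_tokens = 74571, truncated = 0
[52265] slot      release: id  0 | task 72393 | stop processing: n_tokens = 4299, truncated = 0
"""

if __name__ == "__main__":
    print(generated_tokens(SAMPLE))
```

Under that assumption, task 72394 had produced 74571 - 70641 = 3930 tokens by the time the client cancelled the request.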

Metadata


    Labels

    bug: Something isn't working
