Name and Version
./llama-cli --version
version: 10102 (85e22ea)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Environment
- Hardware: NVIDIA GeForce RTX 5090 (32GB VRAM), AMD Ryzen 9 9950X3D 16-Core
- Target Model: Qwen3.6-27B-UD (Q5_K_XL quantization)
- Draft Model: Qwen3.6-27B-DFlash (Q5_K_M quantization)
- CUDA: 13.0 (ARCHS=1200)
- Backend: GGML_CUDA with Flash Attention, TurboQuant cache types
- Context Size: 262144 (256K)
- Threads: 30
Models
Qwen3.6-27B-UD-Q5_K_XL.gguf
Problem description & steps to reproduce
Reproduction
Scenario 1: DFlash only (no reasoning)
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
-ngl 99 \
--ctx-size 262144 \
--threads 30 \
--port 8082 \
--host 127.0.0.1 \
-np 1 \
--flash-attn on \
--jinja \
--metrics \
--cache-type-k q4_0 \
--cache-type-v turbo4 \
-b 2048 \
-ub 512 \
--kv-unified \
--cache-ram 0 \
--no-mmap \
--mlock \
--no-host \
--log-timestamps \
--log-prefix \
--log-colors off \
--temp 0.6 \
--top-k 20 \
--min-p 0.0 \
--spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
--spec-type dflash \
--spec-dflash-cross-ctx 1024 \
--spec-draft-ngl all
Scenario 2: DFlash + reasoning
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
-ngl 99 \
--ctx-size 262144 \
--threads 30 \
--port 8082 \
--host 127.0.0.1 \
-np 1 \
--flash-attn on \
--jinja \
--metrics \
--cache-type-k q4_0 \
--cache-type-v turbo4 \
-b 2048 \
-ub 512 \
--kv-unified \
--cache-ram 0 \
--no-mmap \
--mlock \
--no-host \
--log-timestamps \
--log-prefix \
--log-colors off \
--reasoning on \
--chat-template-kwargs '{"preserve_thinking":true}' \
--temp 0.6 \
--top-k 20 \
--min-p 0.0 \
--spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
--spec-type dflash \
--spec-dflash-cross-ctx 1024 \
--spec-draft-ngl all
Scenario 3: Reasoning only (no DFlash)
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
-ngl 99 \
--ctx-size 262144 \
--threads 30 \
--port 8082 \
--host 127.0.0.1 \
-np 1 \
--flash-attn on \
--jinja \
--cache-type-k q4_0 \
--cache-type-v turbo4 \
-b 2048 \
-ub 512 \
--kv-unified \
--cache-ram 0 \
--no-mmap \
--mlock \
--no-host \
--log-timestamps \
--log-prefix \
--log-colors off \
--reasoning on \
--chat-template-kwargs '{"preserve_thinking":true}' \
--temp 0.6 \
--top-k 20 \
--min-p 0.0
Scenario 4: No Reasoning & no DFlash
./build/bin/llama-server \
-m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
-ngl 99 \
--ctx-size 262144 \
--threads 30 \
--port 8082 \
--host 127.0.0.1 \
-np 1 \
--flash-attn on \
--jinja \
--cache-type-k q4_0 \
--cache-type-v turbo4 \
-b 2048 \
-ub 512 \
--kv-unified \
--cache-ram 0 \
--no-mmap \
--mlock \
--no-host \
--log-timestamps \
--log-prefix \
--log-colors off \
--temp 0.6 \
--top-k 20 \
--min-p 0.0
Steps to Reproduce
- Start the server with any of the commands above
- Send a simple chat completion request like "do you work?" withing Cline coding agent
- Observe the response contains repeated slashes instead of meaningful text
Expected Behavior
The server should generate a normal response.
Actual Behavior
The response gets stuck producing repeated slash characters: //////... (or similar patterns like /////...).
First Bad Commit
Regression Information
- Last known working version: v0.2.0
- First version with bug: v0.3.1 (commit
15d22acc8)
- Also tested: 6 commits after v0.3.1 — issue persists
The same model (Qwen3.6-27B-UD) with the same configuration worked without producing slash loops in v0.2.0. The regression was introduced between v0.2.0 and v0.3.1.
Relevant log output
Thinking:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Name and Version
./llama-cli --version
version: 10102 (85e22ea)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Environment
Models
Qwen3.6-27B-UD-Q5_K_XL.gguf
Problem description & steps to reproduce
Reproduction
Scenario 1: DFlash only (no reasoning)
Scenario 2: DFlash + reasoning
./build/bin/llama-server \ -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \ -ngl 99 \ --ctx-size 262144 \ --threads 30 \ --port 8082 \ --host 127.0.0.1 \ -np 1 \ --flash-attn on \ --jinja \ --metrics \ --cache-type-k q4_0 \ --cache-type-v turbo4 \ -b 2048 \ -ub 512 \ --kv-unified \ --cache-ram 0 \ --no-mmap \ --mlock \ --no-host \ --log-timestamps \ --log-prefix \ --log-colors off \ --reasoning on \ --chat-template-kwargs '{"preserve_thinking":true}' \ --temp 0.6 \ --top-k 20 \ --min-p 0.0 \ --spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \ --spec-type dflash \ --spec-dflash-cross-ctx 1024 \ --spec-draft-ngl allScenario 3: Reasoning only (no DFlash)
./build/bin/llama-server \ -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \ -ngl 99 \ --ctx-size 262144 \ --threads 30 \ --port 8082 \ --host 127.0.0.1 \ -np 1 \ --flash-attn on \ --jinja \ --cache-type-k q4_0 \ --cache-type-v turbo4 \ -b 2048 \ -ub 512 \ --kv-unified \ --cache-ram 0 \ --no-mmap \ --mlock \ --no-host \ --log-timestamps \ --log-prefix \ --log-colors off \ --reasoning on \ --chat-template-kwargs '{"preserve_thinking":true}' \ --temp 0.6 \ --top-k 20 \ --min-p 0.0Scenario 4: No Reasoning & no DFlash
Steps to Reproduce
Expected Behavior
The server should generate a normal response.
Actual Behavior
The response gets stuck producing repeated slash characters:
//////...(or similar patterns like/////...).First Bad Commit
Regression Information
15d22acc8)The same model (Qwen3.6-27B-UD) with the same configuration worked without producing slash loops in v0.2.0. The regression was introduced between v0.2.0 and v0.3.1.
Relevant log output
Thinking:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////