Skip to content

Eval bug: Generation gets stuck in slash loop with Qwen3.6-27B-UD #66

@CalogeroZarbo

Description

@CalogeroZarbo

Name and Version

./llama-cli --version
version: 10102 (85e22ea)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

Environment

  • Hardware: NVIDIA GeForce RTX 5090 (32GB VRAM), AMD Ryzen 9 9950X3D 16-Core
  • Target Model: Qwen3.6-27B-UD (Q5_K_XL quantization)
  • Draft Model: Qwen3.6-27B-DFlash (Q5_K_M quantization)
  • CUDA: 13.0 (ARCHS=1200)
  • Backend: GGML_CUDA with Flash Attention, TurboQuant cache types
  • Context Size: 262144 (256K)
  • Threads: 30

Models

Qwen3.6-27B-UD-Q5_K_XL.gguf

Problem description & steps to reproduce

Reproduction

Scenario 1: DFlash only (no reasoning)

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --metrics \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0 \
  --spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
  --spec-type dflash \
  --spec-dflash-cross-ctx 1024 \
  --spec-draft-ngl all

Scenario 2: DFlash + reasoning

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --metrics \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0 \
  --spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
  --spec-type dflash \
  --spec-dflash-cross-ctx 1024 \
  --spec-draft-ngl all

Scenario 3: Reasoning only (no DFlash)

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0

Scenario 4: No Reasoning & no DFlash

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0

Steps to Reproduce

  1. Start the server with any of the commands above
  2. Send a simple chat completion request like "do you work?" withing Cline coding agent
  3. Observe the response contains repeated slashes instead of meaningful text

Expected Behavior

The server should generate a normal response.

Actual Behavior

The response gets stuck producing repeated slash characters: //////... (or similar patterns like /////...).

First Bad Commit

Regression Information

  • Last known working version: v0.2.0
  • First version with bug: v0.3.1 (commit 15d22acc8)
  • Also tested: 6 commits after v0.3.1 — issue persists

The same model (Qwen3.6-27B-UD) with the same configuration worked without producing slash loops in v0.2.0. The regression was introduced between v0.2.0 and v0.3.1.

Relevant log output

Thinking:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions