Eval bug: Generation gets stuck in slash loop with Qwen3.6-27B-UD

### Name and Version

./llama-cli --version
version: 10102 (85e22ea0b)
built with GNU 13.3.0 for Linux x86_64


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

## Environment

- **Hardware:** NVIDIA GeForce RTX 5090 (32GB VRAM), AMD Ryzen 9 9950X3D 16-Core
- **Target Model:** Qwen3.6-27B-UD (Q5_K_XL quantization)
- **Draft Model:** Qwen3.6-27B-DFlash (Q5_K_M quantization)
- **CUDA:** 13.0 (ARCHS=1200)
- **Backend:** GGML_CUDA with Flash Attention, TurboQuant cache types
- **Context Size:** 262144 (256K)
- **Threads:** 30

### Models

Qwen3.6-27B-UD-Q5_K_XL.gguf

### Problem description & steps to reproduce

## Reproduction

### Scenario 1: DFlash only (no reasoning)
```bash
./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --metrics \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0 \
  --spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
  --spec-type dflash \
  --spec-dflash-cross-ctx 1024 \
  --spec-draft-ngl all
```

### Scenario 2: DFlash + reasoning
```bash
./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --metrics \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0 \
  --spec-draft-model /path/to/Qwen3.6-27B-DFlash-Q5_K_M.gguf \
  --spec-type dflash \
  --spec-dflash-cross-ctx 1024 \
  --spec-draft-ngl all
```

### Scenario 3: Reasoning only (no DFlash)
```bash
./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0
```

### Scenario 4: No Reasoning & no DFlash
```bash
./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  --ctx-size 262144 \
  --threads 30 \
  --port 8082 \
  --host 127.0.0.1 \
  -np 1 \
  --flash-attn on \
  --jinja \
  --cache-type-k q4_0 \
  --cache-type-v turbo4 \
  -b 2048 \
  -ub 512 \
  --kv-unified \
  --cache-ram 0 \
  --no-mmap \
  --mlock \
  --no-host \
  --log-timestamps \
  --log-prefix \
  --log-colors off \
  --temp 0.6 \
  --top-k 20 \
  --min-p 0.0
```

### Steps to Reproduce
1. Start the server with any of the commands above
2. Send a simple chat completion request like "do you work?" withing Cline coding agent
3. Observe the response contains repeated slashes instead of meaningful text

### Expected Behavior
The server should generate a normal response.

### Actual Behavior
The response gets stuck producing repeated slash characters: `//////...` (or similar patterns like `/////...`).

### First Bad Commit

## Regression Information

- **Last known working version:** v0.2.0
- **First version with bug:** v0.3.1 (commit `15d22acc8`)
- **Also tested:** 6 commits after v0.3.1 — issue persists

The same model (Qwen3.6-27B-UD) with the same configuration worked without producing slash loops in v0.2.0. The regression was introduced between v0.2.0 and v0.3.1.


### Relevant log output

Thinking:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Eval bug: Generation gets stuck in slash loop with Qwen3.6-27B-UD #66

Name and Version

Operating systems

GGML backends

Hardware

Environment

Models

Problem description & steps to reproduce

Reproduction

Scenario 1: DFlash only (no reasoning)

Scenario 2: DFlash + reasoning

Scenario 3: Reasoning only (no DFlash)

Scenario 4: No Reasoning & no DFlash

Steps to Reproduce

Expected Behavior

Actual Behavior

First Bad Commit

Regression Information

Relevant log output

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Eval bug: Generation gets stuck in slash loop with Qwen3.6-27B-UD #66

Description

Name and Version

Operating systems

GGML backends

Hardware

Environment

Models

Problem description & steps to reproduce

Reproduction

Scenario 1: DFlash only (no reasoning)

Scenario 2: DFlash + reasoning

Scenario 3: Reasoning only (no DFlash)

Scenario 4: No Reasoning & no DFlash

Steps to Reproduce

Expected Behavior

Actual Behavior

First Bad Commit

Regression Information

Relevant log output

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions