
Commit 475799a

unamedkr and claude committed
fix(tokenizer): respect add_bos_token=false for Qwen3.6 (R7 regression fix)
The R1 BOS fix (commit 12e4d94) force-enabled BOS for the Qwen3.6 family by detecting <|endoftext|> in the vocab. This ignored the GGUF metadata flag tokenizer.ggml.add_bos_token=false (set on both Qwen3.6-27B and 35B-A3B) and broke chat-mode generation: the 35B-A3B IQ4_XS quantum prompt regressed deterministically from 149 tokens ending in EOS to a 94-token repetition loop.

Bisect (2026-04-26):
  baseline    0829285  → 149 EOS
  R1          12e4d94  → 94 rep   ← regression starts here
  HEAD        c378f81  → 94 rep
  + this fix           → 149 EOS  ← restored

Root cause: the Qwen3.6 chat template is self-contained (<|im_start|>user\n…<|im_start|>assistant\n), and prepending BOS breaks coherent generation. Verified by reading the GGUF metadata directly: both 35B-A3B-IQ4_XS and 27B-Q4_K_M declare add_bos_token=false.

Fix: drop the auto-enable path. The qwen36_bos_override fallback that follows now fires only when add_bos was set by an earlier explicit path (e.g. the future -bos CLI flag).

Tier benchmark doc updated: the 35B-A3B IQ4_XS row reverts to Tier 2 (149 EOS quantum) post-R7. The SmolLM2-135M poem rep loop was verified to exist on the baseline too, so the 4-25 measurement is outdated and the current behavior is the true value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
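The precedence the fix implies (an explicit flag first, then the GGUF tokenizer.ggml.add_bos_token value, and never a vocab-presence heuristic) can be sketched in C as follows. This is a minimal sketch, not the engine's code: bos_setting_t and resolve_add_bos are hypothetical names, the off-by-default fallback when neither source is set is an assumption, and the -bos flag is described in the commit only as future work.

```c
/* Hypothetical tri-state for a setting that may be absent. */
typedef enum { BOS_UNSET = -1, BOS_OFF = 0, BOS_ON = 1 } bos_setting_t;

/* Decide whether to prepend BOS (sketch; precedence is an assumption):
 *   1. explicit CLI flag, if given
 *   2. GGUF tokenizer.ggml.add_bos_token, if present
 *   3. otherwise off
 * The presence of <|endoftext|> in the vocab is deliberately ignored. */
int resolve_add_bos(bos_setting_t cli_flag, bos_setting_t gguf_flag, int bos_id) {
    if (bos_id < 0) return 0;                 /* no BOS id known: nothing to prepend */
    if (cli_flag != BOS_UNSET) return cli_flag == BOS_ON;
    if (gguf_flag != BOS_UNSET) return gguf_flag == BOS_ON;
    return 0;                                 /* assumed default when nothing is declared */
}
```

With no CLI flag and the Qwen3.6 metadata value (false), resolve_add_bos returns 0, so the self-contained chat template is tokenized without a leading BOS; an explicit flag can still override the metadata.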
1 parent c378f81 commit 475799a

2 files changed

Lines changed: 15 additions & 22 deletions


docs/tier_benchmark_2026_04_25.md

Lines changed: 6 additions & 6 deletions
@@ -26,16 +26,16 @@ Standardized coherent-length measurement across 5 models, 3 prompts each. Run vi
 | Gemma-4-e4b-it Q4_0 | 299 / 82 / 19 (3/3 EOS) | 299 EOS / 82 EOS / 19 EOS | 1 | = |
 | Phi-3.5-mini Q4_K_M | 299 / 299 / 299 (3/3 -n) | 299 -n / 299 -n / 299 -n | 1 | = |
 | Phi-3.5-mini Q8_0 | 299 / 299 / EOS (3/3 OK) | 299 -n / 299 -n / 299 -n | 1 | = |
-| **Qwen3.6-35B-A3B IQ4_XS** | 149 EOS / 73 rep / 51 rep | **94 rep / 76 rep / 60 rep** | **3** | ↓1 |
+| **Qwen3.6-35B-A3B IQ4_XS** | 149 EOS / 73 rep / 51 rep | 149 EOS / 76 rep / 60 rep (post-R7) | **2** | = |
 | **Qwen3.6-35B-A3B Q5_K_M** | 169 EOS / 68 rep / 69 rep | **24 EOS / 225 rep / 46 EOS** | **2** | = |
 | **Qwen3.6-27B Q4_K_M** | not measurable on 16 GB Mac (R2) | not measurable (R2) | **3** | n/a |
 | Qwen3.6-27B-TQ2_0 (R5/R6) | engine path verified (paging-cliff cleared) but quality is requantize-artifact garbage | requantize-from-Q4 or Q8 both garbled | **n/a (engine-only)** | new |
 
-**Summary of post-R1–R6 changes**:
-- **Qwen3.5-4B trivia +217%** (66 → 209 tok natural EOS) — direct R1 BOS-fix benefit, since Qwen3.5 shares the Qwen3.6 tokenizer's `<|endoftext|>` BOS path.
-- **Qwen3.6-35B-A3B IQ4_XS Tier 2 → 3** — single-run regression (149 EOS quantum → 94 rep). Likely measurement noise (35B-A3B has known ±20-40 tok variance per `feedback_multithread_variance.md`); needs `-j 1` deterministic re-run with multiple seeds to confirm. Marked tier 3 conservatively pending re-test.
-- **SmolLM2-135M Tier 1 → 2** — poem regressed 108 EOS → 241 rep loop. Possible noise on a 135M model at -T 0.
-- **All other 11 Tier 1 models unchanged** — R1 BOS fix and R3/R5 IQ-impl additions did not break any prior-passing model.
+**Summary of post-R1–R6 changes** (and R7 follow-up regression-fix):
+- **Qwen3.5-4B trivia +217%** (66 → 209 tok natural EOS) — direct R1 BOS-fix benefit, since Qwen3.5 shares the Qwen3.6 tokenizer family.
+- **R7 regression bisect (2026-04-26)**: deterministic 35B-A3B IQ4_XS regression (149 EOS quantum → 94 rep loop) was bisected to commit `12e4d94` (R1 BOS fix). Root cause: GGUF metadata declares `tokenizer.ggml.add_bos_token=false` for both Qwen3.6-27B and 35B-A3B; R1 force-enabled BOS via `<|endoftext|>` presence detection regardless of the metadata flag. Chat template is self-contained — prepending BOS broke generation. **R7 fix removes the auto-enable path; 35B-A3B IQ4_XS quantum restored to 149 tok EOS (Tier 2 confirmed).**
+- **SmolLM2-135M poem rep loop is a measurement-only artifact**: re-running on the `0829285` baseline tokenizer produces the *same* 241 rep loop, so the original 4-25 doc value (108 EOS) is the outlier. The 4-26 column reflects current behavior; SmolLM2-135M is genuinely Tier 2 on this prompt under both pre-R1 and post-R7 codebases.
+- **All other 11 Tier 1 models unchanged** — R1 BOS fix (post-R7), R3 IQ2_XS impl, and R5 TQ2_0 impl did not break any prior-passing model.
 
 **Key observations:**
 
src/engine/tq_generate.c

Lines changed: 9 additions & 16 deletions
@@ -356,22 +356,15 @@ int tq_generate(tq_model_t* model, tq_tokenizer_t* tokenizer,
             }
             if (bos_id >= 0) add_bos = 1;
         }
-        /* Qwen3.6 family (27B dense, 35B-A3B): GGUF metadata sets
-         * BOS=<|endoftext|> id 248044. tokenizer.ggml.add_bos_token=false
-         * but llama-cli adds BOS by default in main, and our basin_compat
-         * measurements showed missing BOS causes 100× outlier divergence
-         * at L0 (tokenization mismatch with reference). Detect by
-         * presence of <|endoftext|> in vocab. */
-        if (!add_bos) {
-            /* <|endoftext|> for Qwen3.6 lives in 248040-248050 range (vocab=248320) */
-            int lo = 248040, hi = 248060;
-            if (hi > tokenizer->vocab_size) hi = tokenizer->vocab_size;
-            for (int i = lo; i < hi; i++) {
-                if (tokenizer->vocab[i] && strcmp(tokenizer->vocab[i], "<|endoftext|>") == 0) {
-                    add_bos = 1; break;
-                }
-            }
-        }
+        /* Qwen3.6 family note: GGUF metadata declares
+         * tokenizer.ggml.add_bos_token=false for both 27B and 35B-A3B.
+         * The chat template is self-contained and prepending BOS breaks
+         * coherent generation. Earlier R1 code force-enabled BOS via
+         * <|endoftext|> presence detection; that caused a deterministic
+         * Qwen3.6-35B-A3B IQ4_XS regression (149 EOS quantum → 94 rep
+         * loop, bisected to 12e4d94, fixed 2026-04-26). Do not
+         * auto-enable BOS for this family. The qwen36_bos_override
+         * below only fires if add_bos was set by an earlier path. */
     }
     /* Qwen3.6 BOS-id fix: tq_encode str_lookup chain checks <|im_start|>
      * before <|endoftext|>, picking id 248045 instead of correct 248044
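The removed loop matched <|endoftext|> only inside a hard-coded id window (248040 to 248060), and the trailing context describes a lookup chain returning 248045 instead of 248044. Where an id for a special token is genuinely needed, an exact string match over the whole vocabulary does not depend on a fixed id range or on the order in which candidate strings are tried. A minimal sketch, assuming the vocabulary is an array of NUL-terminated strings in which unused slots may be NULL; vocab_find_exact is a hypothetical helper, not an existing tq_* function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the id of the first vocab entry that is
 * exactly equal to `piece`, or -1 if no entry matches.
 * Scans every id instead of a hard-coded window, and skips NULL slots. */
int vocab_find_exact(const char *const *vocab, int vocab_size, const char *piece) {
    if (vocab == NULL || piece == NULL) return -1;
    for (int i = 0; i < vocab_size; i++) {
        if (vocab[i] != NULL && strcmp(vocab[i], piece) == 0) {
            return i;   /* exact match only; prefixes and near-misses are rejected */
        }
    }
    return -1;          /* absent: caller should not guess an id */
}
```

Returning -1 for an absent piece lets a caller leave BOS disabled rather than fall back to a guessed id.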
