Pillar 1 R3 single-line fix to src/engine/tq_tokenizer.c:1442 eliminates
the structural tokenization bug that caused every Qwen3 family coherence
issue tracked across Rounds 26-50.
/* tq_tokenizer.c heap-based BPE merge loop */
if (top.gen != gen[top.pos]) continue;
+ if (tokens[top.pos] < 0) continue; // ★ missing dead-slot check
int ri = next[top.pos];
if (ri >= n_tokens || tokens[ri] < 0) continue;Why it's the root cause: When a position dies as the RIGHT
neighbor of a merge, tokens[P] = -1 but gen[P] is not bumped.
Stale heap entries at position P pass the gen check, the code then
overwrites tokens[P] with a new merge result, resurrecting a dead
linked-list node and scrambling subsequent tokens.
| Tokens | Decoded | |
|---|---|---|
| HF reference (ground truth) | [9707] |
"Hello" |
| Our engine BEFORE R3 | [32713, 654] |
"Helll" (5 chars: H,e,l,l,l — 'o' replaced) |
| Our engine AFTER R3 | [9707] |
"Hello" ✓ |
Same 40-word prompt: "Write a Python function that computes the nth Fibonacci number using iterative dynamic programming. It should handle edge cases including negative numbers, zero, and very large inputs."
| Output | |
|---|---|
| BEFORE R3 | UTF-8 garbage ("� Would you like to..."), or 5-token fragment then EOS |
| AFTER R3 | Coherent Python code:def fibonacci(n): """Return the nth Fibonacci number.""" if n < 0: raise ValueError("n must be non-negative") |
Same 50-word prompt: "Once upon a time in a small village there lived a clever young programmer named Luna who was known throughout the kingdom for her extraordinary ability..."
| Output | |
|---|---|
| BEFORE R3 | Char-doubling garbage ("quicck bbrrown") |
| AFTER R3 | Full narrative: "The idea intrigued him so much that he decided to create his very own version of this classic game. He called it 'Hamster Run'..." |
| Model | Prompt | Before | After |
|---|---|---|---|
| Qwen3-0.6B | "Hello" | "p('��..." garbage |
"Hello" token |
| Qwen3.6-35B IQ4_XS | 40+ word code | garbage | perfect Python |
| Qwen3.6-35B Q5_K_M | factual | drift ≥25 tok | clean EOS |
| Phi-3.5 Q4 | "What is 2+2?" | "I'm sorry but 'tti'..." | "The sum of 2 and 2 is equal to four." |
| Phi-3.5 Q8 | same | same garbage | same fix |
| Llama-3.2-3B | long story | PASS | PASS (unaffected — different tokenizer quirk) |
scripts/test_models.sh: 15/15 PASS after fix + expected-string update
(Phi-3.5 "answer" → "sum" because model now gives actual math).
- Pillar 1 R1: Python HF reference env (Qwen3-0.6B FP32, torch 2.11)
- Pillar 1 R2: HF per-layer hidden state dump tool
- Pillar 1 R3: Token-level comparison revealed the bug before any hidden-state diff was needed.
The previous 30+ rounds (R26-R50) had assumed the tokenizer was
correct (per R32 Mission C note "drift is Qwen-common, not tokenizer").
Reference diff methodology made the token mismatch undeniable in one
print(t.encode("Hello")) call.
Compare tokens first, then hidden states, then layer outputs. Don't "rule out" a suspect without actually comparing to ground truth.
- Pillar 1 complete
- Pillar 2 (prefill speed for long docs) now unblocked
- Pillar 3 (document Q&A / code review / agent workflows) now possible
- v0.19.0 release with this as headline feature