|
3 | 3 | **Last updated**: 2026-04-21 (Phase 1 refparity ★) |
4 | 4 | **Session HEAD**: Reference-parity framework (tools/refparity/) LANDED — HF vs engine per-layer diff, pos-aligned, post_norm-aware. |
5 | 5 |
|
| 6 | +## ★★★ Phase 1 R26 — MoE softmax temperature BREAKS the 117-tok cliff (2026-04-22) ★★★ |
| 7 | + |
| 8 | +Added the `TQ_MOE_ROUTE_TEMP` env var, which divides the top-K softmax
| 9 | +logits by the temperature before exp. `T>1` flattens the distribution (less peaky); `T<1` sharpens it.
| 10 | + |
| 11 | +Temperature sweep on Qwen3.6-35B IQ4_XS "Once upon a time in a faraway |
| 12 | +land" -n 200: |
| 13 | + |
| 14 | +| TEMP | outcome | |
| 15 | +|---:|:---| |
| 16 | +| 1.0 (default) | 117-tok loop: "It could do math! It could do math!" | |
| 17 | +| 1.5 | **87**-tok loop: "and everything went wrong!" (EARLIER cliff) | |
| 18 | +| 1.8 | 113-tok loop: "And that's why we have the Internet!" | |
| 19 | +| **2.0** | **200 tokens, no rep-loop detected**, Alex+sad-tree story | |
| 20 | +| 2.5 | **200 tokens, no rep-loop**, Alex+magic-leaves story | |
| 21 | +| 3.0 | 114-tok loop: "The sun would rise too!" | |
| 22 | + |
| 23 | +`TEMP=2.0` and `2.5` are the sweet spot. Outside this range the cliff either moves
| 24 | +earlier (87 tokens at 1.5) or returns near its original position (113 at 1.8, 114 at 3.0).
| 25 | +This is a **causal confirmation** of the R24 "MoE×DeltaNet interaction" hypothesis:
| 26 | +spread the routing distribution and the feedback loop can't lock in.
| 27 | + |
| 28 | +**Safety**: "Paris" factual probe correct at TEMP=2.0. Full regression |
| 29 | +(15 coherence + 11 tokenizer = 23/23) passes with TEMP=2.0. So TEMP=2.0 |
| 30 | +is opt-in-safe for users today. |
| 31 | + |
| 32 | +**What this means**: a one-line env flag recovers ~70% of the gap to |
| 33 | +"works on 200+ tokens" on 35B. The remaining degradation (character-level |
| 34 | +noise in last 30 tokens) is likely still DeltaNet-state+quantization |
| 35 | +related — but the cliff itself is broken. |
| 36 | + |
| 37 | +Updated: |
| 38 | +- `docs/env_vars.md`: TQ_MOE_ROUTE_TEMP row with measured impact |
| 39 | +- `docs/supported_models_tier.md`: 35B recipe now recommends |
| 40 | + `TQ_MOE_ROUTE_TEMP=2.0` alongside `--rep-penalty 1.3` |
| 41 | + |
6 | 42 | ## Phase 1 R25 — MoE router instrumentation: L4 is outlier, others balanced (2026-04-22) |
7 | 43 |
|
8 | 44 | Added `TQ_MOE_PROBE=call1,call2,...` env in `tq_moe_forward` — dumps |
|