Commit 88ed094
★★★ state: R24 breakthrough — drift is MoE×DeltaNet, NOT DeltaNet alone
Qwen3.5-4B (DeltaNet + dense FFN, no MoE) on the EXACT 35B drift-trigger
prompt "Once upon a time in a faraway land" -n 200 T=0:
→ 200 coherent tokens about Lily the explorer, Wizard Wigglesworth,
math puzzles (5×3=15), multiple story beats, NO repetition loop.
35B (DeltaNet + MoE 256-expert K=8) on the same prompt:
→ 117 tokens → "It could do math! It could do math!" loop.
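
The onset of a loop like the one above can be located mechanically instead of by eye. The following is a minimal sketch, assuming the generated output is available as a list of tokens (strings or ids); `find_repetition_onset` and its parameters are hypothetical names, not part of the project.

```python
def find_repetition_onset(tokens, min_period=1, max_period=16, min_repeats=3):
    """Return (start, period) for the earliest position where a block of
    `period` tokens repeats back to back at least `min_repeats` times.
    Returns None when no such loop exists. Thresholds are illustrative."""
    n = len(tokens)
    for start in range(n):
        for period in range(min_period, max_period + 1):
            end = start + period * min_repeats
            if end > n:
                break  # longer periods cannot fit either
            block = tokens[start:start + period]
            if all(
                tokens[start + k * period:start + (k + 1) * period] == block
                for k in range(1, min_repeats)
            ):
                return start, period
    return None


# Toy input shaped like the observed loop, not real model output.
sample = "a b c It could do math ! It could do math ! It could do math !".split()
print(find_repetition_onset(sample))
```

Running this on both the 4B and 35B outputs would give a comparable onset position per model, under the assumption that both are tokenized the same way.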
All prior rounds R16-R19 assumed DeltaNet state was the sole drift cause.
WRONG. DeltaNet works fine without MoE. Working hypothesis (not yet
verified): the 117-tok cliff emerges from the *interaction*. DeltaNet
carries the "math math" semantic state, MoE top-K routing locks onto
experts that amplify it, and the two form a positive feedback loop.
Memory task #192 (MoE router softmax sanity at long positions) is now the
leading investigation. Next: instrument top-K entropy + expert histogram
at positions 50/100/115/120 on the 35B drift prompt.
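
The planned instrumentation could look roughly like the sketch below. It assumes the per-token router logits (one value per expert) can be dumped from the runtime, and that top-K weights are renormalized over the selected experts; both are assumptions, and `topk_stats` and `expert_histogram` are hypothetical helper names.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_stats(router_logits, k=8):
    """Return (selected expert ids, entropy in nats of the renormalized
    top-k routing weights). Renormalization is an assumption here."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    weights = [probs[i] / mass for i in top]
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return top, entropy


def expert_histogram(logits_by_position, k=8):
    """Count how often each expert is selected across the given positions."""
    counts = Counter()
    for logits in logits_by_position:
        top, _ = topk_stats(logits, k)
        counts.update(top)
    return counts
```

Entropy close to log(K) means the selected experts share the load evenly; entropy near zero means a single expert dominates. A drop in entropy together with a narrowing histogram between positions 100 and 120 would be consistent with the routing lock-in hypothesis, though it would not prove it.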
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 61f7ac0 commit 88ed094
1 file changed
Lines changed: 30 additions & 0 deletions