Commit b8291c3
state: R44, 1000-tok hunt summary; measured a 5× coherent-window gap between 4B and 35B
R43 shared-expert Q4-skip fix: +74% on short prompts (117 → 204 coherent tokens).
A real, landed quality gain.
R44 measurement with the same 63-tok prefill prompt:
  Qwen3.5-4B dense hybrid: 347 coherent gen tokens (+63 prefill = 410 total)
  Qwen3.6-35B MoE hybrid:   65 coherent gen tokens (+63 prefill = 128 total)
4B is 5× better. Both have DeltaNet; only the 35B has MoE. The remaining
degradation is MoE-internal and is not fixed by: disabling KV-cache quant,
repetition penalty, k-window, a router-temperature sweep (1.0-2.5, each
failing differently), TQ_NO_Q4, or any easily patchable path we've audited.
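For reference, the router-temperature sweep divides the router logits by a temperature before softmax and top-k expert selection. A minimal numpy sketch of that mechanism (logit values and k are made up for illustration, not taken from the model):

```python
import numpy as np

def route(logits, temperature, k=2):
    """Temperature-scaled softmax router: top-k expert ids + blend weights."""
    z = logits / temperature
    probs = np.exp(z - z.max())            # stable softmax
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]      # highest-probability experts
    w = probs[top]
    return top, w / w.sum()                # renormalize over the selected k

logits = np.array([2.1, 1.9, 0.3, -0.5, 1.0, 0.0, -1.2, 0.7])  # made-up values
for t in (1.0, 2.5):
    ids, w = route(logits, t)
    print(f"temp={t}: experts {ids.tolist()}, weights {np.round(w, 3).tolist()}")
```

Dividing the logits by a temperature is a monotone transform, so it can flatten the blend weights but never reorder the selected experts, which is consistent with the sweep changing the failure mode rather than fixing it.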
Still-unvetted candidates:
- routed-expert runtime dispatch accumulator
- MoE output aggregation order (8 experts × 40 layers × 500 tokens of
  summation drift)
- DeltaNet a_log in Q4_K form
- LM head Q8_0 matmul at long positions
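The aggregation-order candidate is ordinary floating-point non-associativity: summing 8 expert outputs per layer in float32 gives slightly different results depending on the order of the adds, and the residual stream carries the discrepancy forward through all 40 layers. A toy demonstration (shapes and values are synthetic, not the real kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_experts, dim = 40, 8, 64

# Synthetic per-layer expert outputs for one token's residual stream.
outs = rng.standard_normal((n_layers, n_experts, dim)).astype(np.float32)

def run(expert_order):
    """Accumulate the per-layer expert sums in float32, in a fixed order."""
    acc = np.zeros(dim, dtype=np.float32)
    for layer in outs:
        s = np.zeros(dim, dtype=np.float32)
        for e in expert_order:
            s += layer[e]          # the order of these adds changes the rounding
        acc += s
    return acc

fwd = run(list(range(n_experts)))
rev = run(list(range(n_experts - 1, -1, -1)))
ref = outs.astype(np.float64).sum(axis=(0, 1))   # high-precision reference

drift = float(np.abs(fwd - rev).max())           # order-dependent discrepancy
err = float(np.abs(fwd.astype(np.float64) - ref).max())
print(f"order drift: {drift:.2e}   error vs fp64: {err:.2e}")
```

This only shows the mechanism exists; whether the real dispatch path accumulates in a position-dependent order (and whether the drift is large enough to matter at 500 tokens) is exactly what remains to be measured.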
1000-tok coherent output NOT achieved this session, but the direction is
concrete: run llama.cpp on the same 35B model and prompt to establish the
absolute achievable ceiling, then surgically target the residual MoE
precision gap. DRY sampler stays on as an external safety net.
Methodology: the user's Q4-commonness pushback was the key to R43's fix.
The same approach will be needed to find the remaining MoE bug: compare
directly against the reference, don't speculate from symptoms.
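A sketch of that compare-vs-reference workflow: dump the same intermediate tensors from both engines and diff them per layer, so the bug is located by measurement rather than symptom-guessing. Tensor names, shapes, and the in-memory dump format here are hypothetical:

```python
import numpy as np

def diff_report(ours, ref, atol=1e-3):
    """Per-tensor max-abs diff between our engine's dumps and the reference's.

    `ours` and `ref` map tensor names (e.g. "layer07.moe_out") to arrays.
    Returns the names whose divergence exceeds `atol`, worst first.
    """
    bad = []
    for name in sorted(ref):
        d = float(np.abs(ours[name].astype(np.float64)
                         - ref[name].astype(np.float64)).max())
        print(f"{name:20s} max|diff| = {d:.3e}")
        if d > atol:
            bad.append((d, name))
    return [name for d, name in sorted(bad, reverse=True)]

# Synthetic demo: one layer carries an injected error, the rest agree.
rng = np.random.default_rng(1)
ref = {f"layer{i:02d}.moe_out": rng.standard_normal(16).astype(np.float32)
       for i in range(3)}
ours = {k: v.copy() for k, v in ref.items()}
ours["layer01.moe_out"] += 0.05   # pretend a precision bug lives here

print(diff_report(ours, ref))     # -> ['layer01.moe_out']
```

In practice `ref` would come from llama.cpp instrumented to dump activations for the same 35B model and prompt; the point is that the first tensor to diverge names the broken kernel directly.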
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent d0508a8 · commit b8291c3
1 file changed: 75 additions & 0 deletions