Commit 3461509

unamedkr and claude committed
state: R36 final consolidation — 5 meta-insights + 35-round commit ledger
Puts a permanent "META-INSIGHTS" section at the top of state.md so the 5 durable patterns from this 2026-04-21→22 session are inherited by future sessions:

1. Reference > introspection (BPE and MoE fixes came from comparison, not ablation)
2. Narrow bug / wide instrumentation (7 new TQ_*_PROBE envs paid off)
3. Architecture-scoped auto-defaults > global envs
4. Null results advance (R16-R19 enabled R24's 4B-vs-35B reveal)
5. 5-line fix has 25-round prelude: invest in localization infra

Also records the R33→R34 meta-anti-pattern (my probe had a silent chunking bug; refparity's methodology is only as good as matching production's plumbing exactly).

Session's 34 code commits enumerated as a ledger. 8 permanent diagnostic env vars documented in one table for handoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 600d49e commit 3461509

1 file changed

Lines changed: 41 additions & 2 deletions

.claude/state.md

Original file line number | Diff line number | Diff line change
@@ -1,7 +1,46 @@
11
# quant.cpp — Session State
22

3-
**Last updated**: 2026-04-21 (Phase 1 refparity ★)
4-
**Session HEAD**: Reference-parity framework (tools/refparity/) LANDED — HF vs engine per-layer diff, pos-aligned, post_norm-aware.
3+
**Last updated**: 2026-04-22 (Phase 2 KV clean-bill)
4+
**Session HEAD**: turbo_kv_4b per-arch per-layer clean-bill LANDED via chunked TQ_KV_PROBE. 7×/+0% PPL claim now validated element-by-element across Llama, Qwen3-0.6B, Qwen3.5-4B, Qwen3.6-35B.
5+
6+
## ★ META-INSIGHTS (distilled from 2026-04-21→22 35-round session, keep for future sessions) ★
7+
8+
### Five durable patterns
9+
10+
1. **Reference comparison > symptom introspection.** Rounds that compared our engine vs HF / llama.cpp found a 5-line fix in 2-3 rounds each (BPE UTF-8, MoE 117-tok cliff). Rounds that only did internal ablation without a reference (R16-R19 DeltaNet state bisection) produced null results and wrong hypotheses. **Always establish a reference first; ablate second.**
11+
12+
2. **Narrow bug requires wide instrumentation.** Every new `TQ_*` diagnostic env paid for itself many times over. This session added 7 (`TQ_DUMP_INTERMEDIATE`, `TQ_DELTA_PROBE`, `TQ_DELTA_RESET_EVERY`, `TQ_DELTA_RESET_LAYER`, `TQ_MOE_PROBE`, `TQ_MOE_ROUTE_TEMP`, `TQ_KV_PROBE`). Reactive instrumentation is a tax; proactive instrumentation compounds.
13+
14+
3. **Architecture-scoped auto-defaults > global envs.** Users shouldn't need to know flags. When a per-arch correction ships, bake it into the `tools/quant.c` auto-detect block (qwen35moe → auto-serial, auto-moe-temp; MoE+Q8_0 → auto-skip-Q4). Users get the fix by upgrading.
15+
16+
4. **Null results advance us.** R16-R19 "DeltaNet alone isn't the cause" WAS necessary before R24 could propose "compare 4B hybrid". Document null results with the same discipline as fixes — the failure eliminates hypotheses.
17+
18+
5. **5-line fix has a 25-round prelude.** Both breakthroughs (BPE, MoE temp) are 5-line C changes. The time was entirely in LOCALIZATION. Invest in instrumentation infrastructure; the fix will be trivial once you know where.
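Pattern 1 (reference first, ablation second) can be illustrated with a small Python sketch that walks per-layer activations from the engine and from a reference and reports the first layer where they part ways. This is not the `tools/refparity/` code: the function names, the flat-list layout of each layer's activations, and the `min_cosine` threshold are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        # Two all-zero vectors count as identical; one zero vector does not.
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def first_divergent_layer(engine_layers, reference_layers, min_cosine=0.999):
    """Return the index of the first layer whose similarity to the
    reference drops below min_cosine, or None if every layer matches.

    min_cosine is a hypothetical threshold, not a value from the source.
    """
    if len(engine_layers) != len(reference_layers):
        raise ValueError("layer count mismatch between engine and reference")
    for i, (eng, ref) in enumerate(zip(engine_layers, reference_layers)):
        if len(eng) != len(ref):
            raise ValueError(f"layer {i}: activation length mismatch")
        if cosine(eng, ref) < min_cosine:
            return i
    return None
```

The returned index says where to start ablating, which is the localization step that patterns 1 and 5 describe as the expensive part.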
19+
20+
### Meta-anti-pattern — the R33→R34 irony
21+
22+
Refparity's strength is comparing the **same code path** against a reference. In R33 I reported "hybrid arch KV probe produces NaN → production bug unclear". R34 found that my probe code was **clamping `head_dim` to `TQ_BK` whenever `head_dim > TQ_BK` instead of chunking**, while production chunks correctly. The probe therefore measured a different path than production, and the NaN was a false positive. **A diagnostic tool's plumbing (chunking, buffer sizes, strides) must match production's plumbing**, not just the primary call. A silent bug in the diagnostic tool is as misleading as the bugs it tries to catch.
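The R33→R34 mismatch can be modeled with a toy Python sketch. Everything here is an assumption made for illustration: the min/max 4-bit quantizer is a stand-in rather than turbo_kv_4b, `block_size` plays the role of `TQ_BK`, and the clamped variant only models the idea that the probe covered a different span than production (it does not reproduce the NaN).

```python
import math


def roundtrip_block(block, bits=4):
    """Toy uniform min/max quantize-dequantize of ONE block.
    Stand-in quantizer, NOT turbo_kv_4b."""
    lo, hi = min(block), max(block)
    if hi == lo:
        return list(block)  # constant block: nothing to quantize
    scale = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((x - lo) / scale) * scale for x in block]


def roundtrip_chunked(vec, block_size):
    """Production-style plumbing as described: cover the whole vector
    in chunks of at most block_size elements."""
    out = []
    for start in range(0, len(vec), block_size):
        out.extend(roundtrip_block(vec[start:start + block_size]))
    return out


def roundtrip_clamped(vec, block_size):
    """Modeled probe mistake: clamp to one block and drop the rest."""
    return roundtrip_block(vec[:block_size])


def cosine_and_mse(original, restored):
    """Roundtrip quality metrics; refuses to compare mismatched spans."""
    if not original or len(original) != len(restored):
        raise ValueError("probe covered a different span than the input")
    dot = sum(a * b for a, b in zip(original, restored))
    norm = (math.sqrt(sum(a * a for a in original))
            * math.sqrt(sum(b * b for b in restored)))
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    cosine = dot / norm if norm > 0.0 else float("nan")
    return cosine, mse
```

With a 256-element vector and `block_size=128`, the chunked path returns 256 elements and can be scored, while the clamped path returns only 128 and is rejected by the length check instead of silently yielding a misleading number.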
23+
24+
### Permanent diagnostic envs added this session (keep)
25+
26+
| env | scope | purpose |
27+
|---|---|---|
28+
| `TQ_DUMP_INTERMEDIATE` | refparity | per-layer sub-stage dumps (h_in/postattn/preffn/ffnout) |
29+
| `TQ_DELTA_PROBE` | DeltaNet | per-layer state L2 norm at listed call counts |
30+
| `TQ_DELTA_RESET_EVERY` | DeltaNet | periodic recurrent state reset (ablation) |
31+
| `TQ_DELTA_RESET_LAYER` | DeltaNet | restrict reset to one layer |
32+
| `TQ_MOE_PROBE` | MoE | top-K expert IDs + routing weights per layer |
33+
| `TQ_MOE_ROUTE_TEMP` | MoE | softmax temperature on router; **auto-flipped to 2.0 on qwen35moe** |
34+
| `TQ_KV_PROBE` | KV | per-layer K roundtrip cosine sim + MSE at sampled positions |
35+
| `TQ_NO_MOE_TEMP_AUTO` | escape | disable the qwen35moe auto-temp flip |
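The interaction between `TQ_MOE_ROUTE_TEMP` and `TQ_NO_MOE_TEMP_AUTO` in the table above can be sketched in Python as follows. The env names, the `qwen35moe` architecture, and the 2.0 value come from the table. The precedence order (explicit value first, then escape hatch, then auto-default), the fallback temperature of 1.0, and the helper names are assumptions, and the real logic lives in the C engine rather than in this sketch.

```python
import math

# Per-architecture auto-default; the 2.0 for qwen35moe is from the table.
AUTO_TEMP_BY_ARCH = {"qwen35moe": 2.0}


def resolve_route_temp(arch, env):
    """Pick the router softmax temperature.

    env is any mapping of env-var names to strings (e.g. os.environ).
    ASSUMED precedence: explicit TQ_MOE_ROUTE_TEMP, then the
    TQ_NO_MOE_TEMP_AUTO escape hatch, then the per-arch auto-default.
    """
    explicit = env.get("TQ_MOE_ROUTE_TEMP")
    if explicit:
        return float(explicit)
    if env.get("TQ_NO_MOE_TEMP_AUTO"):
        return 1.0  # assumed neutral temperature when auto is disabled
    return AUTO_TEMP_BY_ARCH.get(arch, 1.0)


def router_softmax(logits, temp):
    """Numerically stable softmax over router logits at a temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temp) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A temperature above 1.0 flattens the routing distribution, so the top expert receives a smaller share of the weight than it would at 1.0.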
36+
37+
### Session commits ledger
38+
39+
`161a218` refparity framework · `5bc50b1` intermediate dumps · `f612c57` FFN drift diag · `6727a74` dtype opt · `6975522` NO_Q4 tradeoff · **`9c53491` BPE decode fix ★** · **`58d3925` BPE encode fix ★★** · `58a9d48` quant.h sync · `34661a8` v0.27.0 release · `972dc78` tokenizer fixtures · `fba7ff9` 35B baseline · `657f203` README v3.20 · `dc4152d` README.ko · `f912c32` regression chain · `2a1d40d` emoji/CJK fixtures · `b061e7d` delta reset ablation · `d1c6057` delta probe · `a05d4e4` a_log null · `65e4a2d` per-layer reset · `18223d8` bpe bench report · `ad30813` tier doc · `61f7ac0` env_vars · `88ed094` 4B-vs-35B insight · `6b362a8` moe probe · **`b212194` MoE temp cliff break ★★★** · `a4d0002` moe bench report · `d2fb852` v0.28.0 docs · `f0e51ab` auto-default temp · `f2f0d8a` auto-default docs · `52e78cb` .gitignore · `4d378f0` KV probe · `4b6019e` KV probe refined · `63e45bb` KV probe chunking fix · `600d49e` KV clean-bill report · this (`R36` consolidation).
40+
41+
---
42+
43+
544

645
## ★ Phase 2 R34 — KV probe chunking fix — turbo_kv_4b CLEAN across all tested arch ★
746
