experiments/rht-k-sweep: screencap polish (% labels, softer "fails" wording, synthetic-vs-real K distinction)

TheTom · claude · TheTom · commit 60c3e20300a6 · 2026-05-13T19:13:09.000-05:00

Four review tweaks before posting:

1. Δ columns now labeled "Δ MSE %" and "Δ KL %", with values printed as
   "+141.7%" rather than "+141.7". The % is formatted into the value before
   padding, so it never dangles after the alignment width.
2. "Application fails" → "Application to this KV-cache setup fails" to avoid
   sounding like a global claim about the paper.
3. "Real K post-WHT is bounded sub-Gaussian" → "matching the real post-WHT K
   shape: bounded / sub-Gaussian" to clarify that the test source is synthetic
   K shaped to match the §3 stats, not real extracted K.
4. Drop "branch:" footer (kept local per Tom's call).
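
For reviewers, a minimal sketch of why tweak 1 builds the string before padding. `pct` mirrors the helper added in this commit; `value`, `old_cell`, and `new_cell` are illustrative names, and 141.7 is just the example figure quoted above.

```python
def pct(v: float) -> str:
    """Format a percentage change with an explicit sign and a trailing %."""
    return f"{v:+.1f}%"


value = 141.7  # illustrative figure, not a measured result

# Old style: the bare number is padded to width 10, so a % sign could only
# be appended after the padding.
old_cell = f"{value:<+10.1f}"

# New style: build "+141.7%" first, then left-align the finished string.
new_cell = f"{pct(value):<11}"

print(repr(old_cell + "%"))  # '+141.7    %'
print(repr(new_cell))        # '+141.7%    '
```

Padding the finished string keeps the % attached to the number while the column width still lines up with the 11-character header cell.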

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/experiments/rht_k_sweep/screencap.py b/experiments/rht_k_sweep/screencap.py
@@ -128,20 +128,24 @@ def dpct(new, old):
 
 
 print(f"{B}Result{R}")
-print(f"  {'k_extra':<10}{'post-kurt':<14}{'KS':<10}{'MSE':<14}{'Δ MSE':<10}{'KL mean':<14}{'Δ KL':<10}{'catastrophic':<14}")
-print(f"  {DIM}{'-'*96}{R}")
-print(f"  {GREEN}{'0  baseline':<10}{R}  {kurt0:<+12.3f}{kspos0:<10.3f}{m0:<14.3e}{'—':<10}{kl0.mean():<14.3e}{'—':<10}{GREEN}{c0:<14.1%}{R}")
-print(f"  {RED}{'1  +1 RHT':<10}{R}  {kurt1:<+12.3f}{kspos1:<10.3f}{m1:<14.3e}{RED}{dpct(m1,m0):<+10.1f}{R}{kl1.mean():<14.3e}{RED}{dpct(kl1.mean(),kl0.mean()):<+10.1f}{R}{RED}{B}{c1:<14.1%}{R}")
-print(f"  {RED}{'2  +2 RHT':<10}{R}  {kurt2:<+12.3f}{kspos2:<10.3f}{m2:<14.3e}{RED}{dpct(m2,m0):<+10.1f}{R}{kl2.mean():<14.3e}{RED}{dpct(kl2.mean(),kl0.mean()):<+10.1f}{R}{RED}{c2:<14.1%}{R}")
+print(f"  {'k_extra':<10}{'post-kurt':<14}{'KS':<10}{'MSE':<14}{'Δ MSE %':<11}{'KL mean':<14}{'Δ KL %':<11}{'catastrophic':<14}")
+print(f"  {DIM}{'-'*98}{R}")
+def pct(v):  # format as e.g. "+141.7%" right-padded
+    return f"{v:+.1f}%"
+
+print(f"  {GREEN}{'0  baseline':<10}{R}  {kurt0:<+12.3f}{kspos0:<10.3f}{m0:<14.3e}{'—':<11}{kl0.mean():<14.3e}{'—':<11}{GREEN}{c0:<14.1%}{R}")
+print(f"  {RED}{'1  +1 RHT':<10}{R}  {kurt1:<+12.3f}{kspos1:<10.3f}{m1:<14.3e}{RED}{pct(dpct(m1,m0)):<11}{R}{kl1.mean():<14.3e}{RED}{pct(dpct(kl1.mean(),kl0.mean())):<11}{R}{RED}{B}{c1:<14.1%}{R}")
+print(f"  {RED}{'2  +2 RHT':<10}{R}  {kurt2:<+12.3f}{kspos2:<10.3f}{m2:<14.3e}{RED}{pct(dpct(m2,m0)):<11}{R}{kl2.mean():<14.3e}{RED}{pct(dpct(kl2.mean(),kl0.mean())):<11}{R}{RED}{c2:<14.1%}{R}")
 print()
 print(f"  {B}{RED}catastrophic rate: {c0:.1%}  →  {c1:.1%}{R}  {DIM}(per-query KL > 1.10 × baseline median){R}")
 print()
 print(f"{B}Mechanism{R}")
-print(f"  Theorem holds: kurt drifts from {kurt0:+.2f} to {kurt1:+.2f} after 1 extra RHT, as proven.")
-print(f"  Application fails: production turbo4 centroids extend to ±0.174 ≈ ±2σ because")
-print(f"  real K post-WHT is bounded sub-Gaussian. +RHT Gaussianizes K → mass past ±2σ →")
-print(f"  saturation at the codebook extreme → 100% catastrophic.")
+print(f"  Consistent with theorem direction: marginal moves toward Gaussian/URR target")
+print(f"  (kurt {kurt0:+.2f} → {kurt1:+.2f}, KS {kspos0:.3f} → {kspos1:.3f}).")
+print(f"  Application to this KV-cache setup fails: production turbo4 centroids extend to")
+print(f"  ±0.174 ≈ ±2σ, matching the real post-WHT K shape: bounded / sub-Gaussian.")
+print(f"  +RHT Gaussianizes the marginal → mass past ±2σ → saturation at the codebook")
+print(f"  extreme → 100% catastrophic on the attention-softmax KL proxy.")
 print()
-print(f"{DIM}repro: experiments/rht_k_sweep/screencap.py  (pure numpy/scipy, ~5s, no GPU){R}")
-print(f"{DIM}branch: experiment/rht-k-sweep on TheTom/llama-cpp-turboquant fork (local){R}")
+print(f"{DIM}repro: pure numpy/scipy, ~5s, no GPU. happy to share script.{R}")
 print()
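
The saturation argument in the Mechanism text can be illustrated with a toy sketch. This is not the script's method: it assumes σ ≈ 0.087 (so that 2σ matches the quoted ±0.174), uses a uniform distribution as a hypothetical stand-in for a "bounded / sub-Gaussian" marginal, and treats `edge` as the outermost codebook centroid. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma = 0.087        # assumption: chosen so 2*sigma equals the quoted 0.174
edge = 2.0 * sigma   # hypothetical outermost centroid position

# Stand-in bounded source: uniform on [-sqrt(3)*sigma, sqrt(3)*sigma] has
# standard deviation sigma and never reaches 2*sigma.
half_width = np.sqrt(3.0) * sigma
bounded = rng.uniform(-half_width, half_width, n)

# Gaussianized source with the same standard deviation.
gaussian = rng.normal(0.0, sigma, n)


def frac_past_edge(x: np.ndarray, edge: float) -> float:
    """Fraction of samples whose magnitude exceeds the outermost centroid."""
    return float(np.mean(np.abs(x) > edge))


def clip_mse(x: np.ndarray, edge: float) -> float:
    """Mean squared error introduced by saturating samples at +/- edge."""
    return float(np.mean((x - np.clip(x, -edge, edge)) ** 2))


print(f"bounded : past edge {frac_past_edge(bounded, edge):.2%}, clip MSE {clip_mse(bounded, edge):.3e}")
print(f"gaussian: past edge {frac_past_edge(gaussian, edge):.2%}, clip MSE {clip_mse(gaussian, edge):.3e}")
```

Under these assumptions the bounded source never saturates, while a few percent of the Gaussian samples land past the outermost centroid. The sketch only shows where the extra mass goes; it does not reproduce the KL or catastrophic-rate numbers from the screencap.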