Commit 60c3e20

TheTom and claude committed
experiments/rht-k-sweep: screencap polish — % labels, softer "fails" wording,
synthetic-vs-real K distinction

Four review tweaks before posting:

1. Δ columns now labeled "Δ MSE %" and "Δ KL %", with values printed as "+141.7%" not "+141.7" (no dangling % char after alignment width).
2. "Application fails" → "Application to this KV-cache setup fails" to avoid sounding like a global claim about the paper.
3. "Real K post-WHT is bounded sub-Gaussian" → "matching the real post-WHT K shape: bounded / sub-Gaussian" to clarify that the test source is synthetic K shaped to match §3 stats, not real K extraction.
4. Drop "branch:" footer (kept local per Tom's call).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
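
Tweak 1 in the message above can be illustrated in isolation. The sketch below is a minimal example, not code from the repository: `pct` mirrors the helper added in this commit, while the "dangling" variant is a hypothetical alternative (appending `%` after a padded numeric field) shown only to make the alignment issue visible. The value 141.7 is the one quoted in the commit message.

```python
def pct(v: float) -> str:
    """Format a signed percentage, e.g. '+141.7%' (mirrors the commit's helper)."""
    return f"{v:+.1f}%"


value = 141.7  # example value quoted in the commit message

# Hypothetical alternative: pad the number to width 11 inside the format spec,
# then append '%'. The '%' ends up after the padding, detached from the digits.
dangling = f"{value:<+11.1f}%"

# Approach taken in the commit: build the finished string first, then pad it.
# The '%' stays attached and the column is still exactly 11 characters wide.
attached = f"{pct(value):<11}"

print(repr(dangling))
print(repr(attached))
```

Formatting first and padding second is also why the "—" placeholders in the baseline row widen from `<10` to `<11`: every cell in the Δ columns has to share the new width.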
1 parent 1e3d7a2 commit 60c3e20

1 file changed

Lines changed: 15 additions & 11 deletions

experiments/rht_k_sweep/screencap.py

@@ -128,20 +128,24 @@ def dpct(new, old):
 
 
 print(f"{B}Result{R}")
-print(f" {'k_extra':<10}{'post-kurt':<14}{'KS':<10}{'MSE':<14}{'Δ MSE':<10}{'KL mean':<14}{'Δ KL':<10}{'catastrophic':<14}")
-print(f" {DIM}{'-'*96}{R}")
-print(f" {GREEN}{'0 baseline':<10}{R} {kurt0:<+12.3f}{kspos0:<10.3f}{m0:<14.3e}{'—':<10}{kl0.mean():<14.3e}{'—':<10}{GREEN}{c0:<14.1%}{R}")
-print(f" {RED}{'1 +1 RHT':<10}{R} {kurt1:<+12.3f}{kspos1:<10.3f}{m1:<14.3e}{RED}{dpct(m1,m0):<+10.1f}{R}{kl1.mean():<14.3e}{RED}{dpct(kl1.mean(),kl0.mean()):<+10.1f}{R}{RED}{B}{c1:<14.1%}{R}")
-print(f" {RED}{'2 +2 RHT':<10}{R} {kurt2:<+12.3f}{kspos2:<10.3f}{m2:<14.3e}{RED}{dpct(m2,m0):<+10.1f}{R}{kl2.mean():<14.3e}{RED}{dpct(kl2.mean(),kl0.mean()):<+10.1f}{R}{RED}{c2:<14.1%}{R}")
+print(f" {'k_extra':<10}{'post-kurt':<14}{'KS':<10}{'MSE':<14}{'Δ MSE %':<11}{'KL mean':<14}{'Δ KL %':<11}{'catastrophic':<14}")
+print(f" {DIM}{'-'*98}{R}")
+def pct(v): # format as e.g. "+141.7%" right-padded
+    return f"{v:+.1f}%"
+
+print(f" {GREEN}{'0 baseline':<10}{R} {kurt0:<+12.3f}{kspos0:<10.3f}{m0:<14.3e}{'—':<11}{kl0.mean():<14.3e}{'—':<11}{GREEN}{c0:<14.1%}{R}")
+print(f" {RED}{'1 +1 RHT':<10}{R} {kurt1:<+12.3f}{kspos1:<10.3f}{m1:<14.3e}{RED}{pct(dpct(m1,m0)):<11}{R}{kl1.mean():<14.3e}{RED}{pct(dpct(kl1.mean(),kl0.mean())):<11}{R}{RED}{B}{c1:<14.1%}{R}")
+print(f" {RED}{'2 +2 RHT':<10}{R} {kurt2:<+12.3f}{kspos2:<10.3f}{m2:<14.3e}{RED}{pct(dpct(m2,m0)):<11}{R}{kl2.mean():<14.3e}{RED}{pct(dpct(kl2.mean(),kl0.mean())):<11}{R}{RED}{c2:<14.1%}{R}")
 print()
 print(f" {B}{RED}catastrophic rate: {c0:.1%}{c1:.1%}{R} {DIM}(per-query KL > 1.10 × baseline median){R}")
 print()
 print(f"{B}Mechanism{R}")
-print(f" Theorem holds: kurt drifts from {kurt0:+.2f} to {kurt1:+.2f} after 1 extra RHT, as proven.")
-print(f" Application fails: production turbo4 centroids extend to ±0.174 ≈ ±2σ because")
-print(f" real K post-WHT is bounded sub-Gaussian. +RHT Gaussianizes K → mass past ±2σ →")
-print(f" saturation at the codebook extreme → 100% catastrophic.")
+print(f" Consistent with theorem direction: marginal moves toward Gaussian/URR target")
+print(f" (kurt {kurt0:+.2f}{kurt1:+.2f}, KS {kspos0:.3f}{kspos1:.3f}).")
+print(f" Application to this KV-cache setup fails: production turbo4 centroids extend to")
+print(f" ±0.174 ≈ ±2σ, matching the real post-WHT K shape: bounded / sub-Gaussian.")
+print(f" +RHT Gaussianizes the marginal → mass past ±2σ → saturation at the codebook")
+print(f" extreme → 100% catastrophic on the attention-softmax KL proxy.")
 print()
-print(f"{DIM}repro: experiments/rht_k_sweep/screencap.py (pure numpy/scipy, ~5s, no GPU){R}")
-print(f"{DIM}branch: experiment/rht-k-sweep on TheTom/llama-cpp-turboquant fork (local){R}")
+print(f"{DIM}repro: pure numpy/scipy, ~5s, no GPU. happy to share script.{R}")
 print()
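
The "Mechanism" text printed by the script describes a chain: a bounded marginal fits inside a codebook that ends at about ±2σ, while a Gaussianized marginal puts mass past that edge and saturates at the extreme centroid. The sketch below illustrates only that tail-mass step, under loudly stated assumptions: a uniform distribution stands in for "bounded / sub-Gaussian", `SIGMA = 0.087` is chosen so that 2σ matches the quoted ±0.174, and all names are hypothetical. It does not reproduce the script's synthetic K, the turbo4 centroids, the RHT itself, or the attention-softmax KL proxy.

```python
import random

SIGMA = 0.087  # assumption: chosen so that 2 * SIGMA matches the quoted 0.174
CLIP = 0.174   # codebook extreme quoted in the script's output text


def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0


def saturated_fraction(samples, clip=CLIP):
    """Fraction of samples that fall outside [-clip, +clip]."""
    return sum(abs(x) > clip for x in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 100_000

# Stand-in bounded marginal: uniform with standard deviation SIGMA,
# supported on [-sqrt(3)*SIGMA, +sqrt(3)*SIGMA], which lies inside +/-CLIP.
half_width = 3 ** 0.5 * SIGMA
bounded = [rng.uniform(-half_width, half_width) for _ in range(n)]

# Stand-in Gaussianized marginal with the same standard deviation.
gaussian = [rng.gauss(0.0, SIGMA) for _ in range(n)]

kurt_bounded = excess_kurtosis(bounded)
kurt_gauss = excess_kurtosis(gaussian)
frac_bounded = saturated_fraction(bounded)
frac_gauss = saturated_fraction(gaussian)

print(f"bounded : kurt {kurt_bounded:+.2f}, mass past clip {frac_bounded:.2%}")
print(f"gaussian: kurt {kurt_gauss:+.2f}, mass past clip {frac_gauss:.2%}")
```

In this toy setting the bounded source never reaches the clip point, while the Gaussian places roughly the two-sided 2σ tail beyond it. How that tail mass translates into the catastrophic rate reported by the script depends on the KL proxy, which is not modeled here.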
