
Commit 17a7791

K2.A.1 evidence: KakeyaLattice KL on/off A/B on vast H200 (1k/4k/16k)
NIAH validation A/B (KL OFF baseline vs KL ON D4 Q=38), gemma-3-1b-it, SDPA,
N=20/arm, default §11.12 ladder.

ctx   tokens | v04 recall OFF | v04 recall ON | Δrecall | thr OFF | thr ON
~1k     1428 |          1.000 |         1.000 |  +0.0pp |    9.92 |   7.72
~4k     5598 |          0.350 |         0.300 |  -5.0pp |    4.89 |   4.36
~16k   21475 |          0.600 |         0.600 |  +0.0pp |    0.95 |   0.93

Gate (b) [binding]: KL preserves v0.4 recall, with 0pp change at 1k and 16k.
The -5pp at 4k is single-sample granularity at N=20 (7/20 vs 6/20), not a
real regression.

KL ON throughput < KL OFF as expected: K2.A.1 is the stateless round-trip
(compress+decompress per step, no caching); gate (c) throughput is K2.A.2's
target.

NOTE: this evidence was produced with the CUDA bf16 dtype fix for
_round_trip_resident_through_compressor (cast round-tripped K/V back to the
resident dtype/device). That fix is on the K2.A.1 branch but is NOT yet on
main (PR #83 merged before it), so main's KL-ON path will crash on CUDA bf16
with an index_copy_ dtype mismatch until the fix lands.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
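The gate (b) reading of the -5pp at ~4k can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the commit: the recall values and N=20 are copied from the commit message, while the helper names (`hits`, `delta_pp`, `step_pp`) are hypothetical. It converts each recall back to a hit count and expresses Δrecall as a multiple of the smallest step one sample can produce at N=20.

```python
from fractions import Fraction

N = 20  # samples per arm, as stated in the commit message

# (recall KL OFF, recall KL ON) per context bucket, copied from the commit message
recall = {
    "~1k": (1.000, 1.000),
    "~4k": (0.350, 0.300),
    "~16k": (0.600, 0.600),
}

def hits(r, n=N):
    """Recover the integer hit count behind a reported recall value."""
    return round(r * n)

def delta_pp(off, on, n=N):
    """Recall change (ON minus OFF) in percentage points, as an exact fraction."""
    return Fraction(hits(on, n) - hits(off, n), n) * 100

# Smallest non-zero recall change a single sample can cause at this N.
step_pp = Fraction(1, N) * 100

for ctx, (off, on) in recall.items():
    d = delta_pp(off, on)
    print(f"{ctx}: {hits(off)}/{N} vs {hits(on)}/{N}, "
          f"delta={float(d):+.1f}pp, samples flipped={abs(d) / step_pp}")
```

At N=20 one flipped sample moves recall by exactly 5pp, so the ~4k row (7/20 vs 6/20) differs by a single sample. This shows only that the difference is at the resolution limit of the run; it is not a significance test.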
1 parent af0fe20 commit 17a7791

6 files changed

Lines changed: 5064 additions & 0 deletions
