Shannon-Prime — master claims ledger

Single source of truth for the whole series. Rule: nothing appears in any paper, README, post, or talk unless it is a row here, with its scope attached. Rows are tagged by paper. See METHODOLOGY.md for the gates.

Standing caveat: proof-of-mechanism on small models, one dev host. The mechanisms work and are bit-faithful/gated; they are not scale-validated, multi-model, or independently reproduced. Say exactly that, everywhere.

Paper 01 — two-ring memory

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| 01-R1 | Quality at 8× KV sparsification | +0.69% PPL (2× −0.71, 4× −0.92) | Qwen3-0.6B, wiki, N=2048, q8, sinks=4 | <2% deflection | 1 model/2k/1 corpus | done |
| 01-R2 | Needle retrieval, no recency bias | HIT d10/50/90, to 8× @2k | ±1 proj, decode path | NIAH HIT | 1 model, 1 needle type | done |
| 01-R3 | Two-ring on physical Optane | HIT off NVMe | NO_BUFFERING+IOCP | poison-gated | 512 proven; 32k = R9 | done(512) |
| 01-R4 | Optane read latency | 7.57 µs/read (48.7→18.9→7.57) | IOCP batch, 4 KB | timed, no page cache | syscall+media, Optane-specific | done |
| 01-R5 | KV-RAM footprint | 910× cache (1.8 GB live) | (sink+W) ring buffer | measured alloc+RSS | net ~8×, projk-dominated (~950 MB) | done |
| 01-R6 | KV codec | ~3.5×/f32, lossy | Spinor 63 B | 29/31 argmax, KL .023 | not bit-exact | done |
| 01-R7 | O(N) recall selection | set-equivalent | quickselect | parity + HIT | time win not benchmarked | done |
| 01-R8 | Bit-exact when disabled | bit-identical | gate-off no-op | argmax parity | methodology, not perf | done |
| 01-R9 | 32k needle off NVMe @ ~1.8 GB | MISS (run completed: 16.3 h, zero errors; 67% LRU absorption; 19.6 µs/read at QD; 1.35B device reads/stream) | N=32768, B=512 (= 64× selection — gated regime was 2×–8×), depth 50, f32 r=16 router (config regression: bits-r64/KVSEL env dropped from the runner) | NIAH HIT (poison) — not met | no full-attention 32k control yet; router-dilution vs 0.6B model ceiling unseparated; RAM ladder diagnostic open | measured MISS — not a claim |

Commit chain: 67f4997f8ea920 (+ a5e9b86). Honest negatives (must appear in the paper): CPU decode ~1.34× behind llama.cpp-Q8 (memory is the play, not tok/s); a magnitude-histogram recall signature was falsified and dropped; the composed 32k retrieval MISSed at the 64× selection budget (R9) — the infrastructure half of that run (16.3 h saturated dual-store IOCP, 67% temporal-cache absorption, queue-depth latency measured) is real and reportable as such, the retrieval headline is not. Paper 01 releases on the 512-position-proven R3, not R9.
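The set-equivalence gate behind row 01-R7 can be illustrated with a small sketch: an expected-O(N) quickselect must return the same set of top-B positions as a full sort. This is not the engine's selector; `top_b_indices`, the synthetic scores, and the budget of 32 are assumptions made up for the example.

```python
import random

def top_b_indices(scores, b, seed=0):
    """Indices of the b largest scores via quickselect (expected O(N), no full sort)."""
    if b <= 0:
        return set()
    idx = list(range(len(scores)))
    if b >= len(idx):
        return set(idx)
    rng = random.Random(seed)          # fixed seed: pivot choice is reproducible
    chosen = set()
    while idx:
        pivot = scores[rng.choice(idx)]
        hi = [i for i in idx if scores[i] > pivot]
        eq = [i for i in idx if scores[i] == pivot]
        lo = [i for i in idx if scores[i] < pivot]
        if len(hi) >= b:
            idx = hi                   # every winner lies strictly above the pivot
        elif len(hi) + len(eq) >= b:
            chosen.update(hi)
            chosen.update(eq[: b - len(hi)])
            return chosen
        else:
            chosen.update(hi)
            chosen.update(eq)
            b -= len(hi) + len(eq)
            idx = lo                   # remaining winners are below the pivot
    return chosen

def top_b_by_sort(scores, b):
    """O(N log N) reference: sort everything, keep the first b."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:b])

# Hypothetical recall scores, distinct by construction (1009 is prime).
scores = [(i * 856) % 1009 for i in range(1000)]
selected = top_b_indices(scores, 32)
reference = top_b_by_sort(scores, 32)
parity = selected == reference         # the "set-equivalent" check
```

With tied scores the two selectors may legitimately pick different members of the tie, so a parity gate of this kind is only meaningful on distinct scores or with an explicit tie-break rule.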

Paper 02 — the reducing loader (staged; re-gate + repro before release)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| 02-L1 | Reducing transcode, output-preserving | .sp-model 16.3 GB < 19.7 GB GGUF, top-1 identical | 35B-A3B MoE | argmax parity | one model; reduction source-dependent | prior work |
| 02-L2 | Zero-copy swivel load | no fp16 inflation of quants (avoids ~4× bw/footprint) | arena load path | arena-alias verified | load-path invariant | prior work |
| 02-L3 | Codec-by-source, no added loss | Q4→packed-Q4, Q8/F16→packed-Q8 | transcode | gate-off bit-faithful | not a new quant scheme | prior work |
| 02-L4 | Bit-faithful on a second arch | Gemma-class within f32-vs-Q8 floor (PPL 86.2 vs 90.7) | gemma-e2b | PPL gate + argmax | 86.2 vs 90.7 is the floor direction, NOT "5% worse" | prior work |

Paper 02's receipts come from earlier measured work; per series rule 4 they are re-gated and a one-command repro is built before the paper releases.

Paper 04 — the Oracle & the Teacher (staged 2026-06-06; gated in engine, repro before release)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| 04-R1 | Full variable-geometry forward == oracle | argmax 12/12, max KL 2.663e-10, \|dlogit\| 1.84e-4 | Gemma4-E2B (35L, MatFormer), RTX 2060 vs CPU oracle | E_G4_CU_FULL (f64 log-softmax KL) | one model, one host | gated (engine) |
| 04-R2 | Autoregressive decode == oracle, teacher-forced | ALL 12 generated tokens oracle-predicted | jagged shared-KV cache, per-step AltUp | E_G4_CU_DEC | greedy only; short stream | gated (engine) |
| 04-R3 | First-try composition | both live runs green on first attempt; 0 debug sessions on composed code | after 5 staged gates (38/38 cumulative) | the gate trail itself | process claim — receipts are the trail | gated (engine) |
| 04-R4 | Oracle arithmetic must be enforced, not approximated | per-weight dequant diverges 2.8e-3; inline lift restores the floor | gemm_w_lift vs k_dequant_arena path | L0 staged parity | f32-rounding-order effect, model-agnostic mechanism | gated (engine) |

Engine provenance: tests/test_gemma4_cuda.c, tag stage-eta-phase1-closed-2026-06-06.
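Row 04-R1's gate reports three quantities per position: argmax agreement, a KL divergence computed from f64 log-softmax, and the largest absolute logit difference. A minimal sketch of such a check is below; it is not the code in `tests/test_gemma4_cuda.c`, and the function names, the toy logits, and the `kl_max` threshold are all assumptions chosen for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax; Python floats are IEEE-754 doubles (f64)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, test_logits):
    """KL(ref || test) over the two softmax distributions."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(test_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def parity_gate(ref_logits, test_logits, kl_max):
    """Compare a backend's logits against the oracle's for one position."""
    ref_top = max(range(len(ref_logits)), key=ref_logits.__getitem__)
    test_top = max(range(len(test_logits)), key=test_logits.__getitem__)
    kl = kl_divergence(ref_logits, test_logits)
    dlogit = max(abs(a - b) for a, b in zip(ref_logits, test_logits))
    return {
        "argmax_match": ref_top == test_top,
        "kl": kl,
        "max_abs_dlogit": dlogit,
        "passed": ref_top == test_top and kl <= kl_max,
    }

# Toy oracle logits and a backend that differs by 1e-4 per logit (hypothetical).
oracle = [2.0, -1.0, 0.5, 3.0]
backend = [x + (1e-4 if i % 2 == 0 else -1e-4) for i, x in enumerate(oracle)]
report = parity_gate(oracle, backend, kl_max=1e-6)
```

Reporting the absolute logit gap alongside KL is useful because KL is weighted by probability mass and can stay tiny while a low-probability logit drifts.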

Paper 05 — the Probe Suite (staged 2026-06-06; every tool live in engine)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| 05-R1 | Bisection localizes divergence without debugging | 2.8e-3 → matmul arithmetic in 2 probe runs; sharer seam proven at ao 1.11e-5 abs | truncated-parity harness, 6 boundaries | staged ABS gates, telemetry-then-pin | needs an oracle (paper 04) | gated (engine) |
| 05-R2 | Norm layers amplify the f32 floor ~×25 | pre-norm 6.3e-5 abs → residual 1.59e-3 (rms(ap)≈0.04) | post_attn_norm, E2B L0 | stage-5 pre-norm probe | gate ABS at floors; never raw rel at norm outputs | gated (engine) |
| 05-R3 | Cold-start + clock-state manufacture phantom speedups | "12.65×" → ~1.06× real (CUDA lazy-load ≈13×; idle SM 405 vs 2100 MHz; GDDR6 free-running under -lgc) | CUDA-graph decode, RTX 2060 | warm + n_gen≥256 + both clocks | GeForce -lmc flaky; within-run ratios are the signal | gated (engine) |
| 05-R4 | Isolated bench ≠ production gate | synthetic-Q4 bench 1.34e-7 PASS while production K-quant-mix path was 0/256 | Q4_K_M-style arena (Q8 head/Q4 body) | bench + E2E decode gate pair | the pairing IS the method | gated (engine) |
| 05-R5 | Amdahl regime-check before claiming kernel wins | int8/Q4 GEMV ties f32 at 0.6B/full-clock (overhead-bound); ~7× where the bus binds | decode vs isolated sweep | convergence + crossover | bottleneck must be named per claim | gated (engine) |

Engine provenance: tests/test_gemma4_cuda.c (harness), tests/bench_gemv_int8.cu (sweep), system CONVENTIONS.md (the binding benchmark rules).
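The ~×25 figure in row 05-R2 follows from the normalization itself: dividing by an RMS of about 0.04 scales any absolute error on the input by roughly 1/0.04 = 25. The sketch below reproduces that effect on synthetic data. The vector, its length, and the single-element perturbation are assumptions for illustration and are not the E2B activations; learned norm weights are omitted.

```python
import math

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def rmsnorm(x):
    """Plain RMS normalization, without a learned scale."""
    r = rms(x)
    return [v / r for v in x]

n = 1024
x = [0.04 if i % 2 == 0 else -0.04 for i in range(n)]   # rms(x) is about 0.04
delta = 6.3e-5                                           # absolute pre-norm error
x_pert = list(x)
x_pert[0] += delta                                       # perturb a single element

abs_in = delta
abs_out = max(abs(a - b) for a, b in zip(rmsnorm(x), rmsnorm(x_pert)))
amplification = abs_out / abs_in                         # close to 1 / rms(x)
```

This is why the row's caveat asks for absolute gates at the floor: the same arithmetic noise looks 25 times larger on the far side of the norm, without the computation having become any less faithful.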

Paper 06 — computing on the zip file: the dp4a ladder (staged 2026-06-06)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| 06-R1 | f32 GEMV is bus-saturated at scale | ~290 GB/s = 87% of 336 GB/s peak, flat N=3K..16K | RTX 2060, cuBLAS SGEMV, clocks pinned | isolated sweep | one card | gated (engine) |
| 06-R2 | int8 dp4a ladder rung | ~3.8× f32 at N≥8K (4:1 bytes) | warp-per-row, 128-bit loads, shuffle reduce | sweep + 256/256 top-1 in decode | naive GEMV ≈ cuBLAS absolute at small N | gated (engine) |
| 06-R3 | Q4 dp4a ladder rung | ~7.06× f32 at N≥12K (8:1 bytes − ~7% nibble-unpack ALU tax) | in-ALU unpack, (n^8)-8 sign-extend | sweep + host-ref 1.34e-7 + 256/256 top-1 | activation int8 quant = top-1-lossless, not byte-exact | gated (engine) |
| 06-R4 | Dequant-before-GEMM destroys the advantage | ~9 B/weight; 3× slower than f32 | dequant→f32-scratch→SGEMM | same sweep | the anti-pattern, measured | gated (engine) |
| 06-R5 | Per-tensor precision dispatch for K-quant mixes | 0/256 → 256/256 after DevTensor.prec routing | Q8-head/Q4-body arena | production decode gate | required for any Q4_K_M-style artifact | gated (engine) |
| 06-R6 | 12B end-to-end tok/s (the headline) | SP 34.2 vs llama.cpp-CUDA 31.29 ± 0.20 (+9.3%) | Gemma-4-12B, RTX 2060, tg256, SM pinned 2100 (GeForce -lmc unsupported — memory free-ran for BOTH engines); SP = reducing .sp-model 5.56 GB w/ graph+dp4a; llama.cpp b8861 Q4_K_M ngl 99 | tok/s measured both engines, same card/model/text; SP decode oracle-anchored (argmax or measured top-2) | NOT citable until the PPL gate closes: the SP artifact squeezes Q6_K source tensors to Q4 (fewer bytes read AND more weight-quant error); wikitext PPL vs llama.cpp is the release-blocking gate | measured — PPL gate pending |
| 06-R7 | Per-vector int8 activation quant collapses on outlier-heavy models (honest finding) | oracle-rank 205596 (gap 27.9) at the 12B's L11 → rank 2 (gap 0.31) after per-16-block scales | 12B L11 carries a TRAINED out_scale of 0.005 — the model flags its own activation outliers; blocks align with the GEMV's 128-bit loads (zero extra bus) | LIFT-arithmetic discriminator (structure at 1.5e-4 floors everywhere) + oracle-rank telemetry | E2B/qwen3 never tripped it — the failure is silent on easy models; the rank print is now standing gate equipment | gated (engine) |
| 06-R8 | The gemma4 GGUF ecosystem ships broken weights; a hand-written reference exposed it (honest finding, ecosystem-scale) | full-precision wikitext PPL = 4.68 (from-scratch forward off the official safetensors); the SAME arithmetic over GGUF-dequantized weights: pre-fix wave 271–364, post-June-5 rebuilt 192.9; llama.cpp on the same artifacts 397–506 (two engines agree per-artifact → forward exonerated, ARTIFACTS convicted) | chunk-0/512-ctx/[256,512) teacher-forced, llama-dumped tokens (== HF tokenizer 5431/5431); damage anatomy: in-place, period-6 layer severity, layer-scale class defective, no permutation | gold instrument + hybrid tensor-class swaps + per-layer cosine forensics (lattice tests/gemma4_gold/) | corroborated by llama.cpp PR #24118 + Unsloth's "bugs were universal" rebuild notice — which did NOT fix the text tower | measured, receipts in-repo |
| 06-R9 | Sovereign quantization pipeline (safetensors → SP transcoder, zero GGUF bytes) reproduces ground truth | OK_Q8 artifact: PPL 4.7396 (+1.33% vs gold); mixed OK_Q4B/Q8 "B1" artifact (9.4 GB, fits a 12 GB card): 5.1259 (+9.6%, sim-predicted 5.1259 — match to 4 decimals); ecosystem's rebuilt 6.3 GB GGUF on identical math: 192.9 | sp_transcode --st (values from bf16 checkpoint; GGUF supplies metadata/tokenizer only; mapped-but-missing = hard error); Q4B = per-32 f16 block scales, store-then-derive | gold-instrument artifact gates (67–105 s each), per-layer residual norms tracking bf16 digit-for-digit | per-row Q4 (06-R6's artifact) is formally superseded; 12B tok/s re-anchor (SHOOTOUT-2 on the B1 artifact + q4b kernel) is the remaining open leg | measured — kernel leg pending |
| 06-R10 | SHOOTOUT-2: the honest 12B speed/quality point (closes 06-R6) | 26.1 tok/s at wikitext PPL 5.12 on an RTX 2060 12GB (graph+dp4a, tg256, SM 2100; decode 256/256 top-1, graph 256/256 EXACT, 24/24 gates); llama.cpp-CUDA same card: 31.29 tok/s — at PPL 192–506 (its artifacts are broken; 06-R8) | B1 artifact 9.4 GB (OK_Q4B gate/up + OK_Q8 rest, per-32 f16 block scales, k_gemv_q4b dp4a — one weight block per 128-bit chunk); GPU PPL gate 5.1160 vs gold 4.6776 (+9.4%, PASS; sim 5.1259 / CPU 5.1259 / GPU 5.1160 triple-agreement) | clock-pinned tok/s + the full PPL gate ladder on the SAME artifact | effective decode bandwidth: SP 245 GB/s vs llama 207 GB/s (+18% engine efficiency); SP's artifact is 42% heavier BECAUSE it is the only mathematically intact 4-bit gemma4-12B — a like-for-like speed race does not exist: no other stack runs this model correctly at 4-bit. 06-R6's 34.2 is RETIRED with its quality-failed artifact | measured + gated — citable |

Engine provenance: src/backends/cuda/cuda_forward.cu, tests/bench_gemv_int8.cu, tests/test_qwen3_decode_cuda.c (28/28); gold instruments: lattice tests/gemma4_gold/.
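Row 06-R3 names an in-ALU nibble unpack with an `(n^8)-8` sign-extend. The sketch below models only that integer arithmetic in Python: XOR with 8 followed by subtracting 8 maps a 4-bit two's-complement nibble to its signed value, after which the weights feed an int8 multiply-accumulate of the kind a dp4a instruction performs four lanes at a time. The low-nibble-first packing order and every function name here are assumptions; the real kernel's layout and block scales are not reproduced.

```python
def sign_extend_nibble(n):
    """Map a raw nibble 0..15 to its signed 4-bit value -8..7."""
    return (n ^ 8) - 8

def pack_q4(lo, hi):
    """Pack two signed 4-bit weights into one byte (low nibble first: an assumption)."""
    return (lo & 0xF) | ((hi & 0xF) << 4)

def unpack_q4(byte):
    """Recover both signed weights from a packed byte."""
    return sign_extend_nibble(byte & 0xF), sign_extend_nibble(byte >> 4)

def dot_q4_int8(packed_weights, activations):
    """Integer dot product of packed Q4 weights with int8 activations."""
    acc = 0
    for j, byte in enumerate(packed_weights):
        w0, w1 = unpack_q4(byte)
        acc += w0 * activations[2 * j] + w1 * activations[2 * j + 1]
    return acc

# Hypothetical weights spanning the full signed 4-bit range, and int8 activations.
weights = [-8, -1, 0, 7, 3, -4, 5, -6]
acts = [12, -7, 100, 3, -128, 127, 9, -1]
packed = [pack_q4(weights[i], weights[i + 1]) for i in range(0, len(weights), 2)]

reference = sum(w * a for w, a in zip(weights, acts))   # unpacked ground truth
result = dot_q4_int8(packed, acts)                      # computed from packed bytes
```

Because the unpack is two integer operations per nibble, the weights never have to be expanded into a wider scratch buffer, which is the contrast row 06-R4 draws with the dequant-before-GEMM path.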

XBAR — the auditable latent crossbar (probe P1, closed 2026-06-08)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-R1 | Zero-copy latent crossbar: a 12B's generation is steered by direct KV-cache transplant, no tokens involved | 15/15 trials (5 prompts × 3 concepts) lexically incorporate the injected concept; selectivity 15/15 (own-family logit-rank geomean 11×–880×, always > cross-family; 2×2 double dissociation); max single-token pull 3.69 orders (' violin' rank 4910→1); dose-response: 1 row (~4% attn mass) bends ranks ≤22×, 6 contiguous rows breach the lexical surface | gemma-4-12B B1 artifact (06-R10), RTX 2060, per-step decode + dp4a; 6-row donor KV minted at identical absolute positions (RoPE-phase-exact), chat-template prompts; SP_XBAR_* harness, XBP1 payloads | G0 self-transplant bit-identical 7/7 across all campaigns (instrumentation null); rank telemetry every step; G2v1 divergence 11/15 ≤1.5× (4 at 1.55–1.58 = strongest steering, kept as a steering-magnitude measure); G2v2 gold-instrument coherence: steered text PPL 1.70–4.10, 15/15 inside the healthy band (wikitext gold = 4.68) | distinct-token diagnostic flags 3/15 dragon-payload trials as repetition-degenerate (9.4% distinct; low PPL because repetitive — why PPL alone can't certify coherence); raw KV splice is a blunt instrument — the learned-adapter phase (P2) exists to fix exactly this | measured + gated — citable |
| X-R2 | KV cache decoupled O(1) from context length on a 12B, with the needle retained | A learned 512×32 LSH router selects the global top-B at +0.47% PPL (oracle ceiling −0.08%; frozen ±1 was +4.17%); a compact slab realizes O(1) VRAM (N=8k↔16k nvidia-smi flat within ~50 MiB; a full cache would add ~5.4 GiB); the NIAH needle survives the compaction at depths 10/50/90% (exact, learned-router-only; frozen ±1 control misses) | Gemma-4-12B B1, RTX 2060 12 GB, gemma4_decode_cuda (backend-direct); SWA ring (W=1024) + global compact slab capped at the GQA union nh·B=4096, full K/V resident in host-RAM Ring-2, ranked by a resident r=32 RᵀK sidecar; needle SWA-isolated by construction (needle_end ≤ n_prompt−W); SP_CUDA_DECODE_INT8 tied-head | G2 PPL deflection <2% + O(1) VRAM ladder (8k vs 16k flat) + NIAH HIT under NaN-poison/slab compaction + frozen-router negative control MISS (isolates the learned projection as the cause) | the KV-cache term is O(1) and retentive; the absolute footprint in this backend-direct harness still carries the resident model (~9.4 GB) — arena-streamed weights = a separate gate; 1 model/1 host, proof-of-mechanism (not scale-validated) | measured + gated — citable |
| X-R3 | The crossbar writes: a stored episode is replayed into a frozen 12B's KV cache bit-exactly and load-bearingly, and replaying it does not break the model's perplexity | Replay-write (P3.3): replaying an intact stored episode over the prefill rows reproduces the baseline bit-identically (diffs=0); replaying a zeroed episode diverges in 12/12 trials (the decode collapses to a degenerate loop) — so the replayed payload is load-bearing, not inert. Proven on BOTH the 12B (48 owner layers) and the smaller E2B (15 owners / 20 sharers, exercising owner-indirection). Recall quality (P3.4): replaying a foreign episode over the 4 earliest of 84 positions deflects perplexity +1.38% (4.6665 → 4.7311) — inside the pre-registered <2.0% gate | Gemma-4-12B B1 (06-R10 / X-R2) + Gemma-4-E2B, RTX 2060 12 GB, gemma4_decode_cuda; SP_REPLAY injects the episode's owner-K/V at the cache-store boundary before attention (at both the graph-capture and velocity prefill paths); the recall-quality scorer is the same decode in SP_G4_SCORE mode, so replay composes with zero new code | G-P3-SHARED 3-leg gate (intact == baseline diffs=0; zeroed diverges 12/12; unset = floor) on 12B + E2B; G-P3-PPL deflection < 2.0% (pre-registered, not moved) | the PPL deflection is over n=42 scored positions, a single chunk — it is deterministic (replay, not router sampling, so not a noise-flippable small-N illusion), but a larger-N multi-chunk run is the named hardening lever before any headline; 1 model / 1 card, proof-of-mechanism | measured + gated — citable (n=42 caveat held) |

Provenance: lattice papers/CONTRACT-XBAR-P1-inception-probe.md + papers/CONTRACT-XBAR-P3-ring-on-exec.md (X-R2 run-records G-P3-R2.b-2b-* / -2c-* / -NIAH; X-R3 run-records G-P3-SHARED / G-P3-PPL) + papers/RFC-XBAR-auditable-latent-crossbar.md; engine tests/test_xbar_p1_cuda.c (P1) + tests/test_gemma4_cuda.c SP_G4_NIAH / SP_G4_REPLAY_GATE / SP_G4_SCORE + SP_ARM_* / SP_REPLAY knobs in cuda_forward.cu; trainer tools/xbar_lsh/train_lsh.py; receipts tests/fixtures/lsh/results/ + tests/fixtures/xbar_p3_replay/G-P3-SHARED_{12B,E2B}_GREEN.log + G-P3-PPL_run.log + _xbar\.
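The G-P3-PPL gate in row X-R3 is plain arithmetic: a relative perplexity change against a fixed baseline, compared with a threshold that was registered before the run. A minimal sketch follows, using the two perplexities quoted in the row. The function names are hypothetical, and treating the gate as two-sided (absolute deflection) is an assumption rather than something the ledger states.

```python
def deflection_pct(baseline_ppl, test_ppl):
    """Relative perplexity change, in percent of the baseline."""
    return (test_ppl - baseline_ppl) / baseline_ppl * 100.0

def ppl_gate(baseline_ppl, test_ppl, limit_pct=2.0):
    """Return (deflection, passed); the limit is fixed before the run, not tuned after."""
    d = deflection_pct(baseline_ppl, test_ppl)
    return d, abs(d) < limit_pct

# Numbers from row X-R3: foreign-episode replay, 4.6665 -> 4.7311.
foreign_deflection, foreign_passed = ppl_gate(4.6665, 4.7311)

# Intact replay reproduces the baseline bit-identically, so deflection is zero.
intact_deflection, intact_passed = ppl_gate(4.6665, 4.6665)
```

With n=42 scored positions, a handful of token-level changes can move this percentage noticeably, which is the reason the row's caveat names a multi-chunk run as the hardening step.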

XBAR — the orchestration tier above the closed P3 substrate (C2 + #222 + Ring-3 + organism, closed 2026-06-17)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-C2 | The Memo curator: an autonomous loop drives the closed crossbar — it indexes, addresses, selects, is inert when off, and on metal promotes a matched recall while discarding a corrupted one | G-MEMO-NULL: curator OFF ⇒ Exec decode bit-identical to baseline (PPL 4.6665 == 4.6665, shadow-oracle mismatches=0, empty-registry resolve = NULL). G-MEMO-CUE(discrete r=256): the address is a 256-bit LSH hash matched by XOR+popcount under an integer Hamming radius (TAU_BITS=168) — held-out cues resolve to own id at 177/178 of 256 bits, all 8/8 unrelated queries → NULL ≤140; reduction-order-immune by construction (an r-sweep showed sign-binarize collapses at r=32, bit-gap −1, and recovers at r≥128, +6 → r=256, +19). G-MEMO-LOOP: ACCEPT matched recall (NPOS=42) +0.000% deflection → PROMOTE; REJECT zeroed recall PPL 1876.24 = +40106% → safety-valve FLAG+DISCARD | Gemma-4-12B B1 (06-R10) + E2B, RTX 2060 12 GB; host state machine tools/curator/{build_registry,discrete_resolve,curator_loop}.py composing engine seams (SP_ARM_DUMP cue observer → host r=256 hash → discrete_resolve → SP_REPLAY → SP_G4_SCORE); gemma4_decode_cuda left byte-untouched | G-MEMO-NULL (bit-identical when off) + G-MEMO-CUE(discrete r=256) (selectivity, integer-Hamming) + G-MEMO-LOOP (ACCEPT/REJECT, the deflection safety valve) | registry = 2 real episodes vs synthetic background noise (large-registry stress = named lever); the deflection numbers are single-chunk/deterministic (paper 11's caveat); Ring-3 gist, a learned cue, and dual-candidate recall are out of scope here (Ring-2 verbatim recall only); 1 model / 1 card | measured + gated — citable |
| X-222 | The curator can speculate and undo: a stored episode replayed into the resident cache is load-bearing, and a rejected recall is undone O(1) byte-exact | G-222: gemma4_kv_replay injects a stored episode's owner-K/V into the resident cache and is load-bearing (a zeroed episode's injected slots read back all-zero: 0/36864 E2B, 0/688128 12B); gemma4_kv_rewind(npos) resets the pre-injection floor [0,anchor) byte-identical (layer-diffs=0), pos→anchor, O(1) (full-cache slot==pos shear touches zero cache bytes). G-222-WRAP: SWA-ring replay aliasing live slots 8–15; the KAI-1c journal-backed rewind restores the live window byte-identical (layer-diffs=0) | Gemma-4-12B B1 (48 owners) + E2B (15 owners / 20 sharers, owner-indirection), RTX 2060 12 GB; gemma4_kv_replay / gemma4_kv_rewind in cuda_forward.cu (the persistent twin of the one-shot SP_REPLAY); gemma4_decode_cuda left byte-untouched | G-222 (load-bearing + byte-exact undo, full cache) + G-222-WRAP (journal-backed, SWA-ring) | the O(1) is in the byte-count (proven by byte comparison) — the wall-clock latency slope is the KAIROS-02/03 telemetry (separate, with the unlockable-mem-clock caveat); the persistent-ABI scorer port + the compact-slab global rewind are named follow-ons; 1 model / 1 card | measured + gated — citable |
| X-R3VSA | A parameter-free neocortex: Ring-3 superposes episodes into one bounded store and recalls by content address, zero training, on the existing NTT substrate — retrieve-and-verify (not generate-fill) | G-R3-BIND: superposition M=Σ(addr⊛id) recall@1 = 1.0 to N=32 @ D=1024 (N=2 margins +0.586 / +0.568; recall@5 ≥ 0.90 to N=64); the ±1 substrate carrier ≈ the ideal unitary carrier (metric-bug caught: SNR-ratio → margin/z-score, math never failed). G-R3-LOSS: consolidation loss is a step function — hit (verbatim verify) +0.000% / miss (foreign) +8.04% caught by the 2% gate; budget ≤32 episodes/vector; 71 µs unbind + 1 Optane read. G-R3-DUALROUTE: cue→shortlist→verify→land survives a decoy scan (reject rank-1 +8.04%/O(1)-rewind, accept rank-2 +0.000%) + null parity. G-R3-NIGHTSHIFT: idle consolidation demotes 349.8 MB resident KV → a 16.3 KB Ring-3 index; D=128 gate-driven seal proves the cap is the math, not a constant | Gemma-4-12B B1, RTX 2060 12 GB; Path A VSA/HRR (circular conv = the engine NTT/Z_q algebra), tools/ring3/{g_r3_bind,g_r3_dualroute,g_r3_nightshift}.py + _run_g_r3_loss.bat; addr = the C2 256-bit-signature-seeded carrier, id = a clean ±1 pointer back to the verbatim Ring-2 episode (recall-time gist upsampling FORBIDDEN — §4 trap); gemma4_decode_cuda byte-untouched | G-R3-BIND (capacity) + G-R3-LOSS (the irreversible consolidation gate, pre-registered <2%) + G-R3-DUALROUTE (the two-stage loop) + G-R3-NIGHTSHIFT (the idle GC, pre-eviction-gated) | the VSA retrieve is host-numpy — the Z_q/NTT engine port (exact integer) is the named deployment follow-on; Path B (the trained adapter) stays budget-gated and untouched; the provenance tag (G-R3-PROV) is deferred; deflection numbers carry paper 11's single-chunk caveat; 1 model / 1 card | measured + gated — citable |
| X-ORG | The organism breathes: real audio becomes a functional episodic memory — heard on physical silicon, written as a curator-indexable, replayable Ring-2 episode | G-XBAR-ORGANISM (step 1): a real audio packet (7 projector frames) injected via the KAI-3 gemma4_kv_inject_seq path → conditioned cache npos=114 → serialized as ep_audio in the canonical uniform-512 episode format (ep.k = 48 × 114 × 512 × 4 = 11,206,656 B, matching the text-episode layout; a jagged-2048 first cut was caught and clamped). Signature: audio self 211/256, vs text episodes 118–131/256 → margin +79; text episodes' own margins unchanged (a noisier latent stays cleanly addressable). Round-trip: replay injects clean (RT_EXIT=0, checks=5, fails=0); +1989% deflection is foreign-by-design (audio episode vs a wikitext score context — the reject signal the verify gate needs; ~0% is matched-context only) | Gemma-4-12B B1, RTX 2060 12 GB; the EAR front-end on a physical Intel GNA 2.0 (KAIROS-04: 0.877 token-recovery == emu == FP32, 7/8 pivot); write seam run_kai3_write (SP_G4_KAI3_WRITE) + the paper-11/13 replay seam; receipts tests/fixtures/xbar_organism/ | G-XBAR-ORGANISM (write seam + 256-bit signature separation + structural round-trip) | step 1, not the full loop — write+sig+round-trip are closed; matched-context audio recall + the full audio-cue→shortlist→verify→land loop are open; the +1989% is the foreign-context reject signal, not an audio-recall quality claim; the sig pipeline hashes a period-8 layer subset while the 12B's true SWA period is 6 (separation robust, all gates stand — period-6 re-base is a noted tidy-up); 1 model / 1 card / 1 GNA part | measured + gated — citable (step 1) |

Provenance: lattice papers/CONTRACT-XBAR-C2-memo-curator-loop.md (X-C2 run-records G-MEMO-NULL / G-MEMO-CUE / G-MEMO-LOOP; X-222 run-records G-222 / G-222-WRAP) + papers/CONTRACT-XBAR-R3-consolidation.md (X-R3VSA run-records G-R3-BIND / -LOSS / -DUALROUTE / -NIGHTSHIFT) + papers/CONTRACT-KAIROS-K0-K1.md §7.4–§7.6 (X-ORG EAR/GNA line); engine tools/curator/*.py + tools/ring3/*.py + gemma4_kv_replay / gemma4_kv_rewind / run_kai3_write in cuda_forward.cu; harnesses _run_memo_null.bat / _run_memo_loop.bat / _run_g222.bat / _run_g222_wrap.bat / _run_g_r3_loss.bat / _run_organism_write.bat / _run_organism_rt.bat; receipts tests/fixtures/xbar_c2/ + tests/fixtures/xbar_r3/ + tests/fixtures/xbar_organism/; commits engine 31b1de1/6dd87b9/3ea0587/627dfad (X-C2), b4b037a/24071bc (X-222), 23539b7/aae3131/69638cf/a64a916 (X-R3VSA), 6600cf4 (X-ORG).
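The G-MEMO-CUE address match in row X-C2 is integer-only: XOR two 256-bit signatures, popcount the result, and accept the best candidate only if it clears TAU_BITS. The sketch below assumes TAU_BITS=168 is a minimum count of matching bits, which is consistent with the quoted 177/178 accepts and the ≤140 rejects, although the row calls it a Hamming radius. It is not `tools/curator/discrete_resolve.py`; the signatures and helper names are invented for the example.

```python
SIG_BITS = 256
TAU_BITS = 168   # threshold from row X-C2, read here as "minimum matching bits"

def matching_bits(a, b):
    """Number of agreeing bit positions: 256 minus popcount(a XOR b)."""
    return SIG_BITS - bin(a ^ b).count("1")

def resolve(query_sig, registry):
    """Return the best-matching episode id, or None when nothing clears TAU_BITS."""
    best_id, best_score = None, -1
    for episode_id, sig in registry.items():
        score = matching_bits(query_sig, sig)
        if score > best_score:
            best_id, best_score = episode_id, score
    return best_id if best_score >= TAU_BITS else None

def flip_low_bits(sig, k):
    """Hypothetical noise model: invert the k lowest bits of a signature."""
    return sig ^ ((1 << k) - 1)

# Two made-up 256-bit episode signatures that disagree on exactly 128 bits.
registry = {
    "ep_a": int("a5" * 32, 16),
    "ep_b": int("3c" * 32, 16),
}

near_cue = flip_low_bits(registry["ep_a"], 78)    # 178 of 256 bits still match ep_a
far_cue = flip_low_bits(registry["ep_a"], 116)    # only 140 bits match ep_a
```

Since every step is an exact integer operation, the accept or reject decision cannot change with summation order or floating-point rounding, which is the property the row describes as reduction-order-immune by construction.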

XBAR — the discrete container: the crossbar re-carried onto the exact-integer O_K substrate (papers 16–18, closed 2026-06-18)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-OK-BIND | The Unification: the Ring-3 holographic bind, which had been computed in host floating-point, is re-carried onto the engine-native exact-integer dual-prime negacyclic CRT-NTT — bit-identical to the native arithmetic and reduction-order-immune | Leg A (G-R3-BIND-on-OK): C-PARITY 256/256 bit-identical (numpy-int == native sp_pr_mul / ntt_forward∘pointwise∘inverse / sp_pr_inner / sp_pr_score_kstore); margin parity (±1 integer carrier recall@1=1.0 to N=16, recall@5=1.0 to N=32 @ deg-512, == float); reduction-order immunity — integer M byte-identical across all 8 summation permutations vs float drift 4.44e-15. Wiring (G-R3-ORGANISM-NATIVE): ok_bind routes bind/unbind through native sp_pr_mul (D=1024 = direct sum of two 512-blocks); CAP=32 not regressed; dualroute + nightshift GREEN native (349.8 MB resident demoted / 16.3 KB index). Leg B (honest negative, G-R3-BIND-on-OK-legB): Kronecker χ_d carriers on the Heegner ladder lower coherence (random 0.0355 > d=-67 0.0153 > d=-163 0.0086, Weil bound) but are operationally inert — recall worse (spiky periodic spectrum), SimHash Hamming ~128 unchanged | Gemma-4-12B B1 (06-R10) recall/nightshift, CPU libsp.so (built from core/ntt_crt+core/poly_ring) for parity, RTX 2060 12 GB; frozen primes q1=1073738753 q2=1073732609 M=1152908312643096577, R_q=Z_q[x]/(x^N+1) N∈{128,256,512}, Q(√-163) class number 1; tools/ring3/{g_r3_bind_ok,ok_bind,g_r3_bind_ok_leg_b}.py; gemma4_decode_cuda left byte-untouched | G-R3-BIND-on-OK (256/256 bit-identical + margin parity + order-immunity) + G-R3-ORGANISM-NATIVE (live loop on native sp_pr_mul, CAP unregressed) + G-R3-BIND-on-OK-legB (the inert carrier negative) | bit-identical is scoped to the bind algebra (cleanup glue still host code); D=1024 is a direct sum of two N=512 blocks (a native N=1024 ring is a separate build); Leg B is a negative, kept because the structure is real and the inertness is the point; 1 model / 1 card | measured + gated — citable |
| X-OK-FROB | The Frobenius integer episode store, and the boundary of the algebra: the Ring-2 episode store carried onto the integer O_K substrate (sub-ULP at 24 bits), and the line beyond which the algebra is measured-inert | Codec (G-R2-FROB): rank-2 O_K lattice x = a·s_a + b·s_b (real scales) — a16 (16b relL2 3e-5, 2.0× store) / a8b4 (12b 6e-4, 2.86×) / a16b8 (24b relL2 1.2e-7 SUB-ULP, 18% byte-exact, 0.76× store) / a16b16 (32b 8e-11, 98.9% byte-exact); T4 π^k scale free, replays clean. Honest scoping (load-bearing): the n=42 single-chunk PPL gate is blind below ~1% — float-exact == baseline 4.6665, but frob variants jitter −2.27%…+3.37% non-monotonically in fidelity (tie-flip noise) ⇒ losslessness is established by reconstruction fidelity, NOT a fake +0.000%. Boundary negatives: entropy-coding the codes 1.02× (b8 residual incompressible) — G-R2-FROB-ENTROPY; Möbius over the superposition sheds memories (dense M 99.6% nonzero; square-free density 60.94% = 6/π² real, but divisor-recon err 1.35× signal; masking recall 1.000→0.969 @N=32, mean 0.988) — G-R3-MOBIUS; T2 on real gemma-4-12B embed_tokens (V=262144 E=3840 bf16) recon cos 0.032 ≈ random 0.039 (Möbius 43.6% energy non-square-free vs claimed ~0%; T2 was a design proposal, never validated — unlike T4) — G-T2-WEIGHTS | Gemma-4-12B B1 (06-R10), RTX 2060 12 GB, codec parity CPU; tools/curator/frob_episode.py + tools/ring3/{g_r3_mobius_probe,g_t2_weights_probe}.py; T4 π^k is paper-03's validated lever; O_K=Z[ω] ω²=ω−41; gemma4_decode_cuda byte-untouched | G-R2-FROB (codec fidelity ladder + the PPL-is-blind discipline) + G-R2-FROB-ENTROPY + G-R3-MOBIUS + G-T2-WEIGHTS (the three boundary negatives) | losslessness is reconstruction fidelity, not a PPL pass (the gate is blind below ~1% — no +0.000% claimed for the codec); the negatives are negatives, kept on the record; T2 never passed a gate (only T4 did); the T2-weights probe is one tensor of one model; 1 model / 1 card. Boundary thesis: O_K wins on exact arithmetic (the container), never on structuring high-entropy content | measured + gated — citable |
| X-OK-ORG | The organism breathes (full loop): a cross-modal continuous→discrete→continuous loop on the integer substrate — real audio sensed on silicon → discrete signature → integer superposition → cross-modal verify → integer codec → live 12B cache | G-XBAR-ORGANISM-FULL: real episodes ep_audio (EAR, P=114, physical Intel GNA 2.0) + ep_wiki (P=294) + ep_toy (P=56). C2 256-bit signature: audio self 256/256, vs wiki 147, vs toy 129 (decoys ≪ TAU_BITS=168). Native integer Ring-3 bind (X-OK-BIND) shadow-gate recall@1 all. Dual-route: audio cue → ep_audio top-1 (cos +0.47). C2 integer-Hamming verify: accept audio (256 ≥ 168), reject text decoys — the cross-modal verify. Land: integer Frobenius a16b8 decode (X-OK-FROB) K/V relative-L2 ~9e-8. Inject: metal SP_REPLAY (X-222) of the integer-decoded ep_audio into the resident 12B cache checks=5, fails=0, LAND_EXIT=0 (PPL 88.89 foreign-by-design — the cross-context reject signal, not an audio-recall quality claim). G-PERIOD6-REBASE: content hash rebased period-8 → true gemma-4 global period-6 {5,11,17,23,29,35,41,47} (L_i % 6 == 5); re-gated GREEN, separation cleaner (decoy 154→129) — the paper-15 caveat retired | Gemma-4-12B B1 (06-R10), RTX 2060 12 GB; EAR front-end on a physical Intel GNA 2.0 (KAIROS-04: 0.877 == emu == FP32, 7/8 pivot); tools/ring3/g_xbar_organism_full.py (native bind X-OK-BIND + Frobenius codec X-OK-FROB + metal SP_REPLAY X-222); receipts tests/fixtures/xbar_organism/; gemma4_decode_cuda byte-untouched | G-XBAR-ORGANISM-FULL (the full audio-cue→retrieve→cross-modal-verify→integer-land→inject loop) + G-PERIOD6-REBASE (the true global-period content hash) | the full loop is closed, but matched-context audio recall is NOT claimed — the 88.89 is the cross-context reject signal (audio episode over a wikitext score context), ~0% is matched-context only; one audio episode vs two text decoys (a larger cross-modal registry is a named lever); the EAR/GNA numbers are paper-09's (KAIROS-04), cited as upstream sense; 1 model / 1 card / 1 GNA part | measured + gated — citable |

Provenance: lattice papers/CONTRACT-XBAR-R3-consolidation.md (X-OK-BIND run-records G-R3-BIND-on-OK / G-R3-ORGANISM-NATIVE / G-R3-BIND-on-OK-legB; X-OK-FROB run-records G-R2-FROB / -ENTROPY / G-R3-MOBIUS / G-T2-WEIGHTS) + papers/CONTRACT-KAIROS-K0-K1.md §7.4–§7.6 (X-OK-ORG EAR/GNA line); engine tools/ring3/{g_r3_bind_ok,ok_bind,g_r3_bind_ok_leg_b,g_r3_mobius_probe,g_t2_weights_probe,g_xbar_organism_full}.py + tools/curator/frob_episode.py + libsp.so from core/ntt_crt+core/poly_ring; receipts tests/fixtures/xbar_r3/{G-R3-BIND-on-OK,G-R3-ORGANISM-NATIVE,G-R3-BIND-on-OK-legB,G-R2-FROB-PARITY,G-R2-FROB-AB,G-R2-FROB-ENTROPY,G-R3-MOBIUS,G-T2-WEIGHTS}.log + tests/fixtures/xbar_organism/{G-XBAR-ORGANISM-FULL,G-PERIOD6-REBASE}.log; commits engine 0019b86 (Leg A) / 1f0f6be (native wiring) / d7d96fe (Leg B) (X-OK-BIND), dbe4103/d076797 (codec) / e6d17bb (entropy) / 1e70763 (Möbius) / ac76c8e (T2-weights) (X-OK-FROB), 15e7051 (full loop) / d2d7ceb (period-6 rebase) (X-OK-ORG).
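Two ideas from row X-OK-BIND can be shown at toy scale: a negacyclic product in Z_q[x]/(x^N+1) carried under both frozen primes and recombined with Garner's formula, and the reason integer superposition is reduction-order-immune while float summation is not. The sketch uses the primes quoted in the row but a schoolbook O(N²) product at N=8 instead of an NTT. All function names are hypothetical and none of this is the `libsp.so` code.

```python
from itertools import permutations

Q1 = 1073738753
Q2 = 1073732609
M = Q1 * Q2                      # 1152908312643096577, below 2**60

def negacyclic_mul(a, b, q=None):
    """Schoolbook product in Z[x]/(x^N + 1); reduced mod q when q is given."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj      # x^N wraps around as -1
    return [c % q for c in out] if q else out

def garner(r1, r2):
    """Recombine residues mod Q1 and Q2 into the unique value in [0, M)."""
    inv = pow(Q1, -1, Q2)                  # modular inverse, Python 3.8+
    return r1 + Q1 * (((r2 - r1) * inv) % Q2)

def centered(x):
    """Lift from [0, M) to the signed representative around zero."""
    return x - M if x > M // 2 else x

def bind(addr, ident):
    """Dual-prime negacyclic bind with an exact signed-integer result."""
    c1 = negacyclic_mul(addr, ident, Q1)
    c2 = negacyclic_mul(addr, ident, Q2)
    return [centered(garner(x, y)) for x, y in zip(c1, c2)]

def left_to_right(values):
    """Plain sequential accumulation (avoids any compensated summation)."""
    acc = values[0]
    for v in values[1:]:
        acc = acc + v
    return acc

addr = [1, -1, 1, 1, -1, -1, 1, -1]        # made-up ±1 carrier
ident = [1, 1, -1, 1, -1, 1, 1, -1]        # made-up ±1 pointer
bound = bind(addr, ident)

int_sums = {left_to_right(list(p)) for p in permutations(bound[:3])}
float_sums = {left_to_right(list(p)) for p in permutations([0.1, 0.2, 0.3])}
```

Here `int_sums` collapses to a single value regardless of order, while `float_sums` holds more than one, which is the drift the row quantifies at 4.44e-15 for its own data.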

Byte-exact — the whole forward carried onto the exact-integer substrate (papers 19–21, closed 2026-06-18)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-BX-ISLANDS | Killing the float islands: the four nonlinear fp32 operations of the gemma-4-12B forward — RMSNorm, softmax, GELU, RoPE — converted to deterministic exact-integer fixed-point functions, with no libm and no __int128; the whole forward then reproduces the bf16 gold byte-identically when off and is run-to-run bit-identical when on | G-ISLANDS-Q-REF (host x86, sp_islands_q_ref.rs, no GPU): RMS 5.8e-6 / softmax 1.3e-6 / GELU 2.8e-6 / RoPE 9.2e-6 vs float, all reduction-order-immune / deterministic (RoPE via a fixed-point CORDIC, no sin/cos; RMS reciprocal-root via a 64-bit isqrt split; exp via an integer 2^x poly). G-BYTEEXACT-ISLANDS-CUDA (real 12B activations, layer 24): RMS 3.8e-5 / GELU 8.2e-7 / RoPE 9.6e-6 vs the integer refs. G-BYTEEXACT-FORWARD-12B: the four islands + attention (X-BX-WIRE) as exact-integer device kernels behind a default-off SP_BYTEEXACT; OFF = PPL 4.6665 == bf16-gold baseline, byte-identical (the null floor); ON = 4.6569 (parity, −0.21% at n=42); the ON run is run-to-run bit-identical (the on-machine cross-machine-determinism proxy) | Gemma-4-12B B1 (06-R10) gemma4-12b-b1.sp-model, RTX 2060 12 GB (sm_75), host reference x86; frozen dual primes q1=1073738753 q2=1073732609 M=q1·q2=1152908312643096577 (< 2^60, fits a u64, no __int128; Garner inverse 894602413; device __umul64hi for wide products); gemma4_decode_cuda (the one-shot decode) left byte-untouched = null floor | G-ISLANDS-Q-REF (host island fidelity + order-immunity) + G-BYTEEXACT-ISLANDS-CUDA (on-model island fidelity) + G-BYTEEXACT-FORWARD-12B (OFF byte-identical to gold / ON parity / ON run-to-run bit-identical) | the only remaining gap is external — a literal bit-identical logit comparison across two physical GPUs (needs a second machine); the n=42 PPL parity carries the small-N caveat (the OFF byte-identity and ON run-to-run identity are exact; the −0.21% ON deflection is not a quality claim); byte-exact buys auditability, not speed or size; 1 model / 1 card | measured + gated — citable (2-GPU check external) |
| X-BX-WIRE | One substrate, every backend: a universal Rust crate is both the L2 orchestrator and the scalar bit-exact reference the backends gate to; attention's inner products become an exact-integer dual-prime negacyclic-convolution dot; and the resident daemon drives the 12B over the L1 C ABI, including a new O(1) token-by-token decode verb | Reference: tools/sp_dsp_smoke owns the byte-exact linear algebra — dual-prime Barrett, mod-q matmul, Garner CRT, NTT ladder — each bit-exact-gated (T_GARNER_BIT_EXACT &c.); C / CUDA / HVX backends correct iff they gate to it. Attention (G-BYTEEXACT-ATTN-{NTT,FULL}): float ⟨q,k⟩ / p·V; ⟨q,k⟩ = coeff_{N-1}(Q(x)·K(x̂))/Δ² (a single convolution coefficient = the plain dot, mod q1/q2 + Garner; on the 12B = k_attn_decode_win_bx); dot == exact integer at every Δ, and the p·V accumulator stays ~2^46 ≪ M at window W up to 16384, so dual-prime is sufficient, no third prime. Wire: prefill via sp_session_register_forward_backend (G-WIRE-CUDA-GEMMA4); decode via the new L1 verb sp_session_register_kvdecode_backend + additive gemma4_kv_decode_logits; G-WIRE-CUDA-DECODE-GEMMA4 GREEN: 32/32 tokens bit-identical to the null-floor oracle, VRAM flat (O(1) resident cache) | Gemma-4-12B B1 (06-R10), RTX 2060 12 GB; reference tools/sp_dsp_smoke (Rust); attention conv + the kvdecode verb in the engine CUDA forward / daemon; gemma4_decode_cuda left byte-untouched | G-BYTEEXACT-ATTN-NTT (the convolution dot) + G-BYTEEXACT-ATTN-FULL (the p·V accumulator bound ⇒ no third prime) + G-WIRE-CUDA-GEMMA4 (daemon prefill) + G-WIRE-CUDA-DECODE-GEMMA4 (daemon decode, 32/32 bit-identical, O(1) VRAM) | the decode bit-identity is 32/32 vs the oracle on one host (the cross-machine check is X-BX-ISLANDS's open external step); "O(1) VRAM" is the KV-cache term (the harness still carries the resident model, as in X-R2); "no third prime" is measured at W ≤ 16384; the byte-exact linear algebra already existed in the crate, bit-exact-gated (the offline re-derivation was the campaign's one 
wasted motion — kept on the record); 1 model / 1 card measured + gated — citable
X-BX-BOUNDARY Byte-exact, not compression: the de-conflation, the convicted compression levers, the boundary thesis on the forward, and a re-derivation kept on the record (the reflective / honest-record paper — no new performance number) De-conflation: byte-exact buys auditability (exact arithmetic, reduction-order immunity, cross-machine determinism), explicitly not speed or size. Compression convicted (G-WEIGHT-{TRANSFORMS,FOLD-ORACLE}): incoherence rotation ~1.37× @ int4 / column reorder ~1.05×, both redundant vs the per-32-block OK_Q4B (already gold PPL 4.6665 ≈ 4.68); the 3-bit unlock is a separate axis (QAT/codebook/mixed-precision), out of scope. Boundary thesis (O_K = the container, never the content): four measured-inert content-side levers — split-prime O_K Dirichlet carriers (X-OK-BIND Leg B), Möbius-on-M (sheds memories 1.000→0.969, X-OK-FROB), entropy-on-Frobenius-codes (1.02×, X-OK-FROB), and T2 on real gemma-4-12B embed_tokens (recon cos 0.032 ≈ random 0.039; T2 was a design proposal, never validated, unlike T4). Re-derivation kept on the record: the byte-exact linear algebra was already in the bounded crate, bit-exact-gated (T_GARNER_BIT_EXACT &c.) — re-deriving it offline was the one wasted motion; the genuinely new work was the four islands (X-BX-ISLANDS) + the attention conv & kvdecode verb (X-BX-WIRE) Gemma-4-12B B1 (06-R10), RTX 2060 12 GB; G-WEIGHT-* weight-transform probes + papers 16–17's boundary probes; gemma4_decode_cuda byte-untouched G-WEIGHT-TRANSFORMS + G-WEIGHT-FOLD-ORACLE (compression convicted) + the X-OK-BIND/X-OK-FROB content-side negatives (the boundary) no new performance number — the reflective record; the negatives are negatives, kept attached (T2 never passed a gate); the compression conviction is scoped to this artifact (OK_Q4B already at gold PPL); 1 model / 1 card the honest-record paper — citable

Provenance: lattice papers/CONTRACT-BYTEEXACT-forward.md (§3–§5 the four islands + the build, §5.1–§5.2 the L1-ABI wire, §0–§1 the de-conflation); the universal reference crate engine tools/sp_dsp_smoke (sp_islands_q_ref.rs; T_GARNER_BIT_EXACT &c.); engine src/backends/cuda/cuda_forward.cu (SP_BYTEEXACT island + attention kernels, k_attn_decode_win_bx, gemma4_kv_decode_logits) + the resident daemon (sp_session_register_{forward,kvdecode}_backend); receipts tests/fixtures/xbar_r3/{G-ISLANDS-Q-REF,G-BYTEEXACT-ATTN-NTT,G-BYTEEXACT-ATTN-FULL,G-BYTEEXACT-ISLANDS-CUDA,G-WIRE-CUDA-GEMMA4,G-WIRE-CUDA-DECODE-GEMMA4,G-BYTEEXACT-FORWARD-12B,G-WEIGHT-TRANSFORMS,G-WEIGHT-FOLD-ORACLE}.log; commits engine 69c0588 + math-core submodule d9d96f3 + lattice contract 2751407 (CONTRACT-BYTEEXACT §5.1/§5.2/§8). The OK_Q4B per-32-block quantization at gold PPL is paper 06's 06-R10.
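
The X-BX-WIRE row states that one negacyclic-convolution coefficient equals the plain dot product, evaluated mod q1 and mod q2 and recombined by Garner. A minimal Python sketch of that identity follows. It is an illustration under stated assumptions, not the crate's code: only `Q1`, `Q2` and `M` are taken from the ledger, the function names are hypothetical, the naive O(N²) convolution stands in for the NTT ladder, the Δ fixed-point scaling is omitted, and the modular inverse is recomputed with `pow` rather than using the ledger's frozen Garner constant.

```python
# Hedged sketch: function names are hypothetical; only Q1, Q2, M come from the ledger.
Q1 = 1073738753
Q2 = 1073732609
M = Q1 * Q2  # 1152908312643096577, below 2**60, so it fits an unsigned 64-bit word


def negacyclic_coeff_last(q, k_rev, mod):
    """Coefficient N-1 of q(x) * k_rev(x) in Z_mod[x] / (x^N + 1), computed naively."""
    n = len(q)
    acc = 0
    for i in range(n):
        for j in range(n):
            if (i + j) % n != n - 1:
                continue
            term = (q[i] % mod) * (k_rev[j] % mod)
            # A term that wraps past x^N would pick up a minus sign (x^N = -1).
            # For coefficient N-1 no pair (i, j) wraps, so the result is the plain dot.
            acc += -term if i + j >= n else term
    return acc % mod


def garner(r1, r2):
    """Combine residues mod Q1 and mod Q2 into the unique value in [0, M)."""
    inv_q1 = pow(Q1, -1, Q2)  # modular inverse of Q1 mod Q2 (Python 3.8+)
    t = ((r2 - r1) * inv_q1) % Q2
    return r1 + Q1 * t


def exact_dot(q, k):
    """Exact integer dot product of q and k via two residues and a centered lift."""
    k_rev = k[::-1]  # reversing k moves the dot product onto coefficient N-1
    r1 = negacyclic_coeff_last(q, k_rev, Q1)
    r2 = negacyclic_coeff_last(q, k_rev, Q2)
    x = garner(r1, r2)
    return x - M if x > M // 2 else x  # centered lift recovers negative sums


print(exact_dot([3, -1, 4, 1], [2, 7, -1, 8]))  # 6 - 7 - 4 + 8 = 3
```

The centered lift is only valid while the true integer result stays well inside (−M/2, M/2), which is the role the ledger's accumulator bound (~2^46 ≪ M) plays.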

Not claimed (yet) — kept out of every front door

  • The transformer is a CM-elliptic-curve endomorphism sequence; training is BSD analytic-rank maximization. Real research program; no explicit curve, no model trained this way. Companion only.
  • Anything on models larger than the references, multi-model generality, or independent reproduction. Until those exist, the phrase is "proof-of-mechanism."

KAIROS — the resident kernel (KAI-1 + KAI-1b + KAI-1c, closed 2026-06-14; ≥24 h soak in-flight)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| KAIROS-01 | A 12B agent runs as a resident background daemon: mathematically silent and O(Δ)-flat until a high-salience event, remaining stable after execution | 24-tick crucible PERFECT: 21/21 idle → NO_OP (KV prefix flat), 3/3 salient → coherent contextual ACTION (start / clean / renew for build-finished / disk-95% / ttl-expiring), 0 false-action, 0 missed, 0 malformed; every post-action idle tick reverts to NO_OP with zero drift — the exact condition that collapses a 0.6B into a deterministic corruption attractor (NO_克作) | gemma-4-12B B1 artifact (06-R10 / X-R2), RTX 2060 12 GB, gemma4_decode_cuda backend-direct, ~8–17 s/tick, 10.8 GB resident; cold-evict = prefix-grow on the one-shot decoder (NO_OP ⇒ prefix unchanged ⇒ next idle byte-identical to the first ⇒ O(Δ); ACTION ⇒ prefix grows = the post-action crucible); SALIENCE≥0.5 policy + gemma `<start_of_turn>` template, runtime-encoded via the parity-validated .sp-tokenizer (T_G4_TOK_PARITY 5432/5432) | G-KAIROS-1 discipline (0 false-action / 0 missed) + the tick-5 post-action reversion crucible + a 0.6B negative control (same harness, same tape → collapses → isolates capacity, not plumbing) | proof-of-mechanism on a 24-event scripted tape (not live sensors — that is KAI-4); the ≥24 h unattended soak is a pending operational run; 1 model / 1 card | measured + gated — CLOSED |
| KAIROS-02 | Cold-evict happens at the metal: a resident daemon shears its KV cache by an O(1) memory-coordinate operation, byte-exact — decoupling idle compute from action history | G-1b-REWIND-NULL: after an idle tick + rewind(Δ), the KV is byte-identical to never-visited across all 48 owner layers (16.5 MB, diffs=0); the re-run reproduces identical tokens. Idle-tick latency vs retained-action count A: prefix-grow (host re-ingest) 0.924 s/action vs metal 0.0073 s/action (127× shallower) — grow/metal 3.08× → 16.70× @ A=16 | gemma-4-12B B1 (06-R10 / X-R2), RTX 2060 12 GB, persistent-KV gemma4_kv_* on gemma4_decode_cuda (left byte-untouched = null floor); rewind(Δ) = logical decode-position decrement; full cache (slot==pos ⇒ sheared slots never read, overwritten on next append ⇒ perfect inverse); clocks pinned | G-1b-REWIND-NULL (D2H memcmp all owners, diffs=0) + EQUIV gen-reproduce + the O(actions)→O(1) latency sweep | full-cache rewind (SWA via windowed attention); the idle tick still carries the O(context) attention-read term — the O(actions) elimination is in the step count (metal constant; grow ∝ A); SWA-ring wrap-aware rewind + the daemon-resident loop are follow-ons; 1 model / 1 card | measured + gated — CLOSED |
| KAIROS-03 | The O(1)-time rewind (KAIROS-02) now runs on the O(1)-space SWA ring, uniting both axes, and the full semantic daemon loop runs on that journaled ring — the crossbar substrate (space ⊗ time ⊗ cognition) as one resident system | G-1b-WRAP-NULL: a forced wrap-crossing idle tick clobbers live-window slots in all 40 SWA-owner layers (non-vacuous), yet after rewind the ring is byte-identical (diffs=0) + reproduces identical tokens — the undo-journal is a perfect inverse across the wrap. Journaled-ring O(1): ring slope 0.00365 s/action ≈ full-cache 0.00371 (the journal adds no asymptotic cost). run_kairos_metal semantic crucible: 24-tick tape 0 false / 0 missed / 0 malformed / 0 pos-violations — commit-on-action / rewind-on-idle on the live journaled ring, 3 salient → coherent ACTIONs, every post-action idle tick reverts to NO_OP | gemma-4-12B B1 (06-R10 / X-R2), RTX 2060 12 GB, persistent-KV gemma4_kv_* with a per-tick undo-journal on SWA owners (SP_G4_KV_RING_W; save-before-overwrite, restore-in-reverse on rewind, cleared by commit; bound min(k,W)/owner/tick = constant ⇒ O(1) in both axes); globals stay full-cache; gemma4_decode_cuda left byte-untouched (null floor); core clock pinned (the 2060 cannot pin its memory clock) | G-1b-WRAP-NULL (D2H byte-identity across a forced wrap, 40/40 owners clobbered) + journaled-ring O(1) telemetry (within-leg slope, drift-robust) + the run_kairos_metal crucible (semantic + pos-discipline gate) | the fine per-tick journal D2D tax is below the 2060's wall-clock floor (memory clock unpinnable ⇒ ±~12% jitter swamps the ~1-3% tax — exact figure pending cudaEvent timing); wrap-correctness and semantic-correctness proven on orthogonal axes (forced-wrap in isolation; the crucible on the faithful W=1024 window); the ≥24 h endurance soak is IN-FLIGHT (no verdict); 1 model / 1 card | measured + gated — CLOSED (mechanism); endurance soak in-flight |
| KAIROS-04 | The model gains a real-world audio sense, physically realized on dedicated silicon: real speech is encoded, lowered to a fixed-point neural front-end, and run on an Intel GNA 2.0 accelerator at no quality loss, and its output pivots the resident 12B | Real speech → 12B pivot 7/8 — real TTS speech → log-mel → a GNA-conservative Conv1d encoder + CTC head → the proven KV-inject seam → the resident 12B replies with the right NO_OP/ACTION on 7 of 8 events (all 4 idle events correct; 3 of 4 salient events correct + coherent; the 1 miss is a conservative ACTION→NO_OP). Held-out CTC token recovery climbed 0.44 → 0.868 under a multi-voice training set (924 samples, 2 voices). Front-end on physical GNA 2.0: the FP32 model is bit-exact through ONNX→OpenVINO-IR (0.877 == torch); a naive int16 quantization shears it to 0.667, but a calibrated GNA-native int16 PTQ fully recovers 0.877, and running it on the physical Intel GNA 2.0 accelerator scores 0.877 == software-emulation == FP32 | Gemma-4-12B B1 (resident), RTX 2060 for the model; the audio front-end is a small Conv1d+CTC encoder lowered to GNA 2.0 (Intel "Beast Canyon" NUC, i9-11900KB, driver gna_03.05.00.2116, BIOS-enabled) via OpenVINO 2023.3 + POT int16 PTQ; GNA conv constraints handled (no padding → VALID; output-channels padded to a multiple of 4); native-Windows runtime (WSL2 has no GNA hardware passthrough) | the 7/8 event gate on real speech; an FP32-vs-i16-vs-GNA_HW token-recovery ladder showing the calibrated quant and the physical silicon both at 0.877 (== FP32) — i.e. the lowering is lossless end-to-end | the 1/8 miss is a conservative ACTION→NO_OP (a quiet error, not a false alarm); proof-of-mechanism on a scripted event set with one model / one card / one GNA part; the audio sense is a separate-but-related sibling of the latent-memory crossbar, sharing the same KV-inject seam | measured + gated — citable |

Provenance: lattice papers/CONTRACT-KAIROS-K0-K1.md §4 + §5.5-5.8 + §7.4–§7.6 (closures); engine tests/test_gemma4_cuda.c (SP_G4_KAIROS / SP_G4_KAIROS_METAL / SP_G4_KV_REWIND / SP_G4_KV_WRAP / SP_G4_KV_TELEMETRY / SP_G4_KV_RING_TEL / SP_G4_KAI3) + src/backends/cuda/cuda_forward.cu (the gemma4_kv_* ABI + undo-journal + gemma4_kv_inject_seq) + tools/sp_daemon/src/kairos*.rs (Path A control plane) + tools/audio_port/{ov_gna_score,ov_score_ir,pot_gna_quantize}.py + run_gna_hw.bat + GNA_HW_BRINGUP.md (KAIROS-04 GNA EAR); commits KAI-1b e06e3ae / 0bb94f1, KAI-1c d90945f / f201bf3 / d0a6717 / b0d2bf6; receipts results/kairos_12b_pathB_crucible.log, kai1b_rewind_null_gate.log, kai1b_oactions_to_o1_telemetry.log, kai1c_wrap_null_gate.log, kai1c_ring_telemetry.log, kai1c_kairos_metal.log, _xbar/p2b/kai3/G-KAIROS-3-{AUDIO_7of8,GNA-i16_quant_gate,GNA-HW}.log.
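
The KAIROS-03 row describes the undo-journal as save-before-overwrite, restore-in-reverse on rewind, and cleared by commit. The sketch below shows that discipline on a plain Python list standing in for one SWA ring. It is a minimal illustration, not the engine's CUDA code: the class and method names are hypothetical, and it journals every append instead of enforcing the ledger's min(k,W) per-tick bound.

```python
class JournaledRing:
    """Toy ring buffer with a per-tick undo journal (hypothetical names throughout)."""

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window
        self.pos = 0        # logical decode position
        self.journal = []   # (slot index, previous value) for the currently open tick

    def append(self, value):
        idx = self.pos % self.window
        self.journal.append((idx, self.slots[idx]))  # save before overwrite
        self.slots[idx] = value
        self.pos += 1

    def commit(self):
        """Keep the tick's writes (the ACTION case): drop the undo records."""
        self.journal.clear()

    def rewind(self):
        """Undo the tick's writes (the idle case): restore in reverse order."""
        while self.journal:
            idx, old = self.journal.pop()
            self.slots[idx] = old
            self.pos -= 1


ring = JournaledRing(window=4)
for v in range(4):
    ring.append(v)
ring.commit()

snapshot = list(ring.slots)
for v in (10, 11, 12, 13, 14, 15):  # six writes cross the wrap and clobber live slots
    ring.append(v)
ring.rewind()
print(ring.slots == snapshot, ring.pos)  # True 4
```

Restoring in reverse order matters once a tick writes the same slot twice: the oldest saved value is applied last, so the slot ends at its pre-tick contents.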

B3 — the autonomous librarian (X-B3-*, papers 22–24, 2026-06-19/20)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-B3-NEGATIVES | Hand-designed relevance cannot pick the dependent episode open-world | 9 signals refuted (6 verifiers + 4 Disposer + cosine-q·K + ΔLL-polarity); best hand-design 2/3 rank, 1/3 consensus @N=3; the two "pristine" fixes (cosine, argmin-ΔLL) made it WORSE | gemma-4-12B B1, RTX 2060; post-inject verify-and-rewind on the resident chat decode; signals measured, none shipped | the full Disposer signal sweep + N-sweep + dual-fix, each diagnosed (per-episode bias / super-attractor / magnitude-not-direction) | small-N (N≤6) curated; 1 model / 1 card; negatives are negatives, kept attached | measured — honest negative (paper 22) |
| X-B3-ABLATION | A teacher-forced ablation knockout makes episodic dependency measurable on novel needles | novel collapse −33.56 vs parametric −0.15 (~220×); 3-archetype matrix −33.56 / −18.58 / −16.10 vs control band [−0.15,+1.45]; TAU pinned −8.0 | teacher-force the known secret, cudaMemset-ablate exactly its source KV rows, score ΣΔLL with/without, O(1) rewind | the probe is both admission oracle AND perfect labeler | parametric facts are unmeasurable ("parametric steel") — requires novel needles; offline (needs the answer key); 1 model / 1 card | measured + gated — CLOSED (paper 23) |
| X-B3-WC | A learned W_c head does autonomous instance-level recall + clean foreign-reject, int16-exact | 360/361 recall + 50/50 foreign-reject, int16 == f32 lossless, s0=+0.102; instance top-1 34% → 100% under corpus diversity; reduction logsumexp-mean=361/361 (max 12 / top-8-mean 16) | W_c HD=512→r=32, logsumexp-over-positions then mean-over-heads, InfoNCE over [episodes + NULL/s0] + reject hinge | (E+1)-argmax reject; trained on ablation-oracle labels | reject is relative (NULL-argmax), not an absolute threshold; diversity was the binding constraint; 90-needle curated corpus | measured + gated — CLOSED (paper 24) |
| X-B3-WC-DEPLOY | The learned librarian is DEPLOYED LIVE on the served 12B chat | matched query → RECALL its needle (score 9.858, clear argmax, replay @M=42); "capital of France?" → whole pop negative (best −0.026 < s0) → NULL → clean "Paris" | recall.rs WcHead + routes.rs SP_B3_WC (E+1)-argmax + SP_REPLAY_MTARGET=42 bounded replay | default-off = byte-identical null floor | proof-of-mechanism; between-turn consolidation (B4 NIGHTSHIFT) scoped but not built; 1 model / 1 card | measured + gated — LIVE (paper 24) |

Provenance: engine tests/fixtures/chat_fullstack/G-CHAT-B3-{DISPOSER-*,NEEDLE-v12,NEEDLE-v13-MATRIX,LABELER,ADMISSION-200,WC-DIV2,WC-DEPLOY}.log; tools/sp_daemon/src/{recall.rs,routes.rs} + tools/xbar_lsh/{mint_corpus,mint_corpus_v2,b3_train_wc_fast2,export_wc_deploy}.py; commits 4dba6c8/2b623ab/acd7b3a (negatives), 15738c1/b6470cc/7556d04/f4166c7 (ablation+labeler+scale), 87044d8/f62e6ef/edc8079 (head+diversity+deploy). Architecture: lattice papers/CONTRACT-CHAT-FULLSTACK.md + SESSION-HANDOFF.md §0d.
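
The X-B3-WC rows name two steps: a score reduction (logsumexp over positions, then mean over heads) and an (E+1)-way argmax in which a NULL slot scored `s0` competes with the E episodes. A minimal Python sketch of just those two steps follows. The function names and episode labels are hypothetical, the learned projection W_c is not modelled, and the two scores fed in at the end are the ones quoted in the X-B3-WC-DEPLOY row, with `s0` taken from X-B3-WC.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def episode_score(per_head_position_scores):
    """Reduce a [heads][positions] score grid: logsumexp over positions, mean over heads."""
    per_head = [logsumexp(row) for row in per_head_position_scores]
    return sum(per_head) / len(per_head)


def select_episode(scores_by_episode, s0):
    """(E+1)-way argmax: return the winning episode name, or None if NULL (s0) wins."""
    best_name, best = None, s0
    for name, score in scores_by_episode.items():
        if score > best:
            best_name, best = name, score
    return best_name


S0 = 0.102  # NULL score quoted in the X-B3-WC row

# Matched query: one episode clearly beats NULL, so it is recalled.
print(select_episode({"needle": 9.858, "other": -1.0}, S0))    # needle
# Foreign query: the best episode score is below s0, so NULL wins and nothing is injected.
print(select_episode({"needle": -0.026, "other": -0.5}, S0))   # None
```

Because the reject is an argmax against the NULL slot rather than a fixed cutoff, it is relative to the current population, which is the caveat the X-B3-WC row records.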

NIGHTSHIFT — the offline curator (X-NS-*, 2026-06-21)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-NS-CURATOR | An offline curator can autonomously mint a Ring-2 episode: it extracts the load-bearing secret by a model call, admits it only if a teacher-forced causal ablation proves the fact is episodic (not parametric), and emits a conformant MEM-OKF record | G-NIGHTSHIFT-CURATOR (synthetic 2-episode gate on the 12B): the novel needle (KAI-3 vault, model-extracted secret 8-FALCON-7729) collapses −33.59 → ACCEPT (matches the X-B3-ABLATION oracle's −33.56); the parametric control (`The capital of France is Paris.`, extracted `Paris`) collapses 0.00 → REJECT; ~33-nat separation, clean across TAU = −8.0. The model-call extractor (vs token-rarity) yields a surgical secret, so the ablation measures fact-dependency not context-destruction (a whole-sentence v1 over-fired the parametric control at −22.01). Emit verified: MEM-OKF record rc=0, conformant, c2-sig joined, okf_mem verify GREEN; accepted=1 rejected=1 | Gemma-4-12B B1 (06-R10), RTX 2060 12 GB; run_kairos_curator (model-call ep.secret extractor → teacher-forced ablation admit, the X-B3-ABLATION instrument → MEM-OKF emit via okf_mem), default-off SP_NIGHTSHIFT_OFFLINE; gemma4_decode_cuda left byte-untouched = null floor | G-NIGHTSHIFT-CURATOR (the synthetic admit/reject gate + conformant emit) — criteria 1–4 GREEN | synthetic gate, NOT live: criterion 5 (live B4 in-distribution on real chat turns) is PENDING (the live capture path is not yet wired; it is gated-GREEN / default-off, not running on the served chat); the admission oracle inherits X-B3-ABLATION's "parametric steel" caveat (only novel needles are measurable); 2-episode synthetic registry; 1 model / 1 card | measured + gated — GREEN on the synthetic gate (live PENDING) |

Provenance: lattice papers/CONTRACT-NIGHTSHIFT-CURATOR.md + papers/STATUS-MAP-2026-06-21.md; engine tools/sp_daemon run_kairos_curator + tools/okf_mem.py + memory-okf/ (MEM-OKF profile papers/MEMORY-OKF-PROFILE.md); receipt tests/fixtures/chat_fullstack/G-NIGHTSHIFT-CURATOR.log; commit engine 6107f3e (lineage 9ad7ede → 9ee4668 → 6107f3e). The teacher-forced ablation admission oracle is paper-23's X-B3-ABLATION (TAU −8.0); the (E+1)-NULL recall selector is paper-24's X-B3-WC.
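
The admission rule the curator inherits from X-B3-ABLATION can be written down in a few lines: score the secret's tokens with and without the source KV rows, sum the log-likelihood change, and admit only if the collapse passes TAU. The sketch below is a hedged illustration. The sign convention (ablated minus intact, so a collapse is negative), the `<=` comparison, the function names, and the per-token values are all assumptions made for the example; only TAU = −8.0 comes from the ledger.

```python
TAU = -8.0  # admission threshold pinned in the ledger


def delta_ll(ll_intact, ll_ablated):
    """Summed log-likelihood change (nats) over the secret's tokens; negative = collapse."""
    return sum(a - i for i, a in zip(ll_intact, ll_ablated))


def admit(ll_intact, ll_ablated, tau=TAU):
    """Return (verdict, delta): ACCEPT only when ablation collapses the secret past tau."""
    d = delta_ll(ll_intact, ll_ablated)
    return ("ACCEPT" if d <= tau else "REJECT"), d


# Made-up per-token log-likelihoods, for illustration only.
novel_intact = [-0.1, -0.2, -0.1]
novel_ablated = [-11.0, -12.0, -11.0]   # the fact lived in the ablated KV rows
param_intact = [-0.3, -0.2]
param_ablated = [-0.3, -0.2]            # the model already knew it: no change

print(admit(novel_intact, novel_ablated)[0])  # ACCEPT
print(admit(param_intact, param_ablated)[0])  # REJECT
```

A fact the model already holds parametrically produces a delta near zero whatever is ablated, which is why only novel needles are measurable with this instrument.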

Diffusion judge — whole-machine throughput (perf levers, X-DG-*, 2026-06-23/24)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-DG-SCRATCHREUSE | Hoisting the per-expert dequant scratch out of the MoE loop is a byte-identical-by-construction throughput win on the diffusion judge — promoted default-on | ~1.46× on the diffusion judge — reversed-order 2×2 A/B (RTX 2060 @1800, STEPS=8 CANVAS=16, 3 items): OFF 281 s / 285 s (the per-call cudaMalloc/cudaFree null floor) vs ON 193 s / 194 s; order-independent (ON is 193–194 s regardless of leg position ⇒ the speedup is the change, not warmup), recall 0/2 == 0/2 and reject 0/1 == 0/1 across all four legs. Promoted default-on (no env → 74 s, deterministic ans_tok=236799 matches the OFF value); SP_DG_SCRATCHREUSE=0 is now the explicit opt-out | the 26B-A4B MoE diffusion-gemma judge, RTX 2060 12 GB (NUC11), dg_gemm_packed / dg_gemm_packed_rows; the reused dequant scratch is byte-identical by construction (the dequant overwrites the full region before the SGEMM reads it ⇒ default-on changes no outputs, it only removes the synchronizing per-expert cudaFree that serialized the stream); the gemma4 decode + all other paths untouched | reversed-order 2×2 A/B (order-independence is the gate) + the default-on rebuild verify (deterministic-item match to the OFF value) | scope is the diffusion-judge path only (dg_ prefix); the speedup is wall-clock on one card (one host, clocks at 1800); the judge's own accuracy/quality is unproven and held in the drawer — this is a throughput lever on it, not a judge-quality claim; 1 card | measured + gated — citable (default-on) |
| X-DG-ASYNC | A pinned double-buffer prefetch of spillover experts (overlap upload N+1 with compute N) is byte-exact — and proving it required fixing a pre-existing out-of-bounds the async heap-shift exposed | G-DG-ASYNC (MOECHK OFF vs ON, self-cond on): BYTE-EXACT 240/240, recall/reject identical (OFF 0/1,0/1 == ON). The async path was correct all along: compute-sanitizer found the only hazard was a pre-existing dg_self_cond OOB: dg_k_softmax_rows read prev_logits as [C×V] while the harness sized prev_dev to the TEST canvas override (CL=16) instead of the model's dg_canvas_length (256), over-reading adjacent VRAM; serial baked that garbage into the baseline, the async path's 4 slot allocations shifted the heap → different timing-dependent garbage → the step-1 divergence (step 0 was byte-exact because self-cond runs between steps). Independent proof async is innocent: SP_DJ_NOSC=1 (self-cond off) → async == serial byte-exact 240/240. Marginal speed ON 53 s vs OFF 69 s ≈ 1.3× (order-confounded). FIX = size prev_dev to the forward's canvas + cudaMemset 0 the rows the harness never fills (deterministic zeros, not uninitialized garbage) | the 26B-A4B MoE diffusion-gemma judge, RTX 2060 12 GB (NUC11), SP_DG_ASYNC spillover double-buffer; gemma4_decode_cuda + all other paths untouched; default-off = byte-identical null floor | G-DG-ASYNC (MOECHK byte-exact 240/240, self-cond on) + the SP_DJ_NOSC innocence proof (async == serial 240/240) | byte-exact GREEN + default-off — NOT promoted: the default-on promotion is PENDING a reversed-order timing lock and a full-corpus recall re-verify (zeroing the over-read region changes self-cond input from garbage to deterministic zeros — strictly more correct, but the prior self-cond was undefined behavior, so recall is re-run before promotion); the ~1.3× is order-confounded (the clean reversed A/B is the next step); 1 card | measured + gated — byte-exact, default-off (promotion pending) |

Provenance: engine tests/test_diffjudge_denoise.c + src/backends/cuda/cuda_forward.cu (dg_gemm_packed / dg_gemm_packed_rows / dg_dev_alloc_f32 / the SP_DG_SCRATCHREUSE + SP_DG_ASYNC + SP_DG_MOECHK knobs); receipts _srrev_on.log / _srrev_off.log (scratchreuse 2×2) + _sr_verify_default.log (default-on) + _async_parity.bat (G-DG-ASYNC); commits engine e31c70d (scratchreuse default-on; lineage 1d0e414 → e31c70d) + 2a1c830 (async byte-exact fix; campaign 30aaaa6 impl → 6158943 host-WAR+bisect → 2a1c830 root-cause). The diffusion judge's accuracy is unproven (kept out of every front door per the standing scope) — these are throughput levers on the path, not judge-quality claims.
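
The ~1.46× figure in X-DG-SCRATCHREUSE follows from the four leg timings of the reversed-order 2×2 A/B. The short sketch below reproduces that arithmetic and the order-independence reading. It is an assumption about how the ratio was formed (mean OFF over mean ON), and the function name is hypothetical; the four timings are the ones quoted in the row.

```python
def ab_summary(off_legs, on_legs):
    """Summarize a reversed-order A/B: mean-based speedup plus a simple separation check."""
    mean_off = sum(off_legs) / len(off_legs)
    mean_on = sum(on_legs) / len(on_legs)
    return {
        "speedup": mean_off / mean_on,
        # Spread of each arm across leg positions: small spread = little order effect.
        "off_spread_s": max(off_legs) - min(off_legs),
        "on_spread_s": max(on_legs) - min(on_legs),
        # The arms are cleanly separated if the slowest ON leg beats the fastest OFF leg.
        "separated": max(on_legs) < min(off_legs),
    }


summary = ab_summary(off_legs=[281.0, 285.0], on_legs=[193.0, 194.0])
print(round(summary["speedup"], 2))  # 1.46
print(summary["on_spread_s"], summary["separated"])  # 1.0 True
```

The same summary applied to a single-order run, such as the 53 s vs 69 s pair in X-DG-ASYNC, cannot separate the change from warmup, which is why that row marks its ~1.3× as order-confounded.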

KEYSTONE — the integrated organism (X-AGENCY / X-HARNESS-TOOLS / X-AGENCY-LOOP / X-CONVMEM / X-FAITHFUL / X-KEYSTONE, 2026-06-25)

| # | Claim | Number | Config | Gate | Caveat | Status |
|---|---|---|---|---|---|---|
| X-AGENCY | The model owns its memory: it forgets on intent, supersedes a changed fact, and consolidates two complementary facts into one synthesized truth — a model-driven 3-stage agency, framed as detection-not-decision with a forced answer prefix | FORGET: "forget 7-RAVEN-3300" → token-overlap match @0.25 → drop from live set + rewrite persisted registry → confirmed gone (registry 8 seeds intact). DECIDE: "blue" → "green" → side model-call emits CHANGED=1 → daemon silently forgets blue → recall returns "Green" (no over-deletion). MERGE ("holy grail"): "sister is a doctor" + "sister lives in Boston" → CHANGED=NONE → model emits MERGE:: lives in Boston and is a doctor → drop both + capture the synthesis → recall returns the combined truth | gemma-4-12B, RTX 2060; daemon routes.rs SP_FORGET / SP_DECIDE (supersede + merge); forget-turns excluded from capture; admit-gate question-detect; conflict-gated; default-off = byte-identical null floor | G-FORGET, G-DECIDE, G-MERGE (engine tests/fixtures/chat_fullstack/) | proof-of-mechanism; meta-cognitive model-calls must be framed as detection with a forced prefix (the banked lesson); 1 model / 1 card | measured + gated — LIVE |
| X-HARNESS-TOOLS | Ephemeral tool calling with no native tool channel: the model emits a text-protocol tool call, the harness parses/executes/feeds-back in a ReAct loop, over the daemon /v1/chat SSE seam | live ephemeral tool calling end-to-end: calculate → 4183, run_python → 5050; the model emits `<tool name="X">{json}</tool>` in plain text, run_with_tools loops until it stops. H1 streams live tokens (Paris) through SPDaemonClient | shannon-prime-harness (CosySim runtime re-hosted on sp-daemon, lmstudio stripped); ToolSpec.from_callable derives the schema from a Python signature; run_with_tools(messages, tools, max_rounds=6) | G-HARNESS-DAEMON-E2E (H1, cd4d935), G-HARNESS-TOOLCALL-E2E (H2, 438738c) | honest-neg: multi-line indented code inside JSON is unreliable (single-line preferred); proof-of-mechanism; 1 model / 1 card | measured + gated — GREEN |
| X-AGENCY-LOOP | An autonomous agency loop on a heartbeat (KAIROS tick): between turns, idle-gated, the organism consolidates the written conversation then runs a model-driven memory-maintenance round — zero manual steps | the KAIROS scheduler beats on SP_AGENCY_INTERVAL, each tick (idle-gated) consolidates SP_CURRENT_CONVO (facts → mid, transcript → long) then runs agency_round (the model reviews its memory + curation tools and decides what to forget/consolidate); the daemon writes each turn to disk (the consolidation hook) so the loop closes with no manual steps | harness control.agency (agency_round / run_agency_scheduler / consolidate_current); launcher run_agency.py / engine run_agency.bat; daemon writes SP_CURRENT_CONVO per turn | G-HARNESS-AGENCY-E2E (H4, 71873bb), G-HARNESS-KAIROS-TICK-E2E (H5, 698524c), G-HARNESS-HOOK-E2E (H7, cb8b7ec/4f97667) | proof-of-mechanism on the live loop; the round is model-driven (detection-framed); 1 model / 1 card | measured + gated — GREEN |
| X-CONVMEM | Tiered conversation memory: SHORT (live convo) → MID (extracted facts) → LONG (full+summary), one content-address scheme (sha256 / C2-sig) linking the tiers so the model gets the gist and digs deeper only on demand | summarize/store/recall/read_conversation + extract_facts + consolidate_conversation over a content-addressed MEM-OKF store (LUT→sum/full/); a recallable capabilities corpus + the init primer ride the same store; the daemon carries the full thread (re-prefill) each turn | harness skills.conversation_memory on tools/okf_mem.py (sha256 addr); roots SP_CONV_OKF_ROOT / SP_CAPS_OKF_ROOT; memory-as-tools skills.memory.{list,remember,forget} | G-HARNESS-CONVMEM-E2E (H6, 7e0926b), G-HARNESS-MEMTOOLS-E2E (H3, 8e855ca), G-MEM-OKF-CONFORM | the SHORT tier is re-prefilled per turn (O(n) — the persistent-O(1)-KV edge is open); extraction into MID/LONG is the durability path; 1 model / 1 card | measured + gated — GREEN |
| X-FAITHFUL | The served chat is made faithful to the in-context conversation (use stated facts, never substitute a parametric prior) — the "restarting each turn" complaint was the model being unfaithful, not a cache bug | the daemon DOES carry the full conversation (facts carried recall ON+OFF); the real issue was confabulation ("octopus" → "dog", recent-fact-ok older-fact-confabulated); FIX = a default system prompt (identity + capabilities + "use stated facts faithfully, octopus is octopus" + concise) → diagnostic FAITHFUL=True | gemma-4-12B served console index.html default system prompt (engine 88d924e); covers the operator's seed-capabilities + reply-style asks | G-JUDGE-SERVED / the served FAITHFUL diagnostic | the system prompt is the patch; reliable tiered recall is the structural answer (open edge 3); requires hard-refresh; 1 model / 1 card | measured — LIVE (patch; structural fix is the tiered-memory edge) |
| X-KEYSTONE | The pieces proven in isolation are integrated into one self-supporting organism on the served 12B: byte-exact forward + two-ring/XBAR memory + autonomous recall/reject + the full memory agency + the tool-calling harness + tiered conversation memory + the live consolidation loop, every mechanism default-off = null floor | the integration milestone (keystone-1): the served chat holds the thread faithfully, learns/recalls/forgets/supersedes/merges facts on the model's own verdict, calls tools + runs Python, manages its own memory on a heartbeat tick, and stores conversations in tiers — the loop closes with zero manual steps. ~90% of the envisioned organism | the five repos (lattice / system / engine / harness / Position_Is_Arithmetic); run via _e2e_seed_serve.bat (daemon) + run_agency.bat (the scheduler); the call surface is PPT-LAT-KEYSTONE-API.md | KEYSTONE-1 (the union of the per-pillar gates: G-BYTEEXACT-FORWARD-12B, G-CHAT-B3-WC-DEPLOY, G-FORGET/DECIDE/MERGE, G-HARNESS-* H1–H7, G-JUDGE-BATTERY) | proof-of-mechanism integration on 1 model / 1 card; four open edges remain (persistent O(1) KV; the 2-physical-GPU byte-exact check; deeper faithfulness via reliable tiered recall; native-C XBAR port + T4 Frobenius of the weights) | measured + gated — GREEN-LIVE (the integration milestone) |

Provenance: lattice papers/PPT-LAT-KEYSTONE.md (the map) + papers/PPT-LAT-KEYSTONE-API.md (the call surface) + papers/PPT-LAT-RFC-001-Universal-Discrete-Architecture.md (KEYSTONE addendum) + papers/PPT-LAT-Roadmap.md (post-KEYSTONE). Memory agency: engine tools/sp_daemon/src/routes.rs (SP_FORGET / SP_DECIDE supersede+merge), commits FORGET 78e4acf / DECIDE be9a426 / MERGE 0fd52e4; receipts tests/fixtures/chat_fullstack/G-{FORGET,DECIDE,MERGE}.log. Harness: shannon-prime-harness/ (harness/ + run_agency.py + tests/), commits H1 cd4d935 / H2 438738c / H3 8e855ca / H4 71873bb / H5 698524c / H6 7e0926b / H7 cb8b7ec+4f97667; gates G-HARNESS-{DAEMON,TOOLCALL,MEMTOOLS,AGENCY,KAIROS-TICK,CONVMEM,HOOK}-E2E. Faithfulness: engine 88d924e (served index.html default system prompt). The integration rides the closed pillars (byte-exact forward Papers 19–21, B3 librarian Papers 22–24, the two-ring/XBAR substrate Papers 16–18) — see those sections above.
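
The X-HARNESS-TOOLS row describes a text-protocol tool call of the form `<tool name="X">{json}</tool>` that the harness parses, executes, and feeds back in a loop. The sketch below shows one way such a loop could look. It is not the harness's `run_with_tools`: `parse_tool_call`, `react_loop`, the `generate` callable, the `"tool"` message role, and the `add` tool are all hypothetical names introduced for the example, and a scripted function stands in for the model.

```python
import json
import re

# Matches the text protocol quoted in the ledger: <tool name="X">{json}</tool>
TOOL_RE = re.compile(r'<tool name="([^"]+)">(.*?)</tool>', re.DOTALL)


def parse_tool_call(text):
    """Return (tool_name, args_dict) for the first tool call in text, or None."""
    m = TOOL_RE.search(text)
    if m is None:
        return None
    return m.group(1), json.loads(m.group(2))


def react_loop(generate, messages, tools, max_rounds=6):
    """Call the model, run any tool it asks for, feed the result back, repeat."""
    reply = ""
    for _ in range(max_rounds):
        reply = generate(messages)
        call = parse_tool_call(reply)
        if call is None:
            return reply  # plain answer: the loop stops
        name, args = call
        result = tools[name](**args)
        messages = messages + [
            {"role": "assistant", "content": reply},
            # The "tool" role and payload shape are assumptions of this sketch.
            {"role": "tool", "content": json.dumps({"name": name, "result": result})},
        ]
    return reply


def scripted_model(messages):
    """Stand-in for the model: asks for a tool once, then answers from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return "The answer is " + str(json.loads(last["content"])["result"])
    return '<tool name="add">{"a": 2, "b": 3}</tool>'


answer = react_loop(
    scripted_model,
    [{"role": "user", "content": "What is 2 + 3?"}],
    {"add": lambda a, b: a + b},
)
print(answer)  # The answer is 5
```

Keeping the JSON payload on a single line sidesteps the failure the row records as an honest negative, where multi-line indented code inside the JSON proved unreliable.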