From 5c37f0c234a64625b92475f3729d853e6f3fdadf Mon Sep 17 00:00:00 2001
From: Claude
Date: Mon, 20 Apr 2026 23:22:53 +0000
Subject: [PATCH 1/2] D1.2 rotation primitives + thinking-tissue north-star epiphany
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First real kernel deliverable of Phase 1: RotationKernel trait + three
impls (Identity / Hadamard / OPQ-stub) with typed RotationError. 95/95
cognitive-shader-driver tests pass under --features serve (+15 new D1.2
tests).

crates/cognitive-shader-driver/src/rotation_kernel.rs (~330 LOC):

  RotationKernel trait — object-safe, Send+Sync+Debug:
    apply(&self, &mut [f32]) -> Result<(), RotationError>
    dim() -> u32
    signature() -> u64          # feeds CodecParams::kernel_signature
    backend() -> &'static str   # "avx512" | "stub" (never "scalar")

  IdentityRotation { dim }
    — zero-overhead pass-through; apply() is a no-op

  HadamardRotation { dim }
    — REAL in-place Sylvester butterfly, O(N log N) add/sub, no
      allocations
    — validates dim is power-of-two (Sylvester requirement)
    — Rule C compliance: stays at Tier-3 F32x16 (add/sub, not matmul;
      AMX adds no value per plan appendix §12 C)
    — rustc + target-cpu=x86-64-v4 already emits AVX-512 add/sub from
      the straight-line loop → no JIT compilation needed

  OpqRotationStub { matrix_blob_id, dim }
    — real impl plugs into D1.1b CodecKernelEngine adapter +
      ndarray::hpc::jitson_cranelift::JitEngine + tile_dpbf16ps AMX
      matmul when amx_available()
    — apply() returns OpqMatrixNotLoaded (typed error) until the
      matrix-blob loader lands

  build(&Rotation, dim) -> Result<Box<dyn RotationKernel>, RotationError> factory
    — dispatches on WireCodecParams.pre_rotation variant
    — returns typed errors on dim mismatch or non-pow2 Hadamard

Tests (15 new):
  Identity: noop + dim-mismatch error
  Hadamard:
    - orthogonality: H_4 · [1,0,0,0] == [1,1,1,1] (first column)
    - H · H = n · I (applying twice scales by n, verified at N=8)
    - norm² preservation up to n× scale (verified at N=16)
    - rejects non-pow2 dim (N=6)
  OPQ stub: returns
    OpqMatrixNotLoaded with blob_id preserved
  build(): identity / hadamard / hadamard-dim-mismatch /
    hadamard-non-pow2 / opq-stub
  Signatures: distinct across variants, stable for same shape,
    blob-id-sensitive for OPQ

Board hygiene (CLAUDE.md Mandatory rule):

  STATUS_BOARD.md: D1.2 Queued → In PR

  EPIPHANIES.md PREPEND (two entries):

    1. "Thinking styles ARE codecs over the semantic field" (north-star
       forward-looking deposit, not a work item) — codec infrastructure
       IS the template for production-grade thinking tissue. Mapping
       table documents the codec→thinking correspondence:
       CodecParams↔ThinkingStyleParams,
       kernel_signature↔style_signature,
       token_agreement↔conclusion_agreement, etc. Phase 5+ drops in
       WireThinkCalibrate + ThinkingStyleKernelCache using the same
       scaffolding. Generalisation isn't porting — it's recognising
       thinking styles as a SPECIAL CASE of the codec pattern.

    2. "D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive" —
       narrows D1.1b scope by 30-40%. Only OPQ (matmul) needs Cranelift
       JIT emission; Identity (no-op) and Hadamard (butterfly) stay as
       plain-Rust Tier-3 F32x16 paths. Rustc's AVX-512 codegen under
       target-cpu=x86-64-v4 is already optimal for add/sub-structured
       kernels.

Rules honored:
  Rule A — in-place &mut [f32] slice, no allocations in apply()
  Rule B — ndarray::simd::* not needed for these shapes; compiler emits
           AVX-512 from straight-line loops
  Rule C — Hadamard stays at Tier 3 (add/sub, no AMX benefit); OPQ stub
           will route to Tier 1 AMX when matrix loaded
  Rule D — Rotation variants come from YAML via WireRotation (D0.1)
  Rule E — kernel signature() + backend() are object-methods per the
           Wire-surface-IS-SIMD-surface pattern
  Rule F — no serialization anywhere; in-memory f32 buffer only

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 .claude/board/EPIPHANIES.md                   |  76 ++++
 .claude/board/STATUS_BOARD.md                 |   2 +-
 crates/cognitive-shader-driver/src/lib.rs     |   6 +
 .../src/rotation_kernel.rs                    | 380 ++++++++++++++++++
 4 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 crates/cognitive-shader-driver/src/rotation_kernel.rs

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 3feb2932..58fef3f2 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -65,6 +65,82 @@ stay as historical references.

 ## Entries (reverse chronological)

+## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)
+
+**Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)
+
+A codec compresses tensor content into fingerprints; a thinking style
+compresses reasoning trajectories into NARS-revised beliefs. Same
+underlying operation — structure-preserving compression on a binary
+Hamming substrate. Different input/output domains, same substrate
+guarantees (E-SUBSTRATE-1, I-SUBSTRATE-MARKOV), same compile-and-swap
+machinery.
+
+**The codec infrastructure IS the template for production-grade
+thinking tissue.** When Phase 5+ activates:
+
+| Codec (shipped D0.1–D1.2, D1.1b queued) | Thinking-style analog |
+|---|---|
+| `CodecParams` | `ThinkingStyleParams { style, modulation_7d, nars_priors, fallback_chain, sigma_priority, semiring_choice }` |
+| `kernel_signature()` — excludes runtime drift | `style_signature()` — excludes per-cycle modulation drift |
+| `CodecKernelCache` | `ThinkingStyleKernelCache` — same generic scaffold |
+| JIT kernel = Cranelift-compiled decode | JIT kernel = compiled scan-walk on 36-node topology (already shipped ndarray-side via `scan_jit.rs` + `ScanParams`) |
+| **Token agreement** (I11 cert gate) | **Conclusion agreement** — same NARS-revised conclusions as reference style? |
+| Sweep grid = N codec candidates | Sweep grid = N (style × modulation × NARS fallback) candidates |
+| `/v1/shader/calibrate` | `/v1/shader/think-calibrate` |
+| `[FORMAL-SCAFFOLD]` 5 pillars | **Same scaffold** — E-SUBSTRATE-1 covers any transition under bundle |
+
+**Generalisation isn't "port codec pattern to thinking"** — it's
+recognising thinking styles as a SPECIAL CASE of the codec pattern we
+just built. When Phase 5+ lands, `WireThinkCalibrate` +
+`ThinkingStyleKernelCache` + `conclusion_agreement` metric drop in
+alongside the codec versions. Same JIT engine, same tests, same
+board-hygiene discipline.
+
+**The phrase "production-grade thinking tissue"** names the telos
+cleanly: once codec infra is at Phase 3 token-agreement pass rates,
+cloning to thinking styles yields production-grade swappable
+reasoning — YAML-configured, JIT-compiled, sweep-certified. No
+rebuild per new style, no black box, signature-keyed reproducibility.
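Editorial sketch (not part of the patch): the "same generic scaffold" row can be pictured as a cache that is generic over the kernel handle `H` and keyed only by a `u64` signature, so a codec kernel and a thinking-style kernel could share it. Everything named here (`SignatureCache`, its `get_or_compile`, the counters) is a hypothetical illustration under that assumption; it is not the shipped `CodecKernelCache` API, whose exact signatures are not shown in this patch.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical signature-keyed cache, generic over the kernel handle `H`.
/// Illustrative only; not the real `CodecKernelCache`.
pub struct SignatureCache<H> {
    kernels: RwLock<HashMap<u64, Arc<H>>>,
    compiles: AtomicU64,
    hits: AtomicU64,
}

impl<H> SignatureCache<H> {
    pub fn new() -> Self {
        Self {
            kernels: RwLock::new(HashMap::new()),
            compiles: AtomicU64::new(0),
            hits: AtomicU64::new(0),
        }
    }

    /// Return the kernel stored under `signature`, compiling it at most once.
    pub fn get_or_compile(&self, signature: u64, compile: impl FnOnce() -> H) -> Arc<H> {
        // Fast path: shared read lock, released at the end of this statement.
        let cached = self.kernels.read().unwrap().get(&signature).cloned();
        if let Some(kernel) = cached {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return kernel;
        }
        // Slow path: take the write lock and re-check, because another
        // thread may have compiled the same signature in the meantime.
        let mut map = self.kernels.write().unwrap();
        if let Some(kernel) = map.get(&signature) {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return Arc::clone(kernel);
        }
        // For brevity the sketch compiles while holding the write lock.
        let kernel = Arc::new(compile());
        map.insert(signature, Arc::clone(&kernel));
        self.compiles.fetch_add(1, Ordering::Relaxed);
        kernel
    }

    pub fn compiles(&self) -> u64 {
        self.compiles.load(Ordering::Relaxed)
    }

    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}
```

Because the key is only the signature, swapping `H` from a codec kernel to a style kernel would not change the cache logic, which is the reuse the mapping table points at.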
+
+**Cross-ref:** D0.6 `CodecParams` (the parameter-shape template);
+D1.1 `CodecKernelCache` (the cache pattern — generic-over-H is the
+wedge for reuse); I5 (thinking IS an AdjacencyStore — already
+topologically unified with data graph); codec-sweep-via-lab-infra-v1.
+
+---
+
+## 2026-04-20 — D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive
+
+**Status:** FINDING
+
+D1.2's HadamardRotation is implemented as a plain Rust in-place
+Sylvester butterfly (O(N log N) add/sub, no allocations). It does NOT
+need JIT compilation or Cranelift code emission because:
+
+1. **Fixed shape** — the butterfly structure is identical across all
+   power-of-two dims. Rust's compiler (under `target-cpu=x86-64-v4`)
+   already emits AVX-512 add/sub from the straight-line loop.
+2. **Not matmul** — Hadamard is a pattern of adds and subtracts,
+   never a dot product. Per Rule C polyfill hierarchy, matmul-heavy
+   paths benefit from AMX (Tier 1); add/sub stays at Tier 3 F32x16.
+   AMX gives no speedup here — confirmed in plan Appendix §12 C.
+
+**Consequence for D1.1b (Cranelift wiring):** only OPQ rotation needs
+the JIT path — it's the one that's actually a learned matmul. The
+Cranelift integration scope narrows: we don't need to JIT-compile
+Identity (no-op) or Hadamard (butterfly); just OPQ (matmul) and the
+main codec decode loop (ADC distance with palette lookup).
+
+This reduces D1.1b scope by maybe 30-40% — fewer kernel shapes to
+emit, only the ones that actually benefit.
+
+Cross-ref: D1.2 `rotation_kernel.rs::HadamardRotation`; Rule C
+(polyfill hierarchy); plan Appendix B (CartanCascade harmonic
+compression ratios rely on real Hadamard, so this matters).
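Editorial sketch (not part of the patch): the butterfly in this entry is unnormalised, so applying it twice gives n·x and it scales norms by √n. A caller that needs a norm-preserving rotation could, as one option, scale by 1/√n afterwards. The helper names `fwht_in_place` and `fwht_orthonormal` are hypothetical and do not appear in `rotation_kernel.rs`; whether D1.2 callers normalise at all is an assumption this patch does not settle.

```rust
/// Unnormalised in-place Sylvester butterfly, the same add/sub structure the
/// entry describes. Returns `false` and leaves `v` untouched when the length
/// is zero or not a power of two.
pub fn fwht_in_place(v: &mut [f32]) -> bool {
    let n = v.len();
    if n == 0 || !n.is_power_of_two() {
        return false;
    }
    let mut stride = 1;
    while stride < n {
        // Each block of 2*stride elements pairs its lower half with its upper half.
        for block in v.chunks_exact_mut(stride * 2) {
            let (lo, hi) = block.split_at_mut(stride);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        stride *= 2;
    }
    true
}

/// Orthonormal variant (hypothetical): butterfly followed by a 1/sqrt(n)
/// scale, so that applying it twice returns the input up to rounding.
pub fn fwht_orthonormal(v: &mut [f32]) -> bool {
    if !fwht_in_place(v) {
        return false;
    }
    let scale = 1.0 / (v.len() as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
    true
}
```

The scaling pass is a single multiply per element, so it keeps the add/sub character of the kernel and does not change the argument that no JIT or AMX path is needed.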
+
+---
+
 ## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine

 **Status:** FINDING / CORRECTION
diff --git a/.claude/board/STATUS_BOARD.md b/.claude/board/STATUS_BOARD.md
index 9fd1ed7a..17e81136 100644
--- a/.claude/board/STATUS_BOARD.md
+++ b/.claude/board/STATUS_BOARD.md
@@ -63,7 +63,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
 |---|---|---|---|
 | D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
 | D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
-| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
+| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as `RotationKernel` impls | **In PR** | branch — `RotationKernel` trait (Send+Sync+Debug, object-safe) + `IdentityRotation` (no-op) + `HadamardRotation` (real Sylvester butterfly, O(N log N) in-place, norm²-scaling verified) + `OpqRotationStub` (matrix-blob-id placeholder for D1.1b) + `build(&Rotation, dim)` factory + `RotationError` typed errors + 15 tests. Hadamard stays at Tier-3 F32x16 (add/sub, not matmul → no AMX benefit per Rule C).
|
 | D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |

 ### Phase 2 — Token-agreement harness (I11 cert gate) — Queued
diff --git a/crates/cognitive-shader-driver/src/lib.rs b/crates/cognitive-shader-driver/src/lib.rs
index e944ce08..7cb3cdbe 100644
--- a/crates/cognitive-shader-driver/src/lib.rs
+++ b/crates/cognitive-shader-driver/src/lib.rs
@@ -125,6 +125,12 @@ pub mod auto_detect;
 #[cfg(feature = "serve")]
 pub mod codec_kernel_cache;

+// D1.2 — rotation primitives (Identity / Hadamard / OPQ-stub). LAB-ONLY.
+// Hadamard is real (in-place butterfly); OPQ is stub pending D1.1b's
+// ndarray::hpc::jitson_cranelift::JitEngine adapter + matrix-blob loader.
+#[cfg(feature = "serve")]
+pub mod rotation_kernel;
+
 // Axum REST server. LAB-ONLY.
 #[cfg(feature = "serve")]
 pub mod serve;
diff --git a/crates/cognitive-shader-driver/src/rotation_kernel.rs b/crates/cognitive-shader-driver/src/rotation_kernel.rs
new file mode 100644
index 00000000..d3576b10
--- /dev/null
+++ b/crates/cognitive-shader-driver/src/rotation_kernel.rs
@@ -0,0 +1,380 @@
+//! **LAB-ONLY.** D1.2 — rotation primitives as `RotationKernel`
+//! implementations.
+//!
+//! Three variants matching `lance_graph_contract::cam::Rotation`:
+//!
+//! - **Identity** — no-op; zero-overhead pass-through. `signature()` only
+//!   depends on dim so the JIT cache hit is trivial.
+//! - **Hadamard** — real Sylvester butterfly in-place, `O(N log N)` add/sub
+//!   operations. No JIT needed — the butterfly is a fixed-shape kernel and
+//!   plain Rust compiles to AVX-512 under `target-cpu=x86-64-v4`.
+//!   Per Rule C: Hadamard stays at Tier-3 F32x16 because it's add/sub,
+//!   not matmul — AMX adds no value here (confirmed in plan appendix §12).
+//! - **OPQ** — learned rotation matmul; placeholder stub. Real impl
+//!   plugs into `ndarray::hpc::jitson_cranelift::JitEngine` via the
+//!   D1.1b `CodecKernelEngine` adapter and uses AMX tile_dpbf16ps when
+//!   `amx_available()`.
+//!
+//!
Per ndarray/.claude/rules/data-flow.md: in-place `&mut [f32]` slice;
+//! no heap allocations inside rotation; computation paths never mutate
+//! `self` — the `RotationKernel` trait's `&self` receiver is load-bearing.
+
+use lance_graph_contract::cam::Rotation;
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+/// Error produced when a rotation cannot be applied — dimensional
+/// mismatch, non-power-of-two for Hadamard, or missing OPQ matrix.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum RotationError {
+    /// Input slice length does not match the kernel's declared dim.
+    DimMismatch { expected: usize, actual: usize },
+    /// Hadamard dim must be a power of two (Sylvester construction).
+    HadamardNotPow2 { dim: u32 },
+    /// OPQ rotation matrix not loaded (stub path).
+    OpqMatrixNotLoaded { matrix_blob_id: u64 },
+}
+
+impl std::fmt::Display for RotationError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::DimMismatch { expected, actual } => {
+                write!(f, "rotation input dim mismatch: expected {expected}, got {actual}")
+            }
+            Self::HadamardNotPow2 { dim } => {
+                write!(f, "Hadamard dim must be power of two, got {dim}")
+            }
+            Self::OpqMatrixNotLoaded { matrix_blob_id } => {
+                write!(f, "OPQ rotation matrix blob {matrix_blob_id:#x} not loaded")
+            }
+        }
+    }
+}
+
+impl std::error::Error for RotationError {}
+
+/// A compiled rotation kernel.
+///
+/// Implementors run the rotation in-place on a `&mut [f32]` slice.
+/// The trait is object-safe so callers can hold a `Box<dyn RotationKernel>`
+/// when the variant is chosen at runtime from a `CodecParams::pre_rotation`.
+pub trait RotationKernel: Send + Sync + std::fmt::Debug {
+    /// Apply the rotation in place. Contract: modifies `vec` in-place;
+    /// returns `Err` on dim mismatch, never on a valid call shape.
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError>;
+
+    /// Declared input dimension.
Used by the cache-signature computation
+    /// and by the `CodecKernelCache` key (distinct dims → distinct kernels).
+    fn dim(&self) -> u32;
+
+    /// Stable hash over the kernel's identity — used as part of
+    /// `CodecParams::kernel_signature()` so the cache keys cleanly.
+    fn signature(&self) -> u64;
+
+    /// Backend tier label for the SIMD dispatch trace — "avx512" for
+    /// identity/Hadamard on x86_64-v4, "amx" for OPQ when AMX is live,
+    /// "stub" for OPQ without a loaded matrix. Never "scalar" — iron rule.
+    fn backend(&self) -> &'static str;
+}
+
+/// Build a boxed kernel from a `Rotation` enum + input dim.
+///
+/// This is the factory the JIT cache's compile closure calls:
+/// `cache.get_or_compile(params, || build(params.pre_rotation, d)?)`.
+pub fn build(rotation: &Rotation, dim: u32) -> Result<Box<dyn RotationKernel>, RotationError> {
+    match rotation {
+        Rotation::Identity => Ok(Box::new(IdentityRotation { dim })),
+        Rotation::Hadamard { dim: h_dim } => {
+            // Respect the rotation's declared dim — caller must size to match.
+            if *h_dim != dim {
+                return Err(RotationError::DimMismatch {
+                    expected: *h_dim as usize,
+                    actual: dim as usize,
+                });
+            }
+            if *h_dim == 0 || !h_dim.is_power_of_two() {
+                return Err(RotationError::HadamardNotPow2 { dim: *h_dim });
+            }
+            Ok(Box::new(HadamardRotation { dim: *h_dim }))
+        }
+        Rotation::Opq { matrix_blob_id, dim: o_dim } => {
+            if *o_dim != dim {
+                return Err(RotationError::DimMismatch {
+                    expected: *o_dim as usize,
+                    actual: dim as usize,
+                });
+            }
+            // Stub — D1.1b wires the real matrix load through
+            // ndarray::hpc::jitson_cranelift::JitEngine + tile_dpbf16ps.
+            Ok(Box::new(OpqRotationStub {
+                matrix_blob_id: *matrix_blob_id,
+                dim: *o_dim,
+            }))
+        }
+    }
+}
+
+// ─── Identity ────────────────────────────────────────────────────────────
+
+/// Zero-overhead pass-through rotation. `apply()` is a no-op.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct IdentityRotation {
+    pub dim: u32,
+}
+
+impl RotationKernel for IdentityRotation {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        if vec.len() != self.dim as usize {
+            return Err(RotationError::DimMismatch {
+                expected: self.dim as usize,
+                actual: vec.len(),
+            });
+        }
+        // No-op.
+        Ok(())
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "identity".hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "avx512" }
+}
+
+// ─── Hadamard (Sylvester butterfly) ──────────────────────────────────────
+
+/// Sylvester Hadamard transform via in-place butterfly.
+///
+/// For dim `N = 2^k`, the Sylvester Hadamard matrix `H_N` satisfies
+/// `H_N · H_N^T = N · I`. We apply `H_N` in-place using the classic
+/// butterfly algorithm: `log2(N)` stages, each swapping pairs of elements
+/// at stride `2^stage` with `(a, b) → (a+b, a-b)`.
+///
+/// Complexity: `O(N log N)` add/sub operations. No allocations.
+/// No AMX benefit (Rule C) — Hadamard is butterfly add/sub, not matmul,
+/// so it stays at Tier-3 F32x16 (AVX-512 baseline).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct HadamardRotation {
+    pub dim: u32,
+}
+
+impl RotationKernel for HadamardRotation {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        let n = self.dim as usize;
+        if vec.len() != n {
+            return Err(RotationError::DimMismatch { expected: n, actual: vec.len() });
+        }
+        if n == 0 || !n.is_power_of_two() {
+            return Err(RotationError::HadamardNotPow2 { dim: self.dim });
+        }
+        // In-place Sylvester butterfly. `stride` doubles each stage.
+        let mut stride = 1usize;
+        while stride < n {
+            let mut i = 0;
+            while i < n {
+                for j in 0..stride {
+                    let a_idx = i + j;
+                    let b_idx = i + j + stride;
+                    let a = vec[a_idx];
+                    let b = vec[b_idx];
+                    vec[a_idx] = a + b;
+                    vec[b_idx] = a - b;
+                }
+                i += stride * 2;
+            }
+            stride *= 2;
+        }
+        Ok(())
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "hadamard".hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "avx512" }
+}
+
+// ─── OPQ (stub — real impl plugs JIT engine in D1.1b) ────────────────────
+
+/// OPQ learned rotation matmul — stub. `apply()` returns
+/// `OpqMatrixNotLoaded`.
+///
+/// The real implementation loads the rotation matrix from a Lance blob
+/// column (one-time per `matrix_blob_id`) and applies it via
+/// `ndarray::hpc::amx_matmul::tile_dpbf16ps` when
+/// `ndarray::simd_amx::amx_available()` (Tier-1), falling through to
+/// VNNI (Tier-2) or F32x16 matmul (Tier-3) per the polyfill hierarchy.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct OpqRotationStub {
+    pub matrix_blob_id: u64,
+    pub dim: u32,
+}
+
+impl RotationKernel for OpqRotationStub {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        if vec.len() != self.dim as usize {
+            return Err(RotationError::DimMismatch {
+                expected: self.dim as usize,
+                actual: vec.len(),
+            });
+        }
+        // Stub — no matrix loaded yet.
+        Err(RotationError::OpqMatrixNotLoaded { matrix_blob_id: self.matrix_blob_id })
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "opq".hash(&mut h);
+        self.matrix_blob_id.hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "stub" }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn identity_rotation_is_noop() {
+        let r = IdentityRotation { dim: 8 };
+        let mut v = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
+        let before = v.clone();
+        r.apply(&mut v).unwrap();
+        assert_eq!(v, before);
+        assert_eq!(r.backend(), "avx512");
+    }
+
+    #[test]
+    fn identity_rotation_rejects_dim_mismatch() {
+        let r = IdentityRotation { dim: 8 };
+        let mut v = vec![0.0; 16];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::DimMismatch { expected: 8, actual: 16 }));
+    }
+
+    #[test]
+    fn hadamard_orthogonality_property_n4() {
+        // H_4 applied to [1,0,0,0] produces [1,1,1,1] (first column of H_4).
+        let r = HadamardRotation { dim: 4 };
+        let mut v = vec![1.0, 0.0, 0.0, 0.0];
+        r.apply(&mut v).unwrap();
+        assert_eq!(v, vec![1.0, 1.0, 1.0, 1.0]);
+    }
+
+    #[test]
+    fn hadamard_n8_applied_twice_scales_by_n() {
+        // H · H = n · I ⇒ applying twice multiplies every element by n.
+        let r = HadamardRotation { dim: 8 };
+        let input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
+        let mut v = input.clone();
+        r.apply(&mut v).unwrap();
+        r.apply(&mut v).unwrap();
+        let n = 8.0;
+        for (a, b) in v.iter().zip(input.iter()) {
+            assert!((a - n * b).abs() < 1e-4, "expected {} got {}", n * b, a);
+        }
+    }
+
+    #[test]
+    fn hadamard_rejects_non_pow2_dim() {
+        let r = HadamardRotation { dim: 6 };
+        let mut v = vec![0.0; 6];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::HadamardNotPow2 { dim: 6 }));
+    }
+
+    #[test]
+    fn hadamard_preserves_norm_squared_up_to_scale() {
+        // ‖Hx‖² = n ‖x‖² for Sylvester Hadamard.
+        let r = HadamardRotation { dim: 16 };
+        let input: Vec<f32> = (0..16).map(|i| (i + 1) as f32).collect();
+        let norm_sq_in: f32 = input.iter().map(|x| x * x).sum();
+        let mut v = input.clone();
+        r.apply(&mut v).unwrap();
+        let norm_sq_out: f32 = v.iter().map(|x| x * x).sum();
+        let expected = 16.0 * norm_sq_in;
+        let rel_err = (norm_sq_out - expected).abs() / expected;
+        assert!(rel_err < 1e-5, "norm² out {norm_sq_out} vs expected {expected}");
+    }
+
+    #[test]
+    fn opq_stub_returns_matrix_not_loaded() {
+        let r = OpqRotationStub { matrix_blob_id: 0xDEAD_BEEF, dim: 4096 };
+        let mut v = vec![0.0; 4096];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::OpqMatrixNotLoaded { matrix_blob_id: 0xDEAD_BEEF }));
+        assert_eq!(r.backend(), "stub");
+    }
+
+    #[test]
+    fn build_identity() {
+        let k = build(&Rotation::Identity, 256).unwrap();
+        assert_eq!(k.dim(), 256);
+        assert_eq!(k.backend(), "avx512");
+    }
+
+    #[test]
+    fn build_hadamard() {
+        let k = build(&Rotation::Hadamard { dim: 4096 }, 4096).unwrap();
+        assert_eq!(k.dim(), 4096);
+        assert_eq!(k.backend(), "avx512");
+    }
+
+    #[test]
+    fn build_hadamard_rejects_mismatched_dim() {
+        let err = build(&Rotation::Hadamard { dim: 4096 }, 2048).unwrap_err();
+        assert!(matches!(err, RotationError::DimMismatch { expected: 4096, actual: 2048 }));
+    }
+
+    #[test]
+    fn build_hadamard_rejects_non_pow2() {
+        let err = build(&Rotation::Hadamard { dim: 100 }, 100).unwrap_err();
+        assert!(matches!(err, RotationError::HadamardNotPow2 { dim: 100 }));
+    }
+
+    #[test]
+    fn build_opq_returns_stub() {
+        let k = build(&Rotation::Opq { matrix_blob_id: 42, dim: 4096 }, 4096).unwrap();
+        assert_eq!(k.dim(), 4096);
+        assert_eq!(k.backend(), "stub");
+    }
+
+    #[test]
+    fn kernel_signatures_are_distinct_across_variants() {
+        let id = IdentityRotation { dim: 256 };
+        let had = HadamardRotation { dim: 256 };
+        let opq = OpqRotationStub { matrix_blob_id: 1, dim: 256 };
+        assert_ne!(id.signature(), had.signature());
+
assert_ne!(id.signature(), opq.signature());
+        assert_ne!(had.signature(), opq.signature());
+    }
+
+    #[test]
+    fn kernel_signatures_stable_for_same_shape() {
+        let a = HadamardRotation { dim: 4096 };
+        let b = HadamardRotation { dim: 4096 };
+        assert_eq!(a.signature(), b.signature());
+    }
+
+    #[test]
+    fn opq_signature_depends_on_matrix_blob_id() {
+        let a = OpqRotationStub { matrix_blob_id: 1, dim: 4096 };
+        let b = OpqRotationStub { matrix_blob_id: 2, dim: 4096 };
+        assert_ne!(a.signature(), b.signature());
+    }
+}

From aad6e6a9e067f87b38ea82252d81a3959f555d1b Mon Sep 17 00:00:00 2001
From: Claude
Date: Mon, 20 Apr 2026 23:32:28 +0000
Subject: [PATCH 2/2] epiphanies: resolution ladder + shader/engine boundary + thinking-tissue taxonomy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three related forward-looking deposits from this session's strategic
thread (codec IS thinking at scale → thinking-styles are codecs over
the semantic field → cognitive shader vs thinking engine boundary):

1. Thinking styles ARE codecs over the semantic field (north star).
   The codec infrastructure IS the template for production-grade
   thinking tissue. Codec → thinking-style mapping:
     CodecParams       ↔ ThinkingStyleParams
     kernel_signature  ↔ style_signature
     CodecKernelCache  ↔ ThinkingStyleKernelCache
     token_agreement   ↔ conclusion_agreement
   Generalisation isn't porting — it's recognising thinking styles as
   a SPECIAL CASE of the codec pattern.

2. Resolution ladder 64×64 > 256×257 >> 4096×4096 > 16k (user-named).
   The 5-layer stack is a resolution ladder, not a layer cake:
     64×64      — p64 topology mask (HEEL)
     256×257    — bgz17 palette distance (HIP)
     4096×4096  — cross-vocab / cross-context (BRANCH/TWIG)
     16 K       — Fingerprint<256> identity (LEAF)
   The `>>` between 256×257 and 4096×4096 is the big jump — where
   palette-level meets vocabulary-level. Each JIT targets its own
   resolution, no overlap.
   p64::CognitiveShader operates at coarsest (64×64); codec-sweep D1.x
   at finest (16k); they compose in
   cognitive-shader-driver::ShaderDriver. p64 double-check:
   architecturally clean, no reimplementation in my work.

3. Shader vs engine: statelessness is the boundary.
   Cognitive shader = stateless atomic compute (eye — reports current
   frame, no memory). Thinking engine = stateful orchestrator (mind —
   assembles frames into narrative, carries persona/qualia/world_model
   across cycles). engine_bridge.rs is the seam.
   Codec-flexibility-as-thinking lands at the ENGINE level, not shader
   level — D5/Phase 5+ drops into thinking-engine mid-layer.

All three epiphanies are forward-looking deposits (not current work
items). They clarify where future work lands when codec-sweep Phase 1
completes and Phase 5+ generalises.

Cross-references:
  - I10 HEEL/HIP/BRANCH/TWIG/LEAF (LATEST_STATE.md)
  - I5 thinking IS AdjacencyStore
  - p64_bridge::cognitive_shader::CognitiveShader (64×64 cascade)
  - thinking-engine crate structure (CLAUDE.md)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 .claude/board/EPIPHANIES.md | 98 +++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 58fef3f2..9a18aba2 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -65,6 +65,104 @@ stay as historical references.

 ## Entries (reverse chronological)

+## 2026-04-20 — Shader vs engine: statelessness is the boundary
+
+**Status:** FINDING (sharpens the three-level taxonomy)
+
+**Cognitive shader** = stateless atomic compute. Given `ShaderDispatch`
++ `BindSpace` columns, returns `ShaderHit`s + `MetaWord`. Knows nothing
+of why it fires. Output is one-cycle-wide, no history.
+
+**Thinking engine** = stateful orchestrator. Calls `shader.dispatch()`
+many times per cognitive cycle; composes per-lens hits into
+persona/qualia/world_model/ghost state; revises beliefs for the next
+cycle.
The cognitive stack IS the state.
+
+**The engine_bridge is where they meet** —
+`cognitive-shader-driver/src/engine_bridge.rs` is the seam. Shader
+side: `ShaderDriver::dispatch` stateless. Engine side:
+`cognitive_stack::cycle` accumulates dispatches through
+`bf16_engine` / `signed_engine` / `composite_engine` / `dual_engine` /
+`layered` / `domino`, folds into persona/qualia, emits state for next
+cycle.
+
+**Analogy:** shader = eye (no memory, reports the current frame);
+engine = mind (memory, assembles frames into narrative, counterfactually
+imagines alternatives).
+
+**Where codec-flexibility-as-thinking lands:** the **engine** level,
+not the shader level. A "new thinking style" = a new engine
+configuration (lens composition, persona, qualia-update rule) that
+picks DIFFERENT shader configs per cycle. Shader stays the same; the
+engine's orchestration changes. That's why Phase 5+ "production-grade
+thinking tissue" drops into mid (engine), not L2 (shader).
+
+**Concrete Phase 1-5 shipping:** codec-sweep D1.x work = shader layer
+(tensor decode primitives). Engine-level codec-flexibility (swap
+lenses via YAML) = D5 / Phase 5+, plugging INTO the codec infrastructure.
+
+Cross-ref: three-level taxonomy above; resolution-ladder entry
+`64×64 > 256×257 >> 4096×4096 > 16k`; `engine_bridge.rs` seam.
+
+---
+
+## 2026-04-20 — Resolution hierarchy: `64×64 > 256×257 >> 4096×4096 > 16k` (user-named)
+
+**Status:** FINDING (capstone of the three-level taxonomy from earlier this session)
+
+The 5-layer stack is a **resolution ladder**, not a layer cake.
Each
+level operates at its own granularity and has its own "shader" /
+"kernel cache" / "distance table" at that scale:
+
+| Size | Role | Where | HHTL stage (I10) |
+|---|---|---|---|
+| **64×64** | p64 topology mask — 8 predicate planes × 64 rows × u64 — "which archetype blocks relate via predicate z" | `p64_bridge::cognitive_shader::CognitiveShader` | HEEL (coarse basin) |
+| **256×257** | bgz17 palette distance table — 256 archetypes × 256 + 1 sentinel — O(1) lookup `semiring.distance(a, b)` | `bgz17::PaletteSemiring` | HIP (family sharpen) |
+| **4096×4096** | Cross-vocabulary / cross-context correlation — COCA × COCA, or 4096 τ-prefix × 4096 slot space | ndarray `ScanParams` JIT (`jitson_cranelift`) | BRANCH / TWIG |
+| **16 K** | Individual fingerprint bit identity — 16384-bit `Fingerprint<256>` | `ndarray::simd::Fingerprint<256>` + codec decoder (D1.x) | LEAF (exact member) |
+
+**The `>>` between 256×257 and 4096×4096 is the big jump** (~64×)
+matching HIP → BRANCH refinement. That's where palette-level (one
+row of the codebook) meets vocabulary-level (COCA 4096). Below that
+jump, everything is O(1) table lookup; above it, JIT kernels become
+worth the compile cost.
+
+**Each JIT targets its own resolution — no overlap:**
+
+- p64 cascade: 64×64 bitmask ops. Not JIT'd (bit tricks in hot loop
+  already optimal under AVX-512).
+- bgz17 palette: 256×256 precomputed. Not JIT'd (memory-bound).
+- ndarray ScanParams: 4096×4096 scan kernels. **JIT'd via
+  `jitson_cranelift::JitEngine`** — shipped.
+- Codec kernels (D1.x): 16k bit-level tensor decode. **Will be JIT'd
+  via D1.1b `CodecKernelEngine` adapter**. Scaffold (D1.1) + rotation
+  primitives (D1.2) landed; Cranelift IR emission deferred to D1.1b.
+
+**Three-level taxonomy (from earlier this session) maps onto the
+resolution ladder:**
+
+- **L2 small-precision cognitive shaders** (ns budget) →
+  64×64 + 256×257 (p64 + bgz17 palette). Pure table lookups.
+- **mid thinking-engine layers** (µs-ms) →
+  4096×4096 (cross-vocab, persona-aware lens composition). JIT'd
+  scan kernels.
+- **L4 thinking styles / NARS / JIT** (ms) →
+  orchestrates traversal ACROSS resolutions (starts at 64×64 cascade
+  to find candidates, narrows to 256×257 for family, drops to
+  4096×4096 for context, verifies at 16k fingerprint identity).
+
+**p64::CognitiveShader double-check conclusion:** architecturally
+clean. Operates at the coarsest (64×64) level; codec-sweep work at
+finest (16k); they compose in `cognitive_shader_driver::ShaderDriver`
+without overlap. Different layers of the ladder, different
+operations, different JIT targets (if any).
+
+Cross-ref: I10 (HEEL/HIP/BRANCH/TWIG/LEAF); three-level taxonomy entry
+above; `p64_bridge::cognitive_shader::CognitiveShader::cascade`;
+D1.1 `CodecKernelCache`; D1.2 `RotationKernel`; bgz17 `PaletteSemiring`.
+
+---
+
 ## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)

 **Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)