From 5c37f0c234a64625b92475f3729d853e6f3fdadf Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Mon, 20 Apr 2026 23:22:53 +0000
Subject: [PATCH 1/2] D1.2 rotation primitives + thinking-tissue north-star
 epiphany
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First real kernel deliverable of Phase 1: RotationKernel trait + three
impls (Identity / Hadamard / OPQ-stub) with typed RotationError.
95/95 cognitive-shader-driver tests pass under --features serve
(+15 new D1.2 tests).

crates/cognitive-shader-driver/src/rotation_kernel.rs (~330 LOC):

  RotationKernel trait — object-safe, Send+Sync+Debug:
    apply(&self, &mut [f32]) -> Result<(), RotationError>
    dim() -> u32
    signature() -> u64              # feeds CodecParams::kernel_signature
    backend() -> &'static str       # "avx512" | "stub" (never "scalar")

  IdentityRotation { dim }
    — zero-overhead pass-through; apply() is a no-op

  HadamardRotation { dim }
    — REAL in-place Sylvester butterfly, O(N log N) add/sub,
      no allocations
    — validates dim is power-of-two (Sylvester requirement)
    — Rule C compliance: stays at Tier-3 F32x16 (add/sub, not matmul;
      AMX adds no value per plan appendix §12 C)
    — rustc with target-cpu=x86-64-v4 can auto-vectorize the add/sub
      inner loop to AVX-512 → no JIT compilation needed

  OpqRotationStub { matrix_blob_id, dim }
    — real impl plugs into D1.1b CodecKernelEngine adapter +
      ndarray::hpc::jitson_cranelift::JitEngine + tile_dpbf16ps AMX
      matmul when amx_available()
    — apply() returns OpqMatrixNotLoaded (typed error) until the
      matrix-blob loader lands

  build(&Rotation, dim) -> Result<Box<dyn RotationKernel>> factory
    — dispatches on WireCodecParams.pre_rotation variant
    — returns typed errors on dim mismatch or non-pow2 Hadamard
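
The Sylvester butterfly described for HadamardRotation above can be
sketched standalone. This is a minimal illustration, not the code in
rotation_kernel.rs: `fwht_in_place` is a hypothetical name, it returns a
bool instead of RotationError, and it uses chunked slices rather than
index arithmetic.

```rust
/// In-place, unnormalised Walsh-Hadamard transform in Sylvester order.
/// Returns false and leaves `v` untouched when the length is zero or
/// not a power of two (hypothetical helper, not part of the patch).
fn fwht_in_place(v: &mut [f32]) -> bool {
    let n = v.len();
    if n == 0 || !n.is_power_of_two() {
        return false;
    }
    let mut stride = 1;
    while stride < n {
        // Each block of 2 * stride elements splits into a low and a high
        // half; the halves combine pairwise as (a, b) -> (a + b, a - b).
        for block in v.chunks_exact_mut(stride * 2) {
            let (lo, hi) = block.split_at_mut(stride);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        stride *= 2;
    }
    true
}
```

Calling it on [1.0, 0.0, 0.0, 0.0] yields [1.0, 1.0, 1.0, 1.0], and
applying it twice to an 8-element input scales every element by 8,
which are the same properties the Hadamard tests below check.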

Tests (15 new):
  Identity: noop + dim-mismatch error
  Hadamard:
    - first column: H_4 · [1,0,0,0] == [1,1,1,1]
    - H · H = n · I (applying twice scales by n, verified at N=8)
    - norm² preservation up to n× scale (verified at N=16)
    - rejects non-pow2 dim (N=6)
  OPQ stub: returns OpqMatrixNotLoaded with blob_id preserved
  build(): identity / hadamard / hadamard-dim-mismatch / hadamard-
          non-pow2 / opq-stub
  Signatures: distinct across variants, stable for same shape,
             blob-id-sensitive for OPQ
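
Note on the signature tests: they check distinctness and same-process
equality. The patch hashes with std's DefaultHasher, whose algorithm is
not guaranteed to stay the same across Rust releases. If signatures are
ever persisted or compared across builds (an assumption; this patch only
uses them in memory), a fixed algorithm would be needed. A minimal sketch
using FNV-1a (64-bit) as a swapped-in technique; `fnv1a64` and
`rotation_signature` are hypothetical names, not part of the patch:

```rust
// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into `state` with FNV-1a: xor the byte, then multiply.
fn fnv1a64(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Hypothetical release-stable signature over (variant tag, blob id, dim).
/// `blob_id` is Some only for the OPQ variant.
fn rotation_signature(tag: &str, blob_id: Option<u64>, dim: u32) -> u64 {
    let mut h = fnv1a64(FNV_OFFSET, tag.as_bytes());
    // Separator byte so the tag cannot run into the numeric fields.
    h = fnv1a64(h, &[0xff]);
    if let Some(id) = blob_id {
        h = fnv1a64(h, &id.to_le_bytes());
    }
    fnv1a64(h, &dim.to_le_bytes())
}
```

Little-endian byte order is fixed explicitly so the value does not
depend on the host platform.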

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md:
    D1.2 Queued → In PR

  EPIPHANIES.md PREPEND (two entries):

    1. "Thinking styles ARE codecs over the semantic field"
       (north-star forward-looking deposit, not a work item)
       — codec infrastructure IS the template for production-grade
       thinking tissue. Mapping table documents the codec→thinking
       correspondence: CodecParams↔ThinkingStyleParams,
       kernel_signature↔style_signature, token_agreement↔
       conclusion_agreement, etc. Phase 5+ drops in
       WireThinkCalibrate + ThinkingStyleKernelCache using the
       same scaffolding. Generalisation isn't porting — it's
       recognising thinking styles as a SPECIAL CASE of the
       codec pattern.

    2. "D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive"
       — narrows D1.1b scope by an estimated 30-40%. Only OPQ (matmul)
       needs Cranelift JIT emission; Identity (no-op) and Hadamard
       (butterfly) stay as plain-Rust Tier-3 F32x16 paths. Rustc's
       auto-vectorization under target-cpu=x86-64-v4 is expected to
       suffice for add/sub-structured kernels.

Rules honored:
  Rule A — in-place &mut [f32] slice, no allocations in apply()
  Rule B — ndarray::simd::* not needed for these shapes; the compiler
           can auto-vectorize the add/sub loops to AVX-512
  Rule C — Hadamard stays at Tier 3 (add/sub, no AMX benefit);
           OPQ stub will route to Tier 1 AMX when matrix loaded
  Rule D — Rotation variants come from YAML via WireRotation (D0.1)
  Rule E — kernel signature() + backend() are object-methods per
           the Wire-surface-IS-SIMD-surface pattern
  Rule F — no serialization anywhere; in-memory f32 buffer only

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 .claude/board/EPIPHANIES.md                   |  76 ++++
 .claude/board/STATUS_BOARD.md                 |   2 +-
 crates/cognitive-shader-driver/src/lib.rs     |   6 +
 .../src/rotation_kernel.rs                    | 380 ++++++++++++++++++
 4 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 crates/cognitive-shader-driver/src/rotation_kernel.rs
diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 3feb2932..58fef3f2 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -65,6 +65,82 @@ stay as historical references.
 
 ## Entries (reverse chronological)
 
+## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)
+
+**Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)
+
+A codec compresses tensor content into fingerprints; a thinking style
+compresses reasoning trajectories into NARS-revised beliefs. Same
+underlying operation — structure-preserving compression on a binary
+Hamming substrate. Different input/output domains, same substrate
+guarantees (E-SUBSTRATE-1, I-SUBSTRATE-MARKOV), same compile-and-swap
+machinery.
+
+**The codec infrastructure IS the template for production-grade
+thinking tissue.** When Phase 5+ activates:
+
+| Codec (shipped D0.1–D1.2, D1.1b queued) | Thinking-style analog |
+|---|---|
+| `CodecParams` | `ThinkingStyleParams { style, modulation_7d, nars_priors, fallback_chain, sigma_priority, semiring_choice }` |
+| `kernel_signature()` — excludes runtime drift | `style_signature()` — excludes per-cycle modulation drift |
+| `CodecKernelCache<H>` | `ThinkingStyleKernelCache<H>` — same generic scaffold |
+| JIT kernel = Cranelift-compiled decode | JIT kernel = compiled scan-walk on 36-node topology (already shipped ndarray-side via `scan_jit.rs` + `ScanParams`) |
+| **Token agreement** (I11 cert gate) | **Conclusion agreement** — same NARS-revised conclusions as reference style? |
+| Sweep grid = N codec candidates | Sweep grid = N (style × modulation × NARS fallback) candidates |
+| `/v1/shader/calibrate` | `/v1/shader/think-calibrate` |
+| `[FORMAL-SCAFFOLD]` 5 pillars | **Same scaffold** — E-SUBSTRATE-1 covers any transition under bundle |
+
+**Generalisation isn't "port codec pattern to thinking"** — it's
+recognising thinking styles as a SPECIAL CASE of the codec pattern we
+just built. When Phase 5+ lands, `WireThinkCalibrate` +
+`ThinkingStyleKernelCache` + `conclusion_agreement` metric drop in
+alongside the codec versions. Same JIT engine, same tests, same
+board-hygiene discipline.
+
+**The phrase "production-grade thinking tissue"** names the telos
+cleanly: once codec infra is at Phase 3 token-agreement pass rates,
+cloning to thinking styles yields production-grade swappable
+reasoning — YAML-configured, JIT-compiled, sweep-certified. No
+rebuild per new style, no black box, signature-keyed reproducibility.
+
+**Cross-ref:** D0.6 `CodecParams` (the parameter-shape template);
+D1.1 `CodecKernelCache<H>` (the cache pattern — generic-over-H is the
+wedge for reuse); I5 (thinking IS an AdjacencyStore — already
+topologically unified with data graph); codec-sweep-via-lab-infra-v1.
+
+---
+
+## 2026-04-20 — D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive
+
+**Status:** FINDING
+
+D1.2's HadamardRotation is implemented as a plain Rust in-place
+Sylvester butterfly (O(N log N) add/sub, no allocations). It does NOT
+need JIT compilation or Cranelift code emission because:
+
+1. **Fixed shape** — the butterfly structure is identical across all
+   power-of-two dims. Rust's compiler (under `target-cpu=x86-64-v4`)
+   already emits AVX-512 add/sub from the straight-line loop.
+2. **Not matmul** — Hadamard is a pattern of adds and subtracts,
+   never a dot product. Per Rule C polyfill hierarchy, matmul-heavy
+   paths benefit from AMX (Tier 1); add/sub stays at Tier 3 F32x16.
+   AMX gives no speedup here — confirmed in plan Appendix §12 C.
+
+**Consequence for D1.1b (Cranelift wiring):** only OPQ rotation needs
+the JIT path — it's the one that's actually a learned matmul. The
+Cranelift integration scope narrows: we don't need to JIT-compile
+Identity (no-op) or Hadamard (butterfly); just OPQ (matmul) and the
+main codec decode loop (ADC distance with palette lookup).
+
+This reduces D1.1b scope by maybe 30-40% — fewer kernel shapes to
+emit, only the ones that actually benefit.
+
+Cross-ref: D1.2 `rotation_kernel.rs::HadamardRotation`; Rule C
+(polyfill hierarchy); plan Appendix B (CartanCascade harmonic
+compression ratios rely on real Hadamard, so this matters).
+
+---
+
 ## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine
 
 **Status:** FINDING / CORRECTION
diff --git a/.claude/board/STATUS_BOARD.md b/.claude/board/STATUS_BOARD.md
index 9fd1ed7a..17e81136 100644
--- a/.claude/board/STATUS_BOARD.md
+++ b/.claude/board/STATUS_BOARD.md
@@ -63,7 +63,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
 |---|---|---|---|
 | D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
 | D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
-| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
+| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as `RotationKernel` impls | **In PR** | branch — `RotationKernel` trait (Send+Sync+Debug, object-safe) + `IdentityRotation` (no-op) + `HadamardRotation` (real Sylvester butterfly, O(N log N) in-place, norm²-scaling verified) + `OpqRotationStub` (matrix-blob-id placeholder for D1.1b) + `build(&Rotation, dim)` factory + `RotationError` typed errors + 15 tests. Hadamard stays at Tier-3 F32x16 (add/sub, not matmul → no AMX benefit per Rule C). |
 | D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |
 
 ### Phase 2 — Token-agreement harness (I11 cert gate) — Queued
diff --git a/crates/cognitive-shader-driver/src/lib.rs b/crates/cognitive-shader-driver/src/lib.rs
index e944ce08..7cb3cdbe 100644
--- a/crates/cognitive-shader-driver/src/lib.rs
+++ b/crates/cognitive-shader-driver/src/lib.rs
@@ -125,6 +125,12 @@ pub mod auto_detect;
 #[cfg(feature = "serve")]
 pub mod codec_kernel_cache;
 
+// D1.2 — rotation primitives (Identity / Hadamard / OPQ-stub). LAB-ONLY.
+// Hadamard is real (in-place butterfly); OPQ is stub pending D1.1b's
+// ndarray::hpc::jitson_cranelift::JitEngine adapter + matrix-blob loader.
+#[cfg(feature = "serve")]
+pub mod rotation_kernel;
+
 // Axum REST server. LAB-ONLY.
 #[cfg(feature = "serve")]
 pub mod serve;
diff --git a/crates/cognitive-shader-driver/src/rotation_kernel.rs b/crates/cognitive-shader-driver/src/rotation_kernel.rs
new file mode 100644
index 00000000..d3576b10
--- /dev/null
+++ b/crates/cognitive-shader-driver/src/rotation_kernel.rs
@@ -0,0 +1,380 @@
+//! **LAB-ONLY.** D1.2 — rotation primitives as `RotationKernel`
+//! implementations.
+//!
+//! Three variants matching `lance_graph_contract::cam::Rotation`:
+//!
+//! - **Identity** — no-op; zero-overhead pass-through. `signature()` only
+//!   depends on dim so the JIT cache hit is trivial.
+//! - **Hadamard** — real Sylvester butterfly in-place, `O(N log N)` add/sub
+//!   operations. No JIT needed — the butterfly is a fixed-shape kernel and
+//!   plain Rust compiles to AVX-512 under `target-cpu=x86-64-v4`.
+//!   Per Rule C: Hadamard stays at Tier-3 F32x16 because it's add/sub,
+//!   not matmul — AMX adds no value here (confirmed in plan appendix §12 C).
+//! - **OPQ** — learned rotation matmul; placeholder stub. Real impl
+//!   plugs into `ndarray::hpc::jitson_cranelift::JitEngine` via the
+//!   D1.1b `CodecKernelEngine` adapter and uses AMX tile_dpbf16ps when
+//!   `amx_available()`.
+//!
+//! Per ndarray/.claude/rules/data-flow.md: in-place `&mut [f32]` slice;
+//! no heap allocations inside rotation; computation paths never mutate
+//! `self` — the `RotationKernel` trait's `&self` receiver is load-bearing.
+
+use lance_graph_contract::cam::Rotation;
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+/// Error produced when a rotation cannot be applied — dimensional
+/// mismatch, non-power-of-two for Hadamard, or missing OPQ matrix.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum RotationError {
+    /// Input slice length does not match the kernel's declared dim.
+    DimMismatch { expected: usize, actual: usize },
+    /// Hadamard dim must be a power of two (Sylvester construction).
+    HadamardNotPow2 { dim: u32 },
+    /// OPQ rotation matrix not loaded (stub path).
+    OpqMatrixNotLoaded { matrix_blob_id: u64 },
+}
+
+impl std::fmt::Display for RotationError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::DimMismatch { expected, actual } => {
+                write!(f, "rotation input dim mismatch: expected {expected}, got {actual}")
+            }
+            Self::HadamardNotPow2 { dim } => {
+                write!(f, "Hadamard dim must be power of two, got {dim}")
+            }
+            Self::OpqMatrixNotLoaded { matrix_blob_id } => {
+                write!(f, "OPQ rotation matrix blob {matrix_blob_id:#x} not loaded")
+            }
+        }
+    }
+}
+
+impl std::error::Error for RotationError {}
+
+/// A compiled rotation kernel.
+///
+/// Implementors run the rotation in-place on a `&mut [f32]` slice.
+/// The trait is object-safe so callers can hold a `Box<dyn RotationKernel>`
+/// when the variant is chosen at runtime from a `CodecParams::pre_rotation`.
+pub trait RotationKernel: Send + Sync + std::fmt::Debug {
+    /// Apply the rotation in place. Contract: modifies `vec` in-place; returns `Err`
+    /// on dim mismatch, non-power-of-two Hadamard dim, or an unloaded OPQ matrix.
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError>;
+
+    /// Declared input dimension. Used by the cache-signature computation
+    /// and by the `CodecKernelCache` key (distinct dims → distinct kernels).
+    fn dim(&self) -> u32;
+
+    /// Deterministic hash over the kernel's identity, used as part of
+    /// `CodecParams::kernel_signature()`. `DefaultHasher` output may change between Rust releases.
+    fn signature(&self) -> u64;
+
+    /// Backend tier label for the SIMD dispatch trace — "avx512" for
+    /// identity/Hadamard on x86_64-v4, "amx" for OPQ when AMX is live,
+    /// "stub" for OPQ without a loaded matrix. Never "scalar" — iron rule.
+    fn backend(&self) -> &'static str;
+}
+
+/// Build a boxed kernel from a `Rotation` enum + input dim.
+///
+/// This is the factory the JIT cache's compile closure calls:
+/// `cache.get_or_compile(params, || build(&params.pre_rotation, d)?)`.
+pub fn build(rotation: &Rotation, dim: u32) -> Result<Box<dyn RotationKernel>, RotationError> {
+    match rotation {
+        Rotation::Identity => Ok(Box::new(IdentityRotation { dim })),
+        Rotation::Hadamard { dim: h_dim } => {
+            // Respect the rotation's declared dim — caller must size to match.
+            if *h_dim != dim {
+                return Err(RotationError::DimMismatch {
+                    expected: *h_dim as usize,
+                    actual: dim as usize,
+                });
+            }
+            if *h_dim == 0 || !h_dim.is_power_of_two() {
+                return Err(RotationError::HadamardNotPow2 { dim: *h_dim });
+            }
+            Ok(Box::new(HadamardRotation { dim: *h_dim }))
+        }
+        Rotation::Opq { matrix_blob_id, dim: o_dim } => {
+            if *o_dim != dim {
+                return Err(RotationError::DimMismatch {
+                    expected: *o_dim as usize,
+                    actual: dim as usize,
+                });
+            }
+            // Stub — D1.1b wires the real matrix load through
+            // ndarray::hpc::jitson_cranelift::JitEngine + tile_dpbf16ps.
+            Ok(Box::new(OpqRotationStub {
+                matrix_blob_id: *matrix_blob_id,
+                dim: *o_dim,
+            }))
+        }
+    }
+}
+
+// ─── Identity ────────────────────────────────────────────────────────────
+
+/// Zero-overhead pass-through rotation. `apply()` is a no-op.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct IdentityRotation {
+    pub dim: u32,
+}
+
+impl RotationKernel for IdentityRotation {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        if vec.len() != self.dim as usize {
+            return Err(RotationError::DimMismatch {
+                expected: self.dim as usize,
+                actual: vec.len(),
+            });
+        }
+        // No-op.
+        Ok(())
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "identity".hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "avx512" }
+}
+
+// ─── Hadamard (Sylvester butterfly) ──────────────────────────────────────
+
+/// Sylvester Hadamard transform via in-place butterfly.
+///
+/// For dim `N = 2^k`, the Sylvester Hadamard matrix `H_N` satisfies
+/// `H_N · H_N^T = N · I`. We apply `H_N` in-place using the classic
+/// butterfly algorithm: `log2(N)` stages, each combining pairs of elements
+/// at stride `2^stage` with `(a, b) → (a+b, a-b)`.
+///
+/// Complexity: `O(N log N)` add/sub operations. No allocations.
+/// No AMX benefit (Rule C) — Hadamard is butterfly add/sub, not matmul,
+/// so it stays at Tier-3 F32x16 (AVX-512 baseline).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct HadamardRotation {
+    pub dim: u32,
+}
+
+impl RotationKernel for HadamardRotation {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        let n = self.dim as usize;
+        if vec.len() != n {
+            return Err(RotationError::DimMismatch { expected: n, actual: vec.len() });
+        }
+        if n == 0 || !n.is_power_of_two() {
+            return Err(RotationError::HadamardNotPow2 { dim: self.dim });
+        }
+        // In-place Sylvester butterfly. `stride` doubles each stage.
+        let mut stride = 1usize;
+        while stride < n {
+            let mut i = 0;
+            while i < n {
+                for j in 0..stride {
+                    let a_idx = i + j;
+                    let b_idx = i + j + stride;
+                    let a = vec[a_idx];
+                    let b = vec[b_idx];
+                    vec[a_idx] = a + b;
+                    vec[b_idx] = a - b;
+                }
+                i += stride * 2;
+            }
+            stride *= 2;
+        }
+        Ok(())
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "hadamard".hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "avx512" }
+}
+
+// ─── OPQ (stub — real impl plugs JIT engine in D1.1b) ────────────────────
+
+/// OPQ learned rotation matmul — stub. `apply()` returns
+/// `OpqMatrixNotLoaded`.
+///
+/// The real implementation loads the rotation matrix from a Lance blob
+/// column (one-time per `matrix_blob_id`) and applies it via
+/// `ndarray::hpc::amx_matmul::tile_dpbf16ps` when
+/// `ndarray::simd_amx::amx_available()` (Tier-1), falling through to
+/// VNNI (Tier-2) or F32x16 matmul (Tier-3) per the polyfill hierarchy.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct OpqRotationStub {
+    pub matrix_blob_id: u64,
+    pub dim: u32,
+}
+
+impl RotationKernel for OpqRotationStub {
+    fn apply(&self, vec: &mut [f32]) -> Result<(), RotationError> {
+        if vec.len() != self.dim as usize {
+            return Err(RotationError::DimMismatch {
+                expected: self.dim as usize,
+                actual: vec.len(),
+            });
+        }
+        // Stub — no matrix loaded yet.
+        Err(RotationError::OpqMatrixNotLoaded { matrix_blob_id: self.matrix_blob_id })
+    }
+
+    fn dim(&self) -> u32 { self.dim }
+
+    fn signature(&self) -> u64 {
+        let mut h = DefaultHasher::new();
+        "opq".hash(&mut h);
+        self.matrix_blob_id.hash(&mut h);
+        self.dim.hash(&mut h);
+        h.finish()
+    }
+
+    fn backend(&self) -> &'static str { "stub" }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn identity_rotation_is_noop() {
+        let r = IdentityRotation { dim: 8 };
+        let mut v = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
+        let before = v.clone();
+        r.apply(&mut v).unwrap();
+        assert_eq!(v, before);
+        assert_eq!(r.backend(), "avx512");
+    }
+
+    #[test]
+    fn identity_rotation_rejects_dim_mismatch() {
+        let r = IdentityRotation { dim: 8 };
+        let mut v = vec![0.0; 16];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::DimMismatch { expected: 8, actual: 16 }));
+    }
+
+    #[test]
+    fn hadamard_orthogonality_property_n4() {
+        // H_4 applied to [1,0,0,0] produces [1,1,1,1] (first column of H_4).
+        let r = HadamardRotation { dim: 4 };
+        let mut v = vec![1.0, 0.0, 0.0, 0.0];
+        r.apply(&mut v).unwrap();
+        assert_eq!(v, vec![1.0, 1.0, 1.0, 1.0]);
+    }
+
+    #[test]
+    fn hadamard_n8_applied_twice_scales_by_n() {
+        // H · H = n · I ⇒ applying twice multiplies every element by n.
+        let r = HadamardRotation { dim: 8 };
+        let input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
+        let mut v = input.clone();
+        r.apply(&mut v).unwrap();
+        r.apply(&mut v).unwrap();
+        let n = 8.0;
+        for (a, b) in v.iter().zip(input.iter()) {
+            assert!((a - n * b).abs() < 1e-4, "expected {} got {}", n * b, a);
+        }
+    }
+
+    #[test]
+    fn hadamard_rejects_non_pow2_dim() {
+        let r = HadamardRotation { dim: 6 };
+        let mut v = vec![0.0; 6];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::HadamardNotPow2 { dim: 6 }));
+    }
+
+    #[test]
+    fn hadamard_preserves_norm_squared_up_to_scale() {
+        // ‖Hx‖² = n ‖x‖² for Sylvester Hadamard.
+        let r = HadamardRotation { dim: 16 };
+        let input: Vec<f32> = (0..16).map(|i| (i + 1) as f32).collect();
+        let norm_sq_in: f32 = input.iter().map(|x| x * x).sum();
+        let mut v = input.clone();
+        r.apply(&mut v).unwrap();
+        let norm_sq_out: f32 = v.iter().map(|x| x * x).sum();
+        let expected = 16.0 * norm_sq_in;
+        let rel_err = (norm_sq_out - expected).abs() / expected;
+        assert!(rel_err < 1e-5, "norm² out {norm_sq_out} vs expected {expected}");
+    }
+
+    #[test]
+    fn opq_stub_returns_matrix_not_loaded() {
+        let r = OpqRotationStub { matrix_blob_id: 0xDEAD_BEEF, dim: 4096 };
+        let mut v = vec![0.0; 4096];
+        let err = r.apply(&mut v).unwrap_err();
+        assert!(matches!(err, RotationError::OpqMatrixNotLoaded { matrix_blob_id: 0xDEAD_BEEF }));
+        assert_eq!(r.backend(), "stub");
+    }
+
+    #[test]
+    fn build_identity() {
+        let k = build(&Rotation::Identity, 256).unwrap();
+        assert_eq!(k.dim(), 256);
+        assert_eq!(k.backend(), "avx512");
+    }
+
+    #[test]
+    fn build_hadamard() {
+        let k = build(&Rotation::Hadamard { dim: 4096 }, 4096).unwrap();
+        assert_eq!(k.dim(), 4096);
+        assert_eq!(k.backend(), "avx512");
+    }
+
+    #[test]
+    fn build_hadamard_rejects_mismatched_dim() {
+        let err = build(&Rotation::Hadamard { dim: 4096 }, 2048).unwrap_err();
+        assert!(matches!(err, RotationError::DimMismatch { expected: 4096, actual: 2048 }));
+    }
+
+    #[test]
+    fn build_hadamard_rejects_non_pow2() {
+        let err = build(&Rotation::Hadamard { dim: 100 }, 100).unwrap_err();
+        assert!(matches!(err, RotationError::HadamardNotPow2 { dim: 100 }));
+    }
+
+    #[test]
+    fn build_opq_returns_stub() {
+        let k = build(&Rotation::Opq { matrix_blob_id: 42, dim: 4096 }, 4096).unwrap();
+        assert_eq!(k.dim(), 4096);
+        assert_eq!(k.backend(), "stub");
+    }
+
+    #[test]
+    fn kernel_signatures_are_distinct_across_variants() {
+        let id = IdentityRotation { dim: 256 };
+        let had = HadamardRotation { dim: 256 };
+        let opq = OpqRotationStub { matrix_blob_id: 1, dim: 256 };
+        assert_ne!(id.signature(), had.signature());
+        assert_ne!(id.signature(), opq.signature());
+        assert_ne!(had.signature(), opq.signature());
+    }
+
+    #[test]
+    fn kernel_signatures_stable_for_same_shape() {
+        let a = HadamardRotation { dim: 4096 };
+        let b = HadamardRotation { dim: 4096 };
+        assert_eq!(a.signature(), b.signature());
+    }
+
+    #[test]
+    fn opq_signature_depends_on_matrix_blob_id() {
+        let a = OpqRotationStub { matrix_blob_id: 1, dim: 4096 };
+        let b = OpqRotationStub { matrix_blob_id: 2, dim: 4096 };
+        assert_ne!(a.signature(), b.signature());
+    }
+}

From aad6e6a9e067f87b38ea82252d81a3959f555d1b Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Mon, 20 Apr 2026 23:32:28 +0000
Subject: [PATCH 2/2] epiphanies: resolution ladder + shader/engine boundary +
 thinking-tissue taxonomy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three related forward-looking deposits from this session's strategic
thread (codec IS thinking at scale → thinking-styles are codecs over
the semantic field → cognitive shader vs thinking engine boundary):

1. Thinking styles ARE codecs over the semantic field (north star;
   entry added in PATCH 1/2, restated here for context).
   The codec infrastructure IS the template for production-grade
   thinking tissue. Codec → thinking-style mapping:
     CodecParams ↔ ThinkingStyleParams
     kernel_signature ↔ style_signature
     CodecKernelCache<H> ↔ ThinkingStyleKernelCache<H>
     token_agreement ↔ conclusion_agreement
   Generalisation isn't porting — it's recognising thinking styles
   as a SPECIAL CASE of the codec pattern.

2. Resolution ladder 64×64 > 256×257 >> 4096×4096 > 16k (user-named).
   The 5-layer stack is a resolution ladder, not a layer cake:
     64×64   — p64 topology mask (HEEL)
     256×257 — bgz17 palette distance (HIP)
     4096×4096 — cross-vocab / cross-context (BRANCH/TWIG)
     16 K    — Fingerprint<256> identity (LEAF)
   The `>>` between 256×257 and 4096×4096 is the big jump — where
   palette-level meets vocabulary-level. Each JIT targets its own
   resolution, no overlap. p64::CognitiveShader operates at
   coarsest (64×64); codec-sweep D1.x at finest (16k); they compose
   in cognitive-shader-driver::ShaderDriver. p64 double-check:
   architecturally clean; this work reimplements none of p64.

3. Shader vs engine: statelessness is the boundary.
   Cognitive shader = stateless atomic compute (eye — reports
   current frame, no memory).
   Thinking engine = stateful orchestrator (mind — assembles frames
   into narrative, carries persona/qualia/world_model across cycles).
   engine_bridge.rs is the seam. Codec-flexibility-as-thinking lands
   at the ENGINE level, not shader level — D5/Phase 5+ drops into
   thinking-engine mid-layer.
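
The sizes in the resolution ladder (item 2) can be checked with a few
lines of arithmetic. A minimal sketch, assuming each rung is counted as
rows × columns of table cells; `ladder` and `jumps` are hypothetical
names used only for this illustration. The 16k rung is left out because
it counts bits rather than table cells.

```rust
/// Cell counts for the three table-shaped rungs named in item 2.
fn ladder() -> [(&'static str, u64); 3] {
    [
        ("64x64 p64 topology mask", 64 * 64),
        ("256x257 bgz17 palette table", 256 * 257),
        ("4096x4096 cross-vocab", 4096 * 4096),
    ]
}

/// Growth factor between consecutive rungs.
fn jumps() -> Vec<f64> {
    ladder()
        .windows(2)
        .map(|w| w[1].1 as f64 / w[0].1 as f64)
        .collect()
}
```

Under that counting the first step grows by roughly 16× and the second
by roughly 255×, which is consistent with `>>` marking the larger jump.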

All three epiphanies are forward-looking deposits (not current work
items). They clarify where future work lands when codec-sweep Phase 1
completes and Phase 5+ generalises.

Cross-references:
  - I10 HEEL/HIP/BRANCH/TWIG/LEAF (LATEST_STATE.md)
  - I5 thinking IS AdjacencyStore
  - p64_bridge::cognitive_shader::CognitiveShader (64×64 cascade)
  - thinking-engine crate structure (CLAUDE.md)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 .claude/board/EPIPHANIES.md | 98 +++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 58fef3f2..9a18aba2 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -65,6 +65,104 @@ stay as historical references.
 
 ## Entries (reverse chronological)
 
+## 2026-04-20 — Shader vs engine: statelessness is the boundary
+
+**Status:** FINDING (sharpens the three-level taxonomy)
+
+**Cognitive shader** = stateless atomic compute. Given `ShaderDispatch`
++ `BindSpace` columns, returns `ShaderHit`s + `MetaWord`. Knows nothing
+of why it fires. Output is one-cycle-wide, no history.
+
+**Thinking engine** = stateful orchestrator. Calls `shader.dispatch()`
+many times per cognitive cycle; composes per-lens hits into
+persona/qualia/world_model/ghost state; revises beliefs for the next
+cycle. The cognitive stack IS the state.
+
+**The engine_bridge is where they meet** —
+`cognitive-shader-driver/src/engine_bridge.rs` is the seam. Shader
+side: `ShaderDriver::dispatch` stateless. Engine side:
+`cognitive_stack::cycle` accumulates dispatches through
+`bf16_engine` / `signed_engine` / `composite_engine` / `dual_engine` /
+`layered` / `domino`, folds into persona/qualia, emits state for next
+cycle.
+
+**Analogy:** shader = eye (no memory, reports the current frame);
+engine = mind (memory, assembles frames into narrative, counterfactually
+imagines alternatives).
+
+**Where codec-flexibility-as-thinking lands:** the **engine** level,
+not the shader level. A "new thinking style" = a new engine
+configuration (lens composition, persona, qualia-update rule) that
+picks DIFFERENT shader configs per cycle. Shader stays the same; the
+engine's orchestration changes. That's why Phase 5+ "production-grade
+thinking tissue" drops into mid (engine), not L2 (shader).
+
+**Concrete Phase 1-5 shipping:** codec-sweep D1.x work = shader layer
+(tensor decode primitives). Engine-level codec-flexibility (swap
+lenses via YAML) = D5 / Phase 5+, plugging INTO the codec infrastructure.
+
+Cross-ref: three-level taxonomy above; resolution-ladder entry
+`64×64 > 256×257 >> 4096×4096 > 16k`; `engine_bridge.rs` seam.
+
+---
+
+## 2026-04-20 — Resolution hierarchy: `64×64 > 256×257 >> 4096×4096 > 16k` (user-named)
+
+**Status:** FINDING (capstone of the three-level taxonomy from earlier this session)
+
+The 5-layer stack is a **resolution ladder**, not a layer cake. Each
+level operates at its own granularity and has its own "shader" /
+"kernel cache" / "distance table" at that scale:
+
+| Size | Role | Where | HHTL stage (I10) |
+|---|---|---|---|
+| **64×64** | p64 topology mask — 8 predicate planes × 64 rows × u64 — "which archetype blocks relate via predicate z" | `p64_bridge::cognitive_shader::CognitiveShader` | HEEL (coarse basin) |
+| **256×257** | bgz17 palette distance table — 256 archetypes × 256 + 1 sentinel — O(1) lookup `semiring.distance(a, b)` | `bgz17::PaletteSemiring` | HIP (family sharpen) |
+| **4096×4096** | Cross-vocabulary / cross-context correlation — COCA × COCA, or 4096 τ-prefix × 4096 slot space | ndarray `ScanParams` JIT (`jitson_cranelift`) | BRANCH / TWIG |
+| **16 K** | Individual fingerprint bit identity — 16384-bit `Fingerprint<256>` | `ndarray::simd::Fingerprint<256>` + codec decoder (D1.x) | LEAF (exact member) |
+
+**The `>>` between 256×257 and 4096×4096 is the big jump** (16× per axis, ~255× in entries)
+matching HIP → BRANCH refinement. That's where palette-level (one
+row of the codebook) meets vocabulary-level (COCA 4096). Below that
+jump, everything is O(1) table lookup; above it, JIT kernels become
+worth the compile cost.
+
+**Each JIT targets its own resolution — no overlap:**
+
+- p64 cascade: 64×64 bitmask ops. Not JIT'd (bit tricks in hot loop
+  already optimal under AVX-512).
+- bgz17 palette: 256×256 precomputed. Not JIT'd (memory-bound).
+- ndarray ScanParams: 4096×4096 scan kernels. **JIT'd via
+  `jitson_cranelift::JitEngine`** — shipped.
+- Codec kernels (D1.x): 16k bit-level tensor decode. **Will be JIT'd
+  via D1.1b `CodecKernelEngine` adapter**. Scaffold (D1.1) + rotation
+  primitives (D1.2) landed; Cranelift IR emission deferred to D1.1b.
+
+**Three-level taxonomy (from earlier this session) maps onto the
+resolution ladder:**
+
+- **L2 small-precision cognitive shaders** (ns budget) →
+  64×64 + 256×257 (p64 + bgz17 palette). Pure table lookups.
+- **mid thinking-engine layers** (µs-ms) →
+  4096×4096 (cross-vocab, persona-aware lens composition). JIT'd
+  scan kernels.
+- **L4 thinking styles / NARS / JIT** (ms) →
+  orchestrates traversal ACROSS resolutions (starts at 64×64 cascade
+  to find candidates, narrows to 256×257 for family, drops to
+  4096×4096 for context, verifies at 16k fingerprint identity).
+
+**p64::CognitiveShader double-check conclusion:** architecturally
+clean. Operates at the coarsest (64×64) level; codec-sweep work at
+finest (16k); they compose in `cognitive_shader_driver::ShaderDriver`
+without overlap. Different layers of the ladder, different
+operations, different JIT targets (if any).
+
+Cross-ref: I10 (HEEL/HIP/BRANCH/TWIG/LEAF); three-level taxonomy entry
+above; `p64_bridge::cognitive_shader::CognitiveShader::cascade`;
+D1.1 `CodecKernelCache`; D1.2 `RotationKernel`; bgz17 `PaletteSemiring`.
+
+---
+
 ## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)
 
 **Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)