Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 174 additions & 0 deletions .claude/board/EPIPHANIES.md
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,180 @@ stay as historical references.

## Entries (reverse chronological)

## 2026-04-20 — Shader vs engine: statelessness is the boundary

**Status:** FINDING (sharpens the three-level taxonomy)

**Cognitive shader** = stateless atomic compute. Given `ShaderDispatch`
+ `BindSpace` columns, returns `ShaderHit`s + `MetaWord`. Knows nothing
of why it fires. Output is one-cycle-wide, no history.

**Thinking engine** = stateful orchestrator. Calls `shader.dispatch()`
many times per cognitive cycle; composes per-lens hits into
persona/qualia/world_model/ghost state; revises beliefs for the next
cycle. The cognitive stack IS the state.

**The engine_bridge is where they meet** —
`cognitive-shader-driver/src/engine_bridge.rs` is the seam. Shader
side: `ShaderDriver::dispatch` stateless. Engine side:
`cognitive_stack::cycle` accumulates dispatches through
`bf16_engine` / `signed_engine` / `composite_engine` / `dual_engine` /
`layered` / `domino`, folds into persona/qualia, emits state for next
cycle.

**Analogy:** shader = eye (no memory, reports the current frame);
engine = mind (memory, assembles frames into narrative, counterfactually
imagines alternatives).

**Where codec-flexibility-as-thinking lands:** the **engine** level,
not the shader level. A "new thinking style" = a new engine
configuration (lens composition, persona, qualia-update rule) that
picks DIFFERENT shader configs per cycle. Shader stays the same; the
engine's orchestration changes. That's why Phase 5+ "production-grade
thinking tissue" drops into mid (engine), not L2 (shader).

**Concrete Phase 1-5 shipping:** codec-sweep D1.x work = shader layer
(tensor decode primitives). Engine-level codec-flexibility (swap
lenses via YAML) = D5 / Phase 5+, plugging INTO the codec infrastructure.

Cross-ref: three-level taxonomy above; resolution-ladder entry
`64×64 > 256×257 >> 4096×4096 > 16k`; `engine_bridge.rs` seam.

---

## 2026-04-20 — Resolution hierarchy: `64×64 > 256×257 >> 4096×4096 > 16k` (user-named)

**Status:** FINDING (capstone of the three-level taxonomy from earlier this session)

The 5-layer stack is a **resolution ladder**, not a layer cake. Each
level operates at its own granularity and has its own "shader" /
"kernel cache" / "distance table" at that scale:

| Size | Role | Where | HHTL stage (I10) |
|---|---|---|---|
| **64×64** | p64 topology mask — 8 predicate planes × 64 rows × u64 — "which archetype blocks relate via predicate z" | `p64_bridge::cognitive_shader::CognitiveShader` | HEEL (coarse basin) |
| **256×257** | bgz17 palette distance table — 256 archetypes × 256 + 1 sentinel — O(1) lookup `semiring.distance(a, b)` | `bgz17::PaletteSemiring` | HIP (family sharpen) |
| **4096×4096** | Cross-vocabulary / cross-context correlation — COCA × COCA, or 4096 τ-prefix × 4096 slot space | ndarray `ScanParams` JIT (`jitson_cranelift`) | BRANCH / TWIG |
| **16 K** | Individual fingerprint bit identity — 16384-bit `Fingerprint<256>` | `ndarray::simd::Fingerprint<256>` + codec decoder (D1.x) | LEAF (exact member) |

**The `>>` between 256×257 and 4096×4096 is the big jump** (~64×)
matching HIP → BRANCH refinement. That's where palette-level (one
row of the codebook) meets vocabulary-level (COCA 4096). Below that
jump, everything is O(1) table lookup; above it, JIT kernels become
worth the compile cost.

**Each JIT targets its own resolution — no overlap:**

- p64 cascade: 64×64 bitmask ops. Not JIT'd (bit tricks in hot loop
already optimal under AVX-512).
- bgz17 palette: 256×256 precomputed. Not JIT'd (memory-bound).
- ndarray ScanParams: 4096×4096 scan kernels. **JIT'd via
`jitson_cranelift::JitEngine`** — shipped.
- Codec kernels (D1.x): 16k bit-level tensor decode. **Will be JIT'd
via D1.1b `CodecKernelEngine` adapter**. Scaffold (D1.1) + rotation
primitives (D1.2) landed; Cranelift IR emission deferred to D1.1b.

**Three-level taxonomy (from earlier this session) maps onto the
resolution ladder:**

- **L2 small-precision cognitive shaders** (ns budget) →
64×64 + 256×257 (p64 + bgz17 palette). Pure table lookups.
- **mid thinking-engine layers** (µs-ms) →
4096×4096 (cross-vocab, persona-aware lens composition). JIT'd
scan kernels.
- **L4 thinking styles / NARS / JIT** (ms) →
orchestrates traversal ACROSS resolutions (starts at 64×64 cascade
to find candidates, narrows to 256×257 for family, drops to
4096×4096 for context, verifies at 16k fingerprint identity).

**p64::CognitiveShader double-check conclusion:** architecturally
clean. Operates at the coarsest (64×64) level; codec-sweep work at
finest (16k); they compose in `cognitive_shader_driver::ShaderDriver`
without overlap. Different layers of the ladder, different
operations, different JIT targets (if any).

Cross-ref: I10 (HEEL/HIP/BRANCH/TWIG/LEAF); three-level taxonomy entry
above; `p64_bridge::cognitive_shader::CognitiveShader::cascade`;
D1.1 `CodecKernelCache`; D1.2 `RotationKernel`; bgz17 `PaletteSemiring`.

---

## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)

**Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)

A codec compresses tensor content into fingerprints; a thinking style
compresses reasoning trajectories into NARS-revised beliefs. Same
underlying operation — structure-preserving compression on a binary
Hamming substrate. Different input/output domains, same substrate
guarantees (E-SUBSTRATE-1, I-SUBSTRATE-MARKOV), same compile-and-swap
machinery.

**The codec infrastructure IS the template for production-grade
thinking tissue.** When Phase 5+ activates:

| Codec (shipped D0.1–D1.2, D1.1b queued) | Thinking-style analog |
|---|---|
| `CodecParams` | `ThinkingStyleParams { style, modulation_7d, nars_priors, fallback_chain, sigma_priority, semiring_choice }` |
| `kernel_signature()` — excludes runtime drift | `style_signature()` — excludes per-cycle modulation drift |
| `CodecKernelCache<H>` | `ThinkingStyleKernelCache<H>` — same generic scaffold |
| JIT kernel = Cranelift-compiled decode | JIT kernel = compiled scan-walk on 36-node topology (already shipped ndarray-side via `scan_jit.rs` + `ScanParams`) |
| **Token agreement** (I11 cert gate) | **Conclusion agreement** — same NARS-revised conclusions as reference style? |
| Sweep grid = N codec candidates | Sweep grid = N (style × modulation × NARS fallback) candidates |
| `/v1/shader/calibrate` | `/v1/shader/think-calibrate` |
| `[FORMAL-SCAFFOLD]` 5 pillars | **Same scaffold** — E-SUBSTRATE-1 covers any transition under bundle |

**Generalisation isn't "port codec pattern to thinking"** — it's
recognising thinking styles as a SPECIAL CASE of the codec pattern we
just built. When Phase 5+ lands, `WireThinkCalibrate` +
`ThinkingStyleKernelCache` + `conclusion_agreement` metric drop in
alongside the codec versions. Same JIT engine, same tests, same
board-hygiene discipline.

**The phrase "production-grade thinking tissue"** names the telos
cleanly: once codec infra is at Phase 3 token-agreement pass rates,
cloning to thinking styles yields production-grade swappable
reasoning — YAML-configured, JIT-compiled, sweep-certified. No
rebuild per new style, no black box, signature-keyed reproducibility.

**Cross-ref:** D0.6 `CodecParams` (the parameter-shape template);
D1.1 `CodecKernelCache<H>` (the cache pattern — generic-over-H is the
wedge for reuse); I5 (thinking IS an AdjacencyStore — already
topologically unified with data graph); codec-sweep-via-lab-infra-v1.

---

## 2026-04-20 — D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive

**Status:** FINDING

D1.2's HadamardRotation is implemented as a plain Rust in-place
Sylvester butterfly (O(N log N) add/sub, no allocations). It does NOT
need JIT compilation or Cranelift code emission because:

1. **Fixed shape** — the butterfly structure is identical across all
power-of-two dims. Rust's compiler (under `target-cpu=x86-64-v4`)
already emits AVX-512 add/sub from the straight-line loop.
2. **Not matmul** — Hadamard is a pattern of adds and subtracts,
never a dot product. Per Rule C polyfill hierarchy, matmul-heavy
paths benefit from AMX (Tier 1); add/sub stays at Tier 3 F32x16.
AMX gives no speedup here — confirmed in plan Appendix §12 C.

**Consequence for D1.1b (Cranelift wiring):** only OPQ rotation needs
the JIT path — it's the one that's actually a learned matmul. The
Cranelift integration scope narrows: we don't need to JIT-compile
Identity (no-op) or Hadamard (butterfly); just OPQ (matmul) and the
main codec decode loop (ADC distance with palette lookup).

This reduces D1.1b scope by maybe 30-40% — fewer kernel shapes to
emit, only the ones that actually benefit.

Cross-ref: D1.2 `rotation_kernel.rs::HadamardRotation`; Rule C
(polyfill hierarchy); plan Appendix B (CartanCascade harmonic
compression ratios rely on real Hadamard, so this matters).

---

## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine

**Status:** FINDING / CORRECTION
Expand Down
2 changes: 1 addition & 1 deletion .claude/board/STATUS_BOARD.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
|---|---|---|---|
| D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
| D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as `RotationKernel` impls | **In PR** | branch — `RotationKernel` trait (Send+Sync+Debug, object-safe) + `IdentityRotation` (no-op) + `HadamardRotation` (real Sylvester butterfly, O(N log N) in-place, norm²-scaling verified) + `OpqRotationStub` (matrix-blob-id placeholder for D1.1b) + `build(&Rotation, dim)` factory + `RotationError` typed errors + 15 tests. Hadamard stays at Tier-3 F32x16 (add/sub, not matmul → no AMX benefit per Rule C). |
| D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |

### Phase 2 — Token-agreement harness (I11 cert gate) — Queued
Expand Down
6 changes: 6 additions & 0 deletions crates/cognitive-shader-driver/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,12 @@ pub mod auto_detect;
#[cfg(feature = "serve")]
pub mod codec_kernel_cache;

// D1.2 — rotation primitives (Identity / Hadamard / OPQ-stub). LAB-ONLY.
// Hadamard is real (in-place butterfly); OPQ is stub pending D1.1b's
// ndarray::hpc::jitson_cranelift::JitEngine adapter + matrix-blob loader.
#[cfg(feature = "serve")]
pub mod rotation_kernel;

// Axum REST server. LAB-ONLY.
#[cfg(feature = "serve")]
pub mod serve;
Expand Down
Loading
Loading