Commit a3529ff

Merge pull request #234 from AdaWorldAPI/claude/teleport-session-setup-wMZfb
D1.2 rotation primitives + taxonomy/ladder/shader-vs-engine epiphanies (95/95 tests)
2 parents cf42a4a + aad6e6a commit a3529ff

4 files changed

Lines changed: 561 additions & 1 deletion

.claude/board/EPIPHANIES.md

Lines changed: 174 additions & 0 deletions
@@ -65,6 +65,180 @@ stay as historical references.
 
 ## Entries (reverse chronological)
 
+## 2026-04-20 — Shader vs engine: statelessness is the boundary
+
+**Status:** FINDING (sharpens the three-level taxonomy)
+
+**Cognitive shader** = stateless atomic compute. Given `ShaderDispatch` +
+`BindSpace` columns, it returns `ShaderHit`s + `MetaWord`. Knows nothing
+of why it fires. Output is one-cycle-wide, no history.
+
+**Thinking engine** = stateful orchestrator. Calls `shader.dispatch()`
+many times per cognitive cycle; composes per-lens hits into
+persona/qualia/world_model/ghost state; revises beliefs for the next
+cycle. The cognitive stack IS the state.
+
+**The engine_bridge is where they meet:**
+`cognitive-shader-driver/src/engine_bridge.rs` is the seam. Shader
+side: `ShaderDriver::dispatch` is stateless. Engine side:
+`cognitive_stack::cycle` accumulates dispatches through
+`bf16_engine` / `signed_engine` / `composite_engine` / `dual_engine` /
+`layered` / `domino`, folds them into persona/qualia, and emits state
+for the next cycle.
+
+**Analogy:** shader = eye (no memory, reports the current frame);
+engine = mind (memory, assembles frames into narrative, counterfactually
+imagines alternatives).
+
+**Where codec-flexibility-as-thinking lands:** the **engine** level,
+not the shader level. A "new thinking style" = a new engine
+configuration (lens composition, persona, qualia-update rule) that
+picks DIFFERENT shader configs per cycle. Shader stays the same; the
+engine's orchestration changes. That's why Phase 5+ "production-grade
+thinking tissue" drops into mid (engine), not L2 (shader).
+
+**Concrete Phase 1-5 shipping:** codec-sweep D1.x work = shader layer
+(tensor decode primitives). Engine-level codec-flexibility (swap
+lenses via YAML) = D5 / Phase 5+, plugging INTO the codec infrastructure.
+
+Cross-ref: three-level taxonomy above; resolution-ladder entry
+`64×64 > 256×257 >> 4096×4096 > 16k`; `engine_bridge.rs` seam.
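
The stateless/stateful split described in this entry can be sketched in a few lines of Rust. This is an illustration only: `Dispatch`, `Hit`, `dispatch`, and `Engine` are hypothetical stand-ins, not the real `ShaderDispatch`, `ShaderHit`, `ShaderDriver::dispatch`, or `cognitive_stack::cycle`, whose signatures are not shown in this commit. The threshold-tightening rule is likewise invented to make the state dependence visible.

```rust
/// Hypothetical stand-in for a dispatch request (not the real `ShaderDispatch`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Dispatch {
    pub lens: u8,
    pub threshold: u32,
}

/// Hypothetical stand-in for a hit (not the real `ShaderHit`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hit {
    pub row: usize,
    pub score: u32,
}

/// Stateless shader: the output depends only on the arguments.
/// Calling it twice with the same inputs yields the same hits.
pub fn dispatch(d: &Dispatch, column: &[u32]) -> Vec<Hit> {
    column
        .iter()
        .enumerate()
        .filter(|(_, s)| **s >= d.threshold)
        .map(|(row, &score)| Hit { row, score })
        .collect()
}

/// Stateful engine: owns the history and derives the next dispatch from it.
#[derive(Debug, Default)]
pub struct Engine {
    pub cycles: u32,
    pub total_hits: usize,
}

impl Engine {
    /// One cycle: choose a shader config from accumulated state,
    /// call the stateless shader, then fold the result back into state.
    pub fn cycle(&mut self, column: &[u32]) -> Vec<Hit> {
        // Invented rule: the threshold tightens as hits accumulate.
        let d = Dispatch { lens: 0, threshold: 10 + self.total_hits as u32 };
        let hits = dispatch(&d, column);
        self.cycles += 1;
        self.total_hits += hits.len();
        hits
    }
}
```

Running `Engine::cycle` repeatedly on the same column returns different hit sets as the state evolves, while `dispatch` itself never changes behaviour: the shader stays fixed and only the orchestration varies.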
+
+---
+
+## 2026-04-20 — Resolution hierarchy: `64×64 > 256×257 >> 4096×4096 > 16k` (user-named)
+
+**Status:** FINDING (capstone of the three-level taxonomy from earlier this session)
+
+The 5-layer stack is a **resolution ladder**, not a layer cake. Each
+level operates at its own granularity and has its own "shader" /
+"kernel cache" / "distance table" at that scale:
+
+| Size | Role | Where | HHTL stage (I10) |
+|---|---|---|---|
+| **64×64** | p64 topology mask — 8 predicate planes × 64 rows × u64 — "which archetype blocks relate via predicate z" | `p64_bridge::cognitive_shader::CognitiveShader` | HEEL (coarse basin) |
+| **256×257** | bgz17 palette distance table — 256 archetypes × 256 + 1 sentinel — O(1) lookup `semiring.distance(a, b)` | `bgz17::PaletteSemiring` | HIP (family sharpen) |
+| **4096×4096** | Cross-vocabulary / cross-context correlation — COCA × COCA, or 4096 τ-prefix × 4096 slot space | ndarray `ScanParams` JIT (`jitson_cranelift`) | BRANCH / TWIG |
+| **16 K** | Individual fingerprint bit identity — 16384-bit `Fingerprint<256>` | `ndarray::simd::Fingerprint<256>` + codec decoder (D1.x) | LEAF (exact member) |
+
+**The `>>` between 256×257 and 4096×4096 is the big jump** (~64×)
+matching HIP → BRANCH refinement. That's where palette-level (one
+row of the codebook) meets vocabulary-level (COCA 4096). Below that
+jump, everything is O(1) table lookup; above it, JIT kernels become
+worth the compile cost.
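
To make the "O(1) table lookup" side of the jump concrete, here is a minimal sketch of a 256×257 distance table. Everything in it is an assumption for illustration: the name `PaletteTable`, the flat row-major `Vec<u16>` layout, the `u16` cell type, and the reading of the extra column as a "no archetype" sentinel are not taken from `bgz17::PaletteSemiring`, whose layout this commit does not show.

```rust
/// Number of archetypes (from the 256×257 row of the table above).
pub const N: usize = 256;
/// Assumed layout: 256 real columns plus one sentinel column per row.
pub const STRIDE: usize = N + 1;

/// Hypothetical flat distance table; not the real `bgz17::PaletteSemiring`.
pub struct PaletteTable {
    cells: Vec<u16>,
}

impl PaletteTable {
    /// Precompute every pairwise distance once; sentinel cells stay at `u16::MAX`.
    pub fn build(dist: impl Fn(u8, u8) -> u16) -> Self {
        let mut cells = vec![u16::MAX; N * STRIDE];
        for a in 0..N {
            for b in 0..N {
                cells[a * STRIDE + b] = dist(a as u8, b as u8);
            }
        }
        Self { cells }
    }

    /// O(1) lookup: one multiply, one add, one load.
    /// `None` selects the sentinel column (assumed meaning: no archetype).
    pub fn distance(&self, a: u8, b: Option<u8>) -> u16 {
        let col = b.map_or(N, usize::from);
        self.cells[a as usize * STRIDE + col]
    }

    /// Total number of cells, 256 * 257.
    pub fn len(&self) -> usize {
        self.cells.len()
    }
}
```

With `u16` cells the whole table is 65,792 entries (about 128 KiB), which is why a precomputed lookup is plausible at this rung while a dense 4096×4096 table (16,777,216 cells) is a different regime.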
+
+**Each JIT targets its own resolution — no overlap:**
+
+- p64 cascade: 64×64 bitmask ops. Not JIT'd (bit tricks in hot loop
+  already optimal under AVX-512).
+- bgz17 palette: 256×256 precomputed. Not JIT'd (memory-bound).
+- ndarray ScanParams: 4096×4096 scan kernels. **JIT'd via
+  `jitson_cranelift::JitEngine`** — shipped.
+- Codec kernels (D1.x): 16k bit-level tensor decode. **Will be JIT'd
+  via D1.1b `CodecKernelEngine` adapter**. Scaffold (D1.1) + rotation
+  primitives (D1.2) landed; Cranelift IR emission deferred to D1.1b.
+
+**Three-level taxonomy (from earlier this session) maps onto the
+resolution ladder:**
+
+- **L2 small-precision cognitive shaders** (ns budget) →
+  64×64 + 256×257 (p64 + bgz17 palette). Pure table lookups.
+- **mid thinking-engine layers** (µs-ms) →
+  4096×4096 (cross-vocab, persona-aware lens composition). JIT'd
+  scan kernels.
+- **L4 thinking styles / NARS / JIT** (ms) →
+  orchestrates traversal ACROSS resolutions (starts at 64×64 cascade
+  to find candidates, narrows to 256×257 for family, drops to
+  4096×4096 for context, verifies at 16k fingerprint identity).
+
+**p64::CognitiveShader double-check conclusion:** architecturally
+clean. Operates at the coarsest (64×64) level; codec-sweep work at
+finest (16k); they compose in `cognitive_shader_driver::ShaderDriver`
+without overlap. Different layers of the ladder, different
+operations, different JIT targets (if any).
+
+Cross-ref: I10 (HEEL/HIP/BRANCH/TWIG/LEAF); three-level taxonomy entry
+above; `p64_bridge::cognitive_shader::CognitiveShader::cascade`;
+D1.1 `CodecKernelCache`; D1.2 `RotationKernel`; bgz17 `PaletteSemiring`.
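
The final rung of the traversal, "verifies at 16k fingerprint identity", comes down to comparing two 16384-bit vectors. A minimal sketch, assuming a fingerprint is stored as 256 `u64` words (256 × 64 = 16,384 bits); the alias `Bits` and both functions are hypothetical and are not the API of `ndarray::simd::Fingerprint<256>`:

```rust
/// 256 words of 64 bits each = 16,384 bits (assumed storage layout).
pub const WORDS: usize = 256;
pub type Bits = [u64; WORDS];

/// Hamming distance: XOR word by word and count the differing bits.
pub fn hamming(a: &Bits, b: &Bits) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

/// Exact-member check at the LEAF rung: identical iff no bit differs.
pub fn same_identity(a: &Bits, b: &Bits) -> bool {
    hamming(a, b) == 0
}
```

A straight-line loop like this has a fixed shape, which is the kind of code a compiler can vectorise on its own; the coarser rungs exist so that this full-width comparison only runs on the few candidates that survive them.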
+
+---
+
+## 2026-04-20 — Thinking styles ARE codecs over the semantic field (north star)
+
+**Status:** FINDING (forward-looking deposit — not a current work item; reference when Phase 5+ generalises)
+
+A codec compresses tensor content into fingerprints; a thinking style
+compresses reasoning trajectories into NARS-revised beliefs. Same
+underlying operation — structure-preserving compression on a binary
+Hamming substrate. Different input/output domains, same substrate
+guarantees (E-SUBSTRATE-1, I-SUBSTRATE-MARKOV), same compile-and-swap
+machinery.
+
+**The codec infrastructure IS the template for production-grade
+thinking tissue.** When Phase 5+ activates:
+
+| Codec (shipped D0.1–D1.2, D1.1b queued) | Thinking-style analog |
+|---|---|
+| `CodecParams` | `ThinkingStyleParams { style, modulation_7d, nars_priors, fallback_chain, sigma_priority, semiring_choice }` |
+| `kernel_signature()` — excludes runtime drift | `style_signature()` — excludes per-cycle modulation drift |
+| `CodecKernelCache<H>` | `ThinkingStyleKernelCache<H>` — same generic scaffold |
+| JIT kernel = Cranelift-compiled decode | JIT kernel = compiled scan-walk on 36-node topology (already shipped ndarray-side via `scan_jit.rs` + `ScanParams`) |
+| **Token agreement** (I11 cert gate) | **Conclusion agreement** — same NARS-revised conclusions as reference style? |
+| Sweep grid = N codec candidates | Sweep grid = N (style × modulation × NARS fallback) candidates |
+| `/v1/shader/calibrate` | `/v1/shader/think-calibrate` |
+| `[FORMAL-SCAFFOLD]` 5 pillars | **Same scaffold** — E-SUBSTRATE-1 covers any transition under bundle |
+
+**Generalisation isn't "port codec pattern to thinking"** — it's
+recognising thinking styles as a SPECIAL CASE of the codec pattern we
+just built. When Phase 5+ lands, `WireThinkCalibrate` +
+`ThinkingStyleKernelCache` + `conclusion_agreement` metric drop in
+alongside the codec versions. Same JIT engine, same tests, same
+board-hygiene discipline.
+
+**The phrase "production-grade thinking tissue"** names the telos
+cleanly: once codec infra is at Phase 3 token-agreement pass rates,
+cloning to thinking styles yields production-grade swappable
+reasoning — YAML-configured, JIT-compiled, sweep-certified. No
+rebuild per new style, no black box, signature-keyed reproducibility.
+
+**Cross-ref:** D0.6 `CodecParams` (the parameter-shape template);
+D1.1 `CodecKernelCache<H>` (the cache pattern — generic-over-H is the
+wedge for reuse); I5 (thinking IS an AdjacencyStore — already
+topologically unified with data graph); codec-sweep-via-lab-infra-v1.
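
The "generic-over-H" cache pattern referenced here, together with the `get_or_compile` and RwLock double-check behaviour listed for D1.1 in STATUS_BOARD.md, can be sketched as follows. This is a reconstruction under assumptions, not the shipped `CodecKernelCache<H>`: the type name `KernelCache`, the `u64` signature key, and the accessor names are hypothetical, and the real signature presumably comes from `kernel_signature()`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical signature-keyed cache, generic over the kernel handle `H`.
pub struct KernelCache<H> {
    map: RwLock<HashMap<u64, Arc<H>>>,
    compiles: AtomicU64,
    hits: AtomicU64,
}

impl<H> KernelCache<H> {
    pub fn new() -> Self {
        Self {
            map: RwLock::new(HashMap::new()),
            compiles: AtomicU64::new(0),
            hits: AtomicU64::new(0),
        }
    }

    /// Return the cached handle for `signature`, compiling it at most once.
    pub fn get_or_compile(&self, signature: u64, compile: impl FnOnce() -> H) -> Arc<H> {
        // Fast path: shared read lock, released at the end of this statement.
        let cached = self.map.read().unwrap().get(&signature).cloned();
        if let Some(h) = cached {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return h;
        }
        // Slow path: exclusive lock, then check again in case another
        // thread compiled the same signature while we were waiting.
        let mut map = self.map.write().unwrap();
        if let Some(h) = map.get(&signature) {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return Arc::clone(h);
        }
        let h = Arc::new(compile());
        map.insert(signature, Arc::clone(&h));
        self.compiles.fetch_add(1, Ordering::Relaxed);
        h
    }

    pub fn compiles(&self) -> u64 {
        self.compiles.load(Ordering::Relaxed)
    }

    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}
```

Because nothing in the cache inspects `H`, the same scaffold could hold a stub kernel, a JIT-compiled codec kernel, or a thinking-style kernel; only the signature function and the `compile` closure would differ.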
+
+---
+
+## 2026-04-20 — D1.2 Hadamard is pure-Rust, not a JIT-necessary primitive
+
+**Status:** FINDING
+
+D1.2's `HadamardRotation` is implemented as a plain Rust in-place
+Sylvester butterfly (O(N log N) add/sub, no allocations). It does NOT
+need JIT compilation or Cranelift code emission because:
+
+1. **Fixed shape** — the butterfly structure is identical across all
+   power-of-two dims. Rust's compiler (under `target-cpu=x86-64-v4`)
+   already emits AVX-512 add/sub from the straight-line loop.
+2. **Not matmul** — Hadamard is a pattern of adds and subtracts,
+   never a dot product. Per Rule C polyfill hierarchy, matmul-heavy
+   paths benefit from AMX (Tier 1); add/sub stays at Tier 3 F32x16.
+   AMX gives no speedup here — confirmed in plan Appendix §12 C.
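
For reference, an in-place Sylvester butterfly of the kind described above fits in a few lines. This is a generic, unnormalised fast Walsh-Hadamard transform written for illustration; it is not the code in `rotation_kernel.rs`, and the function name and `String` error type are assumptions (the commit mentions a typed `RotationError` instead).

```rust
/// Unnormalised in-place Walsh-Hadamard transform (Sylvester ordering).
/// Uses only adds and subtracts, O(N log N), no allocation.
/// Hypothetical helper; returns an error unless the length is a power of two.
pub fn hadamard_in_place(x: &mut [f32]) -> Result<(), String> {
    let n = x.len();
    if !n.is_power_of_two() {
        return Err(format!("length {n} is not a power of two"));
    }
    let mut h = 1;
    while h < n {
        // Each block of 2h elements is split into halves that are
        // combined pairwise: (a, b) -> (a + b, a - b).
        for block in x.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (u, v) = (*a, *b);
                *a = u + v;
                *b = u - v;
            }
        }
        h *= 2;
    }
    Ok(())
}
```

For `[1, 2, 3, 4]` this yields `[10, -2, -4, 0]`: the squared norm goes from 30 to 120, a factor of N = 4, which is the norm² scaling an unnormalised transform should show. Applying it a second time returns N times the original vector.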
+
+**Consequence for D1.1b (Cranelift wiring):** only OPQ rotation needs
+the JIT path — it's the one that's actually a learned matmul. The
+Cranelift integration scope narrows: we don't need to JIT-compile
+Identity (no-op) or Hadamard (butterfly); just OPQ (matmul) and the
+main codec decode loop (ADC distance with palette lookup).
+
+This reduces D1.1b scope by an estimated 30-40%: fewer kernel shapes
+to emit, only the ones that actually benefit.
+
+Cross-ref: D1.2 `rotation_kernel.rs::HadamardRotation`; Rule C
+(polyfill hierarchy); plan Appendix B (CartanCascade harmonic
+compression ratios rely on real Hadamard, so this matters).
+
+---
+
 ## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine
 
 **Status:** FINDING / CORRECTION

.claude/board/STATUS_BOARD.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
 |---|---|---|---|
 | D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
 | D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
-| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
+| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as `RotationKernel` impls | **In PR** | branch — `RotationKernel` trait (Send+Sync+Debug, object-safe) + `IdentityRotation` (no-op) + `HadamardRotation` (real Sylvester butterfly, O(N log N) in-place, norm²-scaling verified) + `OpqRotationStub` (matrix-blob-id placeholder for D1.1b) + `build(&Rotation, dim)` factory + `RotationError` typed errors + 15 tests. Hadamard stays at Tier-3 F32x16 (add/sub, not matmul → no AMX benefit per Rule C). |
 | D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |
 
 ### Phase 2 — Token-agreement harness (I11 cert gate) — Queued

crates/cognitive-shader-driver/src/lib.rs

Lines changed: 6 additions & 0 deletions
@@ -125,6 +125,12 @@ pub mod auto_detect;
 #[cfg(feature = "serve")]
 pub mod codec_kernel_cache;
 
+// D1.2 — rotation primitives (Identity / Hadamard / OPQ-stub). LAB-ONLY.
+// Hadamard is real (in-place butterfly); OPQ is stub pending D1.1b's
+// ndarray::hpc::jitson_cranelift::JitEngine adapter + matrix-blob loader.
+#[cfg(feature = "serve")]
+pub mod rotation_kernel;
+
 // Axum REST server. LAB-ONLY.
 #[cfg(feature = "serve")]
 pub mod serve;
