Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions .claude/board/EPIPHANIES.md
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,119 @@ stay as historical references.

## Entries (reverse chronological)

## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine

**Status:** FINDING / CORRECTION

The D1.1 `CodecKernelCache` scaffold (RwLock + double-check) is
strictly worse than what ndarray's `jitson_cranelift::JitEngine`
already provides. Real upstream:

```
/home/user/ndarray/src/hpc/
├── jitson/ — JITSON template format (parser/validator/
│ template/precompile/scan_config/packed/noise)
└── jitson_cranelift/ — real Cranelift engine
├── engine.rs — JitEngine + JitEngineBuilder
├── ir.rs — IR emission
├── scan_jit.rs — scan kernel codegen
├── noise_jit.rs — noise kernel codegen
└── detect.rs — CPU capability detection
```

Dependencies behind `jit-native` feature:
`cranelift-{codegen, jit, module, frontend} 0.116` + `target-lexicon`.

**Upstream two-phase lifecycle is stronger than my scaffold:**

- **BUILD phase:** `&mut JitEngine`, `compile(ScanParams) -> Result<u64>`,
mutable cache via `&mut self`.
- **RUN phase:** `Arc<JitEngine>` freezes the cache by Rust's ownership
(`&mut self` unreachable through `Arc`). `get()` drops from
~25 ns (my RwLock read) to ~5 ns (plain `HashMap::get`, no
synchronization needed).

The freeze is enforced by the type system, not by a runtime lock.
That's the right design for this domain (build-once, run-many).

**What the D1.1 scaffold is still good for:** `CodecParams` is the
codec-sweep key; `ScanParams` is ndarray's thinking-style-scan key.
Different domains; a `CodecParams`-keyed adapter layer is still
needed. My generic-over-handle design anticipates this — the
scaffold wraps ndarray's `JitEngine` at the `H` slot when D1.1b
lands.

**Revised D1.1b plan:**

Mirror ndarray's two-phase pattern in `cognitive-shader-driver`:

```rust
// BUILD phase — mutable, single-threaded
pub struct CodecKernelEngine {
inner: ndarray::hpc::jitson_cranelift::JitEngine,
codec_sig_to_inner_id: HashMap<u64, u64>, // CodecParams signature → JitEngine id
}

// RUN phase — frozen via Arc
impl CodecKernelEngine {
pub fn build() -> CodecKernelEngineBuilder { ... }
pub fn compile(&mut self, params: &CodecParams) -> Result<u64, JitError>;
pub fn freeze(self) -> Arc<Self>; // moves to RUN phase
pub fn get(&self, params: &CodecParams) -> Option<KernelHandle>;
}
```

Then D1.2/D1.3 call `inner.compile` with codec-specific
`ScanParams`-analogs (new `CodecScanParams` struct or a JITSON
template constructed from `CodecParams`).

**Honesty note:** user asked "I presume you are aware of
cranelift/jitson" — answer is: Cranelift yes (Bytecode Alliance,
wasmtime), ndarray jitson NO (didn't inspect the upstream surface
before writing D1.1). This correction surfaces that gap explicitly
so the next session doesn't repeat it.

**Cross-ref:** D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`
(keep as `StubKernel`-backed test fixture); `ndarray::hpc::jitson_cranelift::JitEngine`;
D1.1b revised plan above.

---

## 2026-04-20 — D1.1 scaffold-before-codegen: cache semantics testable without Cranelift

**Status:** FINDING

`CodecKernelCache<H>` is generic over the kernel-handle type. The same
cache hosts `StubKernel` (deterministic fake, no compilation) for tests
AND `KernelHandle` (real Cranelift function pointer) for production.

This separates TWO concerns that are usually tangled:

1. **Cache semantics** — signature-keyed insertion, double-checked
locking under concurrent miss, counters for hit-ratio measurement.
Testable in microseconds without a JIT engine.
2. **IR emission** — the actual Cranelift / jitson code generation
that takes `CodecParams` and produces a callable function pointer.
Heavy; takes minutes per build; requires ndarray's jitson surface
to be finalized.

By shipping the cache layer with `StubKernel` NOW, Phase 1's cache
semantics are verified + CI-gated before the Cranelift work starts.
When D1.1b lands, the only change is `H = KernelHandle`; all 9 cache
tests remain valid. This is the **scaffold-before-codegen** pattern:
test the hard-to-change contract first, defer the hard-to-build
implementation.

Generalises: any JIT pipeline should separate cache-keying from IR
emission at the type level. Generic over handle type is the wedge
that makes this possible.

Cross-ref: D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`;
D0.3 sweep-grid-IS-cache-warmer epiphany (same signature-as-identity
insight); PR #225 `CodecParams::kernel_signature()`.

---

## 2026-04-20 — D0.3 sweep grid IS the JIT cache warmer

**Status:** FINDING
Expand Down
5 changes: 3 additions & 2 deletions .claude/board/STATUS_BOARD.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,11 +57,12 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
| D0.6 | `CodecParamsBuilder` fluent API | **Shipped** | #225 — `contract::cam` +290 LOC of codec-params types, 14 tests (CODING_PRACTICES gap 3) |
| D0.7 | Precision-ladder validation (OPQ↔BF16x32, Hadamard pow2, overfit guard) | **Shipped** | #225 — `CodecParamsError` at `.build()` BEFORE JIT compile |

### Phase 1 — JIT codec kernels — Queued
### Phase 1 — JIT codec kernels

| D-id | Title | Status | PR / Evidence |
|---|---|---|---|
| D1.1 | `CodecKernelCache` via `JitCompiler` (Cranelift) | **Queued** | target ~180 LOC |
| D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
| D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
| D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |

Expand Down
Loading
Loading