Commit cf42a4a

Merge pull request #233 from AdaWorldAPI/claude/teleport-session-setup-wMZfb
D1.1 CodecKernelCache scaffold + honest CORRECTION (80/80 tests, ndarray jitson surface verified)
2 parents b06f9f9 + 562a31c commit cf42a4a

4 files changed

Lines changed: 460 additions & 2 deletions

.claude/board/EPIPHANIES.md

Lines changed: 113 additions & 0 deletions
@@ -65,6 +65,119 @@ stay as historical references.

## Entries (reverse chronological)

## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine

**Status:** FINDING / CORRECTION

The D1.1 `CodecKernelCache` scaffold (RwLock + double-check) is
strictly worse than what ndarray's `jitson_cranelift::JitEngine`
already provides. Real upstream:

```
/home/user/ndarray/src/hpc/
├── jitson/             — JITSON template format (parser/validator/
│                         template/precompile/scan_config/packed/noise)
└── jitson_cranelift/   — real Cranelift engine
    ├── engine.rs       — JitEngine + JitEngineBuilder
    ├── ir.rs           — IR emission
    ├── scan_jit.rs     — scan kernel codegen
    ├── noise_jit.rs    — noise kernel codegen
    └── detect.rs       — CPU capability detection
```

Dependencies behind the `jit-native` feature:
`cranelift-{codegen, jit, module, frontend} 0.116` + `target-lexicon`.

**Upstream two-phase lifecycle is stronger than my scaffold:**

- **BUILD phase:** `&mut JitEngine`, `compile(ScanParams) -> Result<u64>`,
  mutable cache via `&mut self`.
- **RUN phase:** `Arc<JitEngine>` freezes the cache through Rust's ownership
  rules (`&mut self` is unreachable through a shared `Arc`). `get()` drops
  from ~25 ns (my RwLock read) to ~5 ns (plain `HashMap::get`, no
  synchronization needed).

The freeze is enforced by the type system, not by a runtime lock.
That's the right design for this domain (build-once, run-many).
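
The BUILD/RUN split can be illustrated with a toy engine. This is a minimal sketch, not ndarray's `JitEngine`: `ToyEngine`, its `u64` kernel ids, and `run_frozen_lookup` are hypothetical names, and the cache is a plain `HashMap`. It shows only the ownership argument: `compile` takes `&mut self`, so once the engine sits behind a shared `Arc`, only `&self` methods such as `get` remain callable and readers need no lock.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a JIT engine; a "kernel" is just a u64 id.
pub struct ToyEngine {
    cache: HashMap<u64, u64>, // signature -> kernel id
    next_id: u64,
}

impl ToyEngine {
    pub fn new() -> Self {
        ToyEngine { cache: HashMap::new(), next_id: 0 }
    }

    /// BUILD phase: requires `&mut self`, so it is not callable through a shared `Arc`.
    pub fn compile(&mut self, signature: u64) -> u64 {
        if let Some(&id) = self.cache.get(&signature) {
            return id; // already compiled, reuse the id
        }
        let id = self.next_id;
        self.next_id += 1;
        self.cache.insert(signature, id);
        id
    }

    /// RUN phase: a plain `HashMap::get` with no lock.
    pub fn get(&self, signature: u64) -> Option<u64> {
        self.cache.get(&signature).copied()
    }

    /// Consumes the mutable engine and hands back a shareable, read-only one.
    pub fn freeze(self) -> Arc<Self> {
        Arc::new(self)
    }
}

/// Compiles two kernels, freezes the engine, then looks one up from four threads.
pub fn run_frozen_lookup() -> Vec<Option<u64>> {
    let mut engine = ToyEngine::new();
    engine.compile(0xA);
    engine.compile(0xB);
    let frozen = engine.freeze();
    // frozen.compile(0xC); // rejected by the compiler: cannot borrow data in an `Arc` as mutable

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let reader = Arc::clone(&frozen);
            thread::spawn(move || reader.get(0xB))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Strictly speaking, `Arc::get_mut` can still hand out `&mut` while the `Arc` has a single owner; once a second clone exists it returns `None`, which is what makes the frozen phase read-only in practice.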

**What the D1.1 scaffold is still good for:** `CodecParams` is the
codec-sweep key; `ScanParams` is ndarray's thinking-style-scan key.
Different domains; a `CodecParams`-keyed adapter layer is still
needed. My generic-over-handle design anticipates this — the
scaffold wraps ndarray's `JitEngine` at the `H` slot when D1.1b
lands.

**Revised D1.1b plan:**

Mirror ndarray's two-phase pattern in `cognitive-shader-driver`:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Wraps ndarray's JitEngine and maps codec signatures onto its kernel ids.
pub struct CodecKernelEngine {
    inner: ndarray::hpc::jitson_cranelift::JitEngine,
    codec_sig_to_inner_id: HashMap<u64, u64>, // CodecParams signature → JitEngine id
}

impl CodecKernelEngine {
    // BUILD phase: mutable, single-threaded
    pub fn build() -> CodecKernelEngineBuilder { todo!() }
    pub fn compile(&mut self, params: &CodecParams) -> Result<u64, JitError> { todo!() }

    // Moves to the RUN phase: the cache is frozen behind the Arc
    pub fn freeze(self) -> Arc<Self> { Arc::new(self) }

    // RUN phase: read-only lookup, callable through Arc<Self>
    pub fn get(&self, params: &CodecParams) -> Option<KernelHandle> { todo!() }
}
```

Then D1.2/D1.3 call `inner.compile` with codec-specific
`ScanParams`-analogs (new `CodecScanParams` struct or a JITSON
template constructed from `CodecParams`).
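
One way to picture the `CodecParams`-keyed adapter is the sketch below. Everything in it is a stand-in chosen for illustration: `ToyCodecParams` and its fields, the `DefaultHasher`-based `kernel_signature`, `ToyInnerEngine` (which only records a spec string instead of emitting IR), and `ToyCodecAdapter`. It assumes the inner engine returns a `u64` id per compile, as the plan suggests, and shows the adapter translating codec parameters into the inner engine's own input while caching by signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical codec parameters; the real `CodecParams` has its own fields.
#[derive(Hash, Clone, Debug, PartialEq, Eq)]
pub struct ToyCodecParams {
    pub subspaces: u32,
    pub bits: u8,
    pub hadamard: bool,
}

impl ToyCodecParams {
    /// Stand-in for a kernel signature: a hash over every field.
    /// Two parameter sets that hash alike would share a kernel.
    pub fn kernel_signature(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Hypothetical inner engine with its own input type (a spec string here).
#[derive(Default)]
pub struct ToyInnerEngine {
    compiled: Vec<String>,
}

impl ToyInnerEngine {
    /// "Compiles" by recording the spec; the id is its position in the list.
    pub fn compile(&mut self, scan_spec: String) -> u64 {
        self.compiled.push(scan_spec);
        (self.compiled.len() - 1) as u64
    }
}

/// Adapter: codec signature -> inner engine id.
#[derive(Default)]
pub struct ToyCodecAdapter {
    inner: ToyInnerEngine,
    sig_to_inner_id: HashMap<u64, u64>,
}

impl ToyCodecAdapter {
    /// Translates codec params into the inner engine's input, once per signature.
    pub fn compile(&mut self, params: &ToyCodecParams) -> u64 {
        let sig = params.kernel_signature();
        if let Some(&id) = self.sig_to_inner_id.get(&sig) {
            return id;
        }
        let spec = format!(
            "pq:{}x{}:hadamard={}",
            params.subspaces, params.bits, params.hadamard
        );
        let id = self.inner.compile(spec);
        self.sig_to_inner_id.insert(sig, id);
        id
    }

    /// Read-only lookup; never triggers compilation.
    pub fn get(&self, params: &ToyCodecParams) -> Option<u64> {
        self.sig_to_inner_id.get(&params.kernel_signature()).copied()
    }

    /// Number of times the inner engine was actually asked to compile.
    pub fn inner_compile_count(&self) -> usize {
        self.inner.compiled.len()
    }
}
```

Keeping the signature map in the adapter means the inner engine never needs to know about codec types, which matches the "different domains" point made earlier in this entry.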

**Honesty note:** user asked "I presume you are aware of
cranelift/jitson" — answer is: Cranelift yes (Bytecode Alliance,
wasmtime), ndarray jitson NO (didn't inspect the upstream surface
before writing D1.1). This correction surfaces that gap explicitly
so the next session doesn't repeat it.

**Cross-ref:** D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`
(keep as `StubKernel`-backed test fixture); `ndarray::hpc::jitson_cranelift::JitEngine`;
D1.1b revised plan above.

---

## 2026-04-20 — D1.1 scaffold-before-codegen: cache semantics testable without Cranelift

**Status:** FINDING

`CodecKernelCache<H>` is generic over the kernel-handle type. The same
cache hosts `StubKernel` (deterministic fake, no compilation) for tests
AND `KernelHandle` (real Cranelift function pointer) for production.

This separates TWO concerns that are usually tangled:

1. **Cache semantics** — signature-keyed insertion, double-checked
   locking under concurrent miss, counters for hit-ratio measurement.
   Testable in microseconds without a JIT engine.
2. **IR emission** — the actual Cranelift / jitson code generation
   that takes `CodecParams` and produces a callable function pointer.
   Heavy; takes minutes per build; requires ndarray's jitson surface
   to be finalized.

By shipping the cache layer with `StubKernel` NOW, Phase 1's cache
semantics are verified + CI-gated before the Cranelift work starts.
When D1.1b lands, the only change is `H = KernelHandle`; all 9 cache
tests remain valid. This is the **scaffold-before-codegen** pattern:
test the hard-to-change contract first, defer the hard-to-build
implementation.
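
The cache-semantics half of that split might look roughly like the following. This is a sketch under assumptions, not the contents of `codec_kernel_cache.rs`: `ToyKernelCache`, `ToyStubKernel`, the closure-based `get_or_compile` signature, and the counter fields are hypothetical. It demonstrates double-checked locking with `RwLock`: a read-locked fast path, then a re-check under the write lock so that concurrent misses on one signature compile only once.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Deterministic fake kernel handle: carries the signature it was built for.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyStubKernel(pub u64);

/// Signature-keyed cache, generic over the handle type `H`.
pub struct ToyKernelCache<H> {
    map: RwLock<HashMap<u64, H>>,
    hits: AtomicU64,
    compiles: AtomicU64,
}

impl<H: Clone> ToyKernelCache<H> {
    pub fn new() -> Self {
        ToyKernelCache {
            map: RwLock::new(HashMap::new()),
            hits: AtomicU64::new(0),
            compiles: AtomicU64::new(0),
        }
    }

    pub fn get_or_compile<F: FnOnce() -> H>(&self, signature: u64, compile: F) -> H {
        {
            // First check: shared read lock, the common path once the cache is warm.
            let map = self.map.read().unwrap();
            if let Some(handle) = map.get(&signature) {
                self.hits.fetch_add(1, Ordering::Relaxed);
                return handle.clone();
            }
        } // read guard released here, before the write lock is requested

        // Second check: another thread may have compiled while we waited.
        let mut map = self.map.write().unwrap();
        if let Some(handle) = map.get(&signature) {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return handle.clone();
        }
        let handle = compile();
        self.compiles.fetch_add(1, Ordering::Relaxed);
        map.insert(signature, handle.clone());
        handle
    }

    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }

    pub fn compiles(&self) -> u64 {
        self.compiles.load(Ordering::Relaxed)
    }

    /// Fraction of lookups served from the cache; 0.0 before any lookup.
    pub fn hit_ratio(&self) -> f64 {
        let hits = self.hits() as f64;
        let total = hits + self.compiles() as f64;
        if total == 0.0 { 0.0 } else { hits / total }
    }
}

/// Races `threads` lookups on one signature; returns (compiles, hits).
pub fn concurrent_miss_counts(threads: usize) -> (u64, u64) {
    let cache = Arc::new(ToyKernelCache::<ToyStubKernel>::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&cache);
            thread::spawn(move || c.get_or_compile(42, || ToyStubKernel(42)))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    (cache.compiles(), cache.hits())
}
```

Because the compile closure runs while the write lock is held, a slow compile blocks every reader in this sketch; that is acceptable for a stub-backed test fixture and is one reason the build-then-freeze design in the correction entry avoids locks on the read path.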

Generalises: any JIT pipeline should separate cache-keying from IR
emission at the type level. Generic over handle type is the wedge
that makes this possible.

Cross-ref: D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`;
D0.3 sweep-grid-IS-cache-warmer epiphany (same signature-as-identity
insight); PR #225 `CodecParams::kernel_signature()`.

---

## 2026-04-20 — D0.3 sweep grid IS the JIT cache warmer

**Status:** FINDING

.claude/board/STATUS_BOARD.md

Lines changed: 3 additions & 2 deletions
@@ -57,11 +57,12 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
 | D0.6 | `CodecParamsBuilder` fluent API | **Shipped** | #225 `contract::cam` +290 LOC of codec-params types, 14 tests (CODING_PRACTICES gap 3) |
 | D0.7 | Precision-ladder validation (OPQ↔BF16x32, Hadamard pow2, overfit guard) | **Shipped** | #225 `CodecParamsError` at `.build()` BEFORE JIT compile |

-### Phase 1 — JIT codec kernels — Queued
+### Phase 1 — JIT codec kernels

 | D-id | Title | Status | PR / Evidence |
 |---|---|---|---|
-| D1.1 | `CodecKernelCache` via `JitCompiler` (Cranelift) | **Queued** | target ~180 LOC |
+| D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
+| D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
 | D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
 | D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |
