@@ -65,6 +65,119 @@ stay as historical references.

 ## Entries (reverse chronological)

+## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine
+
+**Status:** FINDING / CORRECTION
+
+The D1.1 `CodecKernelCache` scaffold (RwLock + double-check) is
+strictly worse than what ndarray's `jitson_cranelift::JitEngine`
+already provides. Real upstream:
+
+```
+/home/user/ndarray/src/hpc/
+├── jitson/            — JITSON template format (parser/validator/
+│                        template/precompile/scan_config/packed/noise)
+└── jitson_cranelift/  — real Cranelift engine
+    ├── engine.rs      — JitEngine + JitEngineBuilder
+    ├── ir.rs          — IR emission
+    ├── scan_jit.rs    — scan kernel codegen
+    ├── noise_jit.rs   — noise kernel codegen
+    └── detect.rs      — CPU capability detection
+```
+
+Dependencies behind the `jit-native` feature:
+`cranelift-{codegen, jit, module, frontend} 0.116` + `target-lexicon`.
+
+**Upstream two-phase lifecycle is stronger than my scaffold:**
+
+- **BUILD phase:** `&mut JitEngine`, `compile(ScanParams) -> Result<u64>`,
+  mutable cache via `&mut self`.
+- **RUN phase:** `Arc<JitEngine>` freezes the cache through Rust's ownership
+  rules (`&mut self` is not reachable through a shared `Arc`). `get()` drops
+  from ~25 ns (my RwLock read) to ~5 ns (plain `HashMap::get`, no
+  synchronization needed).
+
+The freeze is enforced by the type system, not by a runtime lock.
+That's the right design for this domain (build-once, run-many).
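
A minimal sketch of the build-then-freeze pattern described above, using only the standard library. `ToyEngine`, `KernelId`, and `build_and_share` are hypothetical names for illustration; this is not ndarray's `JitEngine`, and the "compilation" here only hands out sequential ids.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a compiled-kernel id (assumption, not ndarray's type).
pub type KernelId = u64;

/// Toy engine: mutable while building, read-only once shared through `Arc`.
pub struct ToyEngine {
    cache: HashMap<u64, KernelId>,
    next_id: KernelId,
}

impl ToyEngine {
    pub fn new() -> Self {
        ToyEngine { cache: HashMap::new(), next_id: 1 }
    }

    /// BUILD phase: needs `&mut self`, so it cannot be called on a shared `Arc<ToyEngine>`.
    pub fn compile(&mut self, signature: u64) -> KernelId {
        if let Some(&id) = self.cache.get(&signature) {
            return id; // already compiled
        }
        let id = self.next_id;
        self.next_id += 1;
        self.cache.insert(signature, id);
        id
    }

    /// RUN phase: a plain `HashMap::get` with no lock.
    pub fn get(&self, signature: u64) -> Option<KernelId> {
        self.cache.get(&signature).copied()
    }
}

/// Builds two entries, freezes the engine, then reads it from three threads.
pub fn build_and_share() -> Vec<Option<KernelId>> {
    let mut engine = ToyEngine::new();
    engine.compile(0xA);
    engine.compile(0xB);

    // From here on only `&self` methods are reachable through the shared handle.
    let frozen = Arc::new(engine);

    let workers: Vec<_> = vec![0xA_u64, 0xB, 0xC]
        .into_iter()
        .map(|sig| {
            let engine = Arc::clone(&frozen);
            thread::spawn(move || engine.get(sig))
        })
        .collect();

    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```

Calling `frozen.compile(..)` after the `Arc::new` line would not compile, which is the freeze the entry refers to. The two compiled signatures resolve to their ids and the unknown one yields `None`.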
+
+**What the D1.1 scaffold is still good for:** `CodecParams` is the
+codec-sweep key; `ScanParams` is ndarray's thinking-style-scan key.
+These are different domains, so a `CodecParams`-keyed adapter layer is
+still needed. My generic-over-handle design anticipates this: the
+scaffold wraps ndarray's `JitEngine` at the `H` slot when D1.1b
+lands.
+
+**Revised D1.1b plan:**
+
+Mirror ndarray's two-phase pattern in `cognitive-shader-driver`:
+
+```rust
+use std::collections::HashMap;
+use std::sync::Arc;
+
+// Wraps ndarray's engine and adds a CodecParams-keyed index.
+pub struct CodecKernelEngine {
+    inner: ndarray::hpc::jitson_cranelift::JitEngine,
+    codec_sig_to_inner_id: HashMap<u64, u64>, // CodecParams signature → JitEngine id
+}
+
+impl CodecKernelEngine {
+    pub fn build() -> CodecKernelEngineBuilder { todo!() }
+
+    // BUILD phase: mutable, single-threaded
+    pub fn compile(&mut self, params: &CodecParams) -> Result<u64, JitError> { todo!() }
+
+    // Moves to the RUN phase: the engine is frozen behind Arc
+    pub fn freeze(self) -> Arc<Self> { Arc::new(self) }
+
+    // RUN phase: read-only lookup, no lock
+    pub fn get(&self, params: &CodecParams) -> Option<KernelHandle> { todo!() }
+}
+```
+
+Then D1.2/D1.3 call `inner.compile` with codec-specific
+`ScanParams` analogs (a new `CodecScanParams` struct or a JITSON
+template constructed from `CodecParams`).
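
To make the adapter idea concrete without depending on ndarray, here is a self-contained sketch. Everything in it is an assumption for illustration: the `CodecParams` fields, the hash-based `kernel_signature`, the template string, and `StubEngine`, which merely stands in for the inner JIT engine and hands out sequential ids.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical codec parameters; the real `CodecParams` fields are not shown in this log.
#[derive(Hash)]
pub struct CodecParams {
    pub block_len: u32,
    pub bits: u8,
}

impl CodecParams {
    /// Illustrative signature: a hash over the fields (assumption, not the PR #225 code).
    pub fn kernel_signature(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Stand-in for the inner engine: records templates and returns sequential ids.
#[derive(Default)]
pub struct StubEngine {
    templates: Vec<String>,
}

impl StubEngine {
    pub fn compile(&mut self, template: String) -> Result<u64, String> {
        self.templates.push(template);
        Ok(self.templates.len() as u64) // ids start at 1
    }
}

/// Adapter: maps a codec signature to the id returned by the inner engine.
#[derive(Default)]
pub struct CodecAdapter {
    inner: StubEngine,
    codec_sig_to_inner_id: HashMap<u64, u64>,
}

impl CodecAdapter {
    /// Translates codec parameters into an inner-engine request, once per signature.
    pub fn compile(&mut self, params: &CodecParams) -> Result<u64, String> {
        let sig = params.kernel_signature();
        if let Some(&id) = self.codec_sig_to_inner_id.get(&sig) {
            return Ok(id); // already compiled for this signature
        }
        let template = format!("codec(block_len={}, bits={})", params.block_len, params.bits);
        let id = self.inner.compile(template)?;
        self.codec_sig_to_inner_id.insert(sig, id);
        Ok(id)
    }

    /// Read-only lookup by codec parameters.
    pub fn get(&self, params: &CodecParams) -> Option<u64> {
        self.codec_sig_to_inner_id
            .get(&params.kernel_signature())
            .copied()
    }

    /// Number of kernels the inner engine was actually asked to compile.
    pub fn compiled_count(&self) -> usize {
        self.inner.templates.len()
    }
}
```

Compiling the same parameters twice reaches the inner engine only once; the second call is answered from the signature index. In the real D1.1b work the template string would be replaced by whatever `ScanParams` analog is chosen.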
+
+**Honesty note:** the user asked "I presume you are aware of
+cranelift/jitson". The answer: Cranelift yes (Bytecode Alliance,
+wasmtime); ndarray jitson NO (I didn't inspect the upstream surface
+before writing D1.1). This correction surfaces that gap explicitly
+so the next session doesn't repeat it.
+
+**Cross-ref:** D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`
+(keep as `StubKernel`-backed test fixture); `ndarray::hpc::jitson_cranelift::JitEngine`;
+D1.1b revised plan above.
+
+---
+
+## 2026-04-20 — D1.1 scaffold-before-codegen: cache semantics testable without Cranelift
+
+**Status:** FINDING
+
+`CodecKernelCache<H>` is generic over the kernel-handle type. The same
+cache hosts `StubKernel` (deterministic fake, no compilation) for tests
+AND `KernelHandle` (real Cranelift function pointer) for production.
+
+This separates TWO concerns that are usually tangled:
+
+1. **Cache semantics:** signature-keyed insertion, double-checked
+   locking under concurrent miss, counters for hit-ratio measurement.
+   Testable in microseconds without a JIT engine.
+2. **IR emission:** the actual Cranelift / jitson code generation
+   that takes `CodecParams` and produces a callable function pointer.
+   Heavy; takes minutes per build; requires ndarray's jitson surface
+   to be finalized.
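
The cache-semantics half can be sketched on its own. The following is a minimal illustration of a handle-generic cache with double-checked locking and hit/miss counters, assuming only the standard library. `KernelCache`, `get_or_insert_with`, and `concurrent_miss_demo` are hypothetical names, and this is not the actual `CodecKernelCache<H>` source.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Deterministic fake handle: no compilation involved.
#[derive(Clone, Debug, PartialEq)]
pub struct StubKernel {
    pub signature: u64,
}

/// Signature-keyed cache, generic over the handle type `H`.
pub struct KernelCache<H> {
    map: RwLock<HashMap<u64, H>>,
    hits: AtomicU64,
    misses: AtomicU64,
}

impl<H: Clone> KernelCache<H> {
    pub fn new() -> Self {
        KernelCache {
            map: RwLock::new(HashMap::new()),
            hits: AtomicU64::new(0),
            misses: AtomicU64::new(0),
        }
    }

    /// Returns the cached handle, building it at most once per signature.
    pub fn get_or_insert_with<F: FnOnce() -> H>(&self, signature: u64, build: F) -> H {
        {
            // First check under the shared read lock.
            let map = self.map.read().unwrap();
            if let Some(handle) = map.get(&signature) {
                self.hits.fetch_add(1, Ordering::Relaxed);
                return handle.clone();
            }
        } // read guard dropped here, before taking the write lock

        // Second check under the exclusive lock: another thread may have won the race.
        let mut map = self.map.write().unwrap();
        if let Some(handle) = map.get(&signature) {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return handle.clone();
        }
        self.misses.fetch_add(1, Ordering::Relaxed);
        let handle = build();
        map.insert(signature, handle.clone());
        handle
    }

    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }

    pub fn misses(&self) -> u64 {
        self.misses.load(Ordering::Relaxed)
    }
}

/// Eight threads request the same signature; returns (builds, hits, misses).
pub fn concurrent_miss_demo() -> (usize, u64, u64) {
    let cache = Arc::new(KernelCache::<StubKernel>::new());
    let builds = Arc::new(AtomicUsize::new(0));
    let workers: Vec<_> = (0..8)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let builds = Arc::clone(&builds);
            thread::spawn(move || {
                cache.get_or_insert_with(42, || {
                    builds.fetch_add(1, Ordering::SeqCst);
                    StubKernel { signature: 42 }
                })
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    (builds.load(Ordering::SeqCst), cache.hits(), cache.misses())
}
```

Because the build closure runs while the write lock is held, a concurrent miss on one signature builds exactly once and every other caller is counted as a hit. Holding the lock during the build is a simplification that would serialize real compilations; it is acceptable here only because the stub builds instantly.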
+
+By shipping the cache layer with `StubKernel` NOW, Phase 1's cache
+semantics are verified and CI-gated before the Cranelift work starts.
+When D1.1b lands, the only change is `H = KernelHandle`; all 9 cache
+tests remain valid. This is the **scaffold-before-codegen** pattern:
+test the hard-to-change contract first, defer the hard-to-build
+implementation.
+
+Generalises: any JIT pipeline should separate cache-keying from IR
+emission at the type level. Being generic over the handle type is the
+wedge that makes this possible.
+
+Cross-ref: D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`;
+D0.3 sweep-grid-IS-cache-warmer epiphany (same signature-as-identity
+insight); PR #225 `CodecParams::kernel_signature()`.
+
+---
+
 ## 2026-04-20 — D0.3 sweep grid IS the JIT cache warmer

 **Status:** FINDING