Commit 562a31c

CORRECTION to D1.1: ndarray::hpc::jitson_cranelift already ships JitEngine
User asked "I presume you are aware of cranelift/jitson". Honest answer: Cranelift generally yes (Bytecode Alliance, wasmtime); the ndarray-side jitson engine specifically NO. Probed it just now.

ndarray already ships the full JIT pipeline:

    src/hpc/jitson/            JITSON template format (JSON-based):
                               parser / validator / template / precompile /
                               scan_config / packed / noise
    src/hpc/jitson_cranelift/  Cranelift engine:
                               engine.rs (JitEngine + JitEngineBuilder),
                               ir.rs / scan_jit.rs / noise_jit.rs / detect.rs

Deps behind the `jit-native` feature: cranelift-codegen 0.116, cranelift-jit, cranelift-module, cranelift-frontend, target-lexicon.

Upstream two-phase lifecycle is stronger than my D1.1 scaffold:

    BUILD: &mut JitEngine, compile(ScanParams) -> Result<u64>
    RUN:   Arc<JitEngine> freezes by Rust ownership;
           &mut self unreachable through Arc;
           get() ~5 ns (plain HashMap::get, no synchronization)
           vs my scaffold's ~25 ns RwLock read

The freeze is enforced by the TYPE SYSTEM, not a runtime lock.

The D1.1 scaffold is not redundant: CodecParams (codec-sweep key) differs from ScanParams (thinking-style-scan key). The generic-over-H design anticipates D1.1b: the scaffold wraps ndarray's JitEngine at the H slot when the real engine lands. But my RwLock lifecycle is worse than the Arc-freeze upstream uses.

Revised D1.1b plan (STATUS_BOARD updated): CodecKernelEngine mirroring ndarray's BUILD/RUN pattern:

    pub struct CodecKernelEngine {
        inner: ndarray::hpc::jitson_cranelift::JitEngine,
        codec_sig_to_inner_id: HashMap<u64, u64>,
    }
    .build() -> Builder
    .compile(&mut self, &CodecParams) -> Result<u64>
    .freeze(self) -> Arc<Self>   // moves to RUN phase
    .get(&self, &CodecParams) -> Option<KernelHandle>

Target ~250 LOC; JitEngine itself is DONE upstream. What's left is the CodecParams adapter + codec-specific JITSON template (CodecScanParams struct OR direct JSON emission from CodecParams).

D1.1 scaffold stays as StubKernel-backed test fixture. The generic-over-H design is the wedge that lets both coexist.

EPIPHANIES.md PREPEND: "CORRECTION to D1.1 scaffold".
STATUS_BOARD.md: D1.1b description updated to cite the real upstream surface + revised ~250 LOC target + path to jitson_cranelift/engine.rs.

Honesty landed explicitly so next session doesn't repeat the "guess at upstream surface" failure mode.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
1 parent 58d7b2c commit 562a31c

2 files changed

Lines changed: 79 additions & 1 deletion

.claude/board/EPIPHANIES.md

Lines changed: 78 additions & 0 deletions
@@ -65,6 +65,84 @@ stay as historical references.

## Entries (reverse chronological)

## 2026-04-20 — CORRECTION to D1.1 scaffold: ndarray::hpc::jitson_cranelift already ships JitEngine

**Status:** FINDING / CORRECTION

The lifecycle of the D1.1 `CodecKernelCache` scaffold (RwLock + double-check) is
strictly worse than what ndarray's `jitson_cranelift::JitEngine`
already provides. The real upstream layout:

```
/home/user/ndarray/src/hpc/
├── jitson/               - JITSON template format (parser/validator/
│                           template/precompile/scan_config/packed/noise)
└── jitson_cranelift/     - real Cranelift engine
    ├── engine.rs         - JitEngine + JitEngineBuilder
    ├── ir.rs             - IR emission
    ├── scan_jit.rs       - scan kernel codegen
    ├── noise_jit.rs      - noise kernel codegen
    └── detect.rs         - CPU capability detection
```

Dependencies behind the `jit-native` feature:
`cranelift-{codegen, jit, module, frontend} 0.116` + `target-lexicon`.

**Upstream two-phase lifecycle is stronger than my scaffold:**

- **BUILD phase:** `&mut JitEngine`, `compile(ScanParams) -> Result<u64>`,
  mutable cache via `&mut self`.
- **RUN phase:** `Arc<JitEngine>` freezes the cache through Rust's ownership
  rules (`&mut self` methods are not callable through a shared `Arc`).
  `get()` drops from ~25 ns (my RwLock read) to ~5 ns (plain
  `HashMap::get`, no synchronization needed).

The freeze is enforced by the type system, not by a runtime lock.
That's the right design for this domain (build-once, run-many).

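The BUILD/RUN split described above can be sketched with a toy engine. This is a minimal illustration of the ownership pattern only, not ndarray's `JitEngine`: `ToyEngine`, `Kernel`, `double`, `square`, and `run_frozen` are hypothetical names, and the "kernels" are plain function pointers rather than JIT-compiled code.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a compiled kernel: a plain function pointer.
type Kernel = fn(u64) -> u64;

/// Toy engine (assumption, not ndarray's `JitEngine`).
pub struct ToyEngine {
    kernels: HashMap<u64, Kernel>,
    next_id: u64,
}

impl ToyEngine {
    pub fn new() -> Self {
        ToyEngine { kernels: HashMap::new(), next_id: 0 }
    }

    /// BUILD phase: takes `&mut self`, so it needs exclusive access.
    pub fn compile(&mut self, kernel: Kernel) -> u64 {
        let id = self.next_id;
        self.kernels.insert(id, kernel);
        self.next_id += 1;
        id
    }

    /// RUN phase: takes `&self`, a plain `HashMap::get` with no lock.
    pub fn get(&self, id: u64) -> Option<Kernel> {
        self.kernels.get(&id).copied()
    }
}

fn double(x: u64) -> u64 { x * 2 }
fn square(x: u64) -> u64 { x * x }

/// Compiles two kernels, freezes the engine, then reads it from four threads.
pub fn run_frozen() -> Vec<u64> {
    let mut engine = ToyEngine::new();
    let double_id = engine.compile(double);
    let square_id = engine.compile(square);

    // Freeze: `engine` is moved into the Arc, so no `&mut` path remains.
    let frozen = Arc::new(engine);
    // frozen.compile(double); // rejected: cannot borrow data in an `Arc` as mutable

    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            let shared = Arc::clone(&frozen);
            thread::spawn(move || {
                let id = if i % 2 == 0 { double_id } else { square_id };
                shared.get(id).map(|k| k(i + 3)).unwrap_or(0)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_frozen()` exercises both phases: every `compile` happens before the `Arc::new`, and the spawned threads only ever see `&ToyEngine`, so the reads need no synchronization.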
**What the D1.1 scaffold is still good for:** `CodecParams` is the
codec-sweep key; `ScanParams` is ndarray's thinking-style-scan key.
These are different domains, so a `CodecParams`-keyed adapter layer is
still needed. My generic-over-handle design anticipates this: the
scaffold wraps ndarray's `JitEngine` at the `H` slot when D1.1b
lands.

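For comparison, a cache that is generic over the handle type `H` and uses an RwLock double-check might look like the sketch below. It is a hypothetical reconstruction, not the contents of `codec_kernel_cache.rs`: the `KernelCache` name, the `u64` key, and the counter methods are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical stub handle used in place of a real compiled kernel.
#[derive(Clone, Debug, PartialEq)]
pub struct StubKernel(pub u64);

/// Sketch of a cache generic over the handle type `H` (assumed shape).
pub struct KernelCache<H> {
    map: RwLock<HashMap<u64, H>>,
    compiles: AtomicU64,
    hits: AtomicU64,
}

impl<H: Clone> KernelCache<H> {
    pub fn new() -> Self {
        KernelCache {
            map: RwLock::new(HashMap::new()),
            compiles: AtomicU64::new(0),
            hits: AtomicU64::new(0),
        }
    }

    /// Returns the cached handle for `key`, running `compile` at most once per key.
    pub fn get_or_compile(&self, key: u64, compile: impl FnOnce() -> H) -> H {
        // Fast path: shared read lock, released at the end of this statement.
        let cached = self.map.read().unwrap().get(&key).cloned();
        if let Some(handle) = cached {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return handle;
        }
        // Slow path: take the write lock and check again, because another
        // thread may have inserted the key between the two lock acquisitions.
        let mut map = self.map.write().unwrap();
        if let Some(handle) = map.get(&key) {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return handle.clone();
        }
        let handle = compile();
        self.compiles.fetch_add(1, Ordering::Relaxed);
        map.insert(key, handle.clone());
        handle
    }

    pub fn compiles(&self) -> u64 {
        self.compiles.load(Ordering::Relaxed)
    }

    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}
```

Because `H` is a type parameter, the same cache logic can be tested with `StubKernel` and later instantiated with a real handle type. Every lookup still pays for a lock acquisition, which is the cost the Arc-freeze lifecycle avoids.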
**Revised D1.1b plan:**

Mirror ndarray's two-phase pattern in `cognitive-shader-driver`:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Owns the upstream engine plus the CodecParams -> JitEngine id mapping.
pub struct CodecKernelEngine {
    inner: ndarray::hpc::jitson_cranelift::JitEngine,
    codec_sig_to_inner_id: HashMap<u64, u64>, // CodecParams signature -> JitEngine id
}

impl CodecKernelEngine {
    pub fn build() -> CodecKernelEngineBuilder { todo!() }

    // BUILD phase: mutable, single-threaded.
    pub fn compile(&mut self, params: &CodecParams) -> Result<u64, JitError> { todo!() }

    // Moves to the RUN phase: the engine is frozen behind an Arc.
    pub fn freeze(self) -> Arc<Self> { Arc::new(self) }

    // RUN phase: read-only lookup, no lock.
    pub fn get(&self, params: &CodecParams) -> Option<KernelHandle> { todo!() }
}
```

Then D1.2/D1.3 call `inner.compile` with codec-specific
`ScanParams`-analogs (a new `CodecScanParams` struct or a JITSON
template constructed from `CodecParams`).

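One way the `codec_sig_to_inner_id` mapping could work is sketched below, with a stub in place of the upstream engine. Everything here is an assumption for illustration: the `CodecParams` fields, the `StubInner` type, the `String` error, and the use of `DefaultHasher` for the signature are not taken from the real crates, and `get` returns the inner id rather than a `KernelHandle`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical codec key; the real `CodecParams` fields are not shown in this entry.
#[derive(Hash, Clone, Debug)]
pub struct CodecParams {
    pub subspaces: u32,
    pub bits: u32,
}

/// Stand-in for the upstream engine: hands out increasing kernel ids.
#[derive(Default)]
pub struct StubInner {
    next_id: u64,
}

impl StubInner {
    pub fn compile(&mut self) -> Result<u64, String> {
        let id = self.next_id;
        self.next_id += 1;
        Ok(id)
    }
}

/// Toy adapter with the same field layout as the planned `CodecKernelEngine`.
#[derive(Default)]
pub struct CodecKernelEngine {
    inner: StubInner,
    codec_sig_to_inner_id: HashMap<u64, u64>,
}

/// Hashes the params into a 64-bit signature (assumed scheme).
fn signature(params: &CodecParams) -> u64 {
    let mut hasher = DefaultHasher::new();
    params.hash(&mut hasher);
    hasher.finish()
}

impl CodecKernelEngine {
    /// BUILD phase: compiles once per distinct signature.
    pub fn compile(&mut self, params: &CodecParams) -> Result<u64, String> {
        let sig = signature(params);
        if let Some(&id) = self.codec_sig_to_inner_id.get(&sig) {
            return Ok(id);
        }
        let id = self.inner.compile()?;
        self.codec_sig_to_inner_id.insert(sig, id);
        Ok(id)
    }

    /// Moves to the RUN phase by giving up ownership.
    pub fn freeze(self) -> Arc<Self> {
        Arc::new(self)
    }

    /// RUN phase: lock-free lookup of the inner id.
    pub fn get(&self, params: &CodecParams) -> Option<u64> {
        self.codec_sig_to_inner_id.get(&signature(params)).copied()
    }
}
```

A 64-bit signature can collide in principle, so a real adapter might key the map on `CodecParams` itself or verify the params on a hit; the sketch skips that for brevity.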
**Honesty note:** the user asked "I presume you are aware of
cranelift/jitson". The answer: Cranelift yes (Bytecode Alliance,
wasmtime), ndarray jitson NO (I didn't inspect the upstream surface
before writing D1.1). This correction surfaces that gap explicitly
so the next session doesn't repeat it.

**Cross-ref:** D1.1 `crates/cognitive-shader-driver/src/codec_kernel_cache.rs`
(keep as `StubKernel`-backed test fixture); `ndarray::hpc::jitson_cranelift::JitEngine`;
D1.1b revised plan above.

---

## 2026-04-20 — D1.1 scaffold-before-codegen: cache semantics testable without Cranelift

**Status:** FINDING

.claude/board/STATUS_BOARD.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
| D-id | Title | Status | PR / Evidence |
|---|---|---|---|
| D1.1 | `CodecKernelCache` — structural cache layer (generic over handle) | **In PR** | branch — `CodecKernelCache<H>` + `StubKernel` + `get_or_compile` / `try_get_or_compile` with RwLock concurrent-safe double-check + compile/hit/ratio counters + 9 tests. Scaffold ships NOW; D1.1b Cranelift IR emission follows. |
- | D1.1b | Cranelift IR emission (plugs the real `KernelHandle` into the cache from D1.1) | **Queued** | target ~180 LOC once ndarray's jitson engine exposes the compile entry |
+ | D1.1b | Adapter: `CodecKernelEngine` wrapping `ndarray::hpc::jitson_cranelift::JitEngine` with two-phase BUILD/RUN lifecycle (Arc-freeze). CodecParams → CodecScanParams adapter + codec-specific IR emission in jitson_cranelift/scan_jit analog | **Queued** | target ~250 LOC; `JitEngine` already ships (`/home/user/ndarray/src/hpc/jitson_cranelift/engine.rs`); the work is the CodecParams adapter + codec-specific JITSON template |
| D1.2 | Rotation primitives: Identity / Hadamard / OPQ as JIT kernels | **Queued** | target ~190 LOC |
| D1.3 | Residual PQ via JIT composition | **Queued** | target ~150 LOC |
