Commit d057605

Merge pull request #641 from AdaWorldAPI/claude/v3-substrate-migration-review-o0yoxv
onebrc/lane-j: wire GridBatch through ndarray MultiLaneColumn (D-DNV-1)
2 parents 70af76f + 8d98351 commit d057605

3 files changed

Lines changed: 149 additions & 3 deletions

.claude/board/STATUS_BOARD.md

Lines changed: 11 additions & 0 deletions
@@ -1,3 +1,14 @@
+## deepnsm-v3-convergence-v1 — DeepNSM is the encoder that fills reserved tenants
+
+Plan: `.claude/plans/deepnsm-v3-convergence-v1.md` (`E-V3-DEEPNSM-IS-THE-ENCODER-NOT-A-MIGRATION-1`). Static convergence PROVEN by #624 P0–P5; the memory layer is the genuinely-unbuilt seam. Extends `v3-convergence-wiring-v1` (wire-don't-invent).
+
+| D-id | Title | Crate(s) | Status | Evidence |
+|---|---|---|---|---|
+| D-DNV-1 | Gridlake carrier: `GridBatch::as_gridlake_columns` → `ndarray::simd::MultiLaneColumn` (i32 min/max, i64 sum, u64 count); the carrier the COCA `Cell` also rides | onebrc-probe (+ndarray) | In PR | lane-j feature pulls ndarray; 2 tests green (LE roundtrip cell-for-cell + unaligned-grid reject); lane_j.rs clippy-clean |
+| D-DNV-2 | deepnsm `SpoTriple` → `CausalEdge64` S/P/O+freq/conf → `MaterializedEdges`; run `nars_engine.all_projections()` (2³) over the COCA distance matrix | deepnsm + planner | Queued | buildable; extends #624 P3b |
+| D-DNV-3 | arm-discovery as the 2nd proposer leg into one SpoStore (shares palette256 oracle) | arm-discovery + deepnsm | Blocked (ARM-JIRAK-FLOOR) | D-ARM-7 Jirak noise floor is the hard prereq |
+| D-DNV-4 | Episodic-witness tenant + `basin=family` wake (`witness_tombstone` calcify chain) | contract + arigraph | Blocked (own wave + probe) | no episodic-witness ValueTenant; calcify chain is `todo!()`; basin=family doc-only |
+
 ## v3-substrate-integration-v1 — the .claude/v3/ consolidation (W0–W6)
 
 Plan: `.claude/v3/INTEGRATION-PLAN.md` (stub: `.claude/plans/v3-substrate-integration-v1.md`). Adopts (does not re-mint) D-MBX-A6, D-PERT-1, D-CC-*, D-VCW-3/5/7, D-CCF-4.

crates/onebrc-probe/Cargo.toml

Lines changed: 4 additions & 2 deletions
@@ -43,8 +43,10 @@ lane-h = ["lane-g"]
 # flush-cache interleaving - same dep set as lane-g.
 lane-i = ["dep:lance-graph-contract", "dep:ractor", "dep:tokio"]
 # Lane J (parameterized batch pipeline): lane I's shape with grid /
-# sink-lanes / registry knobs — needs lane-i's RowOwner.
-lane-j = ["lane-i"]
+# sink-lanes / registry knobs — needs lane-i's RowOwner. Also pulls
+# ndarray for the gridlake carrier (`GridBatch::as_gridlake_columns` →
+# `ndarray::simd::MultiLaneColumn`; DeepNSM→V3 D-DNV-1).
+lane-j = ["lane-i", "dep:ndarray"]
 # All 8 batching-method presets (src/presets.rs) — the lab-sweep surface;
 # see FINDINGS.md (agnostic record) + COMMENTARY.md (interpretation).
 presets = ["lane-g", "lane-h", "lane-i", "lane-j"]

crates/onebrc-probe/src/lane_j.rs

Lines changed: 134 additions & 1 deletion
@@ -43,6 +43,7 @@ use crate::lane_f::{fnv1a64, morton_slot};
 use crate::lane_i::RowOwner;
 use crate::{chunk_bounds, merge_maps, parse_temp_tenths, Stats};
 use lance_graph_contract::kanban::{ExecTarget, KanbanColumn, KanbanMove};
+use ndarray::simd::MultiLaneColumn;
 use ractor::{Actor, ActorProcessingErr, ActorRef, RpcReplyPort};
 use std::collections::{BTreeMap, VecDeque};
 use std::sync::atomic::{AtomicUsize, Ordering};
@@ -133,7 +134,7 @@ impl GridMemo {
 
 // ─── Grid batch table (the gridlake SoA unit at grid=4096: ~80 KB) ──────
 
-pub(crate) struct GridBatch {
+pub struct GridBatch {
     mins: Vec<i32>,
     maxs: Vec<i32>,
     sums: Vec<i64>,
@@ -180,6 +181,70 @@ impl GridBatch {
     }
 }
 
+// ─── Gridlake carrier: the batch table AS ndarray MultiLaneColumns ──────
+
+/// The lane-J `GridBatch` accumulators rendered as `ndarray::simd`
+/// [`MultiLaneColumn`] gridlake carriers — the SoA-contract carrier the
+/// proven 64×64 gridlake tile rides (`E-1BRC-GRIDLAKE-SWEETSPOT-1`), and the
+/// **same** `MultiLaneColumn` the COCA cognitive `Cell`
+/// (helix48/campq48/count/truth, `crates/deepnsm/examples/gridlake_coca_wire.rs`)
+/// composes from. This is the DeepNSM→V3 D-DNV-1 recognition
+/// (`.claude/plans/deepnsm-v3-convergence-v1.md`,
+/// `E-V3-DEEPNSM-IS-THE-ENCODER-NOT-A-MIGRATION-1`): the batch table is not a
+/// bespoke struct, it is typed lanes over one carrier — "wire, don't invent."
+///
+/// Lane widths follow the integer lanes ndarray added for exactly this
+/// (`iter_i32x16` "min/max tile columns", `iter_i64x8` "running sums"):
+/// min/max ride `I32x16`, sum rides `I64x8`, count (a non-negative
+/// accumulator) rides `U64x8`. Each column's backing buffer is a 64-byte
+/// multiple whenever `grid` is a multiple of 16 (i32·16 = i64·8 = u64·8 =
+/// 64 B), which the gridlake `grid = 4096` satisfies.
+pub struct GridlakeColumns {
+    pub mins: MultiLaneColumn,
+    pub maxs: MultiLaneColumn,
+    pub sums: MultiLaneColumn,
+    pub counts: MultiLaneColumn,
+}
+
+impl GridBatch {
+    /// Render the four accumulator columns as [`MultiLaneColumn`] gridlake
+    /// carriers (little-endian bytes, zero semantic change — a *reading*, not
+    /// a re-layout). `count` is widened `u32 → u64` to ride the unsigned
+    /// 64-bit accumulator lane. Returns `Err(())` if a column buffer is not
+    /// 64-byte aligned (i.e. `grid % 16 != 0`), mirroring
+    /// `MultiLaneColumn::new`'s own contract.
+    #[allow(clippy::result_unit_err)] // pass-through of MultiLaneColumn::new's Result<_, ()> alignment contract
+    pub fn as_gridlake_columns(&self) -> Result<GridlakeColumns, ()> {
+        fn col_i32(v: &[i32]) -> Result<MultiLaneColumn, ()> {
+            let mut b = Vec::with_capacity(v.len() * 4);
+            for &x in v {
+                b.extend_from_slice(&x.to_le_bytes());
+            }
+            MultiLaneColumn::new(Arc::from(b))
+        }
+        fn col_i64(v: &[i64]) -> Result<MultiLaneColumn, ()> {
+            let mut b = Vec::with_capacity(v.len() * 8);
+            for &x in v {
+                b.extend_from_slice(&x.to_le_bytes());
+            }
+            MultiLaneColumn::new(Arc::from(b))
+        }
+        fn col_u64_from_u32(v: &[u32]) -> Result<MultiLaneColumn, ()> {
+            let mut b = Vec::with_capacity(v.len() * 8);
+            for &x in v {
+                b.extend_from_slice(&(x as u64).to_le_bytes());
+            }
+            MultiLaneColumn::new(Arc::from(b))
+        }
+        Ok(GridlakeColumns {
+            mins: col_i32(&self.mins)?,
+            maxs: col_i32(&self.maxs)?,
+            sums: col_i64(&self.sums)?,
+            counts: col_u64_from_u32(&self.counts)?,
+        })
+    }
+}
+
 // ─── Laned sinks: each lane owns a contiguous row-range slice ───────────
 
 enum LaneMsg {
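
The 64-byte rule stated in the `GridlakeColumns` doc comment is plain arithmetic on column byte lengths. A minimal std-only sketch of that arithmetic follows; `column_bytes`, `is_64_byte_multiple`, and `grid_is_accepted` are hypothetical helper names introduced here for illustration and are not part of `onebrc-probe` or `ndarray`.

```rust
/// Hypothetical helper (not part of the crate): byte length of a column
/// holding `grid` elements of `elem_size` bytes each.
fn column_bytes(grid: usize, elem_size: usize) -> usize {
    grid * elem_size
}

/// Hypothetical helper: the length rule the doc comment describes.
fn is_64_byte_multiple(len: usize) -> bool {
    len % 64 == 0
}

/// True when all four columns would pass the length rule:
/// 4-byte elements for mins/maxs, 8-byte elements for sums/counts.
fn grid_is_accepted(grid: usize) -> bool {
    is_64_byte_multiple(column_bytes(grid, 4)) && is_64_byte_multiple(column_bytes(grid, 8))
}

fn main() {
    for grid in [4096usize, 72, 16, 8] {
        println!(
            "grid={}: i32 col = {} B, i64/u64 col = {} B, accepted = {}",
            grid,
            column_bytes(grid, 4),
            column_bytes(grid, 8),
            grid_is_accepted(grid)
        );
    }
}
```

Under this reading the i32 columns are the binding constraint: at `grid = 72` the 64-bit columns are 576 B, which is a 64-byte multiple, while the i32 columns are 288 B and fail. That is why the condition reduces to `grid % 16 == 0`.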
@@ -594,6 +659,74 @@ pub fn lane_j_grid_pipeline(data: &[u8], workers: usize) -> BTreeMap<String, Sta
 mod tests {
     use super::*;
 
+    /// D-DNV-1 carrier foundation: the gridlake `GridBatch` renders
+    /// losslessly through `ndarray::simd::MultiLaneColumn` — the LE bytes
+    /// roundtrip cell-for-cell against the source accumulators, and the typed
+    /// integer lanes (i32 min/max, i64 sum, u64 count) each yield the right
+    /// window count over the carrier. This is the "wire, don't invent" proof
+    /// that the batch table IS a MultiLaneColumn composition (the carrier the
+    /// COCA cognitive Cell also rides).
+    #[test]
+    fn gridlake_batch_rides_multilane_column_losslessly() {
+        let grid = 4096usize;
+        let mut batch = GridBatch::new(grid);
+        // A spread of cells with distinct signed extremes + running sums,
+        // incl. the i32x16 / lane boundaries (15|16) and the tile edge (4095).
+        for (k, &slot) in [0u16, 1, 15, 16, 255, 256, 4095].iter().enumerate() {
+            let k = k as i32;
+            batch.observe(slot, -(k * 10) - 3);
+            batch.observe(slot, k * 7 + 11);
+            batch.observe(slot, 5);
+        }
+        let cols = batch
+            .as_gridlake_columns()
+            .expect("grid=4096 columns are 64-byte aligned");
+
+        // Typed-lane views are wired: 4096 i32 = 256 × i32x16;
+        // 4096 i64/u64 = 512 × i64x8/u64x8 (the lanes ndarray added for this).
+        assert_eq!(cols.mins.len_i32x16(), grid / 16);
+        assert_eq!(cols.mins.iter_i32x16().count(), grid / 16);
+        assert_eq!(cols.maxs.len_i32x16(), grid / 16);
+        assert_eq!(cols.sums.len_i64x8(), grid / 8);
+        assert_eq!(cols.sums.iter_i64x8().count(), grid / 8);
+        assert_eq!(cols.counts.len_u64x8(), grid / 8);
+        assert_eq!(cols.counts.iter_u64x8().count(), grid / 8);
+
+        // LE roundtrip is cell-for-cell exact against the source accumulators.
+        let dec_i32 = |c: &MultiLaneColumn| -> Vec<i32> {
+            c.as_bytes()
+                .chunks_exact(4)
+                .map(|b| i32::from_le_bytes(b.try_into().expect("4-byte i32 chunk")))
+                .collect()
+        };
+        let dec_i64 = |c: &MultiLaneColumn| -> Vec<i64> {
+            c.as_bytes()
+                .chunks_exact(8)
+                .map(|b| i64::from_le_bytes(b.try_into().expect("8-byte i64 chunk")))
+                .collect()
+        };
+        let dec_u64 = |c: &MultiLaneColumn| -> Vec<u64> {
+            c.as_bytes()
+                .chunks_exact(8)
+                .map(|b| u64::from_le_bytes(b.try_into().expect("8-byte u64 chunk")))
+                .collect()
+        };
+        assert_eq!(dec_i32(&cols.mins), batch.mins);
+        assert_eq!(dec_i32(&cols.maxs), batch.maxs);
+        assert_eq!(dec_i64(&cols.sums), batch.sums);
+        let counts_u64: Vec<u64> = batch.counts.iter().map(|&c| c as u64).collect();
+        assert_eq!(dec_u64(&cols.counts), counts_u64);
+    }
+
+    /// The carrier refuses a mis-aligned grid (not a multiple of 16) rather
+    /// than silently producing a non-64-byte column — the `MultiLaneColumn`
+    /// contract surfaced at the batch boundary.
+    #[test]
+    fn gridlake_carrier_rejects_unaligned_grid() {
+        let batch = GridBatch::new(72); // 72 % 16 != 0 → i32 col = 288 B, not 64-mult
+        assert!(batch.as_gridlake_columns().is_err());
+    }
+
     /// Parity across the knob matrix corners: gridlake (4096) and full
     /// (65536) grids × 1 and 8 sink lanes × registry on/off, all with a
     /// small batch to force multi-batch flush-cache recycling.
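
The little-endian roundtrip that the first new test exercises can be tried without the `ndarray` dependency. A minimal sketch follows, assuming a hypothetical stand-in type `ByteColumn` that mimics only the one contract this diff documents (a buffer whose length is not a 64-byte multiple is rejected). It is not `MultiLaneColumn` and models none of its typed-lane iterators; the encode/decode function names are likewise illustrative.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the carrier. It mimics only the documented
/// length contract: reject a buffer that is not a 64-byte multiple.
struct ByteColumn {
    bytes: Arc<[u8]>,
}

impl ByteColumn {
    fn new(bytes: Arc<[u8]>) -> Result<Self, ()> {
        if bytes.len() % 64 == 0 {
            Ok(Self { bytes })
        } else {
            Err(())
        }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// Serialize i32 values as little-endian bytes, then wrap them.
fn encode_i32(v: &[i32]) -> Result<ByteColumn, ()> {
    let mut b = Vec::with_capacity(v.len() * 4);
    for &x in v {
        b.extend_from_slice(&x.to_le_bytes());
    }
    ByteColumn::new(Arc::from(b))
}

/// Read the column back as i32 values, 4 bytes at a time.
fn decode_i32(c: &ByteColumn) -> Vec<i32> {
    c.as_bytes()
        .chunks_exact(4)
        .map(|b| i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Widen u32 counts to u64 before serializing, as the diff does for `counts`.
fn encode_u64_from_u32(v: &[u32]) -> Result<ByteColumn, ()> {
    let mut b = Vec::with_capacity(v.len() * 8);
    for &x in v {
        b.extend_from_slice(&u64::from(x).to_le_bytes());
    }
    ByteColumn::new(Arc::from(b))
}

/// Read the column back as u64 values, 8 bytes at a time.
fn decode_u64(c: &ByteColumn) -> Vec<u64> {
    c.as_bytes()
        .chunks_exact(8)
        .map(|b| {
            let mut a = [0u8; 8];
            a.copy_from_slice(b);
            u64::from_le_bytes(a)
        })
        .collect()
}

fn main() {
    // 16 x i32 = 64 bytes, so the stand-in accepts the buffer.
    let mins: Vec<i32> = (0..16).map(|k| -(k * 10) - 3).collect();
    let col = encode_i32(&mins).expect("16 x i32 is a 64-byte multiple");
    assert_eq!(decode_i32(&col), mins);

    // 8 x u64 = 64 bytes; the widened values must match cell for cell.
    let counts: Vec<u32> = vec![3; 8];
    let col = encode_u64_from_u32(&counts).expect("8 x u64 is a 64-byte multiple");
    let widened: Vec<u64> = counts.iter().map(|&c| u64::from(c)).collect();
    assert_eq!(decode_u64(&col), widened);

    // 72 x i32 = 288 bytes, which is not a 64-byte multiple.
    assert!(encode_i32(&[0; 72]).is_err());
}
```

The sketch copies each value into a fresh byte buffer, the same approach as the `col_*` helpers in the diff, so it shows the byte-level contract rather than a zero-copy view.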
