diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index bb0053fd0..4b08187fc 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -1,3 +1,59 @@ +## 2026-06-06 — E-DEINTERLACE-TWO-SCALES — deinterlace is one operation at two scales; no-cross-cycle-lag = byte-scale deinterlace + +**Status:** FINDING (source-grounded; `temporal.rs` PR #468 confirms row-scale; byte-scale is a documented gap) +**Confidence:** High + +**The synthesis:** temporal causality in the SoA system must be enforced at two +independent scales that share the same monotonic clock: + +```text +Row/query scale → HLC tick + DependsClosure → temporal.rs::deinterlace() (SHIPPED, PR #468) +Byte/column scale → SoaEnvelope::cycle() stamp → MailboxSoA Arc-swap COW (GAP — plan written) +``` + +Both are the SAME operation — "sort by the causal clock and project the result +into the reader's reference frame" — but at different granularities. + +**Row scale (PR #468 confirms):** +`temporal.rs:18-20` defines the standing wave correctly: "merge-sort by HLC +tick and every field's row lands on one timeline. The result IS the standing +wave / kanban SoA." The `deinterlace()` function + `EpistemicMode` (Strict / +Aware / Retro) + `DependsClosure` implement this. 8 tests pass. + +**Byte scale (current gap):** +Nothing in the codebase prevents a reader from holding column data from SoA +cycle N and cycle N+1 in the same SIMD sweep. The `SoaEnvelope::cycle()` stamp +exists but is not enforced as a snapshot barrier. + +**The fix (plan: `cycle-coherent-soa-snapshot-v1.md`):** +Arc-swap COW at column granularity in `MailboxSoa::advance_phase`: +1. Writer increments `cycle`, then swaps the `Arc<[u8]>` of each mutated column. +2. Reader snapshots all column Arcs under one cycle stamp (lock-free retry). +3. The resulting `MailboxSoaSnapshot { cycle, cols }` is structurally coherent. + +**The boundary:** +`MultiLaneColumn` in ndarray stays layout-only. 
The Arc-swap policy lives in +lance-graph's `MailboxSoa`. ndarray does not learn that cycles exist. + +**The clock is one clock:** +`SoaEnvelope::cycle()` (byte scale) and `QueryReference::ref_version` (row +scale) are the same monotonic sequence. Threading `snapshot.cycle` into +`QueryReference` closes the loop: row-scale and byte-scale deinterlace use +the same clock. + +**Standing wave clarification (Q3 probe result):** +The "standing wave" is NOT a compute recurrence. It is the deinterlaced +projection over Lance versions — provided by Lance versioning itself (O(1) +90° lookup). Do not implement a standing wave in compute. + +**Cross-ref:** +- PR #468 (`temporal.rs`) — row-scale (SHIPPED) +- PR #477 (`soa_envelope.rs`) — envelope contract (IN REVIEW) +- `.claude/plans/cycle-coherent-soa-snapshot-v1.md` — byte-scale fix plan +- `docs/probes/q3-standing-wave-falsification.md` — probe confirming no wave in compute + +--- + ## 2026-06-04 — E-AUDIT-RETENTION-CAVEAT — substrate-b consumer doc Lance-versions-as-audit claim was overstated; corrected to retention-policy-gated (codex P1 on #465) **Status:** CORRECTION (codex P1 on PR #465, 2026-06-04; merged + immediate follow-up correction per the no-silent-edit discipline — the FIX appends; the original epiphany E-SUBSTRATE-B-CAPABILITY-ROADMAP stands as the corrected reference now reads). diff --git a/.claude/board/INTEGRATION_PLANS.md b/.claude/board/INTEGRATION_PLANS.md index f1aeaba8e..25c92e83d 100644 --- a/.claude/board/INTEGRATION_PLANS.md +++ b/.claude/board/INTEGRATION_PLANS.md @@ -1,3 +1,19 @@ +## 2026-06-06 — cycle-coherent-soa-snapshot-v1 (Arc-swap COW at column granularity; byte-scale deinterlace; no-cross-cycle-lag guarantee) + +**Status:** QUEUED. Design-spec only, no code. **Plan file:** `.claude/plans/cycle-coherent-soa-snapshot-v1.md`. +**Owns:** 6 deliverables D-SOA-SNAP-1..6. 
+- D-SOA-SNAP-1: `MailboxSoaSnapshot` type in lance-graph-contract +- D-SOA-SNAP-2: `SnapshotProvider` trait in lance-graph-contract +- D-SOA-SNAP-3: Arc-swap write path in `MailboxSoa::advance_phase` +- D-SOA-SNAP-4: `snapshot()` impl on `MailboxSoa` +- D-SOA-SNAP-5: No-cross-cycle-lag falsification test (writer thread + 8 reader threads) +- D-SOA-SNAP-6: Wire `snapshot.cycle` into `QueryReference` (close row-scale / byte-scale clock loop) +**Epiphany:** E-DEINTERLACE-TWO-SCALES (prepended 2026-06-06). +**Companion:** PR #468 (`temporal.rs`, row-scale, SHIPPED); PR #477 (`soa_envelope.rs`, IN REVIEW). +**Boundary:** ndarray stays layout-only (`MultiLaneColumn`); Arc-swap policy in lance-graph only. + +--- + ## 2026-06-05 — cesium-osm-substrate-v1 (OpenStreetMap as 6th source class for the 3DGS-ArcGIS-Cesium ingestion plan; OSM PBF → Arrow → Lance → SPO → cesium tileset → splat renderer; substrate-reuse with splat-native-ultrasound-v1) **Status:** PROPOSAL. Design-spec only, no code. **Plan file:** `.claude/plans/cesium-osm-substrate-v1.md` (~430 LOC). **Trigger:** user feasibility question on OSM × Cesium × Gaussian-splat coupling; cross-session coordination with OGAR. diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md index 6d49b043d..1721076d5 100644 --- a/.claude/board/TECH_DEBT.md +++ b/.claude/board/TECH_DEBT.md @@ -15,6 +15,17 @@ ## Open Debt +### TD-UNBUNDLE-FROM-1 — `unbundle_from` is NOT the inverse of `bundle_into` (2026-06-07) + +**Open.** `crates/lance-graph-planner/src/cache/kv_bundle.rs` — `unbundle_from` +uses `wrapping_sub` as the "undo" of `bundle_into`. But `bundle_into` is a +weighted average: `(old * w_self + new * w_new) / total`. Subtraction is not the +inverse. `AttentionMatrix::set` calls both in sequence, silently corrupting the +gestalt ~1 bit per epoch. Measurable after ~100 epochs. Function is marked +`#[deprecated]` with a doc warning; callers use `#[allow(deprecated)]` + FIXME. 
+**Paid by:** switch to raw-sum + count tracking so exact integer subtraction is +possible. Cross-ref: `kv_bundle.rs:28-33`. + ### TD-HELIX-OVERLAP-1 (D-HELIX-1) — `helix` re-derives existing CERTIFIED primitives (clean-room by directive) **Open.** `crates/helix` ships as a zero-dep clean-room codec per the user directive "scoped only to crate." ~80% of its pipeline duplicates existing, in-places-CERTIFIED workspace code: Fisher-Z/arctanh→i8 (`bgz-tensor::projection::Base17Fz`, `bgz-tensor::fisher_z::FamilyGamma` ρ≥0.999), golden-spiral azimuth (`jc::weyl`), stride-4 coupling (`thinking-engine::reencode_safety`, `highheelbgz`), EULER_GAMMA hand-off (`jc::precond`, `bgz-tensor::euler_fold`), 256-palette/L1 (`bgz17::palette`). Genuinely new = the `√u` equal-area hemisphere placement + the PLACE/RESIDUE doctrine. **Paid by** (when it graduates from clean-room): the consolidation path in `crates/helix/KNOWLEDGE.md` § Overlap & Consolidation — re-export `FamilyGamma` behind a feature; route coupling through the canonical `(i·11)%17`/stride-4 zipper; feed `ResidueEdge` into the existing HIP/TWIG CAKES tier. **Also owed:** a fidelity-vs-ground-truth probe (the naive-u8 floor gate ≥0.9980 Pearson is currently CONJECTURE — NOT RUN) before promotion. Cross-ref: E-HELIX-OVERLAP, `encoding-ecosystem.md`. diff --git a/.claude/plans/cycle-coherent-soa-snapshot-v1.md b/.claude/plans/cycle-coherent-soa-snapshot-v1.md new file mode 100644 index 000000000..755ac15d8 --- /dev/null +++ b/.claude/plans/cycle-coherent-soa-snapshot-v1.md @@ -0,0 +1,176 @@ +# Plan: Cycle-Coherent SoA Snapshot — No-Cross-Cycle-Lag Guarantee + +**Version:** v1 +**Date:** 2026-06-06 +**Status:** Queued +**D-ids:** D-SOA-SNAP-1 through D-SOA-SNAP-6 + +--- + +## The problem + +`temporal.rs` (PR #468) closes the row-scale deinterlace gap: HLC tick → +`classify/deinterlace` → causally-coherent row sequence. 
But there is a +parallel byte-scale gap: nothing prevents a reader from holding a mix of +column data from cycle N and cycle N+1 within the same SIMD sweep. This is +the **cross-cycle lag problem** — a SIMD sweep that is not internally +single-cycle is not coherent. + +The deinterlace operation is one operation at two scales: + +```text +Row/query scale → HLC tick + DependsClosure → temporal.rs (SHIPPED, PR #468) +Byte/column scale → SoaEnvelope::cycle() stamp → MailboxSoA Arc-swap (THIS PLAN) +``` + +--- + +## The mechanism: Arc-swap COW at column granularity + +The SoA mailbox carries its columns as `Arc<[u8]>` slices (via +`MultiLaneColumn` in ndarray). The invariant is: + +> **A reader that snapshots all column Arcs at the same `cycle()` stamp sees +> a single coherent cycle. No column can be from a prior cycle.** + +### Write path (in `lance-graph`, `MailboxSoa::advance_phase`) + +On every `advance_phase(to: KanbanPhase)`: + +1. Increment `cycle` counter on the envelope. +2. For each mutated column: swap the `Arc` pointer — `Arc::make_mut` on the + backing `Arc<[u8]>` of the `MultiLaneColumn`, write the new data, then + publish the new Arc via an `ArcSwap` (or `RwLock<Arc<[u8]>>`). +3. The cycle increment is a `SeqCst` store (fence) BEFORE the column Arc + swaps. Readers who observe the new cycle will see the new column data. + +### Read path (in `lance-graph`, `MailboxSoaView`) + +On `snapshot()`: + +1. Load cycle stamp. +2. Clone all column Arcs under the same cycle stamp (atomic snapshot loop: + re-read cycle after loading all Arcs; retry if it changed — lock-free + single-retry is sufficient because writers are serialized through + `advance_phase`). +3. Return `MailboxSoaSnapshot { cycle, cols: [...] }`. + +The snapshot guarantees all column data is from the same cycle. + +### Boundary: ndarray stays layout-only + +`MultiLaneColumn` in ndarray is `Arc<[u8]>` with typed lane iterators — +**layout-only**. 
The Arc-swap policy (when to swap, how to snapshot, the +cycle fence) belongs in `lance-graph`'s `MailboxSoa`. ndarray never learns +that cycles or snapshots exist. The boundary is: + +```text +ndarray::simd::MultiLaneColumn — Arc<[u8]>, lane iters, Send + Sync, zero-copy reads +lance-graph::MailboxSoa — Arc-swap on advance_phase, cycle fence, snapshot() +``` + +### Connection to temporal.rs + +`SoaEnvelope::cycle()` is the byte-scale clock. `QueryReference::ref_version` +is the row-scale clock (a Lance version). They are the same monotonic clock +at different granularities — Lance version N corresponds to SoA cycle C(N). +When `temporal.rs::deinterlace` runs at query time, the `V_ref` it uses should +align with the `cycle()` of the snapshot being queried. + +Wiring: `VersionScheduler::on_version(&view, at, exec)` provides the Lance +version; the `MailboxSoaSnapshot` that went into that version carries its +`cycle`. Threading `snapshot.cycle` into `QueryReference` closes the loop so +row-scale and byte-scale deinterlace use the same clock. + +--- + +## Deliverables + +### D-SOA-SNAP-1 — `MailboxSoaSnapshot` type in lance-graph-contract + +A `MailboxSoaSnapshot` struct: `cycle: u32`, `cols: Vec<Arc<[u8]>>`. +Snapshot is `Send + Sync`. No reference to the originating `MailboxSoa`. +This is a point-in-time read — immutable after creation. + +### D-SOA-SNAP-2 — `SnapshotProvider` trait in lance-graph-contract + +```rust +pub trait SnapshotProvider { + fn snapshot(&self) -> MailboxSoaSnapshot; +} +``` + +Zero deps in contract. `MailboxSoa` in lance-graph implements it. + +### D-SOA-SNAP-3 — Arc-swap write path in `MailboxSoa::advance_phase` + +In lance-graph (not contract, not ndarray): implement the cycle fence + +column Arc-swap on every `advance_phase`. Use `std::sync::RwLock<Arc<[u8]>>` +per column (no external arc-swap crate needed unless benchmarks show +contention; add as a feature flag if needed). 
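
Editorial sketch (not part of the patch): the per-column `RwLock<Arc<[u8]>>` write path from D-SOA-SNAP-3, paired with a retrying read, might look roughly like the following. All names here (`Mailbox`, `Snapshot`, `advance`, `run_falsification`) are hypothetical stand-ins, not the real `MailboxSoa` API. The sketch also swaps in a seqlock-style odd/even sequence counter in place of the plan's "increment cycle, then swap" ordering, and republishes every column on each cycle; both are assumptions made to keep the example small and self-checking.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the planned `MailboxSoaSnapshot`.
pub struct Snapshot {
    pub cycle: u32,
    pub cols: Vec<Arc<[u8]>>,
}

/// Hypothetical stand-in for `MailboxSoa`: one `RwLock<Arc<[u8]>>` per column.
pub struct Mailbox {
    /// Assumption: odd while a write is in flight, `2 * cycle` when quiescent.
    seq: AtomicU64,
    cols: Vec<RwLock<Arc<[u8]>>>,
}

impl Mailbox {
    pub fn new(n_cols: usize, col_len: usize) -> Self {
        let cols = (0..n_cols)
            .map(|_| {
                let data: Arc<[u8]> = Arc::from(vec![0u8; col_len]);
                RwLock::new(data)
            })
            .collect();
        Mailbox { seq: AtomicU64::new(0), cols }
    }

    /// Single-writer publish of one cycle; every column is filled with `fill`.
    pub fn advance(&self, fill: u8) {
        self.seq.fetch_add(1, Ordering::SeqCst); // odd: write in progress
        for col in &self.cols {
            let len = col.read().unwrap().len();
            let fresh: Arc<[u8]> = Arc::from(vec![fill; len]);
            // Swap the pointer; readers holding the old Arc keep their data.
            *col.write().unwrap() = fresh;
        }
        self.seq.fetch_add(1, Ordering::SeqCst); // even: cycle committed
    }

    /// Clone all column Arcs under one stamp; retry if a write intervened.
    pub fn snapshot(&self) -> Snapshot {
        loop {
            let before = self.seq.load(Ordering::SeqCst);
            if before % 2 == 1 {
                thread::yield_now();
                continue;
            }
            let cols: Vec<Arc<[u8]>> = self
                .cols
                .iter()
                .map(|c| Arc::clone(&*c.read().unwrap()))
                .collect();
            if self.seq.load(Ordering::SeqCst) == before {
                return Snapshot { cycle: (before / 2) as u32, cols };
            }
        }
    }
}

/// One writer plus `readers` reader threads; returns the count of torn snapshots.
pub fn run_falsification(cycles: u8, readers: usize) -> usize {
    let mb = Arc::new(Mailbox::new(4, 64));
    let writer = {
        let mb = Arc::clone(&mb);
        thread::spawn(move || {
            for c in 1..=cycles {
                mb.advance(c); // cycle c writes byte value c into every column
            }
        })
    };
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let mb = Arc::clone(&mb);
            thread::spawn(move || {
                let mut torn = 0usize;
                for _ in 0..1_000 {
                    let s = mb.snapshot();
                    let want = s.cycle as u8;
                    if s.cols.iter().any(|c| c.iter().any(|&b| b != want)) {
                        torn += 1;
                    }
                }
                torn
            })
        })
        .collect();
    writer.join().unwrap();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `run_falsification(100, 8)` mirrors the thread counts named in D-SOA-SNAP-5. The odd/even counter is there so that a reader arriving mid-write retries instead of accepting a partly swapped column set; whether the real `advance_phase` needs that, or can rely on a plain cycle re-read, is for the plan's own falsification test to decide.
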
+ +### D-SOA-SNAP-4 — `snapshot()` implementation on `MailboxSoa` + +Lock-free snapshot: load cycle, clone all column Arcs, re-read cycle, retry +once if changed. Return `MailboxSoaSnapshot`. + +### D-SOA-SNAP-5 — No-cross-cycle-lag falsification test + +```rust +// Spawn a writer thread: advance_phase in a loop (100 cycles). +// Spawn 8 reader threads: each calls snapshot() in a loop. +// Assert: every snapshot has all columns reporting the same cycle. +// Assert: no snapshot mixes data from two different cycles. +``` + +The test is the formal statement of the guarantee. If it passes, the +invariant is mechanically enforced, not just documented. + +### D-SOA-SNAP-6 — Wire `snapshot.cycle` into `QueryReference` + +In the planner: when a query resolves a `MailboxSoaSnapshot`, thread +`snapshot.cycle` through `QueryReference::hlc_tick` (or a new +`QueryReference::soa_cycle: Option<u32>` field) so `deinterlace` at +row scale uses the same cycle boundary as the snapshot at byte scale. + +--- + +## Prerequisite gap fixes (order matters) + +These mechanical fixes should land before or alongside D-SOA-SNAP-1 +(they settle the column shape): + +1. Remove `MailboxSoA::emit()` + `CollapseGateEmission` from source. +2. Rename `last_emission_cycle` → `last_active_cycle` in MailboxSoA. +3. Drop `entity_type: u16` from SoA row — MailboxId IS NiblePath. +4. Fix `OntologyRegistry::enumerate_first_with_entity_type_id` linear scan. +5. Remove `MappingRow.thinking_style` — Kanban owns thinking styles. +6. Fix `unbundle_from` in `kv_bundle.rs:29` — `wrapping_sub` is not the + inverse of weighted-average `bundle_into`. + +Items 1-5 settle the column shape before the Arc-swap schema is frozen. +Item 6 is independent but should not be deferred (correctness bug). + +--- + +## Non-goals + +- No recurrence / standing wave implementation. The standing wave is the + deinterlaced Lance version projection, provided by Lance versioning + (O(1) 90° lookup). Do not implement it in compute. +- No baton. 
No emission. No inter-mailbox handoff type. The snapshot is + consumed in-place; nothing is transmitted. +- ndarray does not learn about cycles, snapshots, or advance_phase. + +--- + +## Cross-references + +- `temporal.rs` (PR #468) — row-scale deinterlace (SHIPPED) +- `soa_envelope.rs` (PR #477) — envelope LE contract (IN REVIEW) +- `soa-three-tier-model.md` — three-tier lifecycle model +- `q3-standing-wave-falsification.md` — falsification: standing wave = Lance + versioning, not compute recurrence +- `.claude/board/EPIPHANIES.md` E-DEINTERLACE-TWO-SCALES — the synthesis +- `ndarray/src/simd_soa.rs` — `MultiLaneColumn` (layout-only; Arc-swap lives + in lance-graph, not here) diff --git a/CLAUDE.md b/CLAUDE.md index 6a2b3aacb..8afd3ab56 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,5 +1,27 @@ # CLAUDE.md — lance-graph +## P0 — AdaWorldAPI forks ONLY, NEVER crates.io upstream + +**Always depend on the AdaWorldAPI fork of any crate that has one. NEVER use the +upstream crates.io version of a forked crate.** Non-negotiable; applies to every +`Cargo.toml` and every dependency decision in this repo. Every repo in this +workspace is local — prefer the local/fork source over the registry, always. + +- Crates with an `AdaWorldAPI/` fork — e.g. `ndarray`, `lance` / + `lancedb` / `lance-index` / `lance-linalg` / `lance-namespace`, `surrealdb`, + and any other — MUST be wired via the fork (`path` / `git` / `[patch.crates-io]`), + never the registry version. +- If a fork's coordinates (git URL, branch/tag, feature flag) are unknown, + **STOP and ask**. Do NOT fall back to crates.io as a convenience or to make a + build pass. +- `"warning: Patch ... was not used in the crate graph"` is a policy + alert. It can indicate missing fork wiring OR a transitive semver mismatch + that prevents the patch from applying. Do not ignore it: verify direct + `Cargo.toml` patch entries and `Cargo.lock` wiring, then track/resolve any + transitive blocker explicitly before closing the issue. 
+- crates.io is permitted ONLY for crates that have no AdaWorldAPI fork / no local + source. + > **Updated**: 2026-04-21 (categorical-algebraic inference click) > **Role**: The obligatory spine — query engine, codec stack, semantic transformer, and orchestration contract > **Status**: 22 crates, 7 in workspace, 15 excluded (standalone/DTO), Phases 1-2 DONE, Phases 6-7 DONE (grammar + governance), Phase 3 IN PROGRESS diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs index e2a7e2c71..3c4638829 100644 --- a/crates/lance-graph-contract/src/lib.rs +++ b/crates/lance-graph-contract/src/lib.rs @@ -93,6 +93,7 @@ pub mod scheduler; pub mod sensorium; pub mod sigma_propagation; pub mod sla; +pub mod soa_envelope; pub mod soa_view; pub mod splat; pub mod tax; diff --git a/crates/lance-graph-contract/src/soa_envelope.rs b/crates/lance-graph-contract/src/soa_envelope.rs new file mode 100644 index 000000000..58f32a655 --- /dev/null +++ b/crates/lance-graph-contract/src/soa_envelope.rs @@ -0,0 +1,405 @@ +//! SoA envelope little-endian contract. +//! +//! # Why this module exists +//! +//! Column-level LE knowledge is not enough. ndarray's `MultiLaneColumn` +//! (the column carrier) already decodes its own bytes little-endian, and +//! `CausalEdge64` / `EpisodicEdges64` each know their own `to_le_bytes` / +//! `from_le_bytes`. But the **SoA envelope as a whole** — the thing a Lance +//! version snapshots, the thing `simd_soa` sweeps, the thing a future reader +//! decodes — has no contract describing how those columns *assemble* into one +//! row-strided packet. The parts know the LE contract; the envelope did not. +//! +//! [`SoaEnvelope`] is that missing contract. It makes the in-place SoA +//! backing store **self-describing at each cycle**: a stable column ordering, +//! a fixed row byte stride, a `cycle` version stamp, and a +//! [`ENVELOPE_LAYOUT_VERSION`]. With it, a Lance version IS a coherent LE +//! 
in-place layout at cycle N — not a loose collection of independently- +//! correct columns. Nothing is serialized or transmitted; the backing bytes +//! are resident in-place, zero-copy from creation to Lance tombstone. +//! +//! # Layering (read before adding an ndarray dependency here) +//! +//! This module is **zero-dep, byte-geometry only**. It describes *where* +//! columns sit in the backing store's row stride and *what* LE element each +//! holds — as data ([`ColumnDescriptor`]), never as ndarray generic bounds. +//! That keeps `lance-graph-contract` featherweight for its non-HPC consumers +//! (OGAR classes, ractor actors), and it keeps ndarray usable standalone by +//! any pure-SIMD consumer. +//! +//! The split is deliberate and complementary, not duplicated: +//! +//! | Level | Home | Answers | +//! |-------|------|---------| +//! | Column LE contract | `ndarray::simd::MultiLaneColumn` | "how do I sweep one typed column" | +//! | Envelope LE contract | this module | "where do columns sit in the row stride, what cycle is this" | +//! | Composition | `lance-graph` (always has both deps) | carve envelope columns → wrap each in `MultiLaneColumn` | +//! +//! ndarray never learns the envelope exists; this crate never learns ndarray +//! exists; `lance-graph` binds them. + +/// Layout version of the envelope byte geometry. +/// +/// Bumped whenever the meaning of [`ColumnDescriptor`] offsets/strides +/// changes. A reader MUST refuse to decode a packet whose stamped version it +/// does not understand (per `I-LEGACY-API-FEATURE-GATED`: layout reclaim is +/// paired with a version gate on the serialization path). +pub const ENVELOPE_LAYOUT_VERSION: u8 = 1; + +/// The little-endian element type of one column. +/// +/// Width only — no distance semantics, no domain meaning (cf. ndarray's +/// no-umbrella rule). The actual decode (`from_le_bytes`) happens in the +/// consumer's `MultiLaneColumn` lane iterator. 
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[repr(u8)] +pub enum ColumnKind { + U8 = 0, + I8 = 1, + U16 = 2, + I16 = 3, + U32 = 4, + F32 = 5, + U64 = 6, + F64 = 7, +} + +impl ColumnKind { + /// Bytes per element of this LE column kind. + pub const fn elem_bytes(self) -> usize { + match self { + ColumnKind::U8 | ColumnKind::I8 => 1, + ColumnKind::U16 | ColumnKind::I16 => 2, + ColumnKind::U32 | ColumnKind::F32 => 4, + ColumnKind::U64 | ColumnKind::F64 => 8, + } + } +} + +/// One column's placement within a single row of the backing store. +/// +/// `Copy` and `repr(C)` so a descriptor table is itself a stable LE artifact. +/// `name_id` is a stable column ordinal (an enum discriminant on the consumer +/// side), NOT a string — keeping this crate alloc-free and the descriptor +/// `Copy`. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[repr(C)] +pub struct ColumnDescriptor { + /// Stable column identity (consumer-side enum ordinal). + pub name_id: u16, + /// LE element kind. + pub kind: ColumnKind, + /// Elements of `kind` per row for this column (e.g. content = 256 × u64, + /// energy = 1 × f32). + pub elems_per_row: u16, + /// Byte offset of this column within one row packet. + pub row_offset: u32, +} + +impl ColumnDescriptor { + /// Bytes this column occupies in one row. + pub const fn col_bytes_per_row(&self) -> usize { + self.kind.elem_bytes() * self.elems_per_row as usize + } + + /// Byte range `[start, end)` of this column within a row packet. + pub const fn row_byte_range(&self) -> (usize, usize) { + let start = self.row_offset as usize; + (start, start + self.col_bytes_per_row()) + } +} + +/// What can go wrong validating an envelope's byte geometry. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum EnvelopeError { + /// The stamped layout version is not the one this build understands. + LayoutVersionMismatch { expected: u8, found: u8 }, + /// Sum of column byte-widths does not equal the declared row stride. 
+ StrideMismatch { declared: usize, summed: usize }, + /// Two columns overlap, or a gap/ordering violation was found. + ColumnOverlap { col_a: u16, col_b: u16 }, + /// A column's byte range ends past the declared row stride. Distinct from + /// [`StrideMismatch`]: the widths can sum to the stride while a column is + /// still positioned (via its `row_offset`) so its end exceeds the stride. + ColumnOutOfBounds { col: u16, col_end: usize, stride: usize }, + /// `as_le_bytes().len()` is not `row_stride * n_rows` (backing store size mismatch). + PacketSizeMismatch { expected: usize, found: usize }, + /// A requested row or column index is out of bounds. + OutOfBounds, +} + +/// The little-endian geometry contract for one SoA envelope cycle. +/// +/// Implemented by the owner of the in-place backing store (e.g. the mailbox +/// SoA). The envelope is zero-copy from creation to Lance tombstone — nothing +/// is serialized or transmitted; this trait describes *where columns sit* in +/// the already-resident backing bytes and *what cycle stamp* the store carries. +/// The read-only view here mirrors `MailboxSoaView` vs `MailboxSoaOwner`: +/// mutation lives on the owner type, never on this trait. +pub trait SoaEnvelope { + /// Layout version this implementor's geometry conforms to. + const LAYOUT_VERSION: u8 = ENVELOPE_LAYOUT_VERSION; + + /// Stable, ordered column placement table. Ordering is part of the + /// contract: a reader walks columns in this order. + fn columns(&self) -> &[ColumnDescriptor]; + + /// Total bytes per row across all columns. + fn row_stride(&self) -> usize; + + /// Number of rows in this snapshot. + fn n_rows(&self) -> usize; + + /// The version stamp this snapshot carries (the cycle whose committed + /// state these bytes are). This is what turns a Lance version into a + /// coherent "packet at cycle N". + fn cycle(&self) -> u32; + + /// The whole packet as contiguous LE bytes, zero-copy. Length MUST be + /// `row_stride() * n_rows()`. 
+ fn as_le_bytes(&self) -> &[u8]; + + /// Zero-copy LE view of one full row. + fn row_le(&self, row: usize) -> Option<&[u8]> { + let stride = self.row_stride(); + let start = row.checked_mul(stride)?; + let end = start.checked_add(stride)?; + self.as_le_bytes().get(start..end) + } + + /// Zero-copy LE view of one column within one row. + fn column_le(&self, row: usize, col: &ColumnDescriptor) -> Option<&[u8]> { + let r = self.row_le(row)?; + let (start, end) = col.row_byte_range(); + r.get(start..end) + } + + /// Validate that the declared geometry is internally consistent and that + /// the backing packet matches. Call this at the Lance read boundary — a + /// v1 packet under a v2 reader (or a torn snapshot) is refused here rather + /// than silently mis-decoded downstream. + fn verify_layout(&self) -> Result<(), EnvelopeError> { + // 1. Version gate. + if Self::LAYOUT_VERSION != ENVELOPE_LAYOUT_VERSION { + return Err(EnvelopeError::LayoutVersionMismatch { + expected: ENVELOPE_LAYOUT_VERSION, + found: Self::LAYOUT_VERSION, + }); + } + // 2. Columns are non-overlapping, each fits within [0, stride), and + // their widths sum to the stride. + // Checking only the width sum is insufficient: two columns whose + // widths sum to the stride can still have one column whose end + // offset exceeds the stride (e.g. offsets 4+8 with stride 8). + let cols = self.columns(); + let mut summed = 0usize; + let stride = self.row_stride(); + // `row_offset as usize + col_bytes` can wrap on 32-bit targets (wasm32): + // row_offset is u32 (≤ 4.29e9) and col_bytes can reach 8 × 65535, so the + // sum can exceed usize::MAX on a 32-bit usize and wrap to a small value + // that would slip past the `a_end > stride` check. Compute every end with + // checked_add and reject overflow as ColumnOutOfBounds. 
+ let checked_end = |c: &ColumnDescriptor| -> Result<usize, EnvelopeError> { + (c.row_offset as usize) + .checked_add(c.col_bytes_per_row()) + .ok_or(EnvelopeError::ColumnOutOfBounds { + col: c.name_id, + col_end: usize::MAX, + stride, + }) + }; + for (i, a) in cols.iter().enumerate() { + let a_start = a.row_offset as usize; + let a_end = checked_end(a)?; + summed += a.col_bytes_per_row(); + if a_end > stride { + return Err(EnvelopeError::ColumnOutOfBounds { + col: a.name_id, + col_end: a_end, + stride, + }); + } + for b in &cols[i + 1..] { + let b_start = b.row_offset as usize; + let b_end = checked_end(b)?; + let overlap = a_start < b_end && b_start < a_end; + if overlap { + return Err(EnvelopeError::ColumnOverlap { + col_a: a.name_id, + col_b: b.name_id, + }); + } + } + } + if summed != stride { + return Err(EnvelopeError::StrideMismatch { + declared: stride, + summed, + }); + } + // 3. Backing packet size matches stride × rows. + let expected = stride.saturating_mul(self.n_rows()); + let found = self.as_le_bytes().len(); + if expected != found { + return Err(EnvelopeError::PacketSizeMismatch { expected, found }); + } + Ok(()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + struct TestEnvelope { + cols: Vec<ColumnDescriptor>, + stride: usize, + rows: usize, + bytes: Vec<u8>, + cycle: u32, + } + + impl SoaEnvelope for TestEnvelope { + fn columns(&self) -> &[ColumnDescriptor] { + &self.cols + } + fn row_stride(&self) -> usize { + self.stride + } + fn n_rows(&self) -> usize { + self.rows + } + fn cycle(&self) -> u32 { + self.cycle + } + fn as_le_bytes(&self) -> &[u8] { + &self.bytes + } + } + + fn two_col_envelope(rows: usize) -> TestEnvelope { + // col 0: 1 × f32 (4 B) at offset 0 + // col 1: 1 × u64 (8 B) at offset 4 + let cols = vec![ + ColumnDescriptor { + name_id: 0, + kind: ColumnKind::F32, + elems_per_row: 1, + row_offset: 0, + }, + ColumnDescriptor { + name_id: 1, + kind: ColumnKind::U64, + elems_per_row: 1, + row_offset: 4, + }, + ]; + let stride = 12; + 
TestEnvelope { + cols, + stride, + 
rows, + bytes: vec![0u8; stride * rows], + cycle: 7, + } + } + + #[test] + fn kind_widths() { + assert_eq!(ColumnKind::U8.elem_bytes(), 1); + assert_eq!(ColumnKind::F32.elem_bytes(), 4); + assert_eq!(ColumnKind::U64.elem_bytes(), 8); + } + + #[test] + fn descriptor_byte_range() { + let d = ColumnDescriptor { + name_id: 0, + kind: ColumnKind::U64, + elems_per_row: 256, + row_offset: 16, + }; + assert_eq!(d.col_bytes_per_row(), 256 * 8); + assert_eq!(d.row_byte_range(), (16, 16 + 256 * 8)); + } + + #[test] + fn valid_envelope_passes() { + let env = two_col_envelope(4); + assert_eq!(env.cycle(), 7); + assert!(env.verify_layout().is_ok()); + } + + #[test] + fn stride_mismatch_caught() { + let mut env = two_col_envelope(4); + env.stride = 16; // columns sum to 12, not 16 + env.bytes = vec![0u8; 16 * 4]; + assert_eq!( + env.verify_layout(), + Err(EnvelopeError::StrideMismatch { + declared: 16, + summed: 12, + }) + ); + } + + #[test] + fn overlap_caught() { + let mut env = two_col_envelope(1); + env.cols[1].row_offset = 2; // u64 at 2 overlaps f32 at [0,4) + env.stride = 10; + env.bytes = vec![0u8; 10]; + assert!(matches!( + env.verify_layout(), + Err(EnvelopeError::ColumnOverlap { .. }) + )); + } + + #[test] + fn column_past_stride_caught() { + // Two 4-byte columns at offsets 4 and 8 with stride 8. + // Width sum = 8 = stride, but column B's end (12) > stride (8). 
+ let cols = vec![ + ColumnDescriptor { name_id: 0, kind: ColumnKind::F32, elems_per_row: 1, row_offset: 4 }, + ColumnDescriptor { name_id: 1, kind: ColumnKind::F32, elems_per_row: 1, row_offset: 8 }, + ]; + let env = TestEnvelope { cols, stride: 8, rows: 1, bytes: vec![0u8; 8], cycle: 0 }; + assert!(matches!( + env.verify_layout(), + Err(EnvelopeError::ColumnOutOfBounds { col: 1, col_end: 12, stride: 8 }) + )); + } + + #[test] + fn packet_size_mismatch_caught() { + let mut env = two_col_envelope(4); + env.bytes.truncate(12 * 3); // one row short + assert_eq!( + env.verify_layout(), + Err(EnvelopeError::PacketSizeMismatch { + expected: 48, + found: 36, + }) + ); + } + + #[test] + fn row_and_column_views_are_zero_copy_slices() { + let mut env = two_col_envelope(2); + // Write row 1, col 1 (u64) = 0x0102030405060708 LE. + let v: u64 = 0x0102_0304_0506_0708; + let row1_col1_start = 12 + 4; + env.bytes[row1_col1_start..row1_col1_start + 8].copy_from_slice(&v.to_le_bytes()); + + let row = env.row_le(1).unwrap(); + assert_eq!(row.len(), 12); + + let col = env.column_le(1, &env.cols[1]).unwrap(); + assert_eq!(col.len(), 8); + assert_eq!(u64::from_le_bytes(col.try_into().unwrap()), v); + + // Out of bounds. + assert!(env.row_le(2).is_none()); + } +} diff --git a/crates/lance-graph-planner/src/cache/kv_bundle.rs b/crates/lance-graph-planner/src/cache/kv_bundle.rs index 6b696ca7d..9ca013caa 100644 --- a/crates/lance-graph-planner/src/cache/kv_bundle.rs +++ b/crates/lance-graph-planner/src/cache/kv_bundle.rs @@ -26,6 +26,32 @@ pub fn bundle_into(source: &HeadPrint, target: &mut HeadPrint, weight_self: f32, } /// Unbundle: subtract out (XOR analog for i16). +/// +/// # Correctness warning — this is NOT the inverse of [`bundle_into`] +/// +/// [`bundle_into`] performs a **weighted average**: `(old * w_self + new * w_new) / total`. +/// Subtraction is the inverse of XOR-bundle or simple add-bundle, NOT of weighted-average +/// bundle. 
Calling this after `bundle_into` silently corrupts the gestalt: the gestalt +/// drifts every epoch because the removed value is the rounded average representation, +/// not the original addend. After ~100 epochs the corruption is measurable. +/// +/// The correct approach for an updateable gestalt is one of: +/// 1. **Re-accumulate**: on update, rebuild the gestalt from scratch by iterating all heads. +/// 2. **Track raw sum + count**: store `sum: [i32; BASE_DIM]` and `count: u32` separately +/// so exact subtraction is possible without rounding loss. +/// 3. **Accept approximate unbundle only when weight_self = weight_new = 1.0**: then +/// `bundle_into` reduces to `(old + new) / 2` and there is still no exact inverse. +/// +/// Tracked as tech-debt in `.claude/board/TECH_DEBT.md` (unbundle_from correctness). +/// +/// # Panics +/// Never panics — wrapping arithmetic. +#[deprecated( + since = "0.0.0", + note = "NOT the inverse of bundle_into (weighted-average). \ + See function doc for the correct unbundle strategies. \ + Tracked: .claude/board/TECH_DEBT.md — unbundle_from correctness." +)] pub fn unbundle_from(source: &HeadPrint, target: &mut HeadPrint) { for d in 0..BASE_DIM { target.dims[d] = target.dims[d].wrapping_sub(source.dims[d]); @@ -72,10 +98,16 @@ impl AttentionMatrix { } /// Set attention head and update gestalt. + /// + /// Note: the gestalt update uses the deprecated `unbundle_from` which is NOT the + /// exact inverse of `bundle_into` (weighted-average). The gestalt drifts slowly + /// over many epochs. FIXME: rebuild gestalt from scratch or switch to raw-sum + /// tracking — tracked in `.claude/board/TECH_DEBT.md`. pub fn set(&mut self, row: usize, col: usize, head: HeadPrint) { let idx = row * self.resolution + col; - // Unbundle old from gestalt + // Unbundle old from gestalt (approximate — see method doc). 
let old = self.heads[idx].clone(); + #[allow(deprecated)] unbundle_from(&old, &mut self.gestalt); // Bundle new into gestalt bundle_into(&head, &mut self.gestalt, self.epoch as f32, 1.0); @@ -129,8 +161,11 @@ mod tests { assert_eq!(target.dims[d], expected, "dim {d} mismatch"); } - // Unbundle b from target: should shift back toward a + // Unbundle b from target: documents the approximate (not exact) behaviour. + // This test verifies the arithmetic, NOT that round-trip fidelity holds + // (it doesn't — unbundle_from is not the inverse of bundle_into). let before_unbundle = target.clone(); + #[allow(deprecated)] unbundle_from(&b, &mut target); // After unbundle, each dim should be before - b for d in 0..BASE_DIM { diff --git a/crates/lance-graph/src/nsm/nsm_word.rs b/crates/lance-graph/src/nsm/nsm_word.rs index cebfd3d8a..f2642bac6 100644 --- a/crates/lance-graph/src/nsm/nsm_word.rs +++ b/crates/lance-graph/src/nsm/nsm_word.rs @@ -171,13 +171,16 @@ fn build_distance_matrix_from_cam( return WordDistanceMatrix::new(0); } - // Parse codebook: 6 subspaces x 256 centroids x subspace_dim floats - // We interpret codebook_bytes as f32 array - let codebook_floats: &[f32] = unsafe { - let ptr = codebook_bytes.as_ptr() as *const f32; - let len = codebook_bytes.len() / 4; - std::slice::from_raw_parts(ptr, len) - }; + // Parse codebook: 6 subspaces x 256 centroids x subspace_dim floats. + // Decode the bytes as a little-endian f32 array. A `&[u8]` carries no + // alignment guarantee, so a `*const f32` reinterpret would be UB on an + // unaligned buffer (mmap'd file, sub-slice). `from_le_bytes` over 4-byte + // chunks is alignment-free and endian-correct (matches the workspace LE + // contract). The codebook is only read below, so owning a Vec is fine. 
+ let codebook_floats: Vec<f32> = codebook_bytes + .chunks_exact(4) + .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]])) + .collect(); // Determine subspace_dim: total_floats / (6 * 256) let total_floats = codebook_floats.len(); diff --git a/docs/architecture/soa-three-tier-model.md b/docs/architecture/soa-three-tier-model.md new file mode 100644 index 000000000..875c3ca96 --- /dev/null +++ b/docs/architecture/soa-three-tier-model.md @@ -0,0 +1,322 @@ +# SoA Three-Tier Model — Mailbox Lifecycle, Kanban, and Ontology + +> **Branch:** `claude/stoic-turing-M0Eiq` +> **Date:** 2026-06-06 +> **Authority:** Supersedes any prior baton/emission/CollapseGateEmission framing. + +--- + +## The invariant + +**Every SoA envelope is zero-copy from creation to Lance tombstone.** + +**Target state:** there is no baton, no emission, and no inter-mailbox handoff +type. No bytes leave the backing store until Lance's own columnar I/O writes +them to disk — and even then the in-memory store is unchanged, not serialized +and freed. + +**Current state:** legacy `MailboxSoA::emit()` and `CollapseGateEmission` +artifacts still exist in source and are scheduled for removal (see Tier 1 +below). Treat them as migration-only; do not call or extend them. + +--- + +## Tier 1 — MailboxSoA (primary, owned, zero-copy) + +The `MailboxSoA` is the single thought envelope. One mailbox owns one +SoA. The columns (`energy [f32;N]`, `plasticity [u8;N]`, `last_active_cycle [u32;N]`, +`edges [CausalEdge64;N]`, `qualia [QualiaI4_16D;N]`, `meta [MetaWord;N]`, +`entity_type [u16;N]`) are allocated once at mailbox creation and released at +Lance tombstone. + +``` +creation + │ + ▼ +MailboxSoA (backing store in-place; column LE contract = MultiLaneColumn) + │ (envelope LE contract = SoaEnvelope trait) + │ + ▼ Lance write on each version tick (LE bytes → columnar store) +DatasetVersion(v) → DatasetVersion(v+1) → ... 
+ │ + ▼ +Lance soft-delete (tombstone) ← sole lifecycle event that ends the store +``` + +**Access contract:** +- `MailboxSoaView`: read-only, `&[T]` borrows, `edges_raw() -> &[u64]` +- `MailboxSoaOwner`: `advance_phase(&mut self, to: KanbanPhase)` — sole mutator + +**Idempotency guard:** `last_active_cycle [u32;N]` marks the cycle a row was +last written. It is a same-cycle guard, not a history column. (Rename from +`last_emission_cycle` in source — the emission framing is wrong.) + +**`MailboxSoA::emit()` and `CollapseGateEmission` are legacy artifacts from a +superseded design and are scheduled for removal.** Until that lands, treat them +as migration-only and non-canonical. There is no intended inter-mailbox handoff +type. + +--- + +## Tier 2 — KanbanColumn / Rubicon lifecycle (sole secondary data) + +The only data that is *secondary* to the SoA backing store is the Kanban +phase. This is triggered by the Lance writer, not by the SoA itself. + +``` +Lance writer → VersionScheduler::on_version(&view, at, exec) + │ read-only &V: never mutates + ▼ + Option { mailbox, from→to, libet_offset_us } + │ caller applies + ▼ + MailboxSoaOwner::advance_phase(to) ← SOLE mutator + +KanbanPhase lifecycle (6 states): + Planning → CognitiveWork → Evaluation → Commit → Plan → Prune +``` + +Above the SoA mailboxes, ractor (`lance-graph-supervisor`, ractor 0.14, +`supervisor` + `supervisor-lifecycle-audit` features) provides actor-level +meta-orchestration. Each mailbox is a ractor actor. The single-owner invariant +(no virtual ownership pointer needed) is enforced by Rust move semantics through +ractor's message-passing model. + +The Kanban column is the only data outside the SoA backing store that reflects +SoA lifecycle. There is no baton, no emission stream, no secondary truth column. + +--- + +## Tier 3 — OGAR classes + OGIT ontology (inherited, O(1)) + +The identity of a mailbox SoA resolves O(1) to its OGAR class and OGIT +ontology schema. 
This is Tier 3 because it is *inherited*, not stored per-row. + +**Resolution chain:** + +``` +mailbox address = u32 mailbox_id = NiblePath (MailboxId IS the NiblePath) + │ + ▼ HHTL radix-trie prefix walk — the u32 itself is the trie key + │ NiblePath::is_ancestor_of: + │ (other.path >> (4*(other.depth-self.depth))) == self.path + │ = prefix ancestry = class ancestry (Confirmed) + ▼ +OGIT radix-trie codebook (O(1) for known classes at compile time) + │ + ├─ class identity string ("ogit-op/WorkPackage") + ├─ schema (fields, assoc, enums, attributes) — stored ONCE per class in OntologyRegistry + └─ label inheritance (parent, mixins, STI) + +For new/runtime classes: JIT via lance-graph-planner (JITson / Cranelift) +``` + +**Thinking style is owned by the Kanban (Tier 2), not the class.** The class +does not dispatch a thinking style — the Kanban column does. Thinking styles +are an **O(1) lookup over an I4-32D address space** (32 nibbles × 4 bits = +128 bits = 2^128 distinct style addresses). The Rubicon phase the Kanban +assigns selects the style; the OGAR class supplies schema + tools, not the +style. (`MappingRow.thinking_style` as a per-class field is therefore the wrong +home — the style belongs to the Kanban lifecycle state, addressed O(1) in the +I4-32D space, not stored per ontology class.) + +**MailboxId IS the NiblePath.** The `u32 mailbox_id` field in `MailboxSoA` is not +a handle into a separate lookup — it IS the NiblePath key that the HHTL radix trie +walks. No separate prefix field survives. This makes `entity_type: u16` in every +SoA row entirely redundant: if the ontology resolves O(1) from the address, the +per-row handle violates SoC and defeats radix-trie cheapness. **`entity_type: u16` +removal from SoA rows is total** once O(1) lookup is the sole path. 
The current +linear scan in `OntologyRegistry::enumerate_first_with_entity_type_id` is a defect +— it should be an O(1) `Vec` index keyed by the 1-based ordinal, or removed entirely +once the per-row handle is gone. + +**OGAR active record / DLL AST adapter:** OGAR classes get pragmatic mapping +to inherited tools at compile time. These are cheap inherited registers, not +per-instance data in the SoA. The `Adapter::map` static identity transform in +OGAR + `KnowableFromStore` trait at the lance-graph boundary is the seam. + +**surrealdb / kv-lance:** OGAR's DLL AST → SurrealQL path (`ogar-adapter-surrealql`) +requires surrealdb with the `kv-lance` feature. This is BLOCKED(C) — the +`kv-lance` feature is only in the AdaWorldAPI surrealdb fork, coordinates +(git URL, branch) unknown. `surreal_container/Cargo.toml` dep is commented +out pending resolution. **Do not fall back to crates.io surrealdb.** + +--- + +## What does NOT exist (and must not be invented) + +| Concept | Status | +|---|---| +| `CollapseGateEmission` as cross-mailbox carrier | **WRONG** — scheduled for removal | +| `MailboxSoA::emit()` | **WRONG** — scheduled for removal | +| "Baton" as inter-mailbox handoff | **WRONG** — superseded | +| `wire_cost_bytes() = 13 + 10·baton_count` | **WRONG** — from CLAUDE.md E-BATON-1, now superseded | +| `Vsa16kF32` as a cross-mailbox carrier | **WRONG** — deprecated, lives only as legacy `cycle` column in `BindSpace` | +| Secondary data beyond KanbanColumn | **WRONG** — Kanban is the only secondary tier | +| BindSpace as the envelope | **MIGRATION IN PROGRESS** — BindSpace is the global legacy; MailboxSoA is the target | + +--- + +## Iron rules that fall out of this model + +1. `MailboxSoA` backing store is never copied, never serialized, never transmitted. + Lance writes LE bytes from it; the store itself stays in place. +2. `VersionScheduler` is read-only (`&V`). It proposes; `MailboxSoaOwner` disposes. +3. 
`MailboxSoA::emit()` and `CollapseGateEmission` are removed in the next + pass — they are not part of the correct design. +4. ractor provides the single-owner invariant for mailbox actors — no virtual + ownership pointer is needed. +5. Ontology resolution is O(1) HHTL prefix lookup for known classes. JITson + for new ones. The `entity_type: u16` per-row handle may be eliminated once + the O(1) lookup is the sole path. +6. surrealdb requires the AdaWorldAPI fork with `kv-lance`. Never fall back to + crates.io. BLOCKED(C) until fork coordinates are provided. + +--- + +## Register-file model — SoA as LE bytecode registers, OGAR class as instruction-set descriptor + +This is the load-bearing mental model. Read it before touching the SoA layout, +the class hierarchy, or any codegen template. + +### SoA columns = LE registers + +The `MailboxSoA` columns are CPU-style registers: fixed width, fixed byte +offset, little-endian, indexed by position. There is no schema in the row. +The row is a register bank. + +**Current / transitional layout** (target state removes `entity_type[N]` — see §3.2): + +```text + Byte offset Width Column LE kind + ────────── ───── ────── ─────── + 0 4·N energy[N] f32 × N + 4N N plasticity[N] u8 × N + 5N 4·N last_active_cycle[N] u32 × N + 9N 8·N edges[N] u64 × N (CausalEdge64 LE) + 17N N qualia[N] u8 × N (QualiaI4_16D packed) + 18N 2·N meta[N] u16 × N (MetaWord LE) + 20N 2·N entity_type[N] u16 × N ← TRANSITIONAL: scheduled + for removal once O(1) + HHTL lookup is the sole path + 22N 4 mailbox_id u32 + 22N+4 4 current_cycle u32 + ... ... (scalars follow) +``` + +The `SoaEnvelope` trait is the register-file descriptor: it names each +register's byte offset, width, and LE element kind — exactly what a CPU ABI +document does. `ColumnDescriptor` is one register descriptor. `verify_layout()` +is the ABI conformance check. + +`MultiLaneColumn` in ndarray is the load/store unit: it iterates the LE bytes +of one typed register into SIMD lanes. 
Nothing above this level cares about +byte order — it is resolved at the `from_le_bytes` boundary inside the lane +iterator. + +### OGAR class = instruction-set descriptor + DTO store for active record + +The OGAR class does NOT live in the register. It describes what the register +means. A class is: + +- **Label** — the OGIT identity string (`"ogit-op/WorkPackage"`), the + human-readable name, and the full label-inheritance chain up the HHTL trie. +- **Schema** — the field set: which fields exist, what types they are, which + are required vs optional. Stored once per class in `OntologyRegistry` + (`MappingRow`); never duplicated into SoA rows. +- **Tools** — the methods / adapters that operate on the register. These are + inherited from the class hierarchy (HHTL prefix ancestry = class ancestry). + A subclass inherits all parent tools without restating them. The default + mechanism is **compile-time Rust trait impls**: one `impl Tool for ClassFoo` + per class, monomorphized at build time from the HHTL inheritance chain. + Zero-cost — no vtable, no `dyn`, no runtime dispatch. "Cheap inherited + registers" in the architecture doc is literal: the trait impl IS the register, + and the compiler erases it to a direct call. + + **Dispatch escape hatch.** Runtime dispatch is *not* the default and is not + baked into the substrate. If a class genuinely needs runtime dispatch, it adds + a **dispatcher class** as an escape hatch attached to a register — an opt-in, + per-class override. The base case stays monomorphized and zero-cost; only the + classes that ask for dispatch pay for it. +- **Codegen templates** — Askama/Jinja `Class