|
| 1 | +# Plan: Cycle-Coherent SoA Snapshot — No-Cross-Cycle-Lag Guarantee |
| 2 | + |
| 3 | +**Version:** v1 |
| 4 | +**Date:** 2026-06-06 |
| 5 | +**Status:** Queued |
| 6 | +**D-ids:** D-SOA-SNAP-1 through D-SOA-SNAP-6 |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## The problem |
| 11 | + |
| 12 | +`temporal.rs` (PR #468) closes the row-scale deinterlace gap: HLC tick → |
| 13 | +`classify/deinterlace` → causally-coherent row sequence. But there is a |
| 14 | +parallel byte-scale gap: nothing prevents a reader from holding a mix of |
| 15 | +column data from cycle N and cycle N+1 within the same SIMD sweep. This is |
| 16 | +the **cross-cycle lag problem** — a SIMD sweep that is not internally |
| 17 | +single-cycle is not coherent. |
| 18 | + |
| 19 | +The deinterlace operation is one operation at two scales: |
| 20 | + |
| 21 | +```text |
| 22 | +Row/query scale → HLC tick + DependsClosure → temporal.rs (SHIPPED, PR #468) |
| 23 | +Byte/column scale → SoaEnvelope::cycle() stamp → MailboxSoA Arc-swap (THIS PLAN) |
| 24 | +``` |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## The mechanism: Arc-swap COW at column granularity |
| 29 | + |
| 30 | +The SoA mailbox carries its columns as `Arc<[u8]>` slices (via |
| 31 | +`MultiLaneColumn` in ndarray). The invariant is: |
| 32 | + |
| 33 | +> **A reader that snapshots all column Arcs at the same `cycle()` stamp sees |
| 34 | +> a single coherent cycle. No column can be from a prior or later cycle.** |
| 35 | + |
| 36 | +### Write path (in `lance-graph`, `MailboxSoa::advance_phase`) |
| 37 | + |
| 38 | +On every `advance_phase(to: KanbanPhase)`: |
| 39 | + |
| 40 | +1. Increment `cycle` counter on the envelope. |
| 41 | +2. For each mutated column: swap the `Arc` pointer — `Arc::make_mut` on the |
| 42 | + backing `Arc<[u8]>` of the `MultiLaneColumn`, write the new data, then |
| 43 | + publish the new Arc via an `ArcSwap` (or `RwLock<Arc<MultiLaneColumn>>`). |
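| 44 | +3. Bracket the column Arc swaps with two `SeqCst` stores (for example a seqlock-style odd/even sequence): mark the cycle in flight BEFORE the swaps and publish it AFTER them. |
| 45 | + A single store before the swaps is not enough: a reader can load the new cycle, clone a column that is not yet swapped, and still re-read the same cycle. |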
| 46 | + |
| 47 | +### Read path (in `lance-graph`, `MailboxSoaView`) |
| 48 | + |
| 49 | +On `snapshot()`: |
| 50 | + |
| 51 | +1. Load cycle stamp. |
| 52 | +2. Clone all column Arcs under the same cycle stamp (optimistic snapshot loop: |
| 53 | + re-read cycle after loading all Arcs; retry until it is unchanged. Retries stay |
| 54 | + short because writers are serialized through `advance_phase`, but a single retry |
| 55 | + is not a bound: the writer can advance again while the reader retries). |
| 56 | +3. Return `MailboxSoaSnapshot { cycle, cols: [...] }`. |
| 57 | + |
| 58 | +The snapshot guarantees all column data is from the same cycle. |
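
The write and read paths above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the lance-graph implementation: `Mailbox`, `Snapshot`, `advance`, and `seq` are hypothetical names, each column is a plain `Arc<[u8]>` standing in for `MultiLaneColumn`, and the counter is assumed to be seqlock-style (odd while a write is in flight, `2 * cycle` when stable) so that a reader can detect a writer that is mid-swap.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `MailboxSoa`; columns are raw byte slices.
pub struct Mailbox {
    /// Odd while a write is in flight, `2 * cycle` when stable (assumption).
    seq: AtomicU64,
    cols: Vec<RwLock<Arc<[u8]>>>,
}

/// Hypothetical stand-in for `MailboxSoaSnapshot`.
pub struct Snapshot {
    pub cycle: u32,
    pub cols: Vec<Arc<[u8]>>,
}

impl Mailbox {
    pub fn new(n_cols: usize, len: usize) -> Self {
        let cols: Vec<RwLock<Arc<[u8]>>> = (0..n_cols)
            .map(|_| RwLock::new(Arc::from(vec![0u8; len])))
            .collect();
        Mailbox { seq: AtomicU64::new(0), cols }
    }

    /// Write path. Callers must serialize writers (as `advance_phase` does).
    pub fn advance(&self, fill: u8) {
        let s = self.seq.load(Ordering::SeqCst);
        self.seq.store(s + 1, Ordering::SeqCst); // odd: write in flight
        for col in &self.cols {
            let len = col.read().unwrap().len();
            // Publish a fresh Arc; snapshots holding the old one are unaffected.
            *col.write().unwrap() = Arc::from(vec![fill; len]);
        }
        self.seq.store(s + 2, Ordering::SeqCst); // even: cycle published
    }

    /// Read path: retry until no write overlapped the Arc clones.
    pub fn snapshot(&self) -> Snapshot {
        loop {
            let before = self.seq.load(Ordering::SeqCst);
            if before % 2 == 1 {
                std::hint::spin_loop(); // writer is mid-swap, try again
                continue;
            }
            let cols: Vec<Arc<[u8]>> = self
                .cols
                .iter()
                .map(|c| c.read().unwrap().clone())
                .collect();
            if self.seq.load(Ordering::SeqCst) == before {
                return Snapshot { cycle: (before / 2) as u32, cols };
            }
        }
    }
}
```

A snapshot taken before `advance` keeps its old `Arc`s alive, so a SIMD sweep over it never observes the next cycle's bytes. The read locks here are held only for the duration of an `Arc` clone; an `ArcSwap` backend would remove them entirely.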
| 59 | + |
| 60 | +### Boundary: ndarray stays layout-only |
| 61 | + |
| 62 | +`MultiLaneColumn` in ndarray is `Arc<[u8]>` with typed lane iterators — |
| 63 | +**layout-only**. The Arc-swap policy (when to swap, how to snapshot, the |
| 64 | +cycle fence) belongs in `lance-graph`'s `MailboxSoa`. ndarray never learns |
| 65 | +that cycles or snapshots exist. The boundary is: |
| 66 | + |
| 67 | +```text |
| 68 | +ndarray::simd::MultiLaneColumn — Arc<[u8]>, lane iters, Send + Sync, zero-copy reads |
| 69 | +lance-graph::MailboxSoa — Arc-swap on advance_phase, cycle fence, snapshot() |
| 70 | +``` |
| 71 | + |
| 72 | +### Connection to temporal.rs |
| 73 | + |
| 74 | +`SoaEnvelope::cycle()` is the byte-scale clock. `QueryReference::ref_version` |
| 75 | +is the row-scale clock (a Lance version). They are the same monotonic clock |
| 76 | +at different granularities — Lance version N corresponds to SoA cycle C(N). |
| 77 | +When `temporal.rs::deinterlace` runs at query time, the `V_ref` it uses should |
| 78 | +align with the `cycle()` of the snapshot being queried. |
| 79 | + |
| 80 | +Wiring: `VersionScheduler::on_version(&view, at, exec)` provides the Lance |
| 81 | +version; the `MailboxSoaSnapshot` that went into that version carries its |
| 82 | +`cycle`. Threading `snapshot.cycle` into `QueryReference` closes the loop so |
| 83 | +row-scale and byte-scale deinterlace use the same clock. |
| 84 | + |
| 85 | +--- |
| 86 | + |
| 87 | +## Deliverables |
| 88 | + |
| 89 | +### D-SOA-SNAP-1 — `MailboxSoaSnapshot` type in lance-graph-contract |
| 90 | + |
| 91 | +A `MailboxSoaSnapshot` struct: `cycle: u32`, `cols: Vec<Arc<MultiLaneColumn>>`. |
| 92 | +Snapshot is `Send + Sync`. No reference to the originating `MailboxSoa`. |
| 93 | +This is a point-in-time read — immutable after creation. |
| 94 | + |
| 95 | +### D-SOA-SNAP-2 — `SnapshotProvider` trait in lance-graph-contract |
| 96 | + |
| 97 | +```rust |
| 98 | +pub trait SnapshotProvider { |
| 99 | + fn snapshot(&self) -> MailboxSoaSnapshot; |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +Zero deps in contract. `MailboxSoa` in lance-graph implements it. |
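
The shape of D-SOA-SNAP-1 and D-SOA-SNAP-2 might look roughly like the following. This is a sketch, not the contract crate's code: `MultiLaneColumn` is replaced by a hypothetical newtype over `Arc<[u8]>`, and `FixedProvider` and `assert_send_sync` are illustration-only helpers.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for ndarray's `MultiLaneColumn` (layout-only bytes).
pub struct MultiLaneColumn(pub Arc<[u8]>);

/// Point-in-time read: owns its Arcs, holds no reference to the mailbox.
pub struct MailboxSoaSnapshot {
    pub cycle: u32,
    pub cols: Vec<Arc<MultiLaneColumn>>,
}

pub trait SnapshotProvider {
    fn snapshot(&self) -> MailboxSoaSnapshot;
}

/// Compile-time check: the build fails if the snapshot stops being `Send + Sync`.
fn assert_send_sync<T: Send + Sync>() {}

pub fn snapshot_is_send_sync() {
    assert_send_sync::<MailboxSoaSnapshot>();
}

/// Toy provider (hypothetical) used only to exercise the trait.
pub struct FixedProvider {
    pub cycle: u32,
    pub cols: Vec<Arc<MultiLaneColumn>>,
}

impl SnapshotProvider for FixedProvider {
    fn snapshot(&self) -> MailboxSoaSnapshot {
        // Cloning an Arc copies a pointer, not the column bytes.
        MailboxSoaSnapshot { cycle: self.cycle, cols: self.cols.clone() }
    }
}
```

Because the snapshot only clones `Arc` pointers, taking one is zero-copy with respect to the column bytes.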
| 104 | + |
| 105 | +### D-SOA-SNAP-3 — Arc-swap write path in `MailboxSoa::advance_phase` |
| 106 | + |
| 107 | +In lance-graph (not contract, not ndarray): implement the cycle fence + |
| 108 | +column Arc-swap on every `advance_phase`. Use `std::sync::RwLock<Arc<MultiLaneColumn>>` |
| 109 | +per column (no external arc-swap crate needed unless benchmarks show |
| 110 | +contention; add as a feature flag if needed). |
| 111 | + |
| 112 | +### D-SOA-SNAP-4 — `snapshot()` implementation on `MailboxSoa` |
| 113 | + |
| 114 | +Optimistic snapshot: load cycle, clone all column Arcs, re-read cycle, retry until it is |
| 115 | +unchanged (lock-free only with an `ArcSwap` backend; `RwLock` takes brief read locks). Return `MailboxSoaSnapshot`. |
| 116 | + |
| 117 | +### D-SOA-SNAP-5 — No-cross-cycle-lag falsification test |
| 118 | + |
| 119 | +```rust |
| 120 | +// Spawn a writer thread: advance_phase in a loop (100 cycles). |
| 121 | +// Spawn 8 reader threads: each calls snapshot() in a loop. |
| 122 | +// Assert: every snapshot has all columns reporting the same cycle. |
| 123 | +// Assert: no snapshot mixes data from two different cycles. |
| 124 | +``` |
| 125 | + |
| 126 | +The test is the executable statement of the guarantee. A passing run cannot prove the |
| 127 | +invariant, but a failing run falsifies it, so the guarantee is tested, not just documented. |
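
One way to make "all columns report the same cycle" checkable is to have the writer stamp every byte of every column with the cycle number, then have readers compare those bytes against the snapshot's cycle. The sketch below does that against a hypothetical stand-in mailbox (`Mailbox`, `advance`, and `count_mixed_snapshots` are illustration names, and the seqlock-style counter is an assumption); it requires `cycles` to stay below 256 so the stamp fits in a `u8`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical mailbox: `seq` is odd while a write is in flight.
struct Mailbox {
    seq: AtomicU64,
    cols: Vec<RwLock<Arc<[u8]>>>,
}

impl Mailbox {
    fn new(n_cols: usize) -> Self {
        let cols: Vec<RwLock<Arc<[u8]>>> = (0..n_cols)
            .map(|_| RwLock::new(Arc::from(vec![0u8; 16])))
            .collect();
        Mailbox { seq: AtomicU64::new(0), cols }
    }

    /// Single writer only: stamps every byte of every column with the new cycle.
    fn advance(&self) {
        let s = self.seq.load(Ordering::SeqCst);
        self.seq.store(s + 1, Ordering::SeqCst);
        let stamp = (s / 2 + 1) as u8;
        for col in &self.cols {
            *col.write().unwrap() = Arc::from(vec![stamp; 16]);
        }
        self.seq.store(s + 2, Ordering::SeqCst);
    }

    /// Returns (cycle, columns); retries while a write overlaps the clones.
    fn snapshot(&self) -> (u64, Vec<Arc<[u8]>>) {
        loop {
            let before = self.seq.load(Ordering::SeqCst);
            if before % 2 == 1 {
                std::hint::spin_loop();
                continue;
            }
            let cols: Vec<Arc<[u8]>> =
                self.cols.iter().map(|c| c.read().unwrap().clone()).collect();
            if self.seq.load(Ordering::SeqCst) == before {
                return (before / 2, cols);
            }
        }
    }
}

/// Counts snapshots in which any column byte disagrees with the snapshot cycle.
fn count_mixed_snapshots(cycles: u64, readers: usize) -> usize {
    assert!(cycles < 256, "stamp must fit in a u8");
    let mb = Arc::new(Mailbox::new(4));
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let mb = Arc::clone(&mb);
            thread::spawn(move || {
                let mut mixed = 0usize;
                loop {
                    let (cycle, cols) = mb.snapshot();
                    let stamp = cycle as u8;
                    if cols.iter().any(|c| c.iter().any(|&b| b != stamp)) {
                        mixed += 1;
                    }
                    if cycle >= cycles {
                        return mixed; // writer has finished its last cycle
                    }
                }
            })
        })
        .collect();
    for _ in 0..cycles {
        mb.advance();
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `count_mixed_snapshots(100, 8)` mirrors the 100-cycle, 8-reader setup described above; any nonzero result falsifies the guarantee.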
| 128 | + |
| 129 | +### D-SOA-SNAP-6 — Wire `snapshot.cycle` into `QueryReference` |
| 130 | + |
| 131 | +In the planner: when a query resolves a `MailboxSoaSnapshot`, thread |
| 132 | +`snapshot.cycle` through `QueryReference::hlc_tick` (or a new |
| 133 | +`QueryReference::soa_cycle: Option<u32>` field) so `deinterlace` at |
| 134 | +row scale uses the same cycle boundary as the snapshot at byte scale. |
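
If the new-field option is chosen, the wiring could be as small as the sketch below. The struct shape is hypothetical: the real `QueryReference` may carry other fields, `ref_version: u64` is an assumed type, and `with_snapshot_cycle` and `is_pinned_to` are illustration-only names.

```rust
/// Hypothetical shape; the real `QueryReference` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryReference {
    /// Row-scale clock (a Lance version); the `u64` type is an assumption.
    pub ref_version: u64,
    /// Byte-scale clock; `None` until a snapshot has been resolved.
    pub soa_cycle: Option<u32>,
}

impl QueryReference {
    /// Pin this reference to the cycle of the snapshot being queried.
    pub fn with_snapshot_cycle(mut self, cycle: u32) -> Self {
        self.soa_cycle = Some(cycle);
        self
    }

    /// True when the reference is pinned to exactly this snapshot cycle.
    pub fn is_pinned_to(&self, cycle: u32) -> bool {
        self.soa_cycle == Some(cycle)
    }
}
```

Keeping the field an `Option<u32>` lets queries that never touch a mailbox snapshot leave it unset.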
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## Prerequisite gap fixes (order matters) |
| 139 | + |
| 140 | +These mechanical fixes should land before or alongside D-SOA-SNAP-1 |
| 141 | +(they settle the column shape): |
| 142 | + |
| 143 | +1. Remove `MailboxSoA::emit()` + `CollapseGateEmission` from source. |
| 144 | +2. Rename `last_emission_cycle` → `last_active_cycle` in MailboxSoA. |
| 145 | +3. Drop `entity_type: u16` from SoA row — MailboxId IS NiblePath. |
| 146 | +4. Fix `OntologyRegistry::enumerate_first_with_entity_type_id` linear scan. |
| 147 | +5. Remove `MappingRow.thinking_style` — Kanban owns thinking styles. |
| 148 | +6. Fix `unbundle_from` in `kv_bundle.rs:29` — `wrapping_sub` is not the |
| 149 | + inverse of weighted-average `bundle_into`. |
| 150 | + |
| 151 | +Items 1-5 settle the column shape before the Arc-swap schema is frozen. |
| 152 | +Item 6 is independent but should not be deferred (correctness bug). |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Non-goals |
| 157 | + |
| 158 | +- No recurrence / standing wave implementation. The standing wave is the |
| 159 | + deinterlaced Lance version projection, provided by Lance versioning |
| 160 | + (O(1) 90° lookup). Do not implement it in compute. |
| 161 | +- No baton. No emission. No inter-mailbox handoff type. The snapshot is |
| 162 | + consumed in-place; nothing is transmitted. |
| 163 | +- ndarray does not learn about cycles, snapshots, or advance_phase. |
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +## Cross-references |
| 168 | + |
| 169 | +- `temporal.rs` (PR #468) — row-scale deinterlace (SHIPPED) |
| 170 | +- `soa_envelope.rs` (PR #477) — envelope LE contract (IN REVIEW) |
| 171 | +- `soa-three-tier-model.md` — three-tier lifecycle model |
| 172 | +- `q3-standing-wave-falsification.md` — falsification: standing wave = Lance |
| 173 | + versioning, not compute recurrence |
| 174 | +- `.claude/board/EPIPHANIES.md` E-DEINTERLACE-TWO-SCALES — the synthesis |
| 175 | +- `ndarray/src/simd_soa.rs` — `MultiLaneColumn` (layout-only; Arc-swap lives |
| 176 | + in lance-graph, not here) |