Two layered updates from the post-PR #404-rollback session, both folded into
the existing consolidation + PR-X10 docs so PR #162 carries them.
## Layer 1 — PR #404 / PR #160 rollback salvage
### `heel_f64x8::{l1,l2,linf}_f64_simd` → PR-X10 A6 `linalg::distance`
The distance kernels were correct; the framing was wrong (filed as
"Sprint 0a of a four-repo integration arc" with cross-repo coupling that
made the rollback inherently cross-repo). Re-emerges as
`ndarray::hpc::linalg::distance::{l1,l2,linf}_f64_simd` under worker A6.
- `pr-x10-linalg-core-design.md`: added `distance.rs` to the module tree,
new section "Distance kernels — `linalg::distance`" with API surface +
precision class, A6 worker row updated (~400 → ~500 LoC, files now
include `linalg/distance.rs`)
- `stack-consolidation-bardioc-to-hhtl.md`: new "Salvage from the
2026-05-19 cross-repo rollback (PR #404 / PR #160)" section names the
re-entry point + lesson for future cross-repo arcs
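For orientation, the three kernels reduce to the scalar definitions below. This is a minimal sketch, not the `linalg::distance` implementation: the `*_ref` function names are hypothetical, and the salvaged kernels are SIMD (`f64x8`) rather than scalar iterators. A scalar reference of this shape is one plausible baseline to compare the SIMD results against.

```rust
// Scalar reference definitions of the L1, L2 and L-infinity distances.
// The `*_ref` names are hypothetical and exist only for this sketch.

/// L1 (Manhattan) distance: sum of absolute element-wise differences.
pub fn l1_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// L2 (Euclidean) distance: square root of the sum of squared differences.
pub fn l2_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// L-infinity (Chebyshev) distance: largest absolute difference.
/// Returns 0.0 for empty inputs.
pub fn linf_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}
```

A SIMD kernel sums in a different order than this scalar loop, so a comparison against it would normally use a tolerance rather than bit equality.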
### `lance-graph-contract::{ir, provider, actor}` → mostly dead, except…
`Operator`, `Cardinality`, `EngineHint`, `MvccProvider` types are
correctly dead (HHTL covers natively).
**Exception**: `SupervisableShader` + `RestartBackoff` reserved as
*column-flip-cycle commitment-gate primitives* for a future PR-X14 in
lance-graph. They wrap Ractor handlers that own a column-flip cycle
(read column → compute → flip state-flag → reply / drop). Re-framed
below under the column-substrate-identity model — they encode
flip-cycle semantics, not cross-store boundary plumbing.
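The flip cycle (read column → compute → flip state-flag → reply / drop) can be sketched as below. Everything here is an assumption made for illustration: `Row`, `Backoff`, `Outcome` and `flip_cycle_with_retry` are hypothetical stand-ins, not the `SupervisableShader` or `RestartBackoff` types, whose actual shape is not shown in this PR. The sketch records the delays it would wait rather than sleeping.

```rust
use std::time::Duration;

/// Hypothetical row: one value column plus the `committed` state flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub value: f64,
    pub committed: bool,
}

/// Hypothetical exponential backoff, capped at `max`.
pub struct Backoff {
    base: Duration,
    max: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, max: Duration) -> Self {
        Backoff { base, max, attempt: 0 }
    }

    /// Returns base * 2^attempt, saturating and clamped to `max`.
    pub fn next_delay(&mut self) -> Duration {
        let factor = 2u32.saturating_pow(self.attempt);
        self.attempt = self.attempt.saturating_add(1);
        self.base.saturating_mul(factor).min(self.max)
    }
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Flipped,
    Dropped,
}

/// One flip cycle with retries: read the column, compute, and flip the
/// flag only if the compute step succeeded. Failed attempts leave the row
/// untouched and record the delay a supervisor would wait.
pub fn flip_cycle_with_retry<F>(
    row: &mut Row,
    backoff: &mut Backoff,
    max_attempts: u32,
    mut compute: F,
) -> (Outcome, Vec<Duration>)
where
    F: FnMut(f64) -> Result<f64, String>,
{
    let mut delays = Vec::new();
    if row.committed {
        // Already flipped: nothing to commit, drop the request.
        return (Outcome::Dropped, delays);
    }
    for _ in 0..max_attempts {
        let input = row.value; // read column
        match compute(input) {
            Ok(v) => {
                row.value = v;
                row.committed = true; // flip state-flag
                return (Outcome::Flipped, delays);
            }
            Err(_) => delays.push(backoff.next_delay()),
        }
    }
    (Outcome::Dropped, delays)
}
```

The design point the sketch tries to capture is that a failed attempt never touches the flag, so a retry sees exactly the state the first attempt saw.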
## Layer 2 — Column-substrate identity (the deeper reframe)
The post-rollback session also produced an architectural collapse that
supersedes parts of the existing consolidation doc. Encoded across
three sections:
### New § "Column-substrate identity — Lance ≡ Arrow ≡ ndarray SoA"
One physical representation, end-to-end. Lance column ≡ Arrow column
buffer ≡ ndarray SoA — same bytes viewed through three names. Every
dialect surface (lance-graph cascade, SurrealDB, sea-orm, Databend,
Tantivy) parses its query language down to operations on those same
bytes. ndarray pays for the SIMD primitive once; the whole stack
collects rent.
Rubicon = *column-state flip*, not write event. A Thought is a Lance row
from allocation to query by any surface. "Crossing the Rubicon" means
flipping (e.g.) `committed: false → true` — versioned natively by Lance,
observed by any LIVE watcher with a matching predicate, no serialisation.
Section includes:
- The full Lance/Arrow/ndarray-SoA diagram with the five dialect surfaces
- "What this dissolves" table — 7 earlier framings now superseded
(mailbox writes, MvccProvider threading, surrealdb-ractor cf-event,
sea-orm entity-actor dispatch, Zone-as-storage-tier, TiKV-as-routing,
kv-lance-as-translation)
- "What survives — JITson / Cranelift, cleaner than before" — the
compile-time → JIT pipeline (DeriveEntityModel → Cranelift kernel
specialisation against OGIT-derived column types; ontology evolution
triggers next compile cycle → all surfaces auto-inherit)
- "Implication for the four-tier picture" — the substrate claim becomes
load-bearing in the right way; column IS the SoA IS the ndarray buffer
### Zone-model section rewritten
Zones are now defined as **temporal phases of column state on a single
Lance dataset**, not storage tiers. Table columns: column-state phase
(`committed=false` / `committed=true` / `egressed_at IS NOT NULL`),
which surface watches each phase, what "being in this zone" means.
Same physical bytes throughout — a row does not "move" from zone 1 to
zone 2; a column flips and the LIVE watchers notice. Section ends
pointing at § "Column-substrate identity" for the full unification.
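Read as code, "zone" becomes a function of column state rather than a place a row lives. The sketch below assumes a hypothetical `RowState` struct with a payload buffer and the two flag columns named above. It is not the Lance API, and the rule that `egressed_at` takes precedence over `committed` is an assumption of this example.

```rust
/// The three temporal phases described in the zone table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Hot,  // committed = false
    Warm, // committed = true
    Cold, // egressed_at IS NOT NULL
}

/// Hypothetical in-memory stand-in for one row of the dataset.
pub struct RowState {
    pub payload: Vec<f64>,        // the data bytes, never moved by a flip
    pub committed: bool,          // Rubicon flag
    pub egressed_at: Option<u64>, // egress timestamp, None until mirrored
}

/// The zone is derived from the flag columns. Nothing is copied or moved.
pub fn zone_of(row: &RowState) -> Zone {
    match (row.committed, row.egressed_at) {
        // Assumption: an egress timestamp wins regardless of `committed`.
        (_, Some(_)) => Zone::Cold,
        (true, None) => Zone::Warm,
        (false, None) => Zone::Hot,
    }
}
```

Flipping `committed` changes what `zone_of` returns while `payload` stays at the same address, which is the "same physical bytes throughout" claim in miniature.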
### Click-moments inventory: three → four
Click-moment #2 (Ractor `&mut self`) gets a refinement note about
mailbox-cycle Rubicon (no physical boundary). New click-moment #4 —
"Multi-store consistency / cross-zone messaging looked like the hard
coordination problem → column-substrate identity shows there is no
cross-zone messaging." Concluding paragraph distinguishes the three
workload-shape dissolutions (#1-3) from the substrate-identity
dissolution (#4) which makes the others' "no copy, no marshal, no
coordination" claims literal.
### Salvage section's SupervisableShader framing updated
The earlier "zone-1↔zone-2 boundary" language was already wrong twice
in this PR; final framing under column-substrate identity: these are
column-flip-cycle commitment primitives. Lance's version chain provides
the natural retry semantics. The handler's "supervision boundary" is
the flip-cycle, not a perimeter — because there is no second store.
## Status after this commit
- PR #162 now carries: Phase 2 entry artifacts (canary plan + execution
prompt + PR-X10 verdict patch P1-1) AND PR #404 rollback salvage AND
column-substrate-identity reframe
- All four click-moments documented; framing across Zone model,
Click-moments inventory, and Salvage section is consistent
- PR-X10 A6 absorbs heel_f64x8 distance kernels with bench parity gate
- Re-entry path for SupervisableShader + RestartBackoff named
(future PR-X14 in lance-graph; first consumer is the NARS-revision
handler that flips `revised: false → true` per column-flip semantics)
├── sh.rs — extended SH (deg 0..=7) — supersedes splat3d/sh.rs deg-3 only
├── conv.rs — Conv1D + Conv2D (im2col + gemm path, direct path for small kernels)
@@ -169,6 +170,26 @@ Higham's scaling-and-squaring Padé(13/13) for general matrices (3 × ε_machine
**Precision class: EXACT** for SPD path (via `eig_sym` + scalar `vml::exp_f32`/`vml::ln_f32`); **VERIFY** for general path (Padé approximant order vs scaling depth trade-off).
-**Ractor lives at zone boundaries**, never inside the zone-1 cascade. Actors are
-the gates between deliberation and persistence (1↔2) and between persistence
-and legacy egress (2↔3). Inside zone 1, the cascade is pure function composition
-over typed surfaces.
+| **Zone 1** (hot) | `committed = false`, currently held in mailbox-cycle scope | lance-graph cascade ops | the row is being deliberated; cascade compute is in-flight against the same bytes a future Zone-2 reader will see |
+| **Zone 2** (warm) | `committed = true`, Lance-versioned | SurrealDB LIVE subscriptions, lance-graph reads | the row's truth-value crossed the Rubicon; any LIVE watcher with a matching predicate observes the flip as a column-state transition |
+| **Zone 3** (cold) | `egressed_at IS NOT NULL`, mirrored once | sea-orm to legacy RDBMS | the row has been materialised into PG-shape for the legacy surface; the source Lance bytes are unchanged |
+
+**Zones are temporal phases of column state on a single Lance dataset, not
+storage tiers.** Same physical bytes throughout. A row does not "move" from
+zone 1 to zone 2; a column flips from `committed = false` to `true`, and
+the LIVE watchers notice. There is no serialise / marshal / wire-format
+step between strata because there are no strata — there is one Lance
+dataset, multiple state-flag columns, and multiple dialect surfaces reading
+the same buffers.
+
+This is the right framing for the Rubicon model: the crossing is a *column
+flip*, not a write event. There is no "mailbox in RAM commits to
+SurrealDB" — SurrealDB always saw the row, the row just changed state. The
+mailbox-cycle still governs the commit (the handler decides when to flip
+the flag, and `&mut self` there is the gated write), but the flip itself
+is a state transition on bytes that didn't move.
+
+What stays true from earlier framings:
+- The cascade inside a single handler body is pure function composition over
+  typed surfaces (Rule #3 territory)
+- The `&mut self` in the handler IS the gated write — legitimate because it
+  IS the Rubicon crossing (the column flip), not "during computation"
+- Typed surfaces at the dialect interfaces (SurrealQL parses to column
+  predicates; sea-orm projects to legacy DTOs; Databend pushes filters to
+  column kernels) — but these are *type-level* contracts on how each
+  dialect reads the same bytes, not perimeters around different stores
+
+See § "Column-substrate identity" below for the full unification.
+
+## Column-substrate identity — Lance ≡ Arrow ≡ ndarray SoA
+│ (ndarray SIMD ops directly over the column bytes;
+│ no copy, no serde, no marshal — the "in-RAM Thought"
+│ IS the Lance column slot)
+│
+├──→ SurrealDB: SurrealQL parses → reads the same column
+│ LIVE subscription = a watch on column-state predicates
+│
+├──→ sea-orm: SQL via Lance backend → reads the same column
+│ (Zone-3 egress is materialise-once into PG-shape for the
+│ legacy surface; the source bytes are unchanged)
+│
+├──→ Databend: analytic SQL → reads the same column
+│ (ndarray::simd kernel swap → operates on the same bytes
+│ the cognitive cascade just operated on)
+│
+└──→ Tantivy: FTS index → built over the same column
+```
+
+**One physical representation, end to end.** The Lance column layout, the
+Arrow column buffer layout, and the ndarray SoA layout are the same bytes
+viewed through three names. The five dialect surfaces (lance-graph cascade,
+SurrealDB, sea-orm, Databend, Tantivy) all parse their respective query
+languages down to operations on those same bytes.
+
+**ndarray amortises the SIMD primitive across the whole stack.** The same
+kernel that runs the cognitive cascade, that Databend's filter pushdown
+invokes, that Tantivy's indexer reads, that sea-orm projects to legacy
+egress — they are the same kernel on the same bytes. ndarray pays for the
+SIMD primitive once and the entire stack collects rent. No transcode tier,
+no copy boundary, no format conversion at any zone.
+
+**Rubicon = column-state flip, not write event.** A Thought is a Lance row
+from the moment it is allocated to the moment it is queried by any surface.
+"Crossing the Rubicon" means flipping (e.g.) `committed: false → true` —
+versioned natively by Lance, observed by any LIVE watcher with a matching
+predicate, no serialisation involved.
+
+### What this dissolves
+
+| Earlier framing (wrong) | Why it's wrong |
+|---|---|
+| "Mailbox writes to SurrealDB on Rubicon crossing" | There is no write — SurrealDB always saw the row; the row just changed state |
+| "MvccProvider::snapshot_ts threads across engines" | There is one Lance dataset with one version chain; all readers see the same version |
+| "surrealdb-ractor as cf-event router" | No cf-event-as-message needed; mailboxes already share the same column slice that SurrealDB watches |
+| "sea-orm-ractor entity-actor dispatch by PK" | The mailbox IS the row; no separate dispatch layer |
+| "Zone 1 in-process vs Zone 2 durable" (as storage tiers) | Same physical bytes; zones are temporal phases of column state, not storage tiers |
+| "TiKV as routing / coordination layer" | TiKV ranges are Lance dataset shards under the XOR cascade — substrate, not routing |
+| "kv-lance translates records into Lance rows for SurrealDB" | No translation; SurrealQL parses directly against Lance columns that lance-graph already owns |
+
+### What survives — JITson / Cranelift, cleaner than before
+
+The compile-time → JIT pipeline does not collapse with the framing — it
+sharpens:
+
+- **ndarray SoA layout = Lance column layout = known at OGIT-schema-compile time.**
+  The schema fixes the column shape; everything downstream specialises against it.
+- **`DeriveEntityModel` (or equivalent) emits column-typed accessors at Rust
+  compile time** — typed handles into the same bytes for each dialect surface.
+- **Cranelift JITs hot-path kernels specialised for the OGIT-derived column
+  types at first call** — predicate compilation, projection compilation,
+  cascade-step compilation, all against the typed column shape.
+- **"Sinkin becomes compile next time"** — when a new column shape enters the
+  substrate (ontology evolution), the next compile cycle regenerates the typed
+  accessors and the JIT re-specialises against the new shape.
+- **All five dialect surfaces automatically inherit the new kernels** because
+  they all operate on the same column layout. Add a column → all surfaces
+  see it. Specialise a kernel → all surfaces use it.
+
+### Implication for the four-tier picture
+
+The four-tier picture earlier in this doc names `ndarray::simd` as "the
+common SIMD substrate across all four tiers". That claim is correct, but
+its load-bearing reason is the column-substrate identity, not "we happen
+to use the same SIMD library in four places". The deeper fact:
+
+> **The column IS the SoA IS the ndarray buffer.** The cognitive cascade,
+> the analytic scan, the FTS index build, and the graph traversal all
+> operate on the same bytes through the same SIMD kernels. ndarray::simd
+> is the common substrate because the substrate is genuinely one thing,
+> not four parallel things wearing the same uniform.
+
+This is the actually-clean Foundry-aspiring shape: one physical store, one
+column layout, one kernel set, multiple dialect surfaces. The "same data,
+different syntax" claim is finally literal — not "same schema across
| HHTL distribution math is wrong | High | This is the load-bearing claim; numerical certification (PR-X11 pillars) covers cascade ops; add formal proofs for the XOR-projection bijectivity property before zone-2 commit |
| 90° vector / Walsh-Hadamard basis breaks for non-projectable queries | High | API enforces "queries must be expressible in basis"; queries that aren't are bounced back to the caller with a typed error, not silently scanned |

-## Click-moments inventory (the three architectural dissolutions)
+## Click-moments inventory (the four architectural dissolutions)

These are the moments where a perceived problem turned out to not be a problem:

@@ -215,17 +333,32 @@ These are the moments where a perceived problem turned out to not be a problem:
2. **Ractor `&mut self` violated Rule #3** → **Rubicon model shows actors are
commitment gates, not shared-state mutators.** The handler body IS the
Rubicon crossing; `&mut self` there is the gated write, not "during
-computation". Dual to Rule #3, not opposed.
+computation". Dual to Rule #3, not opposed. **Refinement** (2026-05-19,
+post-PR #404 rollback): the mailbox carries the commitment responsibility
+implicitly, so there is no physical boundary between zones 1/2/3 for
+actors to "live at" — Rubicon is per-mailbox-commit-cycle, distributed
+everywhere there is a handler.

3. **ClickHouse OLAP gap blocked the new stack** → **HHTL shows the cognitive
workload doesn't need OLAP, just project-and-lookup.** ClickHouse stays in
Bardioc and is decommissioned when the last scan-aggregate query is ported
(which is never, because cognitive queries don't have that shape).

-All three dissolutions are structural — they don't require new code, they
+4. **Multi-store consistency / cross-zone messaging looked like the hard
+coordination problem** → **Column-substrate identity shows there is no