| 1 | +# Integration plan — loose ends → the Spain-grid acceptance gate |
| 2 | + |
| 3 | +Status legend: ☐ open · ◐ in progress · ☑ done (this session) |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Done this session (the foundation) |
| 8 | + |
| 9 | +- ☑ **ractor messaging compiles.** `MessagingErr::Saturated` handled at all |
| 10 | + three match sites (`actor.rs`, `thread_local/inner.rs`, `derived_actor.rs`). |
| 11 | + This is the kanban backpressure valve. (AdaWorldAPI/ractor#2, merged.) |
| 12 | +- ☑ **kv-lance feature gates proven + documented.** Lite-unified surreal |
| 13 | + compiles without RocksDB/C++ storage. (AdaWorldAPI/surrealdb#47, #48, merged.) |
| 14 | +- ☑ **Golden image compiles + links.** `cargo build` exit 0, 19m18s, |
| 15 | + `target/debug/symbiont` 4.2 MB, 912 packages, zero errors. The five forks |
| 16 | + resolve AND compile+link into one binary; lockstep pins held. (This is a |
| 17 | + compile milestone — it proves nothing about runtime data flow; see the |
| 18 | + loose-end ledger below.) |
| 19 | +- ☑ **Perturbation-sim NaN foundations.** `cascade.rs` preserve-last-finite |
| 20 | + abort + `perturbation_shape_is_always_finite` test; `stats.rs` empty-slice |
| 21 | + guards on `mean`/`pop_var`. (lance-graph, merged.) |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Council findings (5+3 hardening, 2026-06-19) — read before §A |
| 26 | + |
| 27 | +An 8-agent council (5 research + 3 brutal reviewers) audited the gap between |
| 28 | +"compiles" and the win condition. Every finding reduced to one: |
| 29 | + |
| 30 | +> **The five crates are linked into one binary with ZERO runtime edges |
| 31 | +> between them.** "Compiles" proves the dependency graph; it proves nothing |
| 32 | +> about data flow. There are **three incompatible "node" representations and |
| 33 | +> no adapter between any of them:** |
| 34 | +> 1. canonical `NodeRow` (4096-bit, `lance-graph-contract::canonical_node`) — what the win condition means by "16K-node SoA" |
| 35 | +> 2. `VersionedGraph::NodeSchema` (SPO triple planes, `FixedSizeBinary(2048)`, `blasgraph/columnar.rs`) — what `LanceVersionScheduler` *actually* reads today |
| 36 | +> 3. perturbation-sim's `Grid`/`PerturbationShape` (plain `f64`) — what the cascade produces |
| 37 | + |
| 38 | +**☐ D0 — PREREQUISITE DECISION (gates all of §A): pick which representation |
| 39 | +"the 16K-node SoA" is.** A2 says "canonical 4096-bit node"; the only wired |
| 40 | +Lance substrate (`VersionedGraph`) uses a *different* SPO-plane schema. They |
| 41 | +cannot both be "the 16K-node SoA." Proposed decision: make `NodeRow` canon so |
| 42 | +the §A work targets it; until that is written down, the Grid→substrate bridge has no target. |
| 43 | + |
| 44 | +**Corrected prerequisite chain** (the plan's flat checkboxes hid these): |
| 45 | +`D0 (pick representation)` → `A1 fixture` (also: create the `tests/` dir — it |
| 46 | +doesn't exist) → `#1 perturbation-sim gains lance-graph-contract dep` → |
| 47 | +`A2 Grid→NodeRow bridge` → `#3 NodeRowPacket→Lance writer` → `A3/A4`. |
| 48 | +`C2` (clippy, §C) is independent and **failing now** — cheapest to clear. |
| 49 | +The entire kanban loop (ractor scheduler, jitson dispatch, surrealdb version |
| 50 | +stream) is **genuinely post-gate** — the 3-part gate needs none of it. |
| 51 | + |
| 52 | +**Key-encoding probe (gates whether A2 is mechanical):** the *value* side of |
| 53 | +the bridge is a 0-friction OPPORTUNITY (`basin.rs::as_row()[5]` + |
| 54 | +`buffer.rs::inertia_buffer_column()` → `ValueTenant` slots, algebra aligned). |
| 55 | +The *key* side is WORTH-EXPLORING: `hhtl.rs::HhtlKey` is the binary-Cheeger |
| 56 | +1-bit/tier instance, **not** OGAR's 16-ary/256-centroid production key — it |
| 57 | +type-aligns (`u16×3`) but isn't prefix-routable. Probe first: does the binary |
| 58 | +key give acceptable HHTL routing locality on the Spain grid, or must the |
| 59 | +centroid encoder (compose `basin.rs::spectral_embedding` + `splat.rs::morton2`) |
| 60 | +be built before A4's cascade routing is meaningful? |
| 61 | + |
| 62 | +**Honesty corrections applied to the docs (overclaim-auditor):** the README |
| 63 | +no longer states the substrate "carries" Spain's grid in present tense; the |
| 64 | +build milestone is scoped to compile/link (done) vs data-flow (not); the |
| 65 | +"912 packages" claim is scoped to resolution+build, with the two-`object_store` |
| 66 | +caveat noted. |
| 67 | + |
| 68 | +### Reviewer findings — golden-image setup correctness (P0/P1 reviewers) |
| 69 | + |
| 70 | +Verdicts: brutally-honest-tester = **HOLD**, baton-handoff-auditor = |
| 71 | +**CATCH-LATENT**. The image links cleanly today; these harden it into a |
| 72 | +*reproducible* foundation. None blocks the current green build. |
| 73 | + |
| 74 | +- **☑ R1 — ndarray duplication: ACCEPTED as cosmetic (decision 2026-06-19).** |
| 75 | + The graph links two ndarray-fork instances (surrealdb-core's git rev + |
| 76 | + lance-graph's path) plus the real crates.io `ndarray 0.16.1` lance-index |
| 77 | + legitimately needs. The 5+3 council confirmed **no ndarray type crosses the |
| 78 | + surrealdb↔lance-graph seam**, so the duplication never manifests at a call |
| 79 | + boundary — pure binary-size cosmetics, not a correctness issue. The proven |
| 80 | + green build (912 packages, exit 0) had exactly this shape. |
| 81 | + **Two fixes were tried and rejected:** (a) relabeling the shared fork's |
| 82 | + version `0.17.2→0.16.1` — dirty, lies about the fork's identity to every |
| 83 | + consumer; (b) vendoring lance-index + bumping its one ndarray req to `0.17` |
| 84 | + — honest but adds 126 vendored files + an unproven compile for a non-problem. |
| 85 | + **Resolution: leave the duplicate.** Revisit only if a real workload needs to |
| 86 | + pass an ndarray type across the surrealdb↔lance-graph boundary (then the |
| 87 | + clean route is the AdaWorldAPI lance-index fork bumped to ndarray 0.17). |
| 88 | +- **☐ R2 — commit `symbiont/Cargo.lock`.** It exists on disk (the build |
| 89 | + generated it) but isn't tracked. Without it, `branch`-pinned git deps |
| 90 | + (OGAR's surrealdb `main`, ndarray) can resolve to different commits on |
| 91 | + different days → not byte-reproducible. |
| 92 | +- **☐ R3 — pin OGAR's surrealdb git dep to an exact `rev`.** `OGAR/Cargo.toml` |
| 93 | + uses `branch = "main"`, but symbiont's `[patch]` silently substitutes the |
| 94 | + local tree on a *different* branch. Compiles today (AST shape matches); |
| 95 | + drops the baton if the local branch advances the AST or the patch is removed. |
| 96 | +- **☐ R4 — regenerate `/home/user/surrealdb/Cargo.lock`.** It resolves lance |
| 97 | + **6.0.0** / lancedb 0.29 — contradicting surrealdb's own `=7.0.0` manifest |
| 98 | + pin. surrealdb's kv-lance-on-lance-7 path was **never resolved inside |
| 99 | + surrealdb's own workspace**; symbiont is the first witness. Regenerate so |
| 100 | + the fork's CI exercises lance 7. |
| 101 | +- **note — absolute paths are deliberate** (`publish = false`); the image is |
| 102 | + intentionally machine-pinned to `/home/user/{...}`. Switch to relative |
| 103 | + (`../`) only if portability is wanted. |
| 104 | + |
| 105 | +**NaN coverage (reviewer-confirmed, strong):** `cascade.rs:146` finite-guard, |
| 106 | +`perturbation.rs` `FRAGMENTATION_SENTINEL = +∞` (deliberately not NaN, |
| 107 | +finiteness-checkable), `eigen.rs:123` div-guard, `stats.rs` divisor floors. |
| 108 | +One real P2 gap: a `+∞` sentinel reaching `stats::pearson` makes `saa*sbb = +∞`; |
| 109 | +the `sqrt` and the ratio then yield **NaN**, and the `<1e-12` guard does NOT catch `+∞`. Add an |
| 110 | +`is_finite` filter at the stats boundary + a `pearson_rejects_nonfinite` test. |
| 111 | +This folds into §B (the NaN-free win condition). |
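The P2 gap above can be reproduced with a stand-in. This is a minimal sketch, not the real `stats::pearson`: `pearson_naive` and `pearson_checked` are hypothetical names, and the body is a textbook Pearson with the same `<1e-12` guard the note describes. In this stand-in the NaN appears while centring the data; the exact path in the real function may differ, but the guard misses it in the same way.

```rust
/// Stand-in for a Pearson correlation with a small-denominator guard.
/// ASSUMPTION: illustrative only; the real `stats::pearson` may be written differently.
fn pearson_naive(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len().min(b.len());
    if n == 0 {
        return 0.0;
    }
    let mean_a = a[..n].iter().sum::<f64>() / n as f64;
    let mean_b = b[..n].iter().sum::<f64>() / n as f64;
    let (mut sab, mut saa, mut sbb) = (0.0_f64, 0.0_f64, 0.0_f64);
    for i in 0..n {
        let da = a[i] - mean_a;
        let db = b[i] - mean_b;
        sab += da * da.signum().abs() * db; // same as da * db for finite, nonzero da
        saa += da * da;
        sbb += db * db;
    }
    let denom = (saa * sbb).sqrt();
    // This guard only catches tiny denominators; a non-finite one passes
    // because comparisons with NaN or +inf against 1e-12 are false.
    if denom < 1e-12 {
        return 0.0;
    }
    sab / denom
}

/// Boundary filter: reject the call if any input is non-finite
/// (covers NaN, +inf and -inf), or if the lengths differ.
fn pearson_checked(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.iter().chain(b).any(|x| !x.is_finite()) {
        return None;
    }
    Some(pearson_naive(a, b))
}
```

Returning `Option` keeps the rejection visible to the caller instead of mapping it to a silent `0.0`; a `pearson_rejects_nonfinite` test would then assert `None` for any input that contains the sentinel.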
| 112 | + |
| 113 | +## The acceptance gate (the biggest goal) |
| 114 | + |
| 115 | +> **16K-node SoA substrate carries every Spanish electricity node; the |
| 116 | +> perturbation cascade runs NaN-free; `cargo clippy` + `cargo machete` clean.** |
| 117 | + |
| 118 | +### A. Substrate carries the Spanish grid |
| 119 | + |
| 120 | +- ☐ **A1 — source the Spanish grid topology.** REE / ENTSO-E node + line |
| 121 | + list (buses, lines, transformers, susceptances). Deterministic fixture |
| 122 | + checked into `perturbation-sim/tests/fixtures/` (no network at test time). |
| 123 | +- ☐ **A2 — map each grid node → one canonical 4096-bit node.** |
| 124 | + `key(16) = classid(u32) | HEEL | HIP | TWIG | family(u24) | identity(u24)`. |
| 125 | + Grid nodes start in the default basin (classid=0, family=0); `identity` |
| 126 | + alone discriminates (16.7M capacity — Spain's ~10³–10⁴ buses fit trivially). |
| 127 | + Edges (12 in-family + 4 out-of-family) carry the line adjacency. |
| 128 | +- ☐ **A3 — load the grid into a `MailboxSoA` view over a Lance dataset.** |
| 129 | + The 16K-node column is the Lance-backed SoA; this is where `kv-lance` |
| 130 | + earns its place (zero-copy columnar, versioned). |
| 131 | +- ☐ **A4 — run the cascade over the full node set.** `cascade.rs` |
| 132 | + (Weyl/Davis-Kahan spectral perturbation ∘ DC-power-flow/LODF) + |
| 133 | + `basin.rs` (Kron-reduced cross-border super-nodes) + `scorecard.rs` |
| 134 | + (ES `policy_mult` 1.3, `H` 2.0). Output: the perturbation SHAPE per node. |
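The A2 key layout can be sketched as a byte-packing routine. This is a hedged illustration only: `pack_key` and `grid_node_key` are hypothetical names, HEEL/HIP/TWIG are assumed to be `u16` each (matching the `u16×3` alignment noted for `HhtlKey`), and big-endian field order is an assumption, not the confirmed `NodeRow` layout. The widths add up to 4 + 2 + 2 + 2 + 3 + 3 = 16 bytes.

```rust
/// Largest value a 24-bit field can hold (2^24 - 1 = 16_777_215).
const U24_MAX: u32 = (1 << 24) - 1;

/// Pack classid | HEEL | HIP | TWIG | family | identity into 16 bytes.
/// ASSUMPTION: big-endian fields in declaration order; the real layout may differ.
/// Returns None when `family` or `identity` does not fit in 24 bits.
fn pack_key(
    classid: u32,
    heel: u16,
    hip: u16,
    twig: u16,
    family: u32,
    identity: u32,
) -> Option<[u8; 16]> {
    if family > U24_MAX || identity > U24_MAX {
        return None;
    }
    let mut key = [0u8; 16];
    key[0..4].copy_from_slice(&classid.to_be_bytes());
    key[4..6].copy_from_slice(&heel.to_be_bytes());
    key[6..8].copy_from_slice(&hip.to_be_bytes());
    key[8..10].copy_from_slice(&twig.to_be_bytes());
    // A u24 is the low three bytes of the big-endian u32 encoding.
    key[10..13].copy_from_slice(&family.to_be_bytes()[1..]);
    key[13..16].copy_from_slice(&identity.to_be_bytes()[1..]);
    Some(key)
}

/// Grid nodes in the default basin: classid = 0, family = 0, and only
/// `identity` discriminates. The zeroed HHTL tiers are a placeholder here,
/// pending the key-encoding probe.
fn grid_node_key(bus_index: u32) -> Option<[u8; 16]> {
    pack_key(0, 0, 0, 0, 0, bus_index)
}
```

With this packing, a fixture of 10³ to 10⁴ buses maps to `grid_node_key(0)` through `grid_node_key(n - 1)`, and any index at or above 2²⁴ is rejected rather than silently truncated.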
| 135 | + |
| 136 | +### B. NaN-free, enforced |
| 137 | + |
| 138 | +- ☐ **B1 — NaN linter guard.** A clippy lint / debug-assert pass that fails |
| 139 | + if any `f32`/`f64` in the cascade, spectral step, or scorecard is non-finite. |
| 140 | + Build on the existing `is_finite()` guards; promote them to a checked |
| 141 | + invariant at module boundaries (not just the cascade loop). |
| 142 | +- ☐ **B2 — property test over the grid fixture.** Extend |
| 143 | + `perturbation_shape_is_always_finite` to the full Spain fixture (every |
| 144 | + node, every cascade round) — the regression that proves B1 holds on real |
| 145 | + topology, not just synthetic input. |
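One way to promote the existing `is_finite()` guards to a checked invariant at module boundaries is a small helper that every stage calls on its outputs. The sketch below is an assumption about shape, not existing code: `check_finite` and `NonFinite` are hypothetical names, and the stage labels are free-form strings.

```rust
/// Describes the first non-finite value found at a module boundary.
/// ASSUMPTION: hypothetical type, not part of perturbation-sim today.
#[derive(Debug, PartialEq)]
struct NonFinite {
    /// Which boundary reported the problem, e.g. "cascade" or "scorecard".
    stage: &'static str,
    /// Position of the offending value in the checked slice.
    index: usize,
}

/// Returns Ok(()) when every value is finite, otherwise the first offender.
/// `is_finite` is false for NaN, +inf and -inf, so the `+∞` sentinel is caught too.
fn check_finite(stage: &'static str, values: &[f64]) -> Result<(), NonFinite> {
    match values.iter().position(|v| !v.is_finite()) {
        Some(index) => Err(NonFinite { stage, index }),
        None => Ok(()),
    }
}
```

A stage that already returns `Result` could call `check_finite("cascade", &shape)` before handing data on, and the B2 property test could assert `Ok(())` for every round of the Spain fixture. Reporting the stage and index makes a failure traceable to one node rather than to the whole run.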
| 146 | + |
| 147 | +### C. Tight graph |
| 148 | + |
| 149 | +- ☐ **C1 — `cargo machete` clean.** Remove unused deps from the golden-image |
| 150 | + graph and from `perturbation-sim`. (Machete reads manifests; cheap.) |
| 151 | +- ☐ **C2 — `cargo clippy --all-targets -- -D warnings` clean** across the |
| 152 | + symbiont graph (at least the first-party crates; upstream warnings triaged). |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Other loose ends (post-gate) |
| 157 | + |
| 158 | +- ☐ **surreal_container `BLOCKED(C)`.** The `surreal_container` consumer still |
| 159 | + has the kv-lance fork dep unwired in its `Cargo.toml`. The golden image |
| 160 | + proves the dep graph works; porting that wiring into `surreal_container` |
| 161 | + clears the block. |
| 162 | +- ☐ **ndarray-simd in perturbation-sim.** Enable the `ndarray-simd` feature |
| 163 | + (Walsh-Hadamard via ndarray AVX-512 under `target-cpu=x86-64-v4`) and |
| 164 | + `[patch]` perturbation-sim's git ndarray to the local fork. Deferred from |
| 165 | + the first image to keep the AVX/git-patch risk out of the initial compile. |
| 166 | +- ☐ **Kanban loop wiring.** Stand up `LanceVersionScheduler` (ractor) → |
| 167 | + `KanbanMove(ExecTarget::Jit)` → jitson formula → `MailboxSoaView` write → |
| 168 | + Lance commit. The perturbation cascade becomes the first *formula* the |
| 169 | + scheduler dispatches. |
| 170 | +- ☐ **main.rs as a real harness.** Replace the probe `println!` with a CLI |
| 171 | + that loads the grid fixture, runs the cascade, prints the scorecard, and |
| 172 | + asserts finite — so `cargo run` IS the acceptance-gate demo. |
| 173 | +- ☐ **Optional: no-C++ image.** Drop S3 cloud object-store features + flip |
| 174 | + `jsonwebtoken` to `rust_crypto` (see INSTALLATION.md). Nice-to-have only. |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +## Risks / watch-items |
| 179 | + |
| 180 | +- **Two `object_store` versions** appear in the resolved graph (lance vs |
| 181 | + surrealdb transitive). Allowed by cargo (distinct majors); watch for any |
| 182 | + public-type mismatch if they ever meet at an API boundary. |
| 183 | +- **Disk:** the full `target/` is multi-GB; build in one shared target dir, |
| 184 | + clean sibling `target/`s (build residue, not research data) if headroom |
| 185 | + drops below ~3 GB. |
| 186 | +- **edition 2024 (OGAR)** requires the 1.95 toolchain in the active override — |
| 187 | + `rust-toolchain.toml` pins it; don't run the image build under 1.94. |