Commit 37cc21b

Merge pull request #555 from AdaWorldAPI/claude/symbiont-golden-image-plan
symbiont golden image: integration plan (5+3 council loose-end ledger)
2 parents b495c54 + ca4c20d commit 37cc21b

1 file changed

Lines changed: 208 additions & 0 deletions

# Integration plan: loose ends → the Spain-grid acceptance gate

Status legend: ☐ open · ◐ in progress · ☑ done (this session) · ⊘ blocked (waiting on an upstream/dep change)

---

## Done this session (the foundation)

- **ractor messaging compiles.** `MessagingErr::Saturated` handled at all
  three match sites (`actor.rs`, `thread_local/inner.rs`, `derived_actor.rs`).
  This is the kanban backpressure valve. (AdaWorldAPI/ractor#2, merged.)
- **kv-lance feature gates proven + documented.** Lite-unified surreal
  compiles without RocksDB/C++ storage. (AdaWorldAPI/surrealdb#47, #48, merged.)
- **Golden image compiles + links, TWICE, both green.** (1) Local-path build:
  `cargo build` exit 0, 19m18s, 912 packages. (2) **Portable git-deps build**
  (the living-harness config: surrealdb/OGAR `main`, ndarray `master`, ractor
  `jirak`): `CARGO_EXIT=0`, 12m52s, `target/debug/symbiont` 4.3 MB, runs + prints
  the linked-stack line. Unified `lance 7.0.0 / lance-index 7.0.0 / lancedb
  0.30.0 / datafusion 53.1.0 / arrow 58`, **no lance-6/7 split.** (A compile
  milestone: it proves the stack composes on the lockstep pins; it proves
  nothing about runtime data flow; see the loose-end ledger below.)
- **Perturbation-sim NaN foundations.** `cascade.rs` preserve-last-finite
  abort + `perturbation_shape_is_always_finite` test; `stats.rs` empty-slice
  guards on `mean`/`pop_var`. (lance-graph, merged.)
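
To make the empty-slice guard concrete, here is a minimal sketch of what guarded `mean`/`pop_var` helpers could look like. It is an illustration only: the real `stats.rs` signatures are not shown in this plan, and returning `Option` (rather than, say, `0.0`) is an assumption made for the example.

```rust
/// Arithmetic mean. Returns `None` for an empty slice instead of
/// evaluating 0.0 / 0.0, which would be NaN.
/// (Hypothetical signature; the real `stats.rs` may differ.)
pub fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Population variance (divides by n, not n - 1).
/// The empty-slice case is rejected by `mean` and propagated with `?`.
pub fn pop_var(xs: &[f64]) -> Option<f64> {
    let m = mean(xs)?;
    let sum_sq: f64 = xs.iter().map(|x| (x - m) * (x - m)).sum();
    Some(sum_sq / xs.len() as f64)
}
```

Without the guard, an empty slice divides `0.0` by `0.0` and a NaN silently enters every downstream statistic; with it, the caller has to decide what an empty input means.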

---

## Council findings (5+3 hardening, 2026-06-19): read before §A

An 8-agent council (5 research + 3 brutal reviewers) audited the gap between
"compiles" and the win condition. The one finding everything reduced to:

> **The five crates are linked into one binary with ZERO runtime edges
> between them.** "Compiles" proves the dependency graph; it proves nothing
> about data flow. There are **three incompatible "node" representations and
> no adapter between any of them:**
> 1. canonical `NodeRow` (4096-bit, `lance-graph-contract::canonical_node`): what the win condition means by "16K-node SoA"
> 2. `VersionedGraph::NodeSchema` (SPO triple planes, `FixedSizeBinary(2048)`, `blasgraph/columnar.rs`): what `LanceVersionScheduler` *actually* reads today
> 3. perturbation-sim's `Grid`/`PerturbationShape` (plain `f64`): what the cascade produces

**☐ D0 (PREREQUISITE DECISION, gates all of §A): pick which representation
"the 16K-node SoA" is.** A2 says "canonical 4096-bit node"; the only wired
Lance substrate (`VersionedGraph`) uses a *different* SPO-plane schema. They
cannot both be "the 16K-node SoA." Decide canon (`NodeRow`) and the §A work
targets it; until that decision is written down, the Grid→substrate bridge
cannot be aimed.

**Corrected prerequisite chain** (the plan's flat checkboxes hid these):
`D0 (pick representation)` → `A1 fixture` (also: create the `tests/` dir, which
does not exist yet) → `#1 perturbation-sim gains lance-graph-contract dep` →
`A2 Grid→NodeRow bridge` → `#3 NodeRowPacket→Lance writer` → `A3/A4`.
`C2` (clippy, §C) is independent and **failing now**, so it is the cheapest to clear.
The entire kanban loop (ractor scheduler, jitson dispatch, surrealdb version
stream) is **genuinely post-gate**: the 3-part gate needs none of it.
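
As a small illustration of why the chain matters, the sketch below encodes the steps in prerequisite order and reports the first one that is still open. The step labels come from the chain above; the `CHAIN` constant and `next_step` helper are hypothetical names invented for this example, not part of any crate.

```rust
/// Gate steps in prerequisite order (labels taken from the chain above).
pub const CHAIN: [&str; 6] = ["D0", "A1", "#1", "A2", "#3", "A3/A4"];

/// Returns the first step that is not yet done. Every later step is
/// blocked on it, even if it has been ticked off out of order.
pub fn next_step(done: &[&str]) -> Option<&'static str> {
    CHAIN.iter().copied().find(|s| !done.contains(s))
}
```

Ticking `A1` before `D0` does not move the chain forward: `next_step(&["A1"])` still reports `D0`, which is the dependency the flat checkboxes hid.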

**Key-encoding probe (gates whether A2 is mechanical):** the *value* side of
the bridge is a 0-friction OPPORTUNITY (`basin.rs::as_row()[5]` +
`buffer.rs::inertia_buffer_column()` → `ValueTenant` slots, algebra aligned).
The *key* side is WORTH-EXPLORING: `hhtl.rs::HhtlKey` is the binary-Cheeger
1-bit/tier instance, **not** OGAR's 16-ary/256-centroid production key. It
type-aligns (`u16×3`) but isn't prefix-routable. Probe first: does the binary
key give acceptable HHTL routing locality on the Spain grid, or must the
centroid encoder (compose `basin.rs::spectral_embedding` + `splat.rs::morton2`)
be built before A4's cascade routing is meaningful?
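
To show what "prefix-routable" buys, here is a generic 2-D Morton (Z-order) interleave. It is a textbook sketch, not the repository's `splat.rs::morton2`, whose signature is not shown here; `morton2_sketch`, `spread_bits`, and `shared_prefix_bits` are hypothetical names, and the assumption is that two quantized embedding coordinates are interleaved into one code.

```rust
/// Spread the 16 bits of `v` onto the even bit positions of a u32.
fn spread_bits(v: u16) -> u32 {
    let mut x = v as u32;
    x = (x | (x << 8)) & 0x00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333;
    x = (x | (x << 1)) & 0x5555_5555;
    x
}

/// Interleave two 16-bit coordinates: x takes the even bits, y the odd bits.
pub fn morton2_sketch(x: u16, y: u16) -> u32 {
    spread_bits(x) | (spread_bits(y) << 1)
}

/// Number of leading bits two codes share. A longer shared prefix means
/// the two points fall in the same coarser cell of the Z-order grid.
pub fn shared_prefix_bits(a: u32, b: u32) -> u32 {
    (a ^ b).leading_zeros()
}
```

Points that are close in both coordinates tend to share a long code prefix, which is the property a prefix-routed key relies on. Whether the binary key already gives comparable locality on the Spain grid is exactly what the probe is meant to measure.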

**Honesty corrections applied to the docs (overclaim-auditor):** the README
no longer states the substrate "carries" Spain's grid in present tense; the
build milestone is scoped to compile/link (done) vs data-flow (not); the
"912 packages" claim is scoped to resolution+build, with the two-`object_store`
caveat noted.

### Reviewer findings: golden-image setup correctness (P0/P1 reviewers)

Verdicts: brutally-honest-tester = **HOLD**, baton-handoff-auditor =
**CATCH-LATENT**. The image links cleanly today; these harden it into a
*reproducible* foundation. None blocks the current green build.

- **☑ R1: ndarray duplication ACCEPTED as cosmetic (decision 2026-06-19).**
  The graph links two ndarray-fork instances (surrealdb-core's git rev +
  lance-graph's path) plus the real crates.io `ndarray 0.16.1` that lance-index
  legitimately needs. The 5+3 council confirmed **no ndarray type crosses the
  surrealdb↔lance-graph seam**, so the duplication never manifests at a call
  boundary: pure binary-size cosmetics, not a correctness issue. The proven
  green build (912 packages, exit 0) had exactly this shape.
  **Two fixes were tried and rejected:** (a) relabeling the shared fork's
  version `0.17.2→0.16.1`: dirty, lies about the fork's identity to every
  consumer; (b) vendoring lance-index + bumping its one ndarray req to `0.17`:
  honest, but adds 126 vendored files + an unproven compile for a non-problem.
  **Resolution: leave the duplicate.** Revisit only if a real workload needs to
  pass an ndarray type across the surrealdb↔lance-graph boundary (then the
  clean route is the AdaWorldAPI lance-index fork bumped to ndarray 0.17).
- **☑ R2 / R3: SUPERSEDED by the living-harness reframe (2026-06-20).** These
  asked to commit `symbiont/Cargo.lock` and pin git-deps to exact `rev`s for
  byte-reproducibility. That is the **snapshot** model the operator explicitly rejected
  ("a Dockerfile + Cargo that actually RUNS the *current* substrate, pending
  integration"). The golden image is a *living* harness: it re-resolves to each
  fork's canonical branch tip every build. `Cargo.lock` is now `.gitignore`d; the
  `[patch]` is gone (surrealdb consumers align on `main` → one source; cargo
  forbids patching a url to itself anyway). See EPIPHANIES
  E-GOLDEN-IMAGE-IS-A-LIVING-HARNESS.
- **☑ R4: surrealdb lance-7 witnessed GREEN.** The git-deps build resolved
  surrealdb-core's `kv-lance` against `lance 7.0.0 / lance-index 7.0.0 / lancedb
  0.30.0` cleanly; the fork's `main` manifest pins `=7.0.0` (verified). The
  earlier "resolves lance 6" worry was the **stale `jirak` branch**, not `main`.
  `TD-SURREALDB-KVLANCE-LANCE7` is **PAID**. Residual (surrealdb-fork CI
  housekeeping, not ours): the fork's own committed `Cargo.lock` may still
  resolve lance 6; regenerate it in the fork so its CI exercises lance 7.
- **note: absolute paths are deliberate** (`publish = false`); the image is
  intentionally machine-pinned to `/home/user/{...}`. Switch to relative
  (`../`) only if portability is wanted.

**NaN coverage (reviewer-confirmed, strong):** `cascade.rs:146` finite-guard,
`perturbation.rs` `FRAGMENTATION_SENTINEL = +∞` (deliberately not NaN,
finiteness-checkable), `eigen.rs:123` div-guard, `stats.rs` divisor floors.
One real P2 gap: a `+∞` sentinel reaching `stats::pearson` makes `saa*sbb=+∞` →
`sqrt` → ratio → **NaN**, and the `<1e-12` guard does NOT catch `+∞`. Add an
`is_finite` filter at the stats boundary + a `pearson_rejects_nonfinite` test.
This folds into §B (the NaN-free win condition).
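
The gap can be reproduced with a stand-in Pearson implementation. The sketch below is not the code in `stats.rs`: `pearson_unguarded` only mirrors the described shape (sum-of-squares products, `sqrt`, a `< 1e-12` floor), and `pearson_checked` is one possible form of the proposed boundary filter. Rejecting the whole input, rather than dropping the offending pairs, is an assumption here.

```rust
/// Stand-in for the unguarded shape described above: the only protection
/// is a small-denominator floor, which a non-finite value sails past.
pub fn pearson_unguarded(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len().min(b.len());
    if n == 0 {
        return 0.0;
    }
    let ma = a[..n].iter().sum::<f64>() / n as f64;
    let mb = b[..n].iter().sum::<f64>() / n as f64;
    let (mut sab, mut saa, mut sbb) = (0.0, 0.0, 0.0);
    for i in 0..n {
        let (da, db) = (a[i] - ma, b[i] - mb);
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    let denom = (saa * sbb).sqrt();
    if denom < 1e-12 {
        return 0.0; // catches tiny denominators, not +inf or NaN
    }
    sab / denom
}

/// Boundary-checked variant: refuses non-finite input up front and
/// refuses to return a non-finite result.
pub fn pearson_checked(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    if a.iter().chain(b).any(|v| !v.is_finite()) {
        return None;
    }
    let r = pearson_unguarded(a, b);
    r.is_finite().then_some(r)
}
```

In this sketch the `+∞` already poisons the mean, so the NaN appears even before the `saa*sbb` product; either way the `< 1e-12` comparison is false for NaN and for `+∞`, and the bad value escapes. A `pearson_rejects_nonfinite` test would assert that the checked variant returns `None` for such input.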

## The acceptance gate (the biggest goal)

> **16K-node SoA substrate carries every Spanish electricity node; the
> perturbation cascade runs NaN-free; `cargo clippy` + `cargo machete` clean.**

### A. Substrate carries the Spanish grid

- **A1: source the Spanish grid topology.** REE / ENTSO-E node + line
  list (buses, lines, transformers, susceptances). Deterministic fixture
  checked into `perturbation-sim/tests/fixtures/` (no network at test time).
- **A2: map each grid node → one canonical 4096-bit node.**
  `key(16) = classid(u32) | HEEL | HIP | TWIG | family(u24) | identity(u24)`.
  Grid nodes start in the default basin (classid=0, family=0); `identity`
  alone discriminates (16.7M capacity; Spain's ~10³–10⁴ buses fit trivially).
  Edges (12 in-family + 4 out-of-family) carry the line adjacency.
- **A3: load the grid into a `MailboxSoA` view over a Lance dataset.**
  The 16K-node column is the Lance-backed SoA; this is where `kv-lance`
  earns its place (zero-copy columnar, versioned).
- **A4: run the cascade over the full node set.** `cascade.rs`
  (Weyl/Davis-Kahan spectral perturbation ∘ DC-power-flow/LODF) +
  `basin.rs` (Kron-reduced cross-border super-nodes) + `scorecard.rs`
  (ES `policy_mult` 1.3, `H` 2.0). Output: the perturbation SHAPE per node.
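
The A2 key layout adds up to 16 bytes if HEEL, HIP, and TWIG are each a `u16` (4 + 2 + 2 + 2 + 3 + 3). The sketch below packs such a key. It is a hypothetical illustration: `NodeKey`, `pack_key`, and `grid_node_key` are invented names, big-endian field order is an assumption, and zeroing HEEL/HIP/TWIG for grid nodes is a placeholder, since the plan leaves the routing tiers to the key-encoding probe.

```rust
/// Largest value that fits in a 24-bit field (2^24 - 1 = 16_777_215).
pub const U24_MAX: u32 = (1 << 24) - 1;

/// Hypothetical in-memory form of the 16-byte key from A2.
pub struct NodeKey {
    pub classid: u32,
    pub heel: u16,
    pub hip: u16,
    pub twig: u16,
    pub family: u32,   // only the low 24 bits are stored
    pub identity: u32, // only the low 24 bits are stored
}

/// Pack into 16 bytes, big-endian per field (byte order is an assumption).
/// Returns `None` if `family` or `identity` does not fit in 24 bits.
pub fn pack_key(k: &NodeKey) -> Option<[u8; 16]> {
    if k.family > U24_MAX || k.identity > U24_MAX {
        return None;
    }
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&k.classid.to_be_bytes());
    out[4..6].copy_from_slice(&k.heel.to_be_bytes());
    out[6..8].copy_from_slice(&k.hip.to_be_bytes());
    out[8..10].copy_from_slice(&k.twig.to_be_bytes());
    out[10..13].copy_from_slice(&k.family.to_be_bytes()[1..]);
    out[13..16].copy_from_slice(&k.identity.to_be_bytes()[1..]);
    Some(out)
}

/// Default-basin key for a grid bus: classid = 0, family = 0, so only
/// `identity` discriminates. HEEL/HIP/TWIG are zeroed as a placeholder.
pub fn grid_node_key(identity: u32) -> Option<[u8; 16]> {
    pack_key(&NodeKey { classid: 0, heel: 0, hip: 0, twig: 0, family: 0, identity })
}
```

The 24-bit `identity` field holds 16,777,216 distinct values, which is the "16.7M capacity" quoted in A2; a bus count in the thousands uses a tiny fraction of it.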

### B. NaN-free, enforced

- **B1: NaN linter guard.** A clippy lint / debug-assert pass that fails
  if any `f32`/`f64` in the cascade, spectral step, or scorecard is non-finite.
  Build on the existing `is_finite()` guards; promote them to a checked
  invariant at module boundaries (not just the cascade loop).
- **B2: property test over the grid fixture.** Extend
  `perturbation_shape_is_always_finite` to the full Spain fixture (every
  node, every cascade round). This is the regression that proves B1 holds on real
  topology, not just synthetic input.
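
One way to read "checked invariant at module boundaries" is a small helper that every stage calls on its output before handing it on. The sketch below is an assumption about shape, not the planned implementation: `check_finite`, `NonFinite`, and `NonFiniteKind` are hypothetical names, and a `debug_assert!`-based variant would serve equally well if release builds should skip the scan.

```rust
/// Which kind of non-finite value was found.
#[derive(Debug, PartialEq, Eq)]
pub enum NonFiniteKind {
    NaN,
    PosInf,
    NegInf,
}

/// Where the invariant broke: the stage name and the offending index.
#[derive(Debug, PartialEq, Eq)]
pub struct NonFinite {
    pub stage: &'static str,
    pub index: usize,
    pub kind: NonFiniteKind,
}

/// Scan a stage's output and report the first non-finite entry.
/// Intended to run at a module boundary (e.g. cascade → scorecard).
pub fn check_finite(stage: &'static str, xs: &[f64]) -> Result<(), NonFinite> {
    for (index, &x) in xs.iter().enumerate() {
        if !x.is_finite() {
            let kind = if x.is_nan() {
                NonFiniteKind::NaN
            } else if x > 0.0 {
                NonFiniteKind::PosInf
            } else {
                NonFiniteKind::NegInf
            };
            return Err(NonFinite { stage, index, kind });
        }
    }
    Ok(())
}
```

Returning the stage name and index keeps a B2 failure on the Spain fixture actionable: the test can report which node and which module produced the first non-finite value instead of a bare assertion failure.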

### C. Tight graph

- **C1: `cargo machete` clean.** Run with
  `--manifest-path crates/symbiont/Cargo.toml` (and on `perturbation-sim`).
  Note: machete is **report-only by default**. It lists unused deps and exits
  non-zero, but only `--fix` actually edits `Cargo.toml`. The catch for
  symbiont: `main.rs` only prints a probe line, so its direct deps (lance-graph,
  perturbation-sim, ractor, surrealdb-core, ogar-*) ARE the integration payload
  (exactly what forces the golden-image link), so machete will (correctly)
  report them as unused and **fail a "machete clean" gate**. Whitelist them via
  `[package.metadata.cargo-machete] ignored = [...]` so the report passes; never
  `--fix` them away (the build would pass while exercising nothing).
  Genuinely-unused deps elsewhere (e.g. in `perturbation-sim`) are the real
  targets.
- **C2: `cargo clippy --all-targets -- -D warnings` clean.** NOTE:
  `symbiont` has its OWN `[workspace]`, so a root-level `cargo clippy` SKIPS it
  entirely. Run from `crates/symbiont/` or add
  `--manifest-path crates/symbiont/Cargo.toml`. First-party crates must be
  clean; upstream (git-dep) warnings triaged, not gated.

---

## Other loose ends (post-gate)

- **surreal_container: version-unblocked, execution-blocked on wiring.**
  `BLOCKED(C)` was a VERSION blocker (surrealdb `kv-lance` pinned lance 6), now
  RESOLVED: surrealdb `main` pins lance 7 and the golden image built green
  against it. The residual is pure wiring: `surreal_container`'s surrealdb dep
  is still commented out (D-PG-6 Rubicon kanban VIEW). Uncomment + wire; no
  version work remains. (TECH_DEBT `TD-SURREALDB-KVLANCE-LANCE7` = PAID.)
- **ndarray-simd in perturbation-sim.** Enable the `ndarray-simd` feature
  (Walsh-Hadamard via ndarray AVX-512 under `target-cpu=x86-64-v4`) and
  `[patch]` perturbation-sim's git ndarray to the local fork. Deferred from
  the first image to keep the AVX/git-patch risk out of the initial compile.
- **Kanban loop wiring.** Stand up `LanceVersionScheduler` (ractor) →
  `KanbanMove(ExecTarget::Jit)` → jitson formula → `MailboxSoaView` write →
  Lance commit. The perturbation cascade becomes the first *formula* the
  scheduler dispatches.
- **main.rs as a real harness.** Replace the probe `println!` with a CLI
  that loads the grid fixture, runs the cascade, prints the scorecard, and
  asserts finite, so that `cargo run` IS the acceptance-gate demo.
- **Optional: no-C++ image.** Drop S3 cloud object-store features + flip
  `jsonwebtoken` to `rust_crypto` (see INSTALLATION.md). Nice-to-have only.
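
For the kanban loop, the backpressure valve is the part worth picturing before any wiring starts. The sketch below uses only the standard library's bounded channel as an analogy: it is not ractor's API, and `Dispatch`, `try_dispatch`, and `bounded_mailbox` are hypothetical names. The `Saturated` variant here plays the role that `MessagingErr::Saturated` plays in the ractor fork.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of offering a new dataset version to a bounded mailbox.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    Queued,
    Saturated, // mailbox full: the scheduler should back off, not block
    Closed,    // receiver is gone
}

/// Create a mailbox that holds at most `capacity` pending versions.
pub fn bounded_mailbox(capacity: usize) -> (SyncSender<u64>, Receiver<u64>) {
    sync_channel(capacity)
}

/// Non-blocking send that maps the channel's errors onto `Dispatch`.
pub fn try_dispatch(tx: &SyncSender<u64>, version: u64) -> Dispatch {
    match tx.try_send(version) {
        Ok(()) => Dispatch::Queued,
        Err(TrySendError::Full(_)) => Dispatch::Saturated,
        Err(TrySendError::Disconnected(_)) => Dispatch::Closed,
    }
}
```

The point of the analogy is that saturation is a value the sender must handle at every call site, which is why all three match sites needed the new arm before the messaging layer compiled.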

---

## Risks / watch-items

- **Two `object_store` versions** appear in the resolved graph (lance vs
  surrealdb transitive). Allowed by cargo (semver-incompatible versions may
  coexist); watch for any public-type mismatch if they ever meet at an API
  boundary.
- **Disk:** the full `target/` is multi-GB; build in one shared target dir, and
  clean sibling `target/`s (build residue, not research data) if headroom
  drops below ~3 GB.
- **edition 2024 (OGAR)** requires the 1.95 toolchain in the active override;
  `rust-toolchain.toml` pins it, so don't run the image build under 1.94.
