diff --git a/.claude/board/AGENT_LOG.md b/.claude/board/AGENT_LOG.md index 5921012e..bbe617d1 100644 --- a/.claude/board/AGENT_LOG.md +++ b/.claude/board/AGENT_LOG.md @@ -1,3 +1,11 @@ +## 2026-06-20 (cont.⁶) — SoA-as-graph domain foundation for q2 (OSINT/Gotham 0x0007 + FMA 0x0008) + +**Main thread (Opus), autoattended.** Operator: "prepare everything so q2 can render nodes/edges + family nodes + HHTL CLAM hop adjacency, neo4j-emulation; OSINT OGAR class is 0x0007; also FMA anatomy 70k as body with bones as stability anchor — rendering is wired in the q2 session, here just the basic domain + SoA-as-graph." Grounded with two parallel Explore agents (q2 wiring + lance-graph ontology/callcenter/polyglot) BEFORE building — consult-don't-guess paid off twice: (a) `graph_render.rs` ALREADY is the Neo4j/Gotham surface (`GraphSnapshot`/`RenderNode`/`RenderEdge`, consumer = q2 cockpit) → reused, not duplicated; (b) `NiblePath::from_guid_prefix` ALREADY is the canonical GUID→path lowering → de-duped symbiont's third copy onto it. + +NEW `contract::soa_graph` (zero-dep, q2-consumable): `project_snapshot(&[NodeRow], &DomainSpec) -> GraphSnapshot` projects the 32-byte head (NodeGuid+EdgeBlock) into the Gotham surface — family nodes (by u24 family), member→family + in-family (identity-low-byte) + out-of-family (family-low-byte) edges. `nearest_anchor` ranks every node to its closest stability-anchor family by the NEW `NiblePath::family_hop_count` (CLAM tree distance = `2·(16−lcp)` on the fixed-depth lowering). `DomainSpec` (domain-agnostic data) + two registered consts: `OSINT_GOTHAM` (classid `0x0007`) + `FMA_ANATOMY` (`0x0008`). Registered both in `BUILTIN_READ_MODES`: `ReadMode::OSINT` (Cognitive/CoarseOnly, hot entity graph) + `ReadMode::FMA` (Compressed/CoarseOnly, cold structural reference). 
All structure is HEAD-ONLY (anchors = `family` ids, not value-slab entity-types) → the whole projection is zero value decode, falsifiably (`projection_is_head_only_zero_value_decode` poisons the slab, asserts invariant). + +**Rendering deferred to q2** (per operator). **Callcenter DataFusion/gremlin POC + the heavier OntologyRegistry ClassView labels = next slices** (named, not built). `cargo test -p lance-graph-contract` **698/698** (7 new: soa_graph ×5, family_hop_count, osint/fma classids); `cargo test --manifest-path crates/symbiont` **12/12** (symbiont `hhtl_path_of` converged onto `from_guid_prefix`, its 2 semantics tests updated 12→16-nibble). clippy `-D warnings` clean. EPIPHANIES `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`. Pushed to main. + ## 2026-06-20 (cont.⁵) — §2.4 key-only neo4j render green (zero value decode, falsifiable) **Main thread (Opus), autoattended.** Operator picked superpower §2.4 from the post-reconciliation menu. Read the canon surface in full first (`canonical_node.rs` NodeGuid/EdgeBlock/NodeRow + ValueTenant carve; `soa_view.rs` the `hhtl_path_at`/`edge_block_at`/`identity_plane_at` deferred key facets; `hhtl.rs` NiblePath) — not a scent-skim. diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index 2f0826ec..b12c1326 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -1,3 +1,20 @@ +## 2026-06-20 — E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE — graph STRUCTURE (domain, family grouping, hierarchy, stability anchors, adjacency) must key off the 32-byte HEAD (classid / family / HHTL path), never the value slab; only then does the whole neo4j/Gotham view — and "FMA bones as stability anchor" — stay zero-value-decode at memory-scan speed + +**Status:** FINDING (perennial; shipped `contract::soa_graph` + `NiblePath::family_hop_count`, 2026-06-20). 
+ +When a domain wants a "type" or "category" to drive graph structure or layout, the System-1 reflex is to read an entity-type field — which lives in the **value slab** (`ValueTenant::EntityType`, bytes 32..512). That quietly defeats the entire `E-GUID-IS-THE-GRAPH` thesis: the moment structure depends on a value field, the "render from the head" scan has to decode the value, and the 8 MiB of slabs go hot. The fix is an ontology discipline, not a trick: **every structural axis is already in the 32-byte head**, so put the category there. + +The head carries three orthogonal structural axes, and the OSINT/Gotham + FMA graph uses all three with zero value decode: +1. **Domain = `classid`** (bytes 0..4, head). OSINT/Gotham = `0x0007`, FMA-anatomy = `0x0008`. The `classid → ReadMode` registry resolves how to read the 128+128 — itself a head-only lookup. +2. **Family grouping = `family`** (u24, bytes 10..13, head). `soa_graph::project_snapshot` emits one family node per distinct `family`; "use family nodes" is a head field, not a join. +3. **Hierarchy + adjacency = the HHTL path** (`classid_lo·HEEL·HIP·TWIG`, head). `NiblePath::family_hop_count` (CLAM tree distance) ranks nearest-anchor with no value read. + +The keystone: **a stability anchor is a FAMILY (head), not an entity-type (value).** "FMA bones as the skeleton the soft tissue hangs off" is expressible as `DomainSpec::anchor_families: &[u32]` — a list of head `family` ids — so `nearest_anchor` computes the bone-distance layout signal entirely from keys. Had we modelled "bone" as a value-slab entity-type, a 70k-node anatomy render would decode 70k×480 B just to find the skeleton. Because anchor-ness is a head field, the same projector serves the Gotham entity graph and the FMA body with one zero-value-decode sweep. 
+ +The general rule: **value slab = content (fingerprints, qualia, energy); head = structure (identity, domain, family, hierarchy, anchors).** If a graph/layout/routing decision is reaching into the value slab, the category is in the wrong register — lift it to classid/family/HHT. Cross-ref: `E-GUID-IS-THE-GRAPH`, `E-ZERO-DECODE-IS-FALSIFIABLE-BY-POISON` (the test that catches a regression here — `soa_graph::tests::projection_is_head_only_zero_value_decode` poisons the slab and asserts the snapshot is invariant), `E-BASIN-IS-A-NODE` (the family/basin tree this projects), OGAR `CLAUDE.md` P0 "the key prerenders nodes with zero value decode." + +--- + ## 2026-06-20 — E-ZERO-DECODE-IS-FALSIFIABLE-BY-POISON — a "we never read region X" claim (zero-value-decode key render, cold-column skip, head-only scan) is not a comment, it is a TEST: poison region X with a sentinel, run the op, assert the output is byte-identical to the un-poisoned run; if it touched X, the bytes diverge **Status:** FINDING (perennial; shipped `symbiont/key_render.rs::tests::render_ignores_value_slab`, 2026-06-20). diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md index 988707a0..32b8abc6 100644 --- a/.claude/board/LATEST_STATE.md +++ b/.claude/board/LATEST_STATE.md @@ -16,6 +16,8 @@ --- +> **2026-06-20 — branch work (`claude/jirak-math-theorems-harvest-rfii13`)** — **SoA-as-graph domain foundation for the OSINT/Gotham + FMA consumers (q2 renders the pixels).** New zero-dep `contract::soa_graph`: `project_snapshot(&[NodeRow], &DomainSpec) -> graph_render::GraphSnapshot` projects the canonical 32-byte head (NodeGuid + EdgeBlock) into the EXISTING Gotham/neo4j surface (`graph_render` — reused, not duplicated) — family nodes (by u24 `family`), member/in-family/out-of-family edges, all **zero value decode**. `nearest_anchor` ranks nodes to their nearest stability-anchor family by the new `NiblePath::family_hop_count` (CLAM tree distance). 
Two domains registered: `OSINT_GOTHAM` (classid **`0x0007`**) + `FMA_ANATOMY` (**`0x0008`**, bones = anchor families) in `BUILTIN_READ_MODES` (`ReadMode::OSINT` Cognitive/CoarseOnly hot; `ReadMode::FMA` Compressed/CoarseOnly cold). Anchor-ness is a HEAD field (`family`), never a value type — so "FMA bones as stability anchor" stays head-only (`E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`). De-duped the GUID→NiblePath lowering: symbiont's `hhtl_path_of` now delegates to canonical `from_guid_prefix` (third copy collapsed). 698 contract + 12 symbiont tests green, clippy clean. **Deferred (named):** q2 rendering (q2 session), Callcenter DataFusion/gremlin POC, OntologyRegistry ClassView labels. Refs: AGENT_LOG 2026-06-20 (cont.⁶), EPIPHANIES `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`. +> > **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET `other_case` transcoded + byte-parity proven (E-CPP-PARITY-5), the fifth leaf.** `UniCharSet` now parses the case-pair id (the token right after the script) into `other_cases: Vec`, applying the load-time clamp (`unicharset.cpp:901`: a value `>= size`, incl. the absent default, folds to the id itself). Exposes `get_other_case` + `dump_other_case`, mirroring `unicharset.h:703` (out-of-range id → `INVALID_UNICHAR_ID` -1). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_other_case` (self-validating oracle, `other_case` mode; 60/112 self, 52 real pairs, e.g. `C`→`c`). Last field cleanly reachable by token-offset; direction/mirror/bbox need the multi-tier parser (next, larger leaf). Additive, zero-dep; +4 contract tests (23 unicharset total), clippy `-D warnings` + fmt clean; reproducible via `examples/unicharset_dump.rs other_case`. Consumed by `tesseract-core::CharSet::get_other_case` (+1 boundary test, 6/6). No Core gap. EPIPHANIES `E-CPP-PARITY-5`. 
> > **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET script table transcoded + byte-parity proven (E-CPP-PARITY-4), the fourth leaf — first to transcode an INTERNING side-table.** `UniCharSet` now parses the per-line script name (the token after the optional bbox/stats CSV), interns it via an `add_script`-equivalent (`unicharset.cpp:1063`, insertion-order dedup) into `scripts: Vec` with `null_script` ("NULL") seeded at sid 0 (the `unichar_insert` set_script, `unicharset.cpp:680`; so `null_sid_ == 0` always), and stores `script_ids: Vec`. Exposes `get_script` / `get_script_table_size` / `script_from_script_id` / `script_of` / `dump_script`, mirroring `unicharset.h:681` (out-of-range → `null_sid_` 0). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_script` (same self-validating oracle, `script` mode; oracle table = `["NULL","Common","Latin"]` confirmed empirically before writing the Rust). Mixed-tier safe (eng id 0 is tier-5 no-CSV, others tier-1 CSV). Additive, zero-dep; +4 contract tests (19 unicharset total), clippy `-D warnings` + fmt clean; reproducible via `examples/unicharset_dump.rs script`. Consumed by `tesseract-core::CharSet::{get_script,script_of}` (+1 boundary test, 5/5). No Core gap. EPIPHANIES `E-CPP-PARITY-4`. Next leaf: the full column tier-parser (unlocks other_case/mirror/direction/bbox). diff --git a/.claude/plans/unified-soa-rubikon-integration-v1.md b/.claude/plans/unified-soa-rubikon-integration-v1.md index 2345186c..fa2b8309 100644 --- a/.claude/plans/unified-soa-rubikon-integration-v1.md +++ b/.claude/plans/unified-soa-rubikon-integration-v1.md @@ -54,6 +54,16 @@ No copies, no per-subsystem mirror (R1 "one SoA never transformed"). 16384 nodes / 32768 edges from 512 KiB of heads, 7680 KiB of value slabs COLD; zero-value-decode proven by the `0xFF`-poison falsifiable probe. `SymbiontBoard` now materialises the contract's `edge_block_at`/`hhtl_path_at` key facets. 
+- ✅ **SoA-as-graph domain foundation for q2 (OSINT/Gotham + FMA)** — + `contract::soa_graph` projects the head into the EXISTING `graph_render` + Gotham/neo4j surface (`GraphSnapshot`): family nodes (u24 `family`), + member/in-family/out-of-family edges, `nearest_anchor` via the new + `NiblePath::family_hop_count` (CLAM hop adjacency). Domains `OSINT_GOTHAM` + (`classid 0x0007`) + `FMA_ANATOMY` (`0x0008`, bones = anchor families) + registered in `BUILTIN_READ_MODES`. All structure head-only (anchor = `family`, + not value type → `E-ANCHOR-IS-A-HEAD-FIELD`). **Rendering deferred to the q2 + session; Callcenter DataFusion/gremlin POC + OntologyRegistry ClassView labels + are the named next slices.** --- diff --git a/crates/lance-graph-contract/src/canonical_node.rs b/crates/lance-graph-contract/src/canonical_node.rs index 2538e428..67751c85 100644 --- a/crates/lance-graph-contract/src/canonical_node.rs +++ b/crates/lance-graph-contract/src/canonical_node.rs @@ -40,6 +40,16 @@ impl NodeGuid { /// Reserved canonical default basin (implicit fallback; no neighborhood grouping). pub const FAMILY_DEFAULT: u32 = 0x00_0000; + /// OGAR class for the **OSINT / Palantir-Gotham** domain — the neo4j-emulation + /// entity graph (people / orgs / systems / events, family-grouped). Resolves + /// to [`ReadMode::OSINT`] (hot `Cognitive` value + `CoarseOnly` adjacency edges). + pub const CLASSID_OSINT: u32 = 0x0000_0007; + /// OGAR class for the **FMA anatomy** domain — the Foundational Model of + /// Anatomy (~70k structural entities, family = body region, bones = stability + /// anchors). Resolves to [`ReadMode::FMA`] (cold `Compressed` reference value + + /// `CoarseOnly` part-of adjacency). + pub const CLASSID_FMA: u32 = 0x0000_0008; + /// Construct from the six canonical groups. `family`/`identity` use their low 3 bytes. /// /// Panics (incl. 
const-eval) when `family` or `identity` exceed 24 bits — the @@ -659,6 +669,25 @@ impl ReadMode { edge_codec: EdgeCodecFlavor::CoarseOnly, }; + /// The **OSINT / Palantir-Gotham** read-mode ([`NodeGuid::CLASSID_OSINT`]): + /// a *hot* entity graph — [`ValueSchema::Cognitive`] (Meta + Qualia + + /// Fingerprint + Energy + Plasticity + EntityType, for live NARS reasoning) + /// over [`EdgeCodecFlavor::CoarseOnly`] adjacency (the 12 in-family + 4 + /// out-of-family slots read literally as the neo4j-emulation edges). + pub const OSINT: ReadMode = ReadMode { + value_schema: ValueSchema::Cognitive, + edge_codec: EdgeCodecFlavor::CoarseOnly, + }; + + /// The **FMA anatomy** read-mode ([`NodeGuid::CLASSID_FMA`]): a *cold* + /// structural reference graph — [`ValueSchema::Compressed`] (Fingerprint + + /// Helix + Turbovec + EntityType; no hot lifecycle columns, it is static + /// reference data) over [`EdgeCodecFlavor::CoarseOnly`] part-of adjacency. + pub const FMA: ReadMode = ReadMode { + value_schema: ValueSchema::Compressed, + edge_codec: EdgeCodecFlavor::CoarseOnly, + }; + /// Both axes are layout-preserving (a preset/flavor re-interprets reserved /// bytes, never a stride change), so adopting any read-mode needs no /// `ENVELOPE_LAYOUT_VERSION` bump. @@ -680,6 +709,11 @@ static BUILTIN_READ_MODES: LazyLock<HashMap<u32, ReadMode>> = LazyLock::new(|| { let mut m = HashMap::new(); // The canon default class materialises the POC-Full slab (see ReadMode::DEFAULT). m.insert(NodeGuid::CLASSID_DEFAULT, ReadMode::DEFAULT); + // OSINT/Gotham (hot entity graph) + FMA anatomy (cold structural reference) — + // the two registered graph domains (see `soa_graph`). Both read edges as + // CoarseOnly adjacency; they differ in the value schema (hot vs cold). 
+ m.insert(NodeGuid::CLASSID_OSINT, ReadMode::OSINT); + m.insert(NodeGuid::CLASSID_FMA, ReadMode::FMA); m }); @@ -1224,4 +1258,30 @@ mod tests { assert!(rm.value_schema.tenant_bytes() <= VALUE_SLAB_LEN); assert!(rm.is_layout_preserving()); } + + #[test] + fn osint_and_fma_classids_resolve_to_their_read_modes() { + // The two registered graph domains (see `soa_graph`): OSINT/Gotham is a + // hot entity graph (Cognitive value), FMA anatomy is a cold structural + // reference (Compressed value); both read edges as CoarseOnly adjacency. + let osint = classid_read_mode(NodeGuid::CLASSID_OSINT); + assert_eq!(osint, ReadMode::OSINT); + assert_eq!(osint.value_schema, ValueSchema::Cognitive); + assert_eq!(osint.edge_codec, EdgeCodecFlavor::CoarseOnly); + + let fma = classid_read_mode(NodeGuid::CLASSID_FMA); + assert_eq!(fma, ReadMode::FMA); + assert_eq!(fma.value_schema, ValueSchema::Compressed); + assert_eq!(fma.edge_codec, EdgeCodecFlavor::CoarseOnly); + + // The classids are the OGAR-confirmed 0x0007 (OSINT) and 0x0008 (FMA); + // both are layout-preserving and carrier-method-consistent. + assert_eq!(NodeGuid::CLASSID_OSINT, 0x0000_0007); + assert_eq!(NodeGuid::CLASSID_FMA, 0x0000_0008); + assert_eq!( + NodeGuid::new(NodeGuid::CLASSID_OSINT, 1, 2, 3, 0xAB, 0xCD).read_mode(), + ReadMode::OSINT + ); + assert!(osint.is_layout_preserving() && fma.is_layout_preserving()); + } } diff --git a/crates/lance-graph-contract/src/hhtl.rs b/crates/lance-graph-contract/src/hhtl.rs index 76cb98d0..5176da88 100644 --- a/crates/lance-graph-contract/src/hhtl.rs +++ b/crates/lance-graph-contract/src/hhtl.rs @@ -312,6 +312,21 @@ impl NiblePath { Self::from_packed(path, MAX_DEPTH) } + /// **Family hop count** — the CLAM tree distance to `other`: the number of + /// edges between the two nodes through their lowest common ancestor in the + /// 16ⁿ tree. `(self.depth − common) + (other.depth − common)` where `common = + /// `[`common_prefix_depth`](NiblePath::common_prefix_depth). 
Identical path = + /// 0, parent/child = 1, siblings = 2; disjoint subtrees = the full ascent + + /// descent. This is the operator's "HHTL CLAM via family-nodes hop count as + /// adjacency" metric — pure key arithmetic, O(depth), **zero value decode**. + /// + /// Symmetric: `a.family_hop_count(b) == b.family_hop_count(a)`. + #[must_use] + pub const fn family_hop_count(self, other: Self) -> u8 { + let common = self.common_prefix_depth(other); + (self.depth - common) + (other.depth - common) + } + /// Is this path a descendant-or-equal of `other`? — the symmetric form of /// [`is_ancestor_of`]. `self.is_descendant_of(other)` is equivalent to /// `other.is_ancestor_of(self)` BUT the form is sometimes more natural at @@ -658,6 +673,26 @@ mod tests { assert_eq!(NiblePath::EMPTY.common_ancestor(a), None); } + #[test] + fn family_hop_count_is_clam_tree_distance() { + let a = NiblePath::root(0x1).child(0x2).child(0x3).child(0x4); + // identical path = 0 hops + assert_eq!(a.family_hop_count(a), 0); + // siblings (share parent (1)(2)(3), differ in leaf) = 2 hops + let sib = NiblePath::root(0x1).child(0x2).child(0x3).child(0x9); + assert_eq!(a.family_hop_count(sib), 2); + assert_eq!(sib.family_hop_count(a), 2); // symmetric + // parent = 1 hop + let parent = NiblePath::root(0x1).child(0x2).child(0x3); + assert_eq!(a.family_hop_count(parent), 1); + // cousins: share (1)(2), differ from depth 3 down → (4-2)+(4-2) = 4 + let cousin = NiblePath::root(0x1).child(0x2).child(0x7).child(0x8); + assert_eq!(a.family_hop_count(cousin), 4); + // disjoint basins: no common prefix → full ascent + descent + let other = NiblePath::root(0xF).child(0xE); + assert_eq!(a.family_hop_count(other), 4 + 2); + } + // ── NiblePath::prefix — single-shot ancestor view ───────────────────────── #[test] diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs index 87d61a4a..2e4bd24f 100644 --- a/crates/lance-graph-contract/src/lib.rs +++ 
b/crates/lance-graph-contract/src/lib.rs @@ -99,6 +99,7 @@ pub mod sensorium; pub mod sigma_propagation; pub mod sla; pub mod soa_envelope; +pub mod soa_graph; pub mod soa_view; pub mod splat; pub mod tax; @@ -123,5 +124,8 @@ pub use episodic_edges::{EdgeRef, EpisodicEdges64}; pub use head2head::{CompetitionOutcome, Head2Head, WinnerCriterion}; pub use kanban::{ExecTarget, KanbanColumn, KanbanMove, RubiconTransitionError}; pub use scheduler::{DatasetVersion, NextPhaseScheduler, VersionScheduler}; +pub use soa_graph::{ + nearest_anchor, project_snapshot, AnchorHop, DomainSpec, FMA_ANATOMY, OSINT_GOTHAM, +}; pub use soa_view::{MailboxSoaOwner, MailboxSoaView}; pub use view_angle::ViewAngle; diff --git a/crates/lance-graph-contract/src/soa_graph.rs b/crates/lance-graph-contract/src/soa_graph.rs new file mode 100644 index 00000000..fd7c1102 --- /dev/null +++ b/crates/lance-graph-contract/src/soa_graph.rs @@ -0,0 +1,425 @@ +//! `soa_graph` — project the canonical SoA head into the Gotham graph surface. +//! +//! The bridge from the **canonical node head** (128-bit [`NodeGuid`] + 128-bit +//! [`EdgeBlock`], `key(16)+edges(16)`, bytes 0..32 of a [`NodeRow`]) to the +//! existing [`graph_render`](crate::graph_render) Neo4j/Palantir-Gotham surface +//! ([`GraphSnapshot`] / [`RenderNode`] / [`RenderEdge`]). **Zero value decode:** +//! every node, edge, family, and anchor here is read from the 32-byte head — +//! the 480-byte value slab is never touched (`E-GUID-IS-THE-GRAPH`; the same +//! falsifiable invariant `symbiont::key_render` proves by 0xFF-poisoning). +//! +//! **Rendering lives in q2.** This module produces the *structural* snapshot; +//! the q2 `cockpit-server` cockpit (vis-network / Neo4j-Browser-style UI) lays +//! it out and draws it. What lance-graph owns is "the basic domain + SoA as a +//! graph"; q2 owns the pixels. +//! +//! ## Two head axes, two graph roles +//! +//! The canonical key carries two orthogonal grouping axes, both in the head: +//! +//! 
- **family** (`u24`, bytes 10..13) — the *basin leaf*. [`project_snapshot`] +//! groups member nodes by `family` and emits one **family node** per distinct +//! family (the "use family nodes" requirement). A family node is an **anchor** +//! when its id is in [`DomainSpec::anchor_families`] (FMA *bones* / OSINT *key +//! entities* — the stability anchors layout hangs off). +//! - **HHTL path** (`classid_lo·HEEL·HIP·TWIG`, via +//! [`NiblePath::from_guid_prefix`]) — the *Abstammung tree*. [`nearest_anchor`] +//! ranks every node against the anchor families by +//! [`NiblePath::family_hop_count`] (CLAM tree distance) — the "HHTL CLAM via +//! family-nodes hop count as adjacency" metric. +//! +//! ## Edge resolution (the `EdgeBlock` reading) +//! +//! `EdgeCodecFlavor::CoarseOnly` (the read-mode both registered domains use): +//! each non-zero edge byte is a one-byte basin-local neighbour index. +//! - `in_family[k]` → the same-family member whose `identity & 0xFF` equals the +//! byte (an intra-basin adjacency edge, [`DomainSpec::in_family_edge`]). +//! - `out_family[k]` → the family node whose `family & 0xFF` equals the byte (a +//! cross-basin link to another family, [`DomainSpec::out_family_edge`]). +//! +//! Unresolved bytes are skipped (a dangling 1-byte index, never a wrong edge). +//! +//! Two domains ship registered: [`OSINT_GOTHAM`] (classid +//! [`NodeGuid::CLASSID_OSINT`]) and [`FMA_ANATOMY`] (classid +//! [`NodeGuid::CLASSID_FMA`]). New domains are just another `DomainSpec` — +//! the projector is domain-agnostic. + +use crate::canonical_node::{NodeGuid, NodeRow}; +use crate::graph_render::{GraphSnapshot, RenderEdge, RenderNode}; +use crate::hhtl::NiblePath; +use std::collections::HashMap; + +/// A graph domain: how a class of SoA nodes is labelled and which families are +/// stability anchors. Domain-agnostic data (no behaviour) — the projector reads +/// it. `&'static` so domains can be `const` (see [`OSINT_GOTHAM`], [`FMA_ANATOMY`]). 
+#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct DomainSpec { + /// OGAR classid this domain occupies (the GUID routing prefix). + pub classid: u32, + /// Human name, used as the member node `kind` (e.g. "OSINT/Gotham"). + pub name: &'static str, + /// Families that are **stability anchors** (FMA bones / OSINT key entities). + /// Family nodes in this set render as `kind = "Anchor"` and are the targets + /// [`nearest_anchor`] measures hop distance to. + pub anchor_families: &'static [u32], + /// Edge label for intra-family adjacency (`in_family` slots). + pub in_family_edge: &'static str, + /// Edge label for cross-family links (`out_family` slots). + pub out_family_edge: &'static str, + /// Edge label for the member → family-node containment edge. + pub member_edge: &'static str, +} + +/// The **OSINT / Palantir-Gotham** domain (classid [`NodeGuid::CLASSID_OSINT`]): +/// a neo4j-emulation entity graph. Anchor families are caller-supplied (the key +/// entities of an investigation); the default declares none. +pub const OSINT_GOTHAM: DomainSpec = DomainSpec { + classid: NodeGuid::CLASSID_OSINT, + name: "OSINT/Gotham", + anchor_families: &[], + in_family_edge: "linked", + out_family_edge: "references", + member_edge: "member-of", +}; + +/// The **FMA anatomy** domain (classid [`NodeGuid::CLASSID_FMA`]): ~70k +/// structural entities, family = body region, `out_family` = part-of. Anchor +/// families are the *bones* (the skeleton the soft tissue hangs off); the +/// default declares none — a caller supplies the bone families. +pub const FMA_ANATOMY: DomainSpec = DomainSpec { + classid: NodeGuid::CLASSID_FMA, + name: "FMA-Anatomy", + anchor_families: &[], + in_family_edge: "adjacent-to", + out_family_edge: "part-of", + member_edge: "part-of", +}; + +/// The synthetic id of a family node in the snapshot (`"family:RRGGBB"` hex). 
+#[inline] +fn family_node_id(family: u32) -> String { + format!("family:{family:06x}") +} + +/// HHTL routing path of a GUID, via the canonical [`NiblePath::from_guid_prefix`] +/// lowering (`classid_lo·HEEL·HIP·TWIG`). Falls back to [`NiblePath::EMPTY`] for +/// the (canon-reserved) case of a non-zero high `classid` u16. +#[inline] +fn hhtl_path(guid: &NodeGuid) -> NiblePath { + NiblePath::from_guid_prefix(guid).unwrap_or(NiblePath::EMPTY) +} + +/// Project a board-set into a [`GraphSnapshot`] for the Gotham/neo4j surface — +/// member nodes + family nodes + (member→family, in-family, out-of-family) +/// edges. Touches ONLY the 32-byte head of each row (`key` + `edges`); never the +/// value slab. +pub fn project_snapshot(rows: &[NodeRow], domain: &DomainSpec) -> GraphSnapshot { + // family → its members as (identity_low_byte, guid) + let mut by_family: HashMap<u32, Vec<(u8, NodeGuid)>> = HashMap::new(); + // family_low_byte → a family id (first seen) for out-of-family resolution + let mut family_by_low: HashMap<u8, u32> = HashMap::new(); + for row in rows { + let g = row.key; + let fam = g.family(); + by_family + .entry(fam) + .or_default() + .push(((g.identity() & 0xFF) as u8, g)); + family_by_low.entry((fam & 0xFF) as u8).or_insert(fam); + } + + let mut nodes: Vec<RenderNode> = Vec::with_capacity(rows.len() + by_family.len()); + let mut edges: Vec<RenderEdge> = Vec::new(); + + // One family node per distinct family (the "use family nodes" surface). + // Sorted for deterministic output regardless of HashMap iteration order. 
+ let mut families: Vec<(&u32, &Vec<(u8, NodeGuid)>)> = by_family.iter().collect(); + families.sort_by_key(|(fam, _)| **fam); + for (&fam, members) in families { + let is_anchor = domain.anchor_families.contains(&fam); + nodes.push(RenderNode { + id: family_node_id(fam), + label: format!("{} family {fam:06x}", domain.name), + kind: if is_anchor { "Anchor" } else { "Family" }.to_string(), + confidence: 1.0, + props: vec![ + ("family".to_string(), format!("{fam:06x}")), + ("members".to_string(), members.len().to_string()), + ("anchor".to_string(), is_anchor.to_string()), + ], + }); + } + + // Member nodes + their edges (all head-only). + for row in rows { + let g = row.key; + let fam = g.family(); + nodes.push(RenderNode { + id: g.to_string(), + label: format!("{:06x}", g.identity()), + kind: domain.name.to_string(), + confidence: 1.0, + props: vec![ + ("classid".to_string(), format!("{:08x}", g.classid())), + ("family".to_string(), format!("{fam:06x}")), + ("hhtl_depth".to_string(), hhtl_path(&g).depth().to_string()), + ], + }); + // member → family containment + edges.push(RenderEdge { + source: g.to_string(), + target: family_node_id(fam), + label: domain.member_edge.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + }); + let eb = row.edges; + // in-family adjacency: byte = same-family member's identity low byte + if let Some(members) = by_family.get(&fam) { + for &b in eb.in_family.iter().filter(|&&b| b != 0) { + if let Some(&(_, target)) = + members.iter().find(|(lb, t)| *lb == b && *t != g) + { + edges.push(RenderEdge { + source: g.to_string(), + target: target.to_string(), + label: domain.in_family_edge.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + }); + } + } + } + // out-of-family links: byte = target family's low byte → its family node + for &b in eb.out_family.iter().filter(|&&b| b != 0) { + if let Some(&target_fam) = family_by_low.get(&b) { + if target_fam != fam { + edges.push(RenderEdge { + source: 
g.to_string(), + target: family_node_id(target_fam), + label: domain.out_family_edge.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + }); + } + } + } + } + + GraphSnapshot { + nodes, + edges, + inferences: Vec::new(), + contradictions: Vec::new(), + timestamp: 0, + } +} + +/// A node's CLAM hop distance to its nearest stability anchor. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct AnchorHop { + /// The node measured. + pub node: NodeGuid, + /// The family id of the nearest anchor (`u32::MAX` if the domain declares + /// none, or none is reachable). + pub anchor_family: u32, + /// HHTL CLAM hop count to that anchor's representative path (`u8::MAX` when + /// no anchor exists). + pub hops: u8, +} + +/// For each node, the nearest stability-anchor family by **HHTL CLAM hop count** +/// ([`NiblePath::family_hop_count`] over the GUIDs' HHTL paths) — the "bones as +/// stability anchor" layout signal: each node hangs off its closest anchor, and +/// the hop count is the adjacency weight the q2 layout uses (anchors fixed, soft +/// tissue positioned by distance). Anchors are the families in +/// [`DomainSpec::anchor_families`]; their representative path is the first member +/// seen. Pure head arithmetic, zero value decode. O(rows × anchors). +/// +/// The canonical lowering is fixed-depth-16, so `hops = 2·(16 − lcp)` (`lcp` = +/// shared-prefix nibble count) — a monotone prefix distance, not a variable-depth +/// tree walk: smaller hops ⇔ deeper shared `classid_lo·HEEL·HIP·TWIG` prefix. +/// Ranking (nearest anchor) is what callers use; the absolute value is even. +pub fn nearest_anchor(rows: &[NodeRow], domain: &DomainSpec) -> Vec<AnchorHop> { + // Representative HHTL path per anchor family (first member encountered). 
+ let mut anchor_paths: Vec<(u32, NiblePath)> = Vec::new(); + for row in rows { + let fam = row.key.family(); + if domain.anchor_families.contains(&fam) + && !anchor_paths.iter().any(|(f, _)| *f == fam) + { + anchor_paths.push((fam, hhtl_path(&row.key))); + } + } + rows.iter() + .map(|row| { + let g = row.key; + let p = hhtl_path(&g); + let mut anchor_family = u32::MAX; + let mut hops = u8::MAX; + for &(fam, ap) in &anchor_paths { + let h = p.family_hop_count(ap); + if h < hops { + hops = h; + anchor_family = fam; + } + } + AnchorHop { + node: g, + anchor_family, + hops, + } + }) + .collect() +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::canonical_node::EdgeBlock; + + /// Build a node in a domain: `classid` from the domain, hierarchy in the + /// `hht` tiers `(heel, hip, twig)`, family = basin leaf, identity = leaf. + /// Edges optional. + fn node( + domain: &DomainSpec, + hht: (u16, u16, u16), + family: u32, + identity: u32, + in_fam: &[u8], + out_fam: &[u8], + ) -> NodeRow { + let mut edges = EdgeBlock::default(); + for (i, &b) in in_fam.iter().enumerate().take(12) { + edges.in_family[i] = b; + } + for (i, &b) in out_fam.iter().enumerate().take(4) { + edges.out_family[i] = b; + } + NodeRow { + key: NodeGuid::new(domain.classid, hht.0, hht.1, hht.2, family, identity), + edges, + value: [0u8; 480], + } + } + + #[test] + fn project_emits_family_nodes_and_member_edges() { + // Two families (0xA, 0xB), two members each. Member 1 in family A points + // at member 2 (identity low byte 2) via in_family; member 1 also links + // out to family B (low byte 0xB) via out_family. 
+ let rows = [ + node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[2], &[0xB]), + node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 2, &[], &[]), + node(&OSINT_GOTHAM, (2, 0, 0), 0xB, 1, &[], &[]), + node(&OSINT_GOTHAM, (2, 0, 0), 0xB, 2, &[], &[]), + ]; + let snap = project_snapshot(&rows, &OSINT_GOTHAM); + // 4 member nodes + 2 family nodes + assert_eq!(snap.nodes.len(), 6); + let family_nodes = snap.nodes.iter().filter(|n| n.kind == "Family").count(); + assert_eq!(family_nodes, 2); + // every member has a member-of edge → 4 of them + let member_of = snap.edges.iter().filter(|e| e.label == "member-of").count(); + assert_eq!(member_of, 4); + // the in-family adjacency edge member1 → member2 + assert!(snap.edges.iter().any(|e| e.label == "linked" + && e.target.ends_with("000a000002"))); + // the out-of-family link member1 → family:00000b + assert!(snap + .edges + .iter() + .any(|e| e.label == "references" && e.target == "family:00000b")); + } + + #[test] + fn anchor_families_render_as_anchor_kind() { + // FMA: family 0x01 is a "bone" anchor; 0x02 is soft tissue. + let fma_bones = DomainSpec { + anchor_families: &[0x01], + ..FMA_ANATOMY + }; + let rows = [ + node(&fma_bones, (0x1, 0, 0), 0x01, 1, &[], &[]), // bone + node(&fma_bones, (0x2, 0, 0), 0x02, 1, &[], &[]), // tissue + ]; + let snap = project_snapshot(&rows, &fma_bones); + let anchor = snap.nodes.iter().find(|n| n.id == "family:000001").unwrap(); + assert_eq!(anchor.kind, "Anchor"); + let tissue = snap.nodes.iter().find(|n| n.id == "family:000002").unwrap(); + assert_eq!(tissue.kind, "Family"); + } + + #[test] + fn nearest_anchor_ranks_by_hhtl_hop_count() { + // The canonical lowering is fixed-depth-16, so family_hop_count = 2·(16 − + // lcp): the deeper the shared prefix, the fewer hops. Anchor family 0x01 + // sits at heel=0x1000. Same path ⇒ 0; a node differing in the last HEEL + // nibble (lcp=7) ⇒ 18; a node differing in the first HEEL nibble (lcp=4) + // ⇒ 24. What matters is the ordering (closer prefix ⇒ smaller hops). 
+ let fma_bones = DomainSpec { + anchor_families: &[0x01], + ..FMA_ANATOMY + }; + let rows = [ + node(&fma_bones, (0x1000, 0, 0), 0x01, 1, &[], &[]), // the anchor itself + node(&fma_bones, (0x1000, 0, 0), 0x02, 1, &[], &[]), // same HHT path + node(&fma_bones, (0x1009, 0, 0), 0x03, 1, &[], &[]), // diverges late (lcp 7) + node(&fma_bones, (0xF000, 0, 0), 0x04, 1, &[], &[]), // diverges early (lcp 4) + ]; + let hops = nearest_anchor(&rows, &fma_bones); + assert_eq!(hops.len(), 4); + assert_eq!(hops[0].hops, 0); + assert_eq!(hops[0].anchor_family, 0x01); + assert_eq!(hops[1].hops, 0, "same HHT path as the anchor ⇒ 0 hops"); + assert!( + hops[1].hops < hops[2].hops && hops[2].hops < hops[3].hops, + "monotone: closer shared prefix ⇒ fewer hops ({} < {} < {})", + hops[1].hops, + hops[2].hops, + hops[3].hops + ); + // The exact fixed-depth-16 values: 2·(16−7)=18 and 2·(16−4)=24. + assert_eq!(hops[2].hops, 18); + assert_eq!(hops[3].hops, 24); + } + + #[test] + fn nearest_anchor_with_no_anchors_is_unreachable() { + // Default OSINT declares no anchor families ⇒ every node is unreachable. + let rows = [node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[], &[])]; + let hops = nearest_anchor(&rows, &OSINT_GOTHAM); + assert_eq!(hops[0].hops, u8::MAX); + assert_eq!(hops[0].anchor_family, u32::MAX); + } + + #[test] + fn projection_is_head_only_zero_value_decode() { + // Poison the value slab; the snapshot must be byte-identical (the + // E-GUID-IS-THE-GRAPH / zero-value-decode invariant, falsifiable). + let clean = [ + node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[2], &[0xB]), + node(&OSINT_GOTHAM, (2, 0, 0), 0xB, 2, &[], &[]), + ]; + let mut poisoned = clean; + for row in &mut poisoned { + row.value = [0xFFu8; 480]; + } + let a = project_snapshot(&clean, &OSINT_GOTHAM); + let b = project_snapshot(&poisoned, &OSINT_GOTHAM); + // GraphSnapshot isn't PartialEq; compare the structural projection. 
+ let key = |s: &GraphSnapshot| { + ( + s.nodes.iter().map(|n| (n.id.clone(), n.kind.clone())).collect::<Vec<_>>(), + s.edges + .iter() + .map(|e| (e.source.clone(), e.target.clone(), e.label.clone())) + .collect::<Vec<_>>(), + ) + }; + assert_eq!(key(&a), key(&b)); + } +} diff --git a/crates/symbiont/src/key_render.rs b/crates/symbiont/src/key_render.rs index ddc1a78b..7f05b765 100644 --- a/crates/symbiont/src/key_render.rs +++ b/crates/symbiont/src/key_render.rs @@ -23,29 +23,15 @@ use lance_graph_contract::canonical_node::NodeRow; use lance_graph_contract::hhtl::NiblePath; use lance_graph_contract::NodeGuid; -/// Lower a GUID's 3×4 HHT cascade — `HEEL·HIP·TWIG`, 3 tiers × 4 nibbles = 12 -/// nibbles, root-first — to a [`NiblePath`] (the radix-trie / CLAM cluster -/// address). `classid` is the routing PREFIX (codebook selector, resolved -/// separately by longest-prefix); the HHTL *path* proper is the 12 HHT nibbles -/// (OGAR canon "3×4 PATH — uniform"). Pure key arithmetic, zero value decode. +/// Lower a GUID to its HHTL routing path via the canonical +/// [`NiblePath::from_guid_prefix`] (`classid_lo·HEEL·HIP·TWIG`, 16 nibbles, +/// root-first) — the radix-trie / CLAM cluster address. Falls back to +/// [`NiblePath::EMPTY`] only for the canon-reserved non-zero high-`classid` case. +/// Thin wrapper kept for callers in this crate; the lowering itself is the +/// contract's single source of truth (no third copy). Zero value decode. #[inline] pub fn hhtl_path_of(guid: &NodeGuid) -> NiblePath { - let tiers = [guid.heel(), guid.hip(), guid.twig()]; - let mut p = NiblePath::EMPTY; - let mut first = true; - for tier in tiers { - // 4 nibbles per u16 tier, most-significant first (root-first). 
- for shift in [12u32, 8, 4, 0] { - let nib = ((tier >> shift) & 0xF) as u8; - p = if first { - first = false; - NiblePath::root(nib) - } else { - p.child(nib) - }; - } - } - p + NiblePath::from_guid_prefix(guid).unwrap_or(NiblePath::EMPTY) } /// One rendered node, derived from the 32-byte head ONLY. @@ -165,25 +151,27 @@ mod tests { } #[test] - fn hhtl_path_of_bootstrap_is_depth_12_all_zero() { - // A bootstrap GUID (HEEL=HIP=TWIG=0) lowers to a 12-nibble all-zero path - // (root basin 0, descending 0 each level) — every HHT tier consulted, - // none discriminating yet (the zero-fallback ladder, in the path axis). + fn hhtl_path_of_bootstrap_is_depth_16_all_zero() { + // The canonical lowering packs 16 nibbles (classid_lo·HEEL·HIP·TWIG). A + // bootstrap GUID (classid=HEEL=HIP=TWIG=0) lowers to a 16-nibble all-zero + // path — every tier consulted, none discriminating yet (zero-fallback). let g = NodeGuid::local(42); let p = hhtl_path_of(&g); - assert_eq!(p.depth(), 12); + assert_eq!(p.depth(), 16); } #[test] - fn hhtl_path_of_uses_hht_tiers_not_classid_or_identity() { - // classid is the routing prefix (codebook selector), identity is the - // leaf — neither is part of the HHT path. Two GUIDs differing ONLY in - // classid + identity share the same 12-nibble HHTL path; differing in a - // HHT tier changes it. - let a = NodeGuid::new(0xAAAA_AAAA, 0x1234, 0x5678, 0x9ABC, 0, 0x00_0001); - let b = NodeGuid::new(0xBBBB_BBBB, 0x1234, 0x5678, 0x9ABC, 0, 0x00_0002); - let c = NodeGuid::new(0xAAAA_AAAA, 0x0234, 0x5678, 0x9ABC, 0, 0x00_0001); + fn hhtl_path_of_includes_classid_lo_and_hht_not_identity() { + // Canonical lowering: classid_lo is the routing PREFIX (top 4 nibbles), + // HEEL/HIP/TWIG are the cascade, identity (leaf) is NOT in the path. + // a vs b differ ONLY in identity ⇒ same path; a vs c differ in classid_lo + // ⇒ different path; a vs d differ in a HHT tier ⇒ different path. 
+ let a = NodeGuid::new(0x0000_0007, 0x1234, 0x5678, 0x9ABC, 0xAB, 0x00_0001); + let b = NodeGuid::new(0x0000_0007, 0x1234, 0x5678, 0x9ABC, 0xAB, 0x00_0002); + let c = NodeGuid::new(0x0000_0008, 0x1234, 0x5678, 0x9ABC, 0xAB, 0x00_0001); + let d = NodeGuid::new(0x0000_0007, 0x0234, 0x5678, 0x9ABC, 0xAB, 0x00_0001); assert_eq!(hhtl_path_of(&a), hhtl_path_of(&b)); assert_ne!(hhtl_path_of(&a), hhtl_path_of(&c)); + assert_ne!(hhtl_path_of(&a), hhtl_path_of(&d)); } }
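
Sketch (not part of the diff): the `2·(16 − lcp)` hop arithmetic that `nearest_anchor_ranks_by_hhtl_hop_count` asserts, written as standalone functions over plain 16-nibble arrays. `lower_sketch`, `lcp_sketch`, and `hop_count_sketch` are hypothetical names used only for illustration; the real lowering is `NiblePath::from_guid_prefix` and the real distance is `NiblePath::family_hop_count`, whose internals are not shown in this diff. The sketch assumes the path layout described in the doc comment (`classid_lo·HEEL·HIP·TWIG`, four nibbles per tier, most-significant nibble first).

```rust
/// Hypothetical stand-in for the canonical lowering: pack four u16 tiers
/// (classid_lo, HEEL, HIP, TWIG) into 16 nibbles, root-first.
/// ASSUMPTION: mirrors the layout named in the doc comment, not the real code.
fn lower_sketch(classid_lo: u16, heel: u16, hip: u16, twig: u16) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (t, tier) in [classid_lo, heel, hip, twig].into_iter().enumerate() {
        // Four nibbles per tier, most-significant first.
        for (i, shift) in [12u32, 8, 4, 0].into_iter().enumerate() {
            out[t * 4 + i] = ((tier >> shift) & 0xF) as u8;
        }
    }
    out
}

/// Length of the longest common prefix of two fixed-depth paths, in nibbles.
fn lcp_sketch(a: &[u8; 16], b: &[u8; 16]) -> u8 {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count() as u8
}

/// Tree distance between two depth-16 leaves: climb from one leaf to the
/// deepest shared ancestor, then descend to the other, i.e. 2 * (16 - lcp).
fn hop_count_sketch(a: &[u8; 16], b: &[u8; 16]) -> u8 {
    2 * (16 - lcp_sketch(a, b))
}
```

With the test's values (classid low bits `0x0008`, anchor at `heel = 0x1000`), a node at `heel = 0x1009` shares 7 nibbles with the anchor and a node at `heel = 0xF000` shares 4, which is where the asserted 18 and 24 come from. Both paths always have depth 16, so the distance depends only on the shared prefix length.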