diff --git a/.claude/board/AGENT_LOG.md b/.claude/board/AGENT_LOG.md
index 411c85a31..cb0f7630a 100644
--- a/.claude/board/AGENT_LOG.md
+++ b/.claude/board/AGENT_LOG.md
@@ -1,6 +1,18 @@
+## 2026-06-09 — plan addendum: left-prefix parsing confirmed + D-PG-7 deterministic foveated tree-builder
+
+**Main thread (Fable).** User direction validated against identity.rs octets: GUID left half (class+tree) is order-preserving plain bytes ⇒ Cypher label/subtree patterns = byte-prefix predicates on FixedSizeBinary(16) via Lance zone-maps; similarity leg (RaBitQ/CAM-PQ/Binary16K) rides the same row. Two caveats recorded (namespace-first ordering; ≤4-nibble GUID prefix). New M6 + D-PG-7: NiblePath assignment computable by deterministic hierarchical partition ("deterministic Louvain" → concretely ndarray CLAM pole-split, 16-way, capacity-bounded ⇒ foveation), with the iron requirement APPEND-STABLE (bootstrap once; minted paths never move; layout_version gates changes). Query-time twin noted (cascade / bgz-tensor HHTL cache). Plan §8 + STATUS_BOARD row. Commit: this.
+
+## 2026-06-09 — polyglot query-membrane research: 2 sweeps + spot-verification → plan v1 (D-PG-1..6)
+
+**Main thread (Fable 5 1M) + 2 Explore sweeps (Sonnet).** Researched "parse mailboxes via SurrealDB's AST adapter as a normal cold path; ontology = Christmas tree, decorations materialize at HHTL addresses" + user-added scope (Node Container answers DataFusion UDF + SurrealQL DDL AST + Neo4j/Cypher). Verified at file:line: fork keys storekey-encoded ORDER-PRESERVING (arrays incl.), record-ranges lower to `stream_keys_vals(beg..end)` (pipeline.rs:223) → HHTL subtree = one native range scan under `addr64 = path << 4·(16−depth)`; kv-lance FULLY in-tree (get :646 / keys :824 / scan :848, MVCC+timeline, ~6k test lines) — `surreal_container` BLOCKED(C/D) stale; typed `surrealdb-ast` crate + C16b DDL builders (`new_for_ddl`→`ToSql`, DB-free; consumer op-surreal-ast/nexgen) = the AST-adapter surface; frontend slot = ArenaIR strategy registry (mod.rs:57-60). **Agent-claim correction:** sweep claimed `MailboxSoA` impls `SoaEnvelope` — spot-grep disproved (only TestEnvelope; identity N3 LIVE → D-PG-2). Ruling respected (LanceDB leads; SurrealDB = view). Deliverable: `.claude/plans/polyglot-container-query-membrane-v1.md` + INTEGRATION_PLANS prepend + STATUS_BOARD D-PG-1..6 (all Queued). No code. No epiphany entries (council gate available on request). Commit: this.
+
+## 2026-06-09 — D-IDENTITY-2 Phase B first brick: frugal north-star mint (dedup + bijection) landed
+
+**Main thread (Fable).** Implemented moves 1+2+3 of the identity plan's Phase B seam in `lance-graph-ontology` (registry.rs +242, namespace.rs, bridge.rs): (1) dedup-by-URI mint — a canonical class URI already in the dictionary REUSES its global `entity_type` (new row, new bridge/namespace, same template id); fresh mints stay monotone append-order with gaps, u16-overflow-guarded. (2) `entity_type↔NiblePath` bijection pair table + `register_class_path` (both-way conflict-rejecting, EMPTY-sentinel guard, idempotent same-pair) + `niblepath_of`/`entity_type_of`/`rows_with_entity_type`. (3) round-trip tests. +5 tests (dedup-shares-id, monotone-with-gaps, checksum-reappend-keeps-id, bijection-round-trips, bijection-conflicts-rejected); 14 registry tests green; crate suites green. 3 stale-doc fixes (namespace.rs "dense within the namespace" → GLOBAL; bridge.rs "dense index" → compare-only). My 3 files clippy/fmt-clean; pre-existing crate-wide `-D warnings` (oxrdf/doc-overindent in untouched files) + fmt drift (54 files) left as-is per surgical-diff discipline. Board: STATUS_BOARD identity section (D-IDENTITY-1..4), TD-PAIRTABLE-1, plan LANDED note. Deferred: move 4 (gate positional helper, D-IDENTITY-3).
+
 ## 2026-06-09 — D-IDENTITY Phase B: global entity_type ratified + mint trace correction

-**Main thread (Opus→Fable mid-session).** Decision-gate ratified `entity_type` = GLOBAL shared template id (DECISION-3). Pre-change trace overturned two beliefs: (a) `namespace.rs:12` "dense within the namespace" is STALE — live mint `registry.rs:476` is already global append-order; (b) registry is NOT template-deduped (own claim, corrected in-place in the plan). Blast radius of global/sparse ids traced benign (~16 readers, none dense-index). Synthesis: bijection IS the dedup — one `NiblePath ↔ entity_type` pair table = template registry + dedup index + bijection witness. Plan: DECISION-3 + CORRECTION + refinement. Epiphany: E-MINT-TRACE-1; E-OGAR-NORTHSTAR-1 Status updated. Rides in #481. Next: implement first brick (pair-table mint + round-trip test) in lance-graph-ontology.
+**Main thread (Opus→Fable mid-session).** Decision-gate ratified `entity_type` = GLOBAL shared template id (DECISION-3). Pre-change trace overturned two beliefs: (a) `namespace.rs:12` "dense within the namespace" is STALE — live mint `registry.rs:476` is already global append-order; (b) registry is NOT template-deduped (own claim, corrected in-place in the plan). Blast radius of global/sparse ids traced benign (~16 readers, none dense-index). Synthesis: bijection IS the dedup — one `NiblePath ↔ entity_type` pair table = template registry + dedup index + bijection witness. Plan: DECISION-3 + CORRECTION + refinement. Epiphany: E-OGAR-NORTHSTAR-1 Status updated. Rides in #481. Next: implement first brick (pair-table mint + round-trip test) in lance-graph-ontology.
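The D-IDENTITY-2 entry compresses the frugal mint into one sentence. The Rust sketch below spells out the same mechanism. It is a minimal model under stated assumptions, not the `lance-graph-ontology` code:

- `MiniRegistry`, `PathKey` and `MintError` are hypothetical.
- `depth == 0` stands in for the EMPTY sentinel.
- The fresh-id rule `rows.len() + 1` follows the global append-order mint at `registry.rs:476` named in the Phase B entry.
- Only the method names `register_class_path`, `niblepath_of`, `entity_type_of` and `rows_with_entity_type` come from the log; their signatures here are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a NiblePath: packed nibbles plus depth.
/// `depth == 0` plays the EMPTY sentinel in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PathKey {
    pub path: u64,
    pub depth: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MintError {
    Overflow,
    EmptyPath,
    Conflict,
}

/// Hypothetical registry model: append-only rows, global template ids.
#[derive(Default)]
pub struct MiniRegistry {
    rows: Vec<(String, u16)>,            // (class URI, entity_type), one per append
    by_uri: HashMap<String, u16>,        // dedup index: URI -> template id
    path_by_type: HashMap<u16, PathKey>, // bijection, forward
    type_by_path: HashMap<PathKey, u16>, // bijection, reverse
}

impl MiniRegistry {
    /// Dedup-by-URI mint. A known URI appends a new row that reuses its id;
    /// a fresh URI gets rows.len() + 1, so ids are monotone but can skip values.
    pub fn mint(&mut self, uri: &str) -> Result<u16, MintError> {
        if let Some(&id) = self.by_uri.get(uri) {
            self.rows.push((uri.to_string(), id));
            return Ok(id);
        }
        let next = self.rows.len() + 1;
        if next > usize::from(u16::MAX) {
            return Err(MintError::Overflow);
        }
        let id = next as u16;
        self.rows.push((uri.to_string(), id));
        self.by_uri.insert(uri.to_string(), id);
        Ok(id)
    }

    /// Record entity_type <-> path. Re-registering the same pair is a no-op;
    /// the EMPTY path and any pair that conflicts in either direction are rejected.
    pub fn register_class_path(&mut self, et: u16, p: PathKey) -> Result<(), MintError> {
        if p.depth == 0 {
            return Err(MintError::EmptyPath);
        }
        let known_path = self.path_by_type.get(&et).copied();
        let known_type = self.type_by_path.get(&p).copied();
        match (known_path, known_type) {
            (Some(q), Some(t)) if q == p && t == et => Ok(()),
            (None, None) => {
                self.path_by_type.insert(et, p);
                self.type_by_path.insert(p, et);
                Ok(())
            }
            _ => Err(MintError::Conflict),
        }
    }

    pub fn niblepath_of(&self, et: u16) -> Option<PathKey> {
        self.path_by_type.get(&et).copied()
    }

    pub fn entity_type_of(&self, p: PathKey) -> Option<u16> {
        self.type_by_path.get(&p).copied()
    }

    /// Row indices sharing one template id (several rows can carry it).
    pub fn rows_with_entity_type(&self, et: u16) -> Vec<usize> {
        self.rows
            .iter()
            .enumerate()
            .filter(|(_, row)| row.1 == et)
            .map(|(i, _)| i)
            .collect()
    }
}
```

A repeated URI reuses its id while the fresh-id counter keeps advancing with the row count. That is what produces the "monotone with gaps" sequence the log's tests name: minting URIs A, B, A, C yields ids 1, 2, 1, 4.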
 ## 2026-06-09 — D-IDENTITY decisions: OGAR mirror (ratified) + north-star template model

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 25755bf99..67895390d 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -1,40 +1,7 @@
-## 2026-06-09 — E-MINT-TRACE-1 — the live mint is already global (registry.rs:476); the "namespace-local" doc is stale; dedup is net-new; the bijection IS the dedup
-
-**Status:** FINDING (traced, ratified: `entity_type` = global shared template id)
-**Confidence:** High (read the mint, not the doc comment)
-
-**Trace before change paid twice.** (1) `namespace.rs:12` documents `entity_type_id` as "dense **within the namespace**" — but the actual mint is `registry.rs:476 entity_type_id = (rows.len()+1)`: **global append-order across all namespaces**. The doc comment is stale; the GLOBAL semantics DECISION-2/3 want are already the live behavior. (2) It corrected this session's own claim, minutes old: the registry is **not** template-deduped — every append mints a fresh id (`enumerate_first_with_entity_type_id` is defensive, not reuse evidence). Frugal dedup + the `entity_type↔NiblePath` pairing are net-new.
-
-**Blast radius traced benign:** ~16 `entity_type_id()` readers store-as-column-value or compare; none dense-index an array BY entity_type. Global/sparse ids break nothing. Dedup consequence: per-id row lookup becomes namespace-ambiguous ⇒ resolve by `(namespace, entity_type)`.
-
-**The synthesis that shrinks Phase B:** the bijection IS the dedup. One pair table `NiblePath ↔ entity_type` in the registry: path present ⇒ reuse the template id (new row, new namespace); absent ⇒ mint fresh (monotone, never reused) + record the pair. The pair table is simultaneously the template registry, the dedup index, and the bijection witness the round-trip test proves. Moves 1+2 of the Phase B seam are one mechanism.
-
-**Process lesson (generalizes):** doc comments describe intent at write-time; the mint line is the contract. For any "is this id local or global / dense or sparse" question, read the assignment site and grep for dense-indexing consumers before believing prose.
-
-**Cross-ref:** identity-architecture plan DECISION-3 + Phase B grounded seam (CORRECTION block); E-OGAR-NORTHSTAR-1 (Status updated); I-LEGACY-API-FEATURE-GATED (the positional `contract/ontology.rs:85` helper is the v1 path to gate).
-
-## 2026-06-09 — E-ANCESTRY-TRINITY-1 — NiblePath::is_ancestor_of is ONE bit-shift read three ways: subClassOf = supervision-edge = north-star template specialization
-
-**Status:** FINDING (cross-session convergence — OGAR/SurrealDB session + identity-contract session, independently)
-**Confidence:** High
-
-**The convergence.** A parallel CCA2A session (OGAR / nexgen op-surreal-ast / SurrealDB RecordId) pulled #480 and independently re-derived the OGAR↔lance-graph membrane as **"the registry mint of `(entity_type, NiblePath)` per class"** — exactly DECISION-2 (OGAR mirror) committed from this side in #481. Two sessions, opposite directions, same membrane.
-
-**The new synthesis it surfaces:** `NiblePath::is_ancestor_of` (a single HHTL bit-shift on the GUID routing prefix) is simultaneously THREE relations:
-- **OWL `subClassOf`** (ontology inheritance) — OGAR-AST-CONTRACT §1.
-- **OTP supervision edge** (ractor parent-routing / delegation through `OrchestrationBridge`) — the other session's "supervisor-edge is now [G] mechanical" finding.
-- **North-star template specialization** (a domain class descends from its shared template) — E-OGAR-NORTHSTAR-1.
-
-They are the SAME relation: the north-star template hierarchy IS the routing/supervision hierarchy IS the subClass hierarchy — one bit-shift, three names. Consequence: reusing a template (inherit + switch namespace), being-supervised-by, and being-a-subclass-of are the same arithmetic; there is no separate routing structure to maintain.
-
-**Coordination:** the OGAR session is on #480 (Phase A); #481 carries the OGAR-side answer it needs — OGAR = OGIT mirror, immutable ClassIds, north-star spine, `namespace`=domain. Its proposed `D-IDENT` paired-note + `D-IDENTITY-PIN` should absorb the `namespace`=domain + north-star framing on next pull.
-
-**Cross-ref:** E-OGAR-NORTHSTAR-1; E-IDENTITY-WHITEBOX-1; identity-architecture DECISION-2 + north-star guard; `hhtl.rs::is_ancestor_of`.
-
 ## 2026-06-09 — E-OGAR-NORTHSTAR-1 — ontology cache = OGAR mirror with a reusable north-star template spine (namespace specializes, entity_type is shared)

-**Status:** DECISION (OGAR mirror RATIFIED via decision-gate; north-star template model RATIFIED 2026-06-09 "frugal it is"; `entity_type` = GLOBAL shared template id RATIFIED via decision-gate — see E-MINT-TRACE-1)
-**Confidence:** High (both halves ratified; mint trace confirms global append-order is already the live mint)
+**Status:** DECISION (OGAR mirror RATIFIED via decision-gate; north-star template model RATIFIED 2026-06-09 "frugal it is"; `entity_type` = GLOBAL shared template id RATIFIED via decision-gate)
+**Confidence:** High (both halves ratified; the live mint is global append-order across namespaces)

 **Two decisions, one architecture.**

diff --git a/.claude/board/INTEGRATION_PLANS.md b/.claude/board/INTEGRATION_PLANS.md
index e330b78e9..fd46b2167 100644
--- a/.claude/board/INTEGRATION_PLANS.md
+++ b/.claude/board/INTEGRATION_PLANS.md
@@ -1,3 +1,19 @@
+## 2026-06-09 — polyglot-container-query-membrane-v1 (Node Container answers Cypher + SurrealQL AST + DataFusion UDF over one HHTL address space; mailbox = a normal cold path)
+
+**Status:** RESEARCH MAP + PLAN. Grounded by two parallel sweeps (lance-graph + surrealdb fork) with main-thread spot-verification; one agent claim caught false (SoaEnvelope has ZERO real impls — identity N3 stands live). **Plan file:** `.claude/plans/polyglot-container-query-membrane-v1.md`.
+**Owns:** 6 deliverables D-PG-1..6.
+- D-PG-1: `addr64 = path << 4·(16−depth)` codec + order-preservation property test (subtree ⇔ contiguous range ⇔ `is_ancestor_of`) — the falsifiable first brick
+- D-PG-2: `SoaEnvelope` impl for `MailboxSoA` (= identity-plan N3) + LE parity test
+- D-PG-3: read-only mailbox `Transactable` adapter (get/keys/keysr/scan/scanr over phase-pinned snapshot) + hot==cold differential test
+- D-PG-4: `SurrealqlParse` strategy → ArenaIR (frontend #5, same slot as sparql_parse)
+- D-PG-5: DDL ⇄ registry bridge (DEFINE TABLE/FIELD ⇄ dedup-by-URI mint + MappingRow/FieldMask; C16b builders as exchange format)
+- D-PG-6 (optional): `surreal_container` unblock → Rubicon kanban VIEW over leading LanceDB (ruling-compliant; off critical path)
+**Key findings:** surrealdb fork keys are storekey-encoded (order-preserving, arrays included) and record-ranges lower to native KV byte-range scans (`stream_keys_vals`, pipeline.rs:223); kv-lance is FULLY implemented in-tree (18/19 Transactable methods + MVCC/timeline) — `surreal_container` BLOCKED(C/D) is stale; the fork's typed `surrealdb-ast` crate + C16b DDL builders (`new_for_ddl` → `ToSql`, database-free) are the AST-adapter surface.
+**Ruling respected:** LanceDB leads; SurrealDB is a view/dialect (handover 2026-05-28 §2, E-RUBICON-RACTOR).
+**Companion plans:** `identity-architecture-exists-vs-needs-v1` (addresses), `bindspace-singleton-to-mailbox-soa-v1` (mailbox, #418).
+
+---
+
 ## 2026-06-07 — singleton-to-snapshot-nudge-v1 (workspace-wide audit: every shared-mutable singleton → per-owner MailboxSoA + Arc-swap COW snapshot; read-only codebooks explicitly left as-is)

 **Status:** PROPOSAL. Design-spec + audit only, no code beyond the AttentionMatrix correctness fix. **Plan file:** `.claude/plans/singleton-to-snapshot-nudge-v1.md`.
diff --git a/.claude/board/STATUS_BOARD.md b/.claude/board/STATUS_BOARD.md
index ffb1347ba..a05c89eae 100644
--- a/.claude/board/STATUS_BOARD.md
+++ b/.claude/board/STATUS_BOARD.md
@@ -799,3 +799,32 @@ When a deliverable is abandoned:
 |---|---|---|---|
 | D-EW64-3 | `EpisodicEdges64::{coldest, contains}` — MRU cold-tier read surface | In PR | contract lib 545 green; clippy clean |
 | D-EW64-4 | `DemotionSink` trait + `promote_into` — hot→cold exit seam (impls gated OQ-11.6) | In PR | contract lib 545 green; clippy clean |
+
+---
+
+## identity-architecture-exists-vs-needs-v1 — structured NodeGuid + frugal north-star OGAR mint
+
+Plan path: `.claude/plans/identity-architecture-exists-vs-needs-v1.md`. Epiphanies: E-IDENTITY-WHITEBOX-1, E-OGAR-NORTHSTAR-1. Rides in the open identity PR on `claude/nice-edison-g4rhhl`.
+
+| D-id | Title | Crate(s) / repo | ~LOC | Risk | Status | PR / Evidence |
+|---|---|---|---|---|---|---|
+| D-IDENTITY-1 | `identity::NodeGuid` (UUIDv8) + `NiblePath::from_packed` — byte layout, version/variant gates, field-isolation matrix | `lance-graph-contract` | ~250 | LOW | **Shipped** | Phase A; +15 contract tests, clippy-D clean |
+| D-IDENTITY-2 | Frugal north-star mint: dedup-by-URI global template id + `entity_type↔NiblePath` bijection pair table + round-trip tests (moves 1+2+3) | `lance-graph-ontology` | ~250 | LOW | **In PR** | dedup + `register_class_path`/`niblepath_of`/`entity_type_of`/`rows_with_entity_type`; +5 tests, 14 registry green |
+| D-IDENTITY-3 | Gate legacy positional `contract/ontology.rs:85 entity_type_id` per I-LEGACY-API-FEATURE-GATED (move 4) | `lance-graph-contract` / -ontology | ~80 | MED | **Queued** | needs consumer audit first |
+| D-IDENTITY-4 | Pair-table Lance persistence (re-register-on-hydration → persisted) | `lance-graph-ontology` | ~60 | LOW | **Queued** | TECH_DEBT TD-PAIRTABLE-1 |
+
+---
+
+## polyglot-container-query-membrane-v1 — three dialects, one HHTL address space, mailbox as cold path
+
+Plan path: `.claude/plans/polyglot-container-query-membrane-v1.md`. Research grounded 2026-06-09; rides on `claude/nice-edison-g4rhhl`.
+
+| D-id | Title | Crate(s) / repo | ~LOC | Risk | Status | PR / Evidence |
+|---|---|---|---|---|---|---|
+| D-PG-1 | `addr64` left-aligned HHTL codec + order-preservation property test (subtree ⇔ contiguous range) | `lance-graph-contract` | ~120 | LOW | **Queued** | first brick; everything stands on it |
+| D-PG-2 | `SoaEnvelope` impl for `MailboxSoA` (= identity N3, confirmed live) + LE parity test | `cognitive-shader-driver` | ~150 | LOW | **Queued** | gap re-verified 2026-06-09 (§2.4 of plan) |
+| D-PG-3 | Read-only mailbox `Transactable` adapter (5 methods, phase-pinned) + hot==cold differential test | shader-driver + fork contract | ~250 | MED | **Queued** | gated on D-PG-1,2 |
+| D-PG-4 | `SurrealqlParse` strategy → ArenaIR (SELECT point/range) + selector rule | `lance-graph-planner` | ~300 | MED | **Queued** | slot proven by sparql_parse |
+| D-PG-5 | DDL ⇄ registry bridge (DEFINE walker → mint; reverse via C16b `ToSql`) | `lance-graph-ontology` | ~250 | MED | **Queued** | gated on fork C16c |
+| D-PG-6 | (optional) `surreal_container` unblock → kanban view over LanceDB | `surreal_container` | ~200 | LOW | **Queued** | ruling-compliant; OQ-PG1 open |
+| D-PG-7 | Deterministic foveated tree-builder (CLAM-style 16-way bootstrap + append-stable insertion → `register_class_path`) | `lance-graph-ontology` + ndarray CLAM | ~300 | MED | **Queued** | plan §8 addendum; gated on D-PG-1; determinism + append-stability property tests mandatory |
diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md
index 3863b66d6..c3ef7040a 100644
--- a/.claude/board/TECH_DEBT.md
+++ b/.claude/board/TECH_DEBT.md
@@ -2649,3 +2649,16 @@ files; 217 tests green.
 Fixes are hand-applied (NOT `clippy --fix`, which mangle `reader_state.rs` into stranded-comment match guards). The CI clippy step for deepnsm was promoted Tier-B advisory → Tier-A gating in `.github/workflows/style.yml`.
+
+## TD-PAIRTABLE-1 — entity_type↔NiblePath pair table is in-memory only
+**Status:** Open (2026-06-09, D-IDENTITY-2)
+
+The `path_by_type` / `type_by_path` bijection tables on `RegistryState`
+(registry.rs) are NOT persisted to the Lance cache. `absorb_row` (the
+`OntologyRegistry::open` replay path) reconstructs `rows` + the by-name/by-uri
+indices but leaves the pair tables empty; callers must re-`register_class_path`
+after hydration (the OGAR/hydrator seed step does this). Persisting the pairs
+(two extra Lance columns, or deriving them from a NiblePath column on MappingRow) is
+the Paid state. Low risk: dedup itself survives replay (the deduped
+`entity_type` is baked into each persisted `schema_ptr`); only the path
+bijection needs re-seeding. Pair: D-IDENTITY-4.
diff --git a/.claude/plans/identity-architecture-exists-vs-needs-v1.md b/.claude/plans/identity-architecture-exists-vs-needs-v1.md
index fad5c1400..dc0469a86 100644
--- a/.claude/plans/identity-architecture-exists-vs-needs-v1.md
+++ b/.claude/plans/identity-architecture-exists-vs-needs-v1.md
@@ -244,6 +244,24 @@ test proves.

 **First brick:** moves 1+2+3 together (mint + pairing + round-trip); the legacy-gate (4) follows once nothing canonical reads the positional helper.
+> **LANDED (2026-06-09, D-IDENTITY-2).** Moves 1+2+3 shipped in
+> `lance-graph-ontology` (registry.rs + namespace.rs + bridge.rs):
+> dedup-by-URI mint (one global template id, reused across bridges /
+> namespaces; monotone-with-gaps, never renumbered), the
+> `entity_type ↔ NiblePath` bijection pair table (`register_class_path` /
+> `niblepath_of` / `entity_type_of`, both-way conflict-rejecting +
+> EMPTY-sentinel guard), `rows_with_entity_type` for the multi-row
+> reading, and the three stale-doc corrections. 5 new tests
+> (`same_uri_…_shares_one_template_id`, `fresh_mint_is_monotone_with_gaps`,
+> `changed_checksum_reappend_keeps_the_template_id`,
+> `class_path_bijection_round_trips_both_ways`,
+> `class_path_bijection_conflicts_are_rejected`); 14 registry tests green,
+> crate suites green, my 3 files clippy/fmt-clean (pre-existing crate-wide
+> `-D warnings` + fmt drift in untouched files left as-is). **Deferred:**
+> move 4 (gate `contract/ontology.rs:85` positional helper — needs the
+> consumer audit first); pair-table Lance persistence (TECH_DEBT — in-memory
+> only, re-registered on hydration).
+
 ## Honest ledger

 - **[G] (exists, reuse):** all 6 layers above — `NiblePath`, `SchemaPtr`, `ClassId`,
diff --git a/.claude/plans/polyglot-container-query-membrane-v1.md b/.claude/plans/polyglot-container-query-membrane-v1.md
new file mode 100644
index 000000000..aa756b408
--- /dev/null
+++ b/.claude/plans/polyglot-container-query-membrane-v1.md
@@ -0,0 +1,256 @@
+# Polyglot Container Query Membrane — SurrealQL AST + DataFusion UDF + Cypher over one HHTL address space (v1)
+
+> **Status:** RESEARCH MAP + INTEGRATION PLAN. Grounded 2026-06-09 by two parallel
+> repo sweeps (lance-graph + the surrealdb fork) with main-thread spot-verification
+> of every load-bearing claim (one agent claim caught false and corrected, §2.4).
+> **Branch:** `claude/nice-edison-g4rhhl`.
+> **Companions:** `identity-architecture-exists-vs-needs-v1.md` (the address layer),
+> `bindspace-singleton-to-mailbox-soa-v1.md` (the mailbox layer, PR #418),
+> `.claude/handovers/2026-05-28-1200-...md` §2 (the SurrealDB-as-VIEW ruling),
+> `.claude/surreal/` POC corpus (12 tasks; framing partially superseded, see ruling).
+
+## 1. Thesis — the Christmas tree
+
+The ontology is a Christmas tree: the **registry bijection** (`entity_type ↔
+NiblePath`, landed in D-IDENTITY-2) is the always-resident skeleton, and **rows
+are decorations that are NOT stored in the tree** — they *materialize at read
+time at their HHTL address* from whichever tier owns them (hot mailbox snapshot,
+Lance cold store, consumer store via `EntityKey`). Three query dialects —
+**Neo4j/Cypher**, **SurrealQL (DML + DDL, via the fork's typed AST)**, and
+**DataFusion SQL/UDF** — resolve classes against the SAME registry catalog and
+addresses against the SAME order-preserving key codec. Consequence: *parsing a
+mailbox is indistinguishable from scanning a cold table*. The mailbox is just
+another tier behind the same 5-method read contract — "a normal cold path."
+
+Standing ruling respected throughout (handover 2026-05-28 §2, `E-RUBICON-RACTOR`):
+**LanceDB is the leading store; SurrealDB is a view/frontend, never the store.**
+An AST adapter is a query surface — it strengthens that ruling rather than
+bending it. The embedded SurrealDB-on-kv-lance engine remains the OPTIONAL leg
+(kanban view), explicitly off the critical path.
+
+## 2. Grounded inventory — what EXISTS (verified file:line)
+
+### 2.1 lance-graph: container, mailbox, addresses, frontends
+
+| Surface | Where | Status |
+|---|---|---|
+| `Container = [u64; 256]` (2 KB); `CogRecord { meta, content }` (4 KB); `ContentGeometry` (Bitpacked16K / DenseF32 / TripleSPO / EdgePacked) | contract `container.rs:14-67` | **[G]** |
+| `MailboxSoaView` — zero-copy column borrows: `energy() -> &[f32]`, `edges_raw() -> &[u64]`, `meta_raw() -> &[u32]`, `entity_type() -> &[u16]`, + `phase() -> KanbanColumn`, `current_cycle()` | contract `soa_view.rs:40-70` | **[G]** trait |
+| `MailboxSoaOwner::try_advance_phase` (Rubicon gate = the snapshot/transaction boundary) | contract `soa_view.rs` | **[G]** trait |
+| `SoaEnvelope` — LE geometry: `as_le_bytes()`, `row_le(row)`, `column_le(row,col)`, `verify_layout()`, `cycle()` | contract `soa_envelope.rs:143-252` | **[G]** trait / **[H] ZERO real impls** (§2.4) |
+| `MailboxSoA` concrete columns (energy/edges/qualia/meta/entity_type…) | cognitive-shader-driver `mailbox_soa.rs:43` | **[G]** struct; implements NEITHER trait yet |
+| `NiblePath` algebra: `child = (parent.path << 4) \| nibble`, low-aligned, `MAX_DEPTH=16`; `is_ancestor_of` = one shift-compare | contract `hhtl.rs:47-101, 176-183` | **[G]** |
+| Registry bijection: `path_by_type`/`type_by_path`, `register_class_path` (conflict-rejecting), `niblepath_of`, `entity_type_of`, `rows_with_entity_type`; dedup-by-URI mint | ontology `registry.rs:64-72, 343-418, 565-579` | **[G]** (D-IDENTITY-2) |
+| `NodeGuid` UUIDv8 octets: `namespace u8 \| entity_type u16 \| kind u8 \| niblepath_prefix u16 (≤4 nibbles, routing cache ONLY) \| depth \| shape_hash 22b \| local u24 \| layout_version` | contract `identity.rs:68-81, 159-206` | **[G]** (D-IDENTITY-1) |
+| Polyglot strategy registry: `cypher_parse` / `gql_parse` / `gremlin_parse` / `sparql_parse` / `arena_ir` all registered as boxed strategies; selector scores by dialect | planner `strategy/mod.rs:50-60`, `strategy/*.rs` | **[G]** — the frontend slot shape |
+| DataFusion UDF registration + custom physical ops (`CamPqScanOp`, `CollapseOp`) | core `datafusion_planner/udf.rs`; planner `physical/` | **[G]** |
+| `cypher_bridge.rs` in shader-driver (Cypher already reaches the cognitive side) | cognitive-shader-driver | **[G]** |
+| Hot==cold intent stated: mailbox columns "read identically whether in RAM … or via Lance snapshot" | `docs/SUBSTRATE-ENDGAME-RUNTIME-VIEW.md` §1.1 | **[G]** doc intent — this plan wires it |
+
+### 2.2 surrealdb fork (3.1.0-alpha): AST, RecordId, key encoding, kv-lance
+
+| Surface | Where | Status |
+|---|---|---|
+| **Typed AST crate** `surrealdb-ast` — `TopLevelExpr`, `Expr`, `Select`, `Create`/`Update`/`Delete`/`Insert`, `DefineNamespace/Database/Table/Field/Index/…`, `RecordId`, `RecordIdKey`, `RecordIdKeyRange`; visitor infra (`visit/`, `mac.rs`) | fork `surrealdb/ast/src/lib.rs:37-160` | **[G]** — public, programmatically constructible |
+| Dedicated parser crate (recursive descent) | fork `surrealdb/parser/src/parse/mod.rs:66-69,736` | **[G]** |
+| `RecordIdKey::{Number, String, Uuid, **Array**, Object, **Range(Box)**}` | fork `types/src/value/record_id/key.rs:20-55 (Array at :28)` | **[G]** |
+| `RecordIdKeyRange { start: Bound, end: Bound }` | fork `types/.../range.rs:17-22` | **[G]** |
+| **Order-preserving KV key encoding** via `storekey` — layout `/*{ns_id}*{db_id}*{tb}*{record_id_key}`; lexicographic byte order == logical order, arrays included | fork `core/src/key/record.rs:5-26` | **[G]** — the load-bearing property (codec-level); composed-key proof test is OURS to write (P0) |
+| **Record-range → KV byte-range scan**: `range_start_key`/`range_end_key` → `txn.stream_keys_vals(beg..end, …)` | fork `core/src/exec/operators/scan/pipeline.rs:211-238 (:223), 367-412`; `scan/record_id.rs:305-331` | **[G]** — subtree-as-range has a native execution path |
+| **kv-lance backend: FULLY IMPLEMENTED in-tree** — `get` :646, `keys` :824, `keysr` :836, `scan` :848, `scanr`, writes, savepoints; MVCC via Lance versions, `Timeline` time-travel, background optimizer; ~6k lines of tests; SDK `Surreal::new::(path)` | fork `core/src/kvs/lance/mod.rs`; `src/engine/local/mod.rs:329`; features `core/Cargo.toml:27`, sdk `Cargo.toml:27` | **[G]** — supersedes `surreal_container` BLOCKED(C/D) |
+| `Transactable` contract; read-only subset = `get / keys / keysr / scan / scanr` (+ `kind/closed/writeable`) | fork `core/src/kvs/api.rs:76+` | **[G]** — the tier contract M2 targets |
+| **C16b DDL builders** (`op-codegen-bridge`): `new_for_ddl()` + `with_*` setters on `catalog::{Table,Field,Index}Definition`, render via `ToSql` **without a database**; downstream consumer `op-surreal-ast` (openproject-nexgen-rs); C16c adds `From for catalog::*` | fork `.claude/op-codegen-bridge/README.md`; `core/src/catalog/{table.rs, schema/field.rs, schema/index.rs}` | **[G]** active initiative — the DDL exchange format M3 reuses |
+| SDK `.query()` accepts **strings only** (`Vec>`) | fork `src/method/query.rs:28-32` | **[G]** constraint → OQ-PG1 |
+
+### 2.3 lance-graph surreal prior art (and its standing correction)
+
+- `crates/surreal_container/` — BLOCKED skeleton (12 task stubs). Its BLOCKED(C/D)
+  reasons are **stale**: the fork dep coordinates now exist locally and `kv-lance`
+  is in-tree (§2.2). Remains **optional** per the ruling (D-MBX-6 note: "NOT on
+  the critical path").
+- `.claude/surreal/` 12-task POC + `RECONCILIATION` + `cognitive-substrate.md` —
+  partially superseded framing ("SurrealDB = Zone-2 cold store" → "LanceDB
+  leading, SurrealDB a view"); supersedure annotation still pending
+  (`E-SURREAL-POC-UNANNOTATED-SUPERSEDURE`).
+
+### 2.4 Sweep-error correction (recorded for provenance)
+
+The lance-graph sweep agent claimed `MailboxSoA` implements `SoaEnvelope`.
+**Spot-grep proves it does not** — the only `impl SoaEnvelope` in the workspace is
+the in-test `TestEnvelope` (`soa_envelope.rs:266`), and `mailbox_soa.rs` contains
+no trait impls for the type at all. The identity plan's gap **N3 ("SoaEnvelope:
+zero impls") stands LIVE** and appears below as D-PG-2. (Method: never let an
+agent claim into a plan without a main-thread grep.)
+
+## 3. The mapping — five moves
+
+### M1 — One address codec under all three dialects [CONJECTURE until D-PG-1 proof]
+
+Define the **sortable HHTL address**: `addr64 = path << (4 · (16 − depth))`
+(left-align the low-aligned `NiblePath` into the u64). Then for any branch `p`
+at depth `d`, **every descendant at every deeper depth falls in ONE contiguous
+range** `[ p·16^(16−d) , (p+1)·16^(16−d) )` — and `is_ancestor_of` (the
+hhtl.rs:176 shift-compare) is exactly range-containment under this codec plus a
+depth guard (candidate depth ≥ `d`). The guard is needed because left-alignment
+maps a node and its all-zero-nibble descendants to the same `addr64`: `0x1` at
+depth 1 and `0x10` at depth 2 both encode to `0x1000_0000_0000_0000`. Two codec
+edge cases also need care. At `depth = 0` the shift amount is 64, which
+overflows a u64 shift, so encode the root as 0. The exclusive upper bound
+`(p+1)·16^(16−d)` equals 2^64 for the last branch at every depth, so it needs
+u128 or an inclusive `hi − 1`.
+
+Per-dialect, the SAME range is:
+
+| Dialect | The subtree read |
+|---|---|
+| SurrealQL | `SELECT * FROM node:[⟨addr_lo⟩]..[⟨addr_hi⟩]` → `RecordIdKeyRange` → `stream_keys_vals(beg..end)` (native, §2.2) |
+| DataFusion SQL/UDF | `WHERE addr64 >= lo AND addr64 < hi` on the stored column → partition/row-group pruning; helper UDFs `hhtl_subtree(addr, depth)`, `guid_class(guid)` in `udf.rs` |
+| Cypher | label/class scan via registry (`entity_type → NiblePath → range`); blasgraph HHTL basin walk uses the same prefix arithmetic it already has |
+
+RecordId form for rows: `node:[addr64, local]` (`RecordIdKey::Array` — storekey
+keeps array order). **Honest scope:** storekey's element-wise order preservation
+is [G] at codec level, but the COMPOSED key (`u64` + `u32` array under
+surrealdb's value ordering, negative/width edge cases) gets a property test
+before anything builds on it — that test IS deliverable D-PG-1, the falsifiable
+brick. **NodeGuid caveat:** the GUID carries only a 4-nibble routing prefix
+(identity.rs octets 4-6); scans deeper than 4 resolve the FULL path through the
+registry bijection (`niblepath_of`) — the tree is resident, so this is one
+HashMap hop, not I/O.
+
+### M2 — The mailbox is a tier, not an engine [DESIGN]
+
+1. Implement `SoaEnvelope` for `MailboxSoA` (D-PG-2 = identity-plan N3): the
+   columns already exist; the impl is descriptor table + LE byte views.
+2. A **read-only adapter** implementing the 5-method `Transactable` read subset
+   (`get/keys/keysr/scan/scanr`) over a **phase-pinned snapshot**: Rubicon
+   `try_advance_phase` is the transaction boundary; `MailboxSoaView`'s read
+   airgap satisfies the no-`&mut`-during-computation data-flow rule by
+   construction.
+3. The query layer above (any dialect) cannot tell hot from cold. Acceptance is
+   a **differential test**: same range query against (a) the mailbox tier and
+   (b) the same rows persisted in Lance — byte-identical results (D-PG-3).
+
+This is `SUBSTRATE-ENDGAME` §1.1's "read identically in RAM or via Lance
+snapshot" sentence turned into a contract with a test.
+
+### M3 — DDL declares the tree; the registry IS the catalog [DESIGN]
+
+- `DEFINE TABLE SCHEMAFULL` ⇄ registry append (dedup-by-URI mint —
+  same URI never re-mints, D-IDENTITY-2) + `register_class_path` (hang the
+  ornament hook on the tree).
+- `DEFINE FIELD ON ` ⇄ `MappingRow` property → `FieldMask` bit
+  (parent-OR-delta inheritance already exists).
+- `DEFINE INDEX` ⇄ bijection entry / Lance index declaration.
+- **Exchange format = the C16b builders**: registry → `TableDefinition`
+  (`new_for_ddl().with_…`) → `to_sql()` → SurrealQL text → fork parser →
+  typed AST → registry. Round-trip = the schema's `roundtrip_eq` analogue.
+- Cypher `CREATE (:Label)` and DataFusion `CREATE EXTERNAL TABLE` resolve to
+  the SAME mint — three DDL dialects, one catalog, zero duplicate ids (the
+  dedup mint guarantees it).
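M1's codec and subtree range are small enough to check exhaustively at shallow depth. The sketch below is a hypothetical model, not the `hhtl.rs` API:

- `Nib`, `addr64`, `subtree_range` and `in_subtree` are illustrative names.
- `is_ancestor_of` is assumed to be non-strict, so a node counts as its own ancestor.
- Paths follow the low-aligned `child = (parent.path << 4) | nibble` rule from the §2.1 table.

The upper bound is returned as u128 because it reaches 2^64 for the last branch at each depth. Range containment is paired with a depth check because a node and its all-zero-nibble descendants share one `addr64`.

```rust
/// Maximum nibble depth of an HHTL path (MAX_DEPTH = 16 per the §2.1 table).
pub const MAX_DEPTH: u32 = 16;

/// Hypothetical low-aligned nibble path: `depth` nibbles packed into the low bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Nib {
    pub path: u64,
    pub depth: u32,
}

impl Nib {
    pub const ROOT: Nib = Nib { path: 0, depth: 0 };

    /// One level deeper: child = (parent << 4) | nibble.
    pub fn child(self, nibble: u8) -> Nib {
        assert!(nibble < 16 && self.depth < MAX_DEPTH);
        Nib {
            path: (self.path << 4) | u64::from(nibble),
            depth: self.depth + 1,
        }
    }

    /// Non-strict ancestry by shift-compare: drop `other`'s extra low nibbles.
    pub fn is_ancestor_of(self, other: Nib) -> bool {
        self.depth <= other.depth
            && other
                .path
                .checked_shr(4 * (other.depth - self.depth))
                .unwrap_or(0)
                == self.path
    }
}

/// Left-align into a sortable u64; the root's 64-bit shift is mapped to 0.
pub fn addr64(n: Nib) -> u64 {
    n.path.checked_shl(4 * (MAX_DEPTH - n.depth)).unwrap_or(0)
}

/// Half-open [lo, hi) holding every descendant's addr64; hi may equal 2^64.
pub fn subtree_range(n: Nib) -> (u128, u128) {
    let lo = u128::from(addr64(n));
    (lo, lo + (1u128 << (4 * (MAX_DEPTH - n.depth))))
}

/// Range containment plus the depth guard the bare address cannot express.
pub fn in_subtree(anc: Nib, node: Nib) -> bool {
    let (lo, hi) = subtree_range(anc);
    let a = u128::from(addr64(node));
    node.depth >= anc.depth && lo <= a && a < hi
}
```

The sketch checks every pair of nodes up to depth 2 against `is_ancestor_of`. That is a cheap, deterministic stand-in for the randomized property test D-PG-1 calls for. It does not model the composed `[addr64, local]` storekey ordering, which remains D-PG-1's job.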
+
+### M4 — SurrealQL as frontend #5 [DESIGN]
+
+`strategy/surrealql_parse.rs`: typed `surrealdb-ast` statements → `ArenaIR`,
+registered exactly like `sparql_parse` (one `Box::new` line in
+`strategy/mod.rs:57-60`, one selector scoring rule). Subset order: SELECT
+point-get → SELECT record-range (M1) → graph-step (`->edge->` onto
+episodic/causal edges) → DDL (M3). Cypher is already frontend #1; GQL/Gremlin/
+SPARQL prove the slot shape; DataFusion UDFs make the same primitives available
+to plain SQL. **No new REST endpoint, no new service** — this is a parser
+strategy feeding the existing IR, per `lab-vs-canonical-surface.md`.
+
+### M5 — Embedded SurrealDB view [OPTIONAL — per the ruling]
+
+Unblock `surreal_container` (deps are now real, §2.2/§2.3) ONLY for the Rubicon
+kanban view over leading LanceDB (the #418 framing). Execution seam is open
+(OQ-PG1: SDK `.query()` is string-only → render via `ToSql`, or call core
+`Datastore` directly). Explicitly not on the critical path; D-PG-1..5 do not
+depend on it.
+
+## 4. Deliverables
+
+| D-id | What | Reuses [G] | Adds [H] | Gate |
+|---|---|---|---|---|
+| **D-PG-1** | `addr64` codec + **order-preservation property test** (random NiblePaths: byte-order of encoded keys ⇔ `is_ancestor_of` containment; composed `[addr64, local]` array form under storekey) | NiblePath, storekey | the codec fn + proptest | none — **first brick** |
+| **D-PG-2** | `SoaEnvelope` impl for `MailboxSoA` (= identity N3, confirmed live §2.4) + LE parity vs Lance bytes | trait + struct + columns | the impl + `verify_layout` test | none |
+| **D-PG-3** | Read-only mailbox `Transactable` adapter (5 methods over phase-pinned envelope snapshot) + **hot==cold differential test** | Transactable contract, Rubicon, D-PG-2 | the adapter | D-PG-1,2 |
+| **D-PG-4** | `SurrealqlParse` strategy → ArenaIR (SELECT point/range subset) + selector rule + 3 golden queries | surrealdb-ast, ArenaIR, strategy registry | the strategy | D-PG-1 |
+| **D-PG-5** | DDL ⇄ registry bridge (Define walker → mint/MappingRow; reverse render via C16b builders) | C16b, dedup mint, bijection | the walker + round-trip test | C16c upstream (`From` impls) |
+| **D-PG-6** *(opt)* | `surreal_container` unblock → kanban view over LanceDB | kv-lance in-tree | dep wiring + view | ruling-compliant; OQ-PG1 |
+
+Phases: **P0** = D-PG-1 alone (falsifiable; everything stands on it). **P1** =
+D-PG-2+3 (the "mailbox is a normal cold path" claim becomes a passing test).
+**P2** = D-PG-4 (three dialects live). **P3** = D-PG-5 (the tree is declared in
+DDL). P4 = D-PG-6 if/when wanted.
+
+## 5. Iron-rule + ruling compliance
+
+- **I-VSA-IDENTITIES:** pure register pattern — addresses point to content;
+  nothing is bundled or superposed. Test 0 (natural ids exist) passes by
+  construction.
+- **LanceDB-leading ruling:** SurrealQL here is a *dialect*; the engine leg is
+  optional and view-only. No persistence moves to SurrealDB.
+- **I-LEGACY-API-FEATURE-GATED:** the codec carries `layout_version` (NodeGuid + octet 13 already reserves it); any future addr64 layout change versions + through it. +- **Data-flow rule (no `&mut` during computation):** all query reads are `&self` + on phase-pinned snapshots; mutation stays behind Rubicon + commit_event. +- **lab-vs-canonical-surface:** no new endpoints; frontends are parser + strategies into the existing IR/bridge. + +## 6. Open questions + +- **OQ-PG1** — embedded-view execution seam: SDK `.query()` is string-only; + choose `ToSql` render vs direct core `Datastore` call when D-PG-6 activates. +- **OQ-PG2** — store `addr64` denormalized as a row column (cheap pruning, + 16 B/row total with local) vs derive from registry at plan time (no + duplication; one hop). Decide at D-PG-3 with a measurement, per Rule 7. +- **OQ-PG3** — `Bound` semantics: normalize SurrealQL inclusive/exclusive + bounds to the half-open `[lo, hi)` convention of the subtree formula at + D-PG-1 (property test covers both bound kinds). + +## 7. Cross-refs + +Identity plan (addresses; N3→D-PG-2), #418 mailbox plan + D-MBX-6, handover +2026-05-28 §2 (ruling), `.claude/surreal/` POC (fold-in pending), fork +`op-codegen-bridge` C16b/C16c, `SUBSTRATE-ENDGAME-RUNTIME-VIEW.md` §1.1, +`docs/CLUSTER_ASYMMETRY.md` (surreal-cluster as Raft provider — unrelated leg). + +## 8. Addendum 2026-06-09 — left-prefix parsing + deterministic foveated tree construction (user direction) + +**User framing, confirmed against `identity.rs:68-81`:** the GUID reads as +`classid-HHHH-HHHH-TTTT-LLLLLLIDENTI` — octets 0-3 class (`namespace|entity_type|kind`), +4-7 tree address (NiblePath prefix + depth), 8-9 shape, 10-15 local identity. 
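The sketch below shows why the left half can serve label and subtree scans. It is a minimal illustration that assumes a hypothetical packing:

- octet 0 holds the namespace;
- octets 1-2 hold `entity_type` in big-endian order;
- octet 3 holds the kind;
- octets 4-5 hold up to four path nibbles, high nibble first;
- octet 7 holds the depth;
- octets 10-15 hold a local id.

The real `identity.rs` packing may differ. `make_guid`, `subtree_range` and `in_range` are illustrative names, not existing APIs. The key point is that byte-lexicographic order on 16 bytes equals numeric order of the big-endian `u128`. A class plus an n-nibble subtree is therefore one half-open range `[lo, hi)`.

```rust
/// HYPOTHETICAL packing for illustration only (identity.rs may differ):
/// octet 0 namespace, octets 1-2 entity_type (big-endian), octet 3 kind,
/// octets 4-5 up to four path nibbles (high nibble first), octet 7 depth,
/// octets 10-15 the low 48 bits of a local id.
fn make_guid(ns: u8, entity_type: u16, kind: u8, path: &[u8], local: u64) -> [u8; 16] {
    assert!(path.len() <= 4, "the GUID carries at most 4 path nibbles");
    let mut g = [0u8; 16];
    g[0] = ns;
    g[1..3].copy_from_slice(&entity_type.to_be_bytes());
    g[3] = kind;
    for (i, &nib) in path.iter().enumerate() {
        let shift = if i % 2 == 0 { 4 } else { 0 };
        g[4 + i / 2] |= (nib & 0x0f) << shift;
    }
    g[7] = path.len() as u8;
    g[10..16].copy_from_slice(&local.to_be_bytes()[2..8]);
    g
}

/// Half-open byte range [lo, hi) holding every GUID whose class octets equal
/// `class` and whose first path nibbles equal `path`. Byte-lexicographic
/// order on [u8; 16] equals numeric order of the big-endian u128, so the
/// bounds are computed on the integer. `None` for hi means unbounded above.
fn subtree_range(class: [u8; 4], path: &[u8]) -> ([u8; 16], Option<[u8; 16]>) {
    assert!(path.len() <= 4, "deeper scans resolve through the registry");
    let mut prefix = u32::from_be_bytes(class) as u128;
    for &nib in path {
        prefix = (prefix << 4) | (nib & 0x0f) as u128;
    }
    let bits = 32 + 4 * path.len() as u32; // class octets + path nibbles
    let shift = 128 - bits;
    let lo = prefix << shift;
    // prefix + 1 is the first key past the subtree, unless the prefix is
    // all ones, in which case there is no upper bound.
    let hi = if prefix + 1 == 1u128 << bits { None } else { Some((prefix + 1) << shift) };
    (lo.to_be_bytes(), hi.map(u128::to_be_bytes))
}

/// The row predicate a zone-map or scalar index would prune with.
fn in_range(guid: &[u8; 16], lo: &[u8; 16], hi: Option<[u8; 16]>) -> bool {
    guid >= lo && hi.map_or(true, |h| *guid < h)
}
```

For example, take class 42 in namespace 1. `subtree_range([1, 0x00, 0x2a, 0], &[0x3, 0x7])` yields a `lo` starting `01 00 2a 00 37` and a `hi` starting `01 00 2a 00 38`. A GUID under path 3.7.1 falls inside that range, while its sibling 3.8 and any GUID of class 43 fall outside it.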
+**The left half is plain order-preserving bytes** ⇒ Neo4j/Cypher label + subtree +patterns compile to byte-prefix/range predicates on a `FixedSizeBinary(16)` GUID +column — Lance zone-maps/scalar indexes serve them directly; quantized-vector +indexes (RaBitQ-style 1-bit codes / CAM-PQ / Binary16K Hamming) serve the +similarity leg on the SAME row. Structural predicate + similarity predicate = +two indexes, one container. Caveats (both already in §3 M1): namespace sorts +first (cross-namespace template scans = ≤256 ranges or one registry hop); +GUID carries ≤4 path nibbles (deeper scans resolve via registry / addr64). + +**M6 — deterministic foveated tree construction [CONJECTURE until D-PG-7 test].** +NiblePath assignments need not be purely editorial (E-OGAR named curation the +only real cost): they can be COMPUTED by a deterministic hierarchical +partitioner over class fingerprints/co-occurrence — "deterministic Louvain" in +the user's phrase, with three mandatory properties: + +1. **Deterministic** — canonical input ordering + stable tie-breaks (Leiden-with- + canonical-order, or the already-in-tree deterministic divisive splitter: + **ndarray CLAM**, 46 tests — preferred starting point over Louvain proper, + which is node-order dependent in its classic form). +2. **16-way capacity-bounded** — one nibble per level; a basin subdivides only + when it exceeds capacity/density θ ⇒ **foveation falls out**: depth grows + where data is dense, stays shallow where sparse. +3. **Append-stable (the identity-stability requirement)** — clustering runs ONCE + as bootstrap; thereafter new classes insert greedily under the nearest + existing basin and **minted paths never move** (protobuf-field-number + discipline, same as the entity_type mint). Full re-clustering on rebuild + would re-address every GUID — forbidden. Tree layout changes version through + `layout_version` (octet 13) per I-LEGACY-API-FEATURE-GATED. 
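The three properties can be illustrated with a deliberately simple stand-in. This is NOT CLAM pole-split. It splits on successive nibbles of a hypothetical 64-bit class fingerprint, which is a radix-style divisive partition. That is enough to show three things:

1. Determinism comes from a canonical `(fingerprint, id)` sort.
2. Subdivision is 16-way and capacity-bounded, so depth grows only where classes are dense.
3. Inserts are append-stable because they only ever fill empty slots.

`Node`, `bootstrap`, `append` and the slot-based path scheme are assumptions made for illustration. Paths are plain nibble vectors rather than `NiblePath`.

```rust
use std::collections::BTreeMap;

/// HYPOTHETICAL basin tree: a basin has 16 slots; each slot holds a
/// sub-basin or one class leaf. A class's path is the slot indices from the
/// root to its leaf (never empty, so it cannot collide with an EMPTY sentinel).
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Basin(Vec<Option<Node>>), // always exactly 16 slots
    Class(u16),
}

/// Nibble `depth` (0..16) of a 64-bit fingerprint, most significant first.
fn nibble(fp: u64, depth: usize) -> usize {
    ((fp >> (60 - 4 * depth)) & 0xf) as usize
}

/// Deterministic bootstrap: input order is irrelevant because items are
/// sorted by (fingerprint, id) first. Assumes distinct fingerprints.
fn bootstrap(mut items: Vec<(u64, u16)>, cap: usize) -> Node {
    assert!((1..=16).contains(&cap));
    items.sort_unstable();
    build(&items, 0, cap)
}

fn build(items: &[(u64, u16)], depth: usize, cap: usize) -> Node {
    let mut slots: Vec<Option<Node>> = (0..16).map(|_| None).collect();
    if items.len() <= cap || depth == 16 {
        // Within capacity: classes take slots 0.. in canonical order.
        assert!(items.len() <= 16, "fingerprints must be distinct");
        for (slot, &(_, id)) in items.iter().enumerate() {
            slots[slot] = Some(Node::Class(id));
        }
    } else {
        // Over capacity: split 16 ways on the next fingerprint nibble.
        let mut groups: BTreeMap<usize, Vec<(u64, u16)>> = BTreeMap::new();
        for &item in items {
            groups.entry(nibble(item.0, depth)).or_default().push(item);
        }
        for (n, g) in groups {
            // A lone class stays shallow; only crowded groups go deeper.
            slots[n] = Some(if g.len() == 1 { Node::Class(g[0].1) } else { build(&g, depth + 1, cap) });
        }
    }
    Node::Basin(slots)
}

/// class id -> path (slot nibbles), collected by a depth-first walk.
fn paths(root: &Node) -> BTreeMap<u16, Vec<u8>> {
    fn walk(node: &Node, prefix: &mut Vec<u8>, out: &mut BTreeMap<u16, Vec<u8>>) {
        match node {
            Node::Class(id) => {
                out.insert(*id, prefix.clone());
            }
            Node::Basin(slots) => {
                for (i, slot) in slots.iter().enumerate() {
                    if let Some(child) = slot {
                        prefix.push(i as u8);
                        walk(child, prefix, out);
                        prefix.pop();
                    }
                }
            }
        }
    }
    let mut out = BTreeMap::new();
    walk(root, &mut Vec::new(), &mut out);
    out
}

/// Append-stable insert: follow existing sub-basins along the fingerprint,
/// then take the first empty slot of the deepest basin reached. Occupied
/// slots are never rewritten, so no previously assigned path moves.
fn append(root: &mut Node, id: u16, fp: u64) -> Result<Vec<u8>, String> {
    let mut path = Vec::new();
    append_at(root, id, fp, 0, &mut path)?;
    Ok(path)
}

fn append_at(node: &mut Node, id: u16, fp: u64, depth: usize, path: &mut Vec<u8>) -> Result<(), String> {
    let Node::Basin(slots) = node else { unreachable!("descent only enters basins") };
    if depth < 16 {
        let n = nibble(fp, depth);
        if matches!(slots[n], Some(Node::Basin(_))) {
            path.push(n as u8);
            let child = slots[n].as_mut().expect("checked to be a basin");
            return append_at(child, id, fp, depth + 1, path);
        }
    }
    match slots.iter().position(Option::is_none) {
        Some(free) => {
            slots[free] = Some(Node::Class(id));
            path.push(free as u8);
            Ok(())
        }
        // Full basin: re-clustering would move paths, so refuse instead.
        None => Err(format!("basin {path:?} is full; needs a layout_version bump")),
    }
}
```

Consider 30 classes that share fingerprint nibble `0xA` and 10 spread across other leading nibbles. `bootstrap(items, 8)` places the sparse classes at depth 1 and the dense ones at depth 3. A permuted input yields an identical tree. A full basin makes `append` return an error rather than re-cluster, and that is the point where a `layout_version` bump would apply.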
+ +Query-time twin: the ndarray cascade (Belichtungsmesser bands, early exit) and +bgz-tensor's HHTL cache (95% of pairs skipped) are the SAME foveation principle +applied at read time — build-time depth where dense, query-time attention where +relevant. One principle, both sides of the store. + +**D-PG-7 (Queued):** deterministic tree-builder bootstrap — partition class +fingerprints (CLAM-style pole-split, 16-way, capacity θ) → NiblePath +assignments → `register_class_path` batch; property tests: (a) two runs over +permuted input yield byte-identical trees, (b) append of M new classes leaves +all prior paths unchanged, (c) depth distribution tracks density (foveation +witness). Gated on D-PG-1 (the codec the paths feed). diff --git a/crates/lance-graph-ontology/src/bridge.rs b/crates/lance-graph-ontology/src/bridge.rs index 32f9366e0..2a08d462f 100644 --- a/crates/lance-graph-ontology/src/bridge.rs +++ b/crates/lance-graph-ontology/src/bridge.rs @@ -136,8 +136,11 @@ pub trait NamespaceBridge: Send + Sync { } } -/// Pointer to an entity in the dictionary. The hot-path consumer uses -/// `schema_ptr.entity_type_id()` as a dense index into per-namespace data. +/// Pointer to an entity in the dictionary. The hot-path consumer compares +/// `schema_ptr.entity_type_id()` — the GLOBAL template id (DECISION-3), +/// shared across namespaces for the same canonical class. Ids are sparse +/// (monotone with gaps), so compare/lookup by id; never dense-index an +/// array with it. #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] pub struct EntityRef { pub schema_ptr: SchemaPtr, diff --git a/crates/lance-graph-ontology/src/namespace.rs b/crates/lance-graph-ontology/src/namespace.rs index 051918c79..f9b361b50 100644 --- a/crates/lance-graph-ontology/src/namespace.rs +++ b/crates/lance-graph-ontology/src/namespace.rs @@ -9,7 +9,10 @@ //! ```text //! SchemaPtr (u32): //! bits 31..24 : namespace_id (u8) -//! bits 23..8 : entity_type_id (u16, dense within the namespace) +//! 
bits 23..8 : entity_type_id (u16, GLOBAL template id — DECISION-3: +//! append-order minted across ALL namespaces, shared by +//! every mapping of the same canonical class; monotone, +//! never renumbered, gaps allowed) //! bits 7..0 : kind discriminant (u8) — Entity / Edge / Attribute //! ``` //! @@ -103,7 +106,10 @@ impl std::fmt::Display for OgitUri { /// Packed schema pointer. Returned from /// [`crate::OntologyRegistry::resolve`]. The hot path consumer pattern is /// to compare the `namespace_id()` against the bridge's lock and then use -/// the `entity_type_id()` as the dense local index. +/// the `entity_type_id()` as the GLOBAL template id (DECISION-3: one id +/// per canonical class, shared across namespaces — `(namespace_id, +/// entity_type_id)` = (domain, shared shape); cross-domain alignment is a +/// u16 compare). /// /// Carries an `ontology_context_id: u32` (the named-graph context per /// `lance-graph-rdf-fma-snomed-v1.md` §Core types and the diff --git a/crates/lance-graph-ontology/src/registry.rs b/crates/lance-graph-ontology/src/registry.rs index 19147ff1b..7de85b885 100644 --- a/crates/lance-graph-ontology/src/registry.rs +++ b/crates/lance-graph-ontology/src/registry.rs @@ -25,6 +25,7 @@ use crate::proposal::{ }; use crate::semantic_types::SemanticTypeMap; use crate::ttl_parse::{parse_ttl_directory_with_provenance, ttl_root_checksum}; +use lance_graph_contract::hhtl::NiblePath; use lance_graph_contract::property::{Marking, SemanticType}; use std::collections::HashMap; use std::path::{Path, PathBuf}; @@ -60,6 +61,14 @@ struct RegistryState { /// whitelist registered via `register_edge_types`. Populated by the /// per-ontology hydrators in `crate::hydrators` (e.g. `hydrate_dolce`). bundles: HashMap, + /// DECISION-3 (frugal north-star): the bijective class-pair table + /// `entity_type ↔ NiblePath`. 
One structure, three readings — the + /// template registry (same canonical class ⇒ same template id), the + /// dedup index, and the Eineindeutigkeit witness the round-trip test + /// proves. In-memory only: hydration / replay re-registers pairs + /// (lance-cache persistence of the pair table is tracked in TECH_DEBT). + path_by_type: HashMap, + type_by_path: HashMap, } impl OntologyRegistry { @@ -324,6 +333,13 @@ impl OntologyRegistry { } /// Resolve a `BindSpace.entity_type` index to its row (D-CASCADE-V1-7). + /// + /// Post-DECISION-3 dedup one template id can own N rows (one per + /// bridge / namespace mapping of the same canonical class); this + /// returns the FIRST — the mint row, stable in registration order. + /// All rows with one id share the canonical URI, so URI-derived + /// readings (DOLCE classification, shape) are unambiguous. For the + /// full set use [`OntologyRegistry::rows_with_entity_type`]. pub fn enumerate_first_with_entity_type_id(&self, entity_type_id: u16) -> Option { let s = self.inner.read().unwrap(); s.rows @@ -332,6 +348,75 @@ impl OntologyRegistry { .cloned() } + /// All rows sharing a template `entity_type` — the multi-row reading + /// after DECISION-3 dedup (one template id, N bridge/namespace rows). + /// The first element is the mint row. + pub fn rows_with_entity_type(&self, entity_type_id: u16) -> Vec { + let s = self.inner.read().unwrap(); + s.rows + .iter() + .filter(|r| r.schema_ptr.entity_type_id() == entity_type_id) + .cloned() + .collect() + } + + /// Pair a minted template `entity_type` with its [`NiblePath`] (the + /// Abstammung-tree address). Enforces Eineindeutigkeit both ways + /// (DECISION-2 enforcement (a)): one type ↔ one path; re-registering + /// the SAME pair is idempotent-`Ok`; a conflicting pair, an unminted + /// type, or the EMPTY no-route sentinel is an error. Failed + /// registrations leave no residue. 
+ pub fn register_class_path(&self, entity_type_id: u16, path: NiblePath) -> Result<()> { + if path == NiblePath::EMPTY { + return Err(Error::other( + "NiblePath::EMPTY is the no-route sentinel, not a class address", + )); + } + let mut state = self.inner.write().unwrap(); + if !state + .rows + .iter() + .any(|r| r.schema_ptr.entity_type_id() == entity_type_id) + { + return Err(Error::other(format!( + "entity_type {entity_type_id} was never minted" + ))); + } + match ( + state.path_by_type.get(&entity_type_id).copied(), + state.type_by_path.get(&path).copied(), + ) { + (Some(p), _) if p == path => Ok(()), + (Some(p), _) => Err(Error::other(format!( + "bijection conflict: entity_type {entity_type_id} is paired with {p:?}, refusing {path:?}" + ))), + (None, Some(t)) => Err(Error::other(format!( + "bijection conflict: {path:?} is paired with entity_type {t}, refusing {entity_type_id}" + ))), + (None, None) => { + state.path_by_type.insert(entity_type_id, path); + state.type_by_path.insert(path, entity_type_id); + Ok(()) + } + } + } + + /// The bijective derived view: `entity_type → NiblePath` + /// (`niblepath_of(entity_type)` in the identity plan / `NodeGuid` docs). + pub fn niblepath_of(&self, entity_type_id: u16) -> Option { + self.inner + .read() + .unwrap() + .path_by_type + .get(&entity_type_id) + .copied() + } + + /// The bijective reverse: `NiblePath → entity_type`. + pub fn entity_type_of(&self, path: NiblePath) -> Option { + self.inner.read().unwrap().type_by_path.get(&path).copied() + } + /// Export the registry to an OGIT-shaped TTL fragment for the named /// namespace. Used by the Lance ↔ OGIT round-trip and for fork PRs /// that promote schema-scanner suggestions back into the canonical @@ -402,11 +487,7 @@ impl OntologyRegistry { /// no bundle is registered yet at `g` — register the bundle first /// (typically via a `hydrate_*` glue function) and then declare the /// edge whitelist. 
- pub fn register_edge_types( - &self, - g: u32, - edges: &[&str], - ) -> std::result::Result<(), String> { + pub fn register_edge_types(&self, g: u32, edges: &[&str]) -> std::result::Result<(), String> { let mut state = self.inner.write().unwrap(); let bundle = state .bundles @@ -473,7 +554,29 @@ impl RegistryState { }; let kind = proposal.schema_kind(); - let entity_type_id = (self.rows.len() + 1) as u16; + // DECISION-3 mint discipline (frugal north-star): the canonical class + // URI is the template identity — a URI already in the dictionary + // REUSES its entity_type (new row for another bridge / namespace, + // SAME template id; `(namespace, entity_type)` = (domain, shared + // shape)). Fresh mints stay global append-order and monotone: a + // deduped row still grows `rows`, so `rows.len()+1` strictly exceeds + // every previously minted id — never renumbered, never reused, gaps + // allowed (protobuf-field-number discipline). + let entity_type_id = match self + .by_uri + .get(proposal.ogit_uri.as_str()) + .map(|idx| self.rows[*idx as usize].schema_ptr.entity_type_id()) + { + Some(existing) => existing, + None => { + // Id 0 is the "unknown" sentinel; u16 is the ratified ClassId + // width (OD-CLASSID-WIDTH) — refuse to wrap, never alias. + if self.rows.len() >= u16::MAX as usize { + return AppendOutcome::Failed("entity_type overflow (u16)".to_string()); + } + (self.rows.len() + 1) as u16 + } + }; // Codex P1 fix (2026-05-07): the previous code constructed // SchemaPtr::new(...) 
with the default ontology_context_id = 0, // which left every registry-created row with ctx_id 0 — making @@ -670,4 +773,131 @@ mod tests { assert_eq!(reg.namespace_id("A").unwrap().raw(), 1); assert_eq!(reg.namespace_id("B").unwrap().raw(), 2); } + + // ── DECISION-3: frugal north-star mint (dedup + bijection) ────────────── + + fn proposal_from(uri: &str, bridge: &str, namespace: &str) -> MappingProposal { + let mut p = proposal(uri); + p.bridge_id = bridge.to_string(); + p.namespace = namespace.to_string(); + p.checksum = format!("checksum-{uri}-{bridge}"); + p + } + + #[test] + fn same_uri_across_bridges_and_namespaces_shares_one_template_id() { + // The north-star model in one test: three domains map the SAME + // canonical class → three rows, three namespaces, ONE entity_type. + // `(namespace, entity_type)` = (domain, shared template). + let reg = OntologyRegistry::new_in_memory(); + let a = reg.append_mapping(proposal("ogit.Person:Person")).unwrap(); + let b = reg + .append_mapping(proposal_from("ogit.Person:Person", "medcare", "Health")) + .unwrap(); + let c = reg + .append_mapping(proposal_from("ogit.Person:Person", "odoo", "Odoo")) + .unwrap(); + assert_eq!(reg.len(), 3, "one row per bridge"); + assert_eq!( + a.schema_ptr.entity_type_id(), + b.schema_ptr.entity_type_id(), + "same canonical class ⇒ same template id" + ); + assert_eq!(b.schema_ptr.entity_type_id(), c.schema_ptr.entity_type_id()); + assert_ne!( + a.schema_ptr.namespace_id(), + b.schema_ptr.namespace_id(), + "domains stay distinct on the namespace axis" + ); + assert_ne!(b.schema_ptr.namespace_id(), c.schema_ptr.namespace_id()); + assert_eq!( + reg.rows_with_entity_type(a.schema_ptr.entity_type_id()) + .len(), + 3 + ); + } + + #[test] + fn fresh_mint_is_monotone_with_gaps() { + let reg = OntologyRegistry::new_in_memory(); + let a = reg.append_mapping(proposal("ogit.A:X")).unwrap(); // mint: 1 + let _dup = reg + .append_mapping(proposal_from("ogit.A:X", "woa", "Woa")) + .unwrap(); // dedup: 1 
(row 2) + let c = reg.append_mapping(proposal("ogit.B:Y")).unwrap(); // mint: 3 + assert_eq!(a.schema_ptr.entity_type_id(), 1); + assert_eq!( + c.schema_ptr.entity_type_id(), + 3, + "gap at 2: ids are monotone and never reused, gaps allowed" + ); + } + + #[test] + fn changed_checksum_reappend_keeps_the_template_id() { + // Re-proposing the same class with new content adds a row but NEVER + // re-mints the class id (pre-dedup this minted a fresh id and + // orphaned the old one — the anti-frugal latent bug). + let reg = OntologyRegistry::new_in_memory(); + let a = reg.append_mapping(proposal("ogit.Net:Ip")).unwrap(); + let mut p2 = proposal("ogit.Net:Ip"); + p2.checksum = "checksum-v2".to_string(); + let b = reg.append_mapping(p2).unwrap(); + assert_eq!(reg.len(), 2); + assert_eq!(a.schema_ptr.entity_type_id(), b.schema_ptr.entity_type_id()); + } + + #[test] + fn class_path_bijection_round_trips_both_ways() { + let reg = OntologyRegistry::new_in_memory(); + let a = reg.append_mapping(proposal("ogit.A:X")).unwrap(); + let b = reg.append_mapping(proposal("ogit.B:Y")).unwrap(); + let (ta, tb) = (a.schema_ptr.entity_type_id(), b.schema_ptr.entity_type_id()); + let pa = NiblePath::root(0x0).child(0x3); + let pb = NiblePath::root(0x1); + reg.register_class_path(ta, pa).unwrap(); + reg.register_class_path(tb, pb).unwrap(); + for (t, p) in [(ta, pa), (tb, pb)] { + assert_eq!(reg.niblepath_of(t), Some(p)); + assert_eq!(reg.entity_type_of(p), Some(t)); + assert_eq!( + reg.entity_type_of(reg.niblepath_of(t).unwrap()), + Some(t), + "entity_type_of ∘ niblepath_of = identity (Eineindeutigkeit)" + ); + } + reg.register_class_path(ta, pa) + .expect("re-registering the same pair is idempotent"); + } + + #[test] + fn class_path_bijection_conflicts_are_rejected() { + let reg = OntologyRegistry::new_in_memory(); + let a = reg.append_mapping(proposal("ogit.A:X")).unwrap(); + let b = reg.append_mapping(proposal("ogit.B:Y")).unwrap(); + let (ta, tb) = (a.schema_ptr.entity_type_id(), 
b.schema_ptr.entity_type_id()); + let pa = NiblePath::root(0x0).child(0x3); + reg.register_class_path(ta, pa).unwrap(); + assert!( + reg.register_class_path(ta, NiblePath::root(0x2)).is_err(), + "one type, one path" + ); + assert!( + reg.register_class_path(tb, pa).is_err(), + "one path, one type" + ); + assert!( + reg.register_class_path(999, NiblePath::root(0x4)).is_err(), + "unminted type cannot pair" + ); + assert!( + reg.register_class_path(tb, NiblePath::EMPTY).is_err(), + "EMPTY is the no-route sentinel" + ); + assert_eq!( + reg.niblepath_of(tb), + None, + "failed registrations leave no residue" + ); + } }