# Identity Architecture — What Exists vs What Needs Building (v1)

> **Status:** INTEGRATION MAP + PLAN. Grounded by first-hand reads + two parallel
> cross-repo sweeps (2026-06-09). Companion to
> `cognitive-write-roundtrip-substrate-v1.md` (the round-trip mechanism).
> **Branch:** `claude/nice-edison-g4rhhl`.

## Thesis

The hot path should carry a lean **128-bit structured immutable identity** (a
UUIDv8 = the HHTL nibble-address *formalized + namespaced*); heavy content stays
in consumer stores keyed by it. The identity does five jobs as register reads of
one object: **resolve** (class-from-address), **route** (delegate switch),
**witness** (immutable audit + merkle), **ground-truth** (shape_hash drift), and
**dispatch-to-store** (EntityKey → consumer). This doc maps what already exists
against what must be built, and phases the integration.

## Four headline findings (grounded)

1. **The 128-bit identity space is empty** — no committed `u128`/`Uuid`(binary)/
   `[u8;16]`-as-id exists (the single `[u8;16]`, `atoms.rs:74 I4x32`, is a
   thinking-style vector, doc-confirmed *not* an identity). A new GUID won't
   byte-collide. *(Agent A sweep, lance-graph + ndarray.)*

2. **But every GUID FIELD already exists as a committed scalar** → the iron
   mandate is **compose existing fields, do NOT re-invent**: `namespace` =
   `NamespaceId(u8)` inside `SchemaPtr.packed:u32 = [ns:8|entity_type:16|kind:8]`;
   `class/address` = `NiblePath` + `ClassId(u16)` + `EdgeRef{family:u8,local:u16}`;
   `shape_hash` = `StructuralSignature`; `local` = `EdgeRef.local`. A parallel
   re-pack duplicates ratified discriminators (`OD-CLASSID-WIDTH`,
   `I-VSA-IDENTITIES`). *(Agent A finding #2.)*

3. **The cross-store transport is ALREADY solved** — `EntityKey<'a>(pub &'a [u8])`
   (repository.rs:12) is an opaque length-agnostic key both consumer repos use;
   `smb-bridge::key_to_filter` already branches on length (12→ObjectId, else→
   `Bson::Binary`) on Mongo *and* Lance. A 16-byte GUID is "just another length"
   the tested plumbing handles. *(Agent B sweep.)*

4. **The cold path has NO stable structured identity today** — it keys nodes by
   bare `node_id:u32` (no edge id; `String` label + `HashMap<String,String>`
   props), the SPO hot path keys by a `u64` *content* `dn_hash` (not stable),
   `CogRecord` carries no id ("id is the external dn_hash"), and durable identity
   is ad-hoc `Uuid→String` (learning crate) + `OgitUri(String)`. **The structured
   identity fills a real gap** — provided it *subsumes* `SchemaPtr` + `EdgeRef`,
   never parallels them. *(Agent A finding #3.)*

## WHAT EXISTS — grounded inventory (7 layers, 0-6; file:line)

### Layer 0 — address / discriminator scalars (the GUID's fields)
| Type | Width (bits) | Role | Status | Evidence |
|---|---|---|---|---|
| `NiblePath{path:u64,depth:u8}` | 72 | HHTL tree address (basin/child/is_ancestor_of, 16ⁿ) | **[G]** | hhtl.rs |
| `SchemaPtr{packed:u32=[ns:8\|entity_type:16\|kind:8], ctx:u32}` | 64 | schema/type pointer | **[G]** | namespace.rs:119 |
| `NamespaceId(u8)` | 8 | OGIT namespace ordinal | **[G]** | namespace.rs:24 |
| `ClassId = u16` | 16 | per-row shape discriminator ("never a content hash") | **[G]** | class_view.rs:53 |
| `EntityTypeId = u16` | 16 | per-row object-type (Palantir) | **[G]** | ontology.rs:81 |
| `FieldMask(u64)` + `inherit` | 64 | presence bitmask, parent-OR-delta | **[G]** | class_view.rs:69,136 |
| `StructuralSignature` (shape_hash) | hash | "deterministic hash over property-id set" | **[G] type / [H] live-wire** | odoo_blueprint::class_signature |
| `EdgeRef{family:u8,local:u16}` | 24 | episodic HHTL family+local address | **[G]** | episodic_edges.rs:34 |

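The Layer 0 table gives the `SchemaPtr.packed` field order and widths but not the bit direction. The sketch below splits and rebuilds the word assuming `ns` is the most significant byte and `kind` the least significant. `pack_schema` and `unpack_schema` are hypothetical helpers, not the `namespace.rs` API.

```rust
/// Hypothetical helper; assumes packed = [ns:8 | entity_type:16 | kind:8], MSB first.
pub fn pack_schema(ns: u8, entity_type: u16, kind: u8) -> u32 {
    (u32::from(ns) << 24) | (u32::from(entity_type) << 8) | u32::from(kind)
}

/// Inverse of `pack_schema`: returns (ns, entity_type, kind).
pub fn unpack_schema(packed: u32) -> (u8, u16, u8) {
    let ns = (packed >> 24) as u8;
    let entity_type = ((packed >> 8) & 0xFFFF) as u16;
    let kind = (packed & 0xFF) as u8;
    (ns, entity_type, kind)
}
```

Every field is fixed-width, so the split costs two shifts and a mask. That is what lets N1 copy the word verbatim into GUID bits 0..32 instead of re-packing `ns`/`entity_type`/`kind` beside it.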
### Layer 1 — edge / handoff carriers (the LE "sound members")
| Type | Width (bits) | Role | Status |
|---|---|---|---|
| `EpisodicEdges64(u64)` = 4×EdgeRef, MRU promote/evict, `to_le_bytes` | 64 | AriGraph episodic edges | **[G]** episodic_edges.rs |
| `CausalEdge64(u64)` (NARS 10+10 ×1023) | 64 | baton/causal edge payload | **[G]** ndarray causal_diff.rs:153 |
| Baton `(target:u16, edge:u64)` | 80 | inter-mailbox handoff | **[G]** collapse_gate.rs:235 |
| `MailboxId=u32`, `MailboxRow{mailbox_ref:u32,row_idx:u32}` | 32/64 | mailbox + row address | **[G]** |

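The `CausalEdge64` row summarizes its truth payload as "NARS 10+10 ×1023": frequency and confidence are each stored as a 10-bit integer on a 0..=1023 scale. The sketch below shows that quantization under two assumptions:

- values are rounded to the nearest step;
- the pair sits in the low 20 bits of the word.

The real bit positions inside `CausalEdge64` are not given here, so `pack_fc` and `unpack_fc` are hypothetical. With rounding, the decode error stays at or below 0.5/1023, which is inside the 1/1023 tolerance that Phase D's DoD uses.

```rust
const SCALE: f64 = 1023.0;

/// Quantize a value in [0, 1] to 10 bits (0..=1023), rounding to nearest.
fn quantize(x: f64) -> u64 {
    (x.clamp(0.0, 1.0) * SCALE).round() as u64
}

/// Hypothetical placement: frequency in bits 10..20, confidence in bits 0..10.
pub fn pack_fc(freq: f64, conf: f64) -> u64 {
    (quantize(freq) << 10) | quantize(conf)
}

/// Decode (frequency, confidence) back to f64.
pub fn unpack_fc(word: u64) -> (f64, f64) {
    let f = ((word >> 10) & 0x3FF) as f64 / SCALE;
    let c = (word & 0x3FF) as f64 / SCALE;
    (f, c)
}
```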
### Layer 2 — cold-path stores (TODAY: thin + inconsistent)
| Store | Key | Status |
|---|---|---|
| `MetadataStore`: `NodeRecord{node_id:u32, label:String, properties:HashMap<String,String>}`, `EdgeRecord{source:u32,target:u32,edge_type:String}` | u32 + **STRING label/props (legacy Cypher)** | **[G]** metadata.rs:60,86 |
| `SpoStore`: `HashMap<u64 dn_hash, SpoRecord>` | u64 **content-hash** (not stable id) | **[G]** spo/store.rs:38 |
| ndarray `CogRecord{meta,cam,btree,embed}` | **no id** ("id is external dn_hash") | **[G]** cogrecord.rs:56 |
| `WitnessId(u64)` (arigraph witness) | u64 opaque handle | **[G]** witness_corpus.rs:63 |

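The "content-hash (not stable id)" note on `SpoStore` comes down to one fact: the key is computed from the content, so any edit moves the record to a new key. The sketch below shows this in a few lines. It uses std's `DefaultHasher` as a stand-in for whatever function computes `dn_hash`; that substitution is an assumption, since the real hash is not shown here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a content-derived key like `dn_hash`: a hash of the content itself.
pub fn content_key(label: &str, props: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    label.hash(&mut h);
    props.hash(&mut h);
    h.finish()
}
```

Editing one property produces a different key, so every reference held under the old key dangles. A write-once identity keeps the key fixed and lets the content version underneath it.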
### Layer 3 — resolution (class-from-address)
| Surface | Status |
|---|---|
| `RegistryClassView: ClassView` (fields/template/dolce_category_id) | **[G] resolve / [H] field-enum deferred** class_resolver.rs |
| `OntologyRegistry`: `resolve_uri`, `enumerate_first_with_entity_type_id(u16)`, `resolve_iri_in` | **[G]** registry.rs |

### Layer 4 — commit + witness (the membrane)
| Surface | Status |
|---|---|
| `SoaEnvelope` trait + `ColumnDescriptor` (container-LE geometry) | **[G] trait / [H] ZERO impls** soa_envelope.rs |
| `MailboxSoaView`/`MailboxSoaOwner` (read airgap + Rubicon `try_advance_phase`) | **[G]** soa_view.rs |
| `commit_event` sole-writer + `ExternalMembrane::project` + `CommitFilter`/`MembraneGate` | **[G]** lance_membrane.rs:315 |
| `CognitiveEventRow` (scalar audit event — VSA stripped) | **[G]** external_intent.rs:113 |
| `MerkleRoot(u64)` ×3 (audit/SPO/unified) + `AuditSink` (jsonl/lance) | **[G]** audit_sink/, merkle.rs |
| `SlaPolicy`, `TenantScope` | **[G] types** sla.rs |

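`SoaEnvelope` has zero implementors today (gap N3), and Phase C's DoD is `as_le_bytes().as_ptr()==backing`. The sketch below shows what "zero-copy" means for that check. It uses two hypothetical stand-ins:

- `LeEnvelope` in place of the real `SoaEnvelope` trait;
- a toy `Mailbox` holding a single `u16` column.

The real trait's methods and the `ColumnDescriptor` shape are not reproduced. Because the column is stored as little-endian bytes, handing it to the cold path is a borrow of the same allocation rather than a re-encode.

```rust
/// Hypothetical stand-in for the `SoaEnvelope` surface: expose the backing bytes as-is.
pub trait LeEnvelope {
    fn as_le_bytes(&self) -> &[u8];
}

/// Toy SoA with one little-endian `u16` column (e.g. a class_id column).
pub struct Mailbox<const N: usize> {
    backing: Vec<u8>, // N rows x 2 bytes, little-endian
}

impl<const N: usize> Mailbox<N> {
    pub fn new() -> Self {
        Self { backing: vec![0u8; N * 2] }
    }
    pub fn set(&mut self, row: usize, value: u16) {
        self.backing[row * 2..row * 2 + 2].copy_from_slice(&value.to_le_bytes());
    }
    pub fn get(&self, row: usize) -> u16 {
        u16::from_le_bytes([self.backing[row * 2], self.backing[row * 2 + 1]])
    }
    pub fn backing_ptr(&self) -> *const u8 {
        self.backing.as_ptr()
    }
}

impl<const N: usize> LeEnvelope for Mailbox<N> {
    fn as_le_bytes(&self) -> &[u8] {
        &self.backing // a borrow, no copy: the mailbox bytes are the cold bytes
    }
}
```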
### Layer 5 — cross-store transport (the consumer boundary)
| Surface | Status |
|---|---|
| `EntityKey<'a>(pub &'a [u8])` — opaque length-agnostic key | **[G]** repository.rs:12 |
| `EntityStore`/`EntityWriter`/`Batch` traits | **[G]** repository.rs |
| `smb-bridge`: implements both for Mongo+Lance, `key_to_filter` length-branch | **[G]** smb-bridge/mongo.rs:79, lance.rs:92 |
| MedCare-rs: MySQL i64 PKs; DMS `sha256`(NOT NULL)+`storage_key`; imports EntityKey | **[G]** dms.rs:14, graph_contract.rs:31 |
| smb-office-rs: Mongo `ObjectId`(12B) + `String` refs; actively impls repository | **[G]** base.rs:92 |

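Finding 3 rests on `key_to_filter` branching on key length: 12 bytes becomes an `ObjectId`, anything else becomes `Bson::Binary`. The sketch below mirrors that branch with a local enum instead of the `bson` crate types. It illustrates the dispatch rule only and is not the `smb-bridge` source; `KeyFilter` is hypothetical. `EntityKey` is copied from the signature quoted above.

```rust
use std::convert::TryFrom;

/// As quoted in Layer 5: an opaque, length-agnostic key.
pub struct EntityKey<'a>(pub &'a [u8]);

/// Hypothetical stand-in for the two filter shapes.
#[derive(Debug, PartialEq)]
pub enum KeyFilter {
    ObjectId([u8; 12]),
    Binary(Vec<u8>),
}

/// Length branch: exactly 12 bytes is an ObjectId; every other length is opaque binary.
pub fn key_to_filter(key: &EntityKey<'_>) -> KeyFilter {
    match <[u8; 12]>::try_from(key.0) {
        Ok(oid) => KeyFilter::ObjectId(oid),
        Err(_) => KeyFilter::Binary(key.0.to_vec()),
    }
}
```

A 16-byte GUID takes the `Binary` arm unchanged. That is why N7 needs no new transport code on the smb side.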
### Layer 6 — round-trip / substrate-hardening
| Surface | Status |
|---|---|
| `TripletProjection` trait + `roundtrip_eq` → `RoundTripFailure` | **[G]** codegen_spine.rs:107 |
| cognitive-write projection (mailbox SoA → SPO+edges) | **[H] does not exist** |

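Layer 6's `TripletProjection` + `roundtrip_eq` → `RoundTripFailure` pattern is what N4 extends to the identity graph. The sketch below shows its shape with hypothetical signatures; the real trait in `codegen_spine.rs` is not reproduced. A value is projected to triples, decompiled back, and `roundtrip_eq` reports what went wrong instead of returning a bare boolean. The toy `Node` type is also an illustration only.

```rust
/// A (subject, predicate, object) triple over a 16-byte identity and a u64 payload.
#[derive(Clone, Debug, PartialEq)]
pub struct Triple {
    pub s: [u8; 16],
    pub p: &'static str,
    pub o: u64,
}

#[derive(Debug, PartialEq)]
pub struct RoundTripFailure {
    pub detail: String,
}

/// Hypothetical stand-in for `TripletProjection`.
pub trait TripletProjection: Sized + PartialEq + std::fmt::Debug {
    fn project(&self) -> Vec<Triple>;
    fn decompile(triples: &[Triple]) -> Option<Self>;
}

/// project -> decompile -> compare; Err carries the reason.
pub fn roundtrip_eq<T: TripletProjection>(value: &T) -> Result<(), RoundTripFailure> {
    let triples = value.project();
    match T::decompile(&triples) {
        None => Err(RoundTripFailure {
            detail: format!("decompile failed on {} triples", triples.len()),
        }),
        Some(back) if &back == value => Ok(()),
        Some(back) => Err(RoundTripFailure { detail: format!("{:?} != {:?}", back, value) }),
    }
}

/// Toy node: an identity plus one causal edge word.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub guid: [u8; 16],
    pub edge: u64,
}

impl TripletProjection for Node {
    fn project(&self) -> Vec<Triple> {
        vec![Triple { s: self.guid, p: "causal_edge", o: self.edge }]
    }
    fn decompile(triples: &[Triple]) -> Option<Self> {
        let t = triples.iter().find(|t| t.p == "causal_edge")?;
        Some(Node { guid: t.s, edge: t.o })
    }
}
```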
## WHAT NEEDS BUILDING — 8 gaps (each: what it REUSES [G] + what it ADDS [H])

| # | Gap | Reuses (exists [G]) | Adds [H] | Blocked? |
|---|---|---|---|---|
| **N1** | **`NodeGuid`/`EdgeGuid`** 128-bit identity type | `SchemaPtr` ⊕ `NiblePath` ⊕ `StructuralSignature` ⊕ `EdgeRef.local` | the UUIDv8 composition + layout version + the 5 readings | no |
| **N2** | wire `StructuralSignature` into live `RegistryClassView` | `StructuralSignature` type, `ClassView` | the field-enum from `MappingRow` (the deferred D-CLS audit) | no |
| **N3** | `SoaEnvelope` **implementor** for `MailboxSoA<N>` | `SoaEnvelope` trait, `MailboxSoaView` | the zero-copy impl (mailbox bytes == cold bytes) | no |
| **N4** | cognitive-write `TripletProjection` + `roundtrip_eq` | `TripletProjection`, `EpisodicEdges64`/`CausalEdge64` `to_le_bytes` | the project/decompile over the identity graph | no |
| **N5** | `project_graph` emitter through the gate | `commit_event`, `CommitFilter`/`MembraneGate`, `ExternalMembrane` | the node/edge projection (today emits scalar `CognitiveEventRow`) | no |
| **N6** | **`MetadataStore` string→identity migration** | `MetadataStore`, `EntityKey` | `NodeRecord`/`EdgeRecord` keyed by `NodeGuid` not `String` label/props | no (I-LEGACY-API gated) |
| **N7** | GUID-as-`EntityKey` wiring + MedCare `external_ref` | `EntityKey`, `EntityStore`/`EntityWriter`, smb `key_to_filter` | pass 16-byte key + **one** MedCare column (or reuse `sha256`) | no |
| **N8** | surreal_container SurrealQL read glove | `surreal_container` skeleton | the kv-lance read path | **BLOCKED(C)** fork coords |

**Only N8 is blocked.** N1-N7 need no surrealdb coords.

## N1 — the identity type as a COMPOSITION (the iron mandate from Agent A #2)

```rust
// crates/lance-graph-contract/src/identity.rs (NEW, zero-dep)
// EVERY field is an existing committed type. No re-invention.

/// 128-bit immutable structured node identity (UUIDv8, RFC 9562).
/// Frozen at write; the class is RE-RESOLVED from the address (never stored mutable).
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeGuid([u8; 16]);
// Bit positions are MSB-first (RFC 9562 numbering: bit 0 = top bit of byte 0).
// bits   0..32  : SchemaPtr.packed [ns:8 | entity_type:16 | kind:8]   ← REUSE namespace.rs:119
// bits  32..74  : NiblePath prefix (path bits + small depth)          ← REUSE hhtl.rs
//                 42 positions minus the 6 RFC bits below = 36 usable bits
// bits  74..98  : StructuralSignature (shape_hash, truncated to 24 b) ← REUSE odoo_blueprint
// bits  98..122 : local instance (EdgeRef.local widened u16 → 24 b)   ← REUSE episodic_edges
// bits 122..128 : spare (6 b, unassigned)
// bits  48..52  : version = 8 · bits 64..66 : variant = 0b10          ← RFC 9562 reserved (6 b),
//                 carved out of the NiblePath range above

/// 128-bit edge identity: source address ⊕ the episodic EdgeRef.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct EdgeGuid([u8; 16]);
// = [ source SchemaPtr/NiblePath | EdgeRef{family:u8, local:u16} | shape_hash ] ← REUSE EpisodicEdges64
```

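The sketch below exercises Phase A's DoD for the layout above: byte-decompose round-trips, and UUIDv8 validates. It rests on several assumptions:

- bit numbering is MSB-first, as in RFC 9562;
- the 36-bit path prefix is split into three pieces around the version and variant bits;
- `compose` and the accessor names are hypothetical;
- inputs are plain integers so the block stays zero-dep. In the contract crate they would be the existing `SchemaPtr`, `NiblePath`, `StructuralSignature` and `EdgeRef` types.

`compose` returns `None` when a field does not fit its slot rather than truncating silently.

```rust
/// MSB-first (start, width) bit ranges for the N1 layout.
const SCHEMA: (u32, u32) = (0, 32);
const PATH_HI: (u32, u32) = (32, 16); // path prefix bits 35..20
const VERSION: (u32, u32) = (48, 4);
const PATH_MID: (u32, u32) = (52, 12); // path prefix bits 19..8
const VARIANT: (u32, u32) = (64, 2);
const PATH_LO: (u32, u32) = (66, 8); // path prefix bits 7..0
const SHAPE: (u32, u32) = (74, 24);
const LOCAL: (u32, u32) = (98, 24);
// bits 122..128 are left zero (spare)

/// OR `value`, masked to `width` bits, into the MSB-first slot starting at `start`.
fn put(acc: u128, (start, width): (u32, u32), value: u64) -> u128 {
    let shift = 128 - start - width;
    acc | ((u128::from(value) & ((1u128 << width) - 1)) << shift)
}

/// Read the MSB-first slot back out.
fn get(raw: u128, (start, width): (u32, u32)) -> u64 {
    let shift = 128 - start - width;
    ((raw >> shift) & ((1u128 << width) - 1)) as u64
}

#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeGuid([u8; 16]);

impl NodeGuid {
    pub const PATH_BITS: u32 = 36;

    /// Compose from existing field values; `None` if any field overflows its slot.
    pub fn compose(schema_packed: u32, path_prefix: u64, shape24: u32, local24: u32) -> Option<Self> {
        if path_prefix >> Self::PATH_BITS != 0 || shape24 >> 24 != 0 || local24 >> 24 != 0 {
            return None;
        }
        let mut raw = 0u128;
        raw = put(raw, SCHEMA, u64::from(schema_packed));
        raw = put(raw, PATH_HI, path_prefix >> 20);
        raw = put(raw, PATH_MID, path_prefix >> 8); // put() masks to 12 bits
        raw = put(raw, PATH_LO, path_prefix); // put() masks to 8 bits
        raw = put(raw, VERSION, 0x8);
        raw = put(raw, VARIANT, 0b10);
        raw = put(raw, SHAPE, u64::from(shape24));
        raw = put(raw, LOCAL, u64::from(local24));
        Some(NodeGuid(raw.to_be_bytes()))
    }

    fn raw(&self) -> u128 { u128::from_be_bytes(self.0) }
    pub fn as_bytes(&self) -> &[u8; 16] { &self.0 }
    pub fn schema_packed(&self) -> u32 { get(self.raw(), SCHEMA) as u32 }
    pub fn path_prefix(&self) -> u64 {
        let r = self.raw();
        (get(r, PATH_HI) << 20) | (get(r, PATH_MID) << 8) | get(r, PATH_LO)
    }
    pub fn shape_hash(&self) -> u32 { get(self.raw(), SHAPE) as u32 }
    pub fn local(&self) -> u32 { get(self.raw(), LOCAL) as u32 }
    /// UUIDv8 check: version nibble = 8, variant bits = 0b10.
    pub fn is_uuid_v8(&self) -> bool {
        let r = self.raw();
        get(r, VERSION) == 0x8 && get(r, VARIANT) == 0b10
    }
}
```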
**The five readings (register reads of one key):**
- **resolve** `guid.schema_ptr() → entity_type → ClassView` (class-from-address, O(1) bit-shift + cache)
- **route** `guid.niblepath().is_ancestor_of(...)` → delegate switch (HHTL bit-shift, through `OrchestrationBridge`)
- **witness** frozen `[u8;16]` + `MerkleRoot` chain (immutable, examined-in-place)
- **ground-truth** `guid.shape_hash() != resolve(addr).shape_hash_now` → drift (read-time diff)
- **dispatch-to-store** `EntityKey(guid.as_bytes())` → consumer (Layer-5 transport, already [G])

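The sketch below shows three of the five readings (resolve, ground-truth, dispatch-to-store) as plain reads of one key. Its assumptions:

- it uses the N1 bit positions (bits 0..32 for `SchemaPtr.packed`, 74..98 for the truncated shape_hash);
- `ns` is the top byte of `SchemaPtr.packed`;
- `Registry` is a hypothetical `HashMap` stand-in for `RegistryClassView`.

`drift` returns `None` for an unknown class rather than guessing. Nothing is ever written back to the key, which matches the "read-time diff" framing.

```rust
use std::collections::HashMap;

/// As quoted in Layer 5.
pub struct EntityKey<'a>(pub &'a [u8]);

/// Minimal read-only view over the N1 bytes (MSB-first layout).
pub struct NodeGuid(pub [u8; 16]);

impl NodeGuid {
    fn raw(&self) -> u128 { u128::from_be_bytes(self.0) }
    /// bits 0..32: SchemaPtr.packed.
    pub fn schema_packed(&self) -> u32 { (self.raw() >> 96) as u32 }
    /// resolve: assumes ns is the top byte, so entity_type is the middle 16 bits.
    pub fn entity_type(&self) -> u16 { ((self.schema_packed() >> 8) & 0xFFFF) as u16 }
    /// bits 74..98: truncated shape_hash.
    pub fn shape_hash(&self) -> u32 { ((self.raw() >> 30) & 0xFF_FFFF) as u32 }
    /// dispatch-to-store: the same 16 bytes become the consumer key.
    pub fn entity_key(&self) -> EntityKey<'_> { EntityKey(&self.0[..]) }
}

/// Hypothetical resolver stand-in: entity_type -> current (truncated) shape_hash.
pub struct Registry {
    pub shape_now: HashMap<u16, u32>,
}

/// ground-truth: Some(true) = drift, Some(false) = in sync, None = unknown class.
pub fn drift(guid: &NodeGuid, reg: &Registry) -> Option<bool> {
    let now = reg.shape_now.get(&guid.entity_type())?;
    Some(guid.shape_hash() != *now)
}
```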
**Immutability law (ratified this session):** `class_id` never updates — it's the
lineage id, re-resolved from the address for free; the GUID is write-once; drift
*repair* is a **new immutable version** (Lance is versioned), never an in-place
mutation. `I-VSA-IDENTITIES` Test 0: the GUID is a register key (points to
content), never VSA-bundled.

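The immutability law can be illustrated with an append-only version list. The GUID is fixed at creation, and a drift repair pushes a new version instead of editing an old one. The sketch below is a minimal in-memory model with hypothetical names (`VersionedNode`, `repair`). Lance's own dataset versioning is not modeled.

```rust
/// One immutable version of a node's content under a fixed identity.
#[derive(Clone, Debug, PartialEq)]
pub struct Version {
    pub number: u64,
    pub shape_hash: u32,
}

/// Write-once GUID plus an append-only history; there is no setter for `guid`.
pub struct VersionedNode {
    guid: [u8; 16],
    history: Vec<Version>,
}

impl VersionedNode {
    pub fn new(guid: [u8; 16], shape_hash: u32) -> Self {
        Self { guid, history: vec![Version { number: 1, shape_hash }] }
    }
    pub fn guid(&self) -> &[u8; 16] { &self.guid }
    pub fn current(&self) -> &Version {
        self.history.last().expect("history is never empty")
    }
    /// Drift repair: append version n+1; earlier versions are never touched.
    pub fn repair(&mut self, shape_now: u32) -> &Version {
        let next = Version { number: self.current().number + 1, shape_hash: shape_now };
        self.history.push(next);
        self.current()
    }
    pub fn history(&self) -> &[Version] { &self.history }
}
```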
### ⚠ One open DECISION (yours to pin — both grounded, bijective)
The class can be carried two ways; pick the **stored** form and resolve the other:
- **(D1) `SchemaPtr.entity_type:u16`** — reuse the existing dense pointer (Agent A "compose existing"). Compact, exact.
- **(D2) `NiblePath` prefix** — identity-IS-address (ADR-1374, your "nibble = the GUID class"). O(1) ancestry routing without a cache hit.
- **Recommendation:** store **SchemaPtr (exact) + a truncated NiblePath prefix (for routing)**. SchemaPtr resolves deep paths exactly; the prefix gives branchless `is_ancestor_of`. The prefix costs the 42-bit range 32..74 (36 usable bits once the RFC version/variant bits are carved out); worth it for probe-free routing.

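The routing benefit of the recommendation is that ancestry becomes a shift-and-compare on the prefix, with no registry probe. The sketch below assumes one plausible encoding: root-first nibbles, the newest nibble in the low 4 bits, and an explicit depth. The actual `NiblePath` encoding in hhtl.rs may differ, and `NibblePrefix`/`truncate` are hypothetical. `truncate(9)` models the 36-bit prefix (9 nibbles × 4 bits). Ancestry here includes the node itself.

```rust
/// Hypothetical nibble path: `depth` nibbles, root-first, newest nibble in the low bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NibblePrefix {
    path: u64,
    depth: u8,
}

impl NibblePrefix {
    pub const MAX_DEPTH: u8 = 16; // 16 nibbles x 4 bits = 64

    pub fn root() -> Self {
        Self { path: 0, depth: 0 }
    }

    /// Descend one level (fan-out 16).
    pub fn child(self, nibble: u8) -> Self {
        assert!(nibble < 16 && self.depth < Self::MAX_DEPTH);
        Self { path: (self.path << 4) | u64::from(nibble), depth: self.depth + 1 }
    }

    /// Ancestor-or-self: drop the extra low nibbles of `other` and compare.
    pub fn is_ancestor_of(self, other: Self) -> bool {
        if self.depth > other.depth {
            return false;
        }
        let shift = 4 * u32::from(other.depth - self.depth);
        // checked_shr: a 64-bit shift (root vs depth 16) yields 0 instead of panicking
        other.path.checked_shr(shift).unwrap_or(0) == self.path
    }

    /// Keep only the top `max_depth` nibbles (e.g. 9 nibbles = 36 bits).
    pub fn truncate(self, max_depth: u8) -> Self {
        if self.depth <= max_depth {
            return self;
        }
        let shift = 4 * u32::from(self.depth - max_depth);
        Self { path: self.path.checked_shr(shift).unwrap_or(0), depth: max_depth }
    }
}
```

A truncated prefix answers ancestry only down to depth 9. Finer distinctions below that depth fall back to the exact SchemaPtr, which is the split the recommendation describes.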
## Phased integration plan (A→H; each phase = one landable PR)

| Phase | Gap | Crate | Deliverable | DoD | Dep |
|---|---|---|---|---|---|
| **A** | N1 | contract | `NodeGuid`/`EdgeGuid` as composition of existing fields + layout version | byte-decompose round-trips to `SchemaPtr`/`NiblePath`/`StructuralSignature`/`local`; UUIDv8 validates; zero-dep; clippy/fmt | — |
| **B** | N2 | ontology | wire `StructuralSignature` → `RegistryClassView` (enumerate field-set from `MappingRow`) | `shape_hash(class_id)` returns a stable signature; the deferred D-CLS field-enum closed | A |
| **C** | N3 | shader-driver | `impl SoaEnvelope for MailboxSoA<N>` (zero-copy) | `as_le_bytes().as_ptr()==backing`; `verify_layout()` green | — |
| **D** | N4 | lance-graph | cognitive-write `TripletProjection` + `roundtrip_eq` over the identity graph | passes the `account.move` fixture; corrupt-pack fails; (f,c) within 1/1023 | A, C |
| **E** | N5 | callcenter | `project_graph` (node/edge emitter) through `commit_event`+gate | committed cycle queryable as `NodeGuid` nodes + `EdgeGuid` edges; version ticks; RBAC applies | A, D |
| **F** | N6 | lance-graph core | `MetadataStore` string→identity: `NodeRecord`/`EdgeRecord` keyed by `NodeGuid` (label/props → resolved-from-identity) | old string path feature-gated/migrated; field-isolation tests (I-LEGACY-API); query parity | A, B, E |
| **G** | N7 | consumers | GUID-as-`EntityKey`(16B) + MedCare `external_ref` (or `sha256` reuse) | smb: 16-byte key resolves via existing `key_to_filter`; MedCare: GUID→row reverse lookup | A |
| **H** | N8 | surreal_container | SurrealQL read glove | DEFERRED — **BLOCKED(C)** fork coords | E |

**Critical path:** (A and C in parallel) → D → E → F; B (after A) joins at F. G hangs off A (parallel). H is gated.
**Smallest unblocked first brick:** Phase A (the `NodeGuid` composition, zero-dep contract) OR Phase C (the `SoaEnvelope` impl): both are leaves, and both are needed by D.

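The `Dep` column of the phase table is small enough to check mechanically. The sketch below transcribes that column and computes, for each phase, the number of phases on the longest dependency chain ending there. It is a checking aid written for this note, not project tooling.

```rust
use std::collections::HashMap;

/// Dep column of the phase table: (phase, phases it depends on).
const DEPS: &[(char, &str)] = &[
    ('A', ""),
    ('B', "A"),
    ('C', ""),
    ('D', "AC"),
    ('E', "AD"),
    ('F', "ABE"),
    ('G', "A"),
    ('H', "E"),
];

/// Number of phases on the longest dependency chain ending at `phase` (memoized).
fn chain_len(phase: char, memo: &mut HashMap<char, usize>) -> usize {
    if let Some(&n) = memo.get(&phase) {
        return n;
    }
    let deps = DEPS.iter().find(|(p, _)| *p == phase).map_or("", |(_, d)| *d);
    let n = 1 + deps.chars().map(|d| chain_len(d, memo)).max().unwrap_or(0);
    memo.insert(phase, n);
    n
}
```

F comes out at 4 phases (via D and E), while B only reaches 2. B therefore has slack until F starts, which is why it sits beside the critical path rather than on it.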
## Honest ledger

- **[G] (exists, reuse):** the [G] rows of all 7 layers above: `NiblePath`, `SchemaPtr`, `ClassId`,
  `StructuralSignature` (type), `EdgeRef`, `EpisodicEdges64`/`CausalEdge64` LE,
  `commit_event`+gate, `MerkleRoot`+`AuditSink`, `SlaPolicy`/`TenantScope`,
  `EntityKey`+`EntityStore`/`EntityWriter`, `TripletProjection`. **The substrate is
  ~80% present.**
- **[H] (build):** N1-N7, but each is a *composition/wiring* of [G] parts, not a
  green-field invention. The largest is N6 (cold-path string→identity migration).
- **[BLOCKED(C)]:** N8 only (surrealdb fork coords: human gate; lance-graph P0
  "STOP and ask").
- **One open [DECISION]:** D1 vs D2 (SchemaPtr-entity_type vs NiblePath-prefix as
  the class carrier). Recommendation: both (exact + routing prefix).

## Guards (iron rules this plan must not violate)

- **I-VSA-IDENTITIES:** the GUID is a register key that POINTS TO content; never
  VSA-bundle it, never intern open content (only the closed vocabulary). Identities
  intern; scanned papers / free text stay in consumer stores (Layer 5).
- **Compose, don't parallel (Agent A #2):** N1 MUST subsume `SchemaPtr` +
  `EdgeRef`, not re-pack ns/class/family beside them.
- **I-LEGACY-API-FEATURE-GATED:** N6's string→identity layout reclaim needs a
  version gate + field-isolation matrix tests.
- **Sole-writer / no-&mut-during-compute:** N5 reads SoA (`&self`), builds owned
  identity rows, `commit_event` is the gated write-back; drift *repair* is a new
  version, never in-place mutation (the immutability law).
- **AGI-as-SoA:** the GUID is per-NODE at the membrane, NOT a 16-byte-per-row SoA
  column (the hot SoA keeps its lean `u16 class_id`).

## Provenance

First-hand reads (2026-06-09): hhtl.rs · soa_envelope.rs · soa_view.rs ·
class_resolver.rs · class_view.rs · episodic_edges.rs · metadata.rs:60-94 ·
registry.rs · namespace.rs · wikidata_hhtl.rs · lance_membrane.rs:315-429 ·
external_intent.rs:113 · sla.rs · codegen_spine.rs · atoms.rs:74 · audit_sink/.
Cross-repo sweeps: Agent A (lance-graph + ndarray identity-type inventory) ·
Agent B (MedCare-rs + smb-office-rs store keys — `EntityKey`, MySQL i64 / Mongo
ObjectId, DMS `sha256`/`storage_key`). Companion:
`cognitive-write-roundtrip-substrate-v1.md`.