|
| 1 | +# Metadata Schema Reference |
| 2 | + |
| 3 | +> **Generated**: 2026-03-17 |
| 4 | +> **Source of truth**: `src/container/meta.rs`, `src/width_16k/schema.rs` |
| 5 | +> **Schema version**: 1 (`meta.rs:93`) |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Overview |
| 10 | + |
| 11 | +Ladybug-rs uses a two-tier metadata system: |
| 12 | + |
| 13 | +1. **MetaView** (128 words = 1,024 bytes) — full-resolution canonical metadata in Container 0 |
| 14 | +2. **SchemaSidecar** (32 words = 256 bytes) — compact summary in words 224-255 |
| 15 | + |
| 16 | +Container 0 is **never searched** by Hamming distance. It holds structural information only. |
| 17 | +Content containers (1+) hold fingerprints for SIMD operations. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## MetaView — Container 0 Layout (W0-W127) |
| 22 | + |
| 23 | +Source: `src/container/meta.rs:1-93` |
| 24 | + |
| 25 | +```text |
| 26 | +Word(s) Offset Content |
| 27 | +───────── ─────── ───────────────────────────────────────────────── |
| 28 | +W0 0 PackedDn address (THE identity, u64) |
| 29 | +W1 8 node_kind:u8 | container_count:u8 | geometry:u8 |
| 30 | + | flags:u8 | schema_version:u16 | provenance_hash:u16 |
| 31 | +W2 16 Timestamps: created_ms:u32 | modified_ms:u32 |
| 32 | +W3 24 label_hash:u32 | tree_depth:u8 | branch:u8 | reserved:u16 |
| 33 | +W4-7 32 NARS: freq:f32 | conf:f32 | pos_evidence:f32 | neg_evidence:f32 |
| 34 | +W8-11 64 DN rung + 7-Layer compact + collapse gate |
| 35 | +W12-15 96 7-Layer markers (5 bytes x 7 = 35 bytes) |
| 36 | +W16-31 128 Inline edges (64 packed, 4 per word) |
| 37 | +W32-39 256 RL / Q-values / rewards |
| 38 | +W40-47 320 Bloom filter (512 bits) |
| 39 | +W48-55 384 Graph metrics (full precision f64) |
| 40 | +W56-63 448 Qualia (18 channels x f16 + 8 slots) |
| 41 | +W64-79 512 Rung history + collapse gate history |
| 42 | +W80-95 640 Representation language descriptor |
| 43 | +W96-111 768 DN-Sparse adjacency (compact inline CSR) |
| 44 | +W112-125 896 Reserved |
| 45 | +W126-127 1008 Checksum (CRC32:u32 | parity:u32) + schema version |
| 46 | +``` |
| 47 | + |
| 48 | +### Word Constants (`meta.rs:35-93`) |
| 49 | + |
| 50 | +| Constant | Value | Purpose | |
| 51 | +|-------------------|-------|-------------------------------------------| |
| 52 | +| `W_DN_ADDR` | 0 | PackedDn address | |
| 53 | +| `W_TYPE` | 1 | Record type + geometry | |
| 54 | +| `W_TIME` | 2 | Timestamps | |
| 55 | +| `W_LABEL` | 3 | Label hash + tree metadata | |
| 56 | +| `W_NARS_BASE` | 4 | NARS truth values (4 words) | |
| 57 | +| `W_DN_RUNG` | 8 | DN rung + 7-layer compact | |
| 58 | +| `W_LAYER_BASE` | 12 | 7-layer markers | |
| 59 | +| `W_EDGE_BASE` | 16 | Inline edges start | |
| 60 | +| `W_EDGE_END` | 31 | Inline edges end | |
| 61 | +| `W_RL_BASE` | 32 | Reinforcement learning data | |
| 62 | +| `W_BLOOM_BASE` | 40 | Bloom filter (512 bits) | |
| 63 | +| `W_GRAPH_BASE` | 48 | Graph metrics | |
| 64 | +| `W_QUALIA_BASE` | 56 | Qualia channels | |
| 65 | +| `W_RUNG_HIST` | 64 | Rung + collapse gate history | |
| 66 | +| `W_REPR_BASE` | 80 | Representation language descriptor | |
| 67 | +| `W_ADJ_BASE` | 96 | DN-Sparse adjacency (inline CSR) | |
| 68 | +| `W_RESERVED` | 112 | Reserved | |
| 69 | +| `W_CHECKSUM` | 126 | Checksum + version | |
| 70 | +| `MAX_INLINE_EDGES` | 64 | Maximum edges stored inline | |
| 71 | +| `SCHEMA_VERSION` | 1 | Current schema version | |
| 72 | + |
| 73 | +### MetaView API (`meta.rs:99+`) |
| 74 | + |
| 75 | +```rust |
| 76 | +pub struct MetaView<'a> { /* zero-copy borrow of &'a [u64; 128] */ } |
| 77 | +pub struct MetaViewMut<'a> { /* mutable borrow */ } |
| 78 | +``` |
| 79 | + |
| 80 | +Both provide zero-copy access to the word layout above. No allocation on read. |
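
The zero-copy claim can be illustrated with a minimal sketch. The type name `MetaViewSketch`, its accessor names, and the assumption that the W1 fields are packed from the least-significant byte upward are hypothetical and not taken from `meta.rs`:

```rust
/// Word indices from the constants table above.
const W_DN_ADDR: usize = 0;
const W_TYPE: usize = 1;

/// Hypothetical read-only view: borrows the 128 words and never copies them.
pub struct MetaViewSketch<'a> {
    words: &'a [u64; 128],
}

impl<'a> MetaViewSketch<'a> {
    pub fn new(words: &'a [u64; 128]) -> Self {
        Self { words }
    }

    /// W0 holds the PackedDn address as a raw u64.
    pub fn dn_addr(&self) -> u64 {
        self.words[W_DN_ADDR]
    }

    /// Assumption: node_kind occupies bits 0-7 of W1.
    pub fn node_kind(&self) -> u8 {
        (self.words[W_TYPE] & 0xFF) as u8
    }

    /// Assumption: schema_version occupies bits 32-47 of W1,
    /// directly after the four u8 fields.
    pub fn schema_version(&self) -> u16 {
        ((self.words[W_TYPE] >> 32) & 0xFFFF) as u16
    }
}
```

Every accessor is a shift and a mask on a borrowed word, so reading a field performs no heap allocation.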
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +## SchemaSidecar — Compact Summary (W224-W255) |
| 85 | + |
| 86 | +Source: `src/width_16k/schema.rs:1-299` |
| 87 | + |
| 88 | +The SchemaSidecar packs identity, reasoning, learning, and topology into |
| 89 | +W224-W255 (blocks 14 and 15), the top 32 words of the 16K-bit record. It is a |
| 90 | +compressed snapshot that can be deserialized quickly without parsing the full MetaView. |
| 91 | + |
| 92 | +### Block 14: Identity + Reasoning + Learning (W224-W239) |
| 93 | + |
| 94 | +```text |
| 95 | +[224] depth:u8 | rung:u8 | qidx:u16 | access_count:u32 |
| 96 | +[225] ttl:u16 | sigma_q:u16 | node_type:u32 |
| 97 | +[226] label_hash:u64 |
| 98 | +[227] edge_type:u32 | version:u8 | reserved:u24 |
| 99 | +[228-229] ANI levels: 8 x u16 = 128 bits |
| 100 | +[230] NARS truth:u32 | budget_lo:u32 (priority + durability) |
| 101 | +[231] budget_hi:u32 (quality + reserved) | reserved:u32 |
| 102 | +[232-233] Q-values: 16 x i8 = 128 bits |
| 103 | +[234-235] Rewards: 8 x i16 = 128 bits |
| 104 | +[236-237] STDP: 8 x u16 = 128 bits |
| 105 | +[238-239] Hebbian: 8 x u16 = 128 bits |
| 106 | +``` |
| 107 | + |
| 108 | +### Block 15: Graph Topology + Edges (W240-W255) |
| 109 | + |
| 110 | +```text |
| 111 | +[240-243] DN address: 32 x u8 = 256 bits |
| 112 | +[244-247] Neighbor bloom: 4 x u64 = 256 bits (3-hash bloom filter) |
| 113 | +[248] Graph metrics: packed u64 |
| 114 | +[249-255] Inline edges: 7 words = up to 28 edges at 16 bits each |
| 115 | +``` |
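
The inline-edge arithmetic (7 words x 4 lanes = 28 edges) can be sketched as follows. The function names, the `[u64; 256]` buffer type (the W224-W255 indices imply at least 256 words), and the lane order (slot 0 in the low 16 bits) are assumptions for illustration, not the `schema.rs` API:

```rust
/// First inline-edge word in Block 15 and the number of edge words.
const EDGE_BASE: usize = 249;
const EDGE_WORDS: usize = 7;
/// 4 lanes of 16 bits per u64 word gives 28 slots.
const MAX_SIDECAR_EDGES: usize = EDGE_WORDS * 4;

/// Writes a 16-bit edge into the given slot; returns false if the slot is out of range.
/// Assumption: slot `i` lives in word 249 + i/4, lane i%4, lowest lane first.
pub fn set_edge(record: &mut [u64; 256], slot: usize, edge: u16) -> bool {
    if slot >= MAX_SIDECAR_EDGES {
        return false;
    }
    let word = EDGE_BASE + slot / 4;
    let shift = 16 * (slot % 4);
    // Clear the lane, then OR in the new value.
    record[word] = (record[word] & !(0xFFFFu64 << shift)) | ((edge as u64) << shift);
    true
}

/// Reads the 16-bit edge stored in the given slot, if the slot exists.
pub fn get_edge(record: &[u64; 256], slot: usize) -> Option<u16> {
    if slot >= MAX_SIDECAR_EDGES {
        return None;
    }
    Some((record[EDGE_BASE + slot / 4] >> (16 * (slot % 4))) as u16)
}
```

Slot 27 lands in the top lane of W255, which is why 28 is the hard limit for the sidecar.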
| 116 | + |
| 117 | +### Key Structs |
| 118 | + |
| 119 | +#### NodeIdentity (`schema.rs:46-65`) |
| 120 | + |
| 121 | +```rust |
| 122 | +pub struct NodeIdentity { |
| 123 | + pub depth: u8, // Tree depth (0 = root) |
| 124 | + pub rung: u8, // Pearl's causal rung: 0=SEE, 1=DO, 2=IMAGINE |
| 125 | + pub qidx: u16, // Quantization index (codebook entry) |
| 126 | + pub access_count: u32, // LRU/frequency tracking |
| 127 | + pub ttl: u16, // Time-to-live in ticks (0 = permanent) |
| 128 | + pub sigma_q: u16, // Uncertainty: sigma * 1000 as u16 |
| 129 | + pub node_type: NodeTypeMarker, |
| 130 | + pub label_hash: u64, |
| 131 | + pub edge_type: EdgeTypeMarker, |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +#### AniLevels (`schema.rs:78-89`) — 8 cognitive reasoning levels |
| 136 | + |
| 137 | +```rust |
| 138 | +pub struct AniLevels { |
| 139 | + pub reactive: u16, // Layer 0: stimulus-response |
| 140 | + pub memory: u16, // Layer 1: episodic recall |
| 141 | + pub analogy: u16, // Layer 2: structural mapping |
| 142 | + pub planning: u16, // Layer 3: multi-step lookahead |
| 143 | + pub meta: u16, // Layer 4: self-reflection |
| 144 | + pub social: u16, // Layer 5: theory of mind |
| 145 | + pub creative: u16, // Layer 6: generative novelty |
| 146 | + pub r#abstract: u16, // Layer 7: formal reasoning (raw identifier: `abstract` is a reserved keyword) |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +Packed as `u128` (16 bits each). `dominant()` returns the index of the highest level. |
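
A minimal sketch of the `u128` packing and of `dominant()`. The type `AniLevelsSketch` is a hypothetical stand-in that stores the eight levels as an array; placing level 0 in the low 16 bits and resolving ties to the lowest index are both assumptions:

```rust
/// Hypothetical stand-in for AniLevels: eight u16 levels, index 0 = reactive.
pub struct AniLevelsSketch(pub [u16; 8]);

impl AniLevelsSketch {
    /// Assumption: level i occupies bits 16*i through 16*i + 15.
    pub fn pack(&self) -> u128 {
        self.0
            .iter()
            .enumerate()
            .fold(0u128, |acc, (i, &v)| acc | ((v as u128) << (16 * i)))
    }

    /// Inverse of `pack`: the cast to u16 keeps only the low 16 bits of each lane.
    pub fn unpack(bits: u128) -> Self {
        let mut levels = [0u16; 8];
        for (i, slot) in levels.iter_mut().enumerate() {
            *slot = (bits >> (16 * i)) as u16;
        }
        Self(levels)
    }

    /// Index of the highest level. Assumption: ties resolve to the lowest index.
    pub fn dominant(&self) -> usize {
        let mut best = 0;
        for (i, &v) in self.0.iter().enumerate() {
            if v > self.0[best] {
                best = i;
            }
        }
        best
    }
}
```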
| 151 | + |
| 152 | +#### NarsTruth (`schema.rs:137-185`) — NARS truth value |
| 153 | + |
| 154 | +```rust |
| 155 | +pub struct NarsTruth { |
| 156 | + pub frequency: u16, // Quantized 0.0-1.0 as 0-65535 |
| 157 | + pub confidence: u16, // Quantized 0.0-0.9999 as 0-65535 |
| 158 | +} |
| 159 | +``` |
| 160 | + |
| 161 | +Methods: `from_floats()`, `f()`, `c()`, `revision()`, `deduction()`, `pack()`, `unpack()`. |
| 162 | +Packed as `u32` (frequency in low 16 bits, confidence in high 16 bits). |
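
The quantization and the `u32` layout can be sketched as below. `NarsTruthSketch` is a hypothetical stand-in; clamping, round-to-nearest, and using the same 0-65535 scale for both fields are assumptions (the real confidence scale tops out at 0.9999). Only the bit layout, frequency low and confidence high, comes from the text above:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NarsTruthSketch {
    pub frequency: u16,
    pub confidence: u16,
}

impl NarsTruthSketch {
    /// Assumption: inputs are clamped to [0, 1] and rounded to the nearest step.
    pub fn from_floats(f: f32, c: f32) -> Self {
        let q = |x: f32| (x.clamp(0.0, 1.0) * 65535.0).round() as u16;
        Self { frequency: q(f), confidence: q(c) }
    }

    /// Dequantized frequency.
    pub fn f(&self) -> f32 {
        self.frequency as f32 / 65535.0
    }

    /// Dequantized confidence.
    pub fn c(&self) -> f32 {
        self.confidence as f32 / 65535.0
    }

    /// Frequency in the low 16 bits, confidence in the high 16 bits.
    pub fn pack(&self) -> u32 {
        (self.frequency as u32) | ((self.confidence as u32) << 16)
    }

    pub fn unpack(bits: u32) -> Self {
        Self { frequency: bits as u16, confidence: (bits >> 16) as u16 }
    }
}
```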
| 163 | + |
| 164 | +#### NarsBudget (`schema.rs:188-222`) — NARS resource allocation |
| 165 | + |
| 166 | +```rust |
| 167 | +pub struct NarsBudget { |
| 168 | + pub priority: u16, |
| 169 | + pub durability: u16, |
| 170 | + pub quality: u16, |
| 171 | + pub _reserved: u16, |
| 172 | +} |
| 173 | +``` |
| 174 | + |
| 175 | +Packed as `u64`. |
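
A sketch of how the four `u16` fields could map onto the `budget_lo` and `budget_hi` halves listed for words 230 and 231. `NarsBudgetSketch` and its method names are hypothetical, and the field order within each half (priority below durability, quality below reserved) is an assumption:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NarsBudgetSketch {
    pub priority: u16,
    pub durability: u16,
    pub quality: u16,
    pub reserved: u16,
}

impl NarsBudgetSketch {
    /// Assumption: priority in bits 0-15, durability 16-31,
    /// quality 32-47, reserved 48-63.
    pub fn pack(&self) -> u64 {
        (self.priority as u64)
            | ((self.durability as u64) << 16)
            | ((self.quality as u64) << 32)
            | ((self.reserved as u64) << 48)
    }

    /// Splits the packed value into (budget_lo, budget_hi).
    pub fn split(&self) -> (u32, u32) {
        let bits = self.pack();
        (bits as u32, (bits >> 32) as u32)
    }

    pub fn unpack(bits: u64) -> Self {
        Self {
            priority: bits as u16,
            durability: (bits >> 16) as u16,
            quality: (bits >> 32) as u16,
            reserved: (bits >> 48) as u16,
        }
    }
}
```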
| 176 | + |
| 177 | +#### EdgeTypeMarker (`schema.rs:225-258`) |
| 178 | + |
| 179 | +```rust |
| 180 | +pub struct EdgeTypeMarker { |
| 181 | + pub verb_id: u8, // Cognitive verb identifier |
| 182 | + pub direction: u8, // Edge direction |
| 183 | + pub weight: u8, // Edge weight |
| 184 | + pub flags: u8, // Bit 0: temporal, Bit 1: causal, Bit 2: hierarchical |
| 185 | +} |
| 186 | +``` |
| 187 | + |
| 188 | +#### NodeTypeMarker (`schema.rs:280-298`) |
| 189 | + |
| 190 | +```rust |
| 191 | +pub struct NodeTypeMarker { |
| 192 | + pub kind: u8, // See NodeKind enum |
| 193 | + pub subtype: u8, |
| 194 | + pub provenance: u16, |
| 195 | +} |
| 196 | +``` |
| 197 | + |
| 198 | +#### NodeKind (`schema.rs:262-276`) |
| 199 | + |
| 200 | +```rust |
| 201 | +pub enum NodeKind { |
| 202 | + Entity = 0, |
| 203 | + Concept = 1, |
| 204 | + Event = 2, |
| 205 | + Rule = 3, |
| 206 | + Goal = 4, |
| 207 | + Query = 5, |
| 208 | + Hypothesis = 6, |
| 209 | + Observation = 7, |
| 210 | +} |
| 211 | +``` |
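
Because `NodeTypeMarker.kind` stores the kind as a raw `u8`, a reader needs a checked conversion back to the enum. The following is a sketch of one way to do that; the `TryFrom` implementation and the choice of `u8` as the error type are assumptions, not something the source confirms:

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum NodeKind {
    Entity = 0,
    Concept = 1,
    Event = 2,
    Rule = 3,
    Goal = 4,
    Query = 5,
    Hypothesis = 6,
    Observation = 7,
}

impl TryFrom<u8> for NodeKind {
    /// The unrecognized raw value is handed back as the error.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        Ok(match raw {
            0 => NodeKind::Entity,
            1 => NodeKind::Concept,
            2 => NodeKind::Event,
            3 => NodeKind::Rule,
            4 => NodeKind::Goal,
            5 => NodeKind::Query,
            6 => NodeKind::Hypothesis,
            7 => NodeKind::Observation,
            other => return Err(other),
        })
    }
}
```

Values 8-255 are rejected rather than mapped to a default, so a corrupted or newer-schema byte is surfaced to the caller.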
| 212 | + |
| 213 | +--- |
| 214 | + |
| 215 | +## Auxiliary Metadata Types |
| 216 | + |
| 217 | +### EnvelopeMetadata (`src/contract/types.rs`) |
| 218 | + |
| 219 | +Wire-format metadata for cross-runtime data envelopes: |
| 220 | + |
| 221 | +```rust |
| 222 | +pub struct EnvelopeMetadata { |
| 223 | + pub agent_id: Option<String>, |
| 224 | + pub confidence: Option<f64>, |
| 225 | + pub epoch: Option<i64>, |
| 226 | + pub version: Option<String>, |
| 227 | +} |
| 228 | +``` |
| 229 | + |
| 230 | +Part of the `DataEnvelope` struct shared across ada-n8n, crewai-rust, and ladybug-rs. |
| 231 | + |
| 232 | +### DocumentMeta (`src/storage/corpus.rs`) |
| 233 | + |
| 234 | +Metadata for scent-indexed training corpora. Arrow columns: |
| 235 | +- `chunk_id: u64` |
| 236 | +- `doc_id: string` |
| 237 | +- `text: string` |
| 238 | +- `fingerprint: binary[48]` (384-bit scent) |
| 239 | +- `position: u32` |
| 240 | +- `metadata: json` |
| 241 | + |
| 242 | +### Unified Execution Contract (`src/contract/types.rs`) |
| 243 | + |
| 244 | +```rust |
| 245 | +pub struct UnifiedStep { |
| 246 | + pub step_id: String, |
| 247 | + pub execution_id: String, |
| 248 | + pub step_type: String, // "n8n.*" | "crew.*" | "lb.*" | "core.*" |
| 249 | + pub runtime: String, |
| 250 | + pub name: String, |
| 251 | + pub status: StepStatus, // Pending | Running | Completed | Failed | Skipped |
| 252 | + pub input: Value, |
| 253 | + pub output: Value, |
| 254 | + pub error: Option<String>, |
| 255 | + pub started_at: DateTime<Utc>, |
| 256 | + pub finished_at: Option<DateTime<Utc>>, |
| 257 | + pub sequence: i32, |
| 258 | + pub reasoning: Option<String>, |
| 259 | + pub confidence: Option<f64>, |
| 260 | + pub alternatives: Option<Value>, |
| 261 | +} |
| 262 | +``` |
| 263 | + |
| 264 | +--- |
| 265 | + |
| 266 | +## Invariants |
| 267 | + |
| 268 | +1. **Container 0 = metadata ONLY** — never included in Hamming search |
| 269 | +2. **Schema version** in `W127` — currently version 1 |
| 270 | +3. **Checksum**: CRC32 of content (W126 bits 0-31) + XOR parity of W0-W125 (W126 bits 32-63) |
| 271 | +4. **Hot/Cold separation**: cold path metadata NEVER modifies hot path state |
| 272 | +5. **Zero-copy access**: MetaView borrows `&[u64; 128]` directly — no allocation |
| 273 | +6. **64-byte alignment**: cache-line aligned for SIMD safety |
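
Invariant 3 can be sketched as follows. The CRC routine is the standard bitwise CRC-32 (reflected polynomial `0xEDB88320`). Everything else is an assumption for illustration: that the CRC covers W0-W125 serialized little-endian, that the 64-bit XOR is folded to 32 bits by XORing its halves, and the function names themselves:

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

/// XOR of W0-W125. Assumption: the 64-bit result is folded to 32 bits
/// by XORing its high and low halves.
pub fn parity32(words: &[u64; 128]) -> u32 {
    let x = words[..126].iter().fold(0u64, |acc, &w| acc ^ w);
    (x as u32) ^ ((x >> 32) as u32)
}

/// Builds the value for W126: CRC32 in bits 0-31, parity in bits 32-63.
/// Assumption: the CRC input is W0-W125 in little-endian byte order.
pub fn checksum_word(words: &[u64; 128]) -> u64 {
    let mut bytes = Vec::with_capacity(126 * 8);
    for w in &words[..126] {
        bytes.extend_from_slice(&w.to_le_bytes());
    }
    (crc32(&bytes) as u64) | ((parity32(words) as u64) << 32)
}
```

Excluding W126 and W127 from the input keeps the checksum stable when the checksum word itself is written.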
| 274 | + |
| 275 | +--- |
| 276 | + |
| 277 | +## Record Geometry |
| 278 | + |
| 279 | +```text |
| 280 | +Full record: 2,048 bytes = 256 x u64 |
| 281 | +├── Container 0 (metadata): 1,024 bytes = 128 x u64 (W0-W127) |
| 282 | +└── Container 1+ (content): 1,024 bytes = 128 x u64 (fingerprints) |
| 283 | + |
| 284 | +16K-bit upgrade record: 16,384 bits = 2,048 bytes = 256 x u64 |
| 285 | +├── Container 0 (metadata): 1,024 bytes = 128 x u64 (W0-W127) |
| 286 | +├── SchemaSidecar: 256 bytes = 32 x u64 (W224-W255) |
| 287 | +└── Content containers: Variable |
| 288 | +``` |