Commit 3b601b6

Merge pull request #174 from AdaWorldAPI/claude/document-metadata-schema-pVbYP
Add comprehensive metadata schema reference documentation
2 parents e4c43b6 + 490a440 commit 3b601b6

1 file changed

Lines changed: 288 additions & 0 deletions

docs/METADATA_SCHEMA.md

@@ -0,0 +1,288 @@
# Metadata Schema Reference

> **Generated**: 2026-03-17
> **Source of truth**: `src/container/meta.rs`, `src/width_16k/schema.rs`
> **Schema version**: 1 (`meta.rs:93`)

---

## Overview

Ladybug-rs uses a two-tier metadata system:

1. **MetaView** (128 words = 1,024 bytes) — full-resolution canonical metadata in Container 0
2. **SchemaSidecar** (32 words = 256 bytes) — compact summary in words 224-255

Container 0 is **never searched** by Hamming distance. It holds structural information only.
Content containers (1+) hold fingerprints for SIMD operations.

---

## MetaView — Container 0 Layout (W0-W127)

Source: `src/container/meta.rs:1-93`

```text
Word(s)    Offset   Content
─────────  ───────  ─────────────────────────────────────────────────
W0         0        PackedDn address (THE identity, u64)
W1         8        node_kind:u8 | container_count:u8 | geometry:u8
                    | flags:u8 | schema_version:u16 | provenance_hash:u16
W2         16       Timestamps: created_ms:u32 | modified_ms:u32
W3         24       label_hash:u32 | tree_depth:u8 | branch:u8 | reserved:u16
W4-7       32       NARS: freq:f32 | conf:f32 | pos_evidence:f32 | neg_evidence:f32
W8-11      64       DN rung + 7-Layer compact + collapse gate
W12-15     96       7-Layer markers (5 bytes x 7 = 35 bytes)
W16-31     128      Inline edges (64 packed, 4 per word)
W32-39     256      RL / Q-values / rewards
W40-47     320      Bloom filter (512 bits)
W48-55     384      Graph metrics (full precision f64)
W56-63     448      Qualia (18 channels x f16 + 8 slots)
W64-79     512      Rung history + collapse gate history
W80-95     640      Representation language descriptor
W96-111    768      DN-Sparse adjacency (compact inline CSR)
W112-125   896      Reserved
W126-127   1008     Checksum (CRC32:u32 | parity:u32) + schema version
```
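
As an illustration of how one packed word in the table above could be decoded, the following sketch unpacks W1 into its six fields. The `TypeWord` type and the `byte_offset` helper are hypothetical names, and the bit order (fields packed from the least-significant bit upward, in table order) is an assumption that should be checked against `meta.rs`.

```rust
/// Fields of W1 as listed in the layout table.
/// ASSUMPTION: fields are packed from the least-significant bit upward,
/// in the order the table lists them; `meta.rs` is authoritative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypeWord {
    pub node_kind: u8,
    pub container_count: u8,
    pub geometry: u8,
    pub flags: u8,
    pub schema_version: u16,
    pub provenance_hash: u16,
}

impl TypeWord {
    /// Split a 64-bit word into its fields by shifting and truncating.
    pub fn unpack(w: u64) -> Self {
        Self {
            node_kind: w as u8,
            container_count: (w >> 8) as u8,
            geometry: (w >> 16) as u8,
            flags: (w >> 24) as u8,
            schema_version: (w >> 32) as u16,
            provenance_hash: (w >> 48) as u16,
        }
    }

    /// Inverse of `unpack`.
    pub fn pack(&self) -> u64 {
        (self.node_kind as u64)
            | ((self.container_count as u64) << 8)
            | ((self.geometry as u64) << 16)
            | ((self.flags as u64) << 24)
            | ((self.schema_version as u64) << 32)
            | ((self.provenance_hash as u64) << 48)
    }
}

/// Byte offset of a word index, matching the Offset column (index * 8).
pub const fn byte_offset(word: usize) -> usize {
    word * 8
}
```

The field widths sum to 64 bits, so `pack` and `unpack` round-trip without loss under this assumed ordering.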

### Word Constants (`meta.rs:35-93`)

| Constant           | Value | Purpose                                   |
|--------------------|-------|-------------------------------------------|
| `W_DN_ADDR`        | 0     | PackedDn address                          |
| `W_TYPE`           | 1     | Record type + geometry                    |
| `W_TIME`           | 2     | Timestamps                                |
| `W_LABEL`          | 3     | Label hash + tree metadata                |
| `W_NARS_BASE`      | 4     | NARS truth values (4 words)               |
| `W_DN_RUNG`        | 8     | DN rung + 7-layer compact                 |
| `W_LAYER_BASE`     | 12    | 7-layer markers                           |
| `W_EDGE_BASE`      | 16    | Inline edges start                        |
| `W_EDGE_END`       | 31    | Inline edges end                          |
| `W_RL_BASE`        | 32    | Reinforcement learning data               |
| `W_BLOOM_BASE`     | 40    | Bloom filter (512 bits)                   |
| `W_GRAPH_BASE`     | 48    | Graph metrics                             |
| `W_QUALIA_BASE`    | 56    | Qualia channels                           |
| `W_RUNG_HIST`      | 64    | Rung + collapse gate history              |
| `W_REPR_BASE`      | 80    | Representation language descriptor        |
| `W_ADJ_BASE`       | 96    | DN-Sparse adjacency (inline CSR)          |
| `W_RESERVED`       | 112   | Reserved                                  |
| `W_CHECKSUM`       | 126   | Checksum + version                        |
| `MAX_INLINE_EDGES` | 64    | Maximum edges stored inline               |
| `SCHEMA_VERSION`   | 1     | Current schema version                    |
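
The edge constants imply 16 words holding 4 edges each, 16 bits per edge. A minimal sketch of indexing into that region follows. The constant values come from the table; `get_edge` and `set_edge` are hypothetical helpers, and the slot order within a word (edge 0 in the low 16 bits) is an assumption.

```rust
pub const W_EDGE_BASE: usize = 16;
pub const W_EDGE_END: usize = 31;
pub const MAX_INLINE_EDGES: usize = 64;

/// Read inline edge `i` from a 128-word metadata container.
/// ASSUMPTION: edge 0 sits in the low 16 bits of W16, edge 1 in the
/// next 16 bits, and so on across W16..=W31.
pub fn get_edge(words: &[u64; 128], i: usize) -> Option<u16> {
    if i >= MAX_INLINE_EDGES {
        return None;
    }
    let word = words[W_EDGE_BASE + i / 4];
    Some((word >> ((i % 4) * 16)) as u16)
}

/// Overwrite inline edge `i`; returns false when `i` is out of range.
pub fn set_edge(words: &mut [u64; 128], i: usize, edge: u16) -> bool {
    if i >= MAX_INLINE_EDGES {
        return false;
    }
    let shift = (i % 4) * 16;
    let slot = &mut words[W_EDGE_BASE + i / 4];
    // Clear the 16-bit slot, then OR in the new value.
    *slot = (*slot & !(0xFFFFu64 << shift)) | ((edge as u64) << shift);
    true
}
```

With this layout the last edge (index 63) lands in the top 16 bits of `W_EDGE_END`, which is consistent with `MAX_INLINE_EDGES` = 16 words x 4.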

### MetaView API (`meta.rs:99+`)

```rust
pub struct MetaView<'a> { /* zero-copy borrow of &'a [u64; 128] */ }
pub struct MetaViewMut<'a> { /* mutable borrow */ }
```

Both provide zero-copy access to the word layout above. No allocation on read.
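
To make the zero-copy idea concrete, here is a rough sketch of what such view types might look like. Only the borrow shape (`&'a [u64; 128]`) comes from this document; the `*Sketch` type names, the accessor names, and the half-word order in W2 are assumptions, not the actual `MetaView` API.

```rust
/// Hypothetical stand-in for `MetaView`: a read-only borrow of Container 0.
pub struct MetaViewSketch<'a> {
    words: &'a [u64; 128],
}

impl<'a> MetaViewSketch<'a> {
    pub fn new(words: &'a [u64; 128]) -> Self {
        Self { words }
    }

    /// W0: PackedDn address.
    pub fn dn_addr(&self) -> u64 {
        self.words[0]
    }

    /// W2, low half. ASSUMPTION: created_ms occupies bits 0-31.
    pub fn created_ms(&self) -> u32 {
        self.words[2] as u32
    }

    /// W2, high half. ASSUMPTION: modified_ms occupies bits 32-63.
    pub fn modified_ms(&self) -> u32 {
        (self.words[2] >> 32) as u32
    }
}

/// Hypothetical stand-in for `MetaViewMut`: a mutable borrow of Container 0.
pub struct MetaViewMutSketch<'a> {
    words: &'a mut [u64; 128],
}

impl<'a> MetaViewMutSketch<'a> {
    pub fn new(words: &'a mut [u64; 128]) -> Self {
        Self { words }
    }

    /// Writes go straight into the borrowed array; nothing is copied.
    pub fn set_dn_addr(&mut self, dn: u64) {
        self.words[0] = dn;
    }
}
```

Because both types hold only a reference, constructing a view costs one pointer copy and every accessor is a direct array read or write.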

---

## SchemaSidecar — Compact Summary (W224-W255)

Source: `src/width_16k/schema.rs:1-299`

The SchemaSidecar packs identity, reasoning, learning, and topology into
words 224-255, the upper 32 words of the 256-word record (outside Container 0,
which ends at W127). It is a compressed snapshot useful for quick
deserialization without parsing the full MetaView.

### Block 14: Identity + Reasoning + Learning (W224-W239)

```text
[224]      depth:u8 | rung:u8 | qidx:u16 | access_count:u32
[225]      ttl:u16 | sigma_q:u16 | node_type:u32
[226]      label_hash:u64
[227]      edge_type:u32 | version:u8 | reserved:u24
[228-229]  ANI levels: 8 x u16 = 128 bits
[230]      NARS truth:u32 | budget_lo:u32 (priority + durability)
[231]      budget_hi:u32 (quality + reserved) | reserved:u32
[232-233]  Q-values: 16 x i8 = 128 bits
[234-235]  Rewards: 8 x i16 = 128 bits
[236-237]  STDP: 8 x u16 = 128 bits
[238-239]  Hebbian: 8 x u16 = 128 bits
```

### Block 15: Graph Topology + Edges (W240-W255)

```text
[240-243]  DN address: 32 x u8 = 256 bits
[244-247]  Neighbor bloom: 4 x u64 = 256 bits (3-hash bloom filter)
[248]      Graph metrics: packed u64
[249-255]  Inline edges: 7 words = up to 28 edges at 16 bits each
```
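
The neighbor bloom in W244-W247 is described as a 3-hash filter over 256 bits. The sketch below shows the general technique. The hash functions are not specified in this document, so this version derives three bit positions from a single splitmix64-style mix, which is purely an assumption; `NeighborBloom` is a hypothetical name.

```rust
/// 256-bit Bloom filter stored as 4 x u64, mirroring W244-W247.
#[derive(Debug, Default, Clone, Copy)]
pub struct NeighborBloom(pub [u64; 4]);

/// splitmix64 finalizer, used here only as a stand-in hash.
/// ASSUMPTION: the real hash functions live in `schema.rs` and may differ.
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Three bit positions in 0..256, taken from three bytes of one mixed value.
fn positions(key: u64) -> [usize; 3] {
    let h = mix(key);
    [
        (h & 0xFF) as usize,
        ((h >> 8) & 0xFF) as usize,
        ((h >> 16) & 0xFF) as usize,
    ]
}

impl NeighborBloom {
    /// Set the three bits selected by `key`.
    pub fn insert(&mut self, key: u64) {
        for p in positions(key) {
            self.0[p / 64] |= 1u64 << (p % 64);
        }
    }

    /// False means definitely absent; true means possibly present.
    pub fn may_contain(&self, key: u64) -> bool {
        positions(key)
            .iter()
            .all(|&p| self.0[p / 64] & (1u64 << (p % 64)) != 0)
    }
}
```

As with any Bloom filter, `may_contain` can report false positives but never false negatives, so it is suited to cheaply ruling out non-neighbors.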

### Key Structs

#### NodeIdentity (`schema.rs:46-65`)

```rust
pub struct NodeIdentity {
    pub depth: u8,          // Tree depth (0 = root)
    pub rung: u8,           // Pearl's causal rung: 0=SEE, 1=DO, 2=IMAGINE
    pub qidx: u16,          // Quantization index (codebook entry)
    pub access_count: u32,  // LRU/frequency tracking
    pub ttl: u16,           // Time-to-live in ticks (0 = permanent)
    pub sigma_q: u16,       // Uncertainty: sigma * 1000 as u16
    pub node_type: NodeTypeMarker,
    pub label_hash: u64,
    pub edge_type: EdgeTypeMarker,
}
```

#### AniLevels (`schema.rs:78-89`) — 8 cognitive reasoning levels

```rust
pub struct AniLevels {
    pub reactive: u16,    // Layer 0: stimulus-response
    pub memory: u16,      // Layer 1: episodic recall
    pub analogy: u16,     // Layer 2: structural mapping
    pub planning: u16,    // Layer 3: multi-step lookahead
    pub meta: u16,        // Layer 4: self-reflection
    pub social: u16,      // Layer 5: theory of mind
    pub creative: u16,    // Layer 6: generative novelty
    // `abstract` is a reserved word in Rust, so it is written as a raw
    // identifier here; see `schema.rs` for the actual spelling.
    pub r#abstract: u16,  // Layer 7: formal reasoning
}
```

Packed as `u128` (16 bits each). `dominant()` returns the index of the highest level.
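
A minimal sketch of the `u128` packing and of `dominant()` follows, using an array in place of the named fields. Two details are assumptions: that layer 0 occupies the lowest 16 bits, and that ties resolve to the lower index. `AniLevelsSketch` is a hypothetical name.

```rust
/// Eight 16-bit levels, indexed by layer number (0 = reactive .. 7 = abstract).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AniLevelsSketch(pub [u16; 8]);

impl AniLevelsSketch {
    /// ASSUMPTION: layer `i` is stored at bits `16*i .. 16*i + 15`.
    pub fn pack(&self) -> u128 {
        self.0
            .iter()
            .enumerate()
            .fold(0u128, |acc, (i, &v)| acc | ((v as u128) << (16 * i)))
    }

    /// Inverse of `pack`.
    pub fn unpack(bits: u128) -> Self {
        let mut levels = [0u16; 8];
        for (i, slot) in levels.iter_mut().enumerate() {
            *slot = (bits >> (16 * i)) as u16;
        }
        Self(levels)
    }

    /// Index of the highest level.
    /// ASSUMPTION: on a tie, the lowest index wins.
    pub fn dominant(&self) -> usize {
        let mut best = 0;
        for i in 1..8 {
            if self.0[i] > self.0[best] {
                best = i;
            }
        }
        best
    }
}
```

The resulting `u128` spans two sidecar words, matching the `[228-229]` entry in Block 14.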

#### NarsTruth (`schema.rs:137-185`) — NARS truth value

```rust
pub struct NarsTruth {
    pub frequency: u16,   // Quantized 0.0-1.0 as 0-65535
    pub confidence: u16,  // Quantized 0.0-0.9999 as 0-65535
}
```

Methods: `from_floats()`, `f()`, `c()`, `revision()`, `deduction()`, `pack()/unpack()`.
Packed as `u32` (frequency in low 16 bits, confidence in high 16 bits).
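
The quantization and packing can be sketched as below. The bit placement (frequency low, confidence high) is taken from the text above; the rounding rule and the clamp to `[0, 1]` are assumptions, and the real `from_floats()` may scale confidence differently given its documented 0.0-0.9999 range. `NarsTruthSketch` is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NarsTruthSketch {
    pub frequency: u16,
    pub confidence: u16,
}

impl NarsTruthSketch {
    /// ASSUMPTION: linear quantization, round-to-nearest, clamped to [0, 1].
    pub fn from_floats(f: f32, c: f32) -> Self {
        let q = |x: f32| (x.clamp(0.0, 1.0) * 65535.0).round() as u16;
        Self { frequency: q(f), confidence: q(c) }
    }

    /// Dequantized frequency.
    pub fn f(&self) -> f32 {
        self.frequency as f32 / 65535.0
    }

    /// Dequantized confidence.
    pub fn c(&self) -> f32 {
        self.confidence as f32 / 65535.0
    }

    /// Frequency in the low 16 bits, confidence in the high 16 bits.
    pub fn pack(&self) -> u32 {
        (self.frequency as u32) | ((self.confidence as u32) << 16)
    }

    /// Inverse of `pack`.
    pub fn unpack(bits: u32) -> Self {
        Self { frequency: bits as u16, confidence: (bits >> 16) as u16 }
    }
}
```

The packed `u32` is the value that sits in the low half of sidecar word 230.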

#### NarsBudget (`schema.rs:188-222`) — NARS resource allocation

```rust
pub struct NarsBudget {
    pub priority: u16,
    pub durability: u16,
    pub quality: u16,
    pub _reserved: u16,
}
```

Packed as `u64`.
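
One plausible reading of the `u64` packing, consistent with the `budget_lo` (priority + durability) and `budget_hi` (quality + reserved) split in Block 14, is sketched here. The order of the two fields inside each half is an assumption, and `NarsBudgetSketch` with its `halves` helper is hypothetical.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct NarsBudgetSketch {
    pub priority: u16,
    pub durability: u16,
    pub quality: u16,
    pub reserved: u16,
}

impl NarsBudgetSketch {
    /// ASSUMPTION: fields are laid out low-to-high in declaration order.
    pub fn pack(&self) -> u64 {
        (self.priority as u64)
            | ((self.durability as u64) << 16)
            | ((self.quality as u64) << 32)
            | ((self.reserved as u64) << 48)
    }

    /// Inverse of `pack`.
    pub fn unpack(bits: u64) -> Self {
        Self {
            priority: bits as u16,
            durability: (bits >> 16) as u16,
            quality: (bits >> 32) as u16,
            reserved: (bits >> 48) as u16,
        }
    }

    /// Split into (budget_lo, budget_hi) as the sidecar stores them.
    pub fn halves(&self) -> (u32, u32) {
        let p = self.pack();
        (p as u32, (p >> 32) as u32)
    }
}
```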

#### EdgeTypeMarker (`schema.rs:225-258`)

```rust
pub struct EdgeTypeMarker {
    pub verb_id: u8,    // Cognitive verb identifier
    pub direction: u8,  // Edge direction
    pub weight: u8,     // Edge weight
    pub flags: u8,      // Bit 0: temporal, Bit 1: causal, Bit 2: hierarchical
}
```
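
The flag bits documented on `flags` can be tested with simple masks, as in this sketch. The bit assignments come from the comment above; the constant names, the `*Sketch` type, its helper methods, and the byte order used by `pack` (`verb_id` in the lowest byte) are assumptions.

```rust
pub const FLAG_TEMPORAL: u8 = 1 << 0;
pub const FLAG_CAUSAL: u8 = 1 << 1;
pub const FLAG_HIERARCHICAL: u8 = 1 << 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EdgeTypeMarkerSketch {
    pub verb_id: u8,
    pub direction: u8,
    pub weight: u8,
    pub flags: u8,
}

impl EdgeTypeMarkerSketch {
    pub fn is_temporal(&self) -> bool {
        (self.flags & FLAG_TEMPORAL) != 0
    }

    pub fn is_causal(&self) -> bool {
        (self.flags & FLAG_CAUSAL) != 0
    }

    pub fn is_hierarchical(&self) -> bool {
        (self.flags & FLAG_HIERARCHICAL) != 0
    }

    /// ASSUMPTION: verb_id is the lowest byte of the packed u32.
    pub fn pack(&self) -> u32 {
        u32::from_le_bytes([self.verb_id, self.direction, self.weight, self.flags])
    }

    /// Inverse of `pack`.
    pub fn unpack(bits: u32) -> Self {
        let [verb_id, direction, weight, flags] = bits.to_le_bytes();
        Self { verb_id, direction, weight, flags }
    }
}
```

Four `u8` fields fill exactly the `edge_type:u32` slot in sidecar word 227.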

#### NodeTypeMarker (`schema.rs:280-298`)

```rust
pub struct NodeTypeMarker {
    pub kind: u8,  // See NodeKind enum
    pub subtype: u8,
    pub provenance: u16,
}
```

#### NodeKind (`schema.rs:262-276`)

```rust
pub enum NodeKind {
    Entity = 0,
    Concept = 1,
    Event = 2,
    Rule = 3,
    Goal = 4,
    Query = 5,
    Hypothesis = 6,
    Observation = 7,
}
```
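
Since `NodeTypeMarker::kind` stores the kind as a raw `u8`, reading it back requires a checked conversion. The sketch below shows one way to do that with `TryFrom`. The discriminant values come from the enum above; the `#[repr(u8)]` attribute, the derives, and the choice of `u8` as the error type are assumptions.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Entity = 0,
    Concept = 1,
    Event = 2,
    Rule = 3,
    Goal = 4,
    Query = 5,
    Hypothesis = 6,
    Observation = 7,
}

impl TryFrom<u8> for NodeKind {
    /// The unrecognised byte is handed back to the caller.
    type Error = u8;

    fn try_from(v: u8) -> Result<Self, Self::Error> {
        Ok(match v {
            0 => NodeKind::Entity,
            1 => NodeKind::Concept,
            2 => NodeKind::Event,
            3 => NodeKind::Rule,
            4 => NodeKind::Goal,
            5 => NodeKind::Query,
            6 => NodeKind::Hypothesis,
            7 => NodeKind::Observation,
            other => return Err(other),
        })
    }
}
```

Returning an error for values 8-255 keeps a corrupted or newer-schema record from being silently misread as a valid kind.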

---

## Auxiliary Metadata Types

### EnvelopeMetadata (`src/contract/types.rs`)

Wire-format metadata for cross-runtime data envelopes:

```rust
pub struct EnvelopeMetadata {
    pub agent_id: Option<String>,
    pub confidence: Option<f64>,
    pub epoch: Option<i64>,
    pub version: Option<String>,
}
```

Part of the `DataEnvelope` struct shared across ada-n8n, crewai-rust, and ladybug-rs.

### DocumentMeta (`src/storage/corpus.rs`)

Metadata for scent-indexed training corpora. Arrow columns:
- `chunk_id: u64`
- `doc_id: string`
- `text: string`
- `fingerprint: binary[48]` (384-bit scent)
- `position: u32`
- `metadata: json`

### Unified Execution Contract (`src/contract/types.rs`)

```rust
pub struct UnifiedStep {
    pub step_id: String,
    pub execution_id: String,
    pub step_type: String,   // "n8n.*" | "crew.*" | "lb.*" | "core.*"
    pub runtime: String,
    pub name: String,
    pub status: StepStatus,  // Pending | Running | Completed | Failed | Skipped
    pub input: Value,
    pub output: Value,
    pub error: Option<String>,
    pub started_at: DateTime<Utc>,
    pub finished_at: Option<DateTime<Utc>>,
    pub sequence: i32,
    pub reasoning: Option<String>,
    pub confidence: Option<f64>,
    pub alternatives: Option<Value>,
}
```

---

## Invariants

1. **Container 0 = metadata ONLY** — never included in Hamming search
2. **Schema version** in `W127` — currently version 1
3. **Checksum**: CRC32 of content (W126 bits 0-31) + XOR parity of W0-W125 (W126 bits 32-63)
4. **Hot/Cold separation**: cold path metadata NEVER modifies hot path state
5. **Zero-copy access**: MetaView borrows `&[u64; 128]` directly — no allocation
6. **64-byte alignment**: cache-line aligned for SIMD safety
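
Invariant 3 can be sketched as follows. Several details are not specified here and are assumptions: the CRC variant (the common IEEE 802.3 CRC-32 is used), how the 64-bit XOR of W0-W125 is reduced to 32 bits (high half XOR low half), and what byte range counts as "content". All function names are hypothetical.

```rust
/// XOR parity of W0-W125, folded from 64 to 32 bits.
/// ASSUMPTION: the fold is high half XOR low half.
pub fn xor_parity(words: &[u64; 128]) -> u32 {
    let x = words[..126].iter().fold(0u64, |acc, &w| acc ^ w);
    (x as u32) ^ ((x >> 32) as u32)
}

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// ASSUMPTION: the document does not name the CRC variant.
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Compose W126: CRC32 in bits 0-31, parity in bits 32-63.
pub fn checksum_word(words: &[u64; 128], content: &[u8]) -> u64 {
    (crc32(content) as u64) | ((xor_parity(words) as u64) << 32)
}
```

Excluding W126 and W127 from the parity range means the checksum word can be written without changing the value it protects.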

---

## Record Geometry

```text
Full record: 2,048 bytes = 256 x u64
├── Container 0 (metadata):  1,024 bytes = 128 x u64  (W0-W127)
└── Container 1+ (content):  1,024 bytes = 128 x u64  (fingerprints)

16K-bit upgrade record: 16,384 bits = 2,048 bytes = 256 x u64
├── Container 0 (metadata):  1,024 bytes = 128 x u64  (W0-W127)
├── SchemaSidecar:           256 bytes   = 32 x u64   (W224-W255)
└── Content containers:      Variable
```
