
Commit e401004

Merge pull request #151 from AdaWorldAPI/claude/compare-rustynum-ndarray-5ePRn
docs: update 11 markdown files for 16K container layout
2 parents 7024115 + f135bea commit e401004

11 files changed

Lines changed: 121 additions & 100 deletions

ARCHITECTURE.md

Lines changed: 10 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -410,22 +410,23 @@ https://github.com/AdaWorldAPI/ladybug-rs
410410
411411
The sections below document the deep container substrate that underlies
412412
the CAM/scent architecture above. Where Part I describes the indexing
413-
and query surface, Part II describes the 8192-bit cognitive geometry,
413+
and query surface, Part II describes the 16,384-bit cognitive geometry,
414414
the DN tree, NARS reasoning, qualia modules, and the Friston free energy
415415
loop that ties them together.
416416

417417
---
418418

419419
## 11. Container Geometry
420420

421-
The **Container** is the atomic unit. 8192 bits = 128 × u64 words = 1 KB.
421+
The **Container** is the atomic unit. 16,384 bits = 256 × u64 words = 2 KB.
422422
Stack-allocated, SIMD-aligned (`#[repr(C, align(64))]`), zero-copy.
423+
Each CogRecord IS one container. A node has N containers (meta, content, embeddings...).
423424

424425
```text
425426
┌──────────────────────────────────────────────────────────┐
426-
│  128 words × 64 bits = 8192 bits = 1 KB                  │
427-
│  16 AVX-512 iterations cover the full container          │
428-
│  Expected random Hamming distance: 4096 (σ ≈ 45)         │
427+
│  256 words × 64 bits = 16,384 bits = 2 KB                │
428+
│  32 AVX-512 iterations cover the full container          │
429+
│  Expected random Hamming distance: 8192 (σ = 64)         │
429430
└──────────────────────────────────────────────────────────┘
430431
```
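
For readers who want to check the arithmetic in the box above, here is a minimal Rust sketch of a 16,384-bit container. The constant names follow those this commit lists for `crates/ladybug-contract/src/container.rs`; the field name `words`, the method bodies, and the `expected_distance` and `sigma` helpers are assumptions made for illustration and may not match the real crate.

```rust
// Sketch only; not the ladybug-contract source. Field and helper names are assumed.
pub const CONTAINER_BITS: usize = 16_384;
pub const CONTAINER_WORDS: usize = CONTAINER_BITS / 64; // 256
pub const CONTAINER_BYTES: usize = CONTAINER_WORDS * 8; // 2048
pub const CONTAINER_AVX512_ITERS: usize = CONTAINER_WORDS / 8; // 32

/// 256 x u64, aligned to 64 bytes so each 512-bit load is aligned.
#[repr(C, align(64))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Container {
    pub words: [u64; CONTAINER_WORDS],
}

impl Container {
    pub const fn zero() -> Self {
        Container { words: [0u64; CONTAINER_WORDS] }
    }

    /// Word-wise XOR of two containers.
    pub fn xor(&self, other: &Self) -> Self {
        let mut out = Self::zero();
        for i in 0..CONTAINER_WORDS {
            out.words[i] = self.words[i] ^ other.words[i];
        }
        out
    }

    /// Hamming distance: popcount of the XOR, summed over all words.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.words
            .iter()
            .zip(other.words.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}

/// Mean Hamming distance between two uniformly random bit vectors: n / 2.
pub fn expected_distance(bits: usize) -> usize {
    bits / 2
}

/// Standard deviation of that distance (binomial, p = 1/2): sqrt(n / 4).
pub fn sigma(bits: usize) -> f64 {
    (bits as f64 / 4.0).sqrt()
}
```

Under these assumptions the struct occupies exactly 2,048 bytes, and `sigma(16_384)` is exactly 64.0 because 16,384 / 4 = 4,096 is a perfect square. The old 8,192-bit width gives roughly 45.25.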
431432

@@ -1114,9 +1115,9 @@ SNN/ANN/GNN approaches:
11141115
relationships compose via XOR without loss. GNNs lose information at
11151116
each message-passing layer.
11161117

1117-
3. **Constant-size representation**: An edge between two 8192-bit containers
1118-
is another 8192-bit container. No matter how complex the relationship,
1119-
it fits in 1 KB. Neural edge representations grow with model size.
1118+
3. **Constant-size representation**: An edge between two 16,384-bit containers
1119+
is another 16,384-bit container. No matter how complex the relationship,
1120+
it fits in 2 KB. Neural edge representations grow with model size.
11201121

11211122
4. **Information content is measurable**: `popcount(a xor b)` = how many bits
11221123
differ = the energy of the transformation. This is an exact count, not
@@ -1248,7 +1249,7 @@ The key transcoding: RedisGraph stored adjacency as integer node IDs in CSR
12481249
format. The holograph step replaced integer IDs with Container fingerprints
12491250
and integer edge weights with XOR deltas. BlasGraph formalized this as
12501251
sparse-adjacent-vector operations. ContainerGraph (ladybug-rs) is the final
1251-
form: pure Container-native, everything is 8192 bits, all operations are
1252+
form: pure Container-native, everything is 16,384 bits, all operations are
12521253
XOR/Hamming/popcount.
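
The XOR-delta idea can be illustrated with plain word arrays. In the sketch below, `Words`, `xor_delta`, `apply_delta`, and `energy` are hypothetical names, not ContainerGraph APIs. It only shows that an edge computed as `a xor b` has the same width as its endpoints, recovers either endpoint from the other, and has a popcount equal to their Hamming distance.

```rust
// Illustration only: names are hypothetical, not taken from ladybug-rs.
const WORDS: usize = 256; // 16,384 bits, the container width after this commit

type Words = [u64; WORDS];

/// Edge between two nodes, stored as the word-wise XOR of their bit patterns.
fn xor_delta(a: &Words, b: &Words) -> Words {
    let mut edge = [0u64; WORDS];
    for i in 0..WORDS {
        edge[i] = a[i] ^ b[i];
    }
    edge
}

/// XOR is its own inverse, so applying the edge to one endpoint yields the other.
fn apply_delta(node: &Words, edge: &Words) -> Words {
    xor_delta(node, edge)
}

/// Number of set bits in the edge, i.e. the Hamming distance between the endpoints.
fn energy(edge: &Words) -> u32 {
    edge.iter().map(|w| w.count_ones()).sum()
}
```

Because the edge is the same fixed-size array as a node, storing it needs no separate weight type.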
12531254

12541255
The adjacency encoding in W16-31 (inline) and W96-111 (CSR overflow) is the

HANDOVER.md

Lines changed: 2 additions & 2 deletions
@@ -140,7 +140,7 @@ order:
140140

141141
Extended from 402 → 1,649 lines. Preserved existing CAM/scent-index sections
142142
(1-10). Added 17 new sections covering:
143-
- Container Geometry (8192-bit atom, XOR/Hamming/popcount)
143+
- Container Geometry (16,384-bit atom, XOR/Hamming/popcount)
144144
- CogRecord (2 KB holy grail layout)
145145
- Container 0 Metadata Map (W0-W127 complete)
146146
- DN Tree (PackedDn 7×8-bit)
@@ -399,7 +399,7 @@ Complete mapping between the Python ecosystem and ladybug-rs:
399399
| File | What |
400400
|------|------|
401401
| **Container substrate** | |
402-
| `crates/ladybug-contract/src/container.rs` | CONTAINER_BITS=8192, EXPECTED_DISTANCE=4096, SIGMA=45.25 |
402+
| `crates/ladybug-contract/src/container.rs` | CONTAINER_BITS=16384, EXPECTED_DISTANCE=8192, SIGMA=64.0 |
403403
| `crates/ladybug-contract/src/record.rs` | CogRecord (2 KB = meta + content), cross_hydrate, extract_perspective |
404404
| `crates/ladybug-contract/src/nars.rs` | TruthValue, revision/deduction/induction/abduction/analogy/comparison |
405405
| `src/container/meta.rs` | W0-W127 metadata layout, MetaView/MetaViewMut |

INTEGRATION_PLAN.md

Lines changed: 6 additions & 8 deletions
@@ -37,10 +37,9 @@ It survives context resets and serves as the task skeleton for agents.
3737
3838
┌──────────────────┐
3939
│ ladybug-contract │
40-
│ Container 8192b │
41-
│ WideContainer │
42-
│ CogRecord8K │
43-
│ CogPacket wire │
40+
│ Container 16384b │
41+
│ CogRecord 16K │
42+
│ CogPacket wire │
4443
└────────┬─────────┘
4544
4645
┌──────────────────┐
@@ -296,15 +295,14 @@ UnifiedExecution
296295
#### 5.2 Future Primitives [TODO]
297296
- [ ] Majority vote bundle (used by CognitiveKernel.bundle_recent)
298297
- [ ] Hamming similarity (portable version of AVX-512 VPOPCNTDQ)
299-
- [ ] Fingerprint XOR-fold (Container 8192 ↔ Fingerprint 16384)
298+
- [ ] Fingerprint XOR-fold (Container = Fingerprint = 16384 bits)
300299
- [ ] Focus mask: dimension selection based on codebook crystallization history
301300

302301
### 6. ladybug-contract (Pure Types)
303302

304303
#### 6.1 Current State [DONE]
305-
- [x] Container (8192-bit, 128 × u64)
306-
- [x] WideContainer (16384-bit, 256 × u64)
307-
- [x] CogRecord / CogRecord8K
304+
- [x] Container (16384-bit, 256 × u64) *(updated Feb 2026 from 8192-bit)*
305+
- [x] CogRecord (each container = 16K = 2 KB)
308306
- [x] CogPacket wire protocol
309307
- [x] EmbeddingFormat enum
310308

REALITY_CHECK.md

Lines changed: 55 additions & 51 deletions
@@ -92,6 +92,11 @@ src/container/mod.rs:73 — [u64; 128] = 8,192 bits = 1 KB
9292
crates/ladybug-contract/src/container.rs:29 — [u64; 128] = 8,192 bits = 1 KB
9393
```
9494

95+
> **UPDATE (Feb 2026):** Container has been widened to 16,384 bits (256 × u64 = 2 KB).
96+
> Container = Fingerprint = same width. No more truncation or zero-extension.
97+
> The dual-Container type issue (src/container vs contract crate) remains a
98+
> cleanup target, but the width mismatch is resolved.
99+
95100
They are **identical in layout** but **different Rust types**. You cannot pass one
96101
where the other is expected without conversion. This means:
97102

@@ -102,44 +107,47 @@ where the other is expected without conversion. This means:
102107
- `src/container/cache.rs`, `src/container/graph.rs` use the local one
103108
- Type mismatch between modules that should be the same
104109

105-
**AND** there is `Fingerprint` (256 u64 = 16,384 bits = 2 KB) in `src/core/fingerprint.rs`.
110+
~~**AND** there is `Fingerprint` (256 u64 = 16,384 bits = 2 KB) in `src/core/fingerprint.rs`.
106111
Two `From` impls exist to convert:
107112
- Fingerprint → Container: truncation (copy first 128 words, discard upper 128)
108113
- Container → Fingerprint: zero-extension (copy 128 words, pad 128 zeros)
109114

110115
This means **half the fingerprint is thrown away** when going to storage.
111-
Or storage records carry **128 zero words** when promoted to Fingerprint.
116+
Or storage records carry **128 zero words** when promoted to Fingerprint.~~
117+
118+
> **RESOLVED (Feb 2026):** Container and Fingerprint are now both 256 × u64 = 16,384 bits.
119+
> Conversion is direct copy — no truncation, no zero-extension.
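
A minimal sketch of what "direct copy" means once both types are 256 words wide. The tuple-struct definitions here are assumptions for illustration; the real `Fingerprint` and `Container` types may be laid out differently.

```rust
// Sketch under assumed type definitions; not the actual ladybug-rs types.
const WORDS: usize = 256; // 16,384 bits

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint(pub [u64; WORDS]);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Container(pub [u64; WORDS]);

// Same width on both sides: every word is copied, nothing is dropped.
impl From<&Fingerprint> for Container {
    fn from(fp: &Fingerprint) -> Self {
        Container(fp.0)
    }
}

// Same width on both sides: no zero padding is needed.
impl From<&Container> for Fingerprint {
    fn from(c: &Container) -> Self {
        Fingerprint(c.0)
    }
}
```

With the earlier 128-word Container, a bit set in word 200 of a Fingerprint would have been lost on conversion. In this sketch it survives a round trip.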
112120
113121
### THE FIX
114122

115-
The correct layout (which you identified in a previous session) is:
123+
> **UPDATE (Feb 2026):** The canonical layout is now implemented:
124+
> - Every Container = 16,384 bits = 256 × u64 = 2 KB. Always.
125+
> - Container 0 = Metadata CogRecord (16K): W0-W127 MetaView fields, W224-W255 SchemaSidecar
126+
> - Container 1 = Content CogRecord (16K): All 256 words = searchable VSA fingerprint
127+
> - Container N = Additional CogRecords (Jina embeddings, etc.)
128+
> - Constants: CONTAINER_BITS=16384, CONTAINER_WORDS=256, CONTENT_OFFSET=0, CONTENT_WORDS=256
129+
>
130+
> The old "8192+8192 = one Fingerprint" model is superseded. Each CogRecord IS one
131+
> full-width container. A node is composed of separate CogRecords (Container 0 meta,
132+
> Container 1 content, etc.), each 2 KB.
116133
117-
```
134+
~~The correct layout (which you identified in a previous session) is:~~
135+
136+
~~```
118137
CogRecord = 8,192-bit metadata (W0-W127) + 8,192-bit content (W0-W127)
119138
= 2 Containers = 2 KB total
120139
= Exactly 1 Fingerprint
121-
```
140+
```~~
122141
123-
This means:
142+
Remaining work:
124143
1. **Delete `src/container/mod.rs:73` Container** — re-export from contract crate
125-
2. **Make CogRecord = exactly 1 Fingerprint** — upper 128 words = metadata, lower 128 = content
126-
OR: Container 0 (meta) + Container 1 (content), serialized as one Fingerprint
127-
3. **DN tree in Redis** — each key maps to exactly 2 KB = 1 Fingerprint = 1 CogRecord
128-
4. **Spine** — XOR of content containers IS the tree spine, same as Redis DN tree
129-
130-
What changes:
131-
- `crates/ladybug-contract/src/record.rs` — CogRecord becomes `[Container; 2]` not `meta + Vec<Container>`
132-
- `src/core/fingerprint.rs` — Fingerprint IS a CogRecord (upper=meta, lower=content)
133-
- Kill `src/container/mod.rs` — everything uses `ladybug_contract::container::Container`
134-
- `ContainerGeometry::Cam` (the default, most common) stays as 1 meta + 1 content = 2 KB
135-
- Multi-container geometries (Xyz, Chunked, Tree) become linked lists of 2 KB records via DN tree
144+
2. Point every module at `ladybug_contract::container::Container` so that only one Container type remains
136145
137146
### WHY THIS MATTERS
138147
139-
Right now searching requires loading Container (1 KB) then separately loading metadata.
140-
With 8192+8192, every record is self-contained. One 2 KB read gives you everything:
141-
identity, NARS truth, edges, AND the searchable content fingerprint.
142-
Zero joins. Zero second lookups. The record IS the DN tree node IS the Redis value.
148+
Each container is self-contained at 2 KB. One read gives you the full 16,384-bit
149+
searchable fingerprint (Container 1) or the full metadata (Container 0).
150+
No truncation. No zero-extension. The record IS the DN tree node IS the storage value.
143151
144152
---
145153
@@ -420,45 +428,41 @@ Wire them into the cognitive kernel (`src/cognitive/cognitive_kernel.rs`):
420428

421429
---
422430

423-
## THE HOLY GRAIL: 8192 META + 8192 CONTENT
431+
## THE HOLY GRAIL: 16K-PER-CONTAINER MODEL
432+
433+
> **UPDATE (Feb 2026):** This section described the old "8192+8192" model.
434+
> The canonical layout is now: every Container = 16,384 bits = 256 × u64 = 2 KB.
435+
> A node has separate containers (Container 0 = metadata, Container 1 = content, etc.),
436+
> each one a full 16K-bit CogRecord.
424437
425-
### Current State (Wrong)
438+
### Previous State (Fixed)
426439

427440
```
428441
Fingerprint = 256 u64 = 16,384 bits = 2 KB (src/core/)
429-
Container = 128 u64 = 8,192 bits = 1 KB (contract + src/container/ DUPLICATE)
442+
Container = 256 u64 = 16,384 bits = 2 KB (contract — updated from 128 u64)
430443
CogRecord = 1 meta Container + Vec<Container> (variable size, heap allocated)
431444
CogPacket = 8-word header + 1-2 Containers (wire protocol)
432445
```
433446

434-
Problems:
435-
- Fingerprint → Container loses half the data (truncation at conversion)
436-
- CogRecord is heap-allocated Vec (variable size = no zero-copy, no mmap)
437-
- Two Container types cause type confusion
438-
- Wire protocol adds its own 64-byte header, different from meta.rs W0-W127
439-
440-
### Target State (8192 + 8192)
447+
### Current State (Implemented)
441448

442449
```
443-
Container = 128 u64 = 8,192 bits = 1 KB (ONE type, in contract)
444-
CogRecord = [Container; 2] = 2 KB fixed (meta + content, stack allocated)
445-
Fingerprint = type alias for CogRecord (or From<CogRecord> zero-cost)
450+
Container = 256 u64 = 16,384 bits = 2 KB (ONE type, in contract)
451+
Fingerprint = 256 u64 = 16,384 bits = 2 KB (same width as Container)
452+
CogRecord = separate Containers (meta + content + ...)
446453
DN tree key = PackedDn (8 bytes)
447-
DN tree val = CogRecord (2 KB fixed)
448-
Redis key = DN address
449-
Redis value = 2 KB blob (identical to CogRecord)
454+
DN tree val = Container(s) (2 KB each)
450455
```
451456

452457
### What this gives you
453458

454-
1. **Zero-copy everything**: mmap a file, cast to `&[CogRecord]`, done
455-
2. **No heap allocation**: `[Container; 2]` lives on the stack
456-
3. **DN tree = Redis = Storage**: exact same 2 KB blob everywhere
457-
4. **Spine = XOR of content containers**: `spine = records.iter().fold(Container::zero(), |s, r| s.xor(&r.content))`
458-
5. **SIMD on full record**: 2 x 16 AVX-512 iterations = 32 iterations per record
459-
6. **One lookup per node**: GET dn_addr → 2 KB → you have meta + content + edges + NARS
460-
7. **CLAM tree over CogRecords**: one tree indexes both metadata and content
461-
8. **panCAKES compression on content container**: XOR-diff from cluster center, 5-70x ratio
459+
1. **No truncation**: Container = Fingerprint = 16,384 bits, direct copy
460+
2. **Full-width SIMD**: 32 AVX-512 iterations per Container (256 words / 8)
461+
3. **σ = 64.0 exactly**: sqrt(16384/4) = 64.0, simplifies all threshold math
462+
4. **Spine = XOR of content containers**: `spine = records.iter().fold(Container::zero(), |s, r| s.xor(&r.cam))`
463+
5. **One lookup per container**: GET dn_addr → 2 KB → you have the full 16K fingerprint
464+
6. **CLAM tree over Containers**: one tree indexes content directly
465+
7. **panCAKES compression on content container**: XOR-diff from cluster center, 5-70x ratio
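
The spine fold in item 4 can be made concrete with a small sketch. `Record` here is a hypothetical stand-in that models only the `cam` field used by the fold, and `Container` is a simplified local definition rather than the contract crate's type.

```rust
// Sketch only: `Record` and this `Container` are simplified stand-ins.
const WORDS: usize = 256; // 16,384 bits

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Container([u64; WORDS]);

impl Container {
    fn zero() -> Self {
        Container([0u64; WORDS])
    }

    fn xor(&self, other: &Self) -> Self {
        let mut out = [0u64; WORDS];
        for i in 0..WORDS {
            out[i] = self.0[i] ^ other.0[i];
        }
        Container(out)
    }
}

/// Hypothetical record: only the content container used by the spine is modelled.
struct Record {
    cam: Container,
}

/// XOR-fold of all content containers, starting from the all-zero container.
fn spine(records: &[Record]) -> Container {
    records.iter().fold(Container::zero(), |s, r| s.xor(&r.cam))
}
```

Since XOR is associative and commutative, the result does not depend on record order, and a record that appears twice cancels itself out of the spine.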
462466

463467
### Migration Path
464468

@@ -491,10 +495,10 @@ Week 2: crewai-rust Resurrection
491495
[6] Wire task.execute_sync() to real execution (Issue #2)
492496
[7] Delete dead wire_bridge or call it (Issue #2)
493497
494-
Week 3: The Holy Grail — 8192+8192
495-
[8] CogRecord = [Container; 2] (Issue #1, steps 2-5)
496-
[9] Update storage to fixed 2 KB records (steps 6-8)
497-
[10] Fingerprint = CogRecord alias (step 9)
498+
Week 3: The Holy Grail — 16K per Container *(DONE Feb 2026)*
499+
[8] ~~CogRecord = [Container; 2]~~ → Container widened to 16384 bits ✓
500+
[9] ~~Update storage to fixed 2 KB records~~ → Constants updated ✓
501+
[10] ~~Fingerprint = CogRecord alias~~ → Same width, direct copy ✓
498502
499503
Week 4: Connect the Pipes
500504
[11] Bridge Grammar → CausalSearch (Issue #3)
@@ -582,9 +586,9 @@ into "scientific breakthrough."
582586

583587
### The gap between "looks impressive" and "actually works" is ~8 weeks of focused work.
584588

585-
The 8192+8192 change is the architectural unlock (weeks 1-3).
589+
The 16K-per-container change is the architectural unlock *(completed Feb 2026)*.
586590
CLAM integration is the scientific unlock (weeks 5-8).
587-
Everything else follows from having one canonical 2 KB record type
591+
Everything else follows from having one canonical 2 KB container type
588592
that IS the fingerprint, IS the DN tree node, IS the Redis value,
589593
IS the search vector, IS the CLAM tree leaf, IS the storage unit.
590594

docs/BINDSPACE_UNIFICATION.md

Lines changed: 9 additions & 1 deletion
@@ -2,6 +2,14 @@
22

33
> **Date**: 2026-02-13
44
> **Status**: PLAN — awaiting implementation
5+
>
6+
> **NOTE (Feb 2026):** This document was written when Container = 128 × u64 = 8,192 bits = 1 KB.
7+
> Container is now 256 × u64 = 16,384 bits = 2 KB. The "8K vs 16K" comparison table
8+
> and the 128/128 metadata/content split described in §6-§7 are **superseded**.
9+
> Canonical layout: every Container = 16,384 bits = 256 words. Container 0 = metadata,
10+
> Container 1 = content (full 256 words = searchable fingerprint), Container N = additional.
11+
> The BindSpace concepts (typed lens, Arrow backing, DnIndex) remain valid.
12+
> Specific word-offset references to [0..128] metadata / [128..256] content need updating.
513
> **Branch for work**: Create fresh branch from `claude/code-review-SMMuY`
614
> **Rollback**: `git revert` the implementation commit(s); all consolidation files remain untouched on the parent branch
715
@@ -1875,7 +1883,7 @@ type Node {
18751883
children: [Node!]! # dn_index.children(addr)
18761884
parent: Node # dn.parent() → resolve
18771885
content: Fingerprint # content_container()
1878-
similarity(to: Fingerprint!): Float # content.hamming() → 1.0 - dist/8192
1886+
similarity(to: Fingerprint!): Float # content.hamming() → 1.0 - dist/16384
18791887
}
18801888

18811889
type Query {

docs/COGNITIVE_RECORD_192.md

Lines changed: 13 additions & 1 deletion
@@ -1,7 +1,19 @@
11
# The 192×u64 Cognitive Record
22

33
**Date**: 2026-02-05
4-
**Status**: Proposal (alternate to COMPOSITE_FINGERPRINT_SCHEMA.md)
4+
**Status**: ~~Proposal~~ **SUPERSEDED**
5+
6+
> **NOTE (Feb 2026):** This document proposed a 192-word record with an 8,192-bit
7+
> (128 × u64) fingerprint lane. The canonical layout is now:
8+
> - Every Container = 16,384 bits = 256 × u64 = 2 KB
9+
> - Container 0 = Metadata (16K), Container 1 = Content (16K), Container N = Additional
10+
> - The 192-word layout was never implemented. See `width_16k/mod.rs` for the current spec.
11+
> - Many of the GraphBLAS/semiring concepts remain valid and are implemented in
12+
> `src/container/semiring.rs` and `src/width_16k/search.rs`.
13+
>
14+
> Kept for historical reference — the adjacency bitvector concept and DataFusion operator
15+
> model described here informed later design decisions.
16+
517
**Core idea**: One 1,536-byte fixed-size record is the node, the edge row, the sparse adjacency vector, the NARS belief, the scent, AND the VSA fingerprint. Every cognitive operation reduces to SIMD on a lane of the same buffer. Zero-copy from storage through compute through transport.
618

719
---

docs/COGRECORD_65536.md

Lines changed: 12 additions & 22 deletions
@@ -10,17 +10,18 @@ February 21, 2026 — AdaWorldAPI / Claude Architecture Audit
1010

1111
## 0. The Insight
1212

13-
Current: `Container = [u64; 128]` = 8,192 bits = 1 KB.
14-
Current: `CogRecord = [Container; 2]` = 16,384 bits = 2 KB.
15-
Current: `Fingerprint = [u64; 256]` = 16,384 bits = 2 KB.
13+
> **UPDATE (Feb 2026):** Phase 0 of this plan is now implemented.
14+
> `CONTAINER_BITS = 16_384`, `CONTAINER_WORDS = 256`, `SIGMA = 64.0`.
15+
> Container = Fingerprint = same width. No truncation. No zero-extension.
16+
> The "Current" state below reflects the pre-change baseline.
1617
17-
Container is **half-width Fingerprint**. Every `From<&Fingerprint> for Container` truncates. Every `From<&Container> for Fingerprint` zero-extends. This is wasted information.
18-
19-
**New: Container = Fingerprint = `[u64; 256]` = 16,384 bits = 2 KB.**
18+
~~Current: `Container = [u64; 128]` = 8,192 bits = 1 KB.~~
19+
~~Current: `CogRecord = [Container; 2]` = 16,384 bits = 2 KB.~~
20+
**Implemented: Container = Fingerprint = `[u64; 256]` = 16,384 bits = 2 KB.**
2021

2122
One type. Full width. No truncation. No zero-extension. The conversion functions become identity operations.
2223

23-
Then: `CogRecord = [Container; 4]` = 65,536 bits = 8 KB.
24+
Future: `CogRecord = [Container; 4]` = 65,536 bits = 8 KB.
2425

2526
With AVX-512 `VPOPCNTDQ` (confirmed present on this hardware), a full 65,536-bit sweep costs 128 instructions — the same throughput class as the current 8,192-bit sweep's 16 instructions. The pipeline fills identically; only the iteration count changes.
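
The instruction counts quoted above follow from dividing the width by the 512-bit register size. A minimal sketch, assuming one popcount instruction per 512-bit chunk and ignoring loads, accumulation, and loop overhead; `avx512_iters` is a hypothetical helper name:

```rust
/// Width of one AVX-512 register in bits.
const AVX512_BITS: usize = 512;

/// Number of 512-bit chunks needed to cover `bits` (assumes `bits` is a multiple of 512).
const fn avx512_iters(bits: usize) -> usize {
    bits / AVX512_BITS
}
```

This gives 16 chunks for 8,192 bits, 32 for 16,384 bits, and 128 for 65,536 bits.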
2627

@@ -159,16 +160,10 @@ match meta.embedding_metric() {
159160

160161
The entire change starts at one file: `crates/ladybug-contract/src/container.rs`.
161162

162-
### Phase 0: Change Constants (1 hour)
163+
### Phase 0: Change Constants ~~(1 hour)~~ **DONE (Feb 2026)**
163164

164165
```rust
165-
// BEFORE:
166-
pub const CONTAINER_BITS: usize = 8_192;
167-
pub const CONTAINER_WORDS: usize = CONTAINER_BITS / 64; // 128
168-
pub const CONTAINER_BYTES: usize = CONTAINER_WORDS * 8; // 1024
169-
pub const CONTAINER_AVX512_ITERS: usize = CONTAINER_WORDS / 8; // 16
170-
171-
// AFTER:
166+
// IMPLEMENTED in crates/ladybug-contract/src/container.rs:
172167
pub const CONTAINER_BITS: usize = 16_384;
173168
pub const CONTAINER_WORDS: usize = CONTAINER_BITS / 64; // 256
174169
pub const CONTAINER_BYTES: usize = CONTAINER_WORDS * 8; // 2048
@@ -177,15 +172,10 @@ pub const CONTAINER_AVX512_ITERS: usize = CONTAINER_WORDS / 8; // 32
177172

178173
Because the code uses `CONTAINER_WORDS` everywhere (not hardcoded `128`), **most code compiles immediately**. The 17 files importing these constants just get wider containers.
179174

180-
### Phase 0.1: Statistical Constants
175+
### Phase 0.1: Statistical Constants **DONE (Feb 2026)**
181176

182177
```rust
183-
// BEFORE:
184-
pub const EXPECTED_DISTANCE: u32 = 4096; // 8192/2
185-
pub const SIGMA: f64 = 45.254833995939045; // √(8192/4)
186-
pub const SIGMA_APPROX: u32 = 45;
187-
188-
// AFTER:
178+
// IMPLEMENTED:
189179
pub const EXPECTED_DISTANCE: u32 = 8192; // 16384/2
190180
pub const SIGMA: f64 = 64.0; // √(16384/4) = √4096 = 64.0 exactly
191181
pub const SIGMA_APPROX: u32 = 64;
