| 1 | +# ADR 0002 — I1 Codec Regime Split (proposed) |
| 2 | + |
| 3 | +> **Status:** **Proposed** (drafted 2026-04-24) |
| 4 | +> **Supersedes:** None |
| 5 | +> **Superseded by:** None |
| 6 | +> |
| 7 | +> **Scope:** Locks the invariant that governs every codec choice across |
| 8 | +> the BindSpace SoA, the Lance persistence schema, AriGraph (episodic + |
| 9 | +> triplet_graph), archetype / persona catalogues, and role-key storage. |
| 10 | +> |
| 11 | +> **Quantitative gate:** jc pillar 5 (+ pending pillar 5b on Pearl 2³ |
| 12 | +> mask accuracy — see `TECH_DEBT.md` 2026-04-24 jc Pillar 5b entry). |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## Context |
| 17 | + |
| 18 | +Across multiple sessions the same question kept surfacing in different |
| 19 | +disguises: "should we compress field X?" The field varied — 3 SPO planes |
| 20 | +in `cognitive_nodes.lance`, episodic fingerprints in `arigraph`, persona |
| 21 | +resonance fingerprints, role-key slices — but the answer always hinged |
| 22 | +on the **same distinction**: is this field identity-bearing, or is it |
| 23 | +similarity-searchable? |
| 24 | + |
| 25 | +The contract crate **already encodes the answer** in |
| 26 | +`crates/lance-graph-contract/src/cam.rs`: |
| 27 | + |
| 28 | +```rust |
| 29 | +pub enum CodecRoute { |
| 30 | + CamPq, // argmax regime — compression OK |
| 31 | + Passthrough, // index regime — lossless required |
| 32 | + Skip, // too small / not a codec target |
| 33 | +} |
| 34 | +``` |
| 35 | + |
| 36 | +with shipped prose: |
| 37 | + |
| 38 | +> *"Identity lookup must be exact — no codec can survive Invariant I1."* |
| 39 | + |
| 40 | +This ADR **lifts that codec-routing enum into a workspace-wide |
| 41 | +invariant** and specifies how every new structure must be classified. |
| 42 | + |
| 43 | +## Decision |
| 44 | + |
| 45 | +### The invariant (I1 Codec Regime Split) |
| 46 | + |
| 47 | +Every field added to the BindSpace SoA, the Lance persistence schema, |
| 48 | +the AriGraph crate, the archetype / persona catalogues, or any role-key |
| 49 | +storage MUST be classified into exactly one of three regimes: |
| 50 | + |
| 51 | +| Regime | Codec | When it applies | |
| 52 | +|---|---|---| |
| 53 | +| **Index** | `Passthrough` (lossless) | Field is used for exact identity lookup, hash-keyed retrieval, independent-component addressability (e.g. Pearl 2³), or VSA bind/unbind role. Bit-level / byte-level round-trip MUST be exact. | |
| 54 | +| **Argmax** | `CamPq` (lossy OK) | Field is used for nearest-neighbor similarity, cascade filtering, resonance dispatch. Small error budget acceptable; only relative order matters. | |
| 55 | +| **Skip** | `Skip` (stored uncompressed; trivially lossless) | Field is too small to benefit from compression (norms, biases, packed truth values). | |
| 56 | + |
| 57 | +### Classification rules |
| 58 | + |
| 59 | +The following cases are normative and must not be rediscovered per-PR: |
| 60 | + |
| 61 | +| Structure | Regime | Reason | |
| 62 | +|---|---|---| |
| 63 | +| Pearl 2³ S/P/O planes (`cognitive_nodes.lance`) | **Index** | Mask evaluation requires independent per-role addressability; collapsing to a shared codebook violates Berry-Esseen IID assumptions (see Jirak pillar 5) | |
| 64 | +| `integrated_16k` cascade L1 | **Argmax** | Fast HHTL filter — CAM-PQ legitimate as first-tier scent | |
| 65 | +| AriGraph `Triplet.{subject, object, relation}` | **Index** | Strings are ground-truth identity; HashMap-keyed lookup | |
| 66 | +| AriGraph `Episode.fingerprint` | **Argmax** | Hamming-similarity retrieval; CAM-PQ-eligible as cascade filter | |
| 67 | +| `PersonaCard.entry.id` (ExpertId) | **Index** | Enum/ID is the identity; dispatch is exact | |
| 68 | +| Per-persona resonance codebook | **Argmax** | Implicit-routing similarity match | |
| 69 | +| Role keys (`grammar/role_keys.rs`) | **Index** | Bipolar bind/unbind identity — per I-VSA-IDENTITIES | |
| 70 | +| NARS truth (f, c) | **Skip** | 32 bits total; no codec payoff | |
| 71 | + |
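The two tables above can be read as a single lookup. The following is a minimal sketch, not the contract crate's API: `Regime`, `route_for`, `must_round_trip`, `regime_of`, and the short string labels are hypothetical names introduced for illustration, and `CodecRoute` is redeclared locally so the block compiles on its own. It also assumes that Skip-regime fields map to the `Skip` variant and are stored uncompressed.

```rust
/// Local mirror of the `CodecRoute` enum quoted in the Context section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecRoute {
    CamPq,
    Passthrough,
    Skip,
}

/// Hypothetical regime tag; not part of the contract crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Regime {
    Index,
    Argmax,
    Skip,
}

/// Map a regime to the only route it is allowed to take.
pub fn route_for(regime: Regime) -> CodecRoute {
    match regime {
        Regime::Index => CodecRoute::Passthrough,
        Regime::Argmax => CodecRoute::CamPq,
        // Assumption: skipped fields are stored as-is, hence trivially lossless.
        Regime::Skip => CodecRoute::Skip,
    }
}

/// Only the Argmax regime tolerates a lossy round-trip.
pub fn must_round_trip(regime: Regime) -> bool {
    regime != Regime::Argmax
}

/// The normative classification table, keyed by illustrative labels.
pub const CLASSIFICATION: &[(&str, Regime)] = &[
    ("pearl_spo_planes", Regime::Index),
    ("integrated_16k", Regime::Argmax),
    ("triplet_strings", Regime::Index),
    ("episode_fingerprint", Regime::Argmax),
    ("persona_entry_id", Regime::Index),
    ("persona_resonance_codebook", Regime::Argmax),
    ("role_keys", Regime::Index),
    ("nars_truth", Regime::Skip),
];

/// Look up the regime of a known structure; `None` means "not yet classified".
pub fn regime_of(label: &str) -> Option<Regime> {
    CLASSIFICATION
        .iter()
        .find(|(name, _)| *name == label)
        .map(|(_, regime)| *regime)
}
```

Returning `None` for an unknown label, rather than a default regime, keeps an unclassified field from silently picking up a lossy codec.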
| 72 | +### Quantitative gate |
| 73 | + |
| 74 | +Every proposed codec change must either: |
| 75 | + |
| 76 | +1. Keep `cargo run --manifest-path crates/jc/Cargo.toml --release --example prove_it` green (the five-pillar proof), OR |
| 77 | +2. Cite which pillar it extends, and add the corresponding arm to the proof binary. |
| 78 | + |
| 79 | +Pillar 5 (Jirak Berry-Esseen) is the current quantitative anchor: weakly |
| 80 | +dependent data (25 % shared-codebook prefix + 10 % overlapping role |
| 81 | +slices) showed sup-error 0.013287 at d=16384, N=5000, against an IID |
| 82 | +baseline of 0.011671. That roughly 14 % inflation is the measured cost of violating the Index regime. |
| 83 | + |
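The inflation figure follows directly from the two sup-error values quoted above. A minimal sketch of the arithmetic, where `relative_inflation` is a hypothetical helper and not a function in the `jc` crate:

```rust
/// Relative inflation of an observed sup-error over a baseline, as a fraction.
/// For the Pillar 5 numbers: (0.013287 - 0.011671) / 0.011671.
pub fn relative_inflation(observed: f64, baseline: f64) -> f64 {
    (observed - baseline) / baseline
}
```

Calling `relative_inflation(0.013287, 0.011671)` gives a value just under 0.14, which is the 14 % quoted in the text.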
| 84 | +Pending extension (Pillar 5b, TECH_DEBT 2026-04-24): direct Pearl 2³ |
| 85 | +mask-misclassification rate, three-plane vs CAM-PQ-bundled. Required |
| 86 | +before ADR-0002 acceptance. |
| 87 | + |
| 88 | +## Consequences |
| 89 | + |
| 90 | +### What this permits |
| 91 | + |
| 92 | +- CAM-PQ compression on argmax-regime overlays: `integrated_16k`, |
| 93 | + episodic fingerprints, resonance codebooks. |
| 94 | +- Stack-side VSA binding of metadata (role, card_id) into role-key |
| 95 | + slices — stays in Index regime because the mapping is deterministic |
| 96 | + and lossless. |
| 97 | +- Cascade search paths that use CAM-PQ as first-tier filter + exact |
| 98 | + match on lossless fields as commit tier. |
| 99 | + |
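The cascade pattern in the last bullet can be sketched as two tiers. This is an illustration under stated assumptions, not the shipped search path: `Record`, `filter_tier`, and `commit_tier` are hypothetical names, and a `u64` compared by Hamming distance stands in for a real CAM-PQ code.

```rust
/// Hypothetical record pairing a lossy code with a lossless identity key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    /// Index regime: ground-truth identity, compared exactly.
    pub id: String,
    /// Argmax regime: stand-in for a CAM-PQ code, compared by Hamming distance.
    pub code: u64,
}

/// Tier 1 (filter): keep records whose code lies within `max_dist` bits of the query.
/// Lossy codes are acceptable here because this tier only narrows the candidate set.
pub fn filter_tier<'a>(records: &'a [Record], query_code: u64, max_dist: u32) -> Vec<&'a Record> {
    records
        .iter()
        .filter(|r| (r.code ^ query_code).count_ones() <= max_dist)
        .collect()
}

/// Tier 2 (commit): accept a candidate only on an exact match of the lossless field.
pub fn commit_tier<'a>(candidates: &[&'a Record], id: &str) -> Option<&'a Record> {
    candidates.iter().copied().find(|r| r.id == id)
}
```

The final decision is always made on an Index-regime field, so any error introduced by the first tier can shrink the candidate set but cannot change what a committed match means.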
| 100 | +### What this forbids |
| 101 | + |
| 102 | +- Replacing the three S/P/O planes in `cognitive_nodes.lance` with |
| 103 | + CAM-PQ codes. They are Index regime (Pearl 2³ addressability). |
| 104 | +- Replacing `AriGraph::Triplet` strings with compressed codes. |
| 105 | +- Replacing `PersonaCard.entry.id` with a CAM-PQ code. |
| 106 | +- Running `MergeMode::Xor` on state-transition paths (violates |
| 107 | + I-SUBSTRATE-MARKOV; see CLAUDE.md). XOR merge is legitimate only |
| 108 | + for single-writer deltas. |
| 109 | + |
| 110 | +### Migration |
| 111 | + |
| 112 | +Current code is already compliant, so no migration is required. This ADR |
| 113 | +codifies the existing `CodecRoute` invariant and extends it to the AriGraph |
| 114 | +and archetype surfaces that were previously unclassified. |
| 115 | + |
| 116 | +Future codec decisions should start from this ADR, not from session discussion. |
| 117 | + |
| 118 | +## Acceptance criteria |
| 119 | + |
| 120 | +This ADR moves from **Proposed** to **Accepted** when: |
| 121 | + |
| 122 | +1. Pillar 5b extension ships (direct Pearl 2³ mask-accuracy measurement). |
| 123 | +2. Pillar 5b numbers are cited in this ADR. |
| 124 | +3. `@truth-architect` + `@integration-lead` sign off. |
| 125 | + |
| 126 | +Until then, the classification table above is the operating rule, but |
| 127 | +the ADR still lacks its Pillar 5b quantitative anchor. |
| 128 | + |
| 129 | +## References |
| 130 | + |
| 131 | +- `crates/lance-graph-contract/src/cam.rs` `CodecRoute` + `route_tensor` |
| 132 | +- `crates/jc/src/jirak.rs` pillar 5 (current) |
| 133 | +- `crates/jc/examples/prove_it.rs` five-pillar harness |
| 134 | +- CLAUDE.md I-VSA-IDENTITIES, I-NOISE-FLOOR-JIRAK, I-SUBSTRATE-MARKOV |
| 135 | +- `.claude/board/EPIPHANIES.md` 2026-04-24 "I1 Codec Regime Split" |
| 136 | +- `.claude/board/TECH_DEBT.md` 2026-04-24 "jc Pillar 5b" |
| 137 | +- ADR 0001 Archetype Transcode + Stack Lock (parent ADR for stack decisions) |