Merge pull request #259 from AdaWorldAPI/claude/hamming-content-cascade

AdaWorldAPI · web-flow · commit b21c037bfce8 · 2026-04-24T22:57:35.000+02:00
feat(shader-driver): wire content-plane Hamming cascade — dispatch sees real similarity
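
For reviewers, the resonance arithmetic behind this change can be sketched in isolation. This is a minimal standalone sketch, not the driver code. It assumes a 256-word (16384-bit) fingerprint, matching the `Hamming/16384` normalization in the log entry below. `fill_range` and `expected_random_hamming` are hypothetical helpers introduced only for illustration, and the second one assumes independent random bits.

```rust
/// Assumed fingerprint width: 256 u64 words = 16384 bits.
const WORDS_PER_FP: usize = 256;
const FP_BITS: f32 = (WORDS_PER_FP * 64) as f32;

/// Hamming distance: popcount of the XOR over every word pair.
fn hamming(a: &[u64; WORDS_PER_FP], b: &[u64; WORDS_PER_FP]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Resonance as described in this PR: 1 - Hamming / FP_BITS (higher = more similar).
fn resonance(a: &[u64; WORDS_PER_FP], b: &[u64; WORDS_PER_FP]) -> f32 {
    1.0 - hamming(a, b) as f32 / FP_BITS
}

/// Hypothetical helper: a fingerprint with bits [start, end) set.
fn fill_range(start: usize, end: usize) -> [u64; WORDS_PER_FP] {
    let mut fp = [0u64; WORDS_PER_FP];
    for i in start..end {
        fp[i / 64] |= 1u64 << (i % 64);
    }
    fp
}

/// Hypothetical helper: expected Hamming distance between two fingerprints
/// whose bits are set independently with probability `density`,
/// i.e. 2 * p * (1 - p) * FP_BITS.
fn expected_random_hamming(density: f32) -> f32 {
    2.0 * density * (1.0 - density) * FP_BITS
}
```

With `a = fill_range(0, 5000)` and `b = fill_range(2500, 7500)`, `hamming(&a, &b)` is 5000 and `resonance(&a, &b)` is about 0.695, which clears a 0.35 threshold and fails a 0.85 one. `expected_random_hamming(0.48)` comes out near 8179, consistent with the ≈ 8000 baseline quoted in the log entry.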
diff --git a/.claude/board/AGENT_LOG.md b/.claude/board/AGENT_LOG.md
@@ -266,9 +266,35 @@ newest-first.** A `BlackboardEntry` by any other transport.
 **Tests:** 12 pass
 **Outcome:** Shipped `lance-graph-archetype` crate scaffold: Component + Processor traits, World meta-state with tick/fork/at_tick stubs, CommandBroker FIFO queue, ArchetypeError. PR #254 merged.
 
+## 2026-04-24T17:20 — Content Hamming cascade wire (opus, claude/hamming-content-cascade)
+
+**D-ids:** Content-plane similarity pre-pass in ShaderDriver::run()
+**Commit:** `2cf36ad`
+**Tests:** 45 pass (43 lib + 2 e2e, 3 new: content_hamming_finds_similar_rows / _skips_dissimilar / _respects_style_threshold)
+**Outcome:** The glove is flying. Before: dispatch() on 3 encoded rows returned `hit_count:0, confidence:0.0, admit_ignorance:true` for every style, because the PaletteSemiring cascade probed a synthetic Base17 table unrelated to the encoded text, and the content plane was read only for the cycle_fp XOR fold, never compared. After: a content-plane Hamming pre-pass runs BEFORE the palette cascade. For each unordered pair (i, j) in `passed_rows`, it takes the popcount of `content_row(i) XOR content_row(j)`; if `resonance = 1 - Hamming/16384 >= style.resonance_threshold`, it emits one `ShaderHit{predicates:0x01}` per row of the pair (two hits per matching pair). Guard: the N² sweep is skipped when `passed_rows.len() > 256`.
+
+**Live verification (encode 3 rows, dispatch 0..3):**
+- "Palantir develops surveillance systems"     (row 0)
+- "Palantir Gotham is a surveillance platform" (row 1)
+- "Israel deploys military AI"                 (row 2)
+
+| Style          | Threshold | hit_count | top-1 row pair | resonance | confidence            |
+|----------------|-----------|-----------|----------------|-----------|-----------------------|
+| Analytical (1) | 0.85      | 0         | —              | —         | 0.0 (admit_ignorance) |
+| Creative (4)   | 0.35      | 6         | row 0 ↔ row 1  | 0.598     | 0.598                 |
+| Peripheral (9) | 0.20      | 6         | row 0 ↔ row 1  | 0.598     | 0.598                 |
+
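+The strongest pair (rows 0↔1, both about Palantir) correctly ranks first at 0.598. The rows 0↔2 pair (Palantir vs. Israel AI) ranks lowest at 0.496, which still clears the Creative and Peripheral thresholds; all three pairs pass, each emitting two hits, hence hit_count 6. Analytical's 0.85 threshold rejects all three pairs, so style semantics are preserved.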
+
+**Key insight:** The Jirak 454-Hamming threshold calibrated in the 2026-04-24 EPIPHANY was for UNTILED DeepNSM encodes at density ≈ 0.016. The live encode path tiles the 512-bit VSA 32× into the 16K-bit content plane, pushing density to ≈ 0.48 and the expected Hamming distance between unrelated rows to ≈ 8000 (for independent bits at density p, 2·p·(1−p)·16384 ≈ 8180 at p = 0.48). An absolute bit threshold would have required per-density calibration; `resonance >= style.resonance_threshold` is density-agnostic and reuses the existing style semantics. Style config IS the content-similarity threshold.
+
+**Remaining gap:** palette cascade hits (synthetic Base17) still exist and can flood top-k when their resonance exceeds content-match resonance; see driver.rs:180 `hits.truncate(8)`. The test `content_hamming_respects_style_threshold` uses empty planes to isolate the content cascade; in production with meaningful planes, content hits will intermix with palette hits via the shared resonance sort. Option: promote content hits with a small resonance bonus if future tuning shows palette drowning content too aggressively.
+
+Cross-ref: EPIPHANIES 2026-04-24 "Jirak noise floor" + "dispatch wiring audit", I-NOISE-FLOOR-JIRAK iron rule, driver.rs:93-156.
+
 ## 2026-04-24T17:30 — Cypher → AriGraph bridge (opus, claude/cypher-to-arigraph-wire)
 
 **D-ids:** CypherBridge, /v1/shader/route lg.cypher handling
 **Commit:** `45fc3a4`
 **Tests:** 7 pass (create, match, unsupported, non-cypher, missing-reasoning, lowercase, nd-reject)
-**Outcome:** Phase 1 stub landed — prefix classifier over step_type="lg.cypher". CREATE and MATCH → Completed (confidence 0.5), other cypher constructs → Skipped with "unsupported cypher construct, stub in place", non-`lg.cypher` → `Err(DomainUnavailable)` so route_handler falls through to planner. Phase 2 (real `lance_graph::parser::parse_cypher_query` + SPO commit + BindSpace label search) deferred: pulling lance-graph core (arrow + datafusion + lance) into cognitive-shader-driver would balloon build time for what today is a test-path transport. route_handler is now a three-stage chain: CodecResearchBridge (nd.*) → CypherBridge (lg.cypher) → planner_bridge. Live curl against localhost:3001/v1/shader/route verified all four paths: CREATE→completed+0.5, MATCH→completed+0.5, DROP INDEX→skipped, lg.plan→failed (planner not compiled in, unchanged from pre-PR).
+**Outcome:** Phase 1 stub landed — prefix classifier over step_type="lg.cypher". CREATE and MATCH → Completed (confidence 0.5), other cypher constructs → Skipped with "unsupported cypher construct, stub in place", non-`lg.cypher` → `Err(DomainUnavailable)` so route_handler falls through to planner. Phase 2 (real `lance_graph::parser::parse_cypher_query` + SPO commit + BindSpace label search) deferred: pulling lance-graph core (arrow + datafusion + lance) into cognitive-shader-driver would balloon build time for what today is a test-path transport. route_handler is now a three-stage chain: CodecResearchBridge (nd.*) → CypherBridge (lg.cypher) → planner_bridge. Live curl against localhost:3001/v1/shader/route verified all four paths: CREATE→completed+0.5, MATCH→completed+0.5, DROP INDEX→skipped, lg.plan→failed (planner not compiled in, unchanged from pre-PR). PR #258 merged.
diff --git a/crates/cognitive-shader-driver/src/driver.rs b/crates/cognitive-shader-driver/src/driver.rs
@@ -90,6 +90,70 @@ impl ShaderDriver {
         let max_dist = (self.semiring.k as f32) * (self.semiring.k as f32);
         let mut hits = Vec::<ShaderHit>::with_capacity(passed_rows.len().min(64));
 
+        // ═══════════════════════════════════════════════════════════════
+        // Content-plane Hamming pre-pass (PR: hamming-content-cascade).
+        // Compare content fingerprint of each passed row against every
+        // other passed row. If Hamming-resonance exceeds the style's
+        // resonance_threshold, emit a content-match hit. This is the
+        // wire that lets dispatch() see real text similarity, not just
+        // edge palette distance.
+        //
+        // Resonance model: resonance = 1 - Hamming/16384. Rows that
+        // share content words land at higher resonance; fully disjoint
+        // rows land near 0.5 (density ≈ 0.48 after 32× DeepNSM tiling).
+        // Style thresholds (UNIFIED_STYLES):
+        //   analytical 0.85 (strict)   focused 0.90 (strictest)
+        //   creative   0.35 (loose)    peripheral 0.20 (loosest)
+        // Jirak-calibrated 3σ reference: Hamming < 454 at density 0.016
+        // (untiled). For tiled encodings (current DeepNSM path) the
+        // density-dependent baseline shifts; resonance-over-threshold
+        // is the density-agnostic reading. See EPIPHANIES 2026-04-24
+        // "Jirak noise floor calibrated for DeepNSM-tiled 16K-bit
+        // fingerprints".
+        //
+        // Guard: skip the N² sweep if passed_rows.len() > 256; at
+        // 4096 rows that is ~8.4M row pairs × 256 popcounts each.
+        // ═══════════════════════════════════════════════════════════════
+        const CONTENT_MATCH_PREDICATE: u8 = 0x01;
+        const MAX_CONTENT_PREPASS_ROWS: usize = 256;
+        const FP_BITS: f32 = (WORDS_PER_FP * 64) as f32;
+        if passed_rows.len() <= MAX_CONTENT_PREPASS_ROWS {
+            let style_cfg = &crate::engine_bridge::UNIFIED_STYLES[(style_ord % 12) as usize];
+            let min_resonance = style_cfg.resonance_threshold;
+
+            for (i, &row_i) in passed_rows.iter().enumerate() {
+                let fp_i = self.bindspace.fingerprints.content_row(row_i as usize);
+                for (j_off, &row_j) in passed_rows.iter().enumerate().skip(i + 1) {
+                    let fp_j = self.bindspace.fingerprints.content_row(row_j as usize);
+                    // Hamming = popcount of XOR across all 256 u64 words.
+                    let hamming: u32 = fp_i.iter().zip(fp_j.iter())
+                        .map(|(a, b)| (a ^ b).count_ones())
+                        .sum();
+                    // Resonance: normalized to full bit-width; higher = more similar.
+                    let resonance = 1.0 - (hamming as f32 / FP_BITS);
+                    if resonance >= min_resonance {
+                        // Record both directions so either row can surface via top-k.
+                        hits.push(ShaderHit {
+                            row: row_i,
+                            distance: hamming.min(u16::MAX as u32) as u16,
+                            predicates: CONTENT_MATCH_PREDICATE,
+                            _pad: 0,
+                            resonance,
+                            cycle_index: i as u32,
+                        });
+                        hits.push(ShaderHit {
+                            row: row_j,
+                            distance: hamming.min(u16::MAX as u32) as u16,
+                            predicates: CONTENT_MATCH_PREDICATE,
+                            _pad: 0,
+                            resonance,
+                            cycle_index: j_off as u32,
+                        });
+                    }
+                }
+            }
+        }
+
         for (cycle_idx, &row) in passed_rows.iter().enumerate() {
             if cycle_idx as u16 >= req.max_cycles.saturating_mul(4) { break; }
             // Use the SPO `s_idx` of the row's edge as the query palette index.
@@ -444,6 +508,136 @@ mod tests {
         assert!(crystal.bus.resonance.cycles_used <= 1);
     }
 
+    /// Build a BindSpace of `n` rows with caller-supplied content fingerprints.
+    /// Meta confidence set to (200, 200) so everything passes the prefilter.
+    fn bindspace_with_content(rows: &[[u64; WORDS_PER_FP]]) -> BindSpace {
+        let q = [0.0f32; QUALIA_DIMS];
+        let mut builder = BindSpaceBuilder::new(rows.len());
+        for (idx, content) in rows.iter().enumerate() {
+            let meta = MetaWord::new((idx as u8).wrapping_add(1), (idx as u8).wrapping_add(1), 200, 200, 5);
+            builder = builder.push(content, meta, 0, &q, 0, 0);
+        }
+        builder.build()
+    }
+
+    #[test]
+    fn content_hamming_finds_similar_rows() {
+        // Two rows with near-identical content (differ in only 4 bits)
+        // → resonance ≈ 0.9998, well above any style threshold.
+        let mut a = [0u64; WORDS_PER_FP];
+        for i in 0..250 { a[i / 64] |= 1u64 << (i % 64); }
+        let mut b = a;
+        b[0] ^= 0xF; // 4-bit difference → Hamming = 4
+        // A third row with substantially different content.
+        let mut c = [0u64; WORDS_PER_FP];
+        for i in 8000..8250 { c[i / 64] |= 1u64 << (i % 64); }
+
+        let bs = Arc::new(bindspace_with_content(&[a, b, c]));
+        let sr = Arc::new(demo_semiring());
+        let driver = CognitiveShaderBuilder::new()
+            .bindspace(bs).semiring(sr).planes(demo_planes()).build();
+
+        let req = ShaderDispatch {
+            rows: ColumnWindow::new(0, 3),
+            meta_prefilter: MetaFilter::ALL,
+            layer_mask: 0xFF,
+            radius: u16::MAX,
+            style: StyleSelector::Ordinal(auto_style::ANALYTICAL),
+            ..Default::default()
+        };
+        let crystal = driver.dispatch(&req);
+        // Top-k must contain at least one content-match hit (predicates=0x01).
+        let content_hits: Vec<_> = crystal.bus.resonance.top_k.iter()
+            .filter(|h| h.predicates & 0x01 != 0 && h.resonance > 0.0)
+            .collect();
+        assert!(!content_hits.is_empty(),
+            "expected at least one content-match hit, got top_k={:?}",
+            crystal.bus.resonance.top_k);
+        // Resonance is ≈ 0.9998 (4 of 16384 bits differ); the assertion only requires > 0.5.
+        assert!(content_hits.iter().any(|h| h.resonance > 0.5),
+            "content-match resonance should be > 0.5 for near-identical rows");
+    }
+
+    #[test]
+    fn content_hamming_skips_dissimilar() {
+        // Two rows with Hamming distance 10000 → resonance ≈ 0.39, which
+        // is BELOW analytical threshold (0.85). Analytical must not emit
+        // a content-match hit.
+        let mut a = [0u64; WORDS_PER_FP];
+        for i in 0..5000 { a[i / 64] |= 1u64 << (i % 64); }
+        let mut b = [0u64; WORDS_PER_FP];
+        for i in 8000..13000 { b[i / 64] |= 1u64 << (i % 64); }
+        // Disjoint ranges of 5000 bits each → Hamming = 10000.
+
+        let bs = Arc::new(bindspace_with_content(&[a, b]));
+        let sr = Arc::new(demo_semiring());
+        let driver = CognitiveShaderBuilder::new()
+            .bindspace(bs).semiring(sr).planes(demo_planes()).build();
+
+        let req = ShaderDispatch {
+            rows: ColumnWindow::new(0, 2),
+            meta_prefilter: MetaFilter::ALL,
+            layer_mask: 0xFF,
+            radius: u16::MAX,
+            style: StyleSelector::Ordinal(auto_style::ANALYTICAL),
+            ..Default::default()
+        };
+        let crystal = driver.dispatch(&req);
+        let content_hits: Vec<_> = crystal.bus.resonance.top_k.iter()
+            .filter(|h| h.predicates & 0x01 != 0 && h.resonance > 0.0)
+            .collect();
+        assert!(content_hits.is_empty(),
+            "analytical style should not emit content hits when resonance < 0.85; got {:?}",
+            content_hits);
+    }
+
+    #[test]
+    fn content_hamming_respects_style_threshold() {
+        // Design Hamming = 5000 so resonance ≈ 0.695:
+        //   * below analytical  (0.85) → 0 content hits
+        //   * above creative    (0.35) → ≥ 1 content hit
+        // a = bits [0, 5000), b = bits [2500, 7500) → overlap 2500 bits,
+        // symmetric difference 2500 + 2500 = 5000, so Hamming = 5000.
+        let mut a = [0u64; WORDS_PER_FP];
+        for i in 0..5000 { a[i / 64] |= 1u64 << (i % 64); }
+        let mut b = [0u64; WORDS_PER_FP];
+        for i in 2500..7500 { b[i / 64] |= 1u64 << (i % 64); }
+
+        // Use empty planes so the palette cascade produces no hits —
+        // isolates the content pre-pass so it cannot be drowned out by
+        // synthetic palette matches that dominate top-k truncate(8).
+        let empty_planes = [[0u64; 64]; 8];
+        let mk_driver = || {
+            let bs = Arc::new(bindspace_with_content(&[a, b]));
+            let sr = Arc::new(demo_semiring());
+            CognitiveShaderBuilder::new()
+                .bindspace(bs).semiring(sr).planes(empty_planes).build()
+        };
+        let mk_req = |style_ord: u8| ShaderDispatch {
+            rows: ColumnWindow::new(0, 2),
+            meta_prefilter: MetaFilter::ALL,
+            layer_mask: 0xFF,
+            radius: u16::MAX,
+            style: StyleSelector::Ordinal(style_ord),
+            ..Default::default()
+        };
+
+        let strict = mk_driver().dispatch(&mk_req(auto_style::ANALYTICAL));
+        let loose  = mk_driver().dispatch(&mk_req(auto_style::CREATIVE));
+        let strict_hits = strict.bus.resonance.top_k.iter()
+            .filter(|h| h.predicates & 0x01 != 0 && h.resonance > 0.0).count();
+        let loose_hits  = loose.bus.resonance.top_k.iter()
+            .filter(|h| h.predicates & 0x01 != 0 && h.resonance > 0.0).count();
+        // Monotonicity: loosening the style cannot reduce the set of
+        // content-match hits. This is the load-bearing invariant.
+        assert!(strict_hits <= loose_hits,
+            "creative (loose) should emit >= analytical (strict) content hits: strict={} loose={}",
+            strict_hits, loose_hits);
+        assert!(loose_hits > 0,
+            "creative (threshold 0.35) should emit content hits for resonance ≈ 0.695\nloose top_k: {:?}",
+            loose.bus.resonance.top_k);
+    }
+
     #[test]
     fn sink_short_circuits_on_false() {
         struct Stop;