diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md index 1721076d5..3863b66d6 100644 --- a/.claude/board/TECH_DEBT.md +++ b/.claude/board/TECH_DEBT.md @@ -2638,3 +2638,14 @@ W6 entropy-ledger reframe of `DEEPNSM-NSM-1`. ## TD-DEEPNSM-CLIPPY-195 — 12 pre-existing default-clippy lints in deepnsm (clippy 1.95 bump) `cargo clippy --manifest-path crates/deepnsm/Cargo.toml --all-targets -- -D warnings` reports 12 errors across 7 files (codebook 2, encoder 4, similarity 2, disambiguator_glue/nsm_primes/parser/quantum_mode 1 each) — newer lints (`manual_repeat_n`, `uninlined_format_args`, …) that were clean when written and fire only under clippy 1.95.0. Pre-existing (not from the E-ENGLISH-BIFURCATES slice; `arcs.rs` is clean at pedantic+nursery). Tests unaffected (94+4+8+1 green). Fix = a separate mechanical sweep across the 7 files; deliberately NOT bundled into the feature slice (7-file scope creep). Surfaced 2026-05-31. + +**Resolved 2026-06-09** (PR #479, branch `claude/stoic-turing-M0Eiq`, commit `bf95caa`): +hand-reviewed clippy sweep landed. `cargo clippy --manifest-path +crates/deepnsm/Cargo.toml --all-targets -- -D warnings` is now clean (exit 0). +Cleared the original 7-file set plus the lints in PR #479's new reader modules +(window / reader_state / crystal_neighborhood / sentence_transformer64 / +signed_crystal / codebook) surfaced by `--all-targets` — 22 lints across 13 +files; 217 tests green. Fixes are hand-applied (NOT `clippy --fix`, which mangled +`reader_state.rs` into stranded-comment match guards). The CI clippy step for +deepnsm was promoted Tier-B advisory → Tier-A gating in +`.github/workflows/style.yml`. 
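Reviewer note: for readers unfamiliar with the two lints named in the TD entry above, here is a minimal sketch of the kind of mechanical rewrite each one asks for. The helper names `pad_zeros` and `describe` are hypothetical and are not taken from deepnsm; the actual 22 fixes in PR #479 are not reproduced here. It assumes a toolchain where `std::iter::repeat_n` is stable (Rust 1.82 or later).

```rust
use std::iter;

/// Hypothetical helper: build `n` zero bytes.
/// `manual_repeat_n` flags `iter::repeat(0u8).take(n)` and suggests
/// the equivalent `iter::repeat_n(0u8, n)` used here.
fn pad_zeros(n: usize) -> Vec<u8> {
    iter::repeat_n(0u8, n).collect()
}

/// Hypothetical helper: format one lane for a debug line.
/// `uninlined_format_args` flags `format!("lane {} = {:#04x}", i, b)`
/// and suggests capturing the variables inline in the format string.
fn describe(i: usize, b: u8) -> String {
    format!("lane {i} = {b:#04x}")
}
```

Both rewrites change only how the code is spelled, not what it does, which would be consistent with the entry's note that tests were unaffected.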
diff --git a/.github/workflows/rust-test.yml b/.github/workflows/rust-test.yml index 1ef9e27f5..98e37bcec 100644 --- a/.github/workflows/rust-test.yml +++ b/.github/workflows/rust-test.yml @@ -20,6 +20,11 @@ env: RUST_BACKTRACE: "1" CARGO_INCREMENTAL: "0" +# Least-privilege: these jobs only read the repo (checkout + build + test). +# Codecov upload uses its own token secret and is non-fatal (fail_ci_if_error: false). +permissions: + contents: read + jobs: test: runs-on: ubuntu-24.04 @@ -87,6 +92,11 @@ jobs: run: cargo test --manifest-path crates/lance-graph-contract/Cargo.toml --tests - name: Run contract doctests run: cargo test --manifest-path crates/lance-graph-contract/Cargo.toml --doc + # deepnsm: standalone 0-dep codec crate, workspace-excluded, so the + # lance-graph test steps above never reached it. ~217 lib + integration + + # doctests, fast (no lance/datafusion/ndarray deps). Gating. + - name: Run deepnsm tests + run: cargo test --manifest-path crates/deepnsm/Cargo.toml test-with-coverage: runs-on: ubuntu-24.04 diff --git a/.github/workflows/style.yml b/.github/workflows/style.yml index 707ec5c8f..9ff65515c 100644 --- a/.github/workflows/style.yml +++ b/.github/workflows/style.yml @@ -18,6 +18,10 @@ env: CARGO_TERM_COLOR: always RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3" +# Least-privilege: these jobs only read the repo (checkout + build + lint). +permissions: + contents: read + jobs: # Clippy runs FIRST and is mandatory — logical soundness before syntax. # Discipline: @@ -71,6 +75,12 @@ jobs: - name: Clippy lance-graph (advisory) continue-on-error: true run: cargo clippy --manifest-path crates/lance-graph/Cargo.toml --lib --tests -- -D warnings + # Tier A (mandatory, gating): deepnsm is now clippy-clean — TD-DEEPNSM-CLIPPY-195 + # resolved 2026-06-09 by a hand-reviewed sweep. 
It's a standalone 0-dep codec + # crate, workspace-excluded, so the lance-graph clippy steps don't cover it; + # gate it explicitly (same posture as the contract crate) so it can't regress. + - name: Clippy deepnsm (mandatory) + run: cargo clippy --manifest-path crates/deepnsm/Cargo.toml --all-targets -- -D warnings format: runs-on: ubuntu-24.04 @@ -94,8 +104,13 @@ jobs: - uses: actions-rust-lang/setup-rust-toolchain@v1 with: components: rustfmt - - name: Check formatting + - name: Check formatting (lance-graph) run: cargo fmt --manifest-path crates/lance-graph/Cargo.toml -- --check + # deepnsm is a standalone, workspace-excluded codec crate, so + # `cargo fmt --all` never reaches it. It was brought to a rustfmt-clean + # baseline in this PR; check it explicitly so it can't silently drift. + - name: Check formatting (deepnsm) + run: cargo fmt --manifest-path crates/deepnsm/Cargo.toml -- --check # typos / spell-check removed 2026-04-26: too many false positives on # technical jargon (NARS terms, codec acronyms, German loanwords used in diff --git a/crates/deepnsm/Cargo.toml b/crates/deepnsm/Cargo.toml index ea561982d..4292b8f4c 100644 --- a/crates/deepnsm/Cargo.toml +++ b/crates/deepnsm/Cargo.toml @@ -4,12 +4,6 @@ version = "0.1.0" edition = "2021" license = "Apache-2.0" publish = false - -# Empty `[workspace]` so cargo treats this crate as standalone when invoked -# via `--manifest-path` (deepnsm is `exclude`d from the parent workspace, -# but in nested git-worktree directories cargo's auto-discovery would -# otherwise walk further up and pick up the outer workspace root). -[workspace] description = """ DeepNSM: Distributional semantic transformer replacement. 4,096 words × 12 bits × 8MB distance matrix = complete semantic engine. @@ -17,6 +11,12 @@ O(1) per word, O(n) per sentence, deterministic, bit-reproducible. No GPU. No learned weights. Same decision boundaries as cosine. 
""" +# Empty `[workspace]` so cargo treats this crate as standalone when invoked +# via `--manifest-path` (deepnsm is `exclude`d from the parent workspace, +# but in nested git-worktree directories cargo's auto-discovery would +# otherwise walk further up and pick up the outer workspace root). +[workspace] + # Zero EXTERNAL (crates.io) dependencies — for supply-chain cleanness. # AdaWorldAPI path deps are mandatory and compile into the same binary. # ndarray is the canonical SIMD/BLAS/CLAM provider: ndarray::simd is a diff --git a/crates/deepnsm/examples/probe_semantic_sanity.rs b/crates/deepnsm/examples/probe_semantic_sanity.rs index 27fb6fc44..2a5f33d2e 100644 --- a/crates/deepnsm/examples/probe_semantic_sanity.rs +++ b/crates/deepnsm/examples/probe_semantic_sanity.rs @@ -7,7 +7,7 @@ //! - off-diag cosine mean 0.64 //! - effective rank (participation ratio) 1.82 out of 256 //! - 43.76% of pairs with cos > 0.9 -//! → degenerate null-context artifact, not a real semantic manifold +//! → degenerate null-context artifact, not a real semantic manifold //! //! The DeepNSM matrix is a completely different source: 96-dimensional //! 
distributional vectors from COCA subgenre frequencies (1-billion-word @@ -49,8 +49,8 @@ use std::fs; use std::path::PathBuf; -use deepnsm::DeepNsmEngine; use deepnsm::spo::WordDistanceMatrix; +use deepnsm::DeepNsmEngine; fn main() { println!("# Probe: DeepNSM Semantic Layer Sanity"); @@ -86,7 +86,11 @@ fn main() { let nonzero_diagonals: Vec<(usize, u8)> = (0..k) .filter_map(|i| { let d = dm.get(i as u16, i as u16); - if d != 0 { Some((i, d)) } else { None } + if d != 0 { + Some((i, d)) + } else { + None + } }) .take(5) .collect(); @@ -113,10 +117,14 @@ fn main() { // Convert to f64 for stats let n = off.len() as f64; let mean: f64 = off.iter().map(|&v| v as f64).sum::<f64>() / n; - let var: f64 = off.iter().map(|&v| { - let diff = v as f64 - mean; - diff * diff - }).sum::<f64>() / n; + let var: f64 = off + .iter() + .map(|&v| { + let diff = v as f64 - mean; + diff * diff + }) + .sum::<f64>() + / n; let std_dev = var.sqrt(); // Percentiles via sort @@ -165,12 +173,20 @@ fn main() { // has no per-row distinguishing structure → degenerate.
let row_sum_f64: Vec<f64> = row_sum.iter().map(|&s| s as f64).collect(); let mean_rs = row_sum_f64.iter().sum::<f64>() / k as f64; - let var_rs = row_sum_f64.iter().map(|&s| { - let diff = s - mean_rs; - diff * diff - }).sum::<f64>() / k as f64; + let var_rs = row_sum_f64 + .iter() + .map(|&s| { + let diff = s - mean_rs; + diff * diff + }) + .sum::<f64>() + / k as f64; let std_rs = var_rs.sqrt(); - let cv = if mean_rs.abs() > 1e-9 { std_rs / mean_rs } else { 0.0 }; + let cv = if mean_rs.abs() > 1e-9 { + std_rs / mean_rs + } else { + 0.0 + }; println!("## Row-sum constancy (matrix isotropy proxy)"); println!("- mean row sum: {:.2}", mean_rs); println!("- std row sum: {:.2}", std_rs); @@ -186,17 +202,25 @@ fn main() { for i in 0..k { let mut best = u32::MAX; for j in 0..k { - if i == j { continue; } + if i == j { + continue; + } let d = dm.get(i as u16, j as u16) as u32; - if d < best { best = d; } + if d < best { + best = d; + } } nn_dist.push(best); } let nn_mean: f64 = nn_dist.iter().map(|&v| v as f64).sum::<f64>() / k as f64; - let nn_var: f64 = nn_dist.iter().map(|&v| { - let diff = v as f64 - nn_mean; - diff * diff - }).sum::<f64>() / k as f64; + let nn_var: f64 = nn_dist + .iter() + .map(|&v| { + let diff = v as f64 - nn_mean; + diff * diff + }) + .sum::<f64>() + / k as f64; let nn_std = nn_var.sqrt(); println!("## Nearest-neighbor distance (excluding self)"); println!("- mean: {:.2}", nn_mean); @@ -232,8 +256,10 @@ fn main() { println!("| matrix size | 256×256 | {}×{} |", k, k); println!("| off-diag mean | 0.640 (cos) | {:.2} (u8 dist) |", mean); println!("| effective rank | 1.82 | see Python follow-up |"); - println!("| frac > 0.9 (cos) / high u8 | 43.76% | {:.2}% (top 10 bins) |", - top10 as f64 / n * 100.0); + println!( + "| frac > 0.9 (cos) / high u8 | 43.76% | {:.2}% (top 10 bins) |", + top10 as f64 / n * 100.0 + ); println!("| nearest-neighbor similarity | 0.9407 (cos) | see std above |"); println!(); @@ -242,7 +268,10 @@ fn main() { println!(); println!("```python"); println!("import
numpy as np"); - println!("d = np.fromfile('{}', dtype=np.uint8).reshape(4096, 4096).astype(np.float64)", dump_path); + println!( + "d = np.fromfile('{}', dtype=np.uint8).reshape(4096, 4096).astype(np.float64)", + dump_path + ); println!("# Convert distance to similarity: normalize [0,255] → [0,1], invert"); println!("max_d = d.max()"); println!("sim = 1.0 - d / max(max_d, 1)"); diff --git a/crates/deepnsm/src/arcs.rs b/crates/deepnsm/src/arcs.rs index 35b1c4495..b45673e7c 100644 --- a/crates/deepnsm/src/arcs.rs +++ b/crates/deepnsm/src/arcs.rs @@ -66,7 +66,10 @@ mod tests { let ranks = [12_u16, 670, 2942]; let (basin, literal) = t.split_arcs(&ranks); assert_eq!(basin.0, t.fingerprint, "basin arc IS the spine bundle"); - assert_eq!(literal.0, ranks, "literal arc carries the COCA ranks verbatim"); + assert_eq!( + literal.0, ranks, + "literal arc carries the COCA ranks verbatim" + ); } #[test] diff --git a/crates/deepnsm/src/arcuate.rs b/crates/deepnsm/src/arcuate.rs index 6033a2b34..15d9ac423 100644 --- a/crates/deepnsm/src/arcuate.rs +++ b/crates/deepnsm/src/arcuate.rs @@ -166,6 +166,9 @@ mod tests { } let result = arc.disambiguate([fp(1.0), fp(-1.0)]); assert_eq!(result.candidate_count, 2, "both candidates evaluated"); - assert!(result.winner_index < 2, "a real winner over the ±5 evidence"); + assert!( + result.winner_index < 2, + "a real winner over the ±5 evidence" + ); } } diff --git a/crates/deepnsm/src/cam64.rs b/crates/deepnsm/src/cam64.rs new file mode 100644 index 000000000..611e9e792 --- /dev/null +++ b/crates/deepnsm/src/cam64.rs @@ -0,0 +1,383 @@ +//! 64-bit reading-state locality code (CAM64). +//! +//! **This is NOT semantic truth.** The `EpisodicSpoFrame` rows in +//! `episodic_spo` are the auditable witnesses. `Cam64` is a fast locality +//! key for: +//! - candidate prefetch +//! - relative-pronoun / coreference heuristics +//! - "does this sentence continue the previous story?" basin matching +//! - "does this sentence open a new basin?" 
detection +//! +//! The 64 bits encode 8 named lanes, one byte each. +//! The 256 values per lane give 4096 possible lane combinations, +//! which maps directly onto the CAM-PQ bucket space for O(1) lookup. +//! +//! ## Lane layout +//! +//! ```text +//! byte 0 — entity / subject state (vocabulary-bucket of active subject) +//! byte 1 — predicate / action state (vocabulary-bucket of active predicate) +//! byte 2 — object / complement state (vocabulary-bucket of active object, 0 if absent) +//! byte 3 — morphology / tense / number / voice (MorphFlags low byte) +//! byte 4 — clause structure / relative / subordination (MorphFlags high byte) +//! byte 5 — discourse / anaphora / referent stack depth + coreference flag +//! byte 6 — causal / temporal / conditional markers +//! byte 7 — episodic basin / novelty / wisdom / epiphany markers +//! ``` +//! +//! Bytes 3-4 split `MorphFlags(u16)` across two lanes so all 14 morph bits are +//! represented without compression. + +use crate::morphology::MorphFlags; +use crate::spo::{SpoTriple, NO_ROLE}; + +/// 64-bit reading-state locality code: 8 lanes × 8 bits. +/// +/// Stored little-endian in a `u64`: lane 0 occupies bits 0-7, lane 7 bits 56-63. +#[derive(Clone, Copy, Default, PartialEq, Eq, Hash)] +pub struct Cam64(u64); + +impl Cam64 { + /// Construct from an explicit 8-byte lane array. + #[inline] + pub fn from_lanes(lanes: [u8; 8]) -> Self { + let mut v = 0u64; + for (i, &b) in lanes.iter().enumerate() { + v |= (b as u64) << (i * 8); + } + Self(v) + } + + /// Extract one lane (0-7). + #[inline] + pub fn lane(self, i: usize) -> u8 { + debug_assert!(i < 8, "lane index out of range"); + (self.0 >> (i * 8)) as u8 + } + + /// Return a new `Cam64` with one lane replaced. + #[inline] + pub fn with_lane(self, i: usize, val: u8) -> Self { + debug_assert!(i < 8, "lane index out of range"); + let mask = !(0xFFu64 << (i * 8)); + Self((self.0 & mask) | ((val as u64) << (i * 8))) + } + + /// Raw u64 value. 
+ #[inline] + pub fn raw(self) -> u64 { + self.0 + } + + /// Construct from raw u64. + #[inline] + pub fn from_raw(v: u64) -> Self { + Self(v) + } + + // ── Named lane accessors ───────────────────────────────────────────────── + + /// Vocabulary bucket of the active subject (lane 0). + /// Bucket = rank >> 5 → 128 buckets of 32 adjacent vocabulary items each. + pub fn entity_state(self) -> u8 { + self.lane(0) + } + /// Vocabulary bucket of the active predicate (lane 1). + pub fn predicate_state(self) -> u8 { + self.lane(1) + } + /// Vocabulary bucket of the active object, or 0 if absent (lane 2). + pub fn object_state(self) -> u8 { + self.lane(2) + } + /// `MorphFlags` bits 0-7: tense, number, person, passive, negated (lane 3). + pub fn morph_state(self) -> u8 { + self.lane(3) + } + /// `MorphFlags` bits 8-13: clause structure flags (lane 4). + pub fn clause_state(self) -> u8 { + self.lane(4) + } + /// Discourse / anaphora: entity-stack depth (bits 0-6) + coreference flag (bit 7) (lane 5). + pub fn discourse_state(self) -> u8 { + self.lane(5) + } + /// Causal / temporal / conditional markers (lane 6). + pub fn causal_state(self) -> u8 { + self.lane(6) + } + /// Episodic basin markers: novelty/entropy/epiphany flags (lane 7). + pub fn basin_state(self) -> u8 { + self.lane(7) + } + + // ── Construction helpers ───────────────────────────────────────────────── + + /// Build a `Cam64` from a resolved triple + morph flags + reading context. + /// + /// `entity_stack_depth` — number of active entities in the coreference stack (0-127). + /// `coreference_resolved` — true if the subject was resolved from the entity stack. + /// `has_temporal` — true if the triple carries a temporal marker. + /// `novelty_high` — caller-supplied hint that this triple is novel (bit 0 of basin lane). 
+ pub fn from_triple( + triple: &SpoTriple, + morph: MorphFlags, + entity_stack_depth: u8, + coreference_resolved: bool, + has_temporal: bool, + novelty_high: bool, + ) -> Self { + // Lanes 0-2: vocabulary-bucket of each role (128 buckets of 32 ranks). + // Adjacent vocabulary items share a bucket → helps basin-matching. + let entity_lane = (triple.subject() >> 5) as u8; + let pred_lane = (triple.predicate() >> 5) as u8; + let obj_lane = if triple.object() != NO_ROLE { + (triple.object() >> 5) as u8 + } else { + 0 + }; + + // Lanes 3-4: split MorphFlags across two bytes. + let morph_bits = morph.bits(); + // MorphFlags is defined over bits 0-13 (bits 14-15 spare; see morphology.rs). + // clause_lane below carries bits 8-15, so a future flag at bit 14/15 would + // land there with no defined meaning — guard the invariant in debug builds. + debug_assert!( + morph_bits <= 0x3FFF, + "MorphFlags bit 14/15 set — cam64 clause lane has no slot for it" + ); + let morph_lane = (morph_bits & 0xFF) as u8; + let clause_lane = ((morph_bits >> 8) & 0xFF) as u8; + + // Lane 5: discourse — stack depth (bits 0-6) + coreference flag (bit 7). + let depth_clamped = entity_stack_depth.min(127); + let discourse_lane = depth_clamped | if coreference_resolved { 0x80 } else { 0 }; + + // Lane 6: causal/temporal — bit 0 = temporal marker present (v1; + // causal/conditional markers in bits 1-7 reserved for v2). + let causal_lane = if has_temporal { 0x01u8 } else { 0x00 }; + + // Lane 7: basin — bit 0 = novelty_high (v1 placeholder; epiphany/wisdom baked in v2). + let basin_lane = if novelty_high { 0x01u8 } else { 0x00 }; + + Self::from_lanes([ + entity_lane, + pred_lane, + obj_lane, + morph_lane, + clause_lane, + discourse_lane, + causal_lane, + basin_lane, + ]) + } + + /// Return true if the entity bucket of `self` matches `other`. + /// + /// Used for basin-matching: "is this sentence in the same topic domain?" 
+ #[inline] + pub fn same_entity_bucket(self, other: Cam64) -> bool { + self.entity_state() == other.entity_state() + } + + /// Return true if the discourse lane indicates a coreference was resolved. + #[inline] + pub fn has_coreference(self) -> bool { + self.discourse_state() & 0x80 != 0 + } + + /// Entity stack depth encoded in the discourse lane (bits 0-6). + #[inline] + pub fn entity_stack_depth(self) -> u8 { + self.discourse_state() & 0x7F + } + + // ── Basin continuation (Pika chart-arc predicate) ──────────────────────── + + /// Return true if this code is a plausible continuation of `prev`. + /// + /// Uses popcount on the XOR to measure lane-level agreement: + /// - shared bits (XNOR popcount) ≥ `CAM64_CONTINUATION_MIN_SHARED` → agreeing lanes + /// - differing bits (XOR popcount) ≤ `CAM64_CONTINUATION_MAX_DIFF` → acceptable drift + /// + /// The thresholds are deliberately loose so declarative sentences that + /// share entity/predicate buckets but differ in morph/discourse still + /// qualify as basin continuations. This is a dumb, deterministic predicate — + /// it is NOT semantic equivalence. + #[inline] + pub fn continues_basin(self, prev: Cam64) -> bool { + let diff = self.0 ^ prev.0; + let diff_bits = diff.count_ones(); + let shared_bits = 64 - diff_bits; // XNOR popcount via complement + shared_bits >= CAM64_CONTINUATION_MIN_SHARED && diff_bits <= CAM64_CONTINUATION_MAX_DIFF + } + + /// Basin continuation quality score (0 = no continuation, 255 = perfect match). + /// + /// Defined as `255 - diff_bits * 4`, clamped to 0. v2 stub — callers that + /// only need the binary predicate should use `continues_basin()`. + #[inline] + pub fn basin_continuation_score(self, prev: Cam64) -> u8 { + let diff_bits = (self.0 ^ prev.0).count_ones(); + 255u32.saturating_sub(diff_bits * 4) as u8 + } +} + +/// Minimum shared bits required for `continues_basin` to return true. +/// 16 of 64 bits shared → at least 2 lane bytes unchanged. 
+pub const CAM64_CONTINUATION_MIN_SHARED: u32 = 16; + +/// Maximum differing bits allowed for `continues_basin` to return true. +/// 24 of 64 bits differing → at most 3 lane bytes changed. +pub const CAM64_CONTINUATION_MAX_DIFF: u32 = 24; + +impl core::fmt::Debug for Cam64 { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + write!( + f, + "Cam64(entity={:#04x} pred={:#04x} obj={:#04x} morph={:#04x} \ + clause={:#04x} discourse={:#04x} causal={:#04x} basin={:#04x})", + self.entity_state(), + self.predicate_state(), + self.object_state(), + self.morph_state(), + self.clause_state(), + self.discourse_state(), + self.causal_state(), + self.basin_state(), + ) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::morphology::MorphFlags; + use crate::spo::SpoTriple; + + #[test] + fn lane_roundtrip() { + let lanes = [1u8, 2, 3, 4, 5, 6, 7, 8]; + let c = Cam64::from_lanes(lanes); + for (i, &b) in lanes.iter().enumerate() { + assert_eq!(c.lane(i), b, "lane {i}"); + } + } + + #[test] + fn with_lane_does_not_corrupt_others() { + let c = Cam64::from_lanes([0xFF; 8]); + let c2 = c.with_lane(3, 0x00); + assert_eq!(c2.lane(3), 0x00); + for i in [0, 1, 2, 4, 5, 6, 7] { + assert_eq!(c2.lane(i), 0xFF, "lane {i} corrupted"); + } + } + + #[test] + fn from_triple_entity_bucket() { + let t = SpoTriple::new(64, 96, 128); // bucket = rank >> 5 + let m = MorphFlags::default().set(MorphFlags::PRESENT); + let c = Cam64::from_triple(&t, m, 3, false, false, false); + assert_eq!(c.entity_state(), 64 >> 5); + assert_eq!(c.predicate_state(), 96 >> 5); + assert_eq!(c.object_state(), 128 >> 5); + } + + #[test] + fn from_triple_morph_split() { + let t = SpoTriple::new(1, 2, 3); + // Set a flag that lands in the high byte (RELATIVE_CLAUSE = bit 11) + let m = MorphFlags::default() + .set(MorphFlags::NEGATED) // bit 9 → high byte bit 1 + .set(MorphFlags::RELATIVE_CLAUSE); // bit 11 → high byte bit 3 + let c = Cam64::from_triple(&t, m, 0, false, false, false); + // 
morph_lane = low byte of flags + assert_eq!(c.morph_state(), (m.bits() & 0xFF) as u8); + // clause_lane = high byte + assert_eq!(c.clause_state(), (m.bits() >> 8) as u8); + } + + #[test] + fn coreference_flag_in_discourse_lane() { + let t = SpoTriple::new(1, 2, 3); + let m = MorphFlags::default(); + let c_yes = Cam64::from_triple(&t, m, 5, true, false, false); + let c_no = Cam64::from_triple(&t, m, 5, false, false, false); + assert!(c_yes.has_coreference()); + assert!(!c_no.has_coreference()); + assert_eq!(c_yes.entity_stack_depth(), 5); + assert_eq!(c_no.entity_stack_depth(), 5); + } + + #[test] + fn temporal_sets_causal_bit0() { + let t = SpoTriple::new(1, 2, 3); + let m = MorphFlags::default(); + let c = Cam64::from_triple(&t, m, 0, false, true, false); + assert_eq!(c.causal_state() & 0x01, 1); + } + + #[test] + fn same_entity_bucket_matching() { + let t1 = SpoTriple::new(64, 1, 1); + let t2 = SpoTriple::new(70, 2, 2); // same bucket (64 >> 5 == 70 >> 5 == 2) + let t3 = SpoTriple::new(100, 3, 3); // different bucket (100 >> 5 == 3) + let m = MorphFlags::default(); + let c1 = Cam64::from_triple(&t1, m, 0, false, false, false); + let c2 = Cam64::from_triple(&t2, m, 0, false, false, false); + let c3 = Cam64::from_triple(&t3, m, 0, false, false, false); + assert!(c1.same_entity_bucket(c2)); + assert!(!c1.same_entity_bucket(c3)); + } + + #[test] + fn raw_roundtrip() { + let c = Cam64::from_lanes([0xAB, 0xCD, 0xEF, 0x12, 0x34, 0x56, 0x78, 0x9A]); + assert_eq!(Cam64::from_raw(c.raw()), c); + } + + #[test] + fn stack_depth_clamped_at_127() { + let t = SpoTriple::new(1, 2, 3); + let m = MorphFlags::default(); + let c = Cam64::from_triple(&t, m, 255, false, false, false); + assert_eq!(c.entity_stack_depth(), 127); + } + + #[test] + fn cam64_continuation_true_for_same_code() { + let c = Cam64::from_lanes([10, 20, 30, 40, 50, 60, 70, 80]); + assert!(c.continues_basin(c)); + } + + #[test] + fn cam64_continuation_true_for_nearby_codes() { + // Differ by 1 bit per lane (8 bits 
total) → well within threshold. + let a = Cam64::from_lanes([0x00; 8]); + let b = Cam64::from_lanes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]); + assert!(b.continues_basin(a)); + } + + #[test] + fn cam64_continuation_false_for_far_codes() { + // Differ by all 64 bits → 0 shared, 64 differing. + let a = Cam64::from_raw(0x0000_0000_0000_0000); + let b = Cam64::from_raw(0xFFFF_FFFF_FFFF_FFFF); + assert!(!b.continues_basin(a)); + } + + #[test] + fn basin_continuation_score_perfect_match() { + let c = Cam64::from_lanes([1; 8]); + assert_eq!(c.basin_continuation_score(c), 255); + } + + #[test] + fn basin_continuation_score_decreases_with_diff() { + let a = Cam64::from_raw(0x0000_0000_0000_0000); + let b = Cam64::from_raw(0x0000_0000_0000_00FF); // 8 bits differ + let score = b.basin_continuation_score(a); + assert_eq!(score, 255 - 8 * 4); + } +} diff --git a/crates/deepnsm/src/codebook.rs b/crates/deepnsm/src/codebook.rs index ccb78c3ae..02b990299 100644 --- a/crates/deepnsm/src/codebook.rs +++ b/crates/deepnsm/src/codebook.rs @@ -75,16 +75,15 @@ impl Codebook { .map_err(|e| format!("Failed to read {}: {}", path.display(), e))?; // Simple JSON parsing for the arrays we need - let mean = extract_f32_array(&content, "\"mean\"") - .ok_or("Failed to extract mean array")?; - let std_vals = extract_f32_array(&content, "\"std\"") - .ok_or("Failed to extract std array")?; - let centroids = extract_codebook_array(&content) - .ok_or("Failed to extract codebook array")?; + let mean = extract_f32_array(&content, "\"mean\"").ok_or("Failed to extract mean array")?; + let std_vals = + extract_f32_array(&content, "\"std\"").ok_or("Failed to extract std array")?; + let centroids = + extract_codebook_array(&content).ok_or("Failed to extract codebook array")?; Ok(Codebook { centroids, - mean: mean, + mean, std: std_vals, }) } @@ -155,6 +154,11 @@ impl Codebook { pub fn len(&self) -> usize { self.centroids.len() / SUB_DIM } + + /// True if no centroids are loaded. 
+ pub fn is_empty(&self) -> bool { + self.centroids.is_empty() + } } impl CamCodes { diff --git a/crates/deepnsm/src/comprehension.rs b/crates/deepnsm/src/comprehension.rs index ec4844b4b..bcc904c13 100644 --- a/crates/deepnsm/src/comprehension.rs +++ b/crates/deepnsm/src/comprehension.rs @@ -93,7 +93,10 @@ mod tests { let s = two_triples_first_temporal(); assert!(s.is_temporal(0)); let l = s.triple_landing(0); - assert!(l.fact && l.story, "temporal triple → BOTH fact and story (fork)"); + assert!( + l.fact && l.story, + "temporal triple → BOTH fact and story (fork)" + ); } #[test] @@ -110,8 +113,20 @@ mod tests { let s = two_triples_first_temporal(); let ls = s.landings(); assert_eq!(ls.len(), 2); - assert_eq!(ls[0], Landing { fact: true, story: true }); - assert_eq!(ls[1], Landing { fact: true, story: false }); + assert_eq!( + ls[0], + Landing { + fact: true, + story: true + } + ); + assert_eq!( + ls[1], + Landing { + fact: true, + story: false + } + ); } #[test] @@ -124,6 +139,12 @@ mod tests { }; assert!(s.landings().is_empty()); // Out-of-range index is fact-only (no temporal marker can match). - assert_eq!(s.triple_landing(7), Landing { fact: true, story: false }); + assert_eq!( + s.triple_landing(7), + Landing { + fact: true, + story: false + } + ); } } diff --git a/crates/deepnsm/src/context.rs b/crates/deepnsm/src/context.rs index 83a2b72f4..df4dfcfb1 100644 --- a/crates/deepnsm/src/context.rs +++ b/crates/deepnsm/src/context.rs @@ -11,7 +11,7 @@ //! //! O(1) per sentence update, no recomputation of previous sentences. -use crate::encoder::{self, VsaVec, RoleVectors, bundle}; +use crate::encoder::{self, bundle, RoleVectors, VsaVec}; /// Default context window size: ±5 sentences = 11 total. 
pub const DEFAULT_WINDOW_SIZE: usize = 11; @@ -120,11 +120,7 @@ impl ContextWindow { } if self.cached_bundle.is_none() { - let active: Vec<VsaVec> = self - .buffer - .iter() - .filter_map(|slot| slot.clone()) - .collect(); + let active: Vec<VsaVec> = self.buffer.iter().filter_map(|slot| slot.clone()).collect(); if active.is_empty() { return None; @@ -234,9 +230,9 @@ mod tests { let mut ctx = ContextWindow::new(5); // Add some "financial" context - ctx.push(VsaVec::from_rank(500)); // "money" - ctx.push(VsaVec::from_rank(600)); // "account" - ctx.push(VsaVec::from_rank(700)); // "invest" + ctx.push(VsaVec::from_rank(500)); // "money" + ctx.push(VsaVec::from_rank(600)); // "account" + ctx.push(VsaVec::from_rank(700)); // "invest" let plain = VsaVec::from_rank(100); // "bank" let disambiguated = ctx.disambiguate(100); diff --git a/crates/deepnsm/src/crystal_neighborhood.rs b/crates/deepnsm/src/crystal_neighborhood.rs new file mode 100644 index 000000000..7c9594676 --- /dev/null +++ b/crates/deepnsm/src/crystal_neighborhood.rs @@ -0,0 +1,402 @@ +//! Local geometry helpers for `Crystal4096` — the L1 layer of the three-tier model. +//! +//! ## Three-tier model +//! +//! ```text +//! L0 ABI signed_crystal.rs +//! Crystal4096 = compact 12-bit coordinate +//! SignedOffset4 = 4-bit signed distance (-7..+7 + overflow) +//! +//! L1 local geometry crystal_neighborhood.rs ← THIS FILE +//! neighbors_4096(center, radius, metric) +//! perturbation tile expansion +//! splat candidates +//! +//! L2 graph / DP blasgraph (future v2) +//! frontier propagation over sentence sequence +//! basin continuity scoring +//! inverse/right-context Pika pass +//! ``` +//! +//! L1 computes neighbourhood masks, splats, and traversal around the local +//! signed lattice. It does **not** own the semantic meaning of the coordinates; +//! it only knows the geometry. +//! +//! ## No floats +//! +//! All computations use integer nibble arithmetic. The weights for +//!
`LaneCompatible` filtering are `u8` activation levels. L2 (blasgraph) will +//! use `u16` transition costs when it arrives. No f32 in this file. +//! +//! ## Neighbourhood sizes at radius = 1 +//! +//! | Metric | Max cells (incl. center) | +//! |--------|--------------------------| +//! | Manhattan | 7 (center + 6 axis-aligned faces) | +//! | Chebyshev | 27 (3^3 = all nibble combinations within ±1 per axis) | +//! | LaneCompatible | ≤ 27 (Chebyshev, then excludes morphologically incompatible) | +//! +//! At radius = 2, Chebyshev gives up to 5^3 = 125 cells, but valid nibbles are +//! 0-14 so overflow cells (nibble 15) are always excluded. + +use crate::signed_crystal::{Crystal4096, SignedOffset4}; + +// ── Neighbourhood metric ────────────────────────────────────────────────────── + +/// Distance metric for `Crystal4096` neighbourhood queries. +/// +/// All metrics use **nibble-level** distance — each axis is one nibble (0-14 +/// valid, 15 = overflow/excluded). No float arithmetic. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)] +pub enum NeighborhoodMetric { + /// Axis-aligned only: |dx| + |dy| + |dz| ≤ radius. + /// Radius 1 → 7 cells (center + 6 faces). + Manhattan, + /// Cube neighbourhood: max(|dx|, |dy|, |dz|) ≤ radius. + /// Radius 1 → up to 27 cells (3^3). + #[default] + Chebyshev, + /// Chebyshev filtered by morphological and clause compatibility. + /// + /// A neighbor is included only if the X axis (sentence offset) delta + /// is ≤ the `lane_compat_x_limit` and neither Y nor Z axis is overflow. + /// This approximates "valid reading transitions" without a full grammar table. + /// v1 implementation; a full compatibility table is a v2 concern. + LaneCompatible, +} + +// ── Crystal4096Neighbourhood ────────────────────────────────────────────────── + +/// Fixed-capacity neighbourhood result for `Crystal4096` queries. +/// +/// Holds up to 27 cells (Chebyshev radius 1 in 3D). Stack-allocated, no heap. 
+pub struct Crystal4096Neighbourhood { + buf: [Crystal4096; 27], + len: usize, +} + +impl Crystal4096Neighbourhood { + fn new() -> Self { + Self { + buf: [Crystal4096(0); 27], + len: 0, + } + } + + fn push(&mut self, c: Crystal4096) { + if self.len < 27 { + self.buf[self.len] = c; + self.len += 1; + } + } + + /// Iterate the neighbourhood cells (including center). + pub fn iter(&self) -> &[Crystal4096] { + &self.buf[..self.len] + } + + /// Total cells including center. + pub fn len(&self) -> usize { + self.len + } + + /// True if the neighbourhood holds no cells. A query always includes the + /// centre, so this is `false` there; provided for the `len_without_is_empty` + /// contract. + pub fn is_empty(&self) -> bool { + self.len == 0 + } + + /// True if only the center is present. + pub fn is_singleton(&self) -> bool { + self.len == 1 + } +} + +// ── Neighbour query ─────────────────────────────────────────────────────────── + +/// Compute the neighbourhood of `center` within `radius` using `metric`. +/// +/// Overflow cells (any nibble = 15) are always excluded. The center is always +/// included as the first element (distance 0). `radius` is clamped to 7 +/// (the maximum valid signed offset). +/// +/// ``` +/// use crate::deepnsm::crystal_neighborhood::{neighbors_4096, NeighborhoodMetric}; +/// use crate::deepnsm::signed_crystal::{Crystal4096, SignedOffset4}; +/// +/// let center = Crystal4096::new( +/// SignedOffset4::ZERO, SignedOffset4::ZERO, SignedOffset4::ZERO, +/// ); +/// let nb = neighbors_4096(center, 1, NeighborhoodMetric::Manhattan); +/// assert_eq!(nb.len(), 7); // center + 6 face neighbors +/// ``` +pub fn neighbors_4096( + center: Crystal4096, + radius: u8, + metric: NeighborhoodMetric, +) -> Crystal4096Neighbourhood { + let r = radius.min(7) as i8; + let mut out = Crystal4096Neighbourhood::new(); + + // Decode center nibbles as signed offsets. 
+ let cx = center.x(); + let cy = center.y(); + let cz = center.z(); + + // If center itself has an overflow axis, return it alone. + if cx.is_overflow() || cy.is_overflow() || cz.is_overflow() { + out.push(center); + return out; + } + + let cx_off = cx.to_offset().unwrap(); + let cy_off = cy.to_offset().unwrap(); + let cz_off = cz.to_offset().unwrap(); + + // Center is always first. + out.push(center); + + for dx in -r..=r { + for dy in -r..=r { + for dz in -r..=r { + if dx == 0 && dy == 0 && dz == 0 { + continue; + } // already pushed + + // Metric filter. + let in_metric = match metric { + NeighborhoodMetric::Manhattan => dx.abs() + dy.abs() + dz.abs() <= r, + NeighborhoodMetric::Chebyshev => dx.abs().max(dy.abs()).max(dz.abs()) <= r, + NeighborhoodMetric::LaneCompatible => dx.abs().max(dy.abs()).max(dz.abs()) <= r, + }; + if !in_metric { + continue; + } + + let nx = cx_off + dx; + let ny = cy_off + dy; + let nz = cz_off + dz; + + // Clip to valid range (-7..+7); skip overflow cells. + let sx = SignedOffset4::from_offset(nx); + let sy = SignedOffset4::from_offset(ny); + let sz = SignedOffset4::from_offset(nz); + + if sx.is_overflow() || sy.is_overflow() || sz.is_overflow() { + continue; + } + + // LaneCompatible: sentence axis (X) delta ≤ 1. + if matches!(metric, NeighborhoodMetric::LaneCompatible) && dx.abs() > 1 { + continue; + } + + out.push(Crystal4096::new(sx, sy, sz)); + } + } + } + out +} + +/// Chebyshev distance between two `Crystal4096` coordinates. +/// +/// Returns `None` if either coordinate has an overflow axis. +/// Returns `Some(distance)` where distance = max(|dx|, |dy|, |dz|) over signed axes. 
+pub fn chebyshev_distance(a: Crystal4096, b: Crystal4096) -> Option<u8> { + let (ax, ay, az) = (a.x(), a.y(), a.z()); + let (bx, by, bz) = (b.x(), b.y(), b.z()); + if ax.is_overflow() + || ay.is_overflow() + || az.is_overflow() + || bx.is_overflow() + || by.is_overflow() + || bz.is_overflow() + { + return None; + } + let dx = (ax.to_offset().unwrap() - bx.to_offset().unwrap()).unsigned_abs(); + let dy = (ay.to_offset().unwrap() - by.to_offset().unwrap()).unsigned_abs(); + let dz = (az.to_offset().unwrap() - bz.to_offset().unwrap()).unsigned_abs(); + Some(dx.max(dy).max(dz)) +} + +/// Manhattan distance between two `Crystal4096` coordinates. +/// +/// Returns `None` if either coordinate has an overflow axis. +pub fn manhattan_distance(a: Crystal4096, b: Crystal4096) -> Option<u8> { + let (ax, ay, az) = (a.x(), a.y(), a.z()); + let (bx, by, bz) = (b.x(), b.y(), b.z()); + if ax.is_overflow() + || ay.is_overflow() + || az.is_overflow() + || bx.is_overflow() + || by.is_overflow() + || bz.is_overflow() + { + return None; + } + let dx = (ax.to_offset().unwrap() - bx.to_offset().unwrap()).unsigned_abs(); + let dy = (ay.to_offset().unwrap() - by.to_offset().unwrap()).unsigned_abs(); + let dz = (az.to_offset().unwrap() - bz.to_offset().unwrap()).unsigned_abs(); + Some(dx + dy + dz) +} + +/// Enumerate all valid `Crystal4096` cells — those with no overflow axis. +/// +/// 15^3 = 3375 valid cells (nibbles 0-14 on each axis). +/// Returns a lazy iterator; useful for debug and codebook construction. 
+pub fn all_valid_cells() -> impl Iterator<Item = Crystal4096> { + (0u8..15).flat_map(move |x| { + (0u8..15).flat_map(move |y| { + (0u8..15).map(move |z| { + Crystal4096::new(SignedOffset4(x), SignedOffset4(y), SignedOffset4(z)) + }) + }) + }) +} + +// ── Tests ───────────────────────────────────────────────────────────────────── + +#[cfg(test)] +mod tests { + use super::*; + use crate::signed_crystal::{Crystal4096, SignedOffset4}; + + fn zero() -> Crystal4096 { + Crystal4096::new( + SignedOffset4::ZERO, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ) + } + + fn at(x: i8, y: i8, z: i8) -> Crystal4096 { + Crystal4096::new( + SignedOffset4::from_offset(x), + SignedOffset4::from_offset(y), + SignedOffset4::from_offset(z), + ) + } + + #[test] + fn manhattan_radius1_gives_7_cells() { + let nb = neighbors_4096(zero(), 1, NeighborhoodMetric::Manhattan); + // center (0,0,0) + 6 face neighbors = 7 + assert_eq!(nb.len(), 7); + } + + #[test] + fn chebyshev_radius1_gives_27_cells_at_center() { + // Center is (0,0,0) — all 26 neighbours + center are within ±7. + let nb = neighbors_4096(zero(), 1, NeighborhoodMetric::Chebyshev); + assert_eq!(nb.len(), 27); + } + + #[test] + fn chebyshev_radius1_near_boundary_clips_overflow() { + // Center at (+7,+7,+7) — neighbours in +direction would overflow. + let c = at(7, 7, 7); + let nb = neighbors_4096(c, 1, NeighborhoodMetric::Chebyshev); + // Only cells with x,y,z ∈ {+6,+7} are valid (not +8 which overflows). + // 2^3 = 8 cells. + assert_eq!(nb.len(), 8); + for cell in nb.iter() { + assert!(!cell.has_overflow()); + } + } + + #[test] + fn chebyshev_radius0_gives_singleton() { + let nb = neighbors_4096(zero(), 0, NeighborhoodMetric::Chebyshev); + assert_eq!(nb.len(), 1); + assert!(nb.is_singleton()); + } + + #[test] + fn lane_compatible_limits_x_axis_to_1() { + // LaneCompatible: |dx| ≤ 1, |dy|/|dz| ≤ r. 
+ let nb = neighbors_4096(zero(), 2, NeighborhoodMetric::LaneCompatible); + for cell in nb.iter() { + let x_off = cell.x().to_offset().unwrap_or(99); + assert!((-1..=1).contains(&x_off), "X offset {x_off} exceeds ±1"); + } + } + + #[test] + fn overflow_center_returns_singleton() { + let overflow = Crystal4096::new( + SignedOffset4::OVERFLOW, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + let nb = neighbors_4096(overflow, 1, NeighborhoodMetric::Chebyshev); + assert_eq!(nb.len(), 1); + assert_eq!(nb.iter()[0], overflow); + } + + #[test] + fn chebyshev_distance_same_is_zero() { + let c = at(1, 2, 3); + assert_eq!(chebyshev_distance(c, c), Some(0)); + } + + #[test] + fn chebyshev_distance_one_axis() { + let a = at(0, 0, 0); + let b = at(3, 0, 0); + assert_eq!(chebyshev_distance(a, b), Some(3)); + } + + #[test] + fn chebyshev_distance_multi_axis_is_max() { + let a = at(0, 0, 0); + let b = at(2, 3, 1); + assert_eq!(chebyshev_distance(a, b), Some(3)); + } + + #[test] + fn manhattan_distance_same_is_zero() { + let c = at(0, 0, 0); + assert_eq!(manhattan_distance(c, c), Some(0)); + } + + #[test] + fn manhattan_distance_multi_axis_is_sum() { + let a = at(0, 0, 0); + let b = at(1, 2, 3); + assert_eq!(manhattan_distance(a, b), Some(6)); + } + + #[test] + fn overflow_distance_returns_none() { + let a = at(0, 0, 0); + let b = Crystal4096::new( + SignedOffset4::OVERFLOW, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + assert_eq!(chebyshev_distance(a, b), None); + assert_eq!(manhattan_distance(a, b), None); + } + + #[test] + fn all_valid_cells_count() { + // 15 values per axis (0..14), 3 axes → 15^3 = 3375 + assert_eq!(all_valid_cells().count(), 15 * 15 * 15); + } + + #[test] + fn all_valid_cells_no_overflow() { + for c in all_valid_cells() { + assert!(!c.has_overflow(), "unexpected overflow cell: {c:?}"); + } + } + + #[test] + fn center_is_always_first_in_neighbourhood() { + let c = at(2, -3, 1); + let nb = neighbors_4096(c, 1, NeighborhoodMetric::Chebyshev); + 
assert_eq!(nb.iter()[0], c); + } +} diff --git a/crates/deepnsm/src/disambiguator_glue.rs b/crates/deepnsm/src/disambiguator_glue.rs index ad18e61ec..04c159e1e 100644 --- a/crates/deepnsm/src/disambiguator_glue.rs +++ b/crates/deepnsm/src/disambiguator_glue.rs @@ -235,7 +235,7 @@ mod tests { #[test] fn sign_binarize_truncates_oversized_bundle() { let mut bundle = vec![1.0_f32; BINARY16K_BITS]; - bundle.extend(std::iter::repeat(-1.0_f32).take(100)); + bundle.extend(std::iter::repeat_n(-1.0_f32, 100)); let bits = sign_binarize_to_binary16k(&bundle); // First 16,384 bits → all-positive → all-ones. for &w in bits.iter() { diff --git a/crates/deepnsm/src/encoder.rs b/crates/deepnsm/src/encoder.rs index 2d4255d9e..3e9a18c09 100644 --- a/crates/deepnsm/src/encoder.rs +++ b/crates/deepnsm/src/encoder.rs @@ -51,17 +51,19 @@ impl VsaVec { /// Deterministic: same rank always produces the same vector. pub fn from_rank(rank: u16) -> Self { // Use rank as seed with a large prime multiplier for spread - Self::random((rank as u64).wrapping_mul(0x9E3779B97F4A7C15).wrapping_add(0xBF58476D1CE4E5B9)) + Self::random( + (rank as u64) + .wrapping_mul(0x9E3779B97F4A7C15) + .wrapping_add(0xBF58476D1CE4E5B9), + ) } /// XOR bind: `self ⊕ other`. Reversible: `(a ⊕ b) ⊕ b = a`. #[inline] pub fn bind(&self, other: &VsaVec) -> VsaVec { - let mut result = [0u64; VSA_WORDS]; - for i in 0..VSA_WORDS { - result[i] = self.data[i] ^ other.data[i]; + VsaVec { + data: std::array::from_fn(|i| self.data[i] ^ other.data[i]), } - VsaVec { data: result } } /// Hamming distance (number of differing bits). @@ -90,11 +92,9 @@ impl VsaVec { /// Bitwise NOT (complement). pub fn complement(&self) -> VsaVec { - let mut result = [0u64; VSA_WORDS]; - for i in 0..VSA_WORDS { - result[i] = !self.data[i]; + VsaVec { + data: std::array::from_fn(|i| !self.data[i]), } - VsaVec { data: result } } /// Access raw data. @@ -135,12 +135,12 @@ impl RoleVectors { /// These never change — they're architectural constants. 
pub fn new() -> Self { RoleVectors { - subject: VsaVec::random(0x5375626A65637400), // "Subject\0" + subject: VsaVec::random(0x5375626A65637400), // "Subject\0" predicate: VsaVec::random(0x5072656469636174), // "Predicat" - object: VsaVec::random(0x4F626A6563740000), // "Object\0\0" - modifier: VsaVec::random(0x4D6F646966696572), // "Modifier" - temporal: VsaVec::random(0x54656D706F72616C), // "Temporal" - negation: VsaVec::random(0x4E65676174696F6E), // "Negation" + object: VsaVec::random(0x4F626A6563740000), // "Object\0\0" + modifier: VsaVec::random(0x4D6F646966696572), // "Modifier" + temporal: VsaVec::random(0x54656D706F72616C), // "Temporal" + negation: VsaVec::random(0x4E65676174696F6E), // "Negation" } } } @@ -173,7 +173,7 @@ pub fn bundle(vectors: &[VsaVec]) -> VsaVec { let threshold = vectors.len() / 2; let mut result = [0u64; VSA_WORDS]; - for bit_word in 0..VSA_WORDS { + for (bit_word, result_slot) in result.iter_mut().enumerate() { let mut result_word = 0u64; for bit_pos in 0..64 { let mask = 1u64 << bit_pos; @@ -183,7 +183,7 @@ pub fn bundle(vectors: &[VsaVec]) -> VsaVec { .count(); if count > threshold { result_word |= mask; - } else if count == threshold && vectors.len() % 2 == 0 { + } else if count == threshold && vectors.len().is_multiple_of(2) { // Tie-breaking for even count: use deterministic rule // (use bit position parity) if bit_pos % 2 == 0 { @@ -191,7 +191,7 @@ pub fn bundle(vectors: &[VsaVec]) -> VsaVec { } } } - result[bit_word] = result_word; + *result_slot = result_word; } VsaVec { data: result } @@ -282,7 +282,9 @@ mod tests { let tolerance = (VSA_BITS as f32).sqrt() as u32 * 3; // 3σ assert!( popcount.abs_diff(expected) < tolerance, - "popcount={}, expected≈{}", popcount, expected + "popcount={}, expected≈{}", + popcount, + expected ); } @@ -344,7 +346,11 @@ mod tests { // Negated should be different let sim = positive.similarity(&negated); - assert!(sim < 0.8, "sim = {} — negation should change the vector!", sim); + assert!( + sim 
< 0.8, + "sim = {} — negation should change the vector!", + sim + ); } #[test] diff --git a/crates/deepnsm/src/episodic_spo.rs b/crates/deepnsm/src/episodic_spo.rs new file mode 100644 index 000000000..ccca15c92 --- /dev/null +++ b/crates/deepnsm/src/episodic_spo.rs @@ -0,0 +1,330 @@ +//! Episodic SPO frame — the auditable sentence-level witness. +//! +//! An `EpisodicSpoFrame` is one row in the reading surface for a single +//! triple within a single sentence. It is the **truth** / **witness**: +//! inspectable, tombstoneable, AriGraph-committable. +//! +//! The `Cam64` fast-index stored inside each frame is NOT the meaning — +//! it is the reading-locality key. See `cam64` module for the distinction. +//! +//! ## Column layout +//! +//! The column names match the spec verbatim so the SoA projection is +//! mechanical. All fields are `Copy` — the frame is meant to be stacked +//! in `Vec<EpisodicSpoFrame>` and swept SIMD-style. +//! +//! ## Crystallisation lifecycle +//! +//! ```text +//! EpisodicSpoFrame (emitted per sentence) +//! → repeated SPO detail → story basin candidate +//! → basin coherent enough → tombstone witness (SPO + Lance columnar) +//! → new facts classified as: reinforcement / novelty / wisdom / contradiction / epiphany +//! ``` + +use crate::cam64::Cam64; +use crate::morphology::MorphFlags; +use crate::pos::PoS; + +/// Sentinel: "no role / unresolved" for 12-bit vocabulary ranks. +/// +/// Re-exported from `crate::spo` so there is a single canonical `0xFFF` +/// definition — removes the drift risk of two independent constants. Existing +/// `use crate::episodic_spo::NO_ROLE` imports keep working unchanged. +pub use crate::spo::NO_ROLE; + +// ── Role classification enums ───────────────────────────────────────────── + +/// Syntactic dependency role of the head term in this frame. 
+#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Default)] +#[repr(u8)] +pub enum DependencyRole { + #[default] + Unknown = 0, + Subject = 1, + Predicate = 2, + Object = 3, + Modifier = 4, + Complement = 5, + Adjunct = 6, + Specifier = 7, +} + +/// Clause-level structural role. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Default)] +#[repr(u8)] +pub enum ClauseRole { + #[default] + Main = 0, + Relative = 1, + Subordinate = 2, + Infinitival = 3, + Participial = 4, + Coordinate = 5, +} + +/// Discourse-level role of the frame. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Default)] +#[repr(u8)] +pub enum DiscourseRole { + /// Topic of the current discourse segment. + #[default] + Topic = 0, + /// Comment / new information about the topic. + Comment = 1, + /// Opening of a new discourse segment / basin. + Opener = 2, + /// Closing / summary of an existing basin. + Closer = 3, + /// Bridge: connects prior and new discourse segments. + Bridge = 4, + /// Background / presupposition. + Background = 5, +} + +// ── EpisodicSpoFrame ───────────────────────────────────────────────────── + +/// One auditable episodic SPO row — the reading surface for a single triple. +/// +/// All fields are `Copy`. Stack many in `Vec<EpisodicSpoFrame>` for the SoA sweep. +#[derive(Clone, Copy, Debug)] +pub struct EpisodicSpoFrame { + // ── Position ──────────────────────────────────────────────────────────── + pub doc_id: u32, + pub sentence_id: u32, + pub token_span_start: u16, // inclusive byte/token offset of the head term + pub token_span_end: u16, // exclusive + + // ── Lexical ───────────────────────────────────────────────────────────── + /// Vocabulary rank of the head term (lemma_id in the NSM / COCA vocabulary). 
+ pub term_id: u16, + pub pos_tag: PoS, + pub morph_flags: MorphFlags, + + // ── Syntactic ─────────────────────────────────────────────────────────── + pub dependency_role: DependencyRole, + pub clause_role: ClauseRole, + pub discourse_role: DiscourseRole, + + // ── SPO candidates ────────────────────────────────────────────────────── + /// Vocabulary rank of the resolved subject (NO_ROLE if absent). + pub subject_candidate_id: u16, + /// Vocabulary rank of the resolved predicate. + pub predicate_candidate_id: u16, + /// Vocabulary rank of the resolved object (NO_ROLE if intransitive). + pub object_candidate_id: u16, + /// Resolved coreference target (NO_ROLE if not a pronoun or unresolvable). + pub refers_to_candidate_id: u16, + + // ── Window position ───────────────────────────────────────────────────── + /// Offset from the current sentence: 0 for current, -1 for prior, etc. + /// Always 0 at emit time; downstream can patch window-relative offsets. + pub sentence_window_offset: i8, + + // ── NSM semantic masks ────────────────────────────────────────────────── + /// Bitmask of NSM semantic primes active in this frame (63 primes → 64 bits). + pub nsm_prime_mask: u64, + /// Bitmask of NSM semantic molecules (composite concepts). + pub nsm_molecule_mask: u64, + + // ── CAM locality ──────────────────────────────────────────────────────── + /// CAM-PQ 6-subspace code for the subject head (6 centroid indices, one per subspace). + /// Used for fast palette-distance lookups against the codebook. + pub cam_code: [u8; 6], + + // ── Episodic quality ──────────────────────────────────────────────────── + pub confidence: f32, + pub frequency: f32, + pub novelty: f32, + pub wisdom: f32, + pub staunen: f32, // aesthetic/cognitive surprise (German: astonishment) + pub entropy: f32, + pub free_energy_delta: f32, + + // ── Reading-state locality code (NOT the truth) ────────────────────── + /// Fast 64-bit reading-locality index. 
Basin-matching, prefetch, coreference heuristics. + /// The `subject/predicate/object_candidate_id` fields above are the truth. + pub cam64: Cam64, +} + +impl EpisodicSpoFrame { + /// Is the subject a pronoun resolved to a prior entity? + #[inline] + pub fn is_coreference(&self) -> bool { + self.refers_to_candidate_id != NO_ROLE + } + + /// Is this triple intransitive (no object)? + #[inline] + pub fn is_intransitive(&self) -> bool { + self.object_candidate_id == NO_ROLE + } + + /// Is this triple negated? + #[inline] + pub fn is_negated(&self) -> bool { + self.morph_flags.is_negated() + } + + /// Is this triple a past-tense event (story-arc candidate)? + #[inline] + pub fn is_episodic_event(&self) -> bool { + self.morph_flags.is_past() + } + + /// Classification relative to an existing basin. + /// + /// V1 heuristic over novelty, entropy, confidence, and wisdom: + /// - high novelty + low entropy → `Epiphany` + /// - high novelty (high entropy) → `NoveltyDelta` + /// - low confidence → `Contradiction` + /// - high wisdom → `WisdomDelta` + /// - otherwise → `Reinforcement` + /// + /// `BasinClassification::Branch` is intentionally never produced here: a + /// per-frame heuristic cannot see the parallel narrative line that defines a + /// branch. It is reserved for the cross-frame basin tracker, which assigns it + /// when a frame opens a divergent story arc. + pub fn basin_classification(&self) -> BasinClassification { + let high_novelty = self.novelty > 0.7; + let low_entropy = self.entropy < 0.3; + + if high_novelty && low_entropy { + BasinClassification::Epiphany + } else if high_novelty { + BasinClassification::NoveltyDelta + } else if self.confidence < 0.2 { + BasinClassification::Contradiction + } else if self.wisdom > 0.6 { + BasinClassification::WisdomDelta + } else { + BasinClassification::Reinforcement + } + } +} + +/// How a new episodic frame relates to an existing story basin. 
+/// +/// From the spec: +/// - `Reinforcement` — detail repeats or strengthens the basin +/// - `NoveltyDelta` — surprising new detail that extends the basin +/// - `WisdomDelta` — detail reduces entropy / makes the story simpler/more explanatory +/// - `Contradiction` — detail conflicts with basin +/// - `Branch` — opens a parallel narrative line +/// - `Epiphany` — high novelty + coherence gain (surprising but entropy-reducing) +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub enum BasinClassification { + Reinforcement, + NoveltyDelta, + WisdomDelta, + Contradiction, + Branch, + Epiphany, +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::cam64::Cam64; + use crate::morphology::MorphFlags; + use crate::pos::PoS; + + fn blank_frame() -> EpisodicSpoFrame { + EpisodicSpoFrame { + doc_id: 1, + sentence_id: 0, + token_span_start: 0, + token_span_end: 5, + term_id: 42, + pos_tag: PoS::Noun, + morph_flags: MorphFlags::default(), + dependency_role: DependencyRole::Subject, + clause_role: ClauseRole::Main, + discourse_role: DiscourseRole::Topic, + subject_candidate_id: 10, + predicate_candidate_id: 20, + object_candidate_id: 30, + refers_to_candidate_id: NO_ROLE, + sentence_window_offset: 0, + nsm_prime_mask: 0, + nsm_molecule_mask: 0, + cam_code: [0; 6], + confidence: 0.9, + frequency: 0.5, + novelty: 0.1, + wisdom: 0.3, + staunen: 0.0, + entropy: 0.5, + free_energy_delta: 0.0, + cam64: Cam64::default(), + } + } + + #[test] + fn coreference_detection() { + let mut f = blank_frame(); + assert!(!f.is_coreference()); + f.refers_to_candidate_id = 5; + assert!(f.is_coreference()); + } + + #[test] + fn intransitive_detection() { + let mut f = blank_frame(); + assert!(!f.is_intransitive()); + f.object_candidate_id = NO_ROLE; + assert!(f.is_intransitive()); + } + + #[test] + fn negated_via_morph() { + let mut f = blank_frame(); + assert!(!f.is_negated()); + f.morph_flags = f.morph_flags.set(MorphFlags::NEGATED); + assert!(f.is_negated()); + } + + #[test] + fn 
episodic_event_via_past() { + let mut f = blank_frame(); + assert!(!f.is_episodic_event()); + f.morph_flags = f.morph_flags.set(MorphFlags::PAST); + assert!(f.is_episodic_event()); + } + + #[test] + fn epiphany_high_novelty_low_entropy() { + let mut f = blank_frame(); + f.novelty = 0.9; + f.entropy = 0.1; + assert_eq!(f.basin_classification(), BasinClassification::Epiphany); + } + + #[test] + fn reinforcement_low_novelty() { + let mut f = blank_frame(); + f.novelty = 0.1; + f.entropy = 0.5; + f.confidence = 0.9; + assert_eq!(f.basin_classification(), BasinClassification::Reinforcement); + } + + #[test] + fn contradiction_low_confidence() { + let mut f = blank_frame(); + f.novelty = 0.1; + f.confidence = 0.1; + assert_eq!(f.basin_classification(), BasinClassification::Contradiction); + } + + #[test] + fn size_is_reasonable() { + // Frame should be small enough to stack efficiently. + // Current layout: ~80 bytes is acceptable; alert if it balloons. + let size = core::mem::size_of::<EpisodicSpoFrame>(); + assert!( + size <= 128, + "EpisodicSpoFrame grew to {size} bytes — check alignment/padding" + ); + } +} diff --git a/crates/deepnsm/src/fingerprint16k.rs b/crates/deepnsm/src/fingerprint16k.rs index 06f5de4fe..d3cdecf0f 100644 --- a/crates/deepnsm/src/fingerprint16k.rs +++ b/crates/deepnsm/src/fingerprint16k.rs @@ -13,7 +13,7 @@ /// 16Kbit = 2048 bytes = 256 u64 words. pub const DIM_BITS: usize = 16384; pub const DIM_BYTES: usize = DIM_BITS / 8; // 2048 -pub const DIM_U64: usize = DIM_BITS / 64; // 256 +pub const DIM_U64: usize = DIM_BITS / 64; // 256 /// A 16Kbit binary fingerprint. Stack-allocated, Copy, SIMD-friendly. #[derive(Clone, Copy)] @@ -24,7 +24,9 @@ pub struct Fingerprint16K { impl Fingerprint16K { /// Zero fingerprint. - pub const ZERO: Self = Self { words: [0u64; DIM_U64] }; + pub const ZERO: Self = Self { + words: [0u64; DIM_U64], + }; /// Generate deterministic fingerprint for a centroid. /// Uses golden-ratio hashing for uniform bit distribution. 
@@ -83,12 +85,7 @@ impl Fingerprint16K { /// As byte slice for SIMD paths. pub fn as_bytes(&self) -> &[u8] { - unsafe { - std::slice::from_raw_parts( - self.words.as_ptr() as *const u8, - DIM_BYTES, - ) - } + unsafe { std::slice::from_raw_parts(self.words.as_ptr() as *const u8, DIM_BYTES) } } /// Belichtungsmesser early exit: are two fingerprints in the same σ-band? @@ -145,7 +142,8 @@ pub fn bundle(fingerprints: &[Fingerprint16K]) -> Fingerprint16K { let mut out = 0u64; for bit in 0..64 { let mask = 1u64 << bit; - let count = fingerprints.iter() + let count = fingerprints + .iter() .filter(|fp| fp.words[word_idx] & mask != 0) .count(); if count > threshold { @@ -170,7 +168,8 @@ pub fn bundle_weighted(fingerprints: &[(Fingerprint16K, f32)]) -> Fingerprint16K let mut out = 0u64; for bit in 0..64 { let mask = 1u64 << bit; - let weight_sum: f32 = fingerprints.iter() + let weight_sum: f32 = fingerprints + .iter() .filter(|(fp, _)| fp.words[word_idx] & mask != 0) .map(|(_, w)| w) .sum(); @@ -185,7 +184,12 @@ pub fn bundle_weighted(fingerprints: &[(Fingerprint16K, f32)]) -> Fingerprint16K impl std::fmt::Debug for Fingerprint16K { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - write!(f, "FP16K(pop={}, w0={:#018x})", self.popcount(), self.words[0]) + write!( + f, + "FP16K(pop={}, w0={:#018x})", + self.popcount(), + self.words[0] + ) } } @@ -228,9 +232,12 @@ mod tests { let sim_distant = fp_base.similarity(&fp_distant); // Neighbor should be more similar than distant - assert!(sim_neighbor > sim_distant, + assert!( + sim_neighbor > sim_distant, "neighbor sim {:.3} should be > distant sim {:.3}", - sim_neighbor, sim_distant); + sim_neighbor, + sim_distant + ); } #[test] diff --git a/crates/deepnsm/src/lib.rs b/crates/deepnsm/src/lib.rs index 098ab068a..99be66ce8 100644 --- a/crates/deepnsm/src/lib.rs +++ b/crates/deepnsm/src/lib.rs @@ -61,9 +61,9 @@ pub mod similarity; pub mod spo; pub mod vocabulary; -pub mod trajectory; pub mod markov_bundle; pub 
mod nsm_primes; +pub mod trajectory; // E-ENGLISH-BIFURCATES — two SEPARATE faculties (don't fuse them): // arcs (Broca/projection): basin/literal decomposition of the MarkovBundler wave. @@ -106,11 +106,33 @@ pub mod triangle_bridge; // ─── Re-exports ────────────────────────────────────────────────────────────── +pub use context::ContextWindow; +pub use encoder::{RoleVectors, VsaVec}; pub use pipeline::DeepNsmEngine; pub use pos::PoS; +pub use similarity::SimilarityTable; pub use spo::SpoTriple; pub use vocabulary::Vocabulary; -pub use similarity::SimilarityTable; -pub use encoder::{VsaVec, RoleVectors}; -pub use context::ContextWindow; pub mod fingerprint16k; + +// ── DeepNSM reader — sentence-level AriGraph reader ────────────────────── +// Left-corner state machine: expectation + evidence → episodic SPO + next state. +// Five modules; writer order is dependency order. +pub mod cam64; // Cam64 — 8-lane reading-state locality key (NOT the truth) +pub mod episodic_spo; // EpisodicSpoFrame — the auditable witness rows +pub mod morphology; // MorphFlags — heuristic tense/voice/clause flags +pub mod reader_state; +pub mod window; // SentenceWindow — ±5 exact entity tracking for coreference // ReadingState + step() — the left-corner transition + // Signed discrete reading-crystal: P64 meaning field + Crystal4096 coordinate. + // Bridge from DeepNSM grammar reader → holograph bitpacked resonance substrate. + // Integer-only hot path; floats remain only in EpisodicSpoFrame quality fields. +pub mod signed_crystal; +// SentenceTransformer64 — deterministic state-transition transformer. +// Maps grammar/NSM/discourse → P64 native meaning field → Cam4096 codebook address. +// "Transformer" = state-transition automaton, NOT neural self-attention. +// P64 is the native address space; Cam4096 is its deterministic 12-bit locality key. 
+pub mod sentence_transformer64; +// L1 local geometry for Crystal4096: neighbors_4096(), chebyshev/manhattan distance, +// NeighborhoodMetric (Manhattan / Chebyshev / LaneCompatible). No floats. +// blasgraph (L2) will consume these for frontier propagation in v2. +pub mod crystal_neighborhood; diff --git a/crates/deepnsm/src/markov_bundle.rs b/crates/deepnsm/src/markov_bundle.rs index b02b7d915..3491559b5 100644 --- a/crates/deepnsm/src/markov_bundle.rs +++ b/crates/deepnsm/src/markov_bundle.rs @@ -11,9 +11,8 @@ use crate::trajectory::Trajectory; use lance_graph_contract::grammar::role_keys::{ - CONTEXT_SLICE, INSTRUMENT_SLICE, KAUSAL_SLICE, LOKAL_SLICE, MODAL_SLICE, - MODIFIER_SLICE, OBJECT_SLICE, PREDICATE_SLICE, RoleKeySlice, SUBJECT_SLICE, - TEMPORAL_SLICE, VSA_DIMS, + RoleKeySlice, CONTEXT_SLICE, INSTRUMENT_SLICE, KAUSAL_SLICE, LOKAL_SLICE, MODAL_SLICE, + MODIFIER_SLICE, OBJECT_SLICE, PREDICATE_SLICE, SUBJECT_SLICE, TEMPORAL_SLICE, VSA_DIMS, }; #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] @@ -189,9 +188,7 @@ mod tests { } #[test] fn kernel_mexican_symmetric() { - assert!( - (Kernel::MexicanHat.weight(-2, 5) - Kernel::MexicanHat.weight(2, 5)).abs() < 1e-6 - ); + assert!((Kernel::MexicanHat.weight(-2, 5) - Kernel::MexicanHat.weight(2, 5)).abs() < 1e-6); } #[test] fn role_slices_disjoint() { @@ -206,24 +203,20 @@ mod tests { fn role_slice_widths_match_role_keys_canonical() { // Spot-check that `GrammaticalRole::slice` returns the role_keys-canonical // widths (NOT the old equal-partition 16384/5 = 3277 layout). 
- assert_eq!(GrammaticalRole::Subject.slice().len(), 2000); - assert_eq!(GrammaticalRole::Predicate.slice().len(), 2000); - assert_eq!(GrammaticalRole::Object.slice().len(), 2000); - assert_eq!(GrammaticalRole::Modifier.slice().len(), 1500); - assert_eq!(GrammaticalRole::Context.slice().len(), 1500); - assert_eq!(GrammaticalRole::Temporal.slice().len(), 200); - assert_eq!(GrammaticalRole::Kausal.slice().len(), 200); - assert_eq!(GrammaticalRole::Modal.slice().len(), 100); - assert_eq!(GrammaticalRole::Lokal.slice().len(), 150); - assert_eq!(GrammaticalRole::Instrument.slice().len(), 100); + assert_eq!(GrammaticalRole::Subject.slice().len(), 2000); + assert_eq!(GrammaticalRole::Predicate.slice().len(), 2000); + assert_eq!(GrammaticalRole::Object.slice().len(), 2000); + assert_eq!(GrammaticalRole::Modifier.slice().len(), 1500); + assert_eq!(GrammaticalRole::Context.slice().len(), 1500); + assert_eq!(GrammaticalRole::Temporal.slice().len(), 200); + assert_eq!(GrammaticalRole::Kausal.slice().len(), 200); + assert_eq!(GrammaticalRole::Modal.slice().len(), 100); + assert_eq!(GrammaticalRole::Lokal.slice().len(), 150); + assert_eq!(GrammaticalRole::Instrument.slice().len(), 100); } /// Helper: fill a bundler's window so a single push triggers `bundle_current`. 
- fn fill_and_bundle( - kernel: Kernel, - radius: u32, - sent: WindowedSentence, - ) -> Trajectory { + fn fill_and_bundle(kernel: Kernel, radius: u32, sent: WindowedSentence) -> Trajectory { let mut b = MarkovBundler::new(radius, kernel); let cap = (2 * radius + 1) as usize; let mut last: Option<Trajectory> = None; @@ -243,7 +236,11 @@ mod tests { ) -> Trajectory { let mut b = MarkovBundler::new(radius, kernel); let cap = (2 * radius + 1) as usize; - assert_eq!(sentences.len(), cap, "sequence must fill exactly one window"); + assert_eq!( + sentences.len(), + cap, + "sequence must fill exactly one window" + ); let mut last: Option<Trajectory> = None; for s in sentences { last = b.push(s); @@ -259,8 +256,8 @@ mod tests { fn bundle_does_not_rotate_subject_dims_outside_subject_slice() { // SUBJECT-only window: every sentence has a single SUBJECT token // whose content_fp is all 1.0 across the SUBJECT slice. - let subject_len = GrammaticalRole::Subject.slice().stop - - GrammaticalRole::Subject.slice().start; + let subject_len = + GrammaticalRole::Subject.slice().stop - GrammaticalRole::Subject.slice().start; let sent = WindowedSentence { tokens: vec![TokenWithRole { content_fp: vec![1.0; subject_len], @@ -273,8 +270,7 @@ mod tests { let s_start = _slice.start; let s_stop = _slice.stop; // SUBJECT slice should be non-zero (positive after normalization). - let subject_sum: f32 = - traj.fingerprint[s_start..s_stop].iter().sum(); + let subject_sum: f32 = traj.fingerprint[s_start..s_stop].iter().sum(); assert!( subject_sum > 1.0, "expected non-trivial SUBJECT content, got sum={subject_sum}" @@ -298,8 +294,8 @@ mod tests { /// way symmetric kernels can't equalize. 
#[test] fn mexican_hat_bundle_differs_from_uniform_bundle() { - let subject_len = GrammaticalRole::Subject.slice().stop - - GrammaticalRole::Subject.slice().start; + let subject_len = + GrammaticalRole::Subject.slice().stop - GrammaticalRole::Subject.slice().start; let radius = 5u32; let cap = (2 * radius + 1) as usize; // Single outlier at position 1 (delta = -4). Uniform weights this @@ -311,10 +307,7 @@ mod tests { let sentences: Vec<WindowedSentence> = (0..cap) .map(|i| WindowedSentence { tokens: vec![TokenWithRole { - content_fp: vec![ - if i == outlier_pos { 1.0 } else { 0.0 }; - subject_len - ], + content_fp: vec![if i == outlier_pos { 1.0 } else { 0.0 }; subject_len], role: GrammaticalRole::Subject, }], }) @@ -340,8 +333,8 @@ mod tests { /// land in a loose [0.5, 1.5] band on a controlled SUBJECT-only window. #[test] fn bundle_l2_norm_invariant_to_kernel() { - let subject_len = GrammaticalRole::Subject.slice().stop - - GrammaticalRole::Subject.slice().start; + let subject_len = + GrammaticalRole::Subject.slice().stop - GrammaticalRole::Subject.slice().start; let sent = WindowedSentence { tokens: vec![TokenWithRole { content_fp: vec![1.0; subject_len], @@ -351,12 +344,7 @@ mod tests { for k in [Kernel::Uniform, Kernel::MexicanHat, Kernel::Gaussian] { let traj = fill_and_bundle(k, 5, sent.clone()); // Per-dim mean of |v| × sqrt(N_subj) ≈ L2 norm; we test L2 directly. - let l2: f32 = traj - .fingerprint - .iter() - .map(|v| v * v) - .sum::<f32>() - .sqrt(); + let l2: f32 = traj.fingerprint.iter().map(|v| v * v).sum::<f32>().sqrt(); // Each SUBJECT dim sums to (Σ_i w_i) / (Σ_i |w_i|). For Uniform // and Gaussian (all-positive weights) this is exactly 1.0 per dim, // so L2 = sqrt(subject_len) ≈ 57.2. For Mexican-hat the negative diff --git a/crates/deepnsm/src/morphology.rs b/crates/deepnsm/src/morphology.rs new file mode 100644 index 000000000..103a7efb0 --- /dev/null +++ b/crates/deepnsm/src/morphology.rs @@ -0,0 +1,276 @@ +//! 
Morphological feature flags — heuristically derived from grammar signals. +//! +//! `MorphFlags` is a 16-bit bitfield. For v1, flags are derived from what +//! `SentenceStructure` already carries: negation, temporal markers, and +//! modal/passive patterns visible in the PoS sequence. Full morphological +//! analysis (number, person, voice) requires a dedicated morphology pass; +//! this module provides the column *shape* for those fields and the +//! heuristic baseline. +//! +//! **Invariant:** these flags describe the *parse frame*, not the reading +//! state. The 64-bit CAM code (`cam64::Cam64`) encodes the reading-state +//! transition; these flags are one of its inputs. + +use crate::parser::SentenceStructure; + +/// 16-bit morphological feature bitfield. +/// +/// Bit layout: +/// ```text +/// bit 0: PAST (temporal event, heuristic: temporal marker present) +/// bit 1: PRESENT (atemporal / general) +/// bit 2: FUTURE (modal present) +/// bit 3: SINGULAR (number singular) +/// bit 4: PLURAL (number plural) +/// bit 5: FIRST_PERSON (I / we) +/// bit 6: SECOND_PERSON (you) +/// bit 7: THIRD_PERSON (he / she / it / they, default) +/// bit 8: PASSIVE (passive construction detected) +/// bit 9: NEGATED (negation marker in triple) +/// bit 10: INTERROGATIVE (question marker) +/// bit 11: RELATIVE_CLAUSE (relative pronoun / subordinate clause) +/// bit 12: INFINITIVE (bare infinitive / to-infinitive) +/// bit 13: SUBORDINATE (subordinating conjunction) +/// bits 14-15: spare +/// ``` +#[derive(Clone, Copy, Default, PartialEq, Eq, Hash)] +pub struct MorphFlags(u16); + +impl MorphFlags { + pub const PAST: u16 = 1 << 0; + pub const PRESENT: u16 = 1 << 1; + pub const FUTURE: u16 = 1 << 2; + pub const SINGULAR: u16 = 1 << 3; + pub const PLURAL: u16 = 1 << 4; + pub const FIRST_PERSON: u16 = 1 << 5; + pub const SECOND_PERSON: u16 = 1 << 6; + pub const THIRD_PERSON: u16 = 1 << 7; + pub const PASSIVE: u16 = 1 << 8; + pub const NEGATED: u16 = 1 << 9; + pub const INTERROGATIVE: 
u16 = 1 << 10; + pub const RELATIVE_CLAUSE: u16 = 1 << 11; + pub const INFINITIVE: u16 = 1 << 12; + pub const SUBORDINATE: u16 = 1 << 13; + + pub fn new(bits: u16) -> Self { + Self(bits) + } + + pub fn bits(self) -> u16 { + self.0 + } + + pub fn has(self, flag: u16) -> bool { + self.0 & flag != 0 + } + pub fn set(self, flag: u16) -> Self { + Self(self.0 | flag) + } + pub fn clear(self, flag: u16) -> Self { + Self(self.0 & !flag) + } + + pub fn is_past(self) -> bool { + self.has(Self::PAST) + } + pub fn is_present(self) -> bool { + self.has(Self::PRESENT) + } + pub fn is_future(self) -> bool { + self.has(Self::FUTURE) + } + pub fn is_singular(self) -> bool { + self.has(Self::SINGULAR) + } + pub fn is_plural(self) -> bool { + self.has(Self::PLURAL) + } + pub fn is_first_person(self) -> bool { + self.has(Self::FIRST_PERSON) + } + pub fn is_second_person(self) -> bool { + self.has(Self::SECOND_PERSON) + } + pub fn is_third_person(self) -> bool { + self.has(Self::THIRD_PERSON) + } + pub fn is_passive(self) -> bool { + self.has(Self::PASSIVE) + } + pub fn is_negated(self) -> bool { + self.has(Self::NEGATED) + } + pub fn is_interrogative(self) -> bool { + self.has(Self::INTERROGATIVE) + } + pub fn is_relative_clause(self) -> bool { + self.has(Self::RELATIVE_CLAUSE) + } + pub fn is_infinitive(self) -> bool { + self.has(Self::INFINITIVE) + } + pub fn is_subordinate(self) -> bool { + self.has(Self::SUBORDINATE) + } + + /// Derive heuristic morph flags from a parsed sentence and triple index. + /// + /// V1 heuristics (deterministic, no learned parameters): + /// - negation list → `NEGATED` + /// - temporal marker → `PAST` + `THIRD_PERSON` + `SINGULAR` + /// - neither → `PRESENT` + `THIRD_PERSON` + `SINGULAR` (English default) + /// + /// This v1 pass only ever sets the flags above. 
`FUTURE`, `PLURAL`, + /// `FIRST_PERSON`, `SECOND_PERSON`, `PASSIVE`, `INTERROGATIVE`, + /// `RELATIVE_CLAUSE`, `INFINITIVE`, and `SUBORDINATE` are never derived here; + /// they require a dedicated morphology pass or an explicit `set()` by the caller. + pub fn from_sentence_structure(s: &SentenceStructure, triple_idx: usize) -> Self { + let mut flags = Self::default(); + + if s.negations.contains(&triple_idx) { + flags = flags.set(Self::NEGATED); + } + + let has_temporal = s.temporals.iter().any(|&(ti, _)| ti == triple_idx); + + if has_temporal { + // V1: temporal marker → treat as past event + flags = flags + .set(Self::PAST) + .set(Self::THIRD_PERSON) + .set(Self::SINGULAR); + } else { + // Default: present atemporal statement + flags = flags + .set(Self::PRESENT) + .set(Self::THIRD_PERSON) + .set(Self::SINGULAR); + } + + flags + } +} + +impl core::fmt::Debug for MorphFlags { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + let mut names = Vec::new(); + if self.is_past() { + names.push("PAST"); + } + if self.is_present() { + names.push("PRESENT"); + } + if self.is_future() { + names.push("FUTURE"); + } + if self.is_singular() { + names.push("SINGULAR"); + } + if self.is_plural() { + names.push("PLURAL"); + } + if self.is_first_person() { + names.push("1P"); + } + if self.is_second_person() { + names.push("2P"); + } + if self.is_third_person() { + names.push("3P"); + } + if self.is_passive() { + names.push("PASSIVE"); + } + if self.is_negated() { + names.push("NEG"); + } + if self.is_interrogative() { + names.push("?"); + } + if self.is_relative_clause() { + names.push("REL"); + } + if self.is_infinitive() { + names.push("INF"); + } + if self.is_subordinate() { + names.push("SUB"); + } + write!(f, "MorphFlags({:?})", names) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::spo::SpoTriple; + + fn make_sentence(negated: bool, temporal: bool) -> SentenceStructure { + SentenceStructure { + triples: vec![SpoTriple::new(1, 2, 
3)], + modifiers: vec![], + negations: if negated { vec![0] } else { vec![] }, + temporals: if temporal { vec![(0, 42)] } else { vec![] }, + } + } + + #[test] + fn negated_triple() { + let s = make_sentence(true, false); + let m = MorphFlags::from_sentence_structure(&s, 0); + assert!(m.is_negated()); + assert!(m.is_present()); + } + + #[test] + fn temporal_sets_past() { + let s = make_sentence(false, true); + let m = MorphFlags::from_sentence_structure(&s, 0); + assert!(m.is_past()); + assert!(!m.is_present()); + } + + #[test] + fn default_present_third_singular() { + let s = make_sentence(false, false); + let m = MorphFlags::from_sentence_structure(&s, 0); + assert!(m.is_present()); + assert!(m.is_third_person()); + assert!(m.is_singular()); + } + + #[test] + fn set_clear_roundtrip() { + let m = MorphFlags::default() + .set(MorphFlags::PAST) + .set(MorphFlags::PLURAL) + .clear(MorphFlags::PAST); + assert!(!m.is_past()); + assert!(m.is_plural()); + } + + #[test] + fn bits_are_distinct() { + let flags = [ + MorphFlags::PAST, + MorphFlags::PRESENT, + MorphFlags::FUTURE, + MorphFlags::SINGULAR, + MorphFlags::PLURAL, + MorphFlags::FIRST_PERSON, + MorphFlags::SECOND_PERSON, + MorphFlags::THIRD_PERSON, + MorphFlags::PASSIVE, + MorphFlags::NEGATED, + MorphFlags::INTERROGATIVE, + MorphFlags::RELATIVE_CLAUSE, + MorphFlags::INFINITIVE, + MorphFlags::SUBORDINATE, + ]; + for i in 0..flags.len() { + for j in (i + 1)..flags.len() { + assert_eq!(flags[i] & flags[j], 0, "flags[{i}] and flags[{j}] overlap"); + } + } + } +} diff --git a/crates/deepnsm/src/nsm_primes.rs b/crates/deepnsm/src/nsm_primes.rs index 304c8a548..256ce9f0a 100644 --- a/crates/deepnsm/src/nsm_primes.rs +++ b/crates/deepnsm/src/nsm_primes.rs @@ -38,10 +38,8 @@ pub static NSM_PRIME_IDS: LazyLock> = LazyLock::new(|| { // mapping is available). 
for id in [ // Pronouns + demonstratives (rank 0..30 in COCA) - 2, 4, 8, 12, 14, 18, 22, 26, 28, - // Common NSM-mapped function words (rank 30..200) - 35, 45, 58, 67, 73, 89, 102, 117, 134, 158, 192, - // Mental predicates + 2, 4, 8, 12, 14, 18, 22, 26, 28, // Common NSM-mapped function words (rank 30..200) + 35, 45, 58, 67, 73, 89, 102, 117, 134, 158, 192, // Mental predicates 201, 233, 287, 309, 354, ] { s.insert(id as u16); @@ -58,7 +56,9 @@ pub fn is_nsm_prime(token_id: u16) -> bool { pub fn count_primes(tokens: impl Iterator<Item = u16>) -> u8 { let mut n: u8 = 0; for t in tokens { - if is_nsm_prime(t) { n = n.saturating_add(1); } + if is_nsm_prime(t) { + n = n.saturating_add(1); + } } n } @@ -70,12 +70,12 @@ mod tests { #[test] fn primes_set_is_nonempty_and_bounded() { assert!(!NSM_PRIME_IDS.is_empty()); - assert!(NSM_PRIME_IDS.len() <= 65); // Wierzbicka's count + assert!(NSM_PRIME_IDS.len() <= 65); // Wierzbicka's count } #[test] fn count_primes_saturates_at_255() { - let many = std::iter::repeat(*NSM_PRIME_IDS.iter().next().unwrap()).take(1000); + let many = std::iter::repeat_n(*NSM_PRIME_IDS.iter().next().unwrap(), 1000); assert_eq!(count_primes(many), 255); } diff --git a/crates/deepnsm/src/parser.rs b/crates/deepnsm/src/parser.rs index d8cdc24bd..9e4b9998c 100644 --- a/crates/deepnsm/src/parser.rs +++ b/crates/deepnsm/src/parser.rs @@ -533,7 +533,7 @@ impl Parser { resolved.push(r); // Use the curated NSM-prime ID set rather than the // earlier `r < 64` heuristic. See nsm_primes.rs. 
- if crate::nsm_primes::is_nsm_prime(r as u16) { + if crate::nsm_primes::is_nsm_prime(r) { primes = primes.saturating_add(1); } } @@ -755,8 +755,8 @@ mod parser_coverage_tests { let prime = nsm_prime_rank(); let tokens = vec![ tok(Some(prime), PoS::Pronoun, "i"), - tok(Some(100), PoS::Verb, "see"), - tok(Some(200), PoS::Noun, "thing"), + tok(Some(100), PoS::Verb, "see"), + tok(Some(200), PoS::Noun, "thing"), ]; let parser = Parser::new(); let result = parser.parse_with_coverage(&tokens); @@ -772,9 +772,9 @@ mod parser_coverage_tests { fn parse_with_coverage_below_threshold_emits_ticket() { // Mostly OOV (rank: None) → coverage drops far below 0.85. let tokens = vec![ - tok(None, PoS::Noun, "xyzzy"), - tok(None, PoS::Noun, "plugh"), - tok(None, PoS::Verb, "fnord"), + tok(None, PoS::Noun, "xyzzy"), + tok(None, PoS::Noun, "plugh"), + tok(None, PoS::Verb, "fnord"), tok(Some(2943), PoS::Verb, "bites"), ]; let parser = Parser::new(); @@ -801,9 +801,9 @@ mod parser_coverage_tests { fn unresolved_tokens_preserve_position_identity() { // Mixed resolved/unresolved: positions of OOV tokens are 0 and 2. let tokens = vec![ - tok(None, PoS::Noun, "blarf"), - tok(Some(100), PoS::Verb, "is"), - tok(None, PoS::Noun, "wibble"), + tok(None, PoS::Noun, "blarf"), + tok(Some(100), PoS::Verb, "is"), + tok(None, PoS::Noun, "wibble"), ]; let parser = Parser::new(); let result = parser.parse_with_coverage(&tokens); diff --git a/crates/deepnsm/src/pipeline.rs b/crates/deepnsm/src/pipeline.rs index 385689e70..c266382fd 100644 --- a/crates/deepnsm/src/pipeline.rs +++ b/crates/deepnsm/src/pipeline.rs @@ -119,7 +119,10 @@ impl DeepNsmEngine { // 4. Build similarity table from exact distribution let similarity_table = SimilarityTable::from_distance_matrix(&distance_matrix); - eprintln!("[deepnsm] Similarity table calibrated: {:?}", similarity_table); + eprintln!( + "[deepnsm] Similarity table calibrated: {:?}", + similarity_table + ); // 5. 
Create role vectors and context window let roles = RoleVectors::new(); @@ -135,10 +138,7 @@ impl DeepNsmEngine { } /// Load with a precomputed distance matrix (skip CAM-PQ computation). - pub fn load_with_matrix( - data_dir: &Path, - matrix_data: Vec, - ) -> Result { + pub fn load_with_matrix(data_dir: &Path, matrix_data: Vec) -> Result { let vocab = Vocabulary::load(data_dir)?; let distance_matrix = WordDistanceMatrix::from_flat(matrix_data); let similarity_table = SimilarityTable::from_distance_matrix(&distance_matrix); @@ -173,14 +173,22 @@ impl DeepNsmEngine { encoder::encode_triple_negated( triple.subject(), triple.predicate(), - if triple.has_object() { Some(triple.object()) } else { None }, + if triple.has_object() { + Some(triple.object()) + } else { + None + }, &self.roles, ) } else { encoder::encode_triple( triple.subject(), triple.predicate(), - if triple.has_object() { Some(triple.object()) } else { None }, + if triple.has_object() { + Some(triple.object()) + } else { + None + }, &self.roles, ) }; @@ -209,7 +217,11 @@ impl DeepNsmEngine { } /// Compute similarity between two sentences. 
- pub fn sentence_similarity(&self, a: &ProcessedSentence, b: &ProcessedSentence) -> SentenceSimilarity { + pub fn sentence_similarity( + &self, + a: &ProcessedSentence, + b: &ProcessedSentence, + ) -> SentenceSimilarity { // VSA similarity let vsa_sim = a.sentence_vec.similarity(&b.sentence_vec); diff --git a/crates/deepnsm/src/pos.rs b/crates/deepnsm/src/pos.rs index 689606ef9..970a9ed82 100644 --- a/crates/deepnsm/src/pos.rs +++ b/crates/deepnsm/src/pos.rs @@ -120,7 +120,9 @@ mod tests { #[test] fn roundtrip_tags() { - for tag in &["a", "v", "j", "r", "i", "p", "c", "d", "n", "u", "t", "x", "e"] { + for tag in &[ + "a", "v", "j", "r", "i", "p", "c", "d", "n", "u", "t", "x", "e", + ] { let pos = PoS::from_tag(tag).unwrap(); assert_eq!(pos.as_tag(), *tag); } diff --git a/crates/deepnsm/src/quantum_mode.rs b/crates/deepnsm/src/quantum_mode.rs index 0d5a1827c..b7379cd01 100644 --- a/crates/deepnsm/src/quantum_mode.rs +++ b/crates/deepnsm/src/quantum_mode.rs @@ -35,7 +35,7 @@ impl PhaseTag { // Use the low 64 bits (the high 64 are reserved for future precision). let low = (self.0 & u64::MAX as u128) as u64; let normalized = (low as f64) / (u64::MAX as f64); - (normalized * std::f64::consts::TAU as f64) as f32 + (normalized * std::f64::consts::TAU) as f32 } pub fn distance(self, other: Self) -> u32 { @@ -79,7 +79,11 @@ mod tests { // f64 intermediate gives sub-1e-3 round-trip; f32 final cast caps // precision around 1e-6 of TAU (~6e-6 absolute). let diff = (recovered - theta).abs(); - assert!(diff < 0.001, "round-trip diff {} exceeds tolerance 0.001", diff); + assert!( + diff < 0.001, + "round-trip diff {} exceeds tolerance 0.001", + diff + ); } #[test] diff --git a/crates/deepnsm/src/reader_state.rs b/crates/deepnsm/src/reader_state.rs new file mode 100644 index 000000000..ab89169ac --- /dev/null +++ b/crates/deepnsm/src/reader_state.rs @@ -0,0 +1,701 @@ +//! Reading state machine — sentence-by-sentence AriGraph reader. +//! +//! ## Left-corner framing +//! +//! 
`step()` is a left-corner transition: it fuses top-down expectation +//! (what the current reading frame predicts) with bottom-up evidence +//! (what the parser saw in this sentence) to emit episodic SPO frames +//! and advance the state. +//! +//! A pure bottom-up parser says "I saw these words, build structure." +//! A pure top-down parser says "I expect this form, fill it." +//! A left-corner parser (Manning & Carpenter 1997) does the useful hybrid: +//! +//! ```text +//! ReadingState_t (top-down expectation) +//! + SentenceFeatures_t (bottom-up evidence) +//! + LeftCornerTrigger (first strong frame signal) +//! + ±5 sentence window (entity / coreference stack) +//! → Vec (auditable witnesses) +//! → ReadingState_t+1 (updated expectation) +//! ``` +//! +//! The `Cam64` in each emitted frame encodes the reading-state locality, +//! not semantic truth. The SPO fields in `EpisodicSpoFrame` are the truth. +//! +//! ## Coreference (v1) +//! +//! When the caller marks `TripleFeatures::subject_is_pronoun = true`, the +//! state machine resolves the subject against the `SentenceWindow` entity +//! stack (most-recent-first, recency heuristic). Resolution sets +//! `refers_to_candidate_id` and the coreference bit in `Cam64`. +//! Full antecedent-ranking (gender / number / semantic-type agreement) +//! is a v2 concern. + +use crate::cam64::Cam64; +use crate::episodic_spo::{ClauseRole, DependencyRole, DiscourseRole, EpisodicSpoFrame, NO_ROLE}; +use crate::morphology::MorphFlags; +use crate::parser::SentenceStructure; +use crate::pos::PoS; +use crate::spo::SpoTriple; +use crate::window::{ExpectedReason, SentenceWindow, WindowEntry}; + +// ── Left-corner trigger ─────────────────────────────────────────────────── + +/// The first strong signal of a semantic frame at the start of a sentence. +/// +/// Used to set the top-down expectation in `ReadingState` before the full +/// SPO parse completes. 
Maps vocabulary rank or morphological pattern to +/// a frame type, enabling O(1) frame pre-selection. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)] +pub enum LeftCornerTrigger { + /// No special trigger — plain declarative SVO frame. + #[default] + Declarative, + /// Causal connector ("because", "therefore", "so") → causal explanation frame. + Causal, + /// Temporal marker ("after", "before", "when", "then") → temporal ordering frame. + Temporal, + /// Relative pronoun ("who", "which", "that") → relative-clause + coreference frame. + Relative, + /// Personal pronoun subject ("he", "she", "it", "they") → anaphora lookup. + Anaphora, + /// First-person subject ("I", "we") → agent-perspective frame. + FirstPerson, + /// Domain-specific trigger from caller (e.g. "invoice" → business-document frame). + /// Note: `basin_byte()` reserves bit 7 as the domain marker and keeps only the + /// low 7 bits of the tag (`0x80 | tag & 0x7F`), so distinct tags must differ in + /// bits 0-6 (0..=127) to map to distinct basin bytes. + Domain(u8), +} + +impl LeftCornerTrigger { + /// Lane-7 basin byte for this trigger (feeds into Cam64 basin lane). + pub fn basin_byte(self) -> u8 { + match self { + Self::Declarative => 0x00, + Self::Causal => 0x01, + Self::Temporal => 0x02, + Self::Relative => 0x04, + Self::Anaphora => 0x08, + Self::FirstPerson => 0x10, + Self::Domain(tag) => 0x80 | (tag & 0x7F), + } + } +} + +// ── Per-triple caller-supplied features ────────────────────────────────── + +/// Features supplied by the caller for one triple within a sentence. +/// +/// The parser produces `SentenceStructure` (grammar signals). The caller +/// supplies `TripleFeatures` (semantic / lexical annotations unavailable +/// to the FSM parser). Both together feed `step()`. +#[derive(Clone, Debug, Default)] +pub struct TripleFeatures { + /// Byte offsets (token-span) of the head term in the original text. 
+ pub token_span_start: u16, + pub token_span_end: u16, + /// PoS of the head term (if known; defaults to Noun). + pub pos_tag: Option<PoS>, + /// Is the subject head a personal pronoun? (enables coreference lookup) + pub subject_is_pronoun: bool, + /// Is the object head a personal pronoun? + pub object_is_pronoun: bool, + /// NSM semantic prime mask (63 primes → 64 bits, bit 0-62 = prime index). + pub nsm_prime_mask: u64, + /// NSM semantic molecule mask. + pub nsm_molecule_mask: u64, + /// CAM-PQ 6-subspace code for the subject head. + pub cam_code: [u8; 6], + /// Left-corner trigger for this sentence (first strong frame signal). + /// Only the first triple's trigger is used to set the frame expectation. + pub left_corner_trigger: LeftCornerTrigger, + /// Episodic quality annotations (caller-supplied, 0.0..1.0). + pub confidence: f32, + pub frequency: f32, + pub novelty: f32, + pub wisdom: f32, + pub staunen: f32, + pub entropy: f32, + pub free_energy_delta: f32, +} + +/// All caller-supplied features for a sentence. +#[derive(Clone, Debug, Default)] +pub struct SentenceFeatures { + /// One entry per triple in `SentenceStructure::triples`. + /// If shorter than the triple list, missing entries use `TripleFeatures::default()`. + pub per_triple: Vec<TripleFeatures>, +} + +impl SentenceFeatures { + pub fn get(&self, idx: usize) -> &TripleFeatures { + self.per_triple.get(idx).unwrap_or(&DEFAULT_TRIPLE_FEATURES) + } +} + +static DEFAULT_TRIPLE_FEATURES: TripleFeatures = TripleFeatures { + token_span_start: 0, + token_span_end: 0, + pos_tag: None, + subject_is_pronoun: false, + object_is_pronoun: false, + nsm_prime_mask: 0, + nsm_molecule_mask: 0, + cam_code: [0u8; 6], + left_corner_trigger: LeftCornerTrigger::Declarative, + confidence: 1.0, + frequency: 0.5, + novelty: 0.0, + wisdom: 0.0, + staunen: 0.0, + entropy: 0.5, + free_energy_delta: 0.0, +}; + +// ── ReadingState ────────────────────────────────────────────────────────── + +/// The complete reading state at sentence boundary `t`. 
+/// +/// `step()` takes `ReadingState_t` + `SentenceFeatures_t` and returns +/// `(frames_t, ReadingState_t+1)`. Pure function; `self` is not mutated. +#[derive(Clone, Debug)] +pub struct ReadingState { + pub doc_id: u32, + pub sentence_id: u32, + + // ── Top-down expectation (left-corner "I expected something") ──────── + /// Vocabulary-rank bucket of the expected subject (NO_ROLE = no expectation). + pub expected_subject_bucket: u16, + /// Vocabulary-rank bucket of the expected predicate. + pub expected_predicate_bucket: u16, + /// Active frame type set by the left-corner trigger. + pub active_trigger: LeftCornerTrigger, + + // ── Bottom-up evidence (last resolved triple) ──────────────────────── + pub active_subject: u16, + pub active_predicate: u16, + pub active_object: u16, + + // ── Entity / coreference stack ─────────────────────────────────────── + entity_stack: [u16; 8], + entity_stack_len: usize, + + // ── Current Cam64 locality code ────────────────────────────────────── + pub cam64: Cam64, + + // ── ±5 sentence window ──────────────────────────────────────────────── + pub window: SentenceWindow, +} + +impl ReadingState { + /// Create an initial reading state for a new document. + pub fn new(doc_id: u32) -> Self { + Self { + doc_id, + sentence_id: 0, + expected_subject_bucket: NO_ROLE, + expected_predicate_bucket: NO_ROLE, + active_trigger: LeftCornerTrigger::Declarative, + active_subject: NO_ROLE, + active_predicate: NO_ROLE, + active_object: NO_ROLE, + entity_stack: [NO_ROLE; 8], + entity_stack_len: 0, + cam64: Cam64::default(), + window: SentenceWindow::new(), + } + } + + /// Advance the reading state by one sentence. + /// + /// Returns the episodic SPO frames emitted for this sentence and the + /// next reading state. Pure: `self` is consumed, `next` is returned. + /// + /// `features` provides per-triple annotations unavailable to the FSM + /// parser (pronoun flags, NSM masks, CAM codes, quality markers). 
+ pub fn step( + self, + sentence: &SentenceStructure, + features: &SentenceFeatures, + ) -> (Vec<EpisodicSpoFrame>, ReadingState) { + let mut frames = Vec::with_capacity(sentence.triples.len().max(1)); + let mut next = self; + next.sentence_id += 1; + + if sentence.is_empty() { + return (frames, next); + } + + // Forward expectations are single-step predictions: each `step` rebuilds + // them fresh. Clearing here bounds the expectation buffer (it can never + // accumulate stale slots across sentences until `MAX_EXPECTED` fills and + // `push_expected` silently drops newer antecedents). + next.window.clear_expected(); + + // Left-corner trigger from the first triple's features sets the frame. + let first_feat = features.get(0); + next.active_trigger = first_feat.left_corner_trigger; + + // Update top-down expectation based on the trigger. + // Causal/temporal triggers shift the expected predicate toward the + // connective vocabulary bucket; anaphora trigger signals that we + // need entity stack lookup for the subject. + match next.active_trigger { + LeftCornerTrigger::Causal | LeftCornerTrigger::Temporal => { + // Expectation: predicate will be a temporal/causal verb. + // Keep as NO_ROLE but trigger bit propagates via cam64 lane 7. + } + // Left-corner: relative pronoun ("who", "which") → the active subject + // from the PRIOR sentence is expected to be the antecedent. Pre-push it + // into the window's expectation buffer so resolve_pronoun() finds it + // first (Pika chart-arc slot pre-population). + LeftCornerTrigger::Relative if next.active_subject != NO_ROLE => { + next.window + .push_expected(next.active_subject, ExpectedReason::RelativeClause); + } + // Left-corner: personal pronoun subject → prior active subject is the + // most likely referent. Pre-push as anaphora expectation. 
+ LeftCornerTrigger::Anaphora if next.active_subject != NO_ROLE => { + next.window + .push_expected(next.active_subject, ExpectedReason::Anaphora); + } + _ => {} + } + + let mut window_entry = WindowEntry { + sentence_id: next.sentence_id, + ..WindowEntry::default() + }; + + for (triple_idx, triple) in sentence.triples.iter().enumerate() { + let feat = features.get(triple_idx); + let morph = MorphFlags::from_sentence_structure(sentence, triple_idx); + let has_temporal = sentence.is_temporal(triple_idx); + + // ── Coreference resolution ─────────────────────────────────── + let refers_to = if feat.subject_is_pronoun { + // Left-corner anaphora: resolve from window entity stack. + next.window.resolve_pronoun(triple.subject()) + } else { + NO_ROLE + }; + let coref_resolved = refers_to != NO_ROLE; + + // ── Effective subject after coreference ────────────────────── + let effective_subject = if coref_resolved { + refers_to + } else { + triple.subject() + }; + + // ── Build Cam64 locality code ──────────────────────────────── + // Build from the EFFECTIVE triple (subject replaced by the resolved + // antecedent) so the locality key, P64, and CAM4096 are keyed to the + // real entity. Otherwise "John … He …" emits a frame whose truth + // fields point at John but whose cam64 is bucketed on the pronoun. + // When there is no coreference, effective_subject == triple.subject() + // so this is a no-op. + let effective_triple = + SpoTriple::new(effective_subject, triple.predicate(), triple.object()); + let stack_depth = next.entity_stack_len.min(127) as u8; + let base_cam64 = Cam64::from_triple( + &effective_triple, + morph, + stack_depth, + coref_resolved, + has_temporal, + feat.novelty > 0.7, + ); + // Overlay basin lane with the left-corner trigger signal. 
+ let cam64 = base_cam64.with_lane( + 7, + base_cam64.basin_state() | next.active_trigger.basin_byte(), + ); + + // ── Discourse role ────────────────────────────────────────── + let discourse_role = if triple_idx == 0 { + match next.active_trigger { + LeftCornerTrigger::Causal => DiscourseRole::Comment, + LeftCornerTrigger::Temporal => DiscourseRole::Bridge, + LeftCornerTrigger::Anaphora | LeftCornerTrigger::Relative => { + DiscourseRole::Background + } + _ => DiscourseRole::Topic, + } + } else { + DiscourseRole::Comment + }; + + // ── Clause role ───────────────────────────────────────────── + let clause_role = if morph.is_relative_clause() { + ClauseRole::Relative + } else if morph.is_subordinate() { + ClauseRole::Subordinate + } else if morph.is_infinitive() { + ClauseRole::Infinitival + } else { + ClauseRole::Main + }; + + // ── Emit frame ─────────────────────────────────────────────── + let frame = EpisodicSpoFrame { + doc_id: next.doc_id, + sentence_id: next.sentence_id, + token_span_start: feat.token_span_start, + token_span_end: feat.token_span_end, + term_id: effective_subject, + pos_tag: feat.pos_tag.unwrap_or(PoS::Noun), + morph_flags: morph, + dependency_role: DependencyRole::Subject, + clause_role, + discourse_role, + subject_candidate_id: effective_subject, + predicate_candidate_id: triple.predicate(), + object_candidate_id: triple.object(), + refers_to_candidate_id: refers_to, + sentence_window_offset: 0, + nsm_prime_mask: feat.nsm_prime_mask, + nsm_molecule_mask: feat.nsm_molecule_mask, + cam_code: feat.cam_code, + confidence: feat.confidence, + frequency: feat.frequency, + novelty: feat.novelty, + wisdom: feat.wisdom, + staunen: feat.staunen, + entropy: feat.entropy, + free_energy_delta: feat.free_energy_delta, + cam64, + }; + + frames.push(frame); + + // ── Update entity stack ────────────────────────────────────── + // Push non-pronoun heads so future sentences can resolve coreference. 
+ if !feat.subject_is_pronoun && triple.subject() != NO_ROLE { + next.push_entity(triple.subject()); + window_entry.push_head(triple.subject()); + } + if !feat.object_is_pronoun && triple.object() != NO_ROLE { + next.push_entity(triple.object()); + window_entry.push_head(triple.object()); + } + + // Track the primary (first) triple as the active bottom-up evidence. + if triple_idx == 0 { + next.active_subject = effective_subject; + next.active_predicate = triple.predicate(); + next.active_object = triple.object(); + next.cam64 = cam64; + + // Update top-down expectation buckets from what we just saw. + next.expected_subject_bucket = (effective_subject >> 5) & 0x7F; + next.expected_predicate_bucket = (triple.predicate() >> 5) & 0x7F; + } + } + + window_entry.primary_spo_packed = if !sentence.triples.is_empty() { + sentence.triples[0].as_u64() + } else { + 0 + }; + next.window.push(window_entry); + + (frames, next) + } + + // ── Entity stack helpers ───────────────────────────────────────────── + + /// Push an entity into the coreference stack (LIFO, bounded at 8). + /// Evicts the oldest entry when full. + pub fn push_entity(&mut self, rank: u16) { + if rank == NO_ROLE { + return; + } + if self.entity_stack_len < 8 { + self.entity_stack[self.entity_stack_len] = rank; + self.entity_stack_len += 1; + } else { + // Rotate left, drop oldest (index 0), push newest at the end. + self.entity_stack.rotate_left(1); + self.entity_stack[7] = rank; + } + } + + /// Iterate entity stack from most recent to oldest. + pub fn entities_recent_first(&self) -> impl Iterator<Item = u16> + '_ { + self.entity_stack[..self.entity_stack_len] + .iter() + .rev() + .copied() + } + + /// Number of entities currently in the stack. 
+ pub fn entity_count(&self) -> usize { + self.entity_stack_len + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn sentence_one_triple(s: u16, p: u16, o: u16) -> SentenceStructure { + SentenceStructure { + triples: vec![SpoTriple::new(s, p, o)], + modifiers: vec![], + negations: vec![], + temporals: vec![], + } + } + + fn plain_features() -> SentenceFeatures { + SentenceFeatures { + per_triple: vec![TripleFeatures { + confidence: 0.9, + frequency: 0.5, + ..Default::default() + }], + } + } + + #[test] + fn step_increments_sentence_id() { + let rs = ReadingState::new(1); + let s = sentence_one_triple(10, 20, 30); + let (_, rs2) = rs.step(&s, &plain_features()); + assert_eq!(rs2.sentence_id, 1); + let (_, rs3) = rs2.step(&s, &plain_features()); + assert_eq!(rs3.sentence_id, 2); + } + + #[test] + fn active_spo_updated_from_first_triple() { + let rs = ReadingState::new(0); + let s = sentence_one_triple(100, 200, 300); + let (frames, rs2) = rs.step(&s, &plain_features()); + assert_eq!(frames.len(), 1); + assert_eq!(rs2.active_subject, 100); + assert_eq!(rs2.active_predicate, 200); + assert_eq!(rs2.active_object, 300); + } + + #[test] + fn entity_stack_grows_with_non_pronoun_heads() { + let rs = ReadingState::new(0); + let s = sentence_one_triple(50, 60, 70); + let (_, rs2) = rs.step(&s, &plain_features()); + // Subject (50) and object (70) both pushed + assert_eq!(rs2.entity_count(), 2); + } + + #[test] + fn pronoun_resolves_to_prior_entity() { + let rs = ReadingState::new(0); + // First sentence: introduce entity 50 + let s1 = sentence_one_triple(50, 60, 70); + let (_, rs2) = rs.step(&s1, &plain_features()); + + // Second sentence: subject is a pronoun (rank=5, some pronoun rank) + let s2 = sentence_one_triple(5, 80, 90); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + subject_is_pronoun: true, + confidence: 0.8, + ..Default::default() + }], + }; + let (frames, _) = rs2.step(&s2, &feat); + assert_eq!(frames.len(), 1); + // Most recent 
entity was 70 (object of first sentence), then 50. + // resolve_pronoun returns most recent → 70. + assert_eq!(frames[0].refers_to_candidate_id, 70); + // Effective subject = resolved referent. + assert_eq!(frames[0].subject_candidate_id, 70); + } + + #[test] + fn pronoun_no_prior_entity_stays_no_role() { + let rs = ReadingState::new(0); + let s = sentence_one_triple(5, 20, 30); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + subject_is_pronoun: true, + ..Default::default() + }], + }; + let (frames, _) = rs.step(&s, &feat); + assert_eq!(frames[0].refers_to_candidate_id, NO_ROLE); + } + + #[test] + fn empty_sentence_emits_no_frames() { + let rs = ReadingState::new(0); + let s = SentenceStructure { + triples: vec![], + modifiers: vec![], + negations: vec![], + temporals: vec![], + }; + let (frames, rs2) = rs.step(&s, &SentenceFeatures::default()); + assert!(frames.is_empty()); + assert_eq!(rs2.sentence_id, 1); + } + + #[test] + fn left_corner_trigger_recorded_in_cam64_basin_lane() { + let rs = ReadingState::new(0); + let s = sentence_one_triple(10, 20, 30); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + left_corner_trigger: LeftCornerTrigger::Causal, + ..Default::default() + }], + }; + let (frames, _) = rs.step(&s, &feat); + // Basin lane must have the causal trigger bit set. + let basin = frames[0].cam64.basin_state(); + assert_ne!(basin & LeftCornerTrigger::Causal.basin_byte(), 0); + } + + #[test] + fn entity_stack_evicts_oldest_at_capacity() { + let mut rs = ReadingState::new(0); + for i in 0..9u16 { + rs.push_entity(i * 10); + } + // Stack is bounded at 8; oldest entry (0) should be gone. 
+ let entities: Vec<u16> = rs.entities_recent_first().collect(); + assert_eq!(entities.len(), 8); + assert!( + !entities.contains(&0), + "oldest entity should have been evicted" + ); + assert!(entities.contains(&80), "newest entity should be present"); + } + + #[test] + fn window_populated_after_step() { + let rs = ReadingState::new(0); + let s = sentence_one_triple(10, 20, 30); + let (_, rs2) = rs.step(&s, &plain_features()); + assert_eq!(rs2.window.len(), 1); + let entry = rs2.window.most_recent().unwrap(); + assert!(entry.contains(10)); // subject pushed + assert!(entry.contains(30)); // object pushed + } + + #[test] + fn temporal_sentence_sets_past_morph_in_frame() { + let rs = ReadingState::new(0); + let s = SentenceStructure { + triples: vec![SpoTriple::new(1, 2, 3)], + modifiers: vec![], + negations: vec![], + temporals: vec![(0, 99)], // triple 0 is temporal + }; + let (frames, _) = rs.step(&s, &plain_features()); + assert!(frames[0].morph_flags.is_past()); + } + + #[test] + fn relative_trigger_pushes_expected_subject() { + let rs = ReadingState::new(0); + // Sentence 1: establish active_subject = 100. + let s1 = sentence_one_triple(100, 200, 300); + let (_, rs2) = rs.step(&s1, &plain_features()); + assert_eq!(rs2.active_subject, 100); + + // Sentence 2 has a Relative trigger → should pre-push 100 as expected slot. + let s2 = sentence_one_triple(10, 20, 30); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + left_corner_trigger: LeftCornerTrigger::Relative, + ..Default::default() + }], + }; + let (_, rs3) = rs2.step(&s2, &feat); + // The expected slot was pushed during step and is visible via iter_expected(). + // (The count is consumed only if resolve_pronoun drains it; here + // subject_is_pronoun=false so the slot remains.) 
+ assert_eq!(rs3.window.iter_expected().len(), 1); + assert_eq!(rs3.window.iter_expected()[0].rank, 100); + } + + #[test] + fn pronoun_prefers_expected_relative_subject() { + let rs = ReadingState::new(0); + // Sentence 1: introduce entities 50 (subj) and 70 (obj). + let s1 = sentence_one_triple(50, 60, 70); + let (_, rs2) = rs.step(&s1, &plain_features()); + + // Sentence 2: Relative trigger fires → expects 50 (prior active_subject). + // Subject of sentence 2 is a pronoun. + let s2 = sentence_one_triple(5, 80, 90); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + left_corner_trigger: LeftCornerTrigger::Relative, + subject_is_pronoun: true, + confidence: 0.9, + ..Default::default() + }], + }; + let (frames, _) = rs2.step(&s2, &feat); + // Expected slot has rank=50; recency heuristic without it would give 70. + // With the expectation, resolve_pronoun should return 50. + assert_eq!(frames[0].refers_to_candidate_id, 50); + assert_eq!(frames[0].subject_candidate_id, 50); + } + + #[test] + fn cam64_keyed_to_resolved_antecedent_not_pronoun() { + // "John(70) ... He(5) ...": the frame's cam64 entity lane must bucket + // the resolved antecedent (70), NOT the pronoun rank (5). + let rs = ReadingState::new(0); + let s1 = sentence_one_triple(50, 60, 70); // most-recent head = 70 + let (_, rs2) = rs.step(&s1, &plain_features()); + + let s2 = sentence_one_triple(5, 80, 90); // subject is a pronoun + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + subject_is_pronoun: true, + ..Default::default() + }], + }; + let (frames, _) = rs2.step(&s2, &feat); + // Resolved antecedent is 70 (most recent prior head). + assert_eq!(frames[0].subject_candidate_id, 70); + // Entity lane must be bucketed on 70, not on the pronoun rank 5. 
+ assert_eq!(frames[0].cam64.entity_state(), (70u16 >> 5) as u8); + assert_ne!(frames[0].cam64.entity_state(), (5u16 >> 5) as u8); + } + + #[test] + fn expectations_do_not_accumulate_across_sentences() { + // Many consecutive Relative-trigger sentences must NOT fill the + // expectation buffer and silently drop newer antecedents: each step + // clears stale slots first, so the most recent antecedent always wins. + let mut rs = ReadingState::new(0); + // Prime an initial subject. + let (_, next) = rs.step(&sentence_one_triple(1000, 1, 2), &plain_features()); + rs = next; + + // Run 8 relative-trigger sentences (> MAX_EXPECTED = 4). + for i in 0..8u16 { + let subj = 2000 + i * 10; + let s = sentence_one_triple(subj, 1, 2); + let feat = SentenceFeatures { + per_triple: vec![TripleFeatures { + left_corner_trigger: LeftCornerTrigger::Relative, + ..Default::default() + }], + }; + let (_, next) = rs.step(&s, &feat); + rs = next; + // After each step the buffer holds exactly this step's single + // expectation — never accumulating toward the MAX_EXPECTED drop point. + assert_eq!(rs.window.iter_expected().len(), 1); + } + } +} diff --git a/crates/deepnsm/src/sentence_transformer64.rs b/crates/deepnsm/src/sentence_transformer64.rs new file mode 100644 index 000000000..8648f043d --- /dev/null +++ b/crates/deepnsm/src/sentence_transformer64.rs @@ -0,0 +1,929 @@ +//! SentenceTransformer64 — deterministic state-transition transformer. +//! +//! **"Transformer" here means state-transition transformer, not neural self-attention.** +//! +//! ## What P64 is +//! +//! `P64` is the **native address space** of the English reading state machine. +//! It is NOT a compressed approximation of a float embedding. It is a direct +//! symbolic-palette projection: +//! +//! ```text +//! COCA frequency / NSM primes / morphology / grammar / discourse +//! → direct codebook address +//! → P64 meaning field (8 lanes × 8 bits) +//! → CAM4096 locality key (12-bit deterministic projection) +//! 
→ HHTL / GridLake neighborhood lookup +//! ``` +//! +//! Floats may approximate P64 for external ML interop. P64 does not approximate +//! floats. The direction of approximation is one-way, from the outside in. +//! +//! ## Vertical meaning field +//! +//! Each word / sentence token projects **vertically** into the 64-bit field — +//! it activates across multiple lanes simultaneously: +//! +//! ```text +//! "because" +//! lane 6 (causal): opens causal frame +//! lane 4 (clause): subordinate clause trigger +//! lane 5 (discourse): explanation continuation +//! lane 7 (basin): possible coherence delta +//! ``` +//! +//! One word = one column of meaning, not a scalar token. The 8 lanes are +//! **orthogonal semantic planes**, not storage rows. +//! +//! ## Local 4×4 perturbation tile +//! +//! For local reading ambiguity, instead of a continuous Gaussian in f32 space, +//! DeepNSM uses a **discrete 4×4 perturbation tile**: 16 local alternatives per +//! step. Axes: +//! +//! ```text +//! row axis = semantic lane perturbation (entity/predicate/object shift) +//! column axis = syntactic/pragmatic perturbation (clause/discourse shift) +//! ``` +//! +//! Over `n` tokens/sentences this is an implicit (4×4)^n trajectory space. +//! We do NOT materialise it. We keep a small active frontier (Pika-style): +//! +//! - exact P64 state +//! - CAM4096 arc to next state +//! - HHTL/GridLake neighbourhood (Hamming ±1/±2, lane masks) +//! - popcount early exit +//! - AriGraph basin continuity +//! +//! ## CAM4096 codebook classes (examples) +//! +//! ```text +//! pronoun_subject_masc_recent relative_clause_subject_continuation +//! causal_clause_opener temporal_anchor_before +//! business_document_object approval_action_frame +//! negated_action_frame reported_speech_frame +//! basin_reinforcement basin_contradiction epiphany_candidate +//! ``` +//! +//! These are native-English reading-state classes, not raw word ids. +//! +//! ## Flow +//! +//! ```text +//! sentence +//! 
→ SentenceTransformer64::project() +//! → Sentence64 { p64, cam4096, spo_hint } +//! → EpisodicSpoFrame (truth witness, in episodic_spo) +//! → holograph BitpackedVector (resonance, 16Kbit) +//! → AriGraph basin update +//! ``` + +use crate::cam64::Cam64; +use crate::episodic_spo::{DependencyRole, EpisodicSpoFrame}; +use crate::spo::NO_ROLE; + +// ── P64 ────────────────────────────────────────────────────────────────────── + +/// 8×8-bit vertical meaning field — the native P64 address space. +/// +/// Eight orthogonal semantic planes, each 8 bits wide: +/// +/// | Lane | Semantic plane | Source | +/// |------|----------------|--------| +/// | 0 | entity/subject bucket | COCA rank >> 5 (128 buckets) | +/// | 1 | predicate/action bucket | COCA rank >> 5 | +/// | 2 | object/complement bucket | COCA rank >> 5 (0 if absent) | +/// | 3 | morphology (tense/number/person/voice/negation) | MorphFlags low byte | +/// | 4 | clause structure (relative/subordinate/infinitival) | MorphFlags high byte | +/// | 5 | discourse/coreference (depth + coref flag) | entity stack | +/// | 6 | causal/temporal/conditional markers | temporal/causal signal | +/// | 7 | basin/novelty/wisdom/epiphany markers | quality annotations | +/// +/// ## This is NOT quantised float space +/// +/// `P64` is computed directly from vocabulary ranks, grammar tags, and NSM +/// prime masks — no neural network, no float arithmetic, no rounding error. +/// It is the same information as [`Cam64`] but emphasised as the *meaning-field* +/// output: the full-resolution 64-bit semantic/grammar palette. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash)] +#[repr(transparent)] +pub struct P64(pub u64); + +impl P64 { + /// Construct from an explicit 8-byte lane array. + #[inline] + pub fn from_lanes(lanes: [u8; 8]) -> Self { + let mut v = 0u64; + for (i, &b) in lanes.iter().enumerate() { + v |= (b as u64) << (i * 8); + } + Self(v) + } + + /// Extract one lane (0-7). 
+ #[inline] + pub fn lane(self, i: usize) -> u8 { + debug_assert!(i < 8); + (self.0 >> (i * 8)) as u8 + } + + /// Return a new `P64` with one lane replaced. + #[inline] + pub fn with_lane(self, i: usize, val: u8) -> Self { + debug_assert!(i < 8); + let mask = !(0xFFu64 << (i * 8)); + Self((self.0 & mask) | ((val as u64) << (i * 8))) + } + + /// Named lanes — entity/subject bucket (lane 0). + #[inline] + pub fn entity(self) -> u8 { + self.lane(0) + } + /// Named lanes — predicate/action bucket (lane 1). + #[inline] + pub fn predicate(self) -> u8 { + self.lane(1) + } + /// Named lanes — object/complement bucket (lane 2). + #[inline] + pub fn object(self) -> u8 { + self.lane(2) + } + /// Named lanes — morphology low byte (lane 3). + #[inline] + pub fn morph(self) -> u8 { + self.lane(3) + } + /// Named lanes — clause structure / MorphFlags high byte (lane 4). + #[inline] + pub fn clause(self) -> u8 { + self.lane(4) + } + /// Named lanes — discourse / coreference (lane 5). + #[inline] + pub fn discourse(self) -> u8 { + self.lane(5) + } + /// Named lanes — causal/temporal/conditional (lane 6). + #[inline] + pub fn causal(self) -> u8 { + self.lane(6) + } + /// Named lanes — basin/novelty/epiphany (lane 7). + #[inline] + pub fn basin(self) -> u8 { + self.lane(7) + } + + /// XOR bind with another P64 (VSA binding — recovers either component when + /// the other is known). + #[inline] + pub fn bind(self, other: P64) -> P64 { + P64(self.0 ^ other.0) + } + + /// Popcount — active bits in the meaning field. + #[inline] + pub fn popcount(self) -> u32 { + self.0.count_ones() + } + + /// Lane-level agreement with another field (64 = identical). + /// + /// Computed as `64 - (self XOR other).count_ones()`. No floats. + #[inline] + pub fn agreement(self, other: P64) -> u32 { + 64 - (self.0 ^ other.0).count_ones() + } + + /// True if the two fields are in the same reading basin. + /// + /// Threshold: ≥ 40 of 64 bits agree. 
Tuned to survive normal sentence + /// progression (morph/discourse lanes shift each sentence; entity/predicate + /// lanes are stable within a topic). + #[inline] + pub fn same_basin(self, other: P64) -> bool { + self.agreement(other) >= 40 + } + + /// Derive from a [`Cam64`] locality key and the NSM prime mask. + /// + /// The NSM prime mask contributes the *semantic prime* signal that Cam64 + /// doesn't carry. Low 16 bits of the mask are folded into lanes 3-4 via + /// XOR so the meaning field is sensitive to prime coverage without losing + /// grammar-lane signals. + /// + /// This is the canonical construction path: grammar → Cam64 → P64. + pub fn from_cam64_and_nsm(cam: Cam64, nsm_prime_mask: u64) -> Self { + let nsm_low = nsm_prime_mask & 0xFF; + let nsm_high = (nsm_prime_mask >> 8) & 0xFF; + let nsm_xor = nsm_low | (nsm_high << 8); // into bits 24-39 + Self(cam.raw() ^ (nsm_xor << 24)) + } + + /// Raw u64. + #[inline] + pub fn raw(self) -> u64 { + self.0 + } +} + +impl From<Cam64> for P64 { + /// Lift a Cam64 locality key into a P64 meaning field (no NSM contribution). + /// + /// Prefer `P64::from_cam64_and_nsm()` when an NSM prime mask is available. + fn from(cam: Cam64) -> Self { + Self(cam.raw()) + } +} + +// ── Cam4096 ────────────────────────────────────────────────────────────────── + +/// 12-bit deterministic CAM codebook address derived from a [`P64`] meaning field. +/// +/// 4096 cells = the full-resolution native-English reading-state palette. +/// +/// **Derivation is a fold, not quantisation.** Three nibbles are selected from +/// P64 lanes and packed: +/// +/// ```text +/// bits 0.. 3 = entity lane top nibble (vocabulary bucket cluster) +/// bits 4.. 
7 = predicate lane top nibble +/// bits 8..11 = basin lane top nibble +/// bits 12..15 = always zero +/// ``` +/// +/// The three nibbles cover the subject, predicate, and episodic-basin +/// dimensions — sufficient to select the reading-state class without +/// redundant information from the discourse/morphology lanes. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash)] +#[repr(transparent)] +pub struct Cam4096(pub u16); + +impl Cam4096 { + /// Derive deterministically from a `P64` meaning field. + /// + /// Uses top nibbles of entity (lane 0), predicate (lane 1), and basin + /// (lane 7). The fold is lossless in the sense that no float operation is + /// involved — it is a bit-selection + pack. + #[inline] + pub fn from_p64(p: P64) -> Self { + let e = (p.entity() >> 4) as u16; // top nibble of entity lane + let r = (p.predicate() >> 4) as u16; // top nibble of predicate lane + let b = (p.basin() >> 4) as u16; // top nibble of basin lane + Self(e | (r << 4) | (b << 8)) + } + + /// Raw 12-bit codebook address (bits 12-15 always zero). + #[inline] + pub fn raw(self) -> u16 { + self.0 & 0x0FFF + } + + /// Nibble at position 0 (entity cluster). + #[inline] + pub fn entity_nibble(self) -> u8 { + (self.0 & 0xF) as u8 + } + + /// Nibble at position 1 (predicate cluster). + #[inline] + pub fn predicate_nibble(self) -> u8 { + ((self.0 >> 4) & 0xF) as u8 + } + + /// Nibble at position 2 (basin class). + #[inline] + pub fn basin_nibble(self) -> u8 { + ((self.0 >> 8) & 0xF) as u8 + } + + /// Nibble distance (0-3): count of differing nibble positions. + #[inline] + pub fn nibble_distance(self, other: Cam4096) -> u8 { + let x = self.0 ^ other.0; + ((x & 0x00F != 0) as u8) + ((x & 0x0F0 != 0) as u8) + ((x & 0xF00 != 0) as u8) + } + + /// True if same entity, predicate, and basin cluster (nibble_distance == 0). + #[inline] + pub fn exact_match(self, other: Cam4096) -> bool { + self.0 == other.0 + } + + /// True if at most one cluster differs (basin continuity heuristic). 
+ #[inline] + pub fn near_match(self, other: Cam4096) -> bool { + self.nibble_distance(other) <= 1 + } +} + +// ── Perturbation4x4 ─────────────────────────────────────────────────────────── + +/// Local 4×4 perturbation tile — 16 discrete reading alternatives. +/// +/// Rows = semantic lane perturbation (entity / predicate / object shift). +/// Cols = syntactic/pragmatic perturbation (clause / discourse shift). +/// +/// Each cell encodes a **lane-delta** as a pair of signed nibbles packed into +/// one byte: `(row_delta: i4, col_delta: i4)`. Over `n` steps the implicit +/// trajectory space is (4×4)^n but we never materialise it — HHTL/GridLake +/// prunes to the small living frontier. +/// +/// ## Encoding +/// +/// Each of the 16 cells is a `u8` with two nibbles: +/// - bits 0-3: semantic axis delta (0..7 = +0..+7, 8..15 = -8..-1 in two's complement nibble) +/// - bits 4-7: syntactic axis delta (same encoding) +/// +/// Delta 0 = centre / no perturbation. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub struct Perturbation4x4 { + /// Row-major 4×4 grid of (semantic_delta, syntactic_delta) pairs. + pub cells: [u8; 16], +} + +impl Perturbation4x4 { + /// The identity tile — all 16 cells are zero perturbation. + pub const IDENTITY: Self = Self { cells: [0u8; 16] }; + + /// Encode a cell from signed (semantic, syntactic) deltas (-8..+7). + #[inline] + pub fn encode_cell(semantic: i8, syntactic: i8) -> u8 { + let s = (semantic as u8) & 0xF; // two's complement nibble + let y = (syntactic as u8) & 0xF; + s | (y << 4) + } + + /// Decode a cell's semantic delta. + #[inline] + pub fn semantic_delta(cell: u8) -> i8 { + let nibble = cell & 0xF; + if nibble < 8 { + nibble as i8 + } else { + nibble as i8 - 16 + } + } + + /// Decode a cell's syntactic delta. 
+ #[inline] + pub fn syntactic_delta(cell: u8) -> i8 { + let nibble = (cell >> 4) & 0xF; + if nibble < 8 { + nibble as i8 + } else { + nibble as i8 - 16 + } + } + + /// Apply cell `cell_idx` (0-15) to a `P64` field, perturbing lanes 0 and 4. + /// + /// - Lane 0 (entity): shifted by `semantic_delta` + /// - Lane 4 (clause): shifted by `syntactic_delta` + /// + /// Wraps within the 8-bit lane (no overflow into adjacent lanes). + pub fn apply(&self, p: P64, cell_idx: usize) -> P64 { + debug_assert!(cell_idx < 16); + let cell = self.cells[cell_idx]; + let sem = Self::semantic_delta(cell); + let syn = Self::syntactic_delta(cell); + let new_entity = p.entity().wrapping_add(sem as u8); + let new_clause = p.clause().wrapping_add(syn as u8); + p.with_lane(0, new_entity).with_lane(4, new_clause) + } +} + +// ── Discrete palette splat ──────────────────────────────────────────────────── + +/// One neighbour in a discrete palette splat. +#[derive(Clone, Copy, Debug)] +pub struct SplatNeighbour { + /// The perturbed P64 meaning field. + pub p64: P64, + /// Derived CAM4096 address of this neighbour. + pub cam: Cam4096, + /// Hamming distance from the centre (0 = exact, 1..64 = off-centre). + pub hamming: u8, +} + +/// Discrete palette splat: expand a P64 centre into a small neighbourhood. +/// +/// This is NOT a Gaussian in f32 space. It is a **discrete palette splat**: +/// the centre code activates nearby palette cells selected by: +/// - small Hamming distance (≤ `radius_bits` bits across all lanes) +/// - valid morphology transition (design intent; not checked by this function) +/// - near CAM4096 match (nibble_distance ≤ 1 for neighbours) +/// +/// The result is a small `SmallNeighbourhood` (≤ 16 entries) that represents +/// the local reading ambiguity without materialising the (4×4)^n space. +/// +/// `tile` provides the pre-defined perturbation alternatives. Pass +/// `Perturbation4x4::IDENTITY` for the trivial one-cell splat. 
+pub fn splat_p64(centre: P64, tile: &Perturbation4x4, radius_bits: u8) -> SmallNeighbourhood { + let centre_cam = Cam4096::from_p64(centre); + let mut out = SmallNeighbourhood::new(); + + // Centre is always included (hamming = 0). + out.push(SplatNeighbour { + p64: centre, + cam: centre_cam, + hamming: 0, + }); + + // Apply each tile cell, keep those that actually perturb the centre. + for (i, _cell) in tile.cells.iter().enumerate() { + if i == 0 { + continue; + } // centre already emitted + let perturbed = tile.apply(centre, i); + if perturbed == centre { + continue; + } // no-op cell (e.g. all-zero identity) + let h = hamming_p64(centre, perturbed); + if h <= radius_bits { + let cam = Cam4096::from_p64(perturbed); + if cam.near_match(centre_cam) { + out.push(SplatNeighbour { + p64: perturbed, + cam, + hamming: h, + }); + } + } + } + out +} + +/// Hamming distance between two P64 fields (0-64). +#[inline] +pub fn hamming_p64(a: P64, b: P64) -> u8 { + (a.0 ^ b.0).count_ones() as u8 +} + +/// A small fixed-capacity neighbourhood for discrete splat results (≤ 16 entries). +pub struct SmallNeighbourhood { + buf: [SplatNeighbour; 16], + len: usize, +} + +impl SmallNeighbourhood { + fn new() -> Self { + Self { + buf: [SplatNeighbour { + p64: P64(0), + cam: Cam4096(0), + hamming: 0, + }; 16], + len: 0, + } + } + + fn push(&mut self, n: SplatNeighbour) { + if self.len < 16 { + self.buf[self.len] = n; + self.len += 1; + } + } + + /// Iterate the active neighbours. + pub fn iter(&self) -> &[SplatNeighbour] { + &self.buf[..self.len] + } + + /// Number of neighbours (including centre). + pub fn len(&self) -> usize { + self.len + } + + /// True if the neighbourhood holds no cells. After `splat_p64` the centre is + /// always present (so this is `false` there); provided to satisfy the + /// `len_without_is_empty` contract for general callers. + pub fn is_empty(&self) -> bool { + self.len == 0 + } + + /// True if only the centre is present. 
+ pub fn is_singleton(&self) -> bool { + self.len == 1 + } +} + +// ── EpisodicSpoHint ─────────────────────────────────────────────────────────── + +/// Compact SPO candidate hint carried alongside a `Sentence64`. +/// +/// This is a reference into the auditable `EpisodicSpoFrame` — three vocabulary +/// ranks plus the dependency role of the primary frame. Callers that only need +/// the codebook address can ignore this; callers that need to commit to AriGraph +/// use it to reconstruct the full frame. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)] +pub struct EpisodicSpoHint { + pub subject: u16, + pub predicate: u16, + pub object: u16, + pub role: DependencyRole, +} + +impl EpisodicSpoHint { + /// Extract the primary (first) frame's hint from a slice of episodic frames. + pub fn from_primary_frame(frames: &[EpisodicSpoFrame]) -> Self { + match frames.first() { + None => Self { + subject: NO_ROLE, + predicate: NO_ROLE, + object: NO_ROLE, + role: DependencyRole::Unknown, + }, + Some(f) => Self { + subject: f.subject_candidate_id, + predicate: f.predicate_candidate_id, + object: f.object_candidate_id, + role: f.dependency_role, + }, + } + } +} + +// ── Sentence64 ──────────────────────────────────────────────────────────────── + +/// Complete output of `SentenceTransformer64` for one sentence. +/// +/// Three layers of the discrete substrate: +/// - `p64`: full 64-bit vertical meaning field (grammar + NSM + discourse) +/// - `cam`: 12-bit deterministic codebook address (P4096 palette key) +/// - `spo_hint`: compact SPO reference for AriGraph basin commitment +/// +/// No floats. Quality annotations (confidence, novelty, etc.) live in the +/// companion `EpisodicSpoFrame`. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)] +pub struct Sentence64 { + pub p64: P64, + pub cam: Cam4096, + pub spo_hint: EpisodicSpoHint, +} + +impl Sentence64 { + /// Construct from parts (e.g. when `EpisodicSpoFrame` frames are already computed). 
+ pub fn new(p64: P64, spo_hint: EpisodicSpoHint) -> Self { + Self { + p64, + cam: Cam4096::from_p64(p64), + spo_hint, + } + } + + /// True if this sentence is in the same reading basin as `other`. + /// + /// Both P64 agreement (≥ 40 bits) and CAM proximity (nibble_distance ≤ 1) + /// must hold. + #[inline] + pub fn same_basin_as(&self, other: &Sentence64) -> bool { + self.p64.same_basin(other.p64) && self.cam.near_match(other.cam) + } +} + +// ── SentenceTransformer64 ───────────────────────────────────────────────────── + +/// Deterministic reading-state transformer: maps grammar/NSM/discourse to P64. +/// +/// ## Not neural self-attention +/// +/// This is a state-transition transformer in the automata sense: +/// +/// ```text +/// ReadingState_t + SentenceFeatures_t +/// → P64 meaning field +/// → Cam4096 codebook address +/// → EpisodicSpoHint +/// → Sentence64 +/// ``` +/// +/// ## The codebook +/// +/// 4096 cells representing native-English reading-state classes. +/// Addressed directly from P64 lane nibbles — no float lookup, no nearest- +/// neighbour in embedding space. +pub struct SentenceTransformer64; + +impl SentenceTransformer64 { + /// Project a resolved `Cam64` + NSM mask + SPO triple into a `Sentence64`. + /// + /// This is the primary construction path: grammar already resolved by + /// `ReadingState::step()`, now lifted into the P64 meaning field. + /// + /// `spo` is the primary resolved triple (subject after coreference). + /// `nsm_prime_mask` is the 64-bit NSM prime bitset for this sentence. + pub fn project( + cam: Cam64, + nsm_prime_mask: u64, + subject: u16, + predicate: u16, + object: u16, + role: DependencyRole, + ) -> Sentence64 { + let p64 = P64::from_cam64_and_nsm(cam, nsm_prime_mask); + Sentence64::new( + p64, + EpisodicSpoHint { + subject, + predicate, + object, + role, + }, + ) + } + + /// Project directly from an `EpisodicSpoFrame`. 
+ /// + /// Convenience wrapper: extracts `cam64`, `nsm_prime_mask`, and SPO + /// candidates from the already-emitted frame. + pub fn project_from_frame(frame: &EpisodicSpoFrame) -> Sentence64 { + Self::project( + frame.cam64, + frame.nsm_prime_mask, + frame.subject_candidate_id, + frame.predicate_candidate_id, + frame.object_candidate_id, + frame.dependency_role, + ) + } + + /// Project a batch of frames into `Sentence64` values. + /// + /// Returns one `Sentence64` per frame, in order. Each output carries the + /// hint of its own frame: this is `project_from_frame` mapped over the + /// slice, so every triple's frame yields its own projection. + pub fn project_frames(frames: &[EpisodicSpoFrame]) -> Vec<Sentence64> { + frames.iter().map(Self::project_from_frame).collect() + } + + /// Compute a local 4×4 perturbation tile centred at `p64`. + /// + /// The tile represents the 16 most natural reading alternatives from this + /// P64 state: small lane perturbations covering adjacent vocabulary buckets + /// and clause transitions. + /// + /// `entity_step` and `clause_step` control the stride of the semantic and + /// syntactic axes respectively (typically 1-4 bucket positions). + pub fn local_tile(p64: P64, entity_step: u8, clause_step: u8) -> Perturbation4x4 { + let mut cells = [0u8; 16]; + // Row = semantic axis (entity perturbation 0..3 × entity_step) + // Col = syntactic axis (clause perturbation 0..3 × clause_step) + for row in 0i8..4 { + for col in 0i8..4 { + let sem = row * entity_step as i8; + let syn = col * clause_step as i8; + cells[(row * 4 + col) as usize] = Perturbation4x4::encode_cell(sem, syn); + } + } + // Note: products above 7 wrap in `encode_cell` (e.g. 3 × 4 = 12 decodes as -4). 
+ let _ = p64; // p64 is unused here; a future version may use lane context + Perturbation4x4 { cells } + } +} + +// ── Tests ───────────────────────────────────────────────────────────────────── + +#[cfg(test)] +mod tests { + use super::*; + use crate::cam64::Cam64; + use crate::episodic_spo::DependencyRole; + + // ── P64 ────────────────────────────────────────────────────────────────── + + #[test] + fn p64_lane_roundtrip() { + let lanes = [10u8, 20, 30, 40, 50, 60, 70, 80]; + let p = P64::from_lanes(lanes); + for (i, &v) in lanes.iter().enumerate() { + assert_eq!(p.lane(i), v, "lane {i}"); + } + } + + #[test] + fn p64_with_lane_does_not_corrupt_others() { + let p = P64::from_lanes([0xFF; 8]); + let p2 = p.with_lane(3, 0x00); + assert_eq!(p2.lane(3), 0x00); + for i in [0, 1, 2, 4, 5, 6, 7] { + assert_eq!(p2.lane(i), 0xFF, "lane {i} corrupted"); + } + } + + #[test] + fn p64_bind_is_self_inverse() { + let a = P64::from_lanes([0xAB; 8]); + let b = P64::from_lanes([0xCD; 8]); + assert_eq!(a.bind(b).bind(b), a); + } + + #[test] + fn p64_agreement_self_is_64() { + let p = P64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + assert_eq!(p.agreement(p), 64); + } + + #[test] + fn p64_same_basin_identical() { + let p = P64::from_lanes([10; 8]); + assert!(p.same_basin(p)); + } + + #[test] + fn p64_same_basin_fails_when_far() { + let a = P64(0x0000_0000_0000_0000); + let b = P64(0xFFFF_FFFF_FFFF_FFFF); + assert!(!a.same_basin(b)); + } + + #[test] + fn p64_from_cam64_and_nsm_zero_nsm_equals_cam() { + let cam = Cam64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + let p = P64::from_cam64_and_nsm(cam, 0); + assert_eq!(p.raw(), cam.raw()); // zero NSM → identity + } + + #[test] + fn p64_from_cam64_and_nsm_differs_with_nsm() { + let cam = Cam64::default(); + let p0 = P64::from_cam64_and_nsm(cam, 0); + let p1 = P64::from_cam64_and_nsm(cam, 0xFFFF); + assert_ne!(p0, p1); + } + + #[test] + fn p64_from_cam64_conversion() { + let cam = Cam64::from_lanes([7, 8, 9, 10, 11, 12, 13, 14]); + let p: P64 = 
cam.into(); + assert_eq!(p.raw(), cam.raw()); + } + + // ── Cam4096 ────────────────────────────────────────────────────────────── + + #[test] + fn cam4096_fits_in_12_bits() { + let p = P64::from_lanes([0xFF; 8]); + let c = Cam4096::from_p64(p); + assert_eq!(c.raw(), c.0 & 0x0FFF); + } + + #[test] + fn cam4096_entity_nibble_is_top_nibble_of_entity_lane() { + let p = P64::from_lanes([0xAB, 0, 0, 0, 0, 0, 0, 0]); + let c = Cam4096::from_p64(p); + assert_eq!(c.entity_nibble(), 0xA); // top nibble of 0xAB + } + + #[test] + fn cam4096_predicate_nibble() { + let p = P64::from_lanes([0, 0xCD, 0, 0, 0, 0, 0, 0]); + let c = Cam4096::from_p64(p); + assert_eq!(c.predicate_nibble(), 0xC); + } + + #[test] + fn cam4096_basin_nibble() { + let p = P64::from_lanes([0, 0, 0, 0, 0, 0, 0, 0xEF]); + let c = Cam4096::from_p64(p); + assert_eq!(c.basin_nibble(), 0xE); + } + + #[test] + fn cam4096_exact_match_same_p64() { + let p = P64::from_lanes([0x12, 0x34, 0, 0, 0, 0, 0, 0x56]); + let c = Cam4096::from_p64(p); + assert!(c.exact_match(c)); + } + + #[test] + fn cam4096_near_match_one_nibble_differs() { + // Entity nibble differs by 1 → entity cluster shifts; others unchanged. 
+ let p1 = P64::from_lanes([0x10, 0x20, 0, 0, 0, 0, 0, 0x30]); + let p2 = P64::from_lanes([0x20, 0x20, 0, 0, 0, 0, 0, 0x30]); + let c1 = Cam4096::from_p64(p1); + let c2 = Cam4096::from_p64(p2); + assert!(c1.near_match(c2)); + } + + #[test] + fn cam4096_near_match_false_three_nibbles_differ() { + let p1 = P64::from_lanes([0x10, 0x20, 0, 0, 0, 0, 0, 0x30]); + let p2 = P64::from_lanes([0x80, 0x90, 0, 0, 0, 0, 0, 0xA0]); + let c1 = Cam4096::from_p64(p1); + let c2 = Cam4096::from_p64(p2); + assert!(!c1.near_match(c2)); + } + + // ── Perturbation4x4 ────────────────────────────────────────────────────── + + #[test] + fn perturbation_encode_decode_zero() { + let cell = Perturbation4x4::encode_cell(0, 0); + assert_eq!(Perturbation4x4::semantic_delta(cell), 0); + assert_eq!(Perturbation4x4::syntactic_delta(cell), 0); + } + + #[test] + fn perturbation_encode_decode_positive() { + let cell = Perturbation4x4::encode_cell(3, 5); + assert_eq!(Perturbation4x4::semantic_delta(cell), 3); + assert_eq!(Perturbation4x4::syntactic_delta(cell), 5); + } + + #[test] + fn perturbation_encode_decode_negative() { + let cell = Perturbation4x4::encode_cell(-2, -4); + assert_eq!(Perturbation4x4::semantic_delta(cell), -2); + assert_eq!(Perturbation4x4::syntactic_delta(cell), -4); + } + + #[test] + fn perturbation_identity_does_not_change_p64() { + let p = P64::from_lanes([0x10, 0x20, 0, 0, 0, 0, 0, 0x30]); + let result = Perturbation4x4::IDENTITY.apply(p, 0); + assert_eq!(result, p); + } + + // ── Splat ───────────────────────────────────────────────────────────────── + + #[test] + fn splat_identity_tile_returns_singleton() { + let p = P64::from_lanes([0x10, 0x20, 0, 0, 0, 0, 0, 0x30]); + let nb = splat_p64(p, &Perturbation4x4::IDENTITY, 8); + assert_eq!(nb.len(), 1); + assert_eq!(nb.iter()[0].hamming, 0); + } + + #[test] + fn splat_small_tile_stays_within_radius() { + let p = P64::from_lanes([0x10, 0x20, 0, 0, 0, 0, 0, 0x30]); + let tile = SentenceTransformer64::local_tile(p, 1, 1); + let nb = 
splat_p64(p, &tile, 8); + for n in nb.iter() { + assert!(n.hamming <= 8, "hamming {} exceeds radius 8", n.hamming); + } + } + + #[test] + fn splat_centre_is_first_entry() { + let p = P64::from_lanes([5, 6, 7, 8, 9, 10, 11, 12]); + let nb = splat_p64(p, &Perturbation4x4::IDENTITY, 4); + assert_eq!(nb.iter()[0].p64, p); + } + + // ── Sentence64 + SentenceTransformer64 ─────────────────────────────────── + + #[test] + fn sentence64_cam_derived_from_p64() { + let cam = Cam64::from_lanes([0xAB, 0xCD, 0, 0, 0, 0, 0, 0xEF]); + let s = SentenceTransformer64::project(cam, 0, 10, 20, 30, DependencyRole::Subject); + // CAM4096 should be deterministic from P64. + assert_eq!(s.cam, Cam4096::from_p64(s.p64)); + } + + #[test] + fn sentence64_same_basin_identical() { + let cam = Cam64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + let s = SentenceTransformer64::project(cam, 0, 1, 2, 3, DependencyRole::Subject); + assert!(s.same_basin_as(&s)); + } + + #[test] + fn sentence64_different_basin_far_cam() { + let cam_a = Cam64::from_lanes([0x00; 8]); + let cam_b = Cam64::from_lanes([0xFF; 8]); + let a = SentenceTransformer64::project(cam_a, 0, 1, 2, 3, DependencyRole::Subject); + let b = SentenceTransformer64::project(cam_b, 0xFFFF, 4, 5, 6, DependencyRole::Object); + assert!(!a.same_basin_as(&b)); + } + + #[test] + fn sentence_transformer64_local_tile_has_16_cells() { + let p = P64::from_lanes([0x10; 8]); + let tile = SentenceTransformer64::local_tile(p, 2, 2); + assert_eq!(tile.cells.len(), 16); + } + + #[test] + fn sentence_transformer64_project_frames_empty() { + let frames: &[EpisodicSpoFrame] = &[]; + let out = SentenceTransformer64::project_frames(frames); + assert!(out.is_empty()); + } + + #[test] + fn hamming_p64_same_is_zero() { + let p = P64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + assert_eq!(hamming_p64(p, p), 0); + } + + #[test] + fn hamming_p64_all_bits_differ_is_64() { + let a = P64(0x0000_0000_0000_0000); + let b = P64(0xFFFF_FFFF_FFFF_FFFF); + assert_eq!(hamming_p64(a, b), 
64); + } +} diff --git a/crates/deepnsm/src/signed_crystal.rs b/crates/deepnsm/src/signed_crystal.rs new file mode 100644 index 000000000..0a6909c46 --- /dev/null +++ b/crates/deepnsm/src/signed_crystal.rs @@ -0,0 +1,594 @@ +//! Signed discrete reading-crystal: grammar / discourse / episodic meaning field. +//! +//! ## Why this is NOT a float path +//! +//! `holograph::sentence_crystal::SemanticCrystal` already provides the correct +//! integer-first architecture (char n-gram hashing → bit rotation → majority +//! bundling → 16Kbit fingerprint). This module provides the **grammar/discourse +//! axis** that DeepNSM emits *before* the holograph fingerprint is computed. +//! +//! The two layers are complementary, not competing: +//! +//! ```text +//! DeepNSM step() → EpisodicSpoFrame +//! → P64MeaningField (8-lane grammar/discourse byte field) +//! → Crystal4096 (signed 3-axis reading coordinate) +//! +//! holograph SemanticCrystal → BitpackedVector (16Kbit fingerprint) +//! +//! AriGraph basin ← XOR-bind(Crystal4096, BitpackedVector) → tombstone witness +//! ``` +//! +//! ## Signed nibble axes +//! +//! A **`SignedOffset4`** encodes a local reading offset in the range −7..=+7 +//! (15 values, raw 0..=14) plus one overflow/basin sentinel (raw 15): +//! +//! ```text +//! 0..=14 → signed offset (value - 7): 0=-7, 7=0, 14=+7 +//! 15 → overflow / basin-change / unknown +//! ``` +//! +//! Three axes packed into 12 bits give **4096 cells** — the same cardinality as +//! the P4096 palette codebook, enabling O(1) lookup against it. +//! +//! ## Three axes +//! +//! | Axis | Name | What it encodes | +//! |------|------|-----------------| +//! | X | sentence offset | distance from current sentence (±5 sentence window) | +//! | Y | clause offset | intra-sentence clause position (±3 sub-clauses) | +//! | Z | basin delta | drift from the prior basin anchor (±7 SPO hops) | +//! +//! All three stay in the −7..+7 band during normal reading; overflow (15) fires +//! 
on basin transitions, topic shifts, or coreference chains exceeding the window. +//! +//! ## P64 meaning field +//! +//! `P64MeaningField` carries the 8-lane grammar/semantic signal from `Cam64` +//! augmented with the NSM prime contribution. It is the same u64 substrate as +//! `Cam64` but emphasised as the *meaning-field* output (distinct semantic role): +//! +//! ```text +//! Cam64 = reading-state locality key (NOT the truth) +//! P64 = meaning-field projection (grammar + NSM composite) +//! ``` +//! +//! They share the same bit width but carry different information and must not be +//! fused without an explicit projection step. + +use crate::cam64::Cam64; + +// ── HorizonPolarity (v2 stub) ───────────────────────────────────────────────── + +/// Epistemic provenance of a reading horizon offset. +/// +/// `SignedOffset4` encodes **where** (signed local distance −7..+7 + overflow). +/// `HorizonPolarity` encodes **why/how known** — the epistemic status of that +/// position. The two are orthogonal: +/// +/// ```text +/// +1 could mean: +/// next sentence physically (ConfirmedBackward after a step) +/// expected referent not yet seen (ExpectedForward from left-corner trigger) +/// right-context memo already known (InferredRight from inverse/Pika pass) +/// basin continuation projected (Basin) +/// ``` +/// +/// In v1, the expectation information lives in `SentenceWindow` via +/// `ExpectedReason` and `push_expected()`. `HorizonPolarity` is the v2 type +/// that generalises this to all three reading directions (backward confirmed, +/// forward expected, inverse/right inferred) so they can be tracked uniformly +/// in `Crystal4096` metadata or a P64 lane without stealing nibble values from +/// `SignedOffset4`. +/// +/// **Do NOT fold polarity into the 4-bit offset.** The clean split is: +/// - `SignedOffset4` = compact ABI, always means signed distance, nothing else. +/// - `HorizonPolarity` = caller-side metadata attached to the coordinate. 
+#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Default)] +#[repr(u8)] +pub enum HorizonPolarity { + /// Ordinary prior context: backward from the current sentence. Confirmed by + /// the sentence ring. + #[default] + ConfirmedBackward = 0, + /// Left-corner forward prediction: antecedent/subject expected but not yet + /// confirmed. Created by `push_expected()` on `SentenceWindow`. + ExpectedForward = 1, + /// Pika-style right-context / inverse pass: memo available from later clause + /// material (right-to-left prepopulation). V2 — not yet wired in v1. + InferredRight = 2, + /// Offset is outside the local ±7 window; use basin/archetype lookup instead. + BasinOverflow = 3, +} + +impl HorizonPolarity { + /// Pack into 2 bits (fits in any spare lane bits or metadata field). + #[inline] + pub fn to_bits(self) -> u8 { + self as u8 + } + + /// Unpack from the low 2 bits of `b`; higher bits are masked off and ignored. + #[inline] + pub fn from_bits(b: u8) -> Self { + match b & 0x3 { + 0 => Self::ConfirmedBackward, + 1 => Self::ExpectedForward, + 2 => Self::InferredRight, + _ => Self::BasinOverflow, + } + } + + /// True if this position is known from evidence (not a prediction). + #[inline] + pub fn is_confirmed(self) -> bool { + matches!(self, Self::ConfirmedBackward) + } + + /// True if this position is a forward prediction that may not materialise. + #[inline] + pub fn is_prediction(self) -> bool { + matches!(self, Self::ExpectedForward | Self::InferredRight) + } +} + +// ── SignedOffset4 ───────────────────────────────────────────────────────────── + +/// A signed 4-bit reading offset: values 0..=14 encode −7..=+7; 15 = overflow. +/// +/// Encoding: `raw = offset + 7` for offset in −7..=+7. +/// `raw = 15` is the overflow/basin-change sentinel. +/// +/// ## Epistemic polarity is NOT encoded here +/// +/// `SignedOffset4` encodes **signed local distance only**. 
It intentionally +/// does not encode epistemic provenance (confirmed backward context vs +/// left-corner forward expectation vs inverse/right-context prepopulation). +/// Those distinctions are carried by `HorizonPolarity` (v2) or by +/// `ExpectedReason` + `SentenceWindow::push_expected()` (v1). Callers that +/// need to distinguish "physically +1 sentence ahead" from "predicted +1 +/// referent not yet seen" must track `HorizonPolarity` separately. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +#[repr(transparent)] +pub struct SignedOffset4(pub u8); + +impl SignedOffset4 { + /// Overflow / basin-change / unknown sentinel. + pub const OVERFLOW: Self = Self(15); + + /// Zero offset (raw value 7). + pub const ZERO: Self = Self(7); + + /// Minimum representable offset (−7, raw 0). + pub const MIN: Self = Self(0); + + /// Maximum representable offset (+7, raw 14). + pub const MAX: Self = Self(14); + + /// Encode a signed offset. Values outside −7..=+7 are not clamped; they produce `OVERFLOW`. + #[inline] + pub fn from_offset(offset: i8) -> Self { + if !(-7..=7).contains(&offset) { + Self::OVERFLOW + } else { + Self((offset + 7) as u8) + } + } + + /// Decode to a signed offset. Returns `None` for the overflow sentinel. + #[inline] + pub fn to_offset(self) -> Option<i8> { + if self.0 == 15 { + None + } else { + Some(self.0 as i8 - 7) + } + } + + /// Raw nibble value (0..=15). + #[inline] + pub fn raw(self) -> u8 { + self.0 + } + + /// True if this is the overflow/basin sentinel. + #[inline] + pub fn is_overflow(self) -> bool { + self.0 == 15 + } +} + +impl Default for SignedOffset4 { + fn default() -> Self { + Self::ZERO + } +} + +// ── Crystal4096 ─────────────────────────────────────────────────────────────── + +/// A 12-bit signed reading crystal coordinate: three `SignedOffset4` axes. +/// +/// Layout (little-endian nibble packing): +/// ```text +/// bits 0.. 3 = X (sentence offset) +/// bits 4.. 
7 = Y (clause offset) +/// bits 8..11 = Z (basin delta) +/// bits 12..15 = reserved (always 0) +/// ``` +/// +/// 4096 valid cells (bit patterns 0x000..=0xFFF), directly addressable in the +/// P4096 palette codebook. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash)] +#[repr(transparent)] +pub struct Crystal4096(pub u16); + +impl Crystal4096 { + /// Construct from three signed axes. + #[inline] + pub fn new(x: SignedOffset4, y: SignedOffset4, z: SignedOffset4) -> Self { + Self(x.raw() as u16 | ((y.raw() as u16) << 4) | ((z.raw() as u16) << 8)) + } + + /// Extract the X axis (sentence offset). + #[inline] + pub fn x(self) -> SignedOffset4 { + SignedOffset4((self.0 & 0xF) as u8) + } + + /// Extract the Y axis (clause offset). + #[inline] + pub fn y(self) -> SignedOffset4 { + SignedOffset4(((self.0 >> 4) & 0xF) as u8) + } + + /// Extract the Z axis (basin delta). + #[inline] + pub fn z(self) -> SignedOffset4 { + SignedOffset4(((self.0 >> 8) & 0xF) as u8) + } + + /// Raw 12-bit coordinate (bits 0-11 used, bits 12-15 always zero). + #[inline] + pub fn raw(self) -> u16 { + self.0 & 0x0FFF + } + + /// True if any axis is the overflow sentinel. + #[inline] + pub fn has_overflow(self) -> bool { + self.x().is_overflow() || self.y().is_overflow() || self.z().is_overflow() + } + + /// XOR two coordinates — used for binding/unbinding in holograph. + #[inline] + pub fn xor(self, other: Crystal4096) -> Crystal4096 { + Crystal4096(self.0 ^ other.0) + } + + /// Hamming-style distance: count differing nibbles (0, 1, 2, or 3). + #[inline] + pub fn nibble_distance(self, other: Crystal4096) -> u8 { + let xor = self.0 ^ other.0; + ((xor & 0x00F != 0) as u8) + ((xor & 0x0F0 != 0) as u8) + ((xor & 0xF00 != 0) as u8) + } + + /// True if both coordinates are in the same basin: no axis overflows and + /// nibble distance ≤ 1 (at most one axis differs, by any amount). 
+ #[inline] + pub fn same_basin(self, other: Crystal4096) -> bool { + !self.has_overflow() && !other.has_overflow() && self.nibble_distance(other) <= 1 + } +} + +// ── P64MeaningField (alias of the canonical P64) ─────────────────────────────── + +/// 8-lane grammar/semantic/discourse meaning field. +/// +/// **This is an alias of [`crate::sentence_transformer64::P64`]** — the single +/// canonical meaning-field type. The two were introduced separately but are +/// byte-identical (8-lane `u64`, `from_cam64_and_nsm` / `lane` / `bind` / +/// `agreement` / `popcount` / `raw`); consolidating removes the drift risk of +/// maintaining two copies. +/// +/// The name `P64MeaningField` is retained here because the holograph-bridge +/// framing in this module reads more clearly with the longer name: it is the +/// *meaning-field projection* (what the sentence is *about*), distinct from +/// `Cam64` (a reading-state locality key, not truth). Same bits, different +/// interpretive contract. +pub use crate::sentence_transformer64::P64 as P64MeaningField; + +// ── SignedSentenceCrystal ───────────────────────────────────────────────────── + +/// The complete signed reading-crystal output for one sentence. +/// +/// Emitted by `DeepNSM::step()` alongside `EpisodicSpoFrame`. Carries: +/// +/// - `p64`: the grammar/semantic/discourse meaning field (8 × u8 lanes) +/// - `coord`: the signed 3-axis crystal coordinate (P4096 codebook key) +/// +/// The `coord` is the bridge to the holograph fingerprint substrate: +/// `Crystal4096::raw()` is a direct index into the P4096 palette. The +/// holograph `BitpackedVector` for this sentence can be XOR-bound with +/// the cell prototype at that index for basin-aware resonance search. +/// +/// ## Floats at the border +/// +/// This struct is entirely integer. Quality annotations (confidence, novelty, +/// wisdom) remain as `f32` in `EpisodicSpoFrame` — they are boundary tools, +/// not the hot-path substrate. 
+#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)] +pub struct SignedSentenceCrystal { + /// Grammar / semantic / discourse meaning field (P64 lattice). + pub p64: P64MeaningField, + /// Signed 3-axis reading coordinate (P4096 codebook key, 12 bits valid). + pub coord: Crystal4096, +} + +impl SignedSentenceCrystal { + /// Construct from a `Cam64`, NSM prime mask, and the three axis offsets. + pub fn new( + cam: Cam64, + nsm_prime_mask: u64, + sentence_offset: i8, + clause_offset: i8, + basin_delta: i8, + ) -> Self { + Self { + p64: P64MeaningField::from_cam64_and_nsm(cam, nsm_prime_mask), + coord: Crystal4096::new( + SignedOffset4::from_offset(sentence_offset), + SignedOffset4::from_offset(clause_offset), + SignedOffset4::from_offset(basin_delta), + ), + } + } + + /// True if this crystal and `other` are plausibly in the same reading basin. + /// + /// Combines P64 agreement (≥ 40 shared bits) with crystal coordinate + /// proximity (nibble distance ≤ 1). + #[inline] + pub fn same_basin_as(&self, other: &SignedSentenceCrystal) -> bool { + self.p64.agreement(other.p64) >= 40 && self.coord.same_basin(other.coord) + } + + /// XOR bind two crystals — used for holograph integration. + /// + /// Binding combines both the meaning field and the coordinate so the result + /// encodes the *relationship* between two reading positions, not either one + /// alone. Pass the bound result to `holograph::XorBind` or use it as a + /// lookup key in the P4096 codebook. + #[inline] + pub fn bind(&self, other: &SignedSentenceCrystal) -> SignedSentenceCrystal { + SignedSentenceCrystal { + p64: self.p64.bind(other.p64), + coord: self.coord.xor(other.coord), + } + } +} + +// ── Convenience: build from EpisodicSpoFrame fields ────────────────────────── + +/// Build a `SignedSentenceCrystal` from the fields already present on an +/// `EpisodicSpoFrame` plus a sentence-window offset. 
+/// +/// - `sentence_window_offset`: −5..+5 (from `EpisodicSpoFrame::sentence_window_offset`) +/// - `clause_idx`: 0-based clause position within the sentence (0=main, 1=first sub, …) +/// - `basin_hop_delta`: how many SPO hops this sentence is from the prior basin anchor +pub fn crystal_from_frame_context( + cam64: Cam64, + nsm_prime_mask: u64, + sentence_window_offset: i8, + clause_idx: u8, + basin_hop_delta: i8, +) -> SignedSentenceCrystal { + // Clause offset: clause_idx 0 = centre (0), 1 = +1, 2 = +2, capped at +3 (clamped before the cast so clause_idx >= 128 cannot wrap negative). + let clause_offset = clause_idx.min(3) as i8; + SignedSentenceCrystal::new( + cam64, + nsm_prime_mask, + sentence_window_offset, + clause_offset, + basin_hop_delta, + ) +} + +/// Derive the basin hop delta from two consecutive `Cam64` locality codes. +/// +/// Returns a single-transition delta: `0` if `curr.continues_basin(prev)`, +/// else `1`. It does **not** accumulate or detect overflow — the caller sums +/// deltas across a sentence chain and maps the running total onto +/// `SignedOffset4`, which saturates to `OVERFLOW` outside −7..=+7. 
+pub fn basin_delta_from_cam(prev: Cam64, curr: Cam64) -> i8 { + if curr.continues_basin(prev) { + 0 + } else { + 1 + } +} + +// ── Tests ───────────────────────────────────────────────────────────────────── + +#[cfg(test)] +mod tests { + use super::*; + use crate::cam64::Cam64; + + #[test] + fn signed_offset4_encode_decode() { + for v in -7i8..=7 { + let s = SignedOffset4::from_offset(v); + assert!(!s.is_overflow(), "offset {v} should not overflow"); + assert_eq!(s.to_offset(), Some(v)); + } + } + + #[test] + fn signed_offset4_overflow_outside_range() { + assert_eq!(SignedOffset4::from_offset(-8), SignedOffset4::OVERFLOW); + assert_eq!(SignedOffset4::from_offset(8), SignedOffset4::OVERFLOW); + assert!(SignedOffset4::OVERFLOW.is_overflow()); + assert_eq!(SignedOffset4::OVERFLOW.to_offset(), None); + } + + #[test] + fn signed_offset4_zero_is_seven() { + assert_eq!(SignedOffset4::ZERO.raw(), 7); + assert_eq!(SignedOffset4::ZERO.to_offset(), Some(0)); + } + + #[test] + fn crystal4096_pack_unpack() { + let x = SignedOffset4::from_offset(-3); + let y = SignedOffset4::from_offset(0); + let z = SignedOffset4::from_offset(5); + let c = Crystal4096::new(x, y, z); + assert_eq!(c.x(), x); + assert_eq!(c.y(), y); + assert_eq!(c.z(), z); + assert_eq!(c.raw(), c.0 & 0x0FFF); + } + + #[test] + fn crystal4096_raw_fits_in_12_bits() { + // All possible nibble combinations should fit in 4096. 
+ for raw in 0u16..=0x0FFF { + let c = Crystal4096(raw); + assert_eq!(c.raw(), raw); + } + } + + #[test] + fn crystal4096_nibble_distance_same_is_zero() { + let c = Crystal4096::new( + SignedOffset4::ZERO, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + assert_eq!(c.nibble_distance(c), 0); + } + + #[test] + fn crystal4096_nibble_distance_one_axis_differs() { + let a = Crystal4096::new( + SignedOffset4::from_offset(0), + SignedOffset4::from_offset(0), + SignedOffset4::from_offset(0), + ); + let b = Crystal4096::new( + SignedOffset4::from_offset(1), + SignedOffset4::from_offset(0), + SignedOffset4::from_offset(0), + ); + assert_eq!(a.nibble_distance(b), 1); + } + + #[test] + fn crystal4096_same_basin_adjacent_coords() { + let a = Crystal4096::new( + SignedOffset4::ZERO, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + let b = Crystal4096::new( + SignedOffset4::from_offset(1), + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + assert!(a.same_basin(b)); + } + + #[test] + fn crystal4096_different_basin_two_axes_differ() { + let a = Crystal4096::new( + SignedOffset4::ZERO, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + let b = Crystal4096::new( + SignedOffset4::from_offset(3), + SignedOffset4::from_offset(2), + SignedOffset4::ZERO, + ); + assert!(!a.same_basin(b)); + } + + #[test] + fn crystal4096_overflow_axis_is_not_same_basin() { + let a = Crystal4096::new( + SignedOffset4::ZERO, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + let b = Crystal4096::new( + SignedOffset4::OVERFLOW, + SignedOffset4::ZERO, + SignedOffset4::ZERO, + ); + assert!(!a.same_basin(b)); + } + + #[test] + fn crystal4096_xor_bind_unbind() { + let a = Crystal4096(0x123); + let b = Crystal4096(0x456); + let bound = a.xor(b); + assert_eq!(bound.xor(b), a); // XOR is self-inverse + } + + #[test] + fn p64_meaning_field_agreement_self() { + let cam = Cam64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + let p = P64MeaningField::from_cam64_and_nsm(cam, 0xDEAD_BEEF_0000_0000); + 
assert_eq!(p.agreement(p), 64); + } + + #[test] + fn p64_meaning_field_agreement_differs_on_nsm() { + let cam = Cam64::default(); + let p0 = P64MeaningField::from_cam64_and_nsm(cam, 0); + let p1 = P64MeaningField::from_cam64_and_nsm(cam, 0xFFFF); + // NSM folds into lanes 3-4 — agreement drops below 64. + assert!(p0.agreement(p1) < 64); + } + + #[test] + fn signed_sentence_crystal_same_basin_nearby() { + let cam = Cam64::from_lanes([10, 20, 30, 40, 50, 60, 70, 80]); + let a = SignedSentenceCrystal::new(cam, 0, 0, 0, 0); + let b = SignedSentenceCrystal::new(cam, 0, 0, 1, 0); // clause +1 + assert!(a.same_basin_as(&b)); + } + + #[test] + fn signed_sentence_crystal_different_basin_far_coord() { + let cam_a = Cam64::from_lanes([10, 20, 30, 40, 50, 60, 70, 80]); + let cam_b = Cam64::from_lanes([200, 201, 202, 203, 204, 205, 206, 207]); + let a = SignedSentenceCrystal::new(cam_a, 0, 0, 0, 0); + let b = SignedSentenceCrystal::new(cam_b, 0xFFFF, -5, 3, 7); + assert!(!a.same_basin_as(&b)); + } + + #[test] + fn crystal_from_frame_context_zero_offsets() { + let cam = Cam64::default(); + let c = crystal_from_frame_context(cam, 0, 0, 0, 0); + assert_eq!(c.coord.x(), SignedOffset4::ZERO); + assert_eq!(c.coord.y(), SignedOffset4::ZERO); + assert_eq!(c.coord.z(), SignedOffset4::ZERO); + } + + #[test] + fn basin_delta_from_cam_same_basin() { + let c = Cam64::from_lanes([1, 2, 3, 4, 5, 6, 7, 8]); + assert_eq!(basin_delta_from_cam(c, c), 0); + } + + #[test] + fn basin_delta_from_cam_different_basin() { + let a = Cam64::from_raw(0x0000_0000_0000_0000); + let b = Cam64::from_raw(0xFFFF_FFFF_FFFF_FFFF); + assert_eq!(basin_delta_from_cam(a, b), 1); + } +} diff --git a/crates/deepnsm/src/similarity.rs b/crates/deepnsm/src/similarity.rs index 04e28d156..baf95b827 100644 --- a/crates/deepnsm/src/similarity.rs +++ b/crates/deepnsm/src/similarity.rs @@ -86,9 +86,9 @@ impl SimilarityTable { let mut table = [0.0f32; 256]; let sigma = sigma.max(1.0); - for d in 0..256 { + for (d, slot) in 
table.iter_mut().enumerate() { let z = (d as f32 - mu) / sigma; - table[d] = 1.0 / (1.0 + z.exp()); + *slot = 1.0 / (1.0 + z.exp()); } Self { table } @@ -144,15 +144,14 @@ impl SimilarityTable { return None; } let mut table = [0.0f32; 256]; - for i in 0..256 { + for (i, slot) in table.iter_mut().enumerate() { let offset = i * 4; - let val = f32::from_le_bytes([ + *slot = f32::from_le_bytes([ bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3], ]); - table[i] = val; } Some(Self { table }) } diff --git a/crates/deepnsm/src/ticket_emit.rs b/crates/deepnsm/src/ticket_emit.rs index dcf72c3a6..0f7d1bff3 100644 --- a/crates/deepnsm/src/ticket_emit.rs +++ b/crates/deepnsm/src/ticket_emit.rs @@ -24,8 +24,7 @@ #![cfg(feature = "contract-ticket")] use lance_graph_contract::grammar::{ - CausalAmbiguity, FailureTicket, NarsInference, PartialParse, TekamoloSlots, - WechselAmbiguity, + CausalAmbiguity, FailureTicket, NarsInference, PartialParse, TekamoloSlots, WechselAmbiguity, }; /// Threshold above which `classification_distance` flags a novel domain. diff --git a/crates/deepnsm/src/trajectory_audit.rs b/crates/deepnsm/src/trajectory_audit.rs index f6c64a3b9..133caafea 100644 --- a/crates/deepnsm/src/trajectory_audit.rs +++ b/crates/deepnsm/src/trajectory_audit.rs @@ -43,7 +43,10 @@ impl Trajectory { /// Hamming distance between two trajectory hashes — how grammatically /// similar two queries / sentences are. 
pub fn trajectory_distance(a: &TrajectoryHash, b: &TrajectoryHash) -> u32 { - a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum() + a.iter() + .zip(b.iter()) + .map(|(x, y)| (x ^ y).count_ones()) + .sum() } /// Threshold for "grammatically similar" — used by the audit log to flag diff --git a/crates/deepnsm/src/triangle_bridge.rs b/crates/deepnsm/src/triangle_bridge.rs index f17e3db7b..63006a6aa 100644 --- a/crates/deepnsm/src/triangle_bridge.rs +++ b/crates/deepnsm/src/triangle_bridge.rs @@ -14,9 +14,7 @@ use crate::parser::SentenceStructure; use crate::spo::{SpoTriple, NO_ROLE}; #[cfg(feature = "grammar-triangle")] -use lance_graph_cognitive::grammar::{ - CausalityFlow, GrammarTriangle, NSMField, QualiaField, -}; +use lance_graph_cognitive::grammar::{CausalityFlow, GrammarTriangle, NSMField, QualiaField}; /// Merged output: DeepNSM SPO triples + Triangle's three lenses. /// @@ -78,8 +76,16 @@ pub struct SpoWithGrammar { /// so downstream consumers can cast directly. #[inline] pub fn compute_pearl_mask(triple: &SpoTriple) -> u8 { - let s_bit = if triple.subject() != NO_ROLE { 0b100 } else { 0 }; - let p_bit = if triple.predicate() != NO_ROLE { 0b010 } else { 0 }; + let s_bit = if triple.subject() != NO_ROLE { + 0b100 + } else { + 0 + }; + let p_bit = if triple.predicate() != NO_ROLE { + 0b010 + } else { + 0 + }; let o_bit = if triple.object() != NO_ROLE { 0b001 } else { 0 }; s_bit | p_bit | o_bit } @@ -213,9 +219,15 @@ fn expected_qualia_footprint(structure: &SentenceStructure) -> Vec { // Map Pearl mask bits (S=bit2, P=bit1, O=bit0) to qualia dimension // bits (dim0=Agency<-S, dim1=Activity<-P, dim2=Affection<-O). 
let mut packed: u64 = 0; - if pearl & 0b100 != 0 { packed |= 1u64 << 0; } // S -> dim 0 (Agency) - if pearl & 0b010 != 0 { packed |= 1u64 << 1; } // P -> dim 1 (Activity) - if pearl & 0b001 != 0 { packed |= 1u64 << 2; } // O -> dim 2 (Affection) + if pearl & 0b100 != 0 { + packed |= 1u64 << 0; + } // S -> dim 0 (Agency) + if pearl & 0b010 != 0 { + packed |= 1u64 << 1; + } // P -> dim 1 (Activity) + if pearl & 0b001 != 0 { + packed |= 1u64 << 2; + } // O -> dim 2 (Affection) vec![packed] } @@ -351,8 +363,8 @@ mod tests { #[test] fn same_subject_same_mask_different_predicates_distinguishable() { // "dog bites man" vs "dog loves man" -- same mask (0b111), different predicate - let s_a = fixture_structure_with(671, 2943, 95); // bites - let s_b = fixture_structure_with(671, 500, 95); // loves + let s_a = fixture_structure_with(671, 2943, 95); // bites + let s_b = fixture_structure_with(671, 500, 95); // loves let out_a = analyze_without_triangle(s_a); let out_b = analyze_without_triangle(s_b); diff --git a/crates/deepnsm/src/vocabulary.rs b/crates/deepnsm/src/vocabulary.rs index 4718bc7d4..3058fd94d 100644 --- a/crates/deepnsm/src/vocabulary.rs +++ b/crates/deepnsm/src/vocabulary.rs @@ -350,7 +350,9 @@ fn split_words(text: &str) -> Vec { let rest: String = chars[i..].iter().take(4).collect(); let rest_lower = rest.to_lowercase(); - if current.ends_with('n') && (rest_lower.starts_with("'t") || rest_lower.starts_with("\u{2019}t")) { + if current.ends_with('n') + && (rest_lower.starts_with("'t") || rest_lower.starts_with("\u{2019}t")) + { // "don't" → push "do", then "n't" // Pop the 'n' from current before pushing current.pop(); @@ -379,8 +381,10 @@ fn split_words(text: &str) -> Vec { while end < len && chars[end].is_alphabetic() { end += 1; } - let contraction: String = - chars[i..end].iter().map(|c| c.to_lowercase().next().unwrap_or(*c)).collect(); + let contraction: String = chars[i..end] + .iter() + .map(|c| c.to_lowercase().next().unwrap_or(*c)) + .collect(); 
words.push(contraction); i = end; } else { @@ -416,7 +420,10 @@ fn split_words(text: &str) -> Vec { /// Strip common English suffixes for fallback resolution. fn strip_suffix(word: &str) -> &str { // Order matters: try longest suffixes first - for suffix in &["ing", "tion", "sion", "ness", "ment", "ous", "ive", "ful", "less", "ly", "ed", "er", "est", "es", "s"] { + for suffix in &[ + "ing", "tion", "sion", "ness", "ment", "ous", "ive", "ful", "less", "ly", "ed", "er", + "est", "es", "s", + ] { if word.len() > suffix.len() + 2 && word.ends_with(suffix) { return &word[..word.len() - suffix.len()]; } diff --git a/crates/deepnsm/src/window.rs b/crates/deepnsm/src/window.rs new file mode 100644 index 000000000..423e65d48 --- /dev/null +++ b/crates/deepnsm/src/window.rs @@ -0,0 +1,429 @@ +//! ±5 sentence window for coreference / pronoun resolution. +//! +//! This is **distinct** from `ContextWindow` in `context.rs`: +//! - `ContextWindow` holds VSA projections (the Broca / MarkovBundler +//! projection band) for distributional disambiguation. +//! - `SentenceWindow` holds **exact entity candidates** (vocabulary ranks +//! of NP heads) for **Wernicke / coreference resolution** — the auditable +//! side of the reading state machine. +//! +//! The two windows serve different faculties and must not be fused +//! (cf. `E-ENGLISH-BIFURCATES` in `comprehension.rs`). +//! +//! ## Coreference heuristic (v1) +//! +//! When the reader encounters a pronoun in the subject slot it calls +//! `resolve_pronoun()`, which walks the entity stack from the most recent +//! sentence backward and returns the first matching entity. In v1 "matching" +//! means "any non-pronoun NP head from a prior sentence" — a pure recency +//! heuristic. Richer disambiguation (gender, number, semantic type) is v2. +//! +//! ## Future (v1.5) — Tekamolo/Anaphora64 provenance sidecar +//! +//! v1 records *what* resolved (exact NP-head ranks + expected slots) but not +//! *why* it resolved that way. 
A future `Anaphora64` sidecar (its own module, +//! NOT folded into `Cam64` or `P64`) should encode coreference provenance: +//! +//! ```text +//! bits 0..11 antecedent_rank_bucket / local entity id +//! bits 12..15 sentence_offset_signed4 +//! bits 16..19 source_polarity (HorizonPolarity: confirmed/expected/inferred_right/basin) +//! bits 20..23 expected_reason (ExpectedReason: relative/anaphora/ellipsis/causal/temporal) +//! bits 24..31 agreement flags (number/gender/person/semantic-type/role) +//! bits 32..39 grammatical role score +//! bits 40..47 salience score +//! bits 48..55 confidence q8 +//! bits 56..63 reserved / version +//! ``` +//! +//! It belongs to the **next** PR (coreference ranking/provenance), not the +//! reader-substrate PR. Add it only once agreement/ranking is implemented, and +//! store it as a provenance field on `EpisodicSpoFrame` (`anaphora_tag: Anaphora64`). +//! The boundary stays clean: SentenceWindow resolves, EpisodicSpoFrame witnesses, +//! Cam64 indexes, Anaphora64 (later) *explains* the resolution. + +use crate::spo::NO_ROLE; + +/// Maximum number of NP heads tracked per sentence entry. +const MAX_HEADS_PER_ENTRY: usize = 4; + +/// Maximum sentences kept in the window (±5 = 11 total, one per slot). +pub const WINDOW_SIZE: usize = 11; + +/// Maximum expected (forward-predicted) entities tracked at one time. +const MAX_EXPECTED: usize = 4; + +/// Why an entity was pushed into the expected slot. +/// +/// Carrying the reason prevents "mystery meat" in resolve_pronoun: callers can +/// filter by reason if they only want to consume a specific prediction type. +/// V1 only uses `RelativeClause` and `Anaphora` in practice; the others are +/// reserved for v2 (ellipsis, causal/temporal continuation). +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub enum ExpectedReason { + /// A relative pronoun ("who", "which") was the left-corner trigger — + /// the active subject is expected to be the antecedent of the clause. 
+ RelativeClause, + /// A personal pronoun was the left-corner trigger — the active subject + /// is expected to be the referent of the anaphoric pronoun. + Anaphora, + /// An omitted subject (pro-drop) — prior subject is expected to continue. + Ellipsis, + /// A causal connector ("because", "therefore") — causal agent continues. + CausalContinuation, + /// A temporal connector ("then", "after") — temporal anchor continues. + TemporalContinuation, +} + +/// An entity predicted by a left-corner trigger before clause closure. +#[derive(Clone, Copy, Debug)] +pub struct ExpectedSlot { + /// Vocabulary rank of the predicted entity (NO_ROLE if unused). + pub rank: u16, + pub reason: ExpectedReason, +} + +/// One entry in the ±5 sentence window: the entity candidates from a sentence. +#[derive(Clone, Copy, Debug, Default)] +pub struct WindowEntry { + /// Sentence identifier (monotonically increasing). + pub sentence_id: u32, + /// Vocabulary ranks of NP heads mentioned in this sentence + /// (subject, object, nominal complements). NO_ROLE fills unused slots. + pub heads: [u16; MAX_HEADS_PER_ENTRY], + /// How many heads are actually set (0..=MAX_HEADS_PER_ENTRY). + pub head_count: usize, + /// Packed SPO triple from the primary triple in this sentence. + pub primary_spo_packed: u64, +} + +impl WindowEntry { + /// Push an NP head rank into this entry. Silently drops if full. + pub fn push_head(&mut self, rank: u16) { + if self.head_count < MAX_HEADS_PER_ENTRY { + self.heads[self.head_count] = rank; + self.head_count += 1; + } + } + + /// Iterate over the heads actually set. + pub fn heads(&self) -> &[u16] { + &self.heads[..self.head_count] + } + + /// Does this entry contain the given vocabulary rank? + pub fn contains(&self, rank: u16) -> bool { + self.heads().contains(&rank) + } +} + +/// ±5 sentence ring buffer for exact entity candidate tracking. +/// +/// The ring buffer always holds at most `WINDOW_SIZE` (11) entries, dropping +/// the oldest when full. 
The current sentence is conceptually at offset 0; +/// prior sentences are at offsets −1 .. −5. +/// +/// ## Forward expectation slots +/// +/// When a left-corner trigger fires (relative pronoun, anaphora), the caller +/// pushes the predicted referent into `expected` via `push_expected()`. +/// `resolve_pronoun()` checks these slots first: a pending expectation +/// beats any recency heuristic from confirmed sentences. +#[derive(Debug)] +pub struct SentenceWindow { + /// Fixed-size ring buffer. + entries: [WindowEntry; WINDOW_SIZE], + /// Write head (next slot to overwrite). + head: usize, + /// Number of valid entries (0..=WINDOW_SIZE). + count: usize, + /// Forward-predicted entity slots (Pika left-corner expectations). + expected: [ExpectedSlot; MAX_EXPECTED], + /// How many expected slots are active (0..=MAX_EXPECTED). + expected_count: usize, +} + +impl SentenceWindow { + /// Create an empty window. + pub fn new() -> Self { + Self { + entries: [WindowEntry::default(); WINDOW_SIZE], + head: 0, + count: 0, + expected: [ExpectedSlot { + rank: NO_ROLE, + reason: ExpectedReason::Anaphora, + }; MAX_EXPECTED], + expected_count: 0, + } + } + + /// Push a forward-predicted entity into the expectation buffer. + /// + /// Called by the reader state machine when a left-corner trigger fires + /// (relative pronoun, anaphoric pronoun) before the clause closes. + /// `resolve_pronoun()` checks these slots (without consuming them) before + /// consulting the confirmed sentence ring: expectation beats recency. + /// + /// Silently drops if the buffer is full (MAX_EXPECTED = 4). + pub fn push_expected(&mut self, rank: u16, reason: ExpectedReason) { + if self.expected_count < MAX_EXPECTED { + self.expected[self.expected_count] = ExpectedSlot { rank, reason }; + self.expected_count += 1; + } + } + + /// Clear all expectation slots. 
+ /// + /// `ReadingState::step` calls this at the start of each sentence: v1 treats + /// every expectation as a **single-step** left-corner prediction, so the + /// buffer can never accumulate stale slots toward `MAX_EXPECTED`. + /// + /// Future (v2, bidirectional / Pika right-context passes): this policy + /// splits — clear one-step expectations as now, but **retain** multi-step + /// `HorizonPolarity::InferredRight` slots with a TTL so right-context memo + /// entries survive across sentences until their window elapses. + pub fn clear_expected(&mut self) { + self.expected_count = 0; + } + + /// Return the active expectation slots as a slice, in push order (FIFO — + /// oldest first). `resolve_pronoun` iterates this in reverse so the + /// most-recently-pushed expectation wins. + pub fn iter_expected(&self) -> &[ExpectedSlot] { + &self.expected[..self.expected_count] + } + + /// Push a new sentence entry, overwriting the oldest if full. + pub fn push(&mut self, entry: WindowEntry) { + self.entries[self.head] = entry; + self.head = (self.head + 1) % WINDOW_SIZE; + if self.count < WINDOW_SIZE { + self.count += 1; + } + } + + /// Iterate entries from most recent to oldest (offset 0 = most recent). + pub fn iter_recent_first(&self) -> impl Iterator<Item = (i8, &WindowEntry)> { + let count = self.count; + let head = self.head; + (0..count).map(move |i| { + let slot = (head + WINDOW_SIZE - 1 - i) % WINDOW_SIZE; + (-(i as i8), &self.entries[slot]) + }) + } + + /// Resolve a pronoun: return the vocabulary rank of the predicted or most + /// recent non-excluded NP head. + /// + /// Resolution order (Pika left-corner priority): + /// 1. Forward expectation slots (most-recently-pushed first) — these were + /// pre-populated by a left-corner trigger and are the strongest signal. + /// 2. Confirmed sentence ring, most-recent entry first, heads in reverse + /// within each entry (last-mentioned in text = highest index). + /// + /// `exclude_rank` is typically the pronoun's own vocabulary rank.
+ /// Returns `NO_ROLE` if no candidate is found. + pub fn resolve_pronoun(&self, exclude_rank: u16) -> u16 { + // Phase 1: forward expectations (Pika left-corner slots). + for slot in self.iter_expected().iter().rev() { + if slot.rank != NO_ROLE && slot.rank != exclude_rank { + return slot.rank; + } + } + // Phase 2: confirmed sentence ring, last-mentioned wins. + for (_offset, entry) in self.iter_recent_first() { + for &head in entry.heads().iter().rev() { + if head != NO_ROLE && head != exclude_rank { + return head; + } + } + } + NO_ROLE + } + + /// How many entries are in the window. + pub fn len(&self) -> usize { + self.count + } + + /// Is the window empty? + pub fn is_empty(&self) -> bool { + self.count == 0 + } + + /// Most recent entry, or `None` if the window is empty. + pub fn most_recent(&self) -> Option<&WindowEntry> { + if self.count == 0 { + return None; + } + let slot = (self.head + WINDOW_SIZE - 1) % WINDOW_SIZE; + Some(&self.entries[slot]) + } +} + +impl Default for SentenceWindow { + fn default() -> Self { + Self::new() + } +} + +impl Clone for SentenceWindow { + fn clone(&self) -> Self { + Self { + entries: self.entries, + head: self.head, + count: self.count, + expected: self.expected, + expected_count: self.expected_count, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn entry(sentence_id: u32, heads: &[u16]) -> WindowEntry { + let mut e = WindowEntry { + sentence_id, + ..Default::default() + }; + for &h in heads { + e.push_head(h); + } + e + } + + #[test] + fn push_and_most_recent() { + let mut w = SentenceWindow::new(); + w.push(entry(0, &[10, 20])); + w.push(entry(1, &[30])); + let r = w.most_recent().unwrap(); + assert_eq!(r.sentence_id, 1); + assert_eq!(r.heads(), &[30]); + } + + #[test] + fn iter_recent_first_order() { + let mut w = SentenceWindow::new(); + for i in 0..4u32 { + w.push(entry(i, &[(i * 10) as u16])); + } + let ids: Vec<u32> = w.iter_recent_first().map(|(_, e)| e.sentence_id).collect(); + assert_eq!(ids, vec![3, 2,
1, 0]); + } + + #[test] + fn ring_wraps_correctly() { + let mut w = SentenceWindow::new(); + for i in 0..15u32 { + w.push(entry(i, &[(i * 5) as u16])); + } + assert_eq!(w.len(), WINDOW_SIZE); + // Most recent should be sentence 14 + assert_eq!(w.most_recent().unwrap().sentence_id, 14); + // iter_recent_first should give 14 down to 4 + let ids: Vec<u32> = w.iter_recent_first().map(|(_, e)| e.sentence_id).collect(); + let expected: Vec<u32> = (4..=14).rev().collect(); + assert_eq!(ids, expected); + } + + #[test] + fn resolve_pronoun_returns_most_recent() { + let mut w = SentenceWindow::new(); + w.push(entry(0, &[100, 200])); // older + w.push(entry(1, &[300])); // newer + // pronoun rank=5 (not in window), exclude it, expect 300 (most recent) + assert_eq!(w.resolve_pronoun(5), 300); + } + + #[test] + fn resolve_pronoun_skips_excluded() { + let mut w = SentenceWindow::new(); + w.push(entry(0, &[100])); + w.push(entry(1, &[200])); // most recent has 200 + // If pronoun rank is 200, it should skip 200 and return 100 + assert_eq!(w.resolve_pronoun(200), 100); + } + + #[test] + fn resolve_pronoun_empty_returns_no_role() { + let w = SentenceWindow::new(); + assert_eq!(w.resolve_pronoun(5), NO_ROLE); + } + + #[test] + fn head_capacity_does_not_overflow() { + let mut e = WindowEntry::default(); + for i in 0..10u16 { + e.push_head(i); // only first 4 stored + } + assert_eq!(e.head_count, MAX_HEADS_PER_ENTRY); + assert_eq!(e.heads(), &[0, 1, 2, 3]); + } + + #[test] + fn contains_check() { + let e = entry(0, &[10, 20, 30]); + assert!(e.contains(20)); + assert!(!e.contains(99)); + } + + #[test] + fn push_expected_stores_slot() { + let mut w = SentenceWindow::new(); + w.push_expected(42, ExpectedReason::RelativeClause); + assert_eq!(w.expected_count, 1); + assert_eq!(w.iter_expected()[0].rank, 42); + assert_eq!(w.iter_expected()[0].reason, ExpectedReason::RelativeClause); + } + + #[test] + fn push_expected_capacity_does_not_overflow() { + let mut w = SentenceWindow::new(); + for i in
0..10u16 { + w.push_expected(i, ExpectedReason::Anaphora); + } + assert_eq!(w.expected_count, MAX_EXPECTED); + } + + #[test] + fn resolve_pronoun_prefers_expected_over_confirmed() { + let mut w = SentenceWindow::new(); + // Confirmed ring has rank 100. + w.push(entry(0, &[100])); + // Forward expectation has rank 200. + w.push_expected(200, ExpectedReason::RelativeClause); + // Should return 200 (expectation beats confirmed recency). + assert_eq!(w.resolve_pronoun(5), 200); + } + + #[test] + fn resolve_pronoun_falls_back_to_confirmed_when_expected_empty() { + let mut w = SentenceWindow::new(); + w.push(entry(0, &[100])); + // No expectations pushed. + assert_eq!(w.resolve_pronoun(5), 100); + } + + #[test] + fn resolve_pronoun_skips_excluded_in_expected() { + let mut w = SentenceWindow::new(); + w.push(entry(0, &[100])); + w.push_expected(200, ExpectedReason::Anaphora); + // Exclude the expected slot — should fall back to confirmed. + assert_eq!(w.resolve_pronoun(200), 100); + } + + #[test] + fn clear_expected_resets_slots() { + let mut w = SentenceWindow::new(); + w.push_expected(42, ExpectedReason::Ellipsis); + w.clear_expected(); + assert_eq!(w.expected_count, 0); + // resolve_pronoun now falls through to NO_ROLE (empty window). 
+ assert_eq!(w.resolve_pronoun(5), NO_ROLE); + } +} diff --git a/crates/deepnsm/tests/integration_role_alignment.rs b/crates/deepnsm/tests/integration_role_alignment.rs index 99803c400..b7ec9b76e 100644 --- a/crates/deepnsm/tests/integration_role_alignment.rs +++ b/crates/deepnsm/tests/integration_role_alignment.rs @@ -10,9 +10,8 @@ use deepnsm::markov_bundle::GrammaticalRole; use lance_graph_contract::grammar::role_keys::{ - CONTEXT_SLICE, INSTRUMENT_SLICE, KAUSAL_SLICE, LOKAL_SLICE, MODAL_SLICE, - MODIFIER_SLICE, OBJECT_SLICE, PREDICATE_SLICE, RoleKeySlice, SUBJECT_SLICE, - TEMPORAL_SLICE, + RoleKeySlice, CONTEXT_SLICE, INSTRUMENT_SLICE, KAUSAL_SLICE, LOKAL_SLICE, MODAL_SLICE, + MODIFIER_SLICE, OBJECT_SLICE, PREDICATE_SLICE, SUBJECT_SLICE, TEMPORAL_SLICE, }; #[test] @@ -24,8 +23,14 @@ fn subject_role_aligns_with_role_keys_subject_slice() { #[test] fn predicate_role_aligns() { - assert_eq!(GrammaticalRole::Predicate.slice().start, PREDICATE_SLICE.start); - assert_eq!(GrammaticalRole::Predicate.slice().stop, PREDICATE_SLICE.stop); + assert_eq!( + GrammaticalRole::Predicate.slice().start, + PREDICATE_SLICE.start + ); + assert_eq!( + GrammaticalRole::Predicate.slice().stop, + PREDICATE_SLICE.stop + ); } #[test] @@ -36,7 +41,10 @@ fn object_role_aligns() { #[test] fn modifier_role_aligns() { - assert_eq!(GrammaticalRole::Modifier.slice().start, MODIFIER_SLICE.start); + assert_eq!( + GrammaticalRole::Modifier.slice().start, + MODIFIER_SLICE.start + ); assert_eq!(GrammaticalRole::Modifier.slice().stop, MODIFIER_SLICE.stop); } @@ -51,16 +59,25 @@ fn tekamolo_roles_align() { // The TEKAMOLO sub-slices (Temporal/Kausal/Modal/Lokal/Instrument) live in // the [9000..9750) post-context band per role_keys.rs — NOT inside the // Context band as the original (broken) markov_bundle layout claimed. 
- assert_eq!(GrammaticalRole::Temporal.slice().start, TEMPORAL_SLICE.start); - assert_eq!(GrammaticalRole::Temporal.slice().stop, TEMPORAL_SLICE.stop); - assert_eq!(GrammaticalRole::Kausal.slice().start, KAUSAL_SLICE.start); - assert_eq!(GrammaticalRole::Kausal.slice().stop, KAUSAL_SLICE.stop); - assert_eq!(GrammaticalRole::Modal.slice().start, MODAL_SLICE.start); - assert_eq!(GrammaticalRole::Modal.slice().stop, MODAL_SLICE.stop); - assert_eq!(GrammaticalRole::Lokal.slice().start, LOKAL_SLICE.start); - assert_eq!(GrammaticalRole::Lokal.slice().stop, LOKAL_SLICE.stop); - assert_eq!(GrammaticalRole::Instrument.slice().start, INSTRUMENT_SLICE.start); - assert_eq!(GrammaticalRole::Instrument.slice().stop, INSTRUMENT_SLICE.stop); + assert_eq!( + GrammaticalRole::Temporal.slice().start, + TEMPORAL_SLICE.start + ); + assert_eq!(GrammaticalRole::Temporal.slice().stop, TEMPORAL_SLICE.stop); + assert_eq!(GrammaticalRole::Kausal.slice().start, KAUSAL_SLICE.start); + assert_eq!(GrammaticalRole::Kausal.slice().stop, KAUSAL_SLICE.stop); + assert_eq!(GrammaticalRole::Modal.slice().start, MODAL_SLICE.start); + assert_eq!(GrammaticalRole::Modal.slice().stop, MODAL_SLICE.stop); + assert_eq!(GrammaticalRole::Lokal.slice().start, LOKAL_SLICE.start); + assert_eq!(GrammaticalRole::Lokal.slice().stop, LOKAL_SLICE.stop); + assert_eq!( + GrammaticalRole::Instrument.slice().start, + INSTRUMENT_SLICE.start + ); + assert_eq!( + GrammaticalRole::Instrument.slice().stop, + INSTRUMENT_SLICE.stop + ); } #[test] @@ -77,7 +94,10 @@ fn no_overlap_between_major_roles() { assert!( win[0].1.stop <= win[1].1.start, "{} ends at {} but {} starts at {}", - win[0].0, win[0].1.stop, win[1].0, win[1].1.start + win[0].0, + win[0].1.stop, + win[1].0, + win[1].1.start ); } } @@ -87,15 +107,15 @@ fn no_overlap_across_all_ten_roles() { // Stronger check: every GrammaticalRole variant's slice is disjoint from // every other variant's slice (sorted-pairwise via the role_keys layout). 
let mut spans: Vec<(&str, RoleKeySlice)> = vec![ - ("subject", GrammaticalRole::Subject.slice()), - ("predicate", GrammaticalRole::Predicate.slice()), - ("object", GrammaticalRole::Object.slice()), - ("modifier", GrammaticalRole::Modifier.slice()), - ("context", GrammaticalRole::Context.slice()), - ("temporal", GrammaticalRole::Temporal.slice()), - ("kausal", GrammaticalRole::Kausal.slice()), - ("modal", GrammaticalRole::Modal.slice()), - ("lokal", GrammaticalRole::Lokal.slice()), + ("subject", GrammaticalRole::Subject.slice()), + ("predicate", GrammaticalRole::Predicate.slice()), + ("object", GrammaticalRole::Object.slice()), + ("modifier", GrammaticalRole::Modifier.slice()), + ("context", GrammaticalRole::Context.slice()), + ("temporal", GrammaticalRole::Temporal.slice()), + ("kausal", GrammaticalRole::Kausal.slice()), + ("modal", GrammaticalRole::Modal.slice()), + ("lokal", GrammaticalRole::Lokal.slice()), ("instrument", GrammaticalRole::Instrument.slice()), ]; spans.sort_by_key(|(_, s)| s.start); @@ -103,8 +123,12 @@ fn no_overlap_across_all_ten_roles() { assert!( win[0].1.stop <= win[1].1.start, "{} [{}..{}) overlaps {} [{}..{})", - win[0].0, win[0].1.start, win[0].1.stop, - win[1].0, win[1].1.start, win[1].1.stop, + win[0].0, + win[0].1.start, + win[0].1.stop, + win[1].0, + win[1].1.start, + win[1].1.stop, ); } } diff --git a/docs/architecture/deepnsm-reader-design.md b/docs/architecture/deepnsm-reader-design.md new file mode 100644 index 000000000..1d1d54fa1 --- /dev/null +++ b/docs/architecture/deepnsm-reader-design.md @@ -0,0 +1,337 @@ +# DeepNSM Sentence-Level AriGraph Reader — Design Document + +**Branch:** `claude/stoic-turing-M0Eiq` +**Crate:** `crates/deepnsm/` +**New modules:** `morphology`, `cam64`, `episodic_spo`, `window`, `reader_state`, `signed_crystal`, `sentence_transformer64` +**Tests added:** 200 lib tests (0 failures) + +--- + +## Glossary (one-liners for reviewers) + +| Term | Definition | +|------|-----------| +| **P64** | 8×8-bit native 
reading-state address space; NOT a quantised float embedding | +| **CAM4096** | 12-bit deterministic address selected from P64 lanes; NOT a quantised embedding vector | +| **Crystal4096** | 3-axis signed reading coordinate (12 bits, 4096 cells); P4096 palette codebook key | +| **Cam64** | 64-bit fast reading-locality index (NOT semantic truth); used for prefetch and basin heuristics | +| **EpisodicSpoFrame** | The auditable SPO truth witness; `cam64` inside it is the index, not the truth | +| **SentenceWindow** | Wernicke faculty: exact NP head ranks for coreference; distinct from `ContextWindow` (Broca/VSA) | +| **splat_p64** | Discrete palette splat into a Hamming neighbourhood; NOT a Gaussian in f32 space | +| **SentenceTransformer64** | State-transition automaton (Manning & Carpenter sense); NOT neural self-attention | + +--- + +## What this is + +DeepNSM is a distributional semantic engine that replaces transformer inference +with precomputed distributional lookup (4,096-word COCA vocabulary, 8 MB distance +matrix, `<10 μs/sentence`). This reader layer adds the **sentence-level +AriGraph reading state machine** on top of the existing tokenizer/parser/encoder +stack — the auditable, Wernicke-faculty side of the pipeline. + +The one-line description: + +> DeepNSM reads sentences one at a time, emits auditable episodic SPO rows, +> maintains a ±5 sentence reading state for pronoun/coreference/inference, and +> derives a compact 64-bit CAM code from morphology + grammar + NSM markers for +> fast basin matching and prefetch. 
+ +--- + +## Architecture: five distinct responsibilities + +```text +SentenceStructure (from parser) + + SentenceFeatures (caller-supplied annotations) + │ + ▼ + ReadingState::step() ← left-corner state machine + │ + ├─► Vec<EpisodicSpoFrame> truth witness (auditable SPO rows) + │ + ├─► ReadingState_next updated ±5 window + entity stack + │ + ├─► Cam64 reading-state locality key (NOT the truth) + │ + ├─► SignedSentenceCrystal P64MeaningField + Crystal4096 coordinate + │ + └─► Sentence64 P64 + CAM4096 + EpisodicSpoHint + │ + ▼ + holograph BitpackedVector (16Kbit resonance, separate crate) + │ + ▼ + AriGraph basin update +``` + +### The two-faculty split (E-ENGLISH-BIFURCATES) + +Two windows serve different cognitive faculties and must not be fused: + +| Window | Faculty | Content | Purpose | +|--------|---------|---------|---------| +| `ContextWindow` (`context.rs`) | Broca / projection | VSA projections (MarkovBundler band) | Distributional disambiguation | +| `SentenceWindow` (`window.rs`) | Wernicke / coreference | Exact NP head vocabulary ranks | Auditable coreference resolution | + +--- + +## Module-by-module description + +### `morphology.rs` — MorphFlags + +`MorphFlags(u16)` is a 14-bit packed field of heuristic morphological features +derived from `SentenceStructure`. No float arithmetic. Flags: `PAST`, `PRESENT`, +`FUTURE`, `SINGULAR`, `PLURAL`, `FIRST_PERSON`, `SECOND_PERSON`, `THIRD_PERSON`, +`PASSIVE`, `NEGATED`, `INTERROGATIVE`, `RELATIVE_CLAUSE`, `INFINITIVE`, +`SUBORDINATE`. + +```rust +let morph = MorphFlags::from_sentence_structure(&sentence, triple_idx); +assert!(morph.is_past()); +``` + +### `cam64.rs` — Reading-state locality key + +`Cam64(u64)` is 8 lanes × 8 bits. It is **NOT semantic truth** — it is a fast +reading-locality index for candidate prefetch, basin matching, and coreference +heuristics.
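
The "8 lanes × 8 bits" packing can be illustrated with a small, self-contained sketch. This is not the crate's code: the `Cam64Sketch` type and its `with_lane`, `lane`, and `hamming` helpers are hypothetical names introduced for the example, and placing lane 0 in the least-significant byte is an assumption, since the document does not state the byte order.

```rust
/// Hypothetical stand-in for `Cam64`: 8 lanes of 8 bits packed into one u64.
/// Assumption: lane 0 occupies the least-significant byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cam64Sketch(pub u64);

impl Cam64Sketch {
    /// Return a copy with lane `idx` (0..=7) overwritten by `value`.
    pub fn with_lane(self, idx: u32, value: u8) -> Self {
        assert!(idx < 8, "lane index out of range");
        let shift = idx * 8;
        // Clear the target byte, then OR in the new value.
        let cleared = self.0 & !(0xFFu64 << shift);
        Cam64Sketch(cleared | ((value as u64) << shift))
    }

    /// Read lane `idx` (0..=7).
    pub fn lane(self, idx: u32) -> u8 {
        assert!(idx < 8, "lane index out of range");
        (self.0 >> (idx * 8)) as u8
    }

    /// Number of differing bits between two codes (popcount of the XOR).
    pub fn hamming(self, other: Self) -> u32 {
        (self.0 ^ other.0).count_ones()
    }
}

fn main() {
    // Set an entity bucket (lane 0) and a basin byte (lane 7).
    let a = Cam64Sketch(0).with_lane(0, 0x12).with_lane(7, 0xF0);
    // Changing one lane by a value with two set bits flips exactly two bits.
    let b = a.with_lane(1, 0x03);
    assert_eq!(a.lane(0), 0x12);
    assert_eq!(a.lane(7), 0xF0);
    assert_eq!(a.hamming(b), 2);
    println!("{:#018x}", b.0);
}
```

Because every lane is a plain byte, comparing two codes reduces to an XOR and a popcount, which is the kind of integer-only operation the locality index relies on.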
+ +| Lane | Content | Source | +|------|---------|--------| +| 0 | entity/subject bucket | vocabulary rank >> 5 | +| 1 | predicate/action bucket | vocabulary rank >> 5 | +| 2 | object/complement bucket | vocabulary rank >> 5 | +| 3 | morphology low byte | MorphFlags bits 0-7 | +| 4 | clause structure | MorphFlags bits 8-13 | +| 5 | discourse / anaphora | entity stack depth + coref flag | +| 6 | causal / temporal | temporal marker present | +| 7 | episodic basin | novelty_high hint | + +**`continues_basin(prev: Cam64) → bool`**: Pika chart-arc predicate. Uses +`count_ones()` on the XOR: shared ≥ 16 bits AND diff ≤ 24 bits. Dumb, +deterministic, no semantic reasoning. + +### `episodic_spo.rs` — Auditable witness row + +`EpisodicSpoFrame` is the truth: one auditable SPO row per triple per sentence. +All 25 fields are `Copy`; the struct stacks in `Vec<EpisodicSpoFrame>` for SoA +sweep. The `cam64` field is the fast-index; the `subject/predicate/object_candidate_id` +fields are the truth. Size constrained to ≤ 128 bytes (tested). + +`BasinClassification` expresses how a new frame relates to an existing AriGraph +story basin: `Reinforcement`, `NoveltyDelta`, `WisdomDelta`, `Contradiction`, +`Branch`, `Epiphany`. + +### `window.rs` — ±5 sentence ring buffer + Pika expectation slots + +`SentenceWindow` is an 11-entry ring buffer of `WindowEntry` (up to 4 NP heads +per sentence). Provides `resolve_pronoun(exclude_rank) → u16` with a two-phase +resolution strategy: + +**Phase 1 — Pika forward expectation slots (added in left-corner adaptation):** +When a left-corner trigger fires (relative pronoun, anaphora), `push_expected(rank, reason)` +pre-populates a slot. Resolution checks these first — confirmed expectation beats +recency heuristic. + +**Phase 2 — Confirmed ring, most-recent-first:** +Heads iterated in reverse within each entry (last-mentioned in text = highest +index = most recent). Original Manning & Carpenter recency heuristic.
+ +```rust +window.push_expected(active_subject, ExpectedReason::RelativeClause); +let referent = window.resolve_pronoun(pronoun_rank); // returns expected first +``` + +`ExpectedReason` enum: `RelativeClause`, `Anaphora`, `Ellipsis`, +`CausalContinuation`, `TemporalContinuation`. + +### `reader_state.rs` — Left-corner state machine + +`ReadingState::step(self, &SentenceStructure, &SentenceFeatures) → (Vec<EpisodicSpoFrame>, ReadingState)` + +Pure function — `self` is consumed, `next` is returned. No `&mut self` during +computation (data-flow.md rule). State carries: + +- **Top-down expectation**: `expected_subject_bucket`, `expected_predicate_bucket`, + `active_trigger` (set by the first triple's `LeftCornerTrigger`) +- **Bottom-up evidence**: `active_subject`, `active_predicate`, `active_object` +- **Entity stack**: LIFO bounded at 8, evicts oldest on overflow +- **±5 window**: `SentenceWindow` for coreference +- **Cam64**: current reading-state locality code + +**Left-corner trigger wiring** (Pika chart-arc pre-population): +```rust +LeftCornerTrigger::Relative | LeftCornerTrigger::Anaphora => { + // Prior active_subject is the most likely antecedent. + // Pre-push into window's expectation buffer before processing. + next.window.push_expected(next.active_subject, reason); +} +``` + +`LeftCornerTrigger` variants: `Declarative`, `Causal`, `Temporal`, `Relative`, +`Anaphora`, `FirstPerson`, `Domain(u8)`. Each carries a `basin_byte()` that +feeds into Cam64 lane 7. + +### `signed_crystal.rs` — Discrete reading crystal + +Three types for the holograph bridge: + +**`SignedOffset4`**: 0-14 encodes −7..+7 (raw = offset + 7); 15 = overflow/basin-change. + +**`Crystal4096`**: three axes × 4 bits = 12 bits, 4096 cells. Direct P4096 palette +codebook key. `xor()` for VSA bind/unbind (self-inverse). `same_basin()` = +no overflow AND `nibble_distance ≤ 1`. + +**`SignedSentenceCrystal { p64: P64MeaningField, coord: Crystal4096 }`**: complete +output bridging DeepNSM to holograph.
`bind()` XOR-binds both fields. `same_basin_as()` += P64 agreement ≥ 40 bits AND coordinate nibble_distance ≤ 1. + +### `sentence_transformer64.rs` — Native P64 meaning field + +**The architectural correction this module encodes:** + +> P64 is the **native address space**, not a compressed approximation of a float +> embedding. Floats may approximate P64 for external ML interop. P64 does not +> approximate floats. + +**`P64(u64)`**: 8 orthogonal semantic planes. Each word projects *vertically* +into the field — activates across multiple lanes simultaneously. `from_cam64_and_nsm()` +is the canonical construction path (grammar → Cam64 → P64, no floats). +`bind()` = XOR (VSA, self-inverse). `agreement()` = `64 - popcount(XOR)`. + +**`Cam4096(u16)`**: 12-bit deterministic codebook address. `from_p64()` folds +top nibbles of entity, predicate, and basin lanes — a bit-selection, not +nearest-neighbour search in float space. 4096 cells = native-English reading-state +classes at full resolution. + +**`Perturbation4x4`**: local 4×4 discrete ambiguity tile. Row = semantic axis +(entity/predicate shift), col = syntactic axis (clause/discourse shift). 16 +alternatives per step. The implicit `(4×4)^n` trajectory space is never +materialised — HHTL/GridLake prunes to the small living frontier (Pika-style). + +**`splat_p64(centre, tile, radius_bits) → SmallNeighbourhood`**: discrete palette +splat — NOT Gaussian in f32 space. Keeps only cells that change the P64, stay +within Hamming radius, and `near_match` the centre's CAM4096. Stack-allocated, +≤ 16 entries. + +**`SentenceTransformer64`**: projects `(Cam64, nsm_prime_mask, subject, predicate, object, role)` +into `Sentence64 { p64, cam, spo_hint }`. `project_from_frame()` is the ergonomic +path from an `EpisodicSpoFrame`. The name is honest in its own docs: +*"Transformer here means state-transition transformer, not neural self-attention."* + +--- + +## What is NOT changed + +- `ContextWindow` (`context.rs`) — untouched. 
VSA/Broca faculty. +- `pipeline.rs`, `encoder.rs`, `similarity.rs` — untouched. +- Float fields on `EpisodicSpoFrame` (confidence, novelty, wisdom, entropy, + free_energy_delta) — retained as **boundary quality annotations**, not hot-path + substrate. +- All existing 132 deepnsm tests — still pass. + +--- + +## Float boundary policy + +```text +HOT PATH — zero floats: + P64, Cam4096, Crystal4096, SignedOffset4 + MorphFlags, Cam64, SentenceWindow + splat_p64, hamming_p64, nibble_distance + +BOUNDARY ANNOTATIONS — f32 permitted: + EpisodicSpoFrame.{confidence, novelty, wisdom, entropy, staunen, free_energy_delta} + +FORBIDDEN INTERNAL PATH (absent by omission): + fn from_f32_embedding(...) -> P64 ← does not exist + fn quantize_embedding(...) -> Cam4096 ← does not exist +``` + +--- + +## Test summary + +| Module | Tests | +|--------|-------| +| `morphology` | 8 | +| `cam64` | 13 (5 new: basin continuation) | +| `episodic_spo` | 8 | +| `window` | 11 (6 new: expectation slots) | +| `reader_state` | 14 (4 new: trigger wiring + coref-keyed cam64 + no expectation accumulation) | +| `signed_crystal` | 18 (`P64MeaningField` is now an alias of the canonical `P64`) | +| `sentence_transformer64` | 26 | +| `crystal_neighborhood` | 16 | +| **Existing deepnsm tests** | 104 (unchanged) | +| **Total** | **217** | + +--- + +## Relationship to holograph + +```text +DeepNSM (this PR): + local 64-bit reading-state code + P64 / CAM4096 / Crystal4096 discrete palette + +holograph (crates/holograph): + large 10K/16K/32K bitpacked resonance field + XOR bind/unbind, HDR cascade, stacked popcount, no floats + +AriGraph: + episodic crystallisation into story basins and tombstone witnesses + +Flow: + sentence + → SentenceTransformer64 → Sentence64 { P64, CAM4096 } + → EpisodicSpoFrame (auditable SPO truth) + → holograph SemanticCrystal → BitpackedVector (16Kbit) + → AriGraph basin update +``` + +The holograph `sentence_crystal.rs` (integer-first, char n-gram hashing, +bit rotation, majority 
bundling) is the correct large-field ancestor. + +--- + +## Deliberately out of scope (follow-up PRs) + +This PR is the **reading substrate**. Two natural extensions are intentionally +deferred so #479 stays a clean review unit: + +### v1.5 — Tekamolo/Anaphora64 coreference provenance + +v1 records *what* resolved (exact NP-head ranks + expected slots). It does not +record *why* — was the resolution confirmed-backward, expected-forward, or +inferred-right? Was gender/number/type agreement used? The `HorizonPolarity` +enum (in `signed_crystal.rs`) and `ExpectedReason` (in `window.rs`) are the +v1 hooks. A future `anaphora64.rs` sidecar should pack provenance (antecedent +bucket, sentence offset, source polarity, expected reason, agreement flags, +role/salience scores, confidence q8) into a `u64`, stored as a provenance field +on `EpisodicSpoFrame`. **Not in Cam64** (locality key, stays lean) and **not in +P64** (native address space). Boundary law: *SentenceWindow resolves, +EpisodicSpoFrame witnesses, Cam64 indexes, Anaphora64 explains.* Belongs to the +coreference-ranking PR, after agreement/ranking is implemented. + +### v2 — OGAR/SurrealDB AST adapter (separate crate) + +The same three-layer split (semantics / syntax / pragmatics) carries into the +OGAR → SurrealQL/DLL/AST adapter, with a domain role rather than a linguistic +one: + +```text +OGAR semantics = what business/domain thing is this? (ClassId/PredicateId/ActionId) +SurrealQL/DLL/AST = how is it represented/executed? (AstNodeId/DllSymbolId/TemplateId) +planner pragmatics= what may this actor do with it now? (ActorId/V_ref/HorizonPolarity/PolicyId) +``` + +Adapter law (mirrors the DeepNSM truth/index/context split): +*Semantics can exist without syntax. Syntax must resolve to semantics before +execution. Pragmatics decides whether resolved syntax may run.* SemanticFrame is +truth; AST node, SurrealQL text, and DLL symbol are execution vehicles, not +truth. 
SurrealDB is a syntax/runtime view, never the ontology master — class +truth stays in OGAR. This is its own PR (`crates/ogar-surreal-adapter/` or split +across `ogar` / `surreal-adapter` / `lance-graph-planner`), not part of #479. +The ladybug-rs `sentence_crystal.rs` (f32 random projection → 5D coords) +is a float-projection prototype and is NOT the reference here.
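
In contrast to a float projection, the two P64 operations this document names (XOR `bind`, and `agreement` as `64 - popcount(XOR)`) need only `u64` arithmetic. The following is a minimal sketch under stated assumptions: `P64Sketch` is a hypothetical newtype introduced for illustration, not the crate's `P64`, and it mirrors only those two formulas.

```rust
/// Hypothetical stand-in for `P64`; illustrative only, not the crate type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct P64Sketch(pub u64);

impl P64Sketch {
    /// VSA bind as XOR. XOR is self-inverse, so binding twice with the
    /// same key returns the original field.
    pub fn bind(self, other: Self) -> Self {
        P64Sketch(self.0 ^ other.0)
    }

    /// Agreement = 64 - popcount(XOR): the number of bit positions where
    /// the two fields hold the same value.
    pub fn agreement(self, other: Self) -> u32 {
        64 - (self.0 ^ other.0).count_ones()
    }
}

fn main() {
    let field = P64Sketch(0x0123_4567_89AB_CDEF);
    let key = P64Sketch(0xFFFF_0000_FFFF_0000); // 32 bits set

    let bound = field.bind(key);
    // Unbinding with the same key recovers the field exactly.
    assert_eq!(bound.bind(key), field);
    // A field agrees with itself on all 64 bits.
    assert_eq!(field.agreement(field), 64);
    // Binding flips exactly the key's set bits, so 64 - 32 positions agree.
    assert_eq!(field.agreement(bound), 32);
}
```

No step here rounds or quantises anything, which is the property the float boundary policy depends on.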