Skip to content

feat: DeepNSM grammar parser — Markov ±5 bundler, role keys, thinking styles#279

Merged
AdaWorldAPI merged 8 commits into
mainfrom
claude/grammar-markov-parallel-track-b-2026-04-28
Apr 29, 2026
Merged

feat: DeepNSM grammar parser — Markov ±5 bundler, role keys, thinking styles#279
AdaWorldAPI merged 8 commits into
mainfrom
claude/grammar-markov-parallel-track-b-2026-04-28

Conversation

@AdaWorldAPI

Copy link
Copy Markdown
Owner

Summary

Grammar/Markov track implementing D0-D7 from the DeepNSM-as-parser plan:

  • D0: grammar-landscape.md knowledge doc — case inventories (Finnish 15, Russian 6, German 4, Turkish 6, Japanese particles), Triangle overview, Markov ±5 context upgrade, 144 verb taxonomy, caveats on linguistic universals.
  • D4: ContextChain reasoning ops — coherence_at(), total_coherence(), replay_with_alternative(), disambiguate(), WeightingKernel (Uniform/MexicanHat/Gaussian with Ricker wavelet).
  • D6: Role-key catalogue — RoleKeySlice with contiguous [start:stop] addressing in 16384-dim VSA space, 13 SPO+TEKAMOLO const slices, LazyLock arrays for Finnish cases/tenses/NARS inference keys, FNV-64a seeding.
  • D7: GrammarStyleConfig + GrammarStyleAwareness with NARS revision lifecycle, ParamKey/ParseOutcome types, zero-dep YAML reader, 12 starter YAML configs (analytical → metacognitive, mapped to canonical 36-style ThinkingStyle enum).
  • D5: MarkovBundler with role-indexed VSA bundling (ring buffer, Mexican-hat weighting) + Trajectory struct with role_bundle()/role_candidates() for O(1) coreference unbinding.
  • D2+D3: ticket_emit (FailureTicket emission with SPO×2³×TEKAMOLO×Wechsel decomposition) + triangle_bridge (merge DeepNSM output with Grammar Triangle's NSMField + CausalityFlow + QualiaField).

New Cargo features on deepnsm

  • contract-ticket — gates ticket emission module
  • grammar-triangle — gates Triangle bridge + lance-graph-cognitive dep

Adapted to existing codebase

  • 16384-dim VSA layout (not 10000 from spec) per LF-2 migration
  • ThinkingStyle 36-variant canonical enum (12-style YAML names mapped)
  • FailureTicket.partial_parse field name, QualiaField (not Qualia18D), GrammarTriangle::from_text

Diff stats

  • 24 files changed, +2,281 / -442 lines
  • 8 commits (6 workers + 1 meta integration + 1 StepDomain base)

Test plan

  • cargo check -p lance-graph-contract (default + all-features) → green
  • cargo check -p deepnsm (default, contract-ticket, grammar-triangle, all-features) → green
  • cargo test -p lance-graph-contract --lib → 285 passed
  • cargo test -p deepnsm --lib → 53 default / 60 all-features
  • Verify WeightingKernel::MexicanHat zero-crossing matches Ricker wavelet
  • End-to-end coref test with ±5 trajectory on 10-sentence paragraph
  • ASCII→unicode restore on grammar-landscape.md (Finnish ä/ö, Cyrillic, Japanese particles)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 8 commits April 29, 2026 04:54
…-ticket, role_keys, markov_bundle)

- Add pub mod trajectory + markov_bundle (B4 D5).
- Gate pub mod ticket_emit behind feature "contract-ticket" (B5 D2).
- Gate pub mod triangle_bridge behind feature "grammar-triangle" (B5 D3).
- Add features:
    contract-ticket  -> dep:lance-graph-contract
    grammar-triangle -> dep:lance-graph-cognitive
- Add optional path deps for both crates.

B1 grammar-landscape doc has ASCII transliterations (Finnish ä->ae,
Cyrillic->latin) per worker note; flagged for follow-up i18n PR.

Contract-side modules (context_chain, role_keys, thinking_styles)
already re-exported from grammar/mod.rs at base commit 22a5a32.
@AdaWorldAPI AdaWorldAPI merged commit 9ab6161 into main Apr 29, 2026
1 of 5 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f151a6f98

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +531 to +534
match t.rank {
Some(r) => {
resolved.push(r);
// NSM primes occupy fixed low ranks in the COCA

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Base coverage on parsed structure, not vocabulary rank

parse_with_coverage currently marks every token with rank: Some(_) as resolved, even when the FSM never classifies it into the parse structure. That makes coverage effectively “in-vocabulary ratio” instead of “classified-tokens / total-tokens”, so syntactically failed parses with known words can still report high coverage and skip escalation. This undermines the new failure-ticket routing because coverage_failed can stay false for exactly the cases this path is meant to catch.

Useful? React with 👍 / 👎.

parse_result.coverage,
parse_result.classification_distance,
parse_result.primes_found,
TekamoloSlots::default(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid forcing empty TEKAMOLO slots in ticket emission

maybe_emit_ticket always passes TekamoloSlots::default() into emit_ticket. Because emit_ticket treats empty TEKAMOLO as unfillable, every non-abduction failure is routed to CounterfactualSynthesis, making the Extrapolation and Revision branches unreachable from this integration path. In practice this collapses the routing policy and ignores classification_distance for normal coverage-failure tickets.

Useful? React with 👍 / 👎.

Comment on lines +112 to +114
if !acc.is_empty() {
let k = (self.radius as usize) % acc.len();
acc.rotate_right(k);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep role slices aligned when building trajectories

After accumulating role-specific values into fixed slices, bundle_current rotates the entire fingerprint. That global shift moves each role’s data out of its declared slice range, so later unbinding by slice boundaries (subject/predicate/object/etc.) reads mixed or unrelated dimensions. This breaks role-addressable retrieval semantics for the produced Trajectory.

Useful? React with 👍 / 👎.

@AdaWorldAPI

Copy link
Copy Markdown
Owner Author

Review #1 — Brutally Honest Code Review

4,144 LOC across 12 files, 285 contract tests + 60 deepnsm tests passing. Same caveat as #278: tests-pass-cargo-checks-clean is a low bar. Here's what's actually wrong.

This PR has more critical issues than #278 by an order of magnitude. The code compiles and tests pass because each module is tested in isolation under incompatible assumptions about the other modules. End-to-end nothing works.

🔴 CRITICAL findings — these are ship-blockers

C1 — Role slice boundaries are completely incompatible between markov_bundle.rs and role_keys.rs

Two files. Same 16384-dim VSA space. Different coordinate systems.

role_keys.rs (the "official" catalogue):
  SUBJECT   [0..2000]
  PREDICATE [2000..4000]
  OBJECT    [4000..6000]
  MODIFIER  [6000..7500]
  CONTEXT   [7500..9000]
  TEKAMOLO sub-slices live at [9000..9840]

markov_bundle.rs (its own private layout):
  SUBJECT   [0..3277]      ← equal 16384/5 partitions
  PREDICATE [3277..6554]
  OBJECT    [6554..9831]
  ...
  TEKAMOLO sub-slices inside Context [13107..16384)

Any consumer that:

  1. Builds a Trajectory via MarkovBundler (uses markov_bundle layout)
  2. Reads a role bundle via Trajectory::role_bundle(start, stop) with start, stop from role_keys::SUBJECT_KEY

…gets garbage. The vector slice covers the wrong dimensions.

The two modules pass their own tests because each tests against its own constants. The integration test that would catch this doesn't exist.

Fix: delete the markov_bundle.rs slice constants. Import from role_keys exclusively. ~30 LOC change. Do this before any consumer exists.

C2 — bundle_current corrupts the output via rotate_right

markov_bundle.rs:113 applies acc.rotate_right(k) where k = radius % dims. For radius=5 this rotates the 16384-dim accumulator right by 5 positions before returning.

Plan calls for vsa_permute(v, position_offset) per individual sentence in the window (intra-window permutation, then bundle). Implementation does post-bundle rotation of the entire output by a constant. This:

  • Shifts dim 0 to position 5 — every role slice is now at the wrong starting offset in the output vector.
  • Has no inverse anywhere. Trajectory::role_bundle extracts at the original offsets. The data and the index are now permanently misaligned.

Fix: Delete the rotation. If position-permutation is desired, apply per-sentence pre-bundle (acc[k] += weight * fp[(k + i*offset) % dims]).

C3 — WeightingKernel is dead code in the actual coherence path

context_chain.rs defines WeightingKernel { Uniform, MexicanHat, Gaussian }, computes a Ricker wavelet in weight(), has 4 tests asserting kernel symmetry/monotonicity/zero-crossing.

Then coherence_at() does raw uniform Hamming. It never calls kernel.weight(). The kernel parameter is read from MarkovPolicy.kernel and stored on the struct, then ignored.

Result: the entire "Mexican-hat anticipation" mechanism described in the plan is theatre. It never affects a single computation.

Fix: weight each contributor's bit-counts by kernel.weight(delta, radius) in the coherence_at inner loop. Update the 4 kernel tests to also assert output difference between Uniform and MexicanHat on a non-trivial chain.

🟠 HIGH severity — design errors that prevent the intended behaviour

H1 — triangle_bridge::compute_classification_distance is a permanent stub returning 0.0

fn compute_classification_distance(...) -> f32 {
    0.0  // Stub.
}

This means ticket_emit's "extrapolation" routing path:

} else if classification_distance > 0.7 {
    NarsInference::Extrapolation
}

can never fire. The novel-domain detection pipeline is structurally inert.

Fix: even a placeholder Hamming between qualia and SPO predicate fingerprint would make this functional. ~20 LOC.

H2 — grammar-triangle feature is gated nowhere actively enabled

grammar-triangle is not in [features].default of any workspace member. No CI line enables it. It's untested in the merge gate. The bridge module is dead code behind a flag that nobody flips.

Fix: add grammar-triangle to the deepnsm CI matrix. Either it works or it gets deleted.

H3 — parser.rs additions have ZERO test coverage

Parser struct, ParseResult struct, parse_with_coverage, coverage_failed, maybe_emit_ticket — all new in this PR — none have a test. The existing 4 parser tests are pre-existing. The primes heuristic (r < 64) is untested. DEFAULT_COVERAGE_THRESHOLD = 0.85 is untested.

This is the primary integration point for D2 failure tickets and it's totally untested. The whole point of D2 is "emit a ticket when local parse fails" and the code that decides "did local parse fail" is a black box.

Fix: at minimum 3 tests — coverage above threshold (no ticket), coverage below threshold (ticket emitted), TEKAMOLO unfillable case.

H4 — YAML parser has multiple silent-failure modes

# Breaks
nars: { primary: Deduction, fallback: Abduction }    # flow-map syntax
label: section#2                                      # # truncates
key: "value with # in quotes"                         # quotes ignored
- list_item: "with: colon"                            # colon-in-value breaks

The test at line 976 acknowledges flow-map breakage. collect_yaml_pairs strips # unconditionally. None of these failure modes return errors — they silently corrupt the config.

Fix: at minimum, only strip # when preceded by whitespace. Document the unsupported subset. Better: use serde_yaml and accept the dep cost.

H5 — Bundle output is not normalized → cosine comparisons across kernels are meaningless

markov_bundle.rs::bundle_current accumulates weight * fp[k] across all window positions. Mexican-hat weights sum to ~0 (positive lobe + negative tails); Uniform weights sum to 11. The output vector's L2 norm depends on kernel choice. Comparing cosine similarity between two trajectories built with different kernels is invalid.

Fix: normalize by sum(|weights|) after accumulation.

H6 — Trajectory::role_candidates hardcodes threshold=0.5 and top-k=5

No way to pass these as parameters. The "feature filter for coreference" — the very mechanism described in the plan as "1-2 candidates remain; Deduction commits" — has its tuning constants baked into the call site. Different style configs (D7) cannot adjust without recompile.

Fix: role_candidates(role, codebook, threshold, k) -> Vec<Candidate>.

🟡 MEDIUM — design smells

# File Issue Fix
1 parser.rs:538 Primes heuristic r < 64 assumes COCA ranks 0-63 are NSM primes. They're not — that range is articles+prepositions. Explicit NSM-prime ID set, ~65 entries
2 thinking_styles.rs parse_tekamolo_slot maps "instrument" → TekamoloSlot::Modal. Silent semantic loss. Add Instrument variant or return error
3 thinking_styles.rs top_nars_inference falls through to fallback when frequency <= 0.5 (i.e., always at bootstrap) Use >= 0.5 AND confidence > epsilon
4 ticket_emit.rs:66 attempted_inference hardcoded to Deduction in every ticket Accept as parameter
5 ticket_emit.rs has_unfillable = "all 4 slots None"; misses partial-unfilling Per-slot resolution flag
6 trajectory.rs cosine truncates to min(a.len(), b.len()) silently assert equal length or zero-pad
7 role_keys.rs 8 SMB keys (KUNDE..STEUER) have RoleKey but no RoleKeySlice descriptors Add 8 const slice entries
8 markov_bundle.rs:105 len = min(stop-start, content_fp.len()) silent truncation assert or warn
9 parser.rs:543 unresolved_tokens pushes 0u16 for OOV — loses identity Push token index

🟢 LOW — cleanup

  • context_chain.rs ReplayRequest and ReplayDirection are public types with no method that takes them. Dead.
  • role_keys.rs RoleKey is not Clone or PartialEq. Tests work around it. Add #[derive(Clone, PartialEq)].
  • markov_bundle.rs only 4 tests; no test of actual bundled output values, no test for kernel effect on output.
  • trajectory.rs uses raw (start, stop) indices; no ergonomic role-enum-based API.
  • thinking_styles.rs 12 YAMLs exist; only analytical.yaml is tested (and even that via inline string, not file load).
  • grammar-landscape.md LOC counts marked "TBD" — fill them.
  • ticket_emit.rs missing_required field always Vec::new() — populate or remove.

What's actually correct

  • NARS revision rule is mathematically correct. f_new = (f_old·c_old + f_obs·c_obs)/(c_old+c_obs), c_new = (c_old+c_obs)/(c_old+c_obs+1). This is Pei Wang's revision from NAL — verified.
  • Role-key slices in role_keys.rs are non-overlapping. Test all_slices_disjoint sorts and asserts. 47 keys, gapless from 0 to 14096 with [14096..16384) headroom. ✅
  • FNV-64a seeds have no collisions across 45 role labels. Tested.
  • 285 contract tests pass including all WeightingKernel symmetry/monotonicity tests (the kernel math is right, it just isn't used).
  • YAML parser handles the simple subset correctly — 18 tests pass on key:value, lists, comments at line-start, hex literals (0xFF), enum string mapping.
  • ParseOutcome::observation() polarity is correct — confirmed outcomes raise f, refuted outcomes lower it.
  • ContextChain::disambiguate empty-iter sentinel is documented and tested — no panic.
  • grammar-landscape.md is honest about limitations — section 11 explicitly states NSM/TEKAMOLO/144-verb are "useful templates, not linguistic universals."

What needs to happen — recommended PR sequence

  1. PR-279a (CRITICAL fix feat: bump arrow 57, datafusion 51, lance 2 #1) — Unify slice coordinates. Delete markov_bundle.rs's private slice constants. Import from role_keys exclusively. Add the integration test that catches this. ~50 LOC, ~1 day.

  2. PR-279b (CRITICAL fix Module 6: #[track_caller] error macros for zero-cost location capture #2) — Remove bundle_current rotation. Replace post-bundle rotate_right with per-sentence pre-bundle permute. Add test asserting Trajectory::role_bundle(SUBJECT_KEY) returns recognizable subject content for a hand-crafted 3-sentence window. ~30 LOC, half a day.

  3. PR-279c (CRITICAL fix Claude/review lance graph architecture i6 t kf #3) — Wire WeightingKernel into coherence. Multiply each contributor's bit-count by kernel.weight(delta, radius). Add test asserting Uniform vs MexicanHat produce different coherence scores on a non-trivial chain. ~40 LOC, half a day.

  4. PR-279d (HIGH) — Parser test coverage. 3+ tests for parse_with_coverage, coverage_failed, maybe_emit_ticket. Replace primes heuristic with explicit NSM-prime ID set. ~80 LOC + assets. 1-2 days.

  5. PR-279e (HIGH) — Triangle bridge real classification distance. Replace stub with Hamming-against-qualia-fingerprint. Enable grammar-triangle in CI matrix. ~30 LOC. Half a day.

  6. PR-279f (HIGH) — Bundle normalization + parameter exposure. L2-normalize bundle output. Add threshold/k params to role_candidates. ~25 LOC. Half a day.

Roughly 1 week to fix all critical + high issues.

  1. PR-279g — End-to-end integration test. Build a 5-sentence English paragraph. Run through Parser → MarkovBundler → Trajectory → ContextChain::disambiguate. Assert "he" resolves to the named subject. THIS IS THE TEST THAT SHOULD HAVE BLOCKED MERGE. Until it runs, the pipeline is fictional.

Verdict

I am being harsh because the surface area is large and the failure modes are non-obvious. The math is mostly right; the wiring is broken. The contract crate's ContextChain and role_keys are genuinely good work. The deepnsm side has critical defects that prevent the intended integration even from compiling end-to-end (let alone working).

Two CRITICAL fixes (slice unification + rotation removal) and the kernel wiring fix unlock the rest. Without those, the current code shape encourages building consumers that appear to work but produce silently corrupt fingerprints.

The plan was right. The execution is uneven. The good parts are very good (NARS revision, role-key catalogue layout, ContextChain disambiguation logic). The bad parts are genuinely bad (silent slice mismatch, dead kernel, untested parser hook). Worth the cleanup PRs.

@AdaWorldAPI

Copy link
Copy Markdown
Owner Author

Review #2 — DeepNSM as Semantic Kernel: Future Outlook + Epiphanies

The honest review is the floor. Here's the ceiling — what this PR points toward when DeepNSM stops being a parser and starts being a semantic kernel that the rest of the stack indexes against.

What we just shipped (when the bugs are fixed)

Layer Module What it produces
Static prior thinking_styles.rs + 12 YAMLs Per-style policy: NARS chain, kernel, threshold, pearl mask
Empirical prior GrammarStyleAwareness Truth-revised priors from parse outcomes (Pei Wang revision)
Token-level parser.rs::Parser SentenceStructure with COCA fingerprints + primes count
Sentence-level parser.rs::ParseResult Parse + coverage score + decision-readiness
Window-level markov_bundle.rs::MarkovBundler ±5 trajectory in role-indexed VSA space
Reasoning unit trajectory.rs::Trajectory The thing NARS reasons about (replacing "sentence")
Coherence context_chain.rs::ContextChain Discourse-flow score + counterfactual replay
Failure path ticket_emit.rs + Triangle bridge LLM-tail router with structured reasons

This is — unfixed bugs aside — a complete substrate for grammar reasoning that doesn't go to an LLM for 90% of traffic. That's the architectural claim the plan made. The substrate is here.

The real epiphany — DeepNSM is not a parser; it's a hash-substrate

Re-reading the merged code, the deepest insight is that DeepNSM's actual role in the stack is producing canonical, role-indexed, position-permuted VSA fingerprints that everything else can index against.

A parser produces a parse tree. DeepNSM produces a Trajectory — a single 16384-dim vector that:

  • Carries ±5 sentences of context
  • Has role-indexed slices for SPO + TEKAMOLO + per-language morphology
  • Is bit-comparable in O(d/64) to any other Trajectory
  • Round-trips through unbinding to recover any role's content

That's not a parser — that's a content-addressable memory substrate keyed by grammatical role. Every downstream consumer (Cypher cockpit, AriGraph storage, episodic memory, RLS predicates, audit log statement-hash) gets a stable hash that means something grammatically.

Epiphanies of Potential

E1 — Markov ±5 bundling is the shortest path to coreference O(1)

The plan called this out but the merged code makes it concrete. Once role-indexed bundling is correct (after the slice-unification fix), pronoun resolution becomes:

// "She announced..."
let subjects_in_window = trajectory.role_bundle(SUBJECT_KEY);
let candidates = vsa_clean(subjects_in_window, &feminine_animate_codebook);
// Typically 1-2 candidates remain. Deduction commits.

Three slice unbinds and a codebook clean. No tree walk. No attention head. No LLM. The Markov ±5 trajectory IS the candidate index.

This is what makes the architectural bet pay off: the bundle is the index, not just the encoding. Most pronoun-resolution literature treats the candidate set as a separate data structure. Role-indexed bundling makes it a slice operation on the existing trajectory vector. O(1) per query, lossless within √N capacity.

E2 — Cross-lingual bundling is literally addition

Take English parse + Finnish parse of the same sentence. Both produce Trajectorys in the same 16384-dim VSA space. Both bind their content into the same role slices. Bundle them:

let combined = vsa_bundle(en_trajectory, fi_trajectory);
let subject_in_combined = combined.role_bundle(SUBJECT_KEY);

subject_in_combined carries content from BOTH parses. Where English left ambiguity (Wechsel "with"), Finnish case morphology committed (Adessive -llä = Modal). The unbinding recovers the disambiguated reading for free. This is the cross-lingual superposition mechanism the plan promised, and the substrate to make it work is now in place. The remaining work is parsers for each language that produce Trajectorys — not new infrastructure.

E3 — Verb taxonomy × case morphology = 144-cell table lookup

The role_keys.rs 12 tense slices + the 144-verb taxonomy outlined in grammar-landscape.md mean parsing reduces to:

  (verb_family, tense_aspect_mood) → row in 144-cell table
                                   → row holds slot prior (which TEKAMOLO slots are expected)
                                   → morphology fills slots
                                   → NARS revises truth

12 × 12 = 144. 5⁵ Structured5x5 = 3125 cells available for indexing. The space accommodates verb × slot × value triple lookup. Parsing becomes table lookup + truth aggregation, not search. The substrate is now in place for this; the next PR-set is to actually populate the 144-cell table.

E4 — Trajectory is the audit-log key SMB and MedCare actually want

The audit.rs from PR #278 carries statement_hash: u64 (just DefaultHasher over the SQL string). That's a syntactic hash — SELECT name FROM users WHERE id=42 and SELECT name FROM users WHERE id = 42 (extra space) produce different hashes.

Replace statement_hash: u64 with statement_trajectory: Box<[u64; 256]> (the binarized Trajectory of the query's grammatical structure) and you get a semantic audit hash. Two semantically-equivalent queries collide; injection attempts that share grammatical structure with legitimate queries are detectable; "find all queries that grammatically resemble this attack pattern" becomes Hamming-near-neighbor on the audit log.

This is a direct cross-PR bridge: DeepNSM's Trajectory replaces audit.rs's statement_hash and the audit log becomes a grammar-aware substrate.

E5 — ContextChain::disambiguate is the prototype for ALL meta-inference

The current disambiguate(i, candidates) runs a counterfactual replay over a sentence position. Generalize:

  • disambiguate(LogicalPlan, candidate_filters) — given an RLS-rewritten plan, score candidate filter expressions by how well each fits the surrounding query context. Same Mexican-hat math, applied to query plans instead of sentences. Plan optimization with awareness.

  • disambiguate(Triplet, candidate_objects) — given an SPO triple with ambiguous object, score candidate objects against the surrounding episode chain. Same machinery as coreference, applied to AriGraph triplet completion.

  • disambiguate(MetaWord, candidate_styles) — given a cognitive cycle's MetaWord, score thinking-style candidates by which one's empirical prior best matches the current signal profile. Self-aware style selection.

The disambiguation primitive is domain-agnostic. The plan called it "meta-inference duality"; the merged code shows it's a single-trait operation that generalizes across the stack.

E6 — GrammarStyleAwareness is a NARS-revised distribution over policies

GrammarStyleAwareness doesn't just track "which style won this parse." It maintains a NARS-revised truth distribution per ParamKey — per-style, per-NARS-inference, per-kernel, per-pearl-mask. Over time the truth distribution converges to the empirical reality of the content this style sees.

This is computationally cheap (one revision per parse outcome) and structurally important. It means:

  • Each style's effective config diverges from its YAML prior as evidence accumulates.
  • Persistence (next PR — flagged as future work) gives the style a memory across sessions.
  • A library of pre-trained style awareness becomes shareable: "the analytical-style awareness from analyzing 10K Wikipedia pages" can be loaded into a fresh deployment and starts as warm.

This is the seed of transferable meta-inference. The plan didn't quite say this; the code makes it possible.

E7 — Forward-validation harness on Animal Farm closes the awareness loop

D10 from the plan — not in this PR — runs the merged stack on Animal Farm with hand-labeled epiphanies and uses the rest of the book as ground truth. Every fired epiphany is implicitly a prediction:

"this contradiction carries meaning, and therefore the arc will continue to deviate in this direction"

NARS revision retroactively grades the prediction. The metrics (epiphany precision, recall, arc-shift F1, prediction direction accuracy) become first-class regression tests against literary text. This is what turns "awareness" from architectural prose into Spearman-ρ-grade measurement.

The substrate to do this is now merged. The harness is one PR away.

E8 — Quantum vs Crystal duality sits one move away

thinking_styles.rs has MarkovPolicy.kernel: WeightingKernel. The Mexican-hat path produces what the project calls "Crystal mode" output (bundled, structured, position-permuted). The harvest doc H7 describes "Quantum mode" — same substrate, holographic residual carried via phase tags.

Adding kernel: WeightingKernel::Holographic (4th variant) + a phase_tag: PhaseTag field on the Trajectory struct gives us the Quantum-mode path without restructuring anything. The crystal/quantum duality is one variant + one field.

This is the move that takes us from "DeepNSM as parser" to "DeepNSM as multi-mode semantic kernel." The substrate is permissive.

Where this points (the 6-12 month horizon)

Stage Capability Substrate ready? Missing
F0 — Substrate Role-indexed VSA bundling + NARS thinking styles ✅ (post-fixes) bug fixes
F1 — Parser DeepNSM produces Trajectory for English ≥85% local Parser in tree, untested parse_with_coverage tests, NSM-prime ID set
F2 — Coreference Pronoun resolution via role-bundle unbinding ✅ (post-slice-fix) hand-coded codebooks per (gender, animacy)
F3 — Multi-language Finnish/Russian/German parsers producing same-shape Trajectory substrate yes, parsers no per-language morphology tables
F4 — Cross-lingual bundle EN+FI bundle disambiguates English Wechsel substrate yes parsers + bundle method
F5 — Animal Farm benchmark Forward-validation harness w/ NARS retroactive revision substrate yes harness PR (D10)
F6 — 144-cell verb table Verb × tense lookup as parse policy role keys ready populate the table
F7 — AriGraph integration Trajectory → episodic memory → triplet graph other PRs Trajectory-as-key bridge
F8 — Quantum mode Holographic residual carrier substrate permissive phase tag field + holographic kernel
F9 — Audit-log substrate DeepNSM trajectory replaces statement_hash both sides ready one ~50 LOC bridge
F10 — Self-aware dispatch Style selection via disambiguate(MetaWord, styles) thinking_styles ready wiring in shader driver

The honest reading

This PR is uneven but ambitious in the right way. The CRITICAL bugs are fixable in three small PRs. Once fixed, what remains is a semantic kernel — a substrate where:

  • Token → role-bound fingerprint is canonical
  • Sentence → trajectory is canonical
  • Trajectory → coherence score is canonical
  • Coherence → disambiguation is canonical
  • Disambiguation → meta-inference is canonical
  • Meta-inference → NARS-revised style is canonical
  • Style → effective config is canonical

That stack is one round of cleanup PRs away from being usable as a drop-in replacement for "send to LLM" in 80-95% of grammar-aware traffic. The remaining 5-20% goes to LLM with a structured FailureTicket explaining exactly why local processing failed (SPO mask, TEKAMOLO slot, Wechsel ambiguity).

That's the core architectural bet of the project and this PR delivers the substrate to test it.

Risk register update

  • R1 (slice unification): if the C1 fix is delayed, every consumer that uses both MarkovBundler and role_keys introduces a corruption bug. Track as P0.
  • R2 (kernel wiring): if Mexican-hat is never actually applied, "Markov ±5 anticipation" remains marketing. Real measured difference vs uniform must be in the next PR.
  • R3 (stub stagnation): compute_classification_distance = 0.0 and attempted_inference = Deduction are placeholder values that will silently make features non-functional if not addressed. Calendar them.
  • R4 (linguistic claims): the docs are honest about NSM/TEKAMOLO not being universals. Keep that honesty in PR descriptions and external comms — it's a strength, not a weakness.
  • R5 (NER tail): OSINT is 90% proper nouns; COCA 4096 has zero coverage of "Altman, Anthropic, Riyadh." Until a NER pre-pass lands, OSINT vertical claims need to specify "modulo NER tail."

The bottom line

role_keys.rs, thinking_styles.rs, context_chain.rs are genuinely good substrate work. The deepnsm side has critical defects but they're all in wiring, not in concept. Three small PRs fix what's broken; six PRs after that wire the rest of the stack to consume it.

What's merged is more structurally important than the bugs. The bugs are easy. The substrate is the hard part and it's mostly right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants