diff --git a/.claude/knowledge/grammar-landscape.md b/.claude/knowledge/grammar-landscape.md index 176069ae..ae9c9a73 100644 --- a/.claude/knowledge/grammar-landscape.md +++ b/.claude/knowledge/grammar-landscape.md @@ -1,429 +1,113 @@ -# Grammar Landscape — Three Stacks, One Target +# Grammar Landscape (D0) -> **READ BY:** agents working on DeepNSM extraction, grammar triangle -> integration, coreference resolution, Markov context chains, OSINT -> pipelines, or anything that touches `grammar/*` or `crystal/*` in -> `lance-graph-contract`. -> -> **Companion docs (load together):** -> - `grammar-tiered-routing.md` — 5-criterion coverage detector, -> failure decomposition, morphology coverage baseline. -> - `linguistic-epiphanies-2026-04-19.md` — E13–E27 cross-repo -> harvest (Chomsky isomorphism, Σ10 tiers, sigma_rosetta, -> membrane, resonanzsiebe). -> - `cross-repo-harvest-2026-04-19.md` — H1–H14 VSA / CFG / Born- -> rule foundation. -> - `integration-plan-grammar-crystal-arigraph.md` — E1–E12 -> shipping plan. -> - `crystal-quantum-blueprints.md` — mode duality. -> - `endgame-holographic-agi.md` — 5-layer north star. +> 2026-04-28. Status: knowledge anchor for the DeepNSM-as-parser PR. +> Cross-refs: grammar-tiered-routing.md, integration-plan-grammar-crystal-arigraph.md, +> crystal-quantum-blueprints.md, cross-repo-harvest-2026-04-19.md, +> session-capstone-2026-04-18.md, endgame-holographic-agi.md. ---- +## 1. 
Three Grammar Stacks -## §1 The Three Grammar Stacks +| Stack | Crate / package | Key files | LOC | +|---|---|---|---| +| Rust | `lance-graph-contract::grammar` | mod.rs, ticket.rs, context_chain.rs, role_keys.rs, thinking_styles.rs, finnish.rs, tekamolo.rs, wechsel.rs, inference.rs, free_energy.rs | (LOC TBD) | +| Rust | `deepnsm` | parser.rs, encoder.rs, codebook.rs, fingerprint16k.rs, pos.rs, spo.rs, vocabulary.rs | (LOC TBD) | +| Python / TS bridges | `agi-chat::grammar/grammar-awareness.ts` (237 LOC) | mirror of Rust thinking_styles | 237 | -The same mechanism — **text → Grammar Triangle (NSM × Causality × -Qualia) → structured output** — exists in three independent -implementations. None of them is wired into DeepNSM. The shipping -work is not *building* grammar; it is *routing DeepNSM output -through the existing triangle* and *sharing the results across all -three consumers*. +## 2. The Triangle: NSM x Causality x Qualia +NSM = 65 universal primes (Wierzbicka). Causality = SPO with Pearl 2^3 mask. Qualia = 18-D emotional/evaluative signature. Each lens projects orthogonally; bundled, they hydrate a SentenceStructure into SpoWithGrammar. -### 1.1 Rust — `lance-graph-cognitive/src/grammar/` (1,929 LOC) +## 3. TEKAMOLO Slot Schema +TE temporal, KA kausal, MO modal, LO lokal -- German pedagogical mnemonic. +Currently 3 of 6 thematic slots covered. Deferred: beneficiary, goal, source. Future-only: path, purpose, result. NOT a linguistic universal -- see Section 11. 
-| File | LOC | Role | -|---|---|---| -| `nsm.rs` | 448 | 65 Wierzbicka semantic primes, text → prime activation vector (`NSMField`), fingerprint encoding | -| `causality.rs` | 396 | `CausalityFlow` = agent / action / patient / reason / temporality / agency / dependency_type | -| `qualia.rs` | 718 | 18-D felt-sense field | -| `triangle.rs` | 304 | `GrammarTriangle` composes all three → `Fingerprint` | -| `mod.rs` | 63 | Module root | - -This is the most mature implementation and the one DeepNSM must -consume. **Shipped, untouched by this PR.** - -### 1.2 Python — `bighorn/extension/agi_stack/universal_grammar/` (~5,000 LOC, 16 modules) - -| File | Role | -|---|---| -| `core_types.py` | `Glyph5B` 5-byte archetype address, `Dimension` enum (18D / 64D / qHDR / HOT) | -| `verb_endpoints.py` | `VerbFamily` + `VerbMode` + `VerbRouter` + FastAPI routes + MCP tool generation | -| `calibrated_grammar.py` | Grammar with calibration data | -| `method_grammar.py` | `[method]payload` HTTP-as-ontology | -| `resonance.py` + `resonanzsiebe.py` | Resonance sieve (6-level SILENCE → SCREAM filter) | -| `scent_optimizer.py` | 1-byte scent-level grammar optimization | -| `situation_executor.py` + `_storage.py` | Situation-driven execution + persistence | -| `meta_uncertainty.py` | MUL integration | -| `exploration.py` + `invoke_router.py` | Dispatch routers | -| `markov_context.py` | Session-depth Markov (`SessionToken`, trajectory) | -| `awareness_blink.py` | Blink-unit awareness | -| `jina_integration.py` | Jina 1024-D bridge | -| `integration.py` | Cross-module glue | +## 4. Markov +/-5 as Context Upgrade +Pre: reasoning unit = sentence. Post: reasoning unit = trajectory carrying +/-5 sentences (Mexican-hat weighted). NARS reasons "this sentence in this flow", not "this sentence". Cross-lingual bundle: bind EN+FI parses of same entity -> Finnish case morphology disambiguates Wechsel-ambiguous English roles for free. -The Python stack carries the architectural maturity. 
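A minimal sketch of the Mexican-hat weighting over the +/-5 window described in §4, assuming a Ricker-wavelet profile with sigma = 2.5 (so offsets |d| <= 2 pull toward the focus sentence and |d| >= 3 are inhibitory). The names `mexican_hat`, `window_weights` and `weighted_context` are hypothetical, and the dense `f64` sum only stands in for the role-indexed VSA bundle; this is not the `markov_bundle.rs` implementation.

```rust
/// Mexican-hat (Ricker) weight for a sentence at signed offset `d` from the
/// focus sentence. The weight crosses zero at |d| = sigma.
fn mexican_hat(d: i32, sigma: f64) -> f64 {
    let x = d as f64 / sigma;
    (1.0 - x * x) * (-0.5 * x * x).exp()
}

/// (offset, weight) pairs for offsets -radius..=radius.
fn window_weights(radius: i32, sigma: f64) -> Vec<(i32, f64)> {
    (-radius..=radius).map(|d| (d, mexican_hat(d, sigma))).collect()
}

/// Dense stand-in for the bundle: sum over d of w(d) * v[focus + d],
/// skipping offsets that fall outside the document.
fn weighted_context(sentences: &[Vec<f64>], focus: usize, radius: i32, sigma: f64) -> Vec<f64> {
    let mut acc = vec![0.0; sentences[focus].len()];
    for (d, w) in window_weights(radius, sigma) {
        let idx = focus as i64 + d as i64;
        if idx < 0 || idx >= sentences.len() as i64 {
            continue;
        }
        for (a, x) in acc.iter_mut().zip(&sentences[idx as usize]) {
            *a += w * x;
        }
    }
    acc
}

fn main() {
    // radius 5 matches the style YAMLs; sigma = 2.5 is an assumption.
    for (d, w) in window_weights(5, 2.5) {
        println!("{:+} {:.3}", d, w);
    }
    let sentences: Vec<Vec<f64>> = (0..8).map(|i| vec![i as f64, 1.0]).collect();
    println!("{:?}", weighted_context(&sentences, 4, 5, 2.5));
}
```

The negative surround is what lets a `mexican_hat` kernel suppress mid-distance context, whereas a `uniform` or `gaussian` kernel never subtracts.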
It's the -**reference spec** for the Rust and TypeScript implementations. +## 5. Case Inventories Per Language -### 1.3 TypeScript — `agi-chat/src/grammar/` + `src/thinking/` - -| File | LOC | Role | +### Finnish (15 cases -- CORRECTION applied) +| Case | Suffix | Native role | |---|---|---| -| `grammar-awareness.ts` | 237 | Soft / strict awareness modes, response steering | -| `grammar-bridge.ts` | 212 | Bridge to thinking cycle | -| `extensions/langextract/grammar_triangle.py` | — | Parallel Python grammar triangle (duplicate of Rust) | - -TS handles the user-facing awareness / bias steering. Consumes the -Python output where available. - -### 1.4 Convergence verdict - -All three implement the same Grammar Triangle — NSM × Causality × -Qualia → fingerprint. The convergence target is **DeepNSM as the -shared extraction engine**, with the Grammar Triangle as its -structured output format, and all three language-specific stacks -downstream consumers of the same shape. - -This PR's role: pipe DeepNSM FSM output through the existing Rust -`GrammarTriangle::analyze(text)` (D3 triangle bridge in the plan). - ---- - -## §2 The Grammar Triangle (NSM × Causality × Qualia) - -The Triangle composes three orthogonal linguistic signals per sentence -into a single fingerprint. - -``` - NSM 65 primes - ▲ - │ - ┌───┴───┐ - │ │ - ▼ ▼ - Causality 2³ Qualia 18-D -``` - -- **NSM** (Wierzbicka 65 primes) — universal semantic atoms. I, YOU, - THINK, WANT, BIG, BAD, BECAUSE, etc. Language-independent. -- **Causality** — `CausalityFlow` holds agent / action / patient / - reason / temporality / agency / dependency_type. Currently 3/9 - slots; see §3 for extension. -- **Qualia** — 18-D felt-sense field (17-D experienced + 1 `classification_distance`, - the RGB→CMYK qualia distinction from PR #205). - -Grammar Triangle IS ContextCrystal at window=1 (harvest H4). Widening -the window to ±5 IS the Markov trajectory (D5). 
- ---- - -## §3 TEKAMOLO Template + the 3/6 → 6/9 Slot Gap - -TEKAMOLO (**T**emporal / **K**ausal / **M**odal / **L**okal, German -grammar-pedagogy mnemonic) is the adverbial-slot template that -extends SVO to cross-linguistic coverage. - -### 3.1 What `CausalityFlow` currently has (3/9 TEKAMOLO slots) - -```rust -pub struct CausalityFlow { - pub agent: Option, // Subject - pub action: Option, // Verb - pub patient: Option, // Object - pub reason: Option, // KAUSAL ✓ - pub temporality: f32, // TEMPORAL (float, needs slot form) - pub agency: f32, - pub dependency_type: DependencyType, -} -``` - -### 3.2 The TEKAMOLO-completion extension (3 more slots, deferred) - -```rust - pub modal: Option, // MODAL (how — manner adverbials) - pub local: Option, // LOKAL (where — spatial) - pub instrument: Option, // WITH (means / instrumental) -``` - -### 3.3 The thematic-role completion (3 more beyond TEKAMOLO) - -Yesterday's discussion surfaced that modal / local / instrument alone -is insufficient. Full thematic-role theory adds: - -```rust - pub beneficiary: Option, // for whom (dative of benefit) - pub goal: Option, // to where (directional) - pub source: Option, // from where (ablative origin) -``` - -### 3.4 Optional language-specific additions - -- **Path** — "through where". Finnish Prolative `-tse/-itse`; - Turkish instrumental-path construction. -- **Purpose / Finale** — "in order to". German `zum + Inf`; Finnish - Translative `-ksi` in purposive reading. -- **Result** — "leading to what". German `sodass`; Finnish - consequence constructions. - -### 3.5 Deferred from this PR (explicit) - -The full 9-slot CausalityFlow is **deferred** (per user decision): -D0 documents it here; D2 `ticket_emit.rs` populates only the 3 -existing slots; D3 triangle bridge maps only what's available. -Future PR lands the extension as a pure struct change. 
- -### 3.6 YAML training angle (future target) - -Author **200–500 TEKAMOLO templates per language** as YAML, fine- -tune a small LLM to emit slot-filled templates instead of free text. -The templates ARE the grammar constraint — output is slot-filling, -not generation, so the LLM cannot hallucinate new relations. - -```yaml -- text: "The Pentagon contracted OpenAI in December because of ChatGPT's capabilities" - template: tekamolo - subject: "Pentagon" - verb: "contracted" - object: "OpenAI" - temporal: "December" - kausal: "ChatGPT's capabilities" - modal: null - local: null -``` - ---- - -## §4 Case Inventories Per Language (Native Terminology) - -**Critical correction from yesterday's draft:** each language uses -its native case inventory, not a Latinate translation. Yesterday I -wrote Finnish "Accusative `-n/-t`" which is a Latinate transplant; -the actual Finnish object marking works differently. Every case -table below uses the language's own grammar tradition. - -### 4.1 Finnish — 15 cases - -**Object marking (correction from prior draft):** -- **Total object:** Nominative (plural) or Genitive `-n` (singular) -- **Partial / negated object:** Partitive `-a / -ä` -- **True Accusative:** only for personal pronouns (`minut`, `sinut`, - `hänet`, `meidät`, `teidät`, `heidät`) - -**Full 15-case inventory:** - -| Case | Suffix | TEKAMOLO / role | -|---|---|---| -| Nominative | `-∅` | Subject (S) | -| Genitive | `-n` | Possessor / total-object singular | -| Partitive | `-a / -ä` | Partial / negated object | -| Accusative | `-n / -t` (personal pronouns only) | Object for pronouns only | -| Inessive | `-ssa / -ssä` | **LO** — in / inside | -| Elative | `-sta / -stä` | **LO** — from inside | -| Illative | `-Vn / -hVn / -seen` | **LO** — into | -| Adessive | `-lla / -llä` | **LO** / **MO** — at / on / by | -| Ablative | `-lta / -ltä` | **LO** — from surface | -| Allative | `-lle` | **LO** — onto / toward | -| Essive | `-na / -nä` | **MO** — as / in the state of | -| 
Translative | `-ksi` | **MO** — becoming | -| Abessive | `-tta / -ttä` | **MO** — without | -| Comitative | `-ine-` | **MO** — together with | -| Instructive | `-n` (pl) | **MO** — by means of | - -### 4.2 Russian — 6 cases (full inventory including Instrumental) - -| Case | Masc. sg | Fem. sg | Neut. sg | Role | -|---|---|---|---|---| -| Nominative | `-∅` | `-а / -я` | `-о / -е` | Subject | -| Genitive | `-а / -я` | `-ы / -и` | `-а / -я` | Possessor / negated-obj / partitive | -| Dative | `-у / -ю` | `-е / -и` | `-у / -ю` | Recipient — often Kausal indirect | -| Accusative | = Nom (inanimate) / = Gen (animate) | `-у / -ю` | `-о / -е` | Direct object | -| **Instrumental** | `-ом / -ем` | `-ой / -ей` | `-ом / -ем` | **Means / agent in passive = MODAL** | -| Prepositional | `-е` | `-е / -и` | `-е` | With в/на/о = LO or TE | - -Russian Instrumental `-ом` ≈ Finnish Adessive `-lla/-llä` -(means/instrument) plus Finnish Essive `-na/-nä` (role/state) -folded together. One case ending commits TEKAMOLO Modal by -morphology alone. 
- -### 4.3 German — 4 cases - -| Case | Article (masc/fem/neut/pl) | Role | +| Nominative | -0 | Subject; Total object (plural) | +| Genitive | -n | Possessor; Total object (singular) | +| Partitive | -a/-ae | Partial / negated object | +| Accusative | -t | **PERSONAL PRONOUNS ONLY** (minut, sinut, haenet, meidaet, teidaet, heidaet) | +| Inessive | -ssa/-ssae | "in" -- TEKAMOLO Lokal | +| Elative | -sta/-stae | "from inside" -- TEKAMOLO Lokal/Source | +| Illative | -Vn/-hVn/-seen | "into" -- TEKAMOLO Lokal/Goal | +| Adessive | -lla/-llae | "at/by" -- TEKAMOLO Modal/Lokal | +| Ablative | -lta/-ltae | "from" -- TEKAMOLO Source | +| Allative | -lle | "to" -- TEKAMOLO Goal/Beneficiary | +| Essive | -na/-nae | "as" -- TEKAMOLO Modal (state) | +| Translative | -ksi | "into being" -- TEKAMOLO Modal/Purpose | +| Instructive | -in | "by means of" -- TEKAMOLO Modal | +| Abessive | -tta/-ttae | "without" -- TEKAMOLO Modal (negative) | +| Comitative | -ne- | "with" -- TEKAMOLO Modal/Companion | + +NOTE: prior `grammar-tiered-routing.md` mapped Accusative `-n/-t` -> Object generally; that was a Latinate transplant. True Accusative is personal-pronoun-only; nominal total object is Nom (pl) or Gen (sg). + +### Russian (6 cases) +| Case | Suffix sg masc/fem/neut | Role | |---|---|---| -| Nominativ | der / die / das / die | Subject | -| Genitiv | des / der / des / der | Possessor | -| Dativ | dem / der / dem / den | Recipient / Lokal with spatial prep | -| Akkusativ | den / die / das / die | Object | - -Wechsel-prepositions (an, auf, hinter, in, neben, über, unter, vor, -zwischen) govern Dativ (static) or Akkusativ (directional). German -Dativ + `mit` = Modal; German Akkusativ + `durch` = Path. 
- -### 4.4 Turkish — 6 cases + agglutinative chain - -| Case | Suffix | Role | -|---|---|---| -| Nominative | `-∅` | Subject | -| Accusative | `-i / -ı / -u / -ü` | Object | -| Dative | `-e / -a` | Goal / direction | -| Locative | `-de / -da / -te / -ta` | LO | -| Ablative | `-den / -dan / -ten / -tan` | Source | -| Genitive | `-in / -ın / -un / -ün` | Possessor | - -Agglutinative stacking: `ev-ler-imiz-de-y-diler` = "they were at our -houses" = 6 morphemes (house + plural + 1pl-possessor + locative + -buffer + past-3pl-copula). - -### 4.5 Japanese — particles (no cases) - -| Particle | Role | -|---|---| -| が (ga) | Subject marker | -| を (wo) | Object marker | -| に (ni) | Dative / Lokal / Temporal | -| で (de) | Lokal (locative-instrumental) / Modal | -| へ (he) | Directional (goal) | -| と (to) | Comitative / quotative | -| から (kara) | Source (ablative) | -| まで (made) | Terminative | - -Particles attach post-positionally and commit grammatical role as -cleanly as case endings. No gender, no case paradigms — pronouns -usually dropped (verb-agreement-free language). +| Nominative | -0 / -a,-ya / -o,-e | Subject | +| Genitive | -a,-ya / -y,-i / -a,-ya | Possessor; negated object; partitive | +| Dative | -u,-yu / -e,-i / -u,-yu | Recipient -- TEKAMOLO Kausal | +| Accusative | =Nom (inan) / =Gen (anim) / -u,-yu / -o,-e | Direct object | +| **Instrumental** | -om,-em / -oy,-ey / -om,-em | Means/agent -- TEKAMOLO Modal | +| Prepositional | -e / -e,-i / -e | Governed by v/na/o -- TEKAMOLO Lokal/Temporal | -### 4.6 Hebrew / Arabic — no cases, root-pattern morphology +### German (4 cases) +Nom (subject) / Gen (possessor) / Dat (indirect object, "mit + Dat" = TEKAMOLO Modal) / Akk (direct object). -Semitic languages carry grammatical information in a trilateral -consonantal root (K-T-B = write) + vocalic template pattern -(katab = "he wrote", yaktub = "he writes", maktub = "written"). -Role is determined by the **template**, not by suffix. 
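A sketch of the German Wechsel-preposition rule from the removed §4.3 (static Dativ vs directional Akkusativ; `mit` + Dativ = Modal; `durch` + Akkusativ = Path) expressed as a slot lookup. `slot_for`, `Slot`, and mapping directional Akkusativ to a Goal slot are assumptions for illustration, not shipped code.

```rust
/// Case governed by the preposition in context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Case {
    Dativ,
    Akkusativ,
}

/// Hypothetical slot labels for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Slot {
    Lokal, // static location
    Goal,  // direction / destination
    Modal,
    Path,
}

/// The nine Wechsel-prepositions listed in the text.
const WECHSEL: [&str; 9] = ["an", "auf", "hinter", "in", "neben", "über", "unter", "vor", "zwischen"];

fn slot_for(prep: &str, case: Case) -> Option<Slot> {
    match (prep, case) {
        ("mit", Case::Dativ) => Some(Slot::Modal),
        ("durch", Case::Akkusativ) => Some(Slot::Path),
        (p, Case::Dativ) if WECHSEL.contains(&p) => Some(Slot::Lokal),
        (p, Case::Akkusativ) if WECHSEL.contains(&p) => Some(Slot::Goal),
        _ => None, // not covered by this sketch
    }
}

fn main() {
    // "in dem Haus" (Dativ, static) vs "in das Haus" (Akkusativ, directional)
    println!("{:?}", slot_for("in", Case::Dativ));
    println!("{:?}", slot_for("in", Case::Akkusativ));
    println!("{:?}", slot_for("mit", Case::Dativ));
}
```

The same preposition yields two slots depending only on the case ending, which is why case morphology can commit a TEKAMOLO slot without any discourse context.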
+### Turkish (agglutinative) +Nom -0 / Gen -in / Dat -e / Acc -i / Loc -de / Abl -den. Suffix order: stem + plural + possessive + case + question. *evlerimizdeydiler* = ev-ler-imiz-de-y-di-ler ("they were at our houses"). -For OSINT the Semitic handling needs its own harvest pass (root + -pattern tables) — out of scope for this PR. +### Japanese (particles) +ga (subject) / wo (object) / ni (dative/locative) / de (instrumental/locative) / he (directional) / to (companion) / kara (source) / made (terminus). Particle replaces case morphology. ---- +NOTE: each language uses native terminology. Latinate labels can mislead -- Finnish Accusative != Russian Accusative != German Akkusativ in scope. -## §5 Pronoun-Feature Commitment (Second Orthogonal Axis) +## 6. YAML Templates Pipeline (future, NOT in current PR) +Target: 200-500 TEKAMOLO templates per priority language as training pairs for the local 90-99% tier. Out of scope for the current PR. -Morphological slot-filling (§4) and pronoun-feature commitment are -**two orthogonal axes**. A language can be rich in one and poor in -the other. For coreference resolution both matter. +## 7. Pronoun Classes +**Fixed** (axiomatic features): I/you/he/she/they/proper-names. Feature filter over +/-5 candidates IS the resolution. Cheap, permanent. +**Wechsel** (zero inherent commitment): it/that/this/which/one/singular-they. Need full meta-inference (CF axis x Markov axis x cross-lingual bundle). 
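A minimal sketch of the Fixed-pronoun case ("feature filter over +/-5 candidates IS the resolution"): keep mentions within the window whose number and gender agree with the pronoun. All types and function names here are hypothetical. Modelling Finnish *hän* as gender-uncommitted shows why it leaves more survivors than English *she*.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Number {
    Sg,
    Pl,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Gender {
    Masc,
    Fem,
    Neut,
}

/// A candidate antecedent found in sentence `sentence`.
#[derive(Debug, Clone)]
struct Mention {
    text: &'static str,
    sentence: usize,
    number: Number,
    gender: Option<Gender>, // None = gender not committed
}

/// Features a Fixed pronoun commits to; `gender: None` models a
/// gender-neutral pronoun such as Finnish "hän".
struct Pronoun {
    number: Number,
    gender: Option<Gender>,
}

/// Keep mentions within +/-`radius` sentences whose features agree.
fn candidates<'a>(p: &Pronoun, at: usize, mentions: &'a [Mention], radius: usize) -> Vec<&'a Mention> {
    mentions
        .iter()
        .filter(|m| {
            at.abs_diff(m.sentence) <= radius
                && m.number == p.number
                && match (p.gender, m.gender) {
                    (Some(pg), Some(mg)) => pg == mg,
                    _ => true, // an uncommitted side cannot rule a candidate out
                }
        })
        .collect()
}

fn names(v: &[&Mention]) -> Vec<&'static str> {
    v.iter().map(|m| m.text).collect()
}

fn main() {
    let mentions = [
        Mention { text: "Alice", sentence: 1, number: Number::Sg, gender: Some(Gender::Fem) },
        Mention { text: "Bob", sentence: 2, number: Number::Sg, gender: Some(Gender::Masc) },
        Mention { text: "the reports", sentence: 2, number: Number::Pl, gender: None },
        Mention { text: "Carol", sentence: 10, number: Number::Sg, gender: Some(Gender::Fem) },
    ];
    let she = Pronoun { number: Number::Sg, gender: Some(Gender::Fem) };
    let haen = Pronoun { number: Number::Sg, gender: None };
    println!("she -> {:?}", names(&candidates(&she, 3, &mentions, 5)));
    println!("hän -> {:?}", names(&candidates(&haen, 3, &mentions, 5)));
}
```

When exactly one candidate survives, the filter alone resolves the pronoun; two or more survivors (as with *hän* here) is the point where the cross-lingual bundle or Markov axis has to take over.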
-| Language | Morphology slot-filling | Pronoun feature commitment | +Cross-linguistic commitment profile: +| Lang | Morphology | Pronoun features | |---|---|---| -| English | **weak** (word order only) | moderate (he/she/it on 3sg) | -| German | moderate (4 cases) | **strong** (er/sie/es + full case paradigm) | -| Russian | heavy (6 cases) | **strong** (он/она/оно + full case paradigm) | -| Finnish | **very heavy** (15 cases) | **weak** (single `hän` for he/she — gender-neutral) | -| Japanese | agglutinative particles | **minimal** (usually dropped) | -| Turkish | agglutinative | weak (single `o` for he/she/it) | +| English | weak | moderate (he/she/it on 3sg) | +| German | moderate (4) | strong (er/sie/es + case) | +| Russian | heavy (6) | strong (on/ona/ono + full case) | +| Finnish | very heavy (15) | weak (single haen gender-neutral) | +| Japanese | particles | minimal (often dropped) | +| Turkish | agglutinative | weak (single o for he/she/it) | -**Counter-intuitive reversal:** Finnish is easiest on morphological -slot-filling (98%+ coverage per `grammar-tiered-routing.md` §Morph) -but NOT on pronoun feature resolution (`hän` is gender-neutral). -German and Russian are richest on pronoun features; they commit -gender AND case on every 3rd-person pronoun. +Finnish: easiest morphology, hardest pronoun-features. Cross-lingual EN+DE+RU+FI bundle = complementary quartet. -**Cross-lingual bundle strategy (EN + DE + RU + FI):** each language -contributes where its commitment is strongest. Bundle maximises both -axes simultaneously. This is the VSA-native coref superpower — -parse the same entity in four languages, XOR-bundle the trajectories, -let each language's morphology collapse the others' ambiguities. - ---- - -## §6 Markov ±5 as the Context Upgrade - -Pre-Markov reasoning unit = **sentence** (Grammar Triangle at -window=1). Isolated. Fragile on Wechsel. - -Post-Markov reasoning unit = **trajectory** (ContextCrystal at -window=5). 
NARS reasons about "this sentence in this flow," not -about isolated sentences. - -This is the **context dimension upgrade** to NARS + SPO 2³ + TEKAMOLO. -The 144-verb taxonomy (12 semantic families × 12 tense/aspect -variants) carries a TEKAMOLO slot prior per cell: - -| Verb family | Expected TEKAMOLO profile | +## 8. Markov vs Counterfactual Axes -- when each is primary +| Situation | Primary axis | |---|---| -| BECOMES | Temporal + Modal | -| CAUSES | Subject + Object + Kausal | -| TRANSFERS | Subject + Object + Goal + (optional Kausal) | -| GROUNDS | Object + Lokal | -| TRANSFORMS | Object + Temporal + Modal | - -× 12 tense/aspect = **144 verb-role cells, each a slot-filling policy**. -Parse becomes cell-lookup + morphology-driven column fill + NARS -truth-merge. - ---- - -## §7 Convergence Target — DeepNSM as Shared Extraction Engine +| Heavy morphology + clear discourse | Neither -- Deduction closes | +| Heavy morphology + weird discourse | Markov | +| Light morphology + clear discourse | Counterfactual | +| Light morphology + weird discourse | Both weak -> FailureTicket | +| Cross-lingual bundle available | Bundle collapses CF | -### 7.1 Today's state - -- DeepNSM: 6-state PoS FSM, 4,096 COCA, SpoTriple packed u64, - <10 µs/sentence. **85% English SVO.** -- Grammar Triangle: NSM × Causality × Qualia → fingerprint. Shipped, - unwired from DeepNSM. -- Three language stacks: Rust (canonical), Python (reference), - TypeScript (UI). Doing the same thing with different depth. - -### 7.2 Target state (this PR) - -- DeepNSM emits `FailureTicket` when coverage < 0.9 (D2). -- DeepNSM calls `GrammarTriangle::analyze(text)` via `triangle_bridge.rs` - (D3). Output: `SpoWithGrammar { triples, causality, nsm_field, - qualia_signature, classification_distance }`. -- Markov ±5 trajectory bundled via role-indexed VSA (D5). -- `ContextChain` supports coherence / replay / disambiguate with - Mexican-hat kernel (D4). 
-- Role keys (SUBJECT / PREDICATE / OBJECT / TEKAMOLO / Finnish cases - / tenses / NARS inferences) as contiguous `[start:stop]` slices - in 10K VSA (D6). - -### 7.3 Target state (future PRs, documented here for bootloading) - -- **CausalityFlow 3→9 slots extension** (§3). `beneficiary`, `goal`, - `source`, `modal`, `local`, `instrument` added as `Option`. -- **200–500 YAML templates per language** (§3.6). Fine-tune small - LLM to emit slot-filled templates instead of free text. -- **D3 triangle wired to TypeScript grammar-awareness.ts** — cross- - stack unification. -- **D8 story-context bridge** — Markov ±5 trajectory feeds AriGraph - `EpisodicMemory::add_with_trajectory`; `TripletGraph::story_vector` - + graph direct lookup superpose for coref escalation. -- **D10 forward-validation harness** — Animal Farm benchmark, - NARS-tested epiphanies against future arc. -- **Named Entity pre-pass** (the biggest OSINT blocker — flagged in - `grammar-tiered-routing.md` §C8). - ---- - -## §8 Cross-References (agents: load these together) - -| Doc | Covers | -|---|---| -| `grammar-tiered-routing.md` | 5-criterion detector, failure decomposition, morphology coverage baseline (98% Finnish > 85% English), self-improving loop | -| `linguistic-epiphanies-2026-04-19.md` | E13–E27 cross-repo harvest: Chomsky isomorphism, Σ10 Rubicon tiers, method grammar, Markov living frame, resonanzsiebe, sigma_rosetta, 4D glyph coordinates, membrane | -| `cross-repo-harvest-2026-04-19.md` | H1–H14: Born rule, phase tag threshold, interference truth, Grammar Triangle ≡ ContextCrystal(w=1), NSM ≡ SPO axes, FP_WORDS=160, Mexican-hat, Int4State, Glyph5B, Crystal4K 41:1, teleport F=1, 144-verb, Three Mountains | -| `integration-plan-grammar-crystal-arigraph.md` | E1–E12 shipping plan for the contract grammar + crystal modules | -| `crystal-quantum-blueprints.md` | Crystal mode (bundled Markov SPO) vs Quantum mode (holographic residual) on the same 10K substrate | -| `endgame-holographic-agi.md` | 
5-layer north-star stack, 12-step holographic memory loop, P0–P3 priorities | -| `fractal-codec-argmax-regime.md` | **Separate research thread** — MFDFA on Hadamard-rotated coefficients as a codec leaf. Not entangled with grammar work. | +## 9. The 144 Verb-Role Taxonomy +12 semantic families x 12 tense/aspect/mood variants = 144 cells. Each cell = TEKAMOLO slot prior. 5^5 = 3125 Structured5x5 cells > 144, so the 144-cell index fits with ~21x headroom; note that 144 x ~10 x ~10 = ~14,400 exceeds 3125 and would NOT fit. ---- +Families: BECOMES, CAUSES, SUPPORTS, CONTRADICTS, REFINES, GROUNDS, ABSTRACTS, ENABLES, PREVENTS, TRANSFORMS, MIRRORS, DISSOLVES. +Tense/aspect/mood: present, past, future, perfect, continuous, pluperfect, future-perfect, habitual, potential, imperative, subjunctive, gerund. -## §9 How DeepNSM Must Change (Minimal-Diff Summary) +## 10. Out of Scope This PR +Path 2 (holographic residue), CausalityFlow extension (modal/local/instrument), FP_WORDS=160 migration, Crystal4K persistence (H10), Int4State upper-nibble (H8), Glyph5B wide-container (H9), NER pre-pass for proper nouns, Cockpit Cypher, chess vertical, 200-500 YAML templates per language. -1. Read `SentenceStructure` from the FSM (existing). -2. Call `GrammarTriangle::analyze(text)` in parallel (new via D3). -3. Merge: SPO triples + CausalityFlow slots + NSM field + qualia - signature → `SpoWithGrammar`. -4. Compute coverage from the 5-criterion detector. -5. If coverage < 0.9 → emit `FailureTicket` via `ticket_emit.rs` (D2). -6. Else → role-indexed VSA bundle into `Trajectory` (D5) using D6 - role keys. -7. `Trajectory.fingerprint IS SentenceCrystal.fingerprint`. No new - crystal type. Feeds AriGraph episodic + triplet graph as before. +## 11. Caveats -- Templates, Not Universals +NSM 65 primes (Wierzbicka): cited by cogsci, contested in mainstream linguistics; doesn't survive empirical testing on polysynthetic languages. +TEKAMOLO: German pedagogical mnemonic, NOT cross-linguistic universal. Arabic *hal*, Mandarin *ba* don't fit.
+Chomskyan UG: Tomasello, Evans & Levinson argue empirical weakness. +144-verb taxonomy: numerologically chosen (12 x 12 from sigma_rosetta), not empirically derived. -Total DeepNSM change: ~600 LOC new + ~30 LOC edits. Every other -layer of the stack reuses shipped infrastructure. +These are useful templates for engineering a 90-99% local parser. They are NOT theoretical claims about human language universals. The architecture works because the templates are good enough for the bounded task; we make no claim beyond that. diff --git a/crates/deepnsm/Cargo.toml b/crates/deepnsm/Cargo.toml index 9723261c..ed5c842d 100644 --- a/crates/deepnsm/Cargo.toml +++ b/crates/deepnsm/Cargo.toml @@ -18,7 +18,14 @@ No GPU. No learned weights. Same decision boundaries as cosine. # simd_amx.rs based on hardware. This crate USES exported types like # F32x16 via their public Add/Sub/Mul impls and mul_add primitive; it # never touches backend optimization files. +[features] +default = [] +contract-ticket = ["dep:lance-graph-contract"] +grammar-triangle = ["dep:lance-graph-cognitive"] + [dependencies] ndarray = { path = "../../../ndarray", default-features = false, features = ["std"] } +lance-graph-contract = { path = "../lance-graph-contract", optional = true } +lance-graph-cognitive = { path = "../lance-graph-cognitive", optional = true } [dev-dependencies] diff --git a/crates/deepnsm/assets/grammar_styles/analytical.yaml b/crates/deepnsm/assets/grammar_styles/analytical.yaml new file mode 100644 index 00000000..86ebf96d --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/analytical.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Rule-clear, depth-first, English+Finnish morphology, low ambiguity tolerance. 
+style: analytical +nars: + primary: Deduction + fallback: Abduction +morphology: + tables: [english_svo, finnish_case_table] + agglutinative_mode: false +tekamolo: + priority: [temporal, lokal, kausal, modal] + require_fillable: true +markov: + radius: 5 + kernel: uniform + replay: forward +spo_causal: + pearl_mask: 0x01 + ambiguity_tolerance: 0.1 +coverage: + local_threshold: 0.90 + escalate_below: 0.85 diff --git a/crates/deepnsm/assets/grammar_styles/convergent.yaml b/crates/deepnsm/assets/grammar_styles/convergent.yaml new file mode 100644 index 00000000..44471404 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/convergent.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Convergent: collapse alternatives quickly, deduce with high confidence. +style: convergent +nars: + primary: Deduction + fallback: Revision +morphology: + tables: [english_svo, finnish_case_table] + agglutinative_mode: false +tekamolo: + priority: [temporal, kausal, lokal, modal] + require_fillable: true +markov: + radius: 3 + kernel: uniform + replay: forward +spo_causal: + pearl_mask: 0x01 + ambiguity_tolerance: 0.05 +coverage: + local_threshold: 0.92 + escalate_below: 0.88 diff --git a/crates/deepnsm/assets/grammar_styles/creative.yaml b/crates/deepnsm/assets/grammar_styles/creative.yaml new file mode 100644 index 00000000..b07add6b --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/creative.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Creative: cross-domain synthesis, mexican-hat to suppress mid-distance noise. 
+style: creative +nars: + primary: Synthesis + fallback: CounterfactualSynthesis +morphology: + tables: [english_svo, finnish_case_table, japanese_particles] + agglutinative_mode: true +tekamolo: + priority: [modal, kausal, temporal, lokal] + require_fillable: false +markov: + radius: 5 + kernel: mexican_hat + replay: both_and_compare +spo_causal: + pearl_mask: 0x7F + ambiguity_tolerance: 0.35 +coverage: + local_threshold: 0.75 + escalate_below: 0.55 diff --git a/crates/deepnsm/assets/grammar_styles/deliberate.yaml b/crates/deepnsm/assets/grammar_styles/deliberate.yaml new file mode 100644 index 00000000..cc72623f --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/deliberate.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Deliberate: slow, methodical, full TEKAMOLO required, both-direction replay. +style: deliberate +nars: + primary: Deduction + fallback: Revision +morphology: + tables: [english_svo, finnish_case_table, german_case_table, russian_case_table] + agglutinative_mode: false +tekamolo: + priority: [temporal, kausal, modal, lokal] + require_fillable: true +markov: + radius: 5 + kernel: gaussian + replay: both_and_compare +spo_causal: + pearl_mask: 0x07 + ambiguity_tolerance: 0.10 +coverage: + local_threshold: 0.92 + escalate_below: 0.85 diff --git a/crates/deepnsm/assets/grammar_styles/diffuse.yaml b/crates/deepnsm/assets/grammar_styles/diffuse.yaml new file mode 100644 index 00000000..e4fd358c --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/diffuse.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Diffuse: wide context, induction-leaning, low fillable strictness. 
+style: diffuse +nars: + primary: Induction + fallback: Synthesis +morphology: + tables: [english_svo, finnish_case_table, russian_case_table, german_case_table] + agglutinative_mode: true +tekamolo: + priority: [modal, lokal, temporal, kausal] + require_fillable: false +markov: + radius: 5 + kernel: mexican_hat + replay: both_and_compare +spo_causal: + pearl_mask: 0x3F + ambiguity_tolerance: 0.40 +coverage: + local_threshold: 0.70 + escalate_below: 0.55 diff --git a/crates/deepnsm/assets/grammar_styles/divergent.yaml b/crates/deepnsm/assets/grammar_styles/divergent.yaml new file mode 100644 index 00000000..a9489ae1 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/divergent.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Divergent: broad fan-out, counterfactuals welcome, full pearl mask. +style: divergent +nars: + primary: CounterfactualSynthesis + fallback: Synthesis +morphology: + tables: [english_svo, finnish_case_table, russian_case_table, turkish_aggl] + agglutinative_mode: true +tekamolo: + priority: [modal, kausal, lokal, temporal] + require_fillable: false +markov: + radius: 5 + kernel: mexican_hat + replay: both_and_compare +spo_causal: + pearl_mask: 0xFF + ambiguity_tolerance: 0.45 +coverage: + local_threshold: 0.65 + escalate_below: 0.45 diff --git a/crates/deepnsm/assets/grammar_styles/exploratory.yaml b/crates/deepnsm/assets/grammar_styles/exploratory.yaml new file mode 100644 index 00000000..4141b976 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/exploratory.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Exploratory: counterfactual primary, agglutinative on, broad coverage tolerance. 
+style: exploratory +nars: + primary: CounterfactualSynthesis + fallback: Abduction +morphology: + tables: [english_svo, finnish_case_table, russian_case_table] + agglutinative_mode: true +tekamolo: + priority: [modal, kausal, lokal, temporal] + require_fillable: false +markov: + radius: 5 + kernel: mexican_hat + replay: both_and_compare +spo_causal: + pearl_mask: 0xFF + ambiguity_tolerance: 0.4 +coverage: + local_threshold: 0.70 + escalate_below: 0.50 diff --git a/crates/deepnsm/assets/grammar_styles/focused.yaml b/crates/deepnsm/assets/grammar_styles/focused.yaml new file mode 100644 index 00000000..f50dabbd --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/focused.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Focused: tight Markov radius, single morphology table, deduction-first. +style: focused +nars: + primary: Deduction + fallback: Revision +morphology: + tables: [english_svo] + agglutinative_mode: false +tekamolo: + priority: [temporal, lokal, kausal, modal] + require_fillable: true +markov: + radius: 2 + kernel: gaussian + replay: forward +spo_causal: + pearl_mask: 0x01 + ambiguity_tolerance: 0.05 +coverage: + local_threshold: 0.95 + escalate_below: 0.90 diff --git a/crates/deepnsm/assets/grammar_styles/intuitive.yaml b/crates/deepnsm/assets/grammar_styles/intuitive.yaml new file mode 100644 index 00000000..08f922d6 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/intuitive.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Intuitive: leap to abduction, soft thresholds, agglutinative for surface tells. 
+style: intuitive +nars: + primary: Abduction + fallback: Synthesis +morphology: + tables: [english_svo, finnish_case_table, turkish_aggl] + agglutinative_mode: true +tekamolo: + priority: [modal, temporal, kausal, lokal] + require_fillable: false +markov: + radius: 4 + kernel: gaussian + replay: forward +spo_causal: + pearl_mask: 0x0F + ambiguity_tolerance: 0.30 +coverage: + local_threshold: 0.78 + escalate_below: 0.60 diff --git a/crates/deepnsm/assets/grammar_styles/metacognitive.yaml b/crates/deepnsm/assets/grammar_styles/metacognitive.yaml new file mode 100644 index 00000000..f1f2ade3 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/metacognitive.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Metacognitive: revision-primary, both-direction replay, mid-tolerance for self-check. +style: metacognitive +nars: + primary: Revision + fallback: Abduction +morphology: + tables: [english_svo, finnish_case_table, russian_case_table] + agglutinative_mode: false +tekamolo: + priority: [kausal, modal, temporal, lokal] + require_fillable: false +markov: + radius: 5 + kernel: mexican_hat + replay: both_and_compare +spo_causal: + pearl_mask: 0x3F + ambiguity_tolerance: 0.25 +coverage: + local_threshold: 0.80 + escalate_below: 0.65 diff --git a/crates/deepnsm/assets/grammar_styles/peripheral.yaml b/crates/deepnsm/assets/grammar_styles/peripheral.yaml new file mode 100644 index 00000000..de413de1 --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/peripheral.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Peripheral: scan the edges, low-radius mexican-hat catches outliers. 
+style: peripheral +nars: + primary: Abduction + fallback: Extrapolation +morphology: + tables: [english_svo, finnish_case_table, japanese_particles] + agglutinative_mode: true +tekamolo: + priority: [lokal, temporal, modal, kausal] + require_fillable: false +markov: + radius: 4 + kernel: mexican_hat + replay: backward +spo_causal: + pearl_mask: 0x1F + ambiguity_tolerance: 0.35 +coverage: + local_threshold: 0.72 + escalate_below: 0.58 diff --git a/crates/deepnsm/assets/grammar_styles/systematic.yaml b/crates/deepnsm/assets/grammar_styles/systematic.yaml new file mode 100644 index 00000000..e7de802c --- /dev/null +++ b/crates/deepnsm/assets/grammar_styles/systematic.yaml @@ -0,0 +1,22 @@ +# (starter prior — tune empirically) +# Systematic: methodical traversal, full TEKAMOLO, gaussian-weighted Markov. +style: systematic +nars: + primary: Deduction + fallback: Induction +morphology: + tables: [english_svo, finnish_case_table, german_case_table] + agglutinative_mode: false +tekamolo: + priority: [temporal, kausal, modal, lokal] + require_fillable: true +markov: + radius: 5 + kernel: gaussian + replay: forward +spo_causal: + pearl_mask: 0x03 + ambiguity_tolerance: 0.15 +coverage: + local_threshold: 0.88 + escalate_below: 0.82 diff --git a/crates/deepnsm/src/lib.rs b/crates/deepnsm/src/lib.rs index f715843a..5a4eba34 100644 --- a/crates/deepnsm/src/lib.rs +++ b/crates/deepnsm/src/lib.rs @@ -61,6 +61,15 @@ pub mod similarity; pub mod spo; pub mod vocabulary; +pub mod trajectory; +pub mod markov_bundle; + +#[cfg(feature = "contract-ticket")] +pub mod ticket_emit; + +#[cfg(feature = "grammar-triangle")] +pub mod triangle_bridge; + // ─── Re-exports ────────────────────────────────────────────────────────────── pub use pipeline::DeepNsmEngine; diff --git a/crates/deepnsm/src/markov_bundle.rs b/crates/deepnsm/src/markov_bundle.rs new file mode 100644 index 00000000..262b914c --- /dev/null +++ b/crates/deepnsm/src/markov_bundle.rs @@ -0,0 +1,165 @@ +//! 
META-AGENT: add `pub mod markov_bundle;` to lib.rs. + +use crate::trajectory::Trajectory; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] +pub enum Kernel { + Uniform, + #[default] + MexicanHat, + Gaussian, +} + +impl Kernel { + pub fn weight(&self, delta: i32, radius: u32) -> f32 { + let d = delta.abs() as f32 / radius.max(1) as f32; + match self { + Self::Uniform => 1.0, + Self::MexicanHat => (1.0 - d * d) * (-(d * d) / 2.0).exp(), + Self::Gaussian => (-(d * d) / 2.0).exp(), + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum GrammaticalRole { + Subject, + Predicate, + Object, + Modifier, + Context, + Temporal, + Kausal, + Modal, + Lokal, + Instrument, +} + +impl GrammaticalRole { + /// Slice of the 16384-dim VSA carrier that owns this role. + pub fn slice(&self) -> (usize, usize) { + match self { + Self::Subject => (0, 3277), + Self::Predicate => (3277, 6554), + Self::Object => (6554, 9830), + Self::Modifier => (9830, 13107), + Self::Context => (13107, 16384), + // TEKAMOLO sub-slices inside Context band. 
+ Self::Temporal => (13107, 13762), + Self::Kausal => (13762, 14418), + Self::Modal => (14418, 15074), + Self::Lokal => (15074, 15729), + Self::Instrument => (15729, 16384), + } + } +} + +#[derive(Debug, Clone)] +pub struct TokenWithRole { + pub content_fp: Vec<f32>, + pub role: GrammaticalRole, +} + +#[derive(Debug, Clone)] +pub struct WindowedSentence { + pub tokens: Vec<TokenWithRole>, +} + +pub struct MarkovBundler { + pub radius: u32, + pub kernel: Kernel, + pub dims: usize, + buffer: std::collections::VecDeque<WindowedSentence>, +} + +impl MarkovBundler { + pub fn new(radius: u32, kernel: Kernel) -> Self { + Self { + radius, + kernel, + dims: 16_384, + buffer: std::collections::VecDeque::with_capacity((2 * radius + 1) as usize), + } + } + + pub fn push(&mut self, sentence: WindowedSentence) -> Option<Trajectory> { + let cap = (2 * self.radius + 1) as usize; + if self.buffer.len() == cap { + self.buffer.pop_front(); + } + self.buffer.push_back(sentence); + if self.buffer.len() < cap { + return None; + } + Some(self.bundle_current()) + } + + fn bundle_current(&self) -> Trajectory { + let mut acc = vec![0.0f32; self.dims]; + let focal = self.radius as i32; + for (i, sent) in self.buffer.iter().enumerate() { + let delta = (i as i32) - focal; + let weight = self.kernel.weight(delta, self.radius); + for tok in &sent.tokens { + let (start, stop) = tok.role.slice(); + let len = (stop - start).min(tok.content_fp.len()); + for k in 0..len { + acc[start + k] += weight * tok.content_fp[k]; + } + } + } + // Rotate the whole bundle right by `radius`. NOTE: this is a constant + // shift (every trajectory gets the same one), so it encodes no + // per-sentence position and moves each role slice right by `radius`; + // callers slicing with `GrammaticalRole::slice()` must undo it. + if !acc.is_empty() { + let k = (self.radius as usize) % acc.len(); + acc.rotate_right(k); + } + Trajectory { + fingerprint: acc, + radius: self.radius, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + fn tok(role: GrammaticalRole, len: usize) -> TokenWithRole { + TokenWithRole { + content_fp: vec![1.0; len], + role, + } + } + #[test] + fn first_pushes_return_none_until_window_full() { + let mut b = MarkovBundler::new(5, Kernel::MexicanHat); + for _ in 0..10 {
assert!(b + .push(WindowedSentence { + tokens: vec![tok(GrammaticalRole::Subject, 4)] + }) + .is_none()); + } + assert!(b + .push(WindowedSentence { + tokens: vec![tok(GrammaticalRole::Subject, 4)] + }) + .is_some()); + } + #[test] + fn kernel_uniform_constant() { + assert_eq!(Kernel::Uniform.weight(0, 5), 1.0); + assert_eq!(Kernel::Uniform.weight(3, 5), 1.0); + } + #[test] + fn kernel_mexican_symmetric() { + assert!( + (Kernel::MexicanHat.weight(-2, 5) - Kernel::MexicanHat.weight(2, 5)).abs() < 1e-6 + ); + } + #[test] + fn role_slices_disjoint() { + let s = GrammaticalRole::Subject.slice(); + let p = GrammaticalRole::Predicate.slice(); + assert_eq!(s.1, p.0); + } +} diff --git a/crates/deepnsm/src/parser.rs b/crates/deepnsm/src/parser.rs index 73dc70d0..a8a018ff 100644 --- a/crates/deepnsm/src/parser.rs +++ b/crates/deepnsm/src/parser.rs @@ -440,6 +440,167 @@ pub fn parse_with_secondary(tokens: &[Token]) -> SentenceStructure { result } +// ──────────────────────────────────────────────────────────────────────── +// Coverage-branch hook for D2 FailureTicket emission. +// +// Wraps the existing free-function parser in a thin newtype that owns the +// coverage threshold (default 0.85; configurable later from D7 +// `GrammarStyleConfig`). When a parse falls below threshold, the hook +// hands the partial off to `ticket_emit::emit_ticket` so the LLM-tail +// router can route the failure-mode itself as the inference signal. +// +// `Parser::parse` is preserved verbatim against the free `parse()` so no +// existing call sites break. +// ──────────────────────────────────────────────────────────────────────── + +/// Default coverage threshold below which a parse triggers a FailureTicket. +/// Mirrors `lance_graph_contract::grammar::LOCAL_COVERAGE_THRESHOLD` (0.9) +/// minus a small slack so DeepNSM's looser FSM gets a chance. 
+pub const DEFAULT_COVERAGE_THRESHOLD: f32 = 0.85; + +/// A parse outcome plus the metrics needed to decide whether it should +/// escalate to the LLM router. +#[derive(Clone, Debug)] +pub struct ParseResult { + /// Token-derived semantic structure. + pub structure: SentenceStructure, + /// Coverage ∈ [0, 1]: in-vocabulary (ranked) tokens / total tokens. + pub coverage: f32, + /// Vocabulary ranks of the tokens that have one (in-vocabulary). + pub resolved_tokens: Vec<u16>, + /// One `0` placeholder per token without a vocabulary rank (OOV); + /// only the count is meaningful. + pub unresolved_tokens: Vec<u16>, + /// NSM-prime count found in the resolved set. Drives Abduction routing. + pub primes_found: u8, + /// Distance vs. the SPO's expected qualia footprint (0.0 = identical). + /// Filled by `triangle_bridge::compute_classification_distance` once + /// the Triangle is wired; stays 0.0 in the bare-DeepNSM path. + pub classification_distance: f32, +} + +/// Newtype around the FSM parser. Owns the coverage threshold so the +/// LLM-tail policy is colocated with the parse decision instead of +/// scattered across call sites. +#[derive(Clone, Debug)] +pub struct Parser { + coverage_threshold: f32, +} + +impl Default for Parser { + fn default() -> Self { + Self { + coverage_threshold: DEFAULT_COVERAGE_THRESHOLD, + } + } +} + +impl Parser { + /// Construct with the default 0.85 threshold. + pub fn new() -> Self { + Self::default() + } + + /// Override the coverage threshold (D7 `GrammarStyleConfig` will feed + /// this once style-aware routing lands). + pub fn with_threshold(mut self, threshold: f32) -> Self { + self.coverage_threshold = threshold.clamp(0.0, 1.0); + self + } + + /// Current coverage threshold ∈ [0, 1]. + pub fn coverage_threshold(&self) -> f32 { + self.coverage_threshold + } + + /// Run the FSM and return the structure unchanged (preserves the + /// existing public `parse()` shape for callers that don't need + /// coverage metrics).
+ pub fn parse(&self, tokens: &[crate::vocabulary::Token]) -> SentenceStructure { + parse(tokens) + } + + /// Coverage-aware parse: returns the structure plus the metrics + /// `maybe_emit_ticket` needs. + pub fn parse_with_coverage(&self, tokens: &[crate::vocabulary::Token]) -> ParseResult { + let structure = parse(tokens); + + let mut resolved = Vec::new(); + let mut unresolved = Vec::new(); + let mut primes = 0u8; + for t in tokens { + match t.rank { + Some(r) => { + resolved.push(r); + // NSM primes occupy fixed low ranks in the COCA + // vocabulary (62/63 of them per lib.rs header). + // Treat r < 64 as a primes-found heuristic. + if r < 64 { + primes = primes.saturating_add(1); + } + } + // OOV: no rank to record, so push a 0 placeholder. + None => unresolved.push(0u16), + } + } + + let total = (resolved.len() + unresolved.len()) as f32; + let coverage = if total == 0.0 { + 0.0 + } else { + resolved.len() as f32 / total + }; + + ParseResult { + structure, + coverage, + resolved_tokens: resolved, + unresolved_tokens: unresolved, + primes_found: primes, + classification_distance: 0.0, + } + } + + /// Whether the result fell below the configured threshold. + pub fn coverage_failed(&self, parse_result: &ParseResult) -> bool { + parse_result.coverage < self.coverage_threshold + } + + /// D2 hook: if coverage falls below threshold, hand the partial off + /// to `ticket_emit::emit_ticket` and return the FailureTicket. Above + /// threshold returns `None` — the caller commits to AriGraph instead. + /// + /// Gated behind `contract-ticket` because the FailureTicket type + /// lives in `lance_graph_contract`. With the feature off, the method + /// is compiled out entirely, so minimal builds of the parser still + /// compile.
+ #[cfg(feature = "contract-ticket")] + pub fn maybe_emit_ticket( + &self, + parse_result: &ParseResult, + ) -> Option<lance_graph_contract::grammar::FailureTicket> { + if !self.coverage_failed(parse_result) { + return None; + } + use lance_graph_contract::grammar::{PartialParse, TekamoloSlots}; + let partial = PartialParse { + resolved_tokens: parse_result.resolved_tokens.clone(), + unresolved_tokens: parse_result.unresolved_tokens.clone(), + coverage: parse_result.coverage, + }; + // TekamoloSlots / Wechsel / CausalAmbiguity stay empty until D3 + // wires the Grammar Triangle. Caveat: `emit_ticket` treats empty + // TEKAMOLO slots as unfillable, so until then every ticket with + // `primes_found >= 4` routes to `CounterfactualSynthesis` and + // `classification_distance` is never consulted. + Some(crate::ticket_emit::emit_ticket( + partial, + parse_result.coverage, + parse_result.classification_distance, + parse_result.primes_found, + TekamoloSlots::default(), + Vec::new(), + None, + )) + } +} + #[cfg(test)] mod tests { use super::*; diff --git a/crates/deepnsm/src/ticket_emit.rs b/crates/deepnsm/src/ticket_emit.rs new file mode 100644 index 00000000..9ebd64b9 --- /dev/null +++ b/crates/deepnsm/src/ticket_emit.rs @@ -0,0 +1,181 @@ +//! META-AGENT: add `pub mod ticket_emit;` to lib.rs. Gate behind feature +//! `contract-ticket`. Also requires adding to Cargo.toml: +//! lance-graph-contract = { path = "../lance-graph-contract", optional = true } +//! [features] contract-ticket = ["dep:lance-graph-contract"] +//! +//! Emit FailureTicket from a partial DeepNSM parse for the LLM-tail router. +//! +//! See plan §D2 + grammar-tiered-routing.md "Combined failure ticket". +//! +//! Adapted to the actual `lance_graph_contract::grammar` surface: +//! - `PartialParse { resolved_tokens, unresolved_tokens, coverage }` +//! - `FailureTicket { partial_parse, attempted_inference, recommended_next, +//! causal_ambiguity, tekamolo, wechsel, coverage, missing_required }` +//! - `TekamoloSlots { temporal, kausal, modal, lokal }` (no `has_unfillable`). +//! +//! `recommended_next` decision rules — the failure-mode IS the routing +//! signal: +//! +//!
- `primes_found < 4` → `Abduction` (NSM-thin → LLM names primes) +//! - all four TEKAMOLO slots empty ("unfillable") → `CounterfactualSynthesis` (slots must be hypothesised) +//! - `classification_distance > 0.7` → `Extrapolation` (novel domain marker) +//! - else → `Revision` (default refinement) +//! +//! Rules are checked in this order; the first match wins. + +#![cfg(feature = "contract-ticket")] + +use lance_graph_contract::grammar::{ + CausalAmbiguity, FailureTicket, NarsInference, PartialParse, TekamoloSlots, + WechselAmbiguity, +}; + +/// Threshold above which `classification_distance` flags a novel domain. +pub const NOVEL_DOMAIN_THRESHOLD: f32 = 0.7; + +/// Minimum NSM primes required before a parse is considered semantically +/// thick enough to NOT need LLM abduction. +pub const PRIMES_NEEDED: u8 = 4; + +/// Decompose a parse coverage failure into the SPO × 2³ × TEKAMOLO × +/// Wechsel fields the LLM router needs. +/// +/// The caller checks coverage against its threshold first +/// (`Parser::coverage_failed`, default 0.85); above threshold, no ticket +/// is needed. Once we are here, we already know the parse failed +/// coverage; we only need to choose the routing inference and stash the +/// partial fields. +pub fn emit_ticket( + partial: PartialParse, + coverage_score: f32, + classification_distance: f32, + primes_found: u8, + tekamolo: TekamoloSlots, + wechsel: Vec<WechselAmbiguity>, + causal_ambiguity: Option<CausalAmbiguity>, +) -> FailureTicket { + let recommended = if primes_found < PRIMES_NEEDED { + NarsInference::Abduction + } else if has_unfillable(&tekamolo) { + NarsInference::CounterfactualSynthesis + } else if classification_distance > NOVEL_DOMAIN_THRESHOLD { + NarsInference::Extrapolation + } else { + NarsInference::Revision + }; + + FailureTicket { + partial_parse: partial, + attempted_inference: NarsInference::Deduction, + recommended_next: recommended, + causal_ambiguity, + tekamolo, + wechsel, + coverage: coverage_score, + missing_required: Vec::new(), + } +} + +/// The TEKAMOLO frame counts as "unfillable" when the parser has none of +/// the four adverbials filled.
Local copy of the rule until the contract surfaces +/// a more granular per-slot resolution flag. +fn has_unfillable(slots: &TekamoloSlots) -> bool { + slots.is_empty() +} + +#[cfg(test)] +mod tests { + use super::*; + use lance_graph_contract::grammar::wechsel::WechselRole; + + fn empty_partial() -> PartialParse { + PartialParse { + resolved_tokens: vec![1, 2], + unresolved_tokens: vec![3, 4], + coverage: 0.5, + } + } + + fn filled_tekamolo() -> TekamoloSlots { + TekamoloSlots { + temporal: Some((0, 1)), + kausal: Some((2, 3)), + modal: Some((4, 5)), + lokal: Some((6, 7)), + } + } + + #[test] + fn low_primes_routes_to_abduction() { + let t = emit_ticket( + empty_partial(), + 0.6, + 0.1, + 2, + filled_tekamolo(), + Vec::new(), + None, + ); + assert_eq!(t.recommended_next, NarsInference::Abduction); + assert_eq!(t.coverage, 0.6); + } + + #[test] + fn unfillable_tekamolo_routes_to_counterfactual_synthesis() { + let t = emit_ticket( + empty_partial(), + 0.6, + 0.1, + 5, + TekamoloSlots::default(), + Vec::new(), + None, + ); + assert_eq!(t.recommended_next, NarsInference::CounterfactualSynthesis); + } + + #[test] + fn high_classification_distance_routes_to_extrapolation() { + let t = emit_ticket( + empty_partial(), + 0.7, + 0.85, + 5, + filled_tekamolo(), + Vec::new(), + None, + ); + assert_eq!(t.recommended_next, NarsInference::Extrapolation); + } + + #[test] + fn default_path_is_revision() { + let t = emit_ticket( + empty_partial(), + 0.7, + 0.1, + 5, + filled_tekamolo(), + Vec::new(), + None, + ); + assert_eq!(t.recommended_next, NarsInference::Revision); + } + + #[test] + fn wechsel_payload_passes_through() { + let amb = WechselAmbiguity { + token_index: 3, + candidates: vec![WechselRole::PrepTemporal, WechselRole::PrepSpatial], + local_ambiguity: 0.85, + }; + let t = emit_ticket( + empty_partial(), + 0.6, + 0.1, + 2, + filled_tekamolo(), + vec![amb], + None, + ); + assert_eq!(t.wechsel.len(), 1); + assert_eq!(t.wechsel[0].token_index, 3); + } +} diff --git 
a/crates/deepnsm/src/trajectory.rs b/crates/deepnsm/src/trajectory.rs new file mode 100644 index 00000000..86047c0d --- /dev/null +++ b/crates/deepnsm/src/trajectory.rs @@ -0,0 +1,98 @@ +//! META-AGENT: add `pub mod trajectory;` to lib.rs. + +#[derive(Debug, Clone)] +pub struct Trajectory { + pub fingerprint: Vec<f32>, + pub radius: u32, +} + +impl Trajectory { + pub fn role_bundle(&self, start: usize, stop: usize) -> Vec<f32> { + let stop = stop.min(self.fingerprint.len()); + if start >= stop { + return Vec::new(); + } + self.fingerprint[start..stop].to_vec() + } + + pub fn role_candidates( + &self, + start: usize, + stop: usize, + codebook: &[Vec<f32>], + ) -> Vec<Candidate> { + let bundle = self.role_bundle(start, stop); + let mut scored: Vec<Candidate> = codebook + .iter() + .enumerate() + .map(|(i, entry)| { + let score = cosine(&bundle, entry); + Candidate { + codebook_index: i, + score, + } + }) + .filter(|c| c.score > 0.5) + .collect(); + scored.sort_by(|a, b| { + b.score + .partial_cmp(&a.score) + .unwrap_or(std::cmp::Ordering::Equal) + }); + scored.truncate(5); + scored + } +} + +fn cosine(a: &[f32], b: &[f32]) -> f32 { + let n = a.len().min(b.len()); + if n == 0 { + return 0.0; + } + let dot: f32 = a[..n].iter().zip(&b[..n]).map(|(x, y)| x * y).sum(); + let na: f32 = a[..n].iter().map(|x| x * x).sum::<f32>().sqrt(); + let nb: f32 = b[..n].iter().map(|x| x * x).sum::<f32>().sqrt(); + if na < 1e-9 || nb < 1e-9 { + 0.0 + } else { + dot / (na * nb) + } +} + +#[derive(Debug, Clone)] +pub struct Candidate { + pub codebook_index: usize, + pub score: f32, +} + +#[cfg(test)] +mod tests { + use super::*; + #[test] + fn role_bundle_returns_slice() { + let t = Trajectory { + fingerprint: vec![1.0; 100], + radius: 5, + }; + assert_eq!(t.role_bundle(10, 30).len(), 20); + } + #[test] + fn role_bundle_empty_when_inverted() { + let t = Trajectory { + fingerprint: vec![1.0; 100], + radius: 5, + }; + assert_eq!(t.role_bundle(50, 30).len(), 0); + } + #[test] + fn role_candidates_filters_by_threshold() { + let t =
Trajectory { + fingerprint: vec![1.0; 100], + radius: 5, + }; + let codebook: Vec<Vec<f32>> = vec![vec![1.0; 100], vec![-1.0; 100]]; + let cands = t.role_candidates(0, 100, &codebook); + assert_eq!(cands.len(), 1); + assert_eq!(cands[0].codebook_index, 0); + } +} diff --git a/crates/deepnsm/src/triangle_bridge.rs b/crates/deepnsm/src/triangle_bridge.rs new file mode 100644 index 00000000..dc96f041 --- /dev/null +++ b/crates/deepnsm/src/triangle_bridge.rs @@ -0,0 +1,138 @@ +//! META-AGENT: add `pub mod triangle_bridge;` to lib.rs. Gate behind +//! feature `grammar-triangle`. Also requires adding to Cargo.toml: +//! lance-graph-cognitive = { path = "../lance-graph-cognitive", optional = true } +//! lance-graph-contract = { path = "../lance-graph-contract", optional = true } +//! [features] grammar-triangle = ["dep:lance-graph-cognitive", "dep:lance-graph-contract"] +//! +//! Grammar Triangle bridge: merge DeepNSM SPO output with the Triangle's +//! NSMField + CausalityFlow + QualiaField into a single SpoWithGrammar. +//! +//! The Triangle lives in `lance_graph_cognitive::grammar::GrammarTriangle` +//! and is already shipped (`from_text` constructs it). The actual API has +//! `QualiaField` (18D phenomenal field) — not `Qualia18D`. + +use crate::parser::SentenceStructure; + +#[cfg(feature = "grammar-triangle")] +use lance_graph_cognitive::grammar::{ + CausalityFlow, GrammarTriangle, NSMField, QualiaField, +}; + +/// Merged output: DeepNSM SPO triples + Triangle's three lenses. +/// +/// Consumers downstream of DeepNSM read this struct to get both the +/// discrete SPO commit + the continuous semantic field. When the +/// `grammar-triangle` feature is off, the Triangle fields collapse to +/// nothing and the consumer only sees the SPO half. +#[derive(Clone, Debug)] +pub struct SpoWithGrammar { + /// DeepNSM-extracted SPO triples + modifiers + negations + temporals. + pub triples: SentenceStructure, + + /// Causality flow from the Triangle (agency, temporality, dependency).
+ #[cfg(feature = "grammar-triangle")] + pub causality: CausalityFlow, + + /// 65-prime NSM field activations. + #[cfg(feature = "grammar-triangle")] + pub nsm_field: NSMField, + + /// 18D qualia phenomenal coordinates. + #[cfg(feature = "grammar-triangle")] + pub qualia_signature: QualiaField, + + /// Distance between this parse and the SPO's expected qualia + /// footprint. Higher = more "novel domain" → routes to + /// `NarsInference::Extrapolation` in the ticket. + pub classification_distance: f32, +} + +/// Build the merged Triangle + SPO view. Default entry point. +/// +/// When the `grammar-triangle` feature is enabled, this calls +/// `GrammarTriangle::from_text(text)` and stamps the three lenses onto +/// the SPO output. With the feature off this function is compiled out; +/// `analyze_without_triangle` covers that case. +#[cfg(feature = "grammar-triangle")] +pub fn analyze_with_triangle(text: &str, structure: SentenceStructure) -> SpoWithGrammar { + let triangle = GrammarTriangle::from_text(text); + let dist = compute_classification_distance(&structure, &triangle); + SpoWithGrammar { + triples: structure, + causality: triangle.causality, + nsm_field: triangle.nsm, + qualia_signature: triangle.qualia, + classification_distance: dist, + } +} + +/// Feature-off fallback: just carry the SPO with `classification_distance = 0`. +/// +/// Carries no `cfg` of its own so the parser always has something to hand +/// the LLM router. Note that `lib.rs` currently gates the whole +/// `triangle_bridge` module behind `grammar-triangle`, so for now this +/// fallback is only reachable with the feature on. +pub fn analyze_without_triangle(structure: SentenceStructure) -> SpoWithGrammar { + SpoWithGrammar { + triples: structure, + #[cfg(feature = "grammar-triangle")] + causality: CausalityFlow::default(), + #[cfg(feature = "grammar-triangle")] + nsm_field: NSMField::default(), + #[cfg(feature = "grammar-triangle")] + qualia_signature: QualiaField::default(), + classification_distance: 0.0, + } +} + +/// Hamming-style distance between the SPO's expected qualia footprint +/// and the Triangle's actual qualia signature.
+/// +/// **Stub**: returns 0.0 today. The expected footprint is currently a +/// fixed prior; once D7 GrammarStyleConfig surfaces per-style qualia +/// expectations, this lookup becomes "compare actual qualia to the +/// style-specific footprint and emit a normalized distance." +/// +/// FOLLOW-UP: tune against the Jirak-derived noise floor (see CLAUDE.md +/// §I-NOISE-FLOOR-JIRAK) — values that exceed the n^(-1/2) weak-dependence +/// bound are real signal, not register noise. +#[cfg(feature = "grammar-triangle")] +fn compute_classification_distance( + _structure: &SentenceStructure, + _triangle: &GrammarTriangle, +) -> f32 { + 0.0 +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::parser::SentenceStructure; + use crate::spo::SpoTriple; + + fn fixture_structure() -> SentenceStructure { + // Build a minimal SentenceStructure via the public parse() entry. + // Use parser::parse on an empty token slice to get a default + // structure shape, then push one synthetic triple in. + let empty: Vec<crate::vocabulary::Token> = Vec::new(); + let mut s = crate::parser::parse(&empty); + s.triples.push(SpoTriple::new(1, 2, 3)); + s + } + + #[test] + fn analyze_without_triangle_yields_zero_distance() { + let s = fixture_structure(); + let out = analyze_without_triangle(s); + assert_eq!(out.classification_distance, 0.0); + assert_eq!(out.triples.triples.len(), 1); + } + + #[cfg(feature = "grammar-triangle")] + #[test] + fn analyze_with_triangle_stamps_lenses() { + let s = fixture_structure(); + let out = analyze_with_triangle("the dog runs", s); + // Stub returns 0.0 today — until D7 footprint lookup lands.
+ assert_eq!(out.classification_distance, 0.0); + assert_eq!(out.triples.triples.len(), 1); + } +} diff --git a/crates/lance-graph-contract/src/grammar/context_chain.rs b/crates/lance-graph-contract/src/grammar/context_chain.rs index a0b1cec9..aa131369 100644 --- a/crates/lance-graph-contract/src/grammar/context_chain.rs +++ b/crates/lance-graph-contract/src/grammar/context_chain.rs @@ -45,43 +45,97 @@ pub struct ContextChain { /// Result of a counterfactual disambiguation: the chosen candidate, its /// coherence, the margin to second place, the full ranked alternatives, /// and whether the caller should escalate to an LLM. +/// +/// D4 (2026-04 worker B2) extended this with `winner_index`, an alias +/// `winner`, `dispersion` across the top-3 candidates, and a +/// `candidate_count`. `chosen` and `winner` are equal by construction +/// (`winner` is the canonical D4 name; `chosen` is preserved for +/// existing consumers). +/// +/// Empty-candidates contract: returns a sentinel result with +/// `candidate_count = 0`, `winner_index = usize::MAX`, a zero +/// `Binary16K` placeholder fingerprint, and `escalate_to_llm = true`. +/// Callers should check `candidate_count == 0` (or `escalate_to_llm`) +/// before reading `winner` / `chosen`. #[derive(Debug, Clone)] pub struct DisambiguationResult { pub chosen: CrystalFingerprint, pub coherence: f32, - /// `chosen.coherence - second_place.coherence`. Zero if only one candidate. + /// `chosen.coherence - second_place.coherence`. Zero if only one + /// candidate. `> DISAMBIGUATION_MARGIN_THRESHOLD` (~0.1) means + /// the winner is confidently above the runner-up. pub margin: f32, /// All candidates with their scores, sorted descending by coherence. pub alternatives: Vec<(CrystalFingerprint, f32)>, /// True if `margin < DISAMBIGUATION_MARGIN_THRESHOLD` (ambiguous, escalate). 
pub escalate_to_llm: bool, + + // ── D4 reasoning-operator extensions ────────────────────────── + /// Index of the winner in the original candidate iterator (0-based). + /// `usize::MAX` if the candidate iterator was empty. + pub winner_index: usize, + /// Best candidate's fingerprint. Equal to `chosen`; provided under the + /// canonical D4 name for new callers. + pub winner: CrystalFingerprint, + /// Mean pairwise normalized Hamming distance across the top-3 + /// candidates' Binary16K fingerprints. High value (close to 0.5) + /// indicates the alternatives spread out — "no clear winner." + /// Zero if fewer than two top candidates carry comparable + /// `Binary16K` fingerprints. + pub dispersion: f32, + /// Total candidates evaluated (length of the input iterator). + pub candidate_count: usize, } /// Weighting kernel for temporal position in the Markov chain. /// Mexican-hat emphasizes focal, de-emphasizes distant positions. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)] pub enum WeightingKernel { + /// All positions weighted equally. Uniform, + /// Mexican-hat (Ricker wavelet): peaks at focal and falls to zero at + /// the window edge (`|delta| = radius`); offsets beyond the radius dip + /// into a small negative tail. Captures anticipation / surprise on + /// context shift. Default kernel. + #[default] MexicanHat, + /// Standard Gaussian decay from focal. Gaussian, } impl WeightingKernel { - /// Weight for a position at distance `d` from focal (0 = focal, 5 = edge). - pub fn weight(&self, d: usize) -> f32 { + /// Weight at signed offset `delta` for window `radius`. + /// Returns an f32 in roughly `[-0.45, 1]` (the Mexican-hat minimum is + /// about -0.446, at `d = √3`). + /// + /// `delta = 0` is the focal position; `|delta| = radius` is the edge of + /// the window. The kernel is symmetric in `delta` for all variants + /// (uniform / mexican-hat / gaussian). + /// + /// Mexican-hat is the unnormalized Ricker wavelet `(1 - d²) · exp(-d²/2)` + /// where `d = |delta| / max(radius, 1)`. Gaussian uses `exp(-d²/2)`.
+ pub fn weight(&self, delta: i32, radius: u32) -> f32 { + let r = radius.max(1) as f32; match self { Self::Uniform => 1.0, Self::MexicanHat => { - // Peak at focal (d=0), smooth fall-off, slight negative at edge. - let x = d as f32 / (MARKOV_RADIUS as f32); - (1.0 - 2.0 * x * x) * (-x * x * 2.0).exp() + let d = delta.unsigned_abs() as f32 / r; + let dd = d * d; + (1.0 - dd) * (-dd / 2.0).exp() } Self::Gaussian => { - let x = d as f32 / (MARKOV_RADIUS as f32); - (-x * x * 2.0).exp() + // delta is signed but the kernel is symmetric; using the + // absolute value avoids relying on signed-cast semantics. + let d = delta.unsigned_abs() as f32 / r; + (-(d * d) / 2.0).exp() } } } + + /// Convenience: weight at unsigned distance `d` from focal under the + /// chain's default radius (`MARKOV_RADIUS`). Equivalent to + /// `self.weight(d as i32, MARKOV_RADIUS as u32)`. + pub fn weight_at_distance(&self, d: usize) -> f32 { + self.weight(d as i32, MARKOV_RADIUS as u32) + } } impl ContextChain { @@ -202,10 +256,20 @@ impl ContextChain { /// Counterfactual disambiguation: try each candidate at position `i`, /// return the one with highest coherence and the decision margin. /// + /// Each candidate is scored by the `total_coherence` of the chain + /// after replacing position `i` with that candidate. The result + /// also carries `winner_index` (position in the input iterator), + /// `dispersion` (mean pairwise Binary16K Hamming distance across + /// the top-3 candidates), and `candidate_count`. + /// /// Edge cases: - /// - Empty candidate list: returns a result with a placeholder zero - /// fingerprint and `escalate_to_llm = true`. - /// - Single candidate: `margin = 0.0`, `escalate_to_llm = true`. + /// - **Empty candidate iterator**: returns the documented sentinel + /// (`candidate_count = 0`, `winner_index = usize::MAX`, + /// placeholder `Binary16K` fingerprint, `escalate_to_llm = true`). 
+ /// Does *not* panic — keeping the API total simplifies caller + /// code in the cypher bridge. + /// - **Single candidate**: `margin = 0.0`, `dispersion = 0.0`, + /// `escalate_to_llm = true`. pub fn disambiguate<I>( &self, i: usize, @@ -214,46 +278,90 @@ impl ContextChain { where I: IntoIterator<Item = CrystalFingerprint>, { - let mut scored: Vec<(CrystalFingerprint, f32)> = candidates + // Score with original input index preserved so we can report + // `winner_index` in the iterator's order. + let mut scored: Vec<(usize, CrystalFingerprint, f32)> = candidates .into_iter() - .map(|cand| { + .enumerate() + .map(|(idx, cand)| { let (_chain, coh) = self.replay_with_alternative(i, cand.clone()); - (cand, coh) + (idx, cand, coh) }) .collect(); + let candidate_count = scored.len(); + if scored.is_empty() { - // Placeholder: caller should check `escalate_to_llm`. + // Documented sentinel — never panic; callers gate on + // `escalate_to_llm` or `candidate_count == 0`. + let placeholder = + CrystalFingerprint::Binary16K(Box::new([0u64; 256])); return DisambiguationResult { - chosen: CrystalFingerprint::Binary16K(Box::new([0u64; 256])), + chosen: placeholder.clone(), coherence: 0.0, margin: 0.0, alternatives: Vec::new(), escalate_to_llm: true, + winner_index: usize::MAX, + winner: placeholder, + dispersion: 0.0, + candidate_count: 0, }; } // Sort descending by coherence; ties resolved by insertion order // (stable sort + NaN-safe partial_cmp fallback to Equal).
scored.sort_by(|a, b| { - b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal) + b.2.partial_cmp(&a.2).unwrap_or(std::cmp::Ordering::Equal) }); - let (chosen, coherence) = scored[0].clone(); + let winner_index = scored[0].0; + let chosen = scored[0].1.clone(); + let coherence = scored[0].2; let margin = if scored.len() >= 2 { - scored[0].1 - scored[1].1 + scored[0].2 - scored[1].2 } else { 0.0 }; let escalate_to_llm = scored.len() < 2 || margin < DISAMBIGUATION_MARGIN_THRESHOLD; + // Dispersion: mean pairwise normalized Hamming distance over the + // top-3 candidates. Only Binary16K pairs contribute; if fewer + // than two contribute, dispersion is 0.0 (cannot say). + let top_n = scored.len().min(3); + let mut pair_sum: f32 = 0.0; + let mut pair_count: u32 = 0; + for a_idx in 0..top_n { + for b_idx in (a_idx + 1)..top_n { + let a_bits = binary16k_bits(&scored[a_idx].1); + let b_bits = binary16k_bits(&scored[b_idx].1); + if let (Some(a), Some(b)) = (a_bits, b_bits) { + let d = hamming_256(a, b) as f32 / MAX_HAMMING_BITS as f32; + pair_sum += d; + pair_count += 1; + } + } + } + let dispersion = if pair_count == 0 { + 0.0 + } else { + pair_sum / pair_count as f32 + }; + + let alternatives: Vec<(CrystalFingerprint, f32)> = + scored.into_iter().map(|(_, fp, c)| (fp, c)).collect(); + DisambiguationResult { - chosen, + chosen: chosen.clone(), coherence, margin, - alternatives: scored, + alternatives, escalate_to_llm, + winner_index, + winner: chosen, + dispersion, + candidate_count, } } } @@ -453,23 +561,206 @@ mod tests { #[test] fn mexican_hat_weights_monotone() { // Mexican-hat: peak at d=0, monotone decrease through d=1..5. + // Test through the convenience helper for compactness; the + // primary API is `weight(delta: i32, radius: u32)`. 
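The `dispersion` computation can be reproduced in isolation. This sketch assumes a `Binary16K` fingerprint is 256 `u64` words (16,384 bits, matching the `[0u64; 256]` placeholder in the diff) and that `MAX_HAMMING_BITS` equals that bit count; `hamming`, `dispersion`, `WORDS`, and `MAX_BITS` are hypothetical local names, not the crate's `hamming_256` / `MAX_HAMMING_BITS`.

```rust
/// Assumed layout: 256 x u64 = 16,384 bits per Binary16K fingerprint.
const WORDS: usize = 256;
const MAX_BITS: u32 = (WORDS * 64) as u32;

/// Hamming distance between two 256-word bit vectors.
fn hamming(a: &[u64; WORDS], b: &[u64; WORDS]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Mean pairwise normalized Hamming distance over (at most) the first three
/// fingerprints, assumed already sorted by coherence (best first).
/// Returns 0.0 when fewer than two fingerprints are available.
fn dispersion(ranked: &[[u64; WORDS]]) -> f32 {
    let top_n = ranked.len().min(3);
    let mut sum = 0.0f32;
    let mut pairs = 0u32;
    for i in 0..top_n {
        for j in (i + 1)..top_n {
            sum += hamming(&ranked[i], &ranked[j]) as f32 / MAX_BITS as f32;
            pairs += 1;
        }
    }
    if pairs == 0 { 0.0 } else { sum / pairs as f32 }
}
```

Identical leading candidates give 0.0 and fully complementary ones give 1.0, so the value describes how far apart the top alternatives are, independently of `margin`.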
let k = WeightingKernel::MexicanHat; - let w0 = k.weight(0); - let w1 = k.weight(1); - let w2 = k.weight(2); - let w3 = k.weight(3); - let w4 = k.weight(4); - let w5 = k.weight(5); + let w0 = k.weight_at_distance(0); + let w1 = k.weight_at_distance(1); + let w2 = k.weight_at_distance(2); + let w3 = k.weight_at_distance(3); + let w4 = k.weight_at_distance(4); + let w5 = k.weight_at_distance(5); assert!(w0 > w1, "w(0)={w0} should exceed w(1)={w1}"); assert!(w1 > w2, "w(1)={w1} should exceed w(2)={w2}"); assert!(w2 > w3, "w(2)={w2} should exceed w(3)={w3}"); assert!(w3 > w4, "w(3)={w3} should exceed w(4)={w4}"); assert!(w4 > w5, "w(4)={w4} should exceed w(5)={w5}"); // Uniform and Gaussian sanity checks. - assert_eq!(WeightingKernel::Uniform.weight(0), 1.0); - assert_eq!(WeightingKernel::Uniform.weight(5), 1.0); - let g0 = WeightingKernel::Gaussian.weight(0); - let g5 = WeightingKernel::Gaussian.weight(5); + assert_eq!(WeightingKernel::Uniform.weight_at_distance(0), 1.0); + assert_eq!(WeightingKernel::Uniform.weight_at_distance(5), 1.0); + let g0 = WeightingKernel::Gaussian.weight_at_distance(0); + let g5 = WeightingKernel::Gaussian.weight_at_distance(5); assert!(g0 > g5, "gaussian should also decay: g(0)={g0}, g(5)={g5}"); } + + // ── D4 reasoning-operator tests (worker B2, 2026-04) ──────────────── + + /// 1. `Uniform` returns 1.0 at every offset. + #[test] + fn d4_uniform_kernel_is_constant() { + let k = WeightingKernel::Uniform; + for delta in -10i32..=10 { + for radius in 1u32..=5 { + let w = k.weight(delta, radius); + assert!( + (w - 1.0).abs() < f32::EPSILON, + "Uniform({delta}, {radius}) = {w}, expected 1.0" + ); + } + } + } + + /// 2. `MexicanHat` is symmetric: w(-d, r) == w(+d, r). 
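The `weight(delta, radius)` formulas in this diff can be restated as a free function to check their shape. This is a sketch with a hypothetical local `Kernel` enum in place of `WeightingKernel`; the Mexican-hat branch is the unnormalized Ricker wavelet with sigma equal to `radius`.

```rust
/// Hypothetical local stand-in for `WeightingKernel`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kernel {
    Uniform,
    MexicanHat,
    Gaussian,
}

/// Same formulas as `WeightingKernel::weight(delta, radius)` in the diff.
fn kernel_weight(k: Kernel, delta: i32, radius: u32) -> f32 {
    let r = radius.max(1) as f32; // a radius of 0 is treated as 1
    let d = delta.unsigned_abs() as f32 / r; // |delta| makes every kernel symmetric
    let dd = d * d;
    match k {
        Kernel::Uniform => 1.0,
        // 1.0 at the focus, exactly 0.0 at |delta| == radius, negative beyond.
        Kernel::MexicanHat => (1.0 - dd) * (-dd / 2.0).exp(),
        // Decays smoothly from 1.0 at the focus; never negative.
        Kernel::Gaussian => (-dd / 2.0).exp(),
    }
}
```

Writing the Mexican hat as `w(u) = (1 - u) * exp(-u/2)` with `u = (delta/radius)^2`, its derivative is `exp(-u/2) * (u - 3) / 2`, so the brim bottoms out at `|delta| = sqrt(3) * radius` and climbs back toward zero after that. Monotonicity checks such as `mexican_hat_weights_monotone` therefore only hold for offsets up to that point; for distances 0..=5 that requires `MARKOV_RADIUS >= 3`.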
+ #[test] + fn d4_mexican_hat_symmetric() { + let k = WeightingKernel::MexicanHat; + for radius in [1u32, 2, 3, 5, 8] { + for d in 1i32..=10 { + let w_pos = k.weight(d, radius); + let w_neg = k.weight(-d, radius); + assert!( + (w_pos - w_neg).abs() < 1e-6, + "MexicanHat asymmetric at d={d} r={radius}: \ + w(+d)={w_pos} w(-d)={w_neg}" + ); + } + } + // Gaussian also symmetric (same code path via |delta|). + let g = WeightingKernel::Gaussian; + for radius in [1u32, 5] { + for d in 1i32..=5 { + assert!( + (g.weight(d, radius) - g.weight(-d, radius)).abs() < 1e-6, + "Gaussian should also be symmetric" + ); + } + } + } + + /// 3. `MexicanHat` weight is monotone-decreasing in `|delta|` over the + /// radius. Crosses zero at `|delta| ≈ radius` (where d² = 1) and + /// stays in the negative tail past the radius — that is the + /// Mexican-hat shape. + #[test] + fn d4_mexican_hat_monotone_and_zero_crossing() { + let k = WeightingKernel::MexicanHat; + let radius: u32 = 5; + // Monotone decrease in [0, radius]. + let mut prev = k.weight(0, radius); + assert!(prev > 0.99, "focal weight should be ~1.0, got {prev}"); + for d in 1i32..=radius as i32 { + let cur = k.weight(d, radius); + assert!( + cur < prev, + "MexicanHat not monotone at d={d}: prev={prev} cur={cur}" + ); + prev = cur; + } + // At |delta| = radius, d² = 1 → (1 - 1) · exp(-0.5) = 0. + let edge = k.weight(radius as i32, radius); + assert!( + edge.abs() < 1e-6, + "MexicanHat zero-crossing should be at |delta|=radius, got {edge}" + ); + // Beyond the radius the kernel goes negative (the "hat brim"). + let beyond = k.weight((radius as i32) + 1, radius); + assert!( + beyond < 0.0, + "MexicanHat should be negative beyond radius, got {beyond}" + ); + } + + /// 4. `coherence_at` on a chain of identical fingerprints is ~1.0. + /// (Existing `coherence_high_for_self_chain` covers this; this + /// test re-verifies the D4 contract independently of the + /// pre-existing test.) 
+ #[test] + fn d4_coherence_self_chain_is_one() { + let fp = mk_fp(0x0102_0304_0506_0708); + let chain = fill_chain_with(&fp); + for i in 0..CHAIN_LEN { + let c = chain.coherence_at(i); + assert!( + c > 0.99, + "self-chain coherence at {i} should be ~1.0, got {c}" + ); + } + let total = chain.total_coherence(); + assert!( + total > 0.99, + "self-chain total_coherence should be ~1.0, got {total}" + ); + } + + /// 5. `disambiguate` with two candidates where one matches the + /// surrounding chain → that one wins with non-zero margin and + /// `winner_index` points at it. + #[test] + fn d4_disambiguate_picks_matching_candidate() { + let base = mk_fp(0x9999_AAAA_BBBB_CCCC); + let mut chain = fill_chain_with(&base); + // Blank position 4 so we can replay alternatives in. + chain.fingerprints[4] = None; + + // Far-miss: fully inverted vs. base. + let far = match &base { + CrystalFingerprint::Binary16K(bits) => { + let mut inv = Box::new([0u64; 256]); + for (i, w) in bits.iter().enumerate() { + inv[i] = !w; + } + CrystalFingerprint::Binary16K(inv) + } + _ => unreachable!(), + }; + + // Order: [far, base] → if base wins, winner_index must be 1. + let res = chain.disambiguate(4, vec![far, base.clone()]); + assert_eq!(res.candidate_count, 2); + assert_eq!(res.winner_index, 1, "base was at iterator index 1"); + assert!( + res.margin > 0.0, + "matching candidate should have non-zero margin, got {}", + res.margin + ); + // `winner` and `chosen` agree by construction. + match (&res.winner, &res.chosen) { + (CrystalFingerprint::Binary16K(a), + CrystalFingerprint::Binary16K(b)) => { + assert_eq!(**a, **b, "winner and chosen must agree"); + } + _ => panic!("unexpected fingerprint variants"), + } + // Winner equals base. + match (&res.winner, &base) { + (CrystalFingerprint::Binary16K(a), + CrystalFingerprint::Binary16K(b)) => { + assert_eq!(**a, **b, "winner must be the matching base"); + } + _ => panic!("unexpected fingerprint variants"), + } + } + + /// 6. 
`disambiguate` with an empty candidate iterator returns the + /// documented sentinel result (no panic). `candidate_count = 0`, + /// `winner_index = usize::MAX`, `escalate_to_llm = true`. + #[test] + fn d4_disambiguate_empty_returns_sentinel() { + let chain = fill_chain_with(&mk_fp(0x1)); + let res: DisambiguationResult = + chain.disambiguate(0, Vec::<CrystalFingerprint>::new()); + assert_eq!(res.candidate_count, 0); + assert_eq!(res.winner_index, usize::MAX); + assert!(res.escalate_to_llm, "empty must escalate"); + assert!(res.alternatives.is_empty()); + assert_eq!(res.coherence, 0.0); + assert_eq!(res.margin, 0.0); + assert_eq!(res.dispersion, 0.0); + // The placeholder fingerprint is a zeroed Binary16K. + match &res.winner { + CrystalFingerprint::Binary16K(bits) => { + assert!(bits.iter().all(|&w| w == 0), + "sentinel placeholder should be all-zero"); + } + _ => panic!("sentinel must be Binary16K placeholder"), + } + } + + /// `WeightingKernel::default()` is `MexicanHat` (D4 chose this as + /// the canonical kernel — focal-emphasizing with anticipation tail). + #[test] + fn d4_default_kernel_is_mexican_hat() { + let k: WeightingKernel = Default::default(); + assert_eq!(k, WeightingKernel::MexicanHat); + } } diff --git a/crates/lance-graph-contract/src/grammar/role_keys.rs b/crates/lance-graph-contract/src/grammar/role_keys.rs index ff93167c..3704752c 100644 --- a/crates/lance-graph-contract/src/grammar/role_keys.rs +++ b/crates/lance-graph-contract/src/grammar/role_keys.rs @@ -314,6 +314,154 @@ pub static BANK_KEY: LazyLock<RoleKey> = LazyLock::new(|| RoleKey::generate pub static FIBU_KEY: LazyLock<RoleKey> = LazyLock::new(|| RoleKey::generate("smb.fibu", 13_072, 13_584)); pub static STEUER_KEY: LazyLock<RoleKey> = LazyLock::new(|| RoleKey::generate("smb.steuer", 13_584, 14_096)); +// --------------------------------------------------------------------------- +// D6 — RoleKeySlice catalogue (const-addressable [start:stop) slices + FNV-64 +// fingerprint over the role label).
This layer is the **catalogue index** for +// the live `RoleKey` static instances above: same boundaries, no duplication +// of the bipolar payload — just `Copy`/`const`-friendly descriptors that can +// be embedded in tables, dispatch maps, or codecs without taking a LazyLock. +// +// `RoleKeySlice::fnv_seed` is the FNV-64 of the canonical label string and +// can be used as a stable per-role identifier (e.g. unbinding lookup, codec +// keying). All slices are sub-ranges of the existing 16,384-dim VSA space. +// --------------------------------------------------------------------------- + +/// A role key descriptor: a contiguous `[start:stop)` slice of the VSA space +/// plus a deterministic FNV-64 fingerprint over the role's canonical label +/// (used for unbinding / similarity / codec keying). +/// +/// This is the `Copy`/`const`-friendly companion to [`RoleKey`]; both share +/// the same slice boundaries by construction (see `role_key_slice_*` tests). +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct RoleKeySlice { + pub start: usize, + pub stop: usize, + pub fnv_seed: u64, +} + +impl RoleKeySlice { + /// Construct a const slice. `start <= stop <= VSA_DIMS` is the caller's + /// invariant (debug-checked at first use, not in this `const fn` body). + pub const fn new(start: usize, stop: usize, fnv_seed: u64) -> Self { + Self { start, stop, fnv_seed } + } + pub const fn len(&self) -> usize { self.stop - self.start } + pub const fn is_empty(&self) -> bool { self.start == self.stop } + pub const fn range(&self) -> std::ops::Range<usize> { self.start..self.stop } +} + +/// Hand-rolled 64-bit FNV-1a over raw bytes. `const fn` so role-key tables can +/// be evaluated at compile time. No new deps.
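`fnv64_bytes` follows the standard 64-bit FNV-1a scheme (offset basis `0xcbf29ce484222325`, prime `0x100000001b3`). The sketch below is an independent restatement under a hypothetical name, `fnv1a_64`, showing that the `while` loop lets it run in `const` context, which is what allows the `*_SLICE` constants to carry precomputed seeds. The expected value for `"a"` in the checks is the published FNV-1a 64-bit test vector.

```rust
/// 64-bit FNV-1a parameters (standard published constants).
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical restatement of `fnv64_bytes`: XOR each byte in, then multiply.
const fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    let mut i = 0;
    // `for` loops are not allowed in `const fn`, hence the manual index.
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
        i += 1;
    }
    hash
}

/// Evaluated by the compiler, the same way the `*_SLICE` constants embed their seeds.
const SUBJECT_SEED: u64 = fnv1a_64(b"SUBJECT");
```

Hashing the empty input returns the offset basis unchanged, which is the value `finnish_case_round_trip` compares each seed against.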
+pub const fn fnv64_bytes(bytes: &[u8]) -> u64 { + let mut hash: u64 = 0xcbf29ce484222325; + let mut i = 0; + while i < bytes.len() { + hash ^= bytes[i] as u64; + hash = hash.wrapping_mul(0x100000001b3); + i += 1; + } + hash +} + +// --- SPO core role slices (mirror of SUBJECT_KEY..CONTEXT_KEY) -------------- + +pub const SUBJECT_SLICE: RoleKeySlice = RoleKeySlice::new(0, 2000, fnv64_bytes(b"SUBJECT")); +pub const PREDICATE_SLICE: RoleKeySlice = RoleKeySlice::new(2000, 4000, fnv64_bytes(b"PREDICATE")); +pub const OBJECT_SLICE: RoleKeySlice = RoleKeySlice::new(4000, 6000, fnv64_bytes(b"OBJECT")); +pub const MODIFIER_SLICE: RoleKeySlice = RoleKeySlice::new(6000, 7500, fnv64_bytes(b"MODIFIER")); +pub const CONTEXT_SLICE: RoleKeySlice = RoleKeySlice::new(7500, 9000, fnv64_bytes(b"CONTEXT")); + +// --- TEKAMOLO sub-slices (mirror of TEMPORAL_KEY..LOKAL_KEY + extras) ------ + +pub const TEMPORAL_SLICE: RoleKeySlice = RoleKeySlice::new(9000, 9200, fnv64_bytes(b"TEMPORAL")); +pub const KAUSAL_SLICE: RoleKeySlice = RoleKeySlice::new(9200, 9400, fnv64_bytes(b"KAUSAL")); +pub const MODAL_SLICE: RoleKeySlice = RoleKeySlice::new(9400, 9500, fnv64_bytes(b"MODAL")); +pub const LOKAL_SLICE: RoleKeySlice = RoleKeySlice::new(9500, 9650, fnv64_bytes(b"LOKAL")); +pub const INSTRUMENT_SLICE: RoleKeySlice = RoleKeySlice::new(9650, 9750, fnv64_bytes(b"INSTRUMENT")); +pub const BENEFICIARY_SLICE: RoleKeySlice = RoleKeySlice::new(9750, 9780, fnv64_bytes(b"BENEFICIARY")); +pub const GOAL_SLICE: RoleKeySlice = RoleKeySlice::new(9780, 9810, fnv64_bytes(b"GOAL")); +pub const SOURCE_SLICE: RoleKeySlice = RoleKeySlice::new(9810, 9840, fnv64_bytes(b"SOURCE")); + +// --- Finnish 15 cases (mirror FINNISH_SLICES, indexed by FinnishCase as u8) + +pub static FINNISH_CASE_SLICES: LazyLock<[(FinnishCase, RoleKeySlice); 15]> = LazyLock::new(|| { + [ + (FinnishCase::Nominative, RoleKeySlice::new(FINNISH_SLICES[0].0, FINNISH_SLICES[0].1, fnv64_bytes(b"FI_NOMINATIVE"))), + (FinnishCase::Genitive, 
RoleKeySlice::new(FINNISH_SLICES[1].0, FINNISH_SLICES[1].1, fnv64_bytes(b"FI_GENITIVE"))), + (FinnishCase::Accusative, RoleKeySlice::new(FINNISH_SLICES[2].0, FINNISH_SLICES[2].1, fnv64_bytes(b"FI_ACCUSATIVE"))), + (FinnishCase::Partitive, RoleKeySlice::new(FINNISH_SLICES[3].0, FINNISH_SLICES[3].1, fnv64_bytes(b"FI_PARTITIVE"))), + (FinnishCase::Inessive, RoleKeySlice::new(FINNISH_SLICES[4].0, FINNISH_SLICES[4].1, fnv64_bytes(b"FI_INESSIVE"))), + (FinnishCase::Elative, RoleKeySlice::new(FINNISH_SLICES[5].0, FINNISH_SLICES[5].1, fnv64_bytes(b"FI_ELATIVE"))), + (FinnishCase::Illative, RoleKeySlice::new(FINNISH_SLICES[6].0, FINNISH_SLICES[6].1, fnv64_bytes(b"FI_ILLATIVE"))), + (FinnishCase::Adessive, RoleKeySlice::new(FINNISH_SLICES[7].0, FINNISH_SLICES[7].1, fnv64_bytes(b"FI_ADESSIVE"))), + (FinnishCase::Ablative, RoleKeySlice::new(FINNISH_SLICES[8].0, FINNISH_SLICES[8].1, fnv64_bytes(b"FI_ABLATIVE"))), + (FinnishCase::Allative, RoleKeySlice::new(FINNISH_SLICES[9].0, FINNISH_SLICES[9].1, fnv64_bytes(b"FI_ALLATIVE"))), + (FinnishCase::Essive, RoleKeySlice::new(FINNISH_SLICES[10].0, FINNISH_SLICES[10].1, fnv64_bytes(b"FI_ESSIVE"))), + (FinnishCase::Translative, RoleKeySlice::new(FINNISH_SLICES[11].0, FINNISH_SLICES[11].1, fnv64_bytes(b"FI_TRANSLATIVE"))), + (FinnishCase::Instructive, RoleKeySlice::new(FINNISH_SLICES[12].0, FINNISH_SLICES[12].1, fnv64_bytes(b"FI_INSTRUCTIVE"))), + (FinnishCase::Abessive, RoleKeySlice::new(FINNISH_SLICES[13].0, FINNISH_SLICES[13].1, fnv64_bytes(b"FI_ABESSIVE"))), + (FinnishCase::Comitative, RoleKeySlice::new(FINNISH_SLICES[14].0, FINNISH_SLICES[14].1, fnv64_bytes(b"FI_COMITATIVE"))), + ] +}); + +/// Lookup the [`RoleKeySlice`] for a Finnish case (round-trip via the +/// `LazyLock` array — exactly one slice per variant by construction). 
+pub fn finnish_case_slice(case: FinnishCase) -> RoleKeySlice { + FINNISH_CASE_SLICES[case as usize].1 +} + +// --- 12 Tense slices (mirror TENSE_KEYS) ----------------------------------- + +pub static TENSE_SLICES: LazyLock<[(Tense, RoleKeySlice); 12]> = LazyLock::new(|| { + let s = |i: usize| TENSE_START + i * TENSE_WIDTH; + let e = |i: usize| TENSE_START + (i + 1) * TENSE_WIDTH; + [ + (Tense::Present, RoleKeySlice::new(s(0), e(0), fnv64_bytes(b"T_PRESENT"))), + (Tense::Past, RoleKeySlice::new(s(1), e(1), fnv64_bytes(b"T_PAST"))), + (Tense::Future, RoleKeySlice::new(s(2), e(2), fnv64_bytes(b"T_FUTURE"))), + (Tense::PresentContinuous, RoleKeySlice::new(s(3), e(3), fnv64_bytes(b"T_PRESENT_CONTINUOUS"))), + (Tense::PastContinuous, RoleKeySlice::new(s(4), e(4), fnv64_bytes(b"T_PAST_CONTINUOUS"))), + (Tense::FutureContinuous, RoleKeySlice::new(s(5), e(5), fnv64_bytes(b"T_FUTURE_CONTINUOUS"))), + (Tense::Perfect, RoleKeySlice::new(s(6), e(6), fnv64_bytes(b"T_PERFECT"))), + (Tense::Pluperfect, RoleKeySlice::new(s(7), e(7), fnv64_bytes(b"T_PLUPERFECT"))), + (Tense::FuturePerfect, RoleKeySlice::new(s(8), e(8), fnv64_bytes(b"T_FUTURE_PERFECT"))), + (Tense::Habitual, RoleKeySlice::new(s(9), e(9), fnv64_bytes(b"T_HABITUAL"))), + (Tense::Potential, RoleKeySlice::new(s(10), e(10), fnv64_bytes(b"T_POTENTIAL"))), + (Tense::Imperative, RoleKeySlice::new(s(11), e(11), fnv64_bytes(b"T_IMPERATIVE"))), + ] +}); + +pub fn tense_slice(tense: Tense) -> RoleKeySlice { + TENSE_SLICES[tense as usize].1 +} + +// --- 7 NARS-inference slices (mirror NARS_SLICES) -------------------------- + +pub static NARS_INFERENCE_SLICES: LazyLock<[(NarsInference, RoleKeySlice); 7]> = LazyLock::new(|| { + [ + (NarsInference::Deduction, RoleKeySlice::new(NARS_SLICES[0].0, NARS_SLICES[0].1, fnv64_bytes(b"N_DEDUCTION"))), + (NarsInference::Induction, RoleKeySlice::new(NARS_SLICES[1].0, NARS_SLICES[1].1, fnv64_bytes(b"N_INDUCTION"))), + (NarsInference::Abduction, RoleKeySlice::new(NARS_SLICES[2].0, 
NARS_SLICES[2].1, fnv64_bytes(b"N_ABDUCTION"))), + (NarsInference::Revision, RoleKeySlice::new(NARS_SLICES[3].0, NARS_SLICES[3].1, fnv64_bytes(b"N_REVISION"))), + (NarsInference::Synthesis, RoleKeySlice::new(NARS_SLICES[4].0, NARS_SLICES[4].1, fnv64_bytes(b"N_SYNTHESIS"))), + (NarsInference::Extrapolation, RoleKeySlice::new(NARS_SLICES[5].0, NARS_SLICES[5].1, fnv64_bytes(b"N_EXTRAPOLATION"))), + (NarsInference::CounterfactualSynthesis, RoleKeySlice::new(NARS_SLICES[6].0, NARS_SLICES[6].1, fnv64_bytes(b"N_COUNTERFACTUAL"))), + ] +}); + +pub fn nars_inference_slice(inf: NarsInference) -> RoleKeySlice { + let idx = match inf { + NarsInference::Deduction => 0, + NarsInference::Induction => 1, + NarsInference::Abduction => 2, + NarsInference::Revision => 3, + NarsInference::Synthesis => 4, + NarsInference::Extrapolation => 5, + NarsInference::CounterfactualSynthesis => 6, + }; + NARS_INFERENCE_SLICES[idx].1 +} + // --------------------------------------------------------------------------- // Tests // --------------------------------------------------------------------------- @@ -461,4 +609,169 @@ mod tests { assert!(k.slice_end <= TENSE_END); } } + + // ----------------------------------------------------------------------- + // D6 — RoleKeySlice catalogue tests + // ----------------------------------------------------------------------- + + /// All five SPO core slices are non-overlapping and union to [0, 9000) + /// (the "SPO-spine" prefix of the 16,384-dim VSA carrier). + #[test] + fn spo_slices_disjoint_and_contiguous() { + let spo = [ + SUBJECT_SLICE, PREDICATE_SLICE, OBJECT_SLICE, MODIFIER_SLICE, CONTEXT_SLICE, + ]; + // Contiguous: each slice starts where the previous ended. + for pair in spo.windows(2) { + assert_eq!( + pair[0].stop, pair[1].start, + "SPO slices not contiguous: {:?} vs {:?}", pair[0], pair[1] + ); + } + // Union covers [0, 9000), the SPO-spine region. (CONTEXT + // ends at 9000; TEKAMOLO sub-slices begin there.)
+ assert_eq!(spo[0].start, 0); + assert_eq!(spo[spo.len() - 1].stop, 9000); + } + + /// TEKAMOLO sub-slices fit within [9000, 9840) — the slice region beyond + /// CONTEXT_KEY where the original prompt placed them. (CONTEXT_KEY itself + /// owns [7500, 9000) and TEKAMOLO sits AFTER it in the LF-2 layout.) + #[test] + fn tekamolo_sub_slices_in_post_context_band() { + let teka = [ + TEMPORAL_SLICE, KAUSAL_SLICE, MODAL_SLICE, LOKAL_SLICE, + INSTRUMENT_SLICE, BENEFICIARY_SLICE, GOAL_SLICE, SOURCE_SLICE, + ]; + for s in teka { + assert!(s.start >= 9000, "TEKAMOLO slice starts before 9000: {s:?}"); + assert!(s.stop <= 9840, "TEKAMOLO slice ends after 9840: {s:?}"); + assert!(s.len() > 0, "empty TEKAMOLO slice: {s:?}"); + } + } + + /// Finnish case slices are non-overlapping AND fall inside the existing + /// `FINNISH_START..FINNISH_END` band. + #[test] + fn finnish_case_slices_disjoint_in_band() { + let arr = &*FINNISH_CASE_SLICES; + let mut by_start: Vec<RoleKeySlice> = arr.iter().map(|(_, s)| *s).collect(); + by_start.sort_by_key(|s| s.start); + for pair in by_start.windows(2) { + assert!( + pair[0].stop <= pair[1].start, + "Finnish slice overlap: {:?} vs {:?}", pair[0], pair[1] + ); + } + for (_, s) in arr.iter() { + assert!(s.start >= FINNISH_START); + assert!(s.stop <= FINNISH_END); + } + } + + /// The FNV-64 hashes of the canonical role labels are pairwise distinct.
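The layout invariants these tests assert (non-empty, inside a band, non-overlapping, and for the SPO spine also contiguous) can be expressed as two small predicates. This sketch uses a hypothetical `Span` type and `span` constructor in place of `RoleKeySlice`, checked against the boundary numbers from the `*_SLICE` constants in this diff.

```rust
/// Hypothetical stand-in for `RoleKeySlice` (the real type also carries `fnv_seed`).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: usize,
    stop: usize,
}

const fn span(start: usize, stop: usize) -> Span {
    Span { start, stop }
}

/// Every span is non-empty, lies in `[lo, hi)`, and no two spans overlap.
fn disjoint_in_band(spans: &[Span], lo: usize, hi: usize) -> bool {
    let mut sorted = spans.to_vec();
    sorted.sort_by_key(|s| s.start);
    let in_band = sorted
        .iter()
        .all(|s| s.start < s.stop && s.start >= lo && s.stop <= hi);
    // Sorted and non-empty: if neighbours do not overlap, no pair does.
    let no_overlap = sorted.windows(2).all(|w| w[0].stop <= w[1].start);
    in_band && no_overlap
}

/// Each span starts exactly where the previous one stopped (order as given).
fn contiguous(spans: &[Span]) -> bool {
    spans.windows(2).all(|w| w[0].stop == w[1].start)
}
```

Sorting by `start` first means only neighbouring spans need comparing, which is the same trick `finnish_case_slices_disjoint_in_band` uses.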
+ #[test] + fn fnv64_no_collisions_on_role_labels() { + let labels: &[&[u8]] = &[ + b"SUBJECT", b"PREDICATE", b"OBJECT", b"MODIFIER", b"CONTEXT", + b"TEMPORAL", b"KAUSAL", b"MODAL", b"LOKAL", + b"INSTRUMENT", b"BENEFICIARY", b"GOAL", b"SOURCE", + b"FI_NOMINATIVE", b"FI_GENITIVE", b"FI_ACCUSATIVE", b"FI_PARTITIVE", + b"FI_INESSIVE", b"FI_ELATIVE", b"FI_ILLATIVE", + b"FI_ADESSIVE", b"FI_ABLATIVE", b"FI_ALLATIVE", + b"FI_ESSIVE", b"FI_TRANSLATIVE", b"FI_INSTRUCTIVE", + b"FI_ABESSIVE", b"FI_COMITATIVE", + b"T_PRESENT", b"T_PAST", b"T_FUTURE", + b"T_PRESENT_CONTINUOUS", b"T_PAST_CONTINUOUS", b"T_FUTURE_CONTINUOUS", + b"T_PERFECT", b"T_PLUPERFECT", b"T_FUTURE_PERFECT", + b"T_HABITUAL", b"T_POTENTIAL", b"T_IMPERATIVE", + b"N_DEDUCTION", b"N_INDUCTION", b"N_ABDUCTION", b"N_REVISION", + b"N_SYNTHESIS", b"N_EXTRAPOLATION", b"N_COUNTERFACTUAL", + ]; + let mut seen = std::collections::HashSet::new(); + for l in labels { + let h = fnv64_bytes(l); + assert!(seen.insert(h), "FNV-64 collision on label {:?}", std::str::from_utf8(l).unwrap()); + } + // Spot-check the prompt's pinned non-collision. + assert_ne!(fnv64_bytes(b"SUBJECT"), fnv64_bytes(b"OBJECT")); + } + + /// Round-trip: each FinnishCase variant maps to exactly one + /// `RoleKeySlice` via the LazyLock array, and the array is keyed by + /// `FinnishCase as u8` (i.e. `arr[c as usize].0 == c`). 
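`FINNISH_CASE_SLICES[case as usize]` relies on each variant's discriminant matching its array position, while `nars_inference_slice` maps variants to indices with an explicit `match`. The sketch below contrasts the two on a hypothetical three-variant enum; `Case`, `ALL`, `TABLE`, and both lookup functions are illustrative names, not crate items.

```rust
/// Hypothetical three-variant enum standing in for `FinnishCase`.
/// Implicit discriminants follow declaration order: 0, 1, 2.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Case {
    Nominative,
    Genitive,
    Partitive,
}

const ALL: [Case; 3] = [Case::Nominative, Case::Genitive, Case::Partitive];

/// Entry `i` must describe the variant whose discriminant is `i`.
const TABLE: [(Case, (usize, usize)); 3] = [
    (Case::Nominative, (0, 100)),
    (Case::Genitive, (100, 200)),
    (Case::Partitive, (200, 300)),
];

/// Discriminant indexing, as in `FINNISH_CASE_SLICES[case as usize]`.
fn lookup_by_discriminant(c: Case) -> (usize, usize) {
    TABLE[c as usize].1
}

/// Explicit match, as in `nars_inference_slice`; a new variant fails to compile until handled.
fn lookup_by_match(c: Case) -> (usize, usize) {
    let idx = match c {
        Case::Nominative => 0,
        Case::Genitive => 1,
        Case::Partitive => 2,
    };
    TABLE[idx].1
}

/// Round-trip check in the spirit of `finnish_case_round_trip`.
fn table_is_consistent() -> bool {
    ALL.iter()
        .all(|&c| TABLE[c as usize].0 == c && lookup_by_discriminant(c) == lookup_by_match(c))
}
```

The discriminant lookup is shorter, but it becomes silently wrong if the enum's variants are reordered without reordering the table; a round-trip test is what catches that. The `match` form maps by variant name, so it is unaffected by discriminant changes.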
+ #[test] + fn finnish_case_round_trip() { + let all = [ + FinnishCase::Nominative, FinnishCase::Genitive, FinnishCase::Accusative, + FinnishCase::Partitive, FinnishCase::Inessive, FinnishCase::Elative, + FinnishCase::Illative, FinnishCase::Adessive, FinnishCase::Ablative, + FinnishCase::Allative, FinnishCase::Essive, FinnishCase::Translative, + FinnishCase::Instructive, FinnishCase::Abessive, FinnishCase::Comitative, + ]; + for case in all { + let (stored_case, slice) = FINNISH_CASE_SLICES[case as usize]; + assert_eq!(stored_case, case, "FINNISH_CASE_SLICES not indexed by `as u8`"); + // The free-function lookup must agree with the array entry. + assert_eq!(finnish_case_slice(case), slice); + // Slice mirrors the live RoleKey boundaries. + let live = finnish_case_key(case); + assert_eq!(slice.start, live.slice_start); + assert_eq!(slice.stop, live.slice_end); + // And the FNV-64 fingerprint differs from the FNV offset basis + // (0xcbf29ce484222325, i.e. the hash of the empty string). + assert_ne!(slice.fnv_seed, 0xcbf29ce484222325); + } + } + + /// The slice catalogue mirrors the live `RoleKey` boundaries for SPO/ + /// TEKAMOLO so consumers can swap the two without re-deriving widths.
+ #[test] + fn role_key_slice_mirrors_live_role_key_boundaries() { + let pairs: &[(RoleKeySlice, &RoleKey)] = &[ + (SUBJECT_SLICE, &SUBJECT_KEY), + (PREDICATE_SLICE, &PREDICATE_KEY), + (OBJECT_SLICE, &OBJECT_KEY), + (MODIFIER_SLICE, &MODIFIER_KEY), + (CONTEXT_SLICE, &CONTEXT_KEY), + (TEMPORAL_SLICE, &TEMPORAL_KEY), + (KAUSAL_SLICE, &KAUSAL_KEY), + (MODAL_SLICE, &MODAL_KEY), + (LOKAL_SLICE, &LOKAL_KEY), + (INSTRUMENT_SLICE, &INSTRUMENT_KEY), + (BENEFICIARY_SLICE,&BENEFICIARY_KEY), + (GOAL_SLICE, &GOAL_KEY), + (SOURCE_SLICE, &SOURCE_KEY), + ]; + for (slice, live) in pairs { + assert_eq!(slice.start, live.slice_start, "slice/live start mismatch for {}", live.label); + assert_eq!(slice.stop, live.slice_end, "slice/live stop mismatch for {}", live.label); + assert!(slice.stop <= VSA_DIMS); + } + } + + #[test] + fn role_key_slice_const_helpers() { + assert_eq!(SUBJECT_SLICE.len(), 2000); + assert!(!SUBJECT_SLICE.is_empty()); + let r = SUBJECT_SLICE.range(); + assert_eq!(r.start, 0); + assert_eq!(r.end, 2000); + } + + #[test] + fn nars_inference_slice_round_trip() { + let all = [ + NarsInference::Deduction, NarsInference::Induction, + NarsInference::Abduction, NarsInference::Revision, + NarsInference::Synthesis, NarsInference::Extrapolation, + NarsInference::CounterfactualSynthesis, + ]; + for inf in all { + let s = nars_inference_slice(inf); + assert!(s.start >= NARS_START); + assert!(s.stop <= NARS_END); + assert!(s.len() > 0); + } + } } diff --git a/crates/lance-graph-contract/src/grammar/thinking_styles.rs b/crates/lance-graph-contract/src/grammar/thinking_styles.rs index e655eab2..13cc34e9 100644 --- a/crates/lance-graph-contract/src/grammar/thinking_styles.rs +++ b/crates/lance-graph-contract/src/grammar/thinking_styles.rs @@ -302,6 +302,327 @@ pub fn revise_truth(current: TruthValue, f_obs: f32, c_obs: f32) -> TruthValue { TruthValue::new(f_new.clamp(0.0, 1.0), c_new.clamp(0.0, 1.0)) } +// --------------------------------------------------------------------------- 
+// YAML loader (zero-dep, line-based) +// --------------------------------------------------------------------------- +// +// Supports the strict subset our `grammar_styles/