Phase E: Cognitive Architecture (#67)

DeveshParagiri · web-flow · commit a842dea48028 · 2026-02-15T22:30:37.000-05:00
* feat: Phase E cognitive architecture - conviction trajectory, repetition detection, THINK/SAY

* docs: mark episodic/semantic memory as omitted
diff --git a/docs/simulation-v2-architecture.md b/docs/simulation-v2-architecture.md
@@ -813,9 +813,9 @@ Split into independent subsystems. Each assessed for actual value vs implementat
 
 - **3.10d: Repetition detection + forced deepening.** If Jaccard similarity on word-level trigrams between consecutive reasonings > 70%, inject a prompt nudge: "You've been thinking the same thing for several days. Has anything actually changed? Are you starting to doubt your plan? Have you actually done anything about it?" Simple trigram comparison, no embeddings needed. Prevents the stale convergence we saw in the ASI run ("No change — save, learn AI, backup income" × 5 timesteps). Without this, agents converge to identical outputs and the sim produces meaningless duplicate reasoning.
 
-**Tier 3: Build at high fidelity (medium effort, marginal value over full traces)**
+**OMITTED: Marginal value**
 
-- **3.10e: Episodic vs semantic memory.** After N timesteps of consistent reasoning on a theme, engine extracts a belief statement and adds to persistent "beliefs" field. Shown as "Things I've come to believe:" separate from "What I thought recently." Extraction is rule-based: if the same keywords appear in 3+ consecutive reasonings, consolidate into a belief. Marginal value because the LLM already consolidates beliefs implicitly when reading its own full history — making it explicit is a nice-to-have.
+- **~~3.10e: Episodic vs semantic memory.~~** ~~After N timesteps of consistent reasoning on a theme, engine extracts a belief statement and adds to persistent "beliefs" field.~~ **Omitted.** The LLM already consolidates beliefs implicitly when reading its own full history. Making it explicit adds complexity for marginal gain. The full memory trace (uncapped, timestamped) provides sufficient context.
 
 **CUT: Not building**
 
@@ -832,7 +832,7 @@ Split into independent subsystems. Each assessed for actual value vs implementat
 | 3.10b conviction self-awareness | Yes | Yes | Yes |
 | 3.10c THINK vs SAY | No | No | Yes |
 | 3.10d repetition detection | No | No | Yes |
-| 3.10e episodic/semantic memory | No | Yes | Yes |
+| ~~3.10e episodic/semantic memory~~ | — | — | **Omitted** |
 | ~~3.10f attention weighting~~ | — | — | **Cut** |
 | 3.10g spontaneous recall | No | No | **Deferred** |
 
@@ -1175,7 +1175,7 @@ Ship this alone. Every simulation immediately feels more human, and the accounta
 - Conviction self-awareness (all tiers — deterministic)
 - THINK vs SAY separation (high fidelity — schema change)
 - Repetition detection + deepening nudge (high fidelity — string overlap check)
-- Episodic/semantic memory consolidation (high fidelity — rule-based belief extraction)
+- ~~Episodic/semantic memory consolidation~~ — **Omitted**
 
 ### Phase F: Fidelity Tiers + Results (~1.5 weeks)
 
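Conviction self-awareness is listed as deterministic, so it can live in a pure function and be unit-tested without an LLM call. A minimal sketch, assuming the memory trace yields one optional conviction float per entry; `describe_conviction` is a hypothetical name, and the thresholds (0.7/0.5/0.3 for firmness, ±0.2 for trend against the first recorded value) are copied from the `build_pass1_prompt` change in this commit.

```python
from typing import Optional, Sequence


def describe_conviction(convictions: Sequence[Optional[float]]) -> Optional[str]:
    """Turn a conviction history into a first-person sentence (hypothetical helper).

    Thresholds mirror the Phase E prompt code: firmness at 0.7 / 0.5 / 0.3 on the
    latest value, trend at +/-0.2 relative to the first recorded value.
    Returns None when fewer than two convictions are recorded.
    """
    values = [c for c in convictions if c is not None]  # skip entries with no score
    if len(values) < 2:
        return None

    latest = values[-1]
    trend = latest - values[0]  # net change since the first recorded conviction

    if latest >= 0.7:
        firmness = "firm about this"
    elif latest >= 0.5:
        firmness = "moderately certain"
    elif latest >= 0.3:
        firmness = "leaning but uncertain"
    else:
        firmness = "quite uncertain"

    if trend > 0.2:
        trend_text = " and getting more certain"
    elif trend < -0.2:
        trend_text = " but my certainty has been slipping"
    else:
        trend_text = ""

    return f"I've been {firmness}{trend_text} since this started."


print(describe_conviction([0.4, None, 0.75]))
```

Because the trend compares the latest value to the first one, an agent that swung from 0.4 up to 0.9 and back to 0.45 reads as steady; comparing against a recent window instead would surface that swing, at the cost of a noisier signal.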
diff --git a/extropy/simulation/reasoning.py b/extropy/simulation/reasoning.py
@@ -267,6 +267,55 @@ def build_pass1_prompt(
                 trajectory = "fairly steady"
             prompt_parts.append(f"\nI've been feeling {trajectory} since this started.")
 
+    # --- Conviction self-awareness (Phase E) ---
+    if is_re_reasoning and len(context.memory_trace) >= 2:
+        convictions = [
+            m.conviction for m in context.memory_trace if m.conviction is not None
+        ]
+        if len(convictions) >= 2:
+            latest = convictions[-1]
+            trend = latest - convictions[0]
+
+            # Firmness label based on latest conviction
+            if latest >= 0.7:
+                firmness = "firm about this"
+            elif latest >= 0.5:
+                firmness = "moderately certain"
+            elif latest >= 0.3:
+                firmness = "leaning but uncertain"
+            else:
+                firmness = "quite uncertain"
+
+            # Trend suffix
+            if trend > 0.2:
+                trend_text = " and getting more certain"
+            elif trend < -0.2:
+                trend_text = " but my certainty has been slipping"
+            else:
+                trend_text = ""
+
+            prompt_parts.append(f"I've been {firmness}{trend_text} since this started.")
+
+    # --- Repetition detection (Phase E) ---
+    if is_re_reasoning and len(context.memory_trace) >= 2:
+        last_two = context.memory_trace[-2:]
+        prev_reasoning = last_two[0].raw_reasoning or ""
+        curr_reasoning = last_two[1].raw_reasoning or ""
+
+        if prev_reasoning and curr_reasoning:
+            from .text_utils import compute_trigram_jaccard
+
+            similarity = compute_trigram_jaccard(prev_reasoning, curr_reasoning)
+            if similarity > 0.7:
+                prompt_parts.extend(
+                    [
+                        "",
+                        "*You've been thinking the same things for a while now. "
+                        "Has anything actually changed? Are you starting to doubt yourself? "
+                        "Have you done anything about it, or just thought about it?*",
+                    ]
+                )
+
     # --- Intent accountability ---
     if is_re_reasoning and context.prior_action_intent:
         prompt_parts.extend(
@@ -306,8 +355,29 @@ def build_pass1_prompt(
             ]
         )
 
-    # --- Instructions ---
-    if is_re_reasoning:
+    # --- Instructions (with THINK vs SAY at high fidelity) ---
+    if fidelity == "high":
+        # Explicit THINK vs SAY separation (Phase E)
+        prompt_parts.extend(
+            [
+                "",
+                "## Your Honest Reaction",
+                "",
+                "There's often a gap between what you THINK and what you SAY.",
+                "",
+                "**Your internal monologue** — what you're actually thinking:",
+                "- Be raw and honest. Fears, doubts, contradictions, anger — whatever is true.",
+                "- This is just you, thinking to yourself.",
+                "",
+                "**Your public statement** — what you'd tell people:",
+                "- This might differ from your private thoughts.",
+                "- Consider who you're talking to and what image you want to project.",
+                "",
+                "Commit to both. Your reasoning should reflect the internal truth.",
+                "Your public_statement should reflect what you'd actually say out loud.",
+            ]
+        )
+    elif is_re_reasoning:
         prompt_parts.extend(
             [
                 "",
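At high fidelity the prompt asks the agent to commit to two outputs: `reasoning` for the internal truth and `public_statement` for what it would actually say out loud. A minimal sketch of validating such a reply, assuming the model returns a JSON object with those two keys; `ThinkSayResponse`, `parse_think_say`, and the `diverges` check are hypothetical names for illustration, not part of extropy.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkSayResponse:
    """Hypothetical container for a high-fidelity pass-1 reply."""

    reasoning: str  # internal monologue (THINK)
    public_statement: str  # what the agent would say out loud (SAY)

    @property
    def diverges(self) -> bool:
        """True if private and public text differ after normalizing case and spacing."""

        def norm(text: str) -> str:
            return " ".join(text.lower().split())

        return norm(self.reasoning) != norm(self.public_statement)


def parse_think_say(raw: str) -> ThinkSayResponse:
    """Parse a JSON reply, requiring both fields as non-empty strings."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    fields = {}
    for key in ("reasoning", "public_statement"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        fields[key] = value.strip()
    return ThinkSayResponse(**fields)


reply = parse_think_say(
    '{"reasoning": "I am scared we will lose the house.", '
    '"public_statement": "We have a plan and we will be fine."}'
)
print(reply.diverges)
```

Tracking `diverges` across agents is one cheap way to check whether the model is using the gap the prompt describes or simply copying its monologue into the public field.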
diff --git a/extropy/simulation/text_utils.py b/extropy/simulation/text_utils.py
@@ -0,0 +1,33 @@
+"""Text utilities for cognitive architecture features."""
+
+
+def compute_trigram_jaccard(text1: str, text2: str) -> float:
+    """Compute Jaccard similarity of word-level trigrams.
+
+    Word trigrams (3-word sequences) are more discriminative than character
+    trigrams: they catch reused phrasing, though heavy paraphrase scores low.
+
+    Args:
+        text1: First text
+        text2: Second text
+
+    Returns:
+        Jaccard similarity in [0, 1]. >0.7 indicates repetitive content.
+    """
+
+    def get_word_trigrams(text: str) -> set[tuple[str, ...]]:
+        words = text.lower().split()
+        if len(words) < 3:
+            return set()
+        return {tuple(words[i : i + 3]) for i in range(len(words) - 2)}
+
+    t1 = get_word_trigrams(text1)
+    t2 = get_word_trigrams(text2)
+
+    if not t1 or not t2:
+        return 0.0
+
+    intersection = len(t1 & t2)
+    union = len(t1 | t2)
+
+    return intersection / union if union > 0 else 0.0
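To get a feel for what the 0.7 cut-off means, it helps to count trigrams: swapping one interior word in a text of T trigrams leaves T - 3 shared trigrams and 3 unshared ones on each side, so J = (T - 3) / (T + 3). A runnable sketch under simplifying assumptions: `word_trigram_jaccard` re-implements the `compute_trigram_jaccard` logic so the block stands alone, and `one_interior_swap` is a hypothetical helper that builds texts of all-distinct words so no trigram matches by accident.

```python
def word_trigram_jaccard(text1: str, text2: str) -> float:
    """Standalone copy of the compute_trigram_jaccard logic (lowercase, whitespace split)."""

    def trigrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i : i + 3]) for i in range(len(words) - 2)}

    t1, t2 = trigrams(text1), trigrams(text2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


def one_interior_swap(n_words: int) -> tuple[str, str]:
    """Hypothetical helper: two texts of n distinct words differing only in the middle word."""
    words = [f"w{i}" for i in range(n_words)]
    swapped = list(words)
    swapped[n_words // 2] = "changed"
    return " ".join(words), " ".join(swapped)


for n in (13, 19, 20, 30):
    a, b = one_interior_swap(n)
    t = n - 2  # trigrams per text
    j = word_trigram_jaccard(a, b)
    # The swapped word sits in 3 trigrams per text: T - 3 shared, 6 unshared.
    assert abs(j - (t - 3) / (t + 3)) < 1e-12
    print(f"{n} words: J = {j:.3f}, nudge = {j > 0.7}")
```

With these all-distinct words, a single swap in a 19-word text lands exactly on 0.7, which the strict `> 0.7` check does not count; 20 words (18 trigrams, J = 15/21) is the shortest text where one mid-sentence change still trips the nudge. The `test_similar_texts_high_similarity` case in this commit scores 9/13 rather than 8/14 because its changed word is second to last and appears in only 2 trigrams. Real reasoning repeats words, so treat these numbers as a rough guide.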
diff --git a/tests/test_text_utils.py b/tests/test_text_utils.py
@@ -0,0 +1,81 @@
+"""Unit tests for text utilities."""
+
+from extropy.simulation.text_utils import compute_trigram_jaccard
+
+
+class TestComputeTrigramJaccard:
+    """Tests for trigram Jaccard similarity."""
+
+    def test_identical_texts_returns_1(self):
+        """Identical texts should have similarity of 1.0."""
+        text = (
+            "I am very worried about my job security and what this means for my family"
+        )
+        assert compute_trigram_jaccard(text, text) == 1.0
+
+    def test_completely_different_texts_returns_0(self):
+        """Completely different texts should have similarity near 0."""
+        text1 = "The quick brown fox jumps over the lazy dog"
+        text2 = "A completely unrelated sentence with no overlap whatsoever here"
+        similarity = compute_trigram_jaccard(text1, text2)
+        assert similarity < 0.1
+
+    def test_similar_texts_high_similarity(self):
+        """Similar/paraphrased texts should have high similarity."""
+        text1 = "I am worried about my job and what this means for my family"
+        text2 = "I am worried about my job and what this means for our family"
+        similarity = compute_trigram_jaccard(text1, text2)
+        # One word change still yields ~69% similarity
+        assert similarity > 0.6
+
+    def test_short_text_returns_0(self):
+        """Texts with fewer than 3 words should return 0."""
+        assert compute_trigram_jaccard("hello world", "hello world") == 0.0
+        assert compute_trigram_jaccard("one", "two") == 0.0
+
+    def test_empty_text_returns_0(self):
+        """Empty texts should return 0."""
+        assert compute_trigram_jaccard("", "") == 0.0
+        assert compute_trigram_jaccard("hello there friend", "") == 0.0
+
+    def test_case_insensitive(self):
+        """Similarity should be case-insensitive."""
+        text1 = "I Am Worried About My Job"
+        text2 = "i am worried about my job"
+        assert compute_trigram_jaccard(text1, text2) == 1.0
+
+    def test_partial_overlap(self):
+        """Texts with partial overlap should have intermediate similarity."""
+        text1 = "I need to save money and cut expenses immediately"
+        text2 = "I need to save money but also invest for the future"
+        similarity = compute_trigram_jaccard(text1, text2)
+        # Some overlap but not complete
+        assert 0.2 < similarity < 0.8
+
+    def test_repetitive_reasoning_detection(self):
+        """Reworded reasoning scores above unrelated text, below the 0.7 nudge."""
+        reasoning1 = (
+            "I'm terrified about losing my job. Need to cut spending and save money. "
+            "Maybe look for backup work. Lisa and I need to talk about the budget."
+        )
+        reasoning2 = (
+            "Still terrified about losing my job. Need to cut spending and save money. "
+            "Looking at gig apps for backup work. Lisa and I talked about the budget."
+        )
+        similarity = compute_trigram_jaccard(reasoning1, reasoning2)
+        # Same themes, reworded: 15 of 35 distinct trigrams shared (~43%), well
+        # above unrelated text but under the 0.7 threshold, so no nudge fires
+        assert similarity > 0.3
+
+    def test_different_reasoning_low_similarity(self):
+        """Different reasoning should have low similarity."""
+        reasoning1 = (
+            "I'm terrified about losing my job. Need to cut spending and save money. "
+            "Maybe look for backup work."
+        )
+        reasoning2 = (
+            "Actually feeling more optimistic now. The retraining program looks promising. "
+            "I signed up for the AI course and it's going well."
+        )
+        similarity = compute_trigram_jaccard(reasoning1, reasoning2)
+        assert similarity < 0.3