Garrus800-stack
diff --git a/CHANGELOG-v7.md b/CHANGELOG-v7.md
Lines changed: 78 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 44 additions & 39 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
Lines changed: 44 additions & 39 deletions
diff --git a/docs/ARCHITECTURE-DEEP-DIVE.md b/docs/ARCHITECTURE-DEEP-DIVE.md
Lines changed: 8 additions & 2 deletions
diff --git a/docs/BENCHMARKING.md b/docs/BENCHMARKING.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/CAPABILITIES.md b/docs/CAPABILITIES.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/COMMUNICATION.md b/docs/COMMUNICATION.md
Lines changed: 5 additions & 1 deletion
diff --git a/docs/EVENT-FLOW.md b/docs/EVENT-FLOW.md
Lines changed: 80 additions & 0 deletions
diff --git a/docs/GATE-INVENTORY.md b/docs/GATE-INVENTORY.md
Lines changed: 3 additions & 0 deletions
@@ -8,8 +8,8 @@
   <br>
   <sub>Reads its own source code. Plans changes. Tests them in a sandbox before applying.<br>Verifies output programmatically before trusting it. Pursues multi-step goals across restarts.<br>Runs idle-time consolidation in the background. Tracks an emotional state as a behavioral steering signal — not a claim of sentience.<br>Learns what prompts and temperatures work for its specific model.</sub>
   <br><br>
-  <img src="https://img.shields.io/badge/version-7.9.9-d4a017?style=flat-square" alt="Version">
-  <img src="https://img.shields.io/badge/tests-7933%20passing-4ade80?style=flat-square" alt="Tests">
+  <img src="https://img.shields.io/badge/version-7.9.10-d4a017?style=flat-square" alt="Version">
+  <img src="https://img.shields.io/badge/tests-8105%20passing-4ade80?style=flat-square" alt="Tests">
   <img src="https://img.shields.io/badge/fitness-126%2F130-4ade80?style=flat-square" alt="Fitness">
   <img src="https://img.shields.io/badge/TSC-typecheck_ok-4ade80?style=flat-square" alt="TSC">
   <img src="https://img.shields.io/badge/schemas-100%25-4ade80?style=flat-square" alt="Schemas">
@@ -453,7 +453,7 @@ All tests run without external dependencies (no Ollama, no API keys, no internet
 | Manifest phases | 12 (Phase 1–12, boot order enforced) |
 | DI services | 165 manifest + 13 bootstrap = 178 at runtime |
 | Late-bindings | 263 cross-phase dependency bindings (2 optional skipped) |
-| Test suites | 488 files, 7933 tests (coverage gates: 80/76/78, ratchet floor 6014) |
+| Test suites | 488 files, 8105 tests (coverage gates: 80/76/78, ratchet floor 6014) |
 | Dependencies | 4 production + 1 optional + 10 dev |
 | LLM backends | 3 (Anthropic, OpenAI-compatible, Ollama) |
 | IPC channels | 79 main ↔ 79 preload (rate-limited, all in sync) |
 
@@ -14,7 +14,7 @@ Genesis Agent is a **self-modifying, self-verifying, cognitive AI agent** built
 |--------|-------|
 | Production LOC (src/) | ~101,500 |
 | Source Modules | 379 JS files |
-| Test Files / Tests | 500 / 7933 (Win baseline) |
+| Test Files / Tests | 502 / 8105 (Win baseline) |
 | DI Services | 178 (165 manifest + 13 bootstrap) |
 | Boot Phases | 12 |
 | Boot Time (Windows, cold) | ~1.3 s |
@@ -282,6 +282,12 @@ Perceive (WorldState) → Plan (FormalPlanner) → Act → Verify → Learn →
 ```
 Max 20 steps per goal (+10 after user approval), 3 consecutive error limit, 10-minute global timeout.
 
+**AgentLoopProgressDetector** `v7.9.9` — Reflexion-style degenerate-loop detector (Shinn et al. 2023, arXiv 2303.11366). Two state Maps cleared on `goal:completed` / `goal:abandoned` / `goal:obsolete` / `goal:stalled`. The action-loop detector hashes `(stepKind, resultDigest)` per step into a per-goal ring buffer; three identical hashes in a row emit `agent-loop:no-progress-detected` and force a replan. The plan-loop detector hashes `(goalDesc, plan-step-types)` at pursuit start; a hash seen before for the same goal emits `agent-loop:identical-plan-detected` and forces a replan with a different LLM hint. ProgressDetector is not a registered Container service — AgentLoopPursuit lazy-instantiates it on first use; when absent, pursuit still runs but loses the early-loop-break and relies on the existing `failureCap` (2) and `_repeatedFailures` paths instead.
+
+**AgentLoopPursuitGate three-branch dispatch** `v7.9.9` — When MentalSimulator returns `proceed: false` with `riskScore ≥ 5.0`, `handleHardGateAbort` reads `trustLevelSystem.getLevel()` and routes: SUPERVISED + AUTONOMOUS stay warn-only (`aborted: false`), letting the step route through `TrustLevelSystem.checkApproval(stepType)` which asks SUPERVISED users about everything and AUTONOMOUS users only about categorically critical action classes; FULL_AUTONOMY tries `_trySpawnObstacleSubgoal` and on refusal calls `goalStack.markObsolete`. The architectural point is decoupling: the hard-gate is a numerical signal from MentalSimulator about a plan's overall risk; the approval mechanism is categorical via TrustLevelSystem about an individual action's risk class. Pre-v7.9.9 iterations mixed them, producing a spam path where high-sim-risk goals at AUTONOMOUS dropped into approval prompts on every retry. `agent-loop:simulation-abort` telemetry still fires at every gate trigger, deduplicated per `goalId`.
+
+**AgentLoopRecovery decompose-on-failure** `v7.9.9` — `_repeatedFailures` Map keyed `(goalId, errorClass)` with 1h TTL, consulted at the bottom of `classifyAndRecover`. On the 2nd occurrence of the same error-class for the same goal — across pursuit retries, not within a single pursuit — recovery synthesises an obstacle and routes it through `_trySpawnObstacleSubgoal`. The cross-pursuit keying is the critical detail: pre-fix the key included `stepIndex`, which is unstable across retries (each retry generates a different plan), so the strikes never matched and decompose never fired in production. Goal-lifecycle events clear all entries for that goalId.
+
 ### Phase 9: Cognitive (35 files, ~13,200 LOC)
 
 Expectation, surprise, learning, self-model, adaptation. The cognitive substrate that makes Genesis self-correcting and self-improving. Includes CognitiveSelfModel (empirical capability tracking with Wilson-score calibration), AdaptiveStrategy (closed-loop self-correction), OnlineLearner (real-time behavioral adaptation), PromptEvolution (A/B prompt optimization), MemoryConsolidator (KG/Lessons hygiene), TaskRecorder (execution replay), CoreMemories (v7.3.7), LessonsStore, GateStats (v7.3.6 — central gate-verdict telemetry), SuspicionFrontier, LessonFrontier, ArchitectureReflection, **SelfStatementLog (v7.5.5 + DE/EN parity in v7.5.6)** — auto-classifies first-person statements (`strukturell` / `versprechen` / `emotional` / `uncertain`), persists to daily JSONL shards, fires `selfstatement:contradiction` when a structural claim lacks verified-data backing.
@@ -310,7 +316,7 @@ Persistent agency layer: GoalPersistence, FailureTaxonomy, DynamicContextBudget,
 
 Trust and effectors: TrustLevelSystem, EffectorRegistry, WebPerception, SelfSpawner.
 
-**TrustLevelSystem** `v4.1` — Four levels: Level 0 (supervised), Level 1 (assisted), Level 2 (autonomous), Level 3 (full autonomy). Auto-upgrade suggestions based on MetaLearning success rates.
+**TrustLevelSystem** `v3.0 — frozen v7.9.9` — Three levels: Level 0 SUPERVISED (always ask), Level 1 AUTONOMOUS (ask only on categorically critical actions: DEPLOY, EXTERNAL_API, EMAIL_SEND), Level 2 FULL_AUTONOMY (never ask). The four-level structure that existed through v7.9.6 (Supervised / Assisted / Autonomous / Full) was collapsed in v7.9.7 R1: the ASSISTED slot lacked a clear principle that distinguished it from SUPERVISED in practice, and the migration data showed users rarely settled there. v7.9.8 Fix 1 added migration writeback with `schemaVersion: 3`. v7.9.8 Fix 2 changed the fresh-install default from AUTONOMOUS to SUPERVISED at six call sites. v7.9.9 (A) closed the last two unaligned sites in Settings.js and rerouted the migration table so old ASSISTED (stored 1) buckets to SUPERVISED (new 0) instead of AUTONOMOUS (new 1) — "Ask for risky" was the level a user chose explicitly to limit autonomy, so re-bucketing downward honours the spirit of their choice. After v7.9.9 the trust system is frozen: no future version touches the migration table, the dropdown options, or the default level. The constructor distinguishes between caller-supplied `cfg.level` (already in the 3-level system, range 0..2 passes through) and stored values from `asyncLoad` (potentially 4-level, routes through `_migrateLevel`).
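The three-level approval rule and the 4-to-3 migration described above can be sketched as follows. This is a minimal illustration, not the project's code: `needsApproval`, `migrateStoredLevel`, and `LEGACY_TO_V3` are hypothetical names. Only the old ASSISTED (1) to SUPERVISED (0) mapping is stated in the text; the mappings for old levels 2 and 3 and the fallback for unknown values are assumptions.

```javascript
// Sketch only: names are illustrative, not the real TrustLevelSystem API.
const LEVEL = Object.freeze({ SUPERVISED: 0, AUTONOMOUS: 1, FULL_AUTONOMY: 2 });
const CRITICAL_ACTIONS = new Set(['DEPLOY', 'EXTERNAL_API', 'EMAIL_SEND']);

// Returns true when the user must be asked before running `actionType`.
function needsApproval(level, actionType) {
  if (level === LEVEL.SUPERVISED) return true; // always ask
  if (level === LEVEL.AUTONOMOUS) return CRITICAL_ACTIONS.has(actionType);
  return false; // FULL_AUTONOMY never asks
}

// Old 4-level value -> new 3-level value.
// Only 1 -> 0 (ASSISTED -> SUPERVISED) is stated in the text; 2 -> 1 and 3 -> 2 are assumed.
const LEGACY_TO_V3 = Object.freeze({ 0: 0, 1: 0, 2: 1, 3: 2 });

function migrateStoredLevel(stored, schemaVersion) {
  if (schemaVersion === 3) return stored; // already written back in the 3-level scheme
  // Assumption: unknown stored values fall back to the SUPERVISED default.
  return LEGACY_TO_V3[stored] ?? LEVEL.SUPERVISED;
}
```

A caller-supplied `cfg.level` would skip `migrateStoredLevel` entirely, since it is already in the 0..2 range; only values loaded from storage take the migration path.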
 
 **EffectorRegistry** `v4.1` — External action system with precondition checking. Built-in effectors: clipboard, notifications, browser, GitHub (issues, PRs, comments). Precondition failures emit `effector:blocked` events.
 
 
@@ -8,7 +8,7 @@
 
 | Command | What it does | Duration |
 |---------|-------------|----------|
-| `npm test` | Run all ~7933 tests | ~75s |
+| `npm test` | Run all ~8105 tests | ~75s |
 | `npm run test:ci` | Tests + coverage enforcement (80/76/78) | ~150s |
 | `npm run benchmark:agent --quick` | 3-task capability benchmark | ~2 min |
 | `npm run benchmark:agent:layer:organism` | A/B: full vs without organism | ~5 min |
@@ -22,7 +22,7 @@
 ### Run all tests
 
 ```bash
-npm test                    # Full suite (~7933 tests)
+npm test                    # Full suite (~8105 tests)
 npm run test:new            # Only per-module test files
 npm run test:legacy         # Only monolithic legacy suite
 ```
 
@@ -6,7 +6,7 @@
 
 - 379 source modules across 12 boot phases
 - 178 DI services (165 manifest + 13 bootstrap)
-- 7933 tests on Windows / 7932 on Linux (passing, 0 failures)
+- 8105 tests on Windows / 7932 on Linux (passing, 0 failures)
 - 489 events with 489 payload schemas (full parity)
 - Architectural fitness: 127/130
 - 18 CI audit gates — see [GATE-INVENTORY.md](GATE-INVENTORY.md) for the runtime gates
@@ -257,7 +257,7 @@ See [COMMUNICATION.md](COMMUNICATION.md) for the full protocol specification.
 | **Dashboard** | EventBus inspector, health status, dependency graph (v5.4: extracted to 3 delegate files) |
 | **i18n** | EN, DE, FR, ES UI (auto-detected, switchable) |
 | **Structured logging** | Human-readable or JSON-lines format, pluggable sink |
-| **500 test files** | 7933 tests (Win baseline, v7.9.6), coverage gates: 80% lines, 76% branches, 78% functions |
+| **502 test files** | 8105 tests (Win baseline, v7.9.6), coverage gates: 80% lines, 76% branches, 78% functions |
 | **CI scripts** | `npm run ci` = tests + event validation + channel validation + fitness gate |
 | **TypeScript CI** `v5.4` | `tsc --noEmit` blocks merges — zero type regressions allowed |
 | **Degradation matrix** | Auto-generated report showing what breaks if each service is missing |
 
@@ -39,7 +39,7 @@ EmotionalState ──emit('emotion:shift')──→ EventBus ──→ PromptBui
 
 Key properties:
 
-- **489 event types** catalogued in `EventTypes.js` (v7.9.9 baseline)
+- **489 event types** catalogued in `EventTypes.js` (v7.9.10 baseline)
 - **489 payload schemas** in `EventPayloadSchemas.js` — full parity since v7.6.x (every catalog entry has a registered schema); dev-mode validation throws on mismatch
 - **Ring buffer history** — last 500 events for debugging
 - **Source tracking** — every event carries `{ source: 'ModuleName' }` for audit
@@ -54,6 +54,10 @@ New v7.8.9–v7.9.4 events (Können maturity chain): `skill:candidate-extracted`
 
 New v7.9.4 events (IdleMind maturity): `idle:goal-balance-break` fires when IdleMind interrupts a goal-step stretch to pick a non-goal activity (default every 3 steps, configurable via `idleMind.goalStepsPerActivityPick`).
 
+New v7.9.9 events (Hard-Gate + Recovery + ProgressDetector): `agent-loop:simulation-abort` fires from `AgentLoopPursuitGate.handleHardGateAbort` whenever MentalSimulator returns `proceed: false` with `riskScore >= 5.0`. Three trust-level branches dispatch from there (warn-only at SUPERVISED + AUTONOMOUS, decompose-or-obsolete at FULL_AUTONOMY). Payload `{ goalId, riskScore, priorFailures, reason }`, deduplicated per `goalId`. `agent-loop:decompose-on-failure` fires from `AgentLoopRecovery._repeatedFailures` when the same error-class hits the same goal twice across pursuit retries — payload `{ goalId, stepIndex, errorClass, strikes }`. `agent-loop:no-progress-detected` and `agent-loop:identical-plan-detected` fire from `AgentLoopProgressDetector` (Reflexion-style heuristic, Shinn et al. 2023) when three identical (action, observation) hashes appear in a row, or when a plan hash recurs for the same goal.
+
+New v7.9.10 event (Lessons-Pipeline activated): `lessons:recorded` fires from `LessonsStore.record()` on every persisted lesson — payload `{ id, category, insight }` (insight truncated to 100 chars). The pipeline became fully functional in v7.9.10 once `recordReflection`'s `stableClass` gate was relaxed to accept LLM-verdict messages and `_save()` was moved from buffered (every 5th) to immediate (every record).
+
 ---
 
 ## Layer 2: IPC (UI ←→ Agent)
 
@@ -862,3 +862,83 @@ Listeners on most of these are not yet wired in production — they
 serve as bus-level instrumentation for anyone debugging gate
 behaviour.
 
+
+## v7.9.9 / v7.9.10 — Cognitive Pipelines
+
+### Hard-Gate Dispatch (v7.9.9)
+
+```mermaid
+flowchart TD
+    A[MentalSimulator.evaluate] --> B{proceed === false<br/>AND riskScore ≥ 5.0?}
+    B -->|no| C[Pursuit continues normally]
+    B -->|yes| D[emit agent-loop:simulation-abort<br/>dedup per goalId]
+    D --> E{trustLevelSystem.getLevel}
+    E -->|0 SUPERVISED| F[aborted: false<br/>step routes through<br/>TrustLevelSystem.checkApproval]
+    E -->|1 AUTONOMOUS| G[aborted: false<br/>asked only on<br/>DEPLOY / EXTERNAL_API / EMAIL_SEND]
+    E -->|2 FULL_AUTONOMY| H[_trySpawnObstacleSubgoal]
+    H -->|refused| I[goalStack.markObsolete]
+    H -->|spawned| J[Sub-goal decomposes obstacle]
+```
+
+The hard-gate is a numerical signal (riskScore) from MentalSimulator. The per-step ask is a categorical signal (actionType) from TrustLevelSystem. They are intentionally decoupled — mixing them produced approval-prompt spam on every retry of high-sim-risk goals through v7.9.7 + v7.9.8.
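The dispatch in the flowchart above could look roughly like this. It is a sketch under stated assumptions: `createHardGate` and its injected callbacks are hypothetical stand-ins for the real collaborators, the `triggered` and `spawned` return fields are added for illustration, and the emitted payload is reduced to two of the four documented fields.

```javascript
// Sketch only: names below are illustrative, not the real Genesis API.
const TRUST = Object.freeze({ SUPERVISED: 0, AUTONOMOUS: 1, FULL_AUTONOMY: 2 });
const HARD_GATE_RISK = 5.0;

function createHardGate({ getLevel, trySpawnObstacleSubgoal, markObsolete, emit }) {
  const reported = new Set(); // goalIds that already produced telemetry

  return function handleHardGateAbort(goalId, sim) {
    // The gate triggers only on proceed === false AND riskScore >= 5.0.
    if (sim.proceed !== false || sim.riskScore < HARD_GATE_RISK) {
      return { triggered: false, aborted: false };
    }
    // Telemetry is deduplicated per goalId.
    if (!reported.has(goalId)) {
      reported.add(goalId);
      emit('agent-loop:simulation-abort', { goalId, riskScore: sim.riskScore });
    }
    if (getLevel() !== TRUST.FULL_AUTONOMY) {
      // SUPERVISED and AUTONOMOUS: warn-only; per-step approval decides later.
      return { triggered: true, aborted: false };
    }
    // FULL_AUTONOMY: decompose the obstacle, or retire the goal if refused.
    const spawned = Boolean(trySpawnObstacleSubgoal(goalId));
    if (!spawned) markObsolete(goalId);
    return { triggered: true, aborted: true, spawned };
  };
}
```

Keeping the trust-level read inside the gate while leaving the per-step question to the approval check is what keeps the numerical and categorical signals apart in this sketch.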
+
+### Decompose-on-Failure (v7.9.9)
+
+```mermaid
+flowchart TD
+    A[Pursuit fails with errorClass=X] --> B[classifyAndRecover called]
+    B --> C{_repeatedFailures<br/>has goalId+X?}
+    C -->|no, first strike| D[Record strike, 1h TTL]
+    D --> E[Default retry path]
+    C -->|yes, second strike| F[Synthesise obstacle<br/>contextKey=repeated-failure-X]
+    F --> G[_trySpawnObstacleSubgoal]
+    G --> H[emit agent-loop:decompose-on-failure]
+    G --> I[Sub-goal: Investigate why X keeps happening]
+    C -->|3rd+ strike| J[No-op — obstacle already spawned]
+    K[goal:completed / abandoned / obsolete / stalled] --> L[Clear ALL entries for goalId]
+```
+
+The cross-pursuit keying is the critical detail: the key is `(goalId, errorClass)`, not `(goalId, stepIndex, errorClass)`. Step indices are unstable across retries because each retry generates a different plan. Pre-v7.9.9 the strikes never matched and decompose never fired.
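A minimal sketch of the strike bookkeeping, assuming a string key of the form `goalId::errorClass`. `RepeatedFailures`, `recordStrike`, and `clearGoal` are hypothetical names, the injectable clock exists only to make the example testable, and not refreshing the TTL on later strikes is an assumption.

```javascript
// Sketch only: a stand-in for the _repeatedFailures Map described above.
const TTL_MS = 60 * 60 * 1000; // 1h TTL

class RepeatedFailures {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.strikes = new Map(); // "goalId::errorClass" -> { count, expiresAt }
  }

  // Returns true exactly once: on the 2nd strike for (goalId, errorClass).
  // The key deliberately omits stepIndex, which changes between retries.
  recordStrike(goalId, errorClass) {
    const key = `${goalId}::${errorClass}`;
    const t = this.now();
    const entry = this.strikes.get(key);
    if (!entry || entry.expiresAt <= t) {
      // First strike, or the previous strike expired: start over.
      this.strikes.set(key, { count: 1, expiresAt: t + TTL_MS });
      return false;
    }
    entry.count += 1;
    return entry.count === 2; // 3rd+ strike is a no-op
  }

  // Called on goal:completed / abandoned / obsolete / stalled.
  clearGoal(goalId) {
    const prefix = `${goalId}::`;
    for (const key of this.strikes.keys()) {
      if (key.startsWith(prefix)) this.strikes.delete(key);
    }
  }
}
```

With `stepIndex` in the key, two retries that fail at steps 2 and 4 with the same error class would create two separate entries and never reach a second strike, which is the pre-fix behaviour the text describes.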
+
+### No-Progress Detector (v7.9.9)
+
+Reflexion-style heuristic, Shinn et al. 2023 (arXiv 2303.11366). Two parallel detectors with separate state Maps, both cleared on every goal-lifecycle terminal event.
+
+```mermaid
+flowchart TD
+    A[Step completes] --> B[Hash: sha256 stepKind + resultDigest, first 16 chars]
+    B --> C[Append to _actionObservationHashes goalId ring buffer]
+    C --> D{Last 3 hashes identical?}
+    D -->|yes| E[emit agent-loop:no-progress-detected]
+    E --> F[Force reflectOnProgress + replan]
+    G[Pursuit starts] --> H[Hash: sha256 goalDesc + plan-step-types]
+    H --> I{Hash seen before<br/>for this goalId?}
+    I -->|yes| J[emit agent-loop:identical-plan-detected]
+    J --> K[Force replan with different LLM hint]
+    I -->|no| L[Record hash, continue]
+```
+
+ProgressDetector is not a registered Container service. AgentLoopPursuit lazy-instantiates it on first use; when absent, pursuit still runs but loses the early loop-break and falls back to the `failureCap` (2) and `_repeatedFailures` paths.
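The two detectors can be sketched with Node's built-in `crypto` module. This is an illustration rather than the shipped `AgentLoopProgressDetector`: the class and method names, the hex encoding of the digest, and the NUL separator between hashed parts are all assumptions.

```javascript
// Sketch only: illustrative names, not the real detector.
const { createHash } = require('node:crypto');

// sha256 over the joined parts, truncated to the first 16 hex characters.
const digest16 = (...parts) =>
  createHash('sha256').update(parts.join('\u0000')).digest('hex').slice(0, 16);

class ProgressDetectorSketch {
  constructor(windowSize = 3) {
    this.windowSize = windowSize;
    this.actionHashes = new Map(); // goalId -> last N step hashes
    this.planHashes = new Map(); // goalId -> Set of plan hashes seen
  }

  // Action loop: true when the last `windowSize` hashes are identical.
  recordStep(goalId, stepKind, resultDigest) {
    const ring = this.actionHashes.get(goalId) ?? [];
    ring.push(digest16(stepKind, resultDigest));
    if (ring.length > this.windowSize) ring.shift();
    this.actionHashes.set(goalId, ring);
    return ring.length === this.windowSize && ring.every((h) => h === ring[0]);
  }

  // Plan loop: true when this plan hash was already seen for the goal.
  recordPlan(goalId, goalDesc, stepTypes) {
    const seen = this.planHashes.get(goalId) ?? new Set();
    const hash = digest16(goalDesc, stepTypes.join(','));
    const repeated = seen.has(hash);
    seen.add(hash);
    this.planHashes.set(goalId, seen);
    return repeated;
  }

  // Both Maps are cleared on every terminal goal-lifecycle event.
  clearGoal(goalId) {
    this.actionHashes.delete(goalId);
    this.planHashes.delete(goalId);
  }
}
```

A caller would emit `agent-loop:no-progress-detected` when `recordStep` returns true and `agent-loop:identical-plan-detected` when `recordPlan` returns true, then force the replan.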
+
+### Lessons-Pipeline (v7.9.10 — first fully functional)
+
+```mermaid
+flowchart TD
+    A[Plan fails or partial] --> B[recordReflection called]
+    B --> C[classifyFailure on errorMessage]
+    C --> D{stableClass gate}
+    D -->|classification === 'user-action'| E[Drop — not Genesis failing]
+    D -->|classification === 'unclassified'<br/>AND errorMessage empty| F[Drop — no signal]
+    D -->|otherwise| G[lessonsStore.record category=obstacle-resolution]
+    G --> H[bus.fire lessons:recorded]
+    H --> I[_save → JSON write to .genesis/lessons.json]
+    J[Next pursuit] --> K[AgentLoopRecovery._recallObstacleLessons]
+    K --> L[lessonsStore.recall 'obstacle-resolution', goalDesc]
+    L --> M[Inject AVOID-past-failure directives]
+```
+
+The pre-v7.9.10 silent bug: `stableClass` required `classification !== 'unclassified'`. LLM-generated verdict messages like `"PARTIAL because the critical step failed..."` never matched the technical regex buckets in `failure-patterns.js`, so they were all classified as `'unclassified'` and dropped. A six-hour field test produced zero stored lessons. v7.9.10 widens the gate to also accept `'unclassified'` when `errorMessage` is non-empty.
+
+A second silent bug: `_save()` ran only on every 5th `record()` call, so short sessions never persisted any lessons. v7.9.10 saves on every record (cheap JSON write under 5 MB).
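Both fixes are small enough to sketch. The code below is illustrative only: `shouldRecordLesson`, `legacyStableClass`, and `LessonBuffer` are hypothetical names, the legacy predicate encodes only the one condition the text names, and the real store writes to disk rather than calling an injected `persist` callback.

```javascript
// Sketch only: mirrors the stableClass gate in the flowchart above.
function shouldRecordLesson(classification, errorMessage) {
  if (classification === 'user-action') return false; // not Genesis failing
  const hasMessage = typeof errorMessage === 'string' && errorMessage.trim() !== '';
  if (classification === 'unclassified' && !hasMessage) return false; // no signal
  return true;
}

// Pre-v7.9.10 condition as described in the text: 'unclassified' never passed.
function legacyStableClass(classification) {
  return classification !== 'unclassified';
}

// Buffered vs. immediate save, with the write replaced by a callback.
class LessonBuffer {
  constructor(saveEvery, persist) {
    this.saveEvery = saveEvery; // 5 before v7.9.10, 1 afterwards
    this.persist = persist;
    this.lessons = [];
  }

  record(lesson) {
    this.lessons.push(lesson);
    if (this.lessons.length % this.saveEvery === 0) this.persist(this.lessons);
  }
}
```

With `saveEvery` set to 5, a session that records three lessons never calls `persist`; with 1, every record does.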
+
+---
@@ -19,13 +19,16 @@
 | 11 | `pse:content-sanity`      | `cognitive/proactiveSelfExpression/ContentSanity.js` (v7.7.9) | pass / block (length, repetition, self-negation, profanity) | blocking |
 | 12 | `pse:scoring`             | `cognitive/proactiveSelfExpression/Scoring.js` (v7.7.9) | passes when significance×novelty×context-fit ≥ per-kind floor | preventive (threshold) |
 | 13 | `pse:private-kind`        | `proactiveSelfExpression/HardGates.js` gate-0 (v7.9.5) | block on `thought.kind ∈ PRIVATE_KINDS` regardless of settings | **structurally blocking** — unreachable from settings |
+| 14 | `cognitive:hard-gate`     | `revolution/AgentLoopPursuitGate.handleHardGateAbort` (v7.9.9) | three-branch dispatch by trust level: SUPERVISED/AUTONOMOUS warn-only, FULL_AUTONOMY decompose-or-obsolete | **branching** — fires `agent-loop:simulation-abort` always; `aborted: false` at SUPERVISED+AUTONOMOUS, `aborted: true` at FULL_AUTONOMY |
 
 Integration test: `test/modules/gate-stats-integration.test.js` — end-to-end
 coverage that `recordGate()` is triggered by real ChatOrchestrator flows.
 Regression tests for the v7.5.x additions: `test/modules/v751-fix.test.js`,
 `test/modules/v756-fix.test.js`, `test/modules/thinking-block-stream-filter.test.js`,
 `test/modules/thinking-block-integration.test.js`.
 
+> **Cognitive hard-gate (v7.9.9):** Gate 14, `cognitive:hard-gate`, sits at the boundary between MentalSimulator and pursuit execution. When sim returns `proceed: false` with `riskScore >= 5.0`, `AgentLoopPursuitGate.handleHardGateAbort` reads the current trust level and dispatches one of three ways. At SUPERVISED and AUTONOMOUS it is warn-only — the per-step `TrustLevelSystem.checkApproval` is the actual asking mechanism, so this gate just records the simulation-risk signal without duplicating prompts. At FULL_AUTONOMY it tries `_trySpawnObstacleSubgoal` and on refusal calls `goalStack.markObsolete`. The architectural point is the decoupling: hard-gate is a *numerical* signal about plan-level risk, `TrustLevelSystem.checkApproval` is a *categorical* signal about per-action risk class. Earlier iterations mixed them, producing approval-prompt spam on every retry of high-sim-risk goals.
+
 > **Four-layer gate architecture (v7.5.6):**
 > The bus now has four deliberately-symmetric layers across the input/action/output axis:
 >