
Commit e3928c0

v7.9.10 — lessons-pipeline activation, cloud-no-prefill continuation cap, fr/es 464-key parity, settings-modal language-switch, detector offline guard, metabolism warning threshold + idlemind thoughtcount persistence: stableClass widened to accept unclassified-with-message (LLM-verdict messages no longer dropped) + LessonsStore _save per record, ContinuationLoop computeEffectiveMaxContinuations with CLOUD_NO_PREFILL_FLOOR 10 (verified-prefill respects caller, others lift), Language.js fr/es 23/464 → 464/464 parity + parity tests, settings-fields _decorateField on closest('.setting-group') anchor + refreshSettingsI18n calls _decorateAllFields + English fallback labels, LLMCapabilityDetector honors GENESIS_OFFLINE_TESTS (15s setTimeout per bridge.chat eliminated) + ContinuationLoop BACKOFF_BASE_MS env-aware 0ms in test mode, --test-timeout=2000 defensive, metabolism:cost threshold 0.08 → 0.12 (cloud routine calls no longer warn), IdleMindActivityStats persists thoughtCount with activityCounts (dashboard consistent across restarts), test suite 273s → 193s linux / 54.9s win, 8135 win, 41 hash-locked, 8 strict audits green, 57 doc claims match
1 parent 1640623 commit e3928c0

34 files changed

Lines changed: 1752 additions & 164 deletions

CHANGELOG-v7.md

Lines changed: 78 additions & 0 deletions
Large diffs are not rendered by default.

CHANGELOG.md

Lines changed: 44 additions & 39 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 3 additions & 3 deletions
@@ -8,8 +8,8 @@
88
<br>
99
<sub>Reads its own source code. Plans changes. Tests them in a sandbox before applying.<br>Verifies output programmatically before trusting it. Pursues multi-step goals across restarts.<br>Runs idle-time consolidation in the background. Tracks an emotional state as a behavioral steering signal — not a claim of sentience.<br>Learns what prompts and temperatures work for its specific model.</sub>
1010
<br><br>
11-
<img src="https://img.shields.io/badge/version-7.9.9-d4a017?style=flat-square" alt="Version">
12-
<img src="https://img.shields.io/badge/tests-7933%20passing-4ade80?style=flat-square" alt="Tests">
11+
<img src="https://img.shields.io/badge/version-7.9.10-d4a017?style=flat-square" alt="Version">
12+
<img src="https://img.shields.io/badge/tests-8105%20passing-4ade80?style=flat-square" alt="Tests">
1313
<img src="https://img.shields.io/badge/fitness-126%2F130-4ade80?style=flat-square" alt="Fitness">
1414
<img src="https://img.shields.io/badge/TSC-typecheck_ok-4ade80?style=flat-square" alt="TSC">
1515
<img src="https://img.shields.io/badge/schemas-100%25-4ade80?style=flat-square" alt="Schemas">
@@ -453,7 +453,7 @@ All tests run without external dependencies (no Ollama, no API keys, no internet
453453
| Manifest phases | 12 (Phase 1–12, boot order enforced) |
454454
| DI services | 165 manifest + 13 bootstrap = 178 at runtime |
455455
| Late-bindings | 263 cross-phase dependency bindings (2 optional skipped) |
456-
| Test suites | 488 files, 7933 tests (coverage gates: 80/76/78, ratchet floor 6014) |
456+
| Test suites | 488 files, 8105 tests (coverage gates: 80/76/78, ratchet floor 6014) |
457457
| Dependencies | 4 production + 1 optional + 10 dev |
458458
| LLM backends | 3 (Anthropic, OpenAI-compatible, Ollama) |
459459
| IPC channels | 79 main ↔ 79 preload (rate-limited, all in sync) |

RELEASE_NOTES.md

Lines changed: 44 additions & 39 deletions
Large diffs are not rendered by default.

docs/ARCHITECTURE-DEEP-DIVE.md

Lines changed: 8 additions & 2 deletions
@@ -14,7 +14,7 @@ Genesis Agent is a **self-modifying, self-verifying, cognitive AI agent** built
1414
|--------|-------|
1515
| Production LOC (src/) | ~101,500 |
1616
| Source Modules | 379 JS files |
17-
| Test Files / Tests | 500 / 7933 (Win baseline) |
17+
| Test Files / Tests | 502 / 8105 (Win baseline) |
1818
| DI Services | 178 (165 manifest + 13 bootstrap) |
1919
| Boot Phases | 12 |
2020
| Boot Time (Windows, cold) | ~1.3 s |
@@ -282,6 +282,12 @@ Perceive (WorldState) → Plan (FormalPlanner) → Act → Verify → Learn →
282282
```
283283
Max 20 steps per goal (+10 after user approval), 3 consecutive error limit, 10-minute global timeout.
284284

285+
**AgentLoopProgressDetector** `v7.9.9` — Reflexion-style degenerate-loop detector (Shinn et al. 2023, arXiv 2303.11366). Two state Maps cleared on `goal:completed` / `goal:abandoned` / `goal:obsolete` / `goal:stalled`. The action-loop detector hashes `(stepKind, resultDigest)` per step into a per-goal ring buffer; three identical hashes in a row emit `agent-loop:no-progress-detected` and force a replan. The plan-loop detector hashes `(goalDesc, plan-step-types)` at pursuit start; a hash seen before for the same goal emits `agent-loop:identical-plan-detected` and forces a replan with a different LLM hint. ProgressDetector is not a registered Container service — AgentLoopPursuit lazy-instantiates it on first use; when absent, pursuit still runs but loses the early-loop-break and relies on the existing `failureCap` (2) and `_repeatedFailures` paths instead.
286+
287+
**AgentLoopPursuitGate three-branch dispatch** `v7.9.9` — When MentalSimulator returns `proceed: false` with `riskScore ≥ 5.0`, `handleHardGateAbort` reads `trustLevelSystem.getLevel()` and routes: SUPERVISED + AUTONOMOUS stay warn-only (`aborted: false`), letting the step route through `TrustLevelSystem.checkApproval(stepType)` which asks SUPERVISED users about everything and AUTONOMOUS users only about categorically critical action classes; FULL_AUTONOMY tries `_trySpawnObstacleSubgoal` and on refusal calls `goalStack.markObsolete`. The architectural point is decoupling: the hard-gate is a numerical signal from MentalSimulator about a plan's overall risk; the approval mechanism is categorical via TrustLevelSystem about an individual action's risk class. Pre-v7.9.9 iterations mixed them, producing a spam path where high-sim-risk goals at AUTONOMOUS dropped into approval prompts on every retry. `agent-loop:simulation-abort` telemetry still fires at every gate trigger, deduplicated per `goalId`.
288+
289+
**AgentLoopRecovery decompose-on-failure** `v7.9.9` — `_repeatedFailures` Map keyed `(goalId, errorClass)` with 1h TTL, consulted at the bottom of `classifyAndRecover`. On the 2nd occurrence of the same error-class for the same goal — across pursuit retries, not within a single pursuit — recovery synthesises an obstacle and routes it through `_trySpawnObstacleSubgoal`. The cross-pursuit keying is the critical detail: before the fix the key included `stepIndex`, which is unstable across retries (each retry generates a different plan), so the strikes never matched and decompose never fired in production. Goal-lifecycle events clear all entries for that goalId.
290+
285291
### Phase 9: Cognitive (35 files, ~13,200 LOC)
286292

287293
Expectation, surprise, learning, self-model, adaptation. The cognitive substrate that makes Genesis self-correcting and self-improving. Includes CognitiveSelfModel (empirical capability tracking with Wilson-score calibration), AdaptiveStrategy (closed-loop self-correction), OnlineLearner (real-time behavioral adaptation), PromptEvolution (A/B prompt optimization), MemoryConsolidator (KG/Lessons hygiene), TaskRecorder (execution replay), CoreMemories (v7.3.7), LessonsStore, GateStats (v7.3.6 — central gate-verdict telemetry), SuspicionFrontier, LessonFrontier, ArchitectureReflection, **SelfStatementLog (v7.5.5 + DE/EN parity in v7.5.6)** — auto-classifies first-person statements (`strukturell` / `versprechen` / `emotional` / `uncertain`), persists to daily JSONL shards, fires `selfstatement:contradiction` when a structural claim lacks verified-data backing.
@@ -310,7 +316,7 @@ Persistent agency layer: GoalPersistence, FailureTaxonomy, DynamicContextBudget,
310316

311317
Trust and effectors: TrustLevelSystem, EffectorRegistry, WebPerception, SelfSpawner.
312318

313-
**TrustLevelSystem** `v4.1` — Four levels: Level 0 (supervised), Level 1 (assisted), Level 2 (autonomous), Level 3 (full autonomy). Auto-upgrade suggestions based on MetaLearning success rates.
319+
**TrustLevelSystem** `v3.0 — frozen v7.9.9` — Three levels: Level 0 SUPERVISED (always ask), Level 1 AUTONOMOUS (ask only on categorically critical actions: DEPLOY, EXTERNAL_API, EMAIL_SEND), Level 2 FULL_AUTONOMY (never ask). The four-level structure that existed through v7.9.6 (Supervised / Assisted / Autonomous / Full) was collapsed in v7.9.7 R1: the ASSISTED slot lacked a clear principle that distinguished it from SUPERVISED in practice, and the migration data showed users rarely settled there. v7.9.8 Fix 1 added migration writeback with `schemaVersion: 3`. v7.9.8 Fix 2 changed the fresh-install default from AUTONOMOUS to SUPERVISED at six call sites. v7.9.9 (A) closed the last two unaligned sites in Settings.js and rerouted the migration table so old ASSISTED (stored 1) buckets to SUPERVISED (new 0) instead of AUTONOMOUS (new 1) — "Ask for risky" was the level a user chose explicitly to limit autonomy, so re-bucketing downward honours the spirit of their choice. After v7.9.9 the trust system is frozen: no future version touches the migration table, the dropdown options, or the default level. The constructor distinguishes between caller-supplied `cfg.level` (already in the 3-level system, range 0..2 passes through) and stored values from `asyncLoad` (potentially 4-level, routes through `_migrateLevel`).
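The approval rule and the level migration described above can be sketched as follows. This is a minimal illustration, not the actual `TrustLevelSystem` code: the names `needsApproval`, `migrateLevel`, and `LEGACY_TO_CURRENT` are hypothetical, and only the ASSISTED (stored 1) to SUPERVISED (new 0) row of the table is stated in the text. The remaining rows and the fallback for unknown values are assumptions.

```javascript
// Hypothetical sketch of the three-level trust rule; not the real module.
const TrustLevel = Object.freeze({ SUPERVISED: 0, AUTONOMOUS: 1, FULL_AUTONOMY: 2 });

// The three categorically critical action classes named in the text.
const CRITICAL_ACTIONS = new Set(['DEPLOY', 'EXTERNAL_API', 'EMAIL_SEND']);

// Returns true when the user must be asked before the step runs.
function needsApproval(level, actionType) {
  if (level === TrustLevel.SUPERVISED) return true; // always ask
  if (level === TrustLevel.AUTONOMOUS) return CRITICAL_ACTIONS.has(actionType);
  return false; // FULL_AUTONOMY never asks
}

// Stored 4-level value -> 3-level value.
// Only 1 -> 0 (ASSISTED -> SUPERVISED) is confirmed by the text;
// the rows for 0, 2 and 3 are assumed.
const LEGACY_TO_CURRENT = Object.freeze({ 0: 0, 1: 0, 2: 1, 3: 2 });

function migrateLevel(storedLevel) {
  const mapped = LEGACY_TO_CURRENT[storedLevel];
  // Assumption: unknown values fall back to SUPERVISED, the fresh-install default.
  return mapped === undefined ? TrustLevel.SUPERVISED : mapped;
}
```

Under these assumptions, `migrateLevel` would apply only to values read from storage, while a caller-supplied level in the range 0..2 would bypass it, matching the constructor distinction described above.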
314320

315321
**EffectorRegistry** `v4.1` — External action system with precondition checking. Built-in effectors: clipboard, notifications, browser, GitHub (issues, PRs, comments). Precondition failures emit `effector:blocked` events.
316322

docs/BENCHMARKING.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88

99
| Command | What it does | Duration |
1010
|---------|-------------|----------|
11-
| `npm test` | Run all ~7933 tests | ~75s |
11+
| `npm test` | Run all ~8105 tests | ~75s |
1212
| `npm run test:ci` | Tests + coverage enforcement (80/76/78) | ~150s |
1313
| `npm run benchmark:agent --quick` | 3-task capability benchmark | ~2 min |
1414
| `npm run benchmark:agent:layer:organism` | A/B: full vs without organism | ~5 min |
@@ -22,7 +22,7 @@
2222
### Run all tests
2323

2424
```bash
25-
npm test # Full suite (~7933 tests)
25+
npm test # Full suite (~8105 tests)
2626
npm run test:new # Only per-module test files
2727
npm run test:legacy # Only monolithic legacy suite
2828
```

docs/CAPABILITIES.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66

77
- 379 source modules across 12 boot phases
88
- 178 DI services (165 manifest + 13 bootstrap)
9-
- 7933 tests on Windows / 7932 on Linux (passing, 0 failures)
9+
- 8105 tests on Windows / 7932 on Linux (passing, 0 failures)
1010
- 489 events with 489 payload schemas (full parity)
1111
- Architectural fitness: 127/130
1212
- 18 CI audit gates — see [GATE-INVENTORY.md](GATE-INVENTORY.md) for the runtime gates
@@ -257,7 +257,7 @@ See [COMMUNICATION.md](COMMUNICATION.md) for the full protocol specification.
257257
| **Dashboard** | EventBus inspector, health status, dependency graph (v5.4: extracted to 3 delegate files) |
258258
| **i18n** | EN, DE, FR, ES UI (auto-detected, switchable) |
259259
| **Structured logging** | Human-readable or JSON-lines format, pluggable sink |
260-
| **500 test files** | 7933 tests (Win baseline, v7.9.6), coverage gates: 80% lines, 76% branches, 78% functions |
260+
| **502 test files** | 8105 tests (Win baseline, v7.9.6), coverage gates: 80% lines, 76% branches, 78% functions |
261261
| **CI scripts** | `npm run ci` = tests + event validation + channel validation + fitness gate |
262262
| **TypeScript CI** `v5.4` | `tsc --noEmit` blocks merges — zero type regressions allowed |
263263
| **Degradation matrix** | Auto-generated report showing what breaks if each service is missing |

docs/COMMUNICATION.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ EmotionalState ──emit('emotion:shift')──→ EventBus ──→ PromptBui
3939

4040
Key properties:
4141

42-
- **489 event types** catalogued in `EventTypes.js` (v7.9.9 baseline)
42+
- **489 event types** catalogued in `EventTypes.js` (v7.9.10 baseline)
4343
- **489 payload schemas** in `EventPayloadSchemas.js` — full parity since v7.6.x (every catalog entry has a registered schema); dev-mode validation throws on mismatch
4444
- **Ring buffer history** — last 500 events for debugging
4545
- **Source tracking** — every event carries `{ source: 'ModuleName' }` for audit
@@ -54,6 +54,10 @@ New v7.8.9–v7.9.4 events (Können maturity chain): `skill:candidate-extracted`
5454

5555
New v7.9.4 events (IdleMind maturity): `idle:goal-balance-break` fires when IdleMind interrupts a goal-step stretch to pick a non-goal activity (default every 3 steps, configurable via `idleMind.goalStepsPerActivityPick`).
5656

57+
New v7.9.9 events (Hard-Gate + Recovery + ProgressDetector): `agent-loop:simulation-abort` fires from `AgentLoopPursuitGate.handleHardGateAbort` whenever MentalSimulator returns `proceed: false` with `riskScore >= 5.0`. Three trust-level branches dispatch from there (warn-only at SUPERVISED + AUTONOMOUS, decompose-or-obsolete at FULL_AUTONOMY). Payload `{ goalId, riskScore, priorFailures, reason }`, deduplicated per `goalId`. `agent-loop:decompose-on-failure` fires from `AgentLoopRecovery._repeatedFailures` when the same error-class hits the same goal twice across pursuit retries — payload `{ goalId, stepIndex, errorClass, strikes }`. `agent-loop:no-progress-detected` and `agent-loop:identical-plan-detected` fire from `AgentLoopProgressDetector` (Reflexion-style heuristic, Shinn et al. 2023) when three identical (action, observation) hashes appear in a row, or when a plan hash recurs for the same goal.
58+
59+
New v7.9.10 event (Lessons-Pipeline activated): `lessons:recorded` fires from `LessonsStore.record()` on every persisted lesson — payload `{ id, category, insight }` (insight truncated to 100 chars). The pipeline became fully functional in v7.9.10 once `recordReflection`'s `stableClass` gate was relaxed to accept LLM-verdict messages and `_save()` was moved from buffered (every 5th) to immediate (every record).
60+
5761
---
5862

5963
## Layer 2: IPC (UI ←→ Agent)

docs/EVENT-FLOW.md

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -862,3 +862,83 @@ Listeners on most of these are not yet wired in production — they
862862
serve as bus-level instrumentation for anyone debugging gate
863863
behaviour.
864864

865+
866+
## v7.9.9 / v7.9.10 — Cognitive Pipelines
867+
868+
### Hard-Gate Dispatch (v7.9.9)
869+
870+
```mermaid
871+
flowchart TD
872+
A[MentalSimulator.evaluate] --> B{proceed === false<br/>AND riskScore ≥ 5.0?}
873+
B -->|no| C[Pursuit continues normally]
874+
B -->|yes| D[emit agent-loop:simulation-abort<br/>dedup per goalId]
875+
D --> E{trustLevelSystem.getLevel}
876+
E -->|0 SUPERVISED| F[aborted: false<br/>step routes through<br/>TrustLevelSystem.checkApproval]
877+
E -->|1 AUTONOMOUS| G[aborted: false<br/>asked only on<br/>DEPLOY / EXTERNAL_API / EMAIL_SEND]
878+
E -->|2 FULL_AUTONOMY| H[_trySpawnObstacleSubgoal]
879+
H -->|refused| I[goalStack.markObsolete]
880+
H -->|spawned| J[Sub-goal decomposes obstacle]
881+
```
882+
883+
The hard-gate is a numerical signal (riskScore) from MentalSimulator. The per-step ask is a categorical signal (actionType) from TrustLevelSystem. They are intentionally decoupled — mixing them produced approval-prompt spam on every retry of high-sim-risk goals through v7.9.7 + v7.9.8.
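The three-branch dispatch in the flowchart above could look roughly like this. It is a sketch under stated assumptions: `createHardGate` and the injected `deps` object are hypothetical stand-ins for the real collaborators, the emitted payload carries only a subset of the documented fields, and `trySpawnObstacleSubgoal` is assumed to return a boolean.

```javascript
// Hypothetical sketch of the hard-gate dispatch; not the real AgentLoopPursuitGate.
const HARD_GATE_RISK = 5.0;
const FULL_AUTONOMY = 2;

// deps: { emit, getLevel, trySpawnObstacleSubgoal, markObsolete } (all assumed shapes)
function createHardGate(deps) {
  const announcedGoals = new Set(); // telemetry is deduplicated per goalId

  return function handleHardGateAbort(sim, goalId) {
    // The gate only triggers on proceed === false AND riskScore >= 5.0.
    if (sim.proceed !== false || sim.riskScore < HARD_GATE_RISK) {
      return { triggered: false, aborted: false };
    }
    if (!announcedGoals.has(goalId)) {
      announcedGoals.add(goalId);
      deps.emit('agent-loop:simulation-abort', { goalId, riskScore: sim.riskScore });
    }
    // SUPERVISED and AUTONOMOUS: warn-only, per-step approval does the asking.
    if (deps.getLevel() !== FULL_AUTONOMY) {
      return { triggered: true, aborted: false };
    }
    // FULL_AUTONOMY: try to decompose the obstacle, otherwise mark the goal obsolete.
    const spawned = deps.trySpawnObstacleSubgoal(goalId);
    if (!spawned) deps.markObsolete(goalId);
    return { triggered: true, aborted: true, spawned };
  };
}
```

Note that the gate never consults the action type. That is the decoupling the paragraph above describes: the numerical risk signal decides whether the gate fires, and the categorical per-action check stays in `TrustLevelSystem.checkApproval`.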
884+
885+
### Decompose-on-Failure (v7.9.9)
886+
887+
```mermaid
888+
flowchart TD
889+
A[Pursuit fails with errorClass=X] --> B[classifyAndRecover called]
890+
B --> C{_repeatedFailures<br/>has goalId+X?}
891+
C -->|no, first strike| D[Record strike, 1h TTL]
892+
D --> E[Default retry path]
893+
C -->|yes, second strike| F[Synthesise obstacle<br/>contextKey=repeated-failure-X]
894+
F --> G[_trySpawnObstacleSubgoal]
895+
G --> H[emit agent-loop:decompose-on-failure]
896+
G --> I[Sub-goal: Investigate why X keeps happening]
897+
C -->|3rd+ strike| J[No-op — obstacle already spawned]
898+
K[goal:completed / abandoned / obsolete / stalled] --> L[Clear ALL entries for goalId]
899+
```
900+
901+
The cross-pursuit keying is the critical detail: the key is `(goalId, errorClass)`, not `(goalId, stepIndex, errorClass)`. Step indices are unstable across retries because each retry generates a different plan. Pre-v7.9.9 the strikes never matched and decompose never fired.
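A minimal sketch of strike tracking keyed on `(goalId, errorClass)` follows. The class name `RepeatedFailureTracker`, its method names, the injectable clock, and the choice to measure the 1h TTL from the first strike are all assumptions made for illustration. The real `_repeatedFailures` Map may be organised differently.

```javascript
// Hypothetical sketch of cross-pursuit strike tracking; not the real AgentLoopRecovery.
const STRIKE_TTL_MS = 60 * 60 * 1000; // 1h TTL

class RepeatedFailureTracker {
  constructor(now = () => Date.now()) {
    this._now = now;          // injectable clock, useful in tests
    this._byGoal = new Map(); // goalId -> Map(errorClass -> { count, firstSeen })
  }

  // Returns 'retry' on the first strike, 'decompose' on the second,
  // and 'noop' on the third and later strikes.
  recordFailure(goalId, errorClass) {
    let strikes = this._byGoal.get(goalId);
    if (!strikes) {
      strikes = new Map();
      this._byGoal.set(goalId, strikes);
    }
    const now = this._now();
    const entry = strikes.get(errorClass);
    if (!entry || now - entry.firstSeen > STRIKE_TTL_MS) {
      strikes.set(errorClass, { count: 1, firstSeen: now });
      return 'retry';
    }
    entry.count += 1;
    return entry.count === 2 ? 'decompose' : 'noop';
  }

  // Called on goal:completed / abandoned / obsolete / stalled.
  clearGoal(goalId) {
    this._byGoal.delete(goalId);
  }
}
```

Because `stepIndex` is not part of the key, two failures of the same class count as consecutive strikes even when the retried plan places the failing step at a different position.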
902+
903+
### No-Progress Detector (v7.9.9)
904+
905+
Reflexion-style heuristic, Shinn et al. 2023 (arXiv 2303.11366). Two parallel detectors with separate state Maps, both cleared on every goal-lifecycle terminal event.
906+
907+
```mermaid
908+
flowchart TD
909+
A[Step completes] --> B[Hash: sha256 stepKind + resultDigest, first 16 chars]
910+
B --> C[Append to _actionObservationHashes goalId ring buffer]
911+
C --> D{Last 3 hashes identical?}
912+
D -->|yes| E[emit agent-loop:no-progress-detected]
913+
E --> F[Force reflectOnProgress + replan]
914+
G[Pursuit starts] --> H[Hash: sha256 goalDesc + plan-step-types]
915+
H --> I{Hash seen before<br/>for this goalId?}
916+
I -->|yes| J[emit agent-loop:identical-plan-detected]
917+
J --> K[Force replan with different LLM hint]
918+
I -->|no| L[Record hash, continue]
919+
```
920+
921+
ProgressDetector is not a registered Container service. AgentLoopPursuit lazy-instantiates it on first use; when absent, pursuit still runs but loses the early loop-break and falls back to the `failureCap` (2) and `_repeatedFailures` paths.
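The two detectors can be illustrated with a small sketch. To keep the example dependency-free it uses a `JSON.stringify` key as a stand-in for the truncated sha256 digest the flowchart mentions. The class and method names are hypothetical, and returning a boolean instead of emitting events is a simplification.

```javascript
// Hypothetical sketch of the two loop detectors; not the real AgentLoopProgressDetector.
const REPEAT_WINDOW = 3; // three identical hashes in a row signal no progress

class ProgressDetectorSketch {
  constructor() {
    this._actionKeys = new Map(); // goalId -> ring buffer of the last keys
    this._planKeys = new Map();   // goalId -> Set of plan keys already seen
  }

  // Returns true when the last three (stepKind, resultDigest) keys are identical.
  recordStep(goalId, stepKind, resultDigest) {
    // Stand-in for sha256(stepKind + resultDigest) truncated to 16 chars.
    const key = JSON.stringify([stepKind, resultDigest]);
    const ring = this._actionKeys.get(goalId) ?? [];
    ring.push(key);
    if (ring.length > REPEAT_WINDOW) ring.shift();
    this._actionKeys.set(goalId, ring);
    return ring.length === REPEAT_WINDOW && ring.every((k) => k === key);
  }

  // Returns true when the same plan was already recorded for this goal.
  recordPlan(goalId, goalDesc, stepTypes) {
    const key = JSON.stringify([goalDesc, stepTypes]);
    const seen = this._planKeys.get(goalId) ?? new Set();
    this._planKeys.set(goalId, seen);
    if (seen.has(key)) return true;
    seen.add(key);
    return false;
  }

  // Both Maps are cleared on every goal-lifecycle terminal event.
  clearGoal(goalId) {
    this._actionKeys.delete(goalId);
    this._planKeys.delete(goalId);
  }
}
```

In this sketch a `true` from `recordStep` would correspond to `agent-loop:no-progress-detected` and a `true` from `recordPlan` to `agent-loop:identical-plan-detected`.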
922+
923+
### Lessons-Pipeline (v7.9.10 — first fully functional)
924+
925+
```mermaid
926+
flowchart TD
927+
A[Plan fails or partial] --> B[recordReflection called]
928+
B --> C[classifyFailure on errorMessage]
929+
C --> D{stableClass gate}
930+
D -->|classification === 'user-action'| E[Drop — not Genesis failing]
931+
D -->|classification === 'unclassified'<br/>AND errorMessage empty| F[Drop — no signal]
932+
D -->|otherwise| G[lessonsStore.record category=obstacle-resolution]
933+
G --> H[bus.fire lessons:recorded]
934+
H --> I[_save → JSON write to .genesis/lessons.json]
935+
J[Next pursuit] --> K[AgentLoopRecovery._recallObstacleLessons]
936+
K --> L[lessonsStore.recall 'obstacle-resolution', goalDesc]
937+
L --> M[Inject AVOID-past-failure directives]
938+
```
939+
940+
The pre-v7.9.10 silent bug: `stableClass` required `classification !== 'unclassified'`. LLM-generated verdict messages like `"PARTIAL because the critical step failed..."` never matched the technical regex buckets in `failure-patterns.js`, so they were all bucketed as `'unclassified'` and dropped. Six hours of field testing produced zero stored lessons. v7.9.10 widens the gate to also accept `'unclassified'` when `errorMessage` is non-empty.
941+
942+
A second silent bug: `_save()` ran only every 5th `record()` call. Short sessions never persisted any lessons. v7.9.10 saves on every record (cheap JSON write under 5 MB).
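The widened gate and the event payload can be sketched as below. The function names are hypothetical, treating a whitespace-only message as empty is an assumption, and the legacy predicate reconstructs only the `'unclassified'` condition described above (that `'user-action'` was already dropped before v7.9.10 is assumed).

```javascript
// Hypothetical sketch of the stableClass gate; not the real recordReflection code.

// v7.9.10 behaviour: drop user actions and signal-free failures, keep the rest.
function shouldRecordLesson(classification, errorMessage) {
  if (classification === 'user-action') return false; // not Genesis failing
  const hasMessage = typeof errorMessage === 'string' && errorMessage.trim().length > 0;
  if (classification === 'unclassified' && !hasMessage) return false; // no signal
  return true;
}

// Pre-v7.9.10 behaviour for comparison: every 'unclassified' failure was dropped,
// including LLM-verdict messages that carried a usable explanation.
function shouldRecordLessonLegacy(classification) {
  return classification !== 'user-action' && classification !== 'unclassified';
}

// Payload for lessons:recorded, with the insight truncated to 100 characters.
function lessonsRecordedPayload(lesson) {
  return {
    id: lesson.id,
    category: lesson.category,
    insight: String(lesson.insight).slice(0, 100),
  };
}
```

For a verdict such as `"PARTIAL because the critical step failed..."` classified as `'unclassified'`, the legacy predicate returns `false` while the widened one returns `true`. That difference is the whole fix.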
943+
944+
---

docs/GATE-INVENTORY.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,13 +19,16 @@
1919
| 11 | `pse:content-sanity` | `cognitive/proactiveSelfExpression/ContentSanity.js` (v7.7.9) | pass / block (length, repetition, self-negation, profanity) | blocking |
2020
| 12 | `pse:scoring` | `cognitive/proactiveSelfExpression/Scoring.js` (v7.7.9) | passes when significance×novelty×context-fit ≥ per-kind floor | preventive (threshold) |
2121
| 13 | `pse:private-kind` | `proactiveSelfExpression/HardGates.js` gate-0 (v7.9.5) | block on `thought.kind ∈ PRIVATE_KINDS` regardless of settings | **structurally blocking** — unreachable from settings |
22+
| 14 | `cognitive:hard-gate` | `revolution/AgentLoopPursuitGate.handleHardGateAbort` (v7.9.9) | three-branch dispatch by trust level: SUPERVISED/AUTONOMOUS warn-only, FULL_AUTONOMY decompose-or-obsolete | **branching** — fires `agent-loop:simulation-abort` always; `aborted: false` at SUPERVISED+AUTONOMOUS, `aborted: true` at FULL_AUTONOMY |
2223

2324
Integration test: `test/modules/gate-stats-integration.test.js` — end-to-end
2425
coverage that `recordGate()` is triggered by real ChatOrchestrator flows.
2526
Regression tests for the v7.5.x additions: `test/modules/v751-fix.test.js`,
2627
`test/modules/v756-fix.test.js`, `test/modules/thinking-block-stream-filter.test.js`,
2728
`test/modules/thinking-block-integration.test.js`.
2829

30+
> **Cognitive hard-gate (v7.9.9):** Gate 14, `cognitive:hard-gate`, sits at the boundary between MentalSimulator and pursuit execution. When sim returns `proceed: false` with `riskScore >= 5.0`, `AgentLoopPursuitGate.handleHardGateAbort` reads the current trust level and dispatches one of three ways. At SUPERVISED and AUTONOMOUS it is warn-only — the per-step `TrustLevelSystem.checkApproval` is the actual asking mechanism, so this gate just records the simulation-risk signal without duplicating prompts. At FULL_AUTONOMY it tries `_trySpawnObstacleSubgoal` and on refusal calls `goalStack.markObsolete`. The architectural point is the decoupling: hard-gate is a *numerical* signal about plan-level risk, `TrustLevelSystem.checkApproval` is a *categorical* signal about per-action risk class. Earlier iterations mixed them, producing approval-prompt spam on every retry of high-sim-risk goals.
31+
2932
> **Four-layer gate architecture (v7.5.6):**
3033
> The bus now has four deliberately-symmetric layers across the input/action/output axis:
3134
>
