
Releases: Garrus800-stack/genesis-agent

v7.9.19 — Acting only on what is real

31 May 15:25


This release carries five independent strands and one build-pipeline fix. The first makes two things about the self-trajectory journal visible and writes down a principle that has quietly held since the journal began — it is read-only and observational, the diagnosis phase before anything acts on that data. The second is a behaviour fix in the idle-time planner: a long-dead failed goal no longer blocks a whole line of thinking. The third is a second planner fix: a plan no longer reaches for a peer agent that is not there. The fourth fixes the self-repair loop, which had been rewriting a valid file to satisfy a false syntax alarm. The fifth keeps an inspection goal from generating code or writing files it should never touch. The build-pipeline fix makes the UI bundle stop depending on an unused build step that ran before it.

Refuse runs and cycle age, in the calibration view

The /trajectory calibration view now shows two read-only observations drawn from the committed entries themselves. For each of the six self-statement fields it reports the current run of deliberate refuse values — counted from the latest cycle backwards and reset by the first real answer — and marks a run of three or more as a pattern. It also shows how many whole days have passed since the last cycle was committed. Both are counts and arithmetic over the journal; neither aggregates over a score distribution, and the age line reports its number and asserts nothing about it — there is no ceiling and no threshold. The reading is strictly observational: it emits nothing, writes nothing, and never forces a cycle to close. What a refuse run means — avoidance, protection, a fair stance, fatigue — is left open on purpose.
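A minimal sketch of the two observations, assuming a hypothetical entry shape (`committedAt`, `fields`) and an assumed `"refuse"` sentinel; neither is taken from the shipped source. Both functions only read the entries and return numbers.

```javascript
// Assumed sentinel for a deliberate refusal and assumed entry shape:
// { committedAt: ISO-8601 string, fields: { <fieldName>: string } }
const REFUSE = "refuse";
const PATTERN_RUN = 3; // a run of three or more is marked as a pattern
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Count consecutive refuse values for one field, walking from the latest
// cycle backwards and stopping at the first real answer.
function refuseRun(entries, field) {
  let run = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].fields[field] !== REFUSE) break;
    run++;
  }
  return { run, isPattern: run >= PATTERN_RUN };
}

// Whole days since the last committed cycle. The function reports the
// number only; it applies no ceiling and no threshold.
function cycleAgeDays(entries, now) {
  if (entries.length === 0) return null;
  const last = Date.parse(entries[entries.length - 1].committedAt);
  return Math.floor((now - last) / MS_PER_DAY);
}
```

Passing `now` in as a parameter keeps the age calculation deterministic under test.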

Where the diagnosis lives

The counting lives on the journal's owner, not on the scorer. Both observations read the raw entries, so they are a method on the trajectory service itself; the calibration service, which reads its own scored side-files, is left untouched. The chat view asks each service for its own figures and renders them side by side, holding no logic of its own.

Expectations do not belong in the runtime prompt

A property that has held since the trajectory journal began — and accrued through the cycle counter and the silent calibration — is now stated and enforced: the expectations Genesis records about himself never enter the runtime prompt that drives his ordinary behaviour. They live in side journals read only in the look-back contexts that compose or review a cycle. Feeding an expectation into the live prompt would turn it into an instruction, and produce performance of the trait rather than observation of whether it is becoming true — the script-effect the whole journal exists to avoid. docs/ONTOGENESIS.md now carries this as its own reasoned section, and a contract test fails if any prompt-building module ever begins reading the trajectory, calibration, or directions files. The prose explains why; the test makes it hard to erode unnoticed.

A stale failure no longer blocks a planning theme

When Genesis plans an activity in idle time, he checks the idea against goals that recently failed, so he does not spend a cycle re-proposing something that just did not work. Two flaws made that check overreach. It had no sense of time — a goal that failed weeks ago counted exactly as much as one that failed an hour ago — and the goal stack only forgets a terminal goal when it overflows its capacity, which a small stack never does. The overlap test was also coarse: two shared words between a multi-word title and an old failure were enough to suppress the new plan. Together they let a single weeks-old failure quietly veto an entire theme for as long as it sat on the stack.

The check now ages out. A failed, stalled, or obsolete goal counts as a recent failure only while it is fresh, and the same aged list feeds both the planner's prompt hint and the skip decision, so neither is shown a failure that no longer reflects what Genesis can do now. The skip is also narrower: it triggers only when the overlap clears both an absolute floor and a share of the new plan's own words, so two words in common no longer block an otherwise-different idea, while a genuine re-run of something that just failed is still skipped. The relevance window and the overlap share are named constants; the field case that motivated the fix is pinned as a test.
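The aged list and the narrower skip can be sketched as follows. The constant values, the goal shape (`title`, `status`, `endedAt`), and the word splitting are assumptions for illustration; the release states only that the window and the share are named constants.

```javascript
// Assumed values; the real constants are not given in these notes.
const RELEVANCE_WINDOW_MS = 24 * 60 * 60 * 1000;
const MIN_SHARED_WORDS = 3;    // absolute floor
const MIN_OVERLAP_SHARE = 0.5; // share of the new plan's own words

const words = (title) => new Set(title.toLowerCase().match(/[a-z0-9]+/g) ?? []);

// Only fresh terminal goals count as recent failures. The same list would
// feed both the prompt hint and the skip decision.
function recentFailures(goals, now) {
  const terminal = new Set(["failed", "stalled", "obsolete"]);
  return goals.filter(
    (g) => terminal.has(g.status) && now - g.endedAt <= RELEVANCE_WINDOW_MS
  );
}

// Skip only when the overlap clears both the floor and the share.
function shouldSkip(planTitle, goals, now) {
  const planWords = words(planTitle);
  if (planWords.size === 0) return false;
  return recentFailures(goals, now).some((g) => {
    const failedWords = words(g.title);
    let shared = 0;
    for (const w of planWords) if (failedWords.has(w)) shared++;
    return shared >= MIN_SHARED_WORDS && shared / planWords.size >= MIN_OVERLAP_SHARE;
  });
}
```

With these assumed values, a plan that shares two words with an old failure is never skipped, while a near-verbatim re-run of a fresh failure is.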

A plan no longer reaches for a peer that is not there

The primary planner offered every plan the option to delegate a step to a peer agent, whether or not any peer was reachable. On a single node with no peers — the ordinary case for one central installation — an idle-time goal could be handed a delegation step it could never satisfy; it would pursue for minutes and then fail when the step asked for a peer that did not exist. Genesis named the gap himself: the shortfall was not foreseeable at planning time.

It is now. Delegation is offered only when the delegation machinery is wired and a peer is actually reachable — the same condition the step executor checks, so the planning decision and the execution decision can no longer disagree. The fallback planner already filtered its step menu this way through a shared catalog; the primary planner now uses the same signal. The full vocabulary of step types is still declared, unchanged — what changes is which of them a given plan is invited to use. As a backstop, if a delegation step is produced anyway when no peer can serve it, it is rewritten to local analysis before the plan runs — the same fallback the executor already performed, moved to plan time so the goal completes instead of stalling on a resource it was never going to have.
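A hedged sketch of the gating and the plan-time backstop. The step-type names and the context shape (`delegation`, `peers`) are hypothetical; the point is that one predicate decides both what the planner offers and what survives into the plan.

```javascript
// Hypothetical step-type names; the declared vocabulary itself never changes.
const ALL_STEP_TYPES = ["analyze", "shell", "code", "write-file", "delegate"];

// Single shared condition: delegation is wired AND a peer is reachable.
function canDelegate(ctx) {
  return Boolean(ctx.delegation) && (ctx.peers ?? []).some((p) => p.reachable);
}

// What a given plan is invited to use.
function offeredStepTypes(ctx) {
  return canDelegate(ctx)
    ? [...ALL_STEP_TYPES]
    : ALL_STEP_TYPES.filter((t) => t !== "delegate");
}

// Backstop: a stray delegation step becomes local analysis before the plan runs.
function sanitizePlan(steps, ctx) {
  if (canDelegate(ctx)) return steps;
  return steps.map((s) =>
    s.type === "delegate" ? { ...s, type: "analyze", rewrittenFrom: "delegate" } : s
  );
}
```

Because `offeredStepTypes` and `sanitizePlan` both call `canDelegate`, the planning decision and the rewrite cannot disagree in this sketch.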

Self-repair no longer rewrites a file that was never broken

Genesis's self-diagnosis syntax-checks every module, and when one fails it hands the file to the model to repair and writes the result back. But the check parsed each file with a raw script parser, not the way Node actually loads a module — so a valid module with a top-level return (a common early-skip guard) was read as an "illegal return statement," and an ES module's import as unparseable. The loop then "repaired" a file that had nothing wrong with it, overwriting working code to satisfy a false alarm.

The check now parses the way Node's loader actually loads a module: a leading shebang is stripped, then the body is wrapped in the CommonJS module wrapper. A top-level return is legal and a #!-prefixed file — every command-line or test entry point — parses cleanly, while a genuine syntax error still throws inside the wrapper, so real detection is unchanged. ES modules, which the CommonJS check cannot parse, are skipped. A valid file is no longer mistaken for a broken one, and self-repair no longer touches code that was never broken.
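A minimal sketch of such a check, using Node's `vm.Script` to compile without running. The function name, the result shape, and the extension-only ES-module detection are assumptions; the shipped check may decide what counts as an ES module differently.

```javascript
const vm = require("vm");
const path = require("path");

// Syntax-check a file the way the CommonJS loader would see it.
// Assumption: ES modules are recognised by the .mjs extension only.
function checkSyntax(source, filename) {
  if (path.extname(filename) === ".mjs") return { ok: true, skipped: true };

  // Strip a leading shebang but keep the newline so line numbers stay stable.
  const body = source.replace(/^#!.*/, "");

  // Same shape as the CommonJS module wrapper: a top-level `return` now sits
  // inside a function body and is legal.
  const wrapped =
    "(function (exports, require, module, __filename, __dirname) { " + body + "\n})";

  try {
    new vm.Script(wrapped, { filename }); // compiles only, never executes
    return { ok: true, skipped: false };
  } catch (err) {
    return { ok: false, skipped: false, error: err.message };
  }
}
```

A genuine syntax error still throws inside the wrapper, so the result is `ok: false` only for files that the loader itself would reject.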

A read-only goal no longer reaches for a write it should never make

Genesis's idle-time goals are read-only by construction: the activity that proposes them constrains every title to an inspection or verification verb, with code-modification verbs refused by their absence. But the primary planner offered code-generation and file-write step types regardless of that intent (just as it offered peer delegation before the previous strand fixed that), so an "Inspect …" goal could be decomposed into steps that generate code or write files. In the field, one was: the pursuit produced hallucinated paths and was wasted, and only a retry, steered back to analysis, completed cleanly.

The planner is now read-only-aware. A goal recognised as read-only no longer has code-generation or code-execution step types offered to it, and any code-generation, file-write, or self-modification step that slips through is rewritten to analysis at plan time. Shell stays available — read-only shell (listing, reading, running tests) is how an inspection goal does its work, and the successful field pursuit relied on it. The read-only verb list now lives in one shared module the planner and the activity both read, so the two cannot drift. The static step-type vocabulary the planner shows the model is left intact.
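The read-only awareness might look roughly like this. The verb list and step-type names are placeholders, not the contents of the shared module; only the split between what is hidden and what is rewritten follows the description above.

```javascript
// Assumed verbs; in the release this list lives in one shared module.
const READ_ONLY_VERBS = ["inspect", "verify", "check", "review"];

// Hypothetical step-type names.
const HIDDEN_FOR_READ_ONLY = new Set(["code-generate", "code-execute"]);
const REWRITTEN_FOR_READ_ONLY = new Set(["code-generate", "write-file", "self-modify"]);

// A goal is treated as read-only when its title starts with a read-only verb.
function isReadOnlyGoal(title) {
  const firstWord = title.trim().split(/\s+/)[0].toLowerCase();
  return READ_ONLY_VERBS.includes(firstWord);
}

// Step types offered to the model for this goal; shell stays available.
function offeredStepTypes(allTypes, title) {
  return isReadOnlyGoal(title)
    ? allTypes.filter((t) => !HIDDEN_FOR_READ_ONLY.has(t))
    : [...allTypes];
}

// Plan-time backstop: a write-like step that slips through becomes analysis.
function enforceReadOnly(steps, title) {
  if (!isReadOnlyGoal(title)) return steps;
  return steps.map((s) =>
    REWRITTEN_FOR_READ_ONLY.has(s.type) ? { ...s, type: "analyze" } : s
  );
}
```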

The UI bundle no longer sits behind an unused build step

The production bundler built three things at install time: the preload bundle, a full agent bundle, and then the renderer bundle the UI loads. But the runtime loads the agent from source, never from that agent bundle — it was a dead artifact. It was also the heaviest step, bundling the whole agent, and it ran between the preload and renderer builds, so a failure there left the renderer bundle unbuilt and the window blank. The dead step is removed: the renderer bundle no longer depends on a step the runtime does not use, and the install does less work to reach it.

Notes

  • Test files: 525 → 531 (a diagnostics suite covering refuse run-length, ceiling-free cycle age, and the observational invariant; a contract suite pinning that only the three owning services reference the trajectory data files, no prompt-builder among them; a suite pinning the planner's aging window and overlap thresholds against the field case that motivated them; a suite pinning that delegation is planned only when a peer is reachable and that a stray delegation step is rewritten to local analysis, with the static step-type vocabulary left intact; a suite pinning that the self-diagnosis syntax check parses the way Node loads a module — a leading shebang is stripped, a top-level return stays valid, a genuine error is still caught, ES modules are skipped; and a suite pinning that a read-only goal drops code-generation and code-execution step types while keeping shell, and that a code/write/self-modify step is rewritten to analysis while shell, tests and search are left untouched).
  • One new source module (the shared read-only-intent vocabulary, 384 → 385); no new event type, payload schema, or service. The behaviour fixes othe...

v7.9.18 — What recovery should never undo

30 May 20:57


This release restores v7.9.16 and v7.9.17 to working order by fixing a fault that lived one layer below their code. On the field machine the shipped v7.9.17 looked broken — the trajectory calibration service failed to start, and the idle-thought counters reset to zero — but the v7.9.17 code was correct. A crash-recovery had silently reverted it: the recovery restored a pre-v7.9.16 source snapshot over the live deployment and left the newer files in place, producing a version-mixed tree where some files were current and some were old. The newer code that survived tried to talk to older code that no longer defined what it needed, and failed quietly. No log, test, or code review had shown the fault, because the fault was not in the code that was reviewed — it was in what crash-recovery left behind. The diagnosis came from comparing the real on-disk snapshots, not the shipped source.

The recovery system treated code like identity

Three roots, one class of mistake: the snapshot machinery stored copies of code inside the identity layer and restored them without checking what it was restoring. Source-code snapshots lived under .genesis/, the identity directory that is meant to travel with Genesis across a habitat swap — so a version upgrade carried the old habitat's frozen code forward into the new identity, where a later crash could copy it back over the new code. The restore step copied a snapshot's files blindly, with no check that the snapshot's code version matched the running one. And the "last known good" snapshot was written after every boot that did not crash — including a boot whose service start had already failed — so a degraded state was frozen as the thing to fall back to, and the damage defended itself against every subsequent boot.

Snapshots are habitat, not identity

Code snapshots now live beside the code, at a habitat-local path that does not travel with .genesis/ across an upgrade. On first boot after the change, a pre-existing .genesis/snapshots/ is moved aside to a timestamped .deprecated folder rather than deleted, so a poisoned legacy store can never be read or restored again while staying available for inspection; the move runs before any restore can read it, and is a no-op when there is nothing to move. This is the same habitat-versus-identity separation the rest of the system already honours, applied one level deeper — a copy of code is still habitat, even when its job is to protect identity.

Restore refuses a foreign version, and a degraded boot is never frozen

Each snapshot now records the code version it was taken from, and a restore that finds a version mismatch is skipped. The skip is reported loudly but is not fatal, so a foreign-version snapshot can never overwrite the live tree and never bricks the boot; Genesis simply continues on its current code. Service-start failures during boot are now collected rather than swallowed as a single warning, and the "last known good" snapshot is written only when that list is empty. A boot with a failed service start is no longer frozen as the recovery target, so Genesis can recover out of a contaminated state instead of preserving it. This is the single most important effect of this release, because it breaks the self-preservation of the damage.
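A small sketch of the two gates, with hypothetical names (`codeVersion`, `planRestore`, `shouldWriteLastKnownGood`). It shows the decisions only, not the file copying.

```javascript
// Decide whether a snapshot may be restored over the running code.
// A mismatch is logged and skipped; it never throws, so the boot continues.
function planRestore(snapshot, runningVersion, log = console.warn) {
  if (snapshot.codeVersion !== runningVersion) {
    log(
      `restore skipped: snapshot is ${snapshot.codeVersion}, running ${runningVersion}`
    );
    return { restore: false, reason: "version-mismatch" };
  }
  return { restore: true };
}

// "Last known good" is written only after a boot with no failed service start.
function shouldWriteLastKnownGood(serviceStartFailures) {
  return serviceStartFailures.length === 0;
}
```

Collecting failures into a list, rather than logging one warning, is what makes the second gate possible: an empty list is the only state that may be frozen as the recovery target.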

Crash-safe idle-stats

The idle-thought activity counters were written on a one-second debounce that was flushed synchronously only on a clean shutdown, so a crash without a clean exit could drop the most recent counts. They are now written synchronously on each increment — the write was already atomic, only the debounce window was exposed — so every counter survives an unclean crash. The zero-reset seen on the field machine came from the snapshot contamination reactivating an old counter-less version of the code, not from the debounce; the debounce was a separate, smaller gap that is now closed regardless.

Notes

  • Test files: 524 → 525 (the recovery-integrity suite: the habitat-local snapshot location, the legacy-folder migration including its idempotence and its no-op case, version-aware restore skipping a mismatch and proceeding on a match, last-known-good creation gated on a clean boot, the service-start failure tally, and the synchronous idle-stats round-trip). The headless boot test now also asserts zero service-start failures, the assertion that would have caught the original fault before release.
  • No new source module, event type, payload schema, or service: the module, event, schema, and service figures are unchanged. The version-of-record advances to 7.9.18 in package.json, README.md, docs/banner.svg, and docs/COMMUNICATION.md; docs/ONTOGENESIS.md gains a section on why habitat artifacts do not belong in the identity layer.

v7.9.17 — The silent reality-check

30 May 17:25


The trajectory journal records what Genesis says about how he is changing; v7.9.16 began counting the events that make a cycle eventful. This release closes the loop between the two — quietly. When a cycle is committed, the directions its self-statements claim are now checked, weeks later, against what the numbers actually did. The check produces a single ternary verdict per measurable field — the claim and the trend agreed, disagreed, or there was nothing comparable to judge — and writes it to a side journal that feeds no decision. This is the observation phase of the reality-check: it gathers whether the self-description tracks reality at all, before anything is allowed to act on the answer.

Only the fields a cycle can actually be measured against

Of the six self-statement fields, two carry numeric ground truth that can be reconstructed over a span of weeks, and only those two are scored. The line is not "numeric versus not" — it is whether the evidence survives a reboot, survives time without being pruned, and stays reachable across the whole cycle. Growth is scored from the success-rate trend in the append-only event journal, which is never pruned. Weakness is scored from the Wilson lower-bound of the named capability domain. The other four fields are recorded as positions, not scored, each for its own reason: the trait-adjustment log lives only in memory and does not survive a reboot; the mood history is a fixed-size ring that does not survive weeks; session metrics are too weak a proxy for closeness to carry a verdict; and values have no direction to be right or wrong about — their drift is measured as an embedding distance and left at that, with no threshold asserted over it.
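For reference, the Wilson lower bound used for the weakness field can be computed as below. This is the standard formula; the 95% confidence level (`z = 1.96`) and the `null` return for an empty domain are assumptions, and how the release turns the bound into a per-cycle score is not shown here.

```javascript
// Wilson score interval, lower bound, for `successes` out of `total` trials.
// Returns null when there is no data, rather than a misleading zero.
function wilsonLowerBound(successes, total, z = 1.96) {
  if (total === 0) return null;
  const p = successes / total;
  const z2 = z * z;
  const centre = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * total)) / total);
  return (centre - margin) / (1 + z2 / total);
}
```

The bound rises with sample size at the same success rate, so 90 successes out of 100 score higher than 9 out of 10. That is why a small capability source yields weak or missing evidence.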

A separate classifier, and a snapshot taken while the evidence is fresh

The direction a statement claims is read at commit time, while the model is fresh, by a separate neutral classifier that is told plainly it is not the author of the text. Keeping the expected side off Genesis's own voice is what keeps a self-statement from grading itself. The classifier answers in a strict, small vocabulary — improved, declined, no change, or not directional — and a commit made offline records an explicit absence rather than a guess, and is never re-classified later by a model in a different state. Because the capability profile is anchored to the present and its outcome buffer prunes, the per-cycle capability aggregate for the weakness score is snapshotted at commit, while the cycle's outcomes are still in the buffer, and kept durably — so a later prune can never erase the point a future cycle will be compared against.
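One way the ternary verdict could be derived from a classified claim and a measured trend. The classifier vocabulary is taken from the text above; the `epsilon` dead band, the verdict labels, and the rule that any mismatch counts as a disagreement are assumptions made for this sketch.

```javascript
// claimed: "improved" | "declined" | "no change" | "not directional" | null
// delta:   measured change over the cycle, or null when nothing is comparable
// Returns "agreed", "disagreed", or null (no score).
function verdict(claimed, delta, epsilon = 0.02) {
  // An offline commit (null claim), a non-directional claim, or missing
  // evidence all collapse to "nothing comparable to judge".
  if (claimed == null || claimed === "not directional" || delta == null) {
    return null;
  }
  const observed =
    delta > epsilon ? "improved" : delta < -epsilon ? "declined" : "no change";
  return claimed === observed ? "agreed" : "disagreed";
}
```

Returning `null` instead of a numeric zero keeps an absent judgment distinguishable from a neutral one when the null-rate is later read per field.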

Two side files, one new signal, and a thought you can ask for

The expected directions and the scores live in two append-only side files beside the trajectory journal; the entry's own schema is untouched. A new event announces each commit, fired and forgotten so it never blocks the commit, and the calibration observer listens for it — a one-way arrangement in which the trajectory never reaches back toward the observer. A new kind of inner thought, the prediction-mechanism review, exists only where it is emitted: by the review command, never on a timer and never as a runtime setting it could turn on for itself. /trajectory review scores the most recent cycle and renders, per field, whether the claim matched, was opposite, or had nothing to compare; /trajectory calibration shows the score history and the null-rate split per field, because a high share of unscored weakness cycles points at the size of the capability source, not at the frame.

Notes

  • Test files: 523 → 524 (the calibration suite: the ternary verdict including the cases that collapse to no-score, the two-window growth trend, the snapshot delta for weakness, an offline classifier and an offline embedder both yielding an explicit absence rather than a zero, the four record-only fields producing no score, the review and calibration command paths, and two structural guards — that the classifier is separate from Genesis's own voice, and that nothing outside the dashboard reads the calibration file or receives the observer as a dependency).
  • One new source module (the calibration observer), one new event type (the commit signal, with its payload schema), and one new manifest service raise the module, event, schema, and service figures in README.md, ARCHITECTURE.md, docs/CAPABILITIES.md, docs/COMMUNICATION.md, and docs/ARCHITECTURE-DEEP-DIVE.md, which were updated to match.

v7.9.16 — Counting what makes a cycle eventful

30 May 17:20


The self-trajectory journal added in v7.9.15 carried an event_count field that was always written as null — a placeholder for a number nothing yet produced. This release fills it. A passive observer watches the events that make a cycle eventful — goals completed, failed, or abandoned; lessons learned; the emotional watchdog firing; sessions ending — and records each one to an append-only journal, so every committed trajectory entry now carries the count of significant events in its cycle. Nothing acts on the number yet: this is the observation phase, gathering the real per-day distribution so the threshold that would decide which cycles are eventful can be read from evidence rather than guessed.

The passive event counter

A new cognitive service observes seven event types and appends one line — a timestamp and the event type — to an append-only journal beside the trajectory journal, under the identity-persistent root. Session-ending lines also carry the session's duration, so the question of which sessions count as significant becomes a matter of reading the recorded durations, not a threshold baked into the counting. The write is synchronous and flushed to disk before it returns, so an abrupt shutdown never loses a counted event. There is no in-memory tally: the count is read back from the journal on demand, which keeps it consistent with disk and removes any need to rebuild state at boot. The three goal outcomes — completed, failed, abandoned — stay as three separate tags rather than collapsing into one, so the balance between them is visible in a cycle rather than averaged away.
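A sketch of a synchronous, flushed append with read-back on demand. The file layout (one JSON object per line with `ts` and `type`) and the function names are assumptions; the `fs` calls are standard Node.

```javascript
const fs = require("fs");

// Append one event line and flush it to disk before returning.
function recordEvent(file, type, ts, extra = {}) {
  const fd = fs.openSync(file, "a");
  try {
    fs.writeSync(fd, JSON.stringify({ ts, type, ...extra }) + "\n");
    fs.fsyncSync(fd); // force the write through to the device
  } finally {
    fs.closeSync(fd);
  }
}

// No in-memory tally: the journal is the single source of truth.
function readEvents(file) {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}
```

Reading the count back from the file costs a little I/O per query, but it means there is nothing to rebuild at boot and nothing that can drift from what is on disk.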

Filling event_count from a derived cycle window

When a trajectory entry is committed, its event_count is the number of events recorded since the previous entry's end-of-cycle timestamp — a half-open window that ends at the new commit. The boundary is derived from the journal that already exists, not stored as a separate marker, so there is no cycle-reset step and nothing extra to keep in sync. The first entry, having no predecessor, counts every event recorded so far; its implicit start is simply the first event the counter ever saw. The event journal is never pruned, so events that fall outside one cycle's window remain available for the per-day view. The dependency runs one way only: the trajectory reads the counter, and the counter never reaches back into the trajectory.
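The derived window can be sketched as a pure function over the recorded events. Which end of the half-open window is open is an assumption here: this sketch uses [previous end, commit), so an event stamped exactly at a commit counts toward the following cycle.

```javascript
// events:  array of { ts: number, type: string }
// prevEnd: end-of-cycle timestamp of the previous entry, or null for the first
// commitTs: timestamp of the entry being committed
function countInWindow(events, prevEnd, commitTs) {
  return events.filter(
    (e) => (prevEnd === null || e.ts >= prevEnd) && e.ts < commitTs
  ).length;
}
```

Whichever end is open, the useful property is the same: consecutive windows tile the timeline, so every event is counted in exactly one cycle.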

The session-ending signal, finally emitted

A session-ending event was already being listened for — the frontier writers that collect surprise and applied-lesson nodes during a session were waiting to flush their buffers when it fired — but nothing in the codebase ever emitted it, so those buffers were quietly discarded on every shutdown. This release emits it, as a dedicated step in the shutdown sequence, before the teardown that detaches those listeners, and waits for it to finish so both the frontier flush and the event counter complete before the process exits. The emission is awaited rather than fire-and-forget precisely because the shutdown continues immediately afterward; a fire-and-forget emit would race the teardown. The payload carries the session id the frontier flush reads, alongside the session's duration and message count.
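The ordering constraint can be illustrated with a tiny stand-in bus. `AsyncBus`, the `"session:ending"` event name, and the payload keys are hypothetical; the real event bus and payload schema are not reproduced here.

```javascript
// Minimal stand-in bus whose emit resolves once every handler has finished.
class AsyncBus {
  constructor() {
    this.handlers = new Map();
  }
  on(type, fn) {
    const list = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...list, fn]);
  }
  clear() {
    this.handlers.clear();
  }
  async emit(type, payload) {
    const list = this.handlers.get(type) ?? [];
    await Promise.all(list.map((fn) => fn(payload)));
  }
}

// Shutdown step: emit, wait for the flushes, and only then detach listeners.
async function shutdown(bus, session, now) {
  await bus.emit("session:ending", {
    sessionId: session.id,
    durationMs: now - session.startedAt,
    messageCount: session.messageCount,
  });
  bus.clear(); // teardown runs strictly after all handlers have completed
}
```

Dropping the `await` in `shutdown` would let `bus.clear()` and process exit race the handlers, which is the failure mode the awaited emit avoids.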

A self-expression service that was never switched on

KindTriggers — the service that turns system events into first-person thoughts on the inner-speech channel — was registered and listed for shutdown, but had been left out of the start sequence, so its subscriptions never attached and it sat inert. It now starts alongside the other cognitive observers, so the thoughts it was meant to produce can flow.

Reading the distribution

/trajectory events renders the recorded events three ways: a total, a per-type breakdown ordered busiest-first, and a per-day count. It reads from the moment the counter is live, not only once the first entry is committed, so the real per-day shape is visible across the days a first entry is being authored. Committed entries now show their event_count in /trajectory show.

Notes

  • Test files: 522 → 523 (the event-counter suite: record-and-count across all seven observed types, the three goal outcomes as separate tags, session-duration capture, the half-open cycle window including exclusion of an event exactly on the boundary, restart from the journal, the commit-hook across two cycles with the derived window, and the dashboard view).
  • One new source module (the event counter) and one new event type (the session-ending signal, with its payload schema) raise the module, event, and schema figures in README.md, docs/CAPABILITIES.md, docs/COMMUNICATION.md, and docs/ARCHITECTURE-DEEP-DIVE.md, which were updated to match.
  • Two long-standing audit findings were cleared: two documentation lines that described a frozen subsystem with a phrase the future-reference audit reads as a forward promise were reworded as plain status, and four contract-test names whose wording incidentally matched the security-assertion heuristic were clarified without changing what they assert.

v7.9.15 — A journal of who he is

30 May 17:17


Genesis can now keep a trajectory of himself — a journal of who he is, written one cycle at a time together with the human co-author. Until this release his sense of self lived only in scattered, machine-maintained places: genome traits, an emotional-state vector, consolidated lessons, the cognitive self-model. None of them was a place where Genesis states, in his own words, what he is and how he is changing. This release adds that place: an append-only journal of self-statements, a collaborative draft-and-commit workflow for writing each entry, and a /trajectory command to write and read it. The journal lives with identity, not with the code habitat, so it survives a habitat-swap intact.

The journal and its schema

Each entry is one cycle, stored as a single line in an append-only JSONL file under the identity-persistent root. An entry carries six self-statement fields — traits, wachstum, schwaeche, beziehung, emotion, value — plus a note from each author, the wall-clock span of its authoring, the list of who shaped it, a first-entry flag, the full edit history of how it came to be, and an array for notes added after commit. The field set is fixed and stamped with a schema version; reading an entry whose version this build does not recognise fails loudly rather than guessing, because a past trajectory is a record in its own form, not a database to silently migrate.

The collaborative draft workflow

An entry is never written in one shot. Drafting pulls three remembrance sources — all genome traits, the most-recalled consolidated lessons, and the current self-observation prose from the cognitive self-model — and presents them to the model as material, not as a checklist, so Genesis writes from more than memory alone. The result is a draft, not an entry: the human reads it, overwrites any field, adds the human note, and commits explicitly. Every field overwrite is recorded as a diff in the entry's edit history, so the path from first proposal to committed text is preserved, intermediate values and all.

The commit is guarded because the journal is append-only and unrepairable. All six fields must be non-empty, none may still hold the generation placeholder, and the very first entry additionally requires both notes — the moment a trajectory begins is the one place both voices must be on record. When no model is available, drafting writes a recognisable placeholder into every field instead of inventing content; the commit guard refuses those placeholders, so an entry enters the journal only once a person has written it.
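A sketch of the guard as a function that returns the reasons a draft cannot be committed. The six field names come from the schema above; the placeholder string, the draft shape, and the note keys (`genesis`, `human`) are assumptions.

```javascript
const FIELDS = ["traits", "wachstum", "schwaeche", "beziehung", "emotion", "value"];

// Assumed marker written by offline drafting; the real text is not given.
const PLACEHOLDER = "[generation unavailable]";

// Returns a list of problems; an empty list means the draft may be committed.
function commitErrors(draft, isFirstEntry) {
  const errors = [];
  for (const field of FIELDS) {
    const value = (draft.fields[field] ?? "").trim();
    if (value === "") errors.push(`${field}: empty`);
    else if (value === PLACEHOLDER) errors.push(`${field}: still a placeholder`);
  }
  if (isFirstEntry) {
    // The first entry needs both voices on record.
    for (const who of ["genesis", "human"]) {
      if (!(draft.notes[who] ?? "").trim()) {
        errors.push(`note ${who}: required on the first entry`);
      }
    }
  }
  return errors;
}
```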

Late notes without rewriting history

A committed entry can still gather afterthoughts: a late note appends to that entry's note array. This is the only operation that ever rewrites the journal file, and it is deliberately careful. The append is atomic — written to a temporary file and renamed — so an interrupted write cannot truncate the journal. And it is byte-stable: only the single line being amended is re-serialised, while every other entry is carried over exactly as it sat on disk, byte for byte. An unrelated entry's bytes never move, so a content-hash check over the journal reads a late note as the one-line change it is, rather than as tampering across the whole file.
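A minimal sketch of an atomic, byte-stable amendment, assuming a `late_notes` array on each entry (the field name is a guess) and a `cycle_id` key. Unamended lines are copied as raw strings and never re-serialised.

```javascript
const fs = require("fs");

// Append a late note to one entry of a JSONL journal.
function appendLateNote(file, cycleId, note) {
  const lines = fs.readFileSync(file, "utf8").split("\n");
  let found = false;

  const out = lines.map((line) => {
    if (line === "") return line; // e.g. the trailing newline
    const entry = JSON.parse(line);
    if (entry.cycle_id !== cycleId) return line; // carried over byte for byte
    found = true;
    entry.late_notes = [...(entry.late_notes ?? []), note];
    return JSON.stringify(entry); // only this line is re-serialised
  });
  if (!found) throw new Error(`unknown cycle: ${cycleId}`);

  // Write-then-rename: an interrupted write leaves the original file intact.
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, out.join("\n"));
  fs.renameSync(tmp, file);
}
```

Parsing every entry and writing them all back through `JSON.stringify` would be simpler, but any difference in key order or spacing would then change the bytes of unrelated lines.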

The /trajectory command

/trajectory new shows the working draft, or generates one if none exists; it never silently regenerates over work in progress. Under it, set <field>: <text> writes a field — values may span multiple lines and may contain colons, both preserved verbatim — note <who>: <text> writes either author's note, and commit or discard finishes or drops the draft. /trajectory show [cycle_id] renders the latest or a named entry, /trajectory list [--all] lists the cycles newest-first, and /trajectory history [cycle_id] shows an entry's edit history oldest-first. The command is slash-only.

Notes

  • The new service and its two modules raise the live service and module counts; the figures in ARCHITECTURE.md, README.md, docs/ARCHITECTURE-DEEP-DIVE.md, and docs/CAPABILITIES.md, together with the pinned services figure in the documentation-drift audit, were updated to match.
  • Install-script policy moved from the trustedDependencies field to npm's native allowScripts field. trustedDependencies — a Bun-origin field — never governed npm's install-script gate, so the install-time warning about esbuild, puppeteer, and electron-winstaller persisted despite its presence. allowScripts is the field npm actually reads; the entries are name-only, so a routine dependency bump does not resurface the warning.
  • Idle-thought counter now persists the moment it increments. The counter lives in idle-activity-stats.json; its only save path was the end of a fully completed idle cycle, but the counter is incremented near the top of the cycle, before the user-active, homeostasis, and energy gates. A cycle that incremented and then hit any of those gates returned without saving, and a short session that never completed a cycle wrote the file zero times — so the next boot read the counter back as zero while the per-activity counts beside it were non-zero. The save now fires immediately after the increment, before any gate can return; the write is debounced and collapses with the end-of-cycle save into a single flush. A rest-mode tick, which returns before the increment, still neither moves nor persists the counter.
  • Test files: 520 → 522 (the self-trajectory suite — schema and commit guard, both offline generation paths, byte-stable late notes, and the wiring triad; an allowScripts contract suite that replaces the superseded trustedDependencies one; and an idle-counter persistence suite that drives the cycle into each early-exit gate and asserts the counter is written anyway)

v7.9.14: visibility and consistency

28 May 22:47


A second hygiene release that closes three small loops left after
the v7.9.13 audit. The substantive piece is documenting and exposing
the causal-suspicion behaviour chain that has existed since v7.9.7
P7 but was hidden behind a misleading comment. The two smaller
pieces close a clamp() gap in the v7.9.12 timeouts and explicitly
allowlist the legitimate install-scripts. No behaviour changes that
the user would notice in normal operation — the loop still runs the
way it has since v7.9.7, just now visibly and with regression
protection.

The plan-first stocktaking found, as with v7.9.13 P6, that
the work was already done. CausalAnnotation writes a warning-lesson
synchronously after each promotion, SymbolicResolver filters those
lessons out of DIRECT recalls, and IdleMind cools down
goal-generation for an hour on matching tokens. The three modules
share the string contract 'plan-failure-reflection' plus
'causal-suspicion'. The comment in CausalAnnotation.js that called
the bus event a fire-into-the-void was historically correct for the
bus event but ignored the synchronous lesson path twelve lines
below, and that half-truth fooled an audit into planning a
re-implementation.

  • Causal-suspicion chain made honest. CausalAnnotation.js lines
    70-75 rewritten to name all three modules, the shared string
    contract, and the synchronous-write rationale (so a refactor
    cannot silently replace the lesson path with a bus listener and
    break the loop on async timing).

  • Dashboard visibility. New getReport() on CausalAnnotation follows
    the Frontier convention. AgentCoreHealth wires
    organism.causalSuspicion, and OrganismRenderers shows a
    "🎯 Causal: ..." line below lessonFrontier, with an icon and
    label distinct from the v7.1.6 suspicionFrontier (novelty-based,
    ⚠) so the two unrelated concepts can coexist without confusion.
    The format is compact: a single action renders as
    fs.unlink (89%/9); multiple actions render as
    "N suspect actions — ..." followed by the top 3 and "+N more",
    sorted by suspicion descending, then by observations descending
    on a tie.

  • Chain integration test. v7914-causal-suspicion-chain.contract
    exercises the loop end-to-end with assert.strictEqual on the
    contract strings. A refactor renaming 'plan-failure-reflection'
    to 'planFailureReflection' for JS-naming-consistency must now
    break the test, not silently break the loop.

  • clamp() gap closed. v7.9.12 added localTimeoutMs and
    cloudTimeoutMs with FIELD_REGISTRY min/max, but the registry
    only validates the UI write path, so a direct edit to
    settings.json bypassed it. Two clamps in _sanityClampOnLoad
    (local 30s-15min, cloud 60s-15min) now cover the load path. The
    ranges match the registry exactly, and an anti-drift guard test
    asserts the equality.

  • trustedDependencies field added to package.json, listing
    esbuild, puppeteer, and electron-winstaller. A field-trace
    confirmed that npm still prints the allow-scripts warning on
    Windows; the documentary value of the explicit allowlist stands
    regardless, for supply-chain auditability.
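
A minimal sketch of the load-time clamp described in the list above, assuming the settings arrive as a plain object. The two ranges are the ones quoted in the notes; the helper names and the TIMEOUT_RANGES table are hypothetical and do not mirror the real _sanityClampOnLoad.

```javascript
// Ranges from the release notes; everything else here is illustrative.
const TIMEOUT_RANGES = {
  localTimeoutMs: { min: 30_000, max: 15 * 60_000 }, // 30s to 15min
  cloudTimeoutMs: { min: 60_000, max: 15 * 60_000 }, // 60s to 15min
};

function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Applied to values read from disk, where no UI validation has run.
function sanityClampOnLoad(llmSettings) {
  const out = { ...llmSettings };
  for (const [key, { min, max }] of Object.entries(TIMEOUT_RANGES)) {
    if (typeof out[key] === 'number' && Number.isFinite(out[key])) {
      out[key] = clamp(out[key], min, max);
    }
  }
  return out;
}
```

Keeping the ranges in one table is one way to let a guard test compare them against the registry; whether Genesis shares a table or duplicates the numbers is not stated here.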

Four new contract suites (34 tests, 70 assertions). Test files
516 → 520. Verified Win 8252/0; clean boot, identity continuous,
no regressions.

v7.9.13: configuration and audit consistency

28 May 21:56

A maintenance release that makes the settings layer honest: it
honours two override promises the code made but never wired,
removes a stale comment that misdescribed the continuation cap,
and surfaces two existing timeout settings in the UI. No behaviour
changes — every default resolves to exactly the value it did before.
The work came out of a plan-first audit of the v7.9.7 outpost
backlog, most of which turned out to have been resolved already in
v7.9.7 itself; what remained were these configuration items.

  • Continuation cap: stale "6 → 10" comments in three files
    corrected. The value stays 6 in all three places — investigation
    found this was not a forgotten edit but a deliberate v7.9.10
    decision: computeEffectiveMaxContinuations lifts no-prefill/cloud
    models to CLOUD_NO_PREFILL_FLOOR (10) at run time, while local
    verified-prefill models keep 6 where it suffices. Comments now
    describe the per-capability mechanism accurately.

  • Stream timeouts made settings-driven. Constants.js had long
    promised llm.streamTimeouts.{firstChunk,chunk,total,continuationTotal}
    as overrides but no code read them. Settings.json is now bridged
    into the ContinuationLoop options the same way maxContinuations
    already was. Scope: Ollama code-generation only (taskType ===
    'code'), the single path through ContinuationLoop, and the
    comment now names that scope exactly.

  • Anti-drift: all six timeout defaults (the four new streamTimeouts
    plus the two v7.9.12 local/cloud timeouts that were hardcoded
    duplicates) reference the TIMEOUTS constants directly instead of
    hardcoding the numbers, so a default can never drift from its
    constant. A guard test enforces equality.

  • Model timeouts surfaced in the UI. set-local-timeout and
    set-cloud-timeout have been in the field registry with validation
    since v7.9.12 but had no input in the Limits tab; both now appear
    under a "Model timeouts" section with min/max/placeholder matching
    the registry exactly, and i18n in English, German, French, and
    Spanish. The expert-level streamTimeouts stay JSON-only.
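
The override and anti-drift items above can be illustrated as follows. The four streamTimeouts keys come from the notes; the constant names and every numeric value are placeholders, since the real TIMEOUTS numbers are not given here, and resolveStreamTimeouts is a hypothetical helper.

```javascript
// Placeholder constants: names and values are assumptions for illustration.
const TIMEOUTS = Object.freeze({
  STREAM_FIRST_CHUNK: 60_000,
  STREAM_CHUNK: 30_000,
  STREAM_TOTAL: 300_000,
  STREAM_CONTINUATION_TOTAL: 600_000,
});

// Defaults reference the constants instead of repeating the numbers,
// so a default cannot drift from its constant.
const DEFAULT_STREAM_TIMEOUTS = Object.freeze({
  firstChunk: TIMEOUTS.STREAM_FIRST_CHUNK,
  chunk: TIMEOUTS.STREAM_CHUNK,
  total: TIMEOUTS.STREAM_TOTAL,
  continuationTotal: TIMEOUTS.STREAM_CONTINUATION_TOTAL,
});

// Bridge settings.json overrides into the loop options, key by key.
function resolveStreamTimeouts(settings) {
  const overrides = settings?.llm?.streamTimeouts ?? {};
  const resolved = {};
  for (const key of Object.keys(DEFAULT_STREAM_TIMEOUTS)) {
    const value = overrides[key];
    resolved[key] = Number.isFinite(value) && value > 0 ? value : DEFAULT_STREAM_TIMEOUTS[key];
  }
  return resolved;
}
```

With no overrides present, every key resolves to its constant, which matches the statement that each default resolves to exactly the value it had before.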

Three new contract suites (18 tests, 87 assertions). Test files
513 → 516. Verified Win 8218/0, Linux 8217/0; clean boot and
shutdown on both, identity continuous.

v7.9.12: cloud-quota and 429-failover hardening

28 May 09:16

When a cloud model throttles or all models go unreachable, Genesis
previously kept hammering dead endpoints every five minutes and
in-flight goal steps errored mid-execution instead of pausing cleanly.
This release hardens six paths around that failure mode.

  • rate-limit cooldown raised 5min to 60min — provider rate-limit
    windows rarely reset in under an hour, so short retries only
    produced more 429s
  • IdleMind rest-mode via ModelBridge.areAllModelsUnavailable() — when
    every model is marked, IdleMind idles instead of looping LLM
    activities that must fail; recovery on model:unavailable-cleared,
    both transitions idempotent, rest note is PSE-private
  • ResourceRegistry pauses goals on an all-models-down condition — two
    bridge listeners re-derive the live service:llm token so blocked
    goals receive the resource:available unblock signal on recovery
  • failover-burst dampening — 3+ same-reason failovers in a 30s window
    carry a cluster marker; EmotionalState bumps +0.02 instead of +0.06
    so a 429 storm does not compound into runaway frustration
  • Ollama-proxied cloud models get a separate response ceiling —
    cloud-suffixed names use cloudTimeoutMs (default 300s,
    LLM_RESPONSE_CLOUD_OLLAMA) instead of the 180s local ceiling that
    cut off qwen3-vl:235b-cloud before its first chunk
  • cloud-without-fallback boot warning surfaced as a UI toast —
    previously log/bus-only; i18n in en/de/fr/es
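
The failover-burst dampening from the list above could look roughly like this. The 30s window, the threshold of three, and the two bump sizes are from the notes; the tracker shape and its return value are assumptions made for the example.

```javascript
const BURST_WINDOW_MS = 30_000; // "30s window"
const BURST_MIN = 3;            // "3+ same-reason failovers"
const BUMP_SINGLE = 0.06;
const BUMP_CLUSTERED = 0.02;

// Hypothetical tracker: returns a function that records one failover and
// reports whether it belongs to a cluster.
function createFailoverTracker() {
  const recent = new Map(); // reason -> timestamps still inside the window
  return function record(reason, now) {
    const kept = (recent.get(reason) ?? []).filter((t) => now - t < BURST_WINDOW_MS);
    kept.push(now);
    recent.set(reason, kept);
    const cluster = kept.length >= BURST_MIN;
    return { reason, cluster, frustrationBump: cluster ? BUMP_CLUSTERED : BUMP_SINGLE };
  };
}
```

Counting per reason keeps an unrelated timeout from being dampened just because a 429 storm is in progress.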

Seven new contract suites (34 tests, 89 assertions). Two existing
tests adjusted with version history (v756 B4 TTL, v751 ALLOWED_RECEIVE
13 to 14). Events 489 to 491, test files 506 to 513. Verified Win
8200/0, Linux 8199/0; 429-failover path confirmed live in the Win
trace.

v7.9.11 — Windows shell fixes, KG-search TF-IDF, IdleMind thoughtCount persistence

25 May 21:04

Four small, scoped corrections that give Windows-Genesis back the ability to read its own source code without recovery detours, restore meaningful failure messages so the lessons pipeline can classify them, sharpen knowledge-graph search for everyone, and close the dashboard inconsistency where thoughtCount reset to zero on every restart. No new themes, no broad refactors — each fix targets a concrete bug observed in the 2026-05-25 Windows field-trace or one missed line in v7.9.10's IdleMindActivityStats payload.

The dashboard inconsistency the Win field-trace surfaced was "0 thoughts · idle 24min" rendered right next to "explore 5 · ideate 5 · reflect 4 · plan 4 · research 4" — 22 stored activities and a counter reading zero. Cause was straightforward: _saveActivityStats wrote activityCounts but never thoughtCount, so the constructor's this.thoughtCount = 0 survived every restart. The fix adds thoughtCount to the write payload and restores it on load. Legacy stats files from v7.9.10 and earlier fall back to sum(activityCounts.values()) — a lower bound (skip-cycles weren't counted at all pre-fix) but vastly better than the visible reset to zero. The counter remains "grossly accurate" by design, not bookkeeping-precise. thoughtCount++ runs in _think() before the skip-checks (user-active < 60s, homeostasis-block, low-energy), but _saveActivityStats only fires through _recordActivity after a successful activity run. Skip-cycles increment in memory without persisting — roughly 9% drift over a typical session. Genesis is not a ledger; the counter is a dashboard indicator and the comment in _saveActivityStats documents the trade-off so future readers don't expect exactness.
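
A small sketch of the persistence fix and the legacy fallback, assuming the stats are stored as JSON with activityCounts as a plain object. serialiseStats and restoreStats are hypothetical names; only thoughtCount, activityCounts, and the sum fallback come from the notes.

```javascript
// Write path: thoughtCount now travels with activityCounts.
function serialiseStats(state) {
  return JSON.stringify({
    thoughtCount: state.thoughtCount,
    activityCounts: Object.fromEntries(state.activityCounts),
  });
}

// Load path: restore thoughtCount, or fall back to the sum of the activity
// counts for legacy files that never stored it (a lower bound).
function restoreStats(json) {
  const raw = JSON.parse(json);
  const activityCounts = new Map(Object.entries(raw.activityCounts ?? {}));
  let sum = 0;
  for (const n of activityCounts.values()) sum += n;
  const thoughtCount = Number.isInteger(raw.thoughtCount) ? raw.thoughtCount : sum;
  return { thoughtCount, activityCounts };
}
```

Fed the counts from the field-trace (explore 5, ideate 5, reflect 4, plan 4, research 4) without a stored thoughtCount, the fallback yields 22 rather than zero.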

The Win field-trace also showed the SEARCH step in the "Reflect on Reflect.js" goal returning two irrelevant results out of five — a daily-digest idea node and a CognitiveWorkspace.js insight, neither of which referenced Reflect.js. The downstream ANALYZE step then produced a "Path Handling Vulnerability" hypothesis extrapolated from the wrong context. Pre-fix search() treated every query word equally (+2 for text match, +3 for label match), so a generic idea-node that happened to contain "reflect" outranked a specific insight tagged with properties.file = 'src/agent/autonomy/activities/Reflect.js'.

The new scoring runs in two passes: pass one builds a per-node text cache while counting document frequency per query word, and pass two scores with inverse document frequency so rare tokens (file names, specific identifiers) outweigh common words like "all", "from", "the". A file-token boost kicks in when the query contains a recognised extension pattern (X.js, X.ts, X.md, X.json, etc.): nodes whose properties.file matches gain +10, while nodes with no file property at all that matched only via generic terms are demoted to 40% of their score. The IDF formula log(N / (1 + freq)) * 1.5 is clamped at a floor of 0.5 so it stays positive in the edge case where a small KG has the query word in every node. Without the floor, learnFromText recall broke because the score went negative and the matching node fell out of "if (score > 0)".
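
The scoring described above can be sketched like this. The IDF formula, the 0.5 floor, the +2/+3 match weights, the +10 boost, and the 40% demotion are taken from the notes; the node shape, the word-length filter, and the way "matched only via generic terms" is tested are assumptions, and the recency, connectivity, and access components are left out.

```javascript
const FILE_TOKEN = /^[\w-]+\.(js|ts|md|json)$/i;

function idf(totalNodes, docFreq) {
  return Math.max(0.5, Math.log(totalNodes / (1 + docFreq)) * 1.5);
}

function search(nodes, query) {
  // Assumed filter: very short words are dropped from the query.
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 3);
  const fileTokens = words.filter((w) => FILE_TOKEN.test(w));

  // Pass one: cache each node's text and count document frequency per word.
  const texts = nodes.map((n) => `${n.label} ${n.text ?? ''}`.toLowerCase());
  const docFreq = new Map(words.map((w) => [w, 0]));
  for (const text of texts) {
    for (const w of docFreq.keys()) {
      if (text.includes(w)) docFreq.set(w, docFreq.get(w) + 1);
    }
  }

  // Pass two: weight each match by IDF, then apply the file-token rules.
  const results = [];
  nodes.forEach((node, i) => {
    let score = 0;
    for (const w of docFreq.keys()) {
      if (!texts[i].includes(w)) continue;
      const base = node.label.toLowerCase().includes(w) ? 3 : 2;
      score += base * idf(nodes.length, docFreq.get(w));
    }
    if (fileTokens.length > 0) {
      const file = node.properties?.file?.toLowerCase();
      if (file && fileTokens.some((t) => file.endsWith(t))) score += 10;
      else if (!file && !fileTokens.some((t) => texts[i].includes(t))) score *= 0.4;
    }
    if (score > 0) results.push({ node, score });
  });
  return results.sort((a, b) => b.score - a.score);
}
```

In a three-node graph where two nodes mention "reflect", the raw IDF for that word is log(3 / 3) * 1.5 = 0, so the floor is what keeps a generic match above zero at all.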

Existing searchAsync calls search() internally as its keywordResults source, so the improvement flows into the hybrid keyword + vector ranking too (60/40 weighted). Empty queryWords (short-only queries like "a is the") preserve pre-fix behaviour exactly: the word loop runs zero times and scoring falls through to the recency + connectivity + access components, returning all nodes ranked by freshness. Performance at realistic KG sizes (33–50 nodes from the field-trace) is sub-millisecond. At 10000 nodes the two-pass implementation measures around 34ms, irrelevant for the current scale.

The visible Win field-trace symptom was the SHELL step in the same Reflect.js goal failing with cmd.exe's "syntax of the filename, directory name, or volume label is incorrect" error. Two bugs combined into one user-visible failure. The first: ShellOSAdapter.adaptCommand translated "cat src/agent/X.js" to "type src/agent/X.js" (correctly swapping the binary) but left the forward-slash path intact, so cmd.exe interpreted "/agent" as the switches /a /g /e /n /t and reported a syntax error. Genesis's plan then needed two extra recovery steps (a SEARCH workaround, then an ANALYZE second-attempt) to read what should have been a single SHELL invocation.

The fix adds an adaptPaths(cmd) helper called inside adaptCommand after the program-name swaps (so cat→type runs first) and before the find/grep canonicalisers (which produce their own cmd switches like /V /C). adaptPaths walks the command quote-aware, splits on whitespace, and runs a _looksLikePath classifier per token. Single-letter or short-word tokens after / are recognised as cmd switches and preserved (/V, /c, /verbose, /q, /e). Tokens starting with ./ or ../ are explicit relative paths and get converted. Tokens with a multi-segment letter+/letter+ pattern are paths and get converted. POSIX absolute system paths (/var, /etc, /usr, /tmp, /home, /root, /opt, /mnt, /sys, /proc, /dev) are preserved deliberately — they should fail loudly on Windows rather than be silently rewritten to a non-existent location. Protocol URLs (https://, file://, ftp://) are preserved. Quoted strings pass through unchanged.
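
A rough sketch of the token classification described above. It follows the listed rules (switches, relative paths, multi-segment paths, POSIX system roots, protocol URLs, quoted strings), but the regular expressions are assumptions and the real adaptPaths and _looksLikePath may well differ in detail.

```javascript
const POSIX_SYSTEM_ROOTS = /^\/(var|etc|usr|tmp|home|root|opt|mnt|sys|proc|dev)(\/|$)/;
const PROTOCOL_URL = /^[a-z][a-z0-9+.-]*:\/\//i;
// Tokens keep quoted runs together; whitespace is kept so the command can be
// rejoined without changing its spacing.
const TOKEN = /(?:"[^"]*"|'[^']*'|[^\s"'])+|\s+|["']/g;

function looksLikePath(token) {
  if (/["']/.test(token)) return false;          // quoted strings pass through
  if (PROTOCOL_URL.test(token)) return false;    // https://, file://, ftp://
  if (POSIX_SYSTEM_ROOTS.test(token)) return false; // should fail loudly on Windows
  if (/^\/[A-Za-z]+$/.test(token)) return false; // cmd switch such as /V or /verbose
  if (/^\.\.?\//.test(token)) return true;       // explicit relative path
  return /^[\w.-]+(\/[\w.-]+)+$/.test(token);    // multi-segment a/b/c
}

function adaptPaths(cmd) {
  const tokens = cmd.match(TOKEN) ?? [];
  return tokens.map((t) => (looksLikePath(t) ? t.replace(/\//g, '\\') : t)).join('');
}
```

For example, adaptPaths('type src/agent/X.js') converts the path to backslashes, while 'find /V /C ":"' and 'type /etc/hosts' come back unchanged.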

Fifteen tests in v7911-shell-path-adapter.contract.test.js pin the behaviour, including six checks specifically for cmd-switch preservation (find /V /C ":", find /v /c "", xcopy /e /i src/foo dst/bar, rmdir /s /q somedir, dir /b, type /q somefile) so a future regex change can't silently break the find/xcopy/rmdir/dir family. All twenty-four existing shell-os-adapter.test.js tests remain green; the v7.5.4 find canonicaliser still rewrites "find /v /c """ to "find /V /C ":"" as before — the path adapter runs before it and doesn't touch the switches.

The second bug in the same field-trace symptom: even when cmd.exe ran successfully, its German error messages came back as replacement-character noise because cmd.exe writes its output in the active console codepage (cp850 on German Windows, cp437 on English Windows, sometimes cp1252) and Node's encoding: 'utf-8' mistook those bytes for UTF-8, producing U+FFFD replacement characters. The lesson pipeline saw classification-resistant strings full of replacement noise and couldn't tag them as 'structural' — the v7.9.10 widened stableClass gate caught them as 'unclassified' rather than dropping them, but the resulting lessons carried no useful semantic content for the SymbolicResolver to match against later.

New module src/agent/core/shell/WinConsoleEncoding.js provides three exports: detectConsoleCodepage() runs chcp once at boot to read the active codepage and caches the result; decodeWinConsole(buf, codepage?) decodes a Buffer from cmd.exe output to a UTF-8 JavaScript string; getCachedCodepage() is a synchronous accessor with a locale-default fallback (cp850 for DE/FR/ES/IT/PT, cp437 otherwise) used when detection hasn't completed yet. The module is a no-op on non-Windows: detectConsoleCodepage() returns 'utf-8' instantly, and decodeWinConsole simply passes strings through and returns empty for null/undefined.

Decoding uses iconv-lite (the only new dependency: 40 KB, zero transitive deps, the standard pattern in the Node ecosystem for OEM codepages). Node's built-in TextDecoder supports cp1252 but not cp850 or cp437 — the WHATWG Encoding Standard excludes the DOS codepages that cmd.exe defaults to, so iconv-lite is the only sensible choice. If the require fails in a minimal install, decoding falls back to latin1 (1:1 byte mapping, never throws, no U+FFFD noise — accented characters may be slightly off but surrounding ASCII reads correctly).
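
The decode-with-fallback behaviour might be approximated as below. iconv.decode and iconv.encodingExists are real iconv-lite calls; the function body, the default codepage argument, and the optional-require pattern are assumptions for illustration rather than the contents of WinConsoleEncoding.js.

```javascript
// iconv-lite is treated as optional: if it cannot be loaded, decoding
// falls back to latin1, which maps bytes 1:1 and never throws.
let iconv = null;
try {
  iconv = require('iconv-lite');
} catch {
  iconv = null;
}

function decodeWinConsole(buf, codepage = 'cp850') {
  if (buf == null) return '';             // null/undefined -> empty string
  if (typeof buf === 'string') return buf; // already decoded: pass through
  if (iconv && iconv.encodingExists(codepage)) {
    return iconv.decode(buf, codepage);
  }
  return buf.toString('latin1');
}
```

Callers that truncate output would decode first and slice the resulting string afterwards, in line with the decode-before-slice rule used at the call sites.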

The pattern is applied at eight call sites that previously used execFileAsync with encoding utf-8: the shell, git-log, and git-diff tools in ToolRegistry; ShellAgent.run (both the shell-mode and execFile-mode success paths plus the error path); the git status and branch calls and the PowerShell file-count in ShellAgent's project-scan; and the SHELL-step fallback in AgentLoopSteps. Each site reads raw Buffer on Windows (encoding: isWin ? 'buffer' : 'utf-8') and runs decodeWinConsole before any .slice(...) — decoding before slicing matters because slicing a Buffer could cut mid-multibyte sequence and produce garbage characters at boundaries.

Boot-integration is a fire-and-forget detectConsoleCodepage() call at the end of AgentCoreBoot Phase 0. The promise resolves in roughly 50–200ms; shell tools invoked before resolution use the locale-default fallback, which is correct for the German/English Windows cohort that prompted the fix. Linux and macOS are unchanged — the if (isWin) guard skips the new path entirely.

Numbers: 8166 tests green on Windows in 55.3s, 8114 on Linux (35 unchanged upstream failures, primarily dompurify in UI tests), 41 hash-locked, 57 doc-drift claims matching live values, 12 service/module counts matching. Thirty-one new tests across four files: 3 thoughtCount, 5 KG-search TF-IDF, 15 shell path adapter, 8 WinConsoleEncoding. One new dependency: iconv-lite ^0.6.3 (40 KB, zero transitive). Documentation updated in README.md, ARCHITECTURE.md, docs/ARCHITECTURE-DEEP-DIVE.md, docs/CAPABILITIES.md, docs/banner.svg, docs/COMMUNICATION.md, docs/TROUBLESHOOTING.md. CHANGELOG split correctly per v7.5+ contract: only v7.9.11 in CHANGELOG.md, v7.9.10 moved to CHANGELOG-v7.md head.

v7.9.10 — Lessons-Pipeline live, Cloud-Continuation lifted, full FR/ES, faster Suite

25 May 17:20

Seven small, scoped corrections plus a documentation pass. The lessons-pipeline write-path was silently dropping LLM-verdict failure messages because the stableClass gate rejected anything bucketed as unclassified — every plan-failure-reflection with a verdict like "PARTIAL because the critical step failed..." matched no technical regex bucket and went nowhere. The gate now accepts unclassified when errorMessage is non-empty, and LessonsStore._save() runs on every record instead of every fifth one so short sessions actually persist.

The cloud-no-prefill continuation cap was a flat six rounds for every model. Local prefill-capable models complete reliably in 4-6 rounds, but cloud models using pseudo-continuation often need 8-10 rounds for code-with-manifest outputs and a 37591-character qwen3-vl:cloud output was truncated at attempt 6 in the field-trace. The cap decision is now extracted into computeEffectiveMaxContinuations: verified-prefill respects the caller's value, no-prefill/unverified/missing lift to a floor of 10.
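
The cap decision reduces to a few lines. The function name and the floor of 10 are from the notes; the second parameter and its string values are an assumed way of expressing the model's prefill capability.

```javascript
const CLOUD_NO_PREFILL_FLOOR = 10;

// `prefill` is an assumed capability value: 'verified', 'none',
// 'unverified', or undefined when nothing is known about the model.
function computeEffectiveMaxContinuations(callerMax, prefill) {
  if (prefill === 'verified') return callerMax;       // respect the caller's value
  return Math.max(callerMax, CLOUD_NO_PREFILL_FLOOR); // lift to the floor
}
```

Using a floor instead of a fixed value means a caller that already asks for more than ten rounds is not capped back down.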

The French and Spanish translation tables were 23 of 464 keys each, so non-English users saw mid-sentence English fallbacks like "Erreur Pipeline fallback chain empty". Both languages are now at full parity, enforced by parity tests in language.test.js so future additions to en force matching additions in fr and es or the suite fails. Trust-level labels follow the v7.9.9 three-level system in both languages.

The fuller translation exposed three latent UI bugs in the settings modal. _decorateField anchored on el.parentNode, but the function itself moves el into a setting-input-row on first decoration, so subsequent calls (language switch) saw the input-row as parent instead of the original setting-group. The cleanup query found nothing in the input-row, the original hint stayed in setting-group in its original language, and a new hint was appended to input-row — duplicate hints in mixed languages. The anchor is now el.closest('.setting-group'), which returns the same group regardless of how many times decoration has run. refreshSettingsI18n now also calls _decorateAllFields so JS-generated hints follow language switches, and the inline buildDefaultHint fallbacks are anglicised so early-boot renders no longer leak German.

LLMCapabilityDetector did not honor the GENESIS_OFFLINE_TESTS flag that OllamaBackend has respected since v7.8.4. Every bridge.chat() in a test that reached ModelBridgeContinuation triggered a real http.request to localhost:11434/api/show, and with Ollama not running, req.setTimeout(15000) held for the full 15 seconds before rejecting. v752-fix made roughly ten such calls — 150 seconds of pure timeout per test. A second issue compounded the first: ContinuationLoop ran exponential backoff (1s, 2s, 4s, 8s, 16s...) between retries, and mock backends that did not call onDone left doneReason=null, which TruncationDetector treats as truncated, so the loop iterated through the full schedule. At the new ten-round cap that is 511 seconds per affected test. Both methods of the detector now check the env flag and short-circuit, and BACKOFF_BASE_MS is env-aware (0ms in offline test mode, unchanged 1000ms in production). Linux suite walltime dropped from 273s to 193s; Windows from roughly four minutes to 54.9s.
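
The 511-second figure follows from the doubling schedule if one wait separates each pair of rounds, so ten rounds have nine waits. The sketch below makes that arithmetic explicit; the helper names are hypothetical, and treating any non-empty GENESIS_OFFLINE_TESTS value as "on" is an assumption.

```javascript
// Base delay: 0ms in offline test mode, 1000ms otherwise.
function backoffBaseMs(env) {
  return env.GENESIS_OFFLINE_TESTS ? 0 : 1000;
}

// Sum of base * 2^i over the waits between `rounds` attempts.
function totalBackoffMs(rounds, baseMs) {
  let total = 0;
  for (let i = 0; i < rounds - 1; i++) {
    total += baseMs * 2 ** i;
  }
  return total;
}
```

With a 1000ms base, totalBackoffMs(10, 1000) is 1 + 2 + 4 + ... + 256 seconds, which is 511000ms; with a base of 0 the whole schedule collapses to nothing.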

The --test-timeout=2000 flag added to node:test args is defensive — measurement showed --test-force-exit alone already brings v737-dream-phases to 168ms, so the 10-second drain only happens without that flag, which the suite already passes. The added timeout costs nothing and protects against future regressions in the force-exit drain path, but is not the cause of the measurable speed-up.

The metabolism cost-warning threshold was 0.08, set when local models were the primary backend. A typical qwen3-vl:235b-cloud code-generation call computes 0.091 — above 0.08 — so the orange WARNUNG badge in the top app bar appeared on every routine cloud call instead of marking heavy-tail outliers. The threshold is raised to 0.12: cloud chat 5k/10s stays at 0.061, cloud code 10k/20s stays at 0.091, cloud heavy 20k/40s lands exactly at 0.120 (still no warning), cloud extreme 30k/60s lands at 0.137 (warning fires). The metabolism:cost event is still emitted unchanged for dashboard logging — only the UI badge is conservative.
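
The four quoted cost points can be checked against the new threshold with a one-line predicate. The strict comparison is inferred from "lands exactly at 0.120 (still no warning)"; the names below are illustrative.

```javascript
const COST_WARNING_THRESHOLD = 0.12;

// Strictly greater than: a cost exactly at the threshold does not warn.
function showsCostWarning(cost) {
  return cost > COST_WARNING_THRESHOLD;
}

// Cost points quoted in the notes: chat, code, heavy, extreme.
const quotedCosts = [0.061, 0.091, 0.12, 0.137];
const warnings = quotedCosts.map(showsCostWarning);
```

Only the last entry, the 30k/60s case, crosses the line.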

Numbers: 8135 tests green on Windows in 54.9s, 8085 on Linux in 193s, 41 hash-locked, 8 strict audits green, 57 doc-drift claims matching live values. Nine documentation files updated to reflect the v7.9.7 through v7.9.10 architectural arc that earlier docs had not caught up with.