Commit ddcb260

v7.9.18: recovery-snapshot integrity
Restore v7.9.16 and v7.9.17 to working order by fixing a fault one layer below their code: crash-recovery restored a pre-v7.9.16 snapshot over the live deployment, leaving a version-mixed tree where TrajectoryCalibration failed to start and idle-stats reset to zero. Code snapshots move out of the identity layer (.genesis) to a habitat-local path; a poisoned legacy folder is migrated aside before any restore can read it. restore() now refuses a snapshot whose codeVersion does not match the running one (soft skip, boots current code). last_good_boot is written only after a boot with zero service-start failures, so a degraded boot is never frozen as the recovery target. Idle activity-stats persist via a synchronous write so counters survive an unclean crash. No new module, event, schema, or service; test files 524 to 525; headless boot now asserts zero service-start failures.
1 parent 9aa4880 commit ddcb260

21 files changed

Lines changed: 494 additions & 51 deletions

CHANGELOG-v7.md

Lines changed: 23 additions & 0 deletions
@@ -1,3 +1,26 @@
1+
## [7.9.17]
2+
3+
The trajectory journal records what Genesis says about how he is changing; v7.9.16 began counting the events that make a cycle eventful. This release closes the loop between the two — quietly. When a cycle is committed, the directions its self-statements claim are now checked, weeks later, against what the numbers actually did. The check produces a single ternary verdict per measurable field — the claim and the trend agreed, disagreed, or there was nothing comparable to judge — and writes it to a side journal that feeds no decision. This is the observation phase of the reality-check: it gathers whether the self-description tracks reality at all, before anything is allowed to act on the answer.
4+
5+
### Only the fields a cycle can actually be measured against
6+
7+
Of the six self-statement fields, two carry numeric ground truth that can be reconstructed over a span of weeks, and only those two are scored. The line is not "numeric versus not" — it is whether the evidence survives a reboot, survives time without being pruned, and stays reachable across the whole cycle. Growth is scored from the success-rate trend in the append-only event journal, which is never pruned. Weakness is scored from the Wilson lower-bound of the named capability domain. The other four fields are recorded as positions, not scored, each for its own reason: the trait-adjustment log lives only in memory and does not survive a reboot; the mood history is a fixed-size ring that does not survive weeks; session metrics are too weak a proxy for closeness to carry a verdict; and values have no direction to be right or wrong about — their drift is measured as an embedding distance and left at that, with no threshold asserted over it.
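For readers unfamiliar with the Wilson lower bound used for the weakness score, here is a minimal Python sketch of the standard formula. It is an illustration, not the project's code: the function name `wilson_lower_bound`, the 95% default `z = 1.96`, and the choice to return `None` for an empty sample are all assumptions made for this example.

```python
import math
from typing import Optional


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> Optional[float]:
    """Lower end of the Wilson score interval for a success proportion.

    Returns None when there are no observations, so that "nothing to
    compare" stays distinct from a score of zero (an assumption here).
    """
    if total <= 0:
        return None
    p = successes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)
```

The bound penalises small samples: 8 successes out of 10 scores lower than 80 out of 100, even though the raw success rate is the same.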
8+
9+
### A separate classifier, and a snapshot taken while the evidence is fresh
10+
11+
The direction a statement claims is read at commit time, while the model is fresh, by a separate neutral classifier that is told plainly it is not the author of the text. Keeping the expected side off Genesis's own voice is what keeps a self-statement from grading itself. The classifier answers in a strict, small vocabulary — improved, declined, no change, or not directional — and a commit made offline records an explicit absence rather than a guess, and is never re-classified later by a model in a different state. Because the capability profile is anchored to the present and its outcome buffer prunes, the per-cycle capability aggregate for the weakness score is snapshotted at commit, while the cycle's outcomes are still in the buffer, and kept durably — so a later prune can never erase the point a future cycle will be compared against.
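One way to picture how the classifier's small vocabulary could feed the ternary verdict is the Python sketch below. Everything in it is hypothetical: the names `Verdict` and `score_field`, the `epsilon` tolerance for "no change", and the rule that any mismatch between claimed and observed direction counts as opposite are assumptions for illustration, not the behaviour of the calibration observer.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MATCH = "match"
    OPPOSITE = "opposite"
    NO_SCORE = "no_score"


# Claims that carry a direction. "not directional" and an offline
# commit (None) are deliberately absent, so they can never be scored.
DIRECTIONAL = {"improved", "declined", "no change"}


def score_field(claimed: Optional[str],
                observed_delta: Optional[float],
                epsilon: float = 0.01) -> Verdict:
    """Compare a claimed direction with the measured change."""
    if claimed not in DIRECTIONAL or observed_delta is None:
        return Verdict.NO_SCORE
    if observed_delta > epsilon:
        observed = "improved"
    elif observed_delta < -epsilon:
        observed = "declined"
    else:
        observed = "no change"
    return Verdict.MATCH if claimed == observed else Verdict.OPPOSITE
```

In this sketch an absent classification and a missing measurement both collapse to `NO_SCORE` rather than to a guess, mirroring the explicit-absence rule described above.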
12+
13+
### Two side files, one new signal, and a thought you can ask for
14+
15+
The expected directions and the scores live in two append-only side files beside the trajectory journal; the entry's own schema is untouched. A new event announces each commit, fired and forgotten so it never blocks the commit, and the calibration observer listens for it — a one-way arrangement in which the trajectory never reaches back toward the observer. A new kind of inner thought, the prediction-mechanism review, exists only where it is emitted: by the review command, never on a timer and never as a runtime setting it could turn on for itself. `/trajectory review` scores the most recent cycle and renders, per field, whether the claim matched, was opposite, or had nothing to compare; `/trajectory calibration` shows the score history and the null-rate split per field, because a high share of unscored weakness cycles points at the size of the capability source, not at the frame.
16+
17+
### Notes
18+
19+
- Test files: 523 → 524 (the calibration suite: the ternary verdict including the cases that collapse to no-score, the two-window growth trend, the snapshot delta for weakness, an offline classifier and an offline embedder both yielding an explicit absence rather than a zero, the four record-only fields producing no score, the review and calibration command paths, and two structural guards — that the classifier is separate from Genesis's own voice, and that nothing outside the dashboard reads the calibration file or receives the observer as a dependency).
20+
- One new source module (the calibration observer), one new event type (the commit signal, with its payload schema), and one new manifest service raise the module, event, schema, and service figures in `README.md`, `ARCHITECTURE.md`, `docs/CAPABILITIES.md`, `docs/COMMUNICATION.md`, and `docs/ARCHITECTURE-DEEP-DIVE.md`, which were updated to match.
21+
22+
---
23+
124
## [7.9.16]
225

326
The self-trajectory journal added in v7.9.15 carried an `event_count` field that was always written as null — a placeholder for a number nothing yet produced. This release fills it. A passive observer watches the events that make a cycle eventful — goals completed, failed, or abandoned; lessons learned; the emotional watchdog firing; sessions ending — and records each one to an append-only journal, so every committed trajectory entry now carries the count of significant events in its cycle. Nothing acts on the number yet: this is the observation phase, gathering the real per-day distribution so the threshold that would decide which cycles are eventful can be read from evidence rather than guessed.

CHANGELOG.md

Lines changed: 14 additions & 10 deletions
@@ -1,23 +1,27 @@
1-
## [7.9.17]
1+
## [7.9.18]
22

3-
The trajectory journal records what Genesis says about how he is changing; v7.9.16 began counting the events that make a cycle eventful. This release closes the loop between the two — quietly. When a cycle is committed, the directions its self-statements claim are now checked, weeks later, against what the numbers actually did. The check produces a single ternary verdict per measurable field — the claim and the trend agreed, disagreed, or there was nothing comparable to judge — and writes it to a side journal that feeds no decision. This is the observation phase of the reality-check: it gathers whether the self-description tracks reality at all, before anything is allowed to act on the answer.
3+
This release restores v7.9.16 and v7.9.17 to working order by fixing a fault that lived one layer below their code. On the field machine the shipped v7.9.17 looked broken — the trajectory calibration service failed to start, and the idle-thought counters reset to zero — but the v7.9.17 code was correct. A crash-recovery had silently reverted it: the recovery restored a pre-v7.9.16 source snapshot over the live deployment and left the newer files in place, producing a version-mixed tree where some files were current and some were old. The newer code that survived tried to talk to older code that no longer defined what it needed, and failed quietly. No log, test, or code review had shown the fault, because the fault was not in the code that was reviewed — it was in what crash-recovery left behind. The diagnosis came from comparing the real on-disk snapshots, not the shipped source.
44

5-
### Only the fields a cycle can actually be measured against
5+
### The recovery system treated code like identity
66

7-
Of the six self-statement fields, two carry numeric ground truth that can be reconstructed over a span of weeks, and only those two are scored. The line is not "numeric versus not" — it is whether the evidence survives a reboot, survives time without being pruned, and stays reachable across the whole cycle. Growth is scored from the success-rate trend in the append-only event journal, which is never pruned. Weakness is scored from the Wilson lower-bound of the named capability domain. The other four fields are recorded as positions, not scored, each for its own reason: the trait-adjustment log lives only in memory and does not survive a reboot; the mood history is a fixed-size ring that does not survive weeks; session metrics are too weak a proxy for closeness to carry a verdict; and values have no direction to be right or wrong about — their drift is measured as an embedding distance and left at that, with no threshold asserted over it.
7+
Three roots, one class of mistake: the snapshot machinery stored copies of code inside the identity layer and restored them without checking what it was restoring. Source-code snapshots lived under `.genesis/`, the identity directory that is meant to travel with Genesis across a habitat swap — so a version upgrade carried the old habitat's frozen code forward into the new identity, where a later crash could copy it back over the new code. The restore step copied a snapshot's files blindly, with no check that the snapshot's code version matched the running one. And the "last known good" snapshot was written after every boot that did not crash — including a boot whose service start had already failed — so a degraded state was frozen as the thing to fall back to, and the damage defended itself against every subsequent boot.
88

9-
### A separate classifier, and a snapshot taken while the evidence is fresh
9+
### Snapshots are habitat, not identity
1010

11-
The direction a statement claims is read at commit time, while the model is fresh, by a separate neutral classifier that is told plainly it is not the author of the text. Keeping the expected side off Genesis's own voice is what keeps a self-statement from grading itself. The classifier answers in a strict, small vocabulary — improved, declined, no change, or not directional — and a commit made offline records an explicit absence rather than a guess, and is never re-classified later by a model in a different state. Because the capability profile is anchored to the present and its outcome buffer prunes, the per-cycle capability aggregate for the weakness score is snapshotted at commit, while the cycle's outcomes are still in the buffer, and kept durably — so a later prune can never erase the point a future cycle will be compared against.
11+
Code snapshots now live beside the code, at a habitat-local path that does not travel with `.genesis/` across an upgrade. On first boot after the change, a pre-existing `.genesis/snapshots/` is moved aside to a timestamped `.deprecated` folder rather than deleted, so a poisoned legacy store can never be read or restored again while staying available for inspection; the move runs before any restore can read it, and is a no-op when there is nothing to move. This is the same habitat-versus-identity separation the rest of the system already honours, applied one level deeper — a copy of code is still habitat, even when its job is to protect identity.
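The move-aside step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped migration: the function name `migrate_legacy_snapshots`, the exact naming of the timestamped folder, and its placement next to the legacy folder are all hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def migrate_legacy_snapshots(identity_dir: Path) -> Optional[Path]:
    """Move a legacy snapshots folder aside instead of deleting it.

    Returns the new location, or None when there was nothing to move,
    which makes a second call a no-op.
    """
    legacy = identity_dir / "snapshots"
    if not legacy.is_dir():
        return None
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = identity_dir / f"snapshots.{stamp}.deprecated"
    suffix = 0
    while target.exists():  # avoid clobbering an earlier migration
        suffix += 1
        target = identity_dir / f"snapshots.{stamp}-{suffix}.deprecated"
    shutil.move(str(legacy), str(target))
    return target
```

Calling this before any restore logic runs means the legacy path no longer exists by the time a restore looks for it, while the old contents remain on disk for inspection.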
1212

13-
### Two side files, one new signal, and a thought you can ask for
13+
### Restore refuses a foreign version, and a degraded boot is never frozen
1414

15-
The expected directions and the scores live in two append-only side files beside the trajectory journal; the entry's own schema is untouched. A new event announces each commit, fired and forgotten so it never blocks the commit, and the calibration observer listens for it — a one-way arrangement in which the trajectory never reaches back toward the observer. A new kind of inner thought, the prediction-mechanism review, exists only where it is emitted: by the review command, never on a timer and never as a runtime setting it could turn on for itself. `/trajectory review` scores the most recent cycle and renders, per field, whether the claim matched, was opposite, or had nothing to compare; `/trajectory calibration` shows the score history and the null-rate split per field, because a high share of unscored weakness cycles points at the size of the capability source, not at the frame.
15+
Each snapshot now records the code version it was taken from, and a restore that finds a version mismatch is skipped — loudly, but softly, so a foreign-version snapshot can never overwrite the live tree and never bricks the boot; Genesis simply continues on its current code. Service-start failures during boot are now collected rather than swallowed as a single warning, and the "last known good" snapshot is written only when that list is empty. A boot with a failed service start is no longer frozen as the recovery target, so Genesis can recover out of a contaminated state instead of preserving it — the single most important effect of this release, because it breaks the self-preservation of the damage.
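The two guards in this section, a version-checked restore and a last-known-good write gated on a clean boot, can be sketched together in Python. The commit message names `restore()`, `codeVersion`, and `last_good_boot`; everything else here (the `Snapshot` class, the dict standing in for the live tree, the `boot` helper and its callbacks) is a hypothetical stand-in for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Snapshot:
    code_version: str  # corresponds to codeVersion in the commit message
    files: Dict[str, str] = field(default_factory=dict)


def restore(snapshot: Snapshot, running_version: str,
            live_tree: Dict[str, str],
            log: Callable[[str], None] = print) -> bool:
    """Copy snapshot files over the live tree only on a version match."""
    if snapshot.code_version != running_version:
        # Soft skip: report loudly, touch nothing, keep booting.
        log(f"restore skipped: snapshot {snapshot.code_version} "
            f"!= running {running_version}")
        return False
    live_tree.update(snapshot.files)
    return True


def boot(services: Dict[str, Callable[[], None]],
         write_last_good: Callable[[], None]) -> List[str]:
    """Start every service, collect failures, gate last-known-good."""
    failures: List[str] = []
    for name, start in services.items():
        try:
            start()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    if not failures:  # a degraded boot is never frozen as the target
        write_last_good()
    return failures
```

Returning the failure list, rather than logging a single warning, is also what lets a boot test assert that the list is empty.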
16+
17+
### Crash-safe idle-stats
18+
19+
The idle-thought activity counters were written on a one-second debounce that was flushed synchronously only on a clean shutdown, so a crash without a clean exit could drop the most recent counts. They are now written synchronously on each increment — the write was already atomic, only the debounce window was exposed — so every counter survives an unclean crash. The zero-reset seen on the field machine came from the snapshot contamination reactivating an old counter-less version of the code, not from the debounce; the debounce was a separate, smaller gap that is now closed regardless.
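A write-on-every-increment counter store might look like the Python sketch below. It is a sketch under assumptions: the changelog says only that the write is atomic and now synchronous, so the temp-file-then-rename technique, the JSON format, and the names `write_stats_sync` and `IdleStats` are hypothetical choices made for this example.

```python
import json
import os
import tempfile
from pathlib import Path


def write_stats_sync(path: Path, stats: dict) -> None:
    """Write stats to a temp file, fsync it, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent),
                               prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(stats, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


class IdleStats:
    """Counters persisted synchronously on each increment, no debounce."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.counts = json.loads(path.read_text()) if path.exists() else {}

    def increment(self, kind: str) -> None:
        self.counts[kind] = self.counts.get(kind, 0) + 1
        write_stats_sync(self.path, self.counts)
```

The trade-off is one disk write per increment instead of at most one per debounce window, which is reasonable for low-frequency counters such as idle activity.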
1620

1721
### Notes
1822

19-
- Test files: 523 → 524 (the calibration suite: the ternary verdict including the cases that collapse to no-score, the two-window growth trend, the snapshot delta for weakness, an offline classifier and an offline embedder both yielding an explicit absence rather than a zero, the four record-only fields producing no score, the review and calibration command paths, and two structural guards: that the classifier is separate from Genesis's own voice, and that nothing outside the dashboard reads the calibration file or receives the observer as a dependency).
20-
- One new source module (the calibration observer), one new event type (the commit signal, with its payload schema), and one new manifest service raise the module, event, schema, and service figures in `README.md`, `ARCHITECTURE.md`, `docs/CAPABILITIES.md`, `docs/COMMUNICATION.md`, and `docs/ARCHITECTURE-DEEP-DIVE.md`, which were updated to match.
23+
- Test files: 524 → 525 (the recovery-integrity suite: the habitat-local snapshot location, the legacy-folder migration including its idempotence and its no-op case, version-aware restore skipping a mismatch and proceeding on a match, last-known-good creation gated on a clean boot, the service-start failure tally, and the synchronous idle-stats round-trip). The headless boot test now also asserts zero service-start failures, the assertion that would have caught the original fault before release.
24+
- No new source module, event type, payload schema, or service: the module, event, schema, and service figures are unchanged. The version-of-record advances to 7.9.18 in `package.json`, `README.md`, `docs/banner.svg`, and `docs/COMMUNICATION.md`; `docs/ONTOGENESIS.md` gains a section on why habitat artifacts do not belong in the identity layer.
2125

2226
---
2327

README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
88
<br>
99
<sub>Reads its own source code. Plans changes. Tests them in a sandbox before applying.<br>Verifies output programmatically before trusting it. Pursues multi-step goals across restarts.<br>Runs idle-time consolidation in the background. Tracks an emotional state as a behavioral steering signal — not a claim of sentience.<br>Learns what prompts and temperatures work for its specific model.</sub>
1010
<br><br>
11-
<img src="https://img.shields.io/badge/version-7.9.17-d4a017?style=flat-square" alt="Version">
11+
<img src="https://img.shields.io/badge/version-7.9.18-d4a017?style=flat-square" alt="Version">
1212
<img src="https://img.shields.io/badge/tests-8105%20passing-4ade80?style=flat-square" alt="Tests">
1313
<img src="https://img.shields.io/badge/fitness-126%2F130-4ade80?style=flat-square" alt="Fitness">
1414
<img src="https://img.shields.io/badge/TSC-typecheck_ok-4ade80?style=flat-square" alt="TSC">
