# Convergence Mechanics: How the Loop Compounds Instead of Drifting

The $evolve loop only compounds when each cycle reads prior cycles' outcomes and lets them change behavior. Append-only ledgers that no step reads are write-only artifacts: they accumulate without compounding.

This reference documents the four feedback mechanisms that turn raw cycle output into next-cycle behavior change.

## Mechanism 1: Step 0 reads prior-failure surface

In Step 0 (Setup), after `mkdir -p .agents/evolve`, the loop reads the last 3 entries of `cycle-history.jsonl`. For any entry whose `gate` field contains a FAIL marker, it extracts the failure surface (e.g. "registry-check stale", "bats-tests goals-validate") and injects the matching learning before work selection.

```bash
last3=$(scripts/evolve-read-cycle-history.sh recent 3)  # routes through BC3 LoopReaderPort (soc-y5vh.4)
# `// ""` guards rows without a gate field so test() never receives null.
fail_surfaces=$(printf '%s\n' "$last3" | jq -r 'select((.gate // "") | test("FAIL|BLOCKED")) | .gate' 2>/dev/null)
if [ -n "$fail_surfaces" ]; then
  # Map each failure surface to known keywords, then print matching learnings.
  keywords=$(printf '%s\n' "$fail_surfaces" | grep -oE 'registry|bats|markdown|supergate|canary|coverage|toolchain' | sort -u)
  for kw in $keywords; do
    # Prefer `ao lookup`; if it fails, fall back to learnings modified in the last 30 days.
    ao lookup --query "$kw failure" --limit 2 2>/dev/null || \
      find .agents/learnings -name "*$kw*.md" -mtime -30 | head -2
  done
fi
```

Without this, the 2026-05-07 CI-toil learning sat unread for 7 days while 5 cycles re-hit the same `registry.json` non-determinism. Reading the learning at Step 0 would have surfaced the `git ls-files` fix on cycle 45.

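The failure-surface scan can be exercised offline. The sketch below assumes each `cycle-history.jsonl` row is a single-line JSON object with a string `gate` field, and it uses `grep` instead of `jq` only so it runs without extra tools. `fail_keywords` is a hypothetical helper, not part of the $evolve scripts.

```bash
# Hypothetical helper illustrating Step 0's failure-surface scan.
# Assumption: each history row is one-line JSON with a string "gate" field.
# Production code should parse with jq; grep is used here only for portability.

# Keyword vocabulary copied from the Step 0 snippet.
SURFACE_RE='registry|bats|markdown|supergate|canary|coverage|toolchain'

# fail_keywords: read JSONL on stdin, keep only the last 3 rows, and print the
# unique surface keywords named by any gate containing FAIL or BLOCKED.
fail_keywords() {
  tail -n 3 \
    | grep -oE '"gate" *: *"[^"]*"' \
    | grep -E 'FAIL|BLOCKED' \
    | grep -oE "$SURFACE_RE" \
    | sort -u
}

# Example: the canary failure in cycle 44 falls outside the 3-row window,
# so only "bats" and "registry" are printed.
printf '%s\n' \
  '{"cycle":44,"gate":"FAIL: canary"}' \
  '{"cycle":45,"gate":"FAIL: registry-check stale"}' \
  '{"cycle":46,"gate":"FAILED bats-tests goals-validate"}' \
  '{"cycle":47,"gate":"PASS"}' \
  | fail_keywords
```

Note that "goals-validate" yields no keyword: only surfaces in the fixed vocabulary can route to a learning, so a new failure class needs its keyword added before Step 0 can surface it.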
## Mechanism 2: Healing-first classifier

Before measuring fitness and selecting work, the loop classifies the cycle:

```bash
# Healing-first classifier: routes through BC2 CIStatusPort
# (cli/cmd/ao/ci_status_adapter.go, productionCIStatus) per soc-y5vh.2.
# No inline gh shell-outs.
last_ci=$(ao ci recent --limit 1 2>/dev/null | jq -r '.Conclusion // empty')
if [ "$last_ci" = "failure" ]; then
  CYCLE_MODE="restorative"
  # Read failure surface, search for matching learning (see Mechanism 1).
  # Selection ladder downgrade: only allow harvested items typed
  # bug/fix/ci-failure.
else
  CYCLE_MODE="feature"
fi
```

Restorative cycles ONLY take work that reduces CI red. New PG4 promotions, feature additions, and doc growth are all blocked until `last_ci=success`.

This eliminates the pattern of adding new evidence files onto a CI-red base.

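The classifier's effect on work selection comes down to two decisions: which mode the cycle runs in, and whether the selection ladder may take a given item. A minimal sketch, assuming the CI conclusion string and each harvested item's type are already in hand; `cycle_mode` and `admits_item` are hypothetical names, not `ao` subcommands.

```bash
# cycle_mode CONCLUSION: prints "restorative" when the last CI run failed,
# otherwise "feature". Like the classifier above, an empty or unknown
# conclusion falls through to "feature".
cycle_mode() {
  if [ "$1" = "failure" ]; then
    echo restorative
  else
    echo feature
  fi
}

# admits_item MODE ITEM_TYPE: succeeds (exit 0) when the selection ladder may
# take an item of ITEM_TYPE in MODE. Restorative cycles accept only
# bug/fix/ci-failure items; feature cycles accept anything.
admits_item() {
  case "$1" in
    restorative)
      case "$2" in
        bug|fix|ci-failure) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    *) return 0 ;;
  esac
}

# Example: after a red run, only the ci-failure item is taken.
mode=$(cycle_mode failure)
for item_type in ci-failure feature docs; do
  if admits_item "$mode" "$item_type"; then
    echo "$mode: take $item_type"
  else
    echo "$mode: defer $item_type"
  fi
done
```

Treating an unknown conclusion as `feature` mirrors the original snippet. A stricter loop could treat a missing CI status as restorative instead, trading some throughput for never building on an unverified base.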
## Mechanism 3: Hypothesis tracking for skill changes

When a cycle edits `skills/evolve/SKILL.md` (or `skills-codex/evolve/SKILL.md`),
it MUST append to the hypothesis ledger through the typed BC3
`HypothesisLedgerPort` (soc-y5vh.8):

```bash
# N is the current cycle number; the check is scheduled 15 cycles later.
ao loop hypothesis append --id "H${N}.<patch>" --cycle-landed "$N" --check-at-cycle "$((N + 15))" \
  --patch "<one-line>" --hypothesis "<expected effect>" --measure "<how to verify>"
```

This routes through `productionHypothesisLedger` instead of a raw append to
`.agents/evolve/hypotheses.jsonl`; the port rejects empty and duplicate IDs.
At `check_at_cycle`, the loop reads the ledger with `ao loop hypothesis list`
(one JSON record per line), evaluates each PENDING row's `measure`, and
writes the verdict (VERIFIED / FALSIFIED). Falsified hypotheses are
revisited: either the patch is wrong, or the measurement was wrong.

The `ao loop hypothesis` subcommands are runtime-agnostic: the same `ao`
binary serves Claude Code and Codex, and only the surrounding loop driver differs.

Without this, skill-edit patches land unmeasured and silently inert: text in SKILL.md with no harness automation behind it.

| 71 | + |
| 72 | +## Mechanism 4: Convergence criteria with a STOP |
| 73 | + |
| 74 | +`.agents/evolve/session-convergence.json` records the terminal state; the STOP |
| 75 | +decision is evaluated through the typed BC3 `ConvergenceCheckPort` (soc-y5vh.8): |
| 76 | + |
| 77 | +```bash |
| 78 | +ao loop converged --green-streak "$STREAK" --unconsumed-high-medium "$HM" --fitness-baseline |
| 79 | +# emits {converged, ci_green_streak, unconsumed_high_medium, fitness_baseline_captured, reasons} |
| 80 | +``` |
| 81 | + |
| 82 | +The predicate is pure — the loop supplies the evidence it already has |
| 83 | +(`ao ci recent` for the streak, the next-work findings count, the |
| 84 | +fitness-baseline flag). The criteria are met when all hold: |
| 85 | + |
| 86 | +- CI Validate green for the last 3 pushes (green streak ≥ 3) |
| 87 | +- HIGH+MEDIUM unconsumed next-work entries ≤ 1 |
| 88 | +- a fitness baseline has been captured |
| 89 | + |
| 90 | +When `ao loop converged` reports `converged: true`, the loop emits a teardown |
| 91 | +report and breaks the Step 7 loop — it does NOT re-enter Step 1. The |
| 92 | +autonomous loop is bounded by criteria, not by cycle count. `reasons` names |
| 93 | +every unmet criterion when `converged` is false. |
| 94 | + |
| 95 | +> Harness note: in Codex the loop is the Step 7 `while` loop, so convergence |
| 96 | +> means breaking that loop into Teardown. In the Claude Code harness the dual |
| 97 | +> mechanism is an end-of-turn `ScheduleWakeup` that simply is not re-armed. |
| 98 | +> Same intent — a criteria-bounded STOP — different harness primitive. |
| 99 | +
|
| 100 | +Without an explicit STOP, the loop drifts indefinitely. With STOP, it converges. |
| 101 | + |
## Anti-drift rules

1. **Restorative-only after red.** If cycle N's `gate` field contains FAIL, cycle N+1 is restorative.
2. **3 consecutive restorative cycles without restoration → escalate.** Don't silently grind.
3. **Scope shift resets the streak.** If the operator broadens the convergence target mid-session, reset the `ci-green-streak` counter to 0.

## Why this is the load-bearing change

A loop can write ~30 KB of bookkeeping per arc (cycle-history, learning, retro, evidence, hypotheses) and still produce ~0 compounded behavior, with every cycle re-deriving a lesson an earlier cycle should have surfaced. The compounding lives in the read path, not the write path. These four mechanisms make the read path real.