2121**Forge Core with first-class drivers for [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and Codex/manual workflows.**
2222
2323[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
24- [![Version](https://img.shields.io/badge/version-0.4.2-green.svg)](CHANGELOG.md)
24+ [![Version](https://img.shields.io/badge/version-0.5.0-green.svg)](CHANGELOG.md)
2525
26- Forge is a protocol plus an adapter. The protocol defines KPI tracking, state, strategy rotation, evaluation cadence, and completion rules. The bundled adapter makes that protocol run inside Claude Code with commands, agents, and a stop hook.
26+ Forge is a protocol plus adapters. The protocol defines task success, KPI guardrails, state, strategy rotation, evaluation cadence, and completion rules. The bundled drivers make that protocol run inside Claude Code and Codex/manual workflows.
2727
2828```
29- You: /forge "API controllers" --coverage 90 --speed -30%
29+ You: /forge "password reset flow" --done-when "users can request and complete a reset end-to-end" --coverage 90 --speed -30%
3030
3131Forge: Measuring baseline... 85.2% coverage, 120s
32+ Success contract: password reset works end-to-end
3233 Strategy: coverage-push → 15 tests for edge cases
3334 85.8% (+0.6%), 118s (-2s) ✓
34- ...iterates until all targets met simultaneously...
35+ ...iterates until task success and KPI targets are both satisfied...
3536```
3637
3738---
@@ -43,6 +44,7 @@ Forge: Measuring baseline... 85.2% coverage, 120s
4344The portable part of the system:
4445
4546- iteration protocol (Orient → Measure → Evaluate → Decide → Execute → Verify → Record → Complete)
47+ - task-driven success contract with optional explicit `done_when`
4648- state format and autoregressive memory
4749- KPI targets (coverage, speed, quality)
4850- strategy selection and stagnation handling
@@ -71,7 +73,7 @@ The bundled Codex/manual adapter in this repo:
7173- `.codex/forge/` state layout for per-project sessions
7274- shared shell state helpers reused across drivers
7375
74- Both drivers are first-class in `v0.4.2`. The difference is automation depth:
76+ Both drivers are first-class in `v0.5.0`. The difference is automation depth:
7577Claude gets hook-driven iteration; Codex gets manual driver scripts that print
7678the next prompt and manage session state.
7779
@@ -83,7 +85,7 @@ the next prompt and manage session state.
8385| Codex CLI | First-class manual driver | Install script, `forge-init`, `forge-continue`, `forge-cancel`, project-local state |
8486| Other agents / plain shell | Protocol-only | Reuse the protocol and state model manually |
8587
86- Forge is not claiming native parity across agent runtimes. `v0.4.2` ships two real drivers with different control surfaces.
88+ Forge is not claiming native parity across agent runtimes. `v0.5.0` ships two real drivers with different control surfaces.
8789
8890---
8991
@@ -105,14 +107,24 @@ Each iteration executes one complete eight-phase cycle:
105107
106108| Phase | What happens |
107109|-------|-------------|
108- | **A. Orient** | Read forge-state file, check position + trends + stagnation count |
110+ | **A. Orient** | Read forge-state file, check task success contract + KPI trends + stagnation count |
109111| **B. Measure** | Run tests with coverage, capture KPIs |
110112| **C. Evaluate** | Every 3rd iteration: spawn fresh-context subagent for unbiased audit |
111113| **D. Decide** | Pick strategy from KPI gaps + findings + lessons |
112114| **E. Execute** | Apply ONE focused transformation |
113115| **F. Verify** | Tests must be green, re-measure KPIs |
114116| **G. Record** | Update forge-state with deltas + lessons (the autoregressive step) |
115- | **H. Complete** | All targets met simultaneously? Done. Otherwise, next iteration. |
117+ | **H. Complete** | Task success contract satisfied and KPI targets met? Done. Otherwise, next iteration. |
118+
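The phase table can be read as a fixed ordering in which only the Evaluate phase is conditional. The following is a minimal sketch of that reading, not Forge's actual implementation: the `Phase` enum, `EVALUATION_INTERVAL`, and `phases_for_iteration` are hypothetical names, and it assumes iterations are counted from 1 so that the audit fires on iterations 3, 6, 9, and so on.

```python
from enum import Enum


class Phase(Enum):
    """The eight phases from the table, in execution order (names are illustrative)."""
    ORIENT = "A"
    MEASURE = "B"
    EVALUATE = "C"
    DECIDE = "D"
    EXECUTE = "E"
    VERIFY = "F"
    RECORD = "G"
    COMPLETE = "H"


# "Every 3rd iteration" per the phase table.
EVALUATION_INTERVAL = 3


def phases_for_iteration(iteration: int) -> list[Phase]:
    """Return the phases that run in a given 1-based iteration.

    Assumption: the fresh-context audit (Evaluate) is skipped entirely
    on iterations that are not a multiple of EVALUATION_INTERVAL.
    """
    if iteration < 1:
        raise ValueError("iterations are assumed to be 1-based")
    audit_due = iteration % EVALUATION_INTERVAL == 0
    return [p for p in Phase if p is not Phase.EVALUATE or audit_due]
```

Under these assumptions, `phases_for_iteration(1)` omits `Phase.EVALUATE`, while `phases_for_iteration(3)` includes all eight phases.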
119+ ### Success Contract
120+
121+ Forge is built for open-text work, not just KPI chasing.
122+
123+ - The task scope is the primary objective.
124+ - `--done-when "TEXT"` is an optional explicit success override.
125+ - If `--done-when` is omitted, Forge derives concrete completion checks from the task scope and records them in Forge state.
126+ - Coverage, speed, and quality stay as guardrails alongside the task itself.
127+ - Completion means both the task and the guardrails are satisfied.
116128
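The completion rule in the list above (task and guardrails both satisfied) can be illustrated with a small gate function. This is a hedged sketch under stated assumptions, not the shipped logic: `Snapshot`, `Targets`, `speed_target`, and `is_complete` are hypothetical names, the quality guardrail is left out for brevity, and an empty set of completion checks is treated as "not done".

```python
from dataclasses import dataclass


@dataclass
class Targets:
    coverage: float        # minimum coverage %, e.g. from --coverage 90
    speed_seconds: float   # maximum allowed run time in seconds


@dataclass
class Snapshot:
    checks_passed: dict[str, bool]  # completion check -> satisfied?
    coverage: float
    speed_seconds: float


def speed_target(baseline_seconds: float, reduction_pct: float) -> float:
    """Convert a relative target such as --speed -30% into absolute seconds."""
    return baseline_seconds * (100.0 - reduction_pct) / 100.0


def is_complete(snapshot: Snapshot, targets: Targets) -> bool:
    """True only when the task contract AND the KPI guardrails both hold."""
    # Assumption: no recorded checks means the task is not yet proven done.
    task_done = bool(snapshot.checks_passed) and all(snapshot.checks_passed.values())
    guardrails_ok = (
        snapshot.coverage >= targets.coverage
        and snapshot.speed_seconds <= targets.speed_seconds
    )
    return task_done and guardrails_ok
```

With the numbers from the opening example (baseline 120s, `--coverage 90 --speed -30%`), `speed_target(120.0, 30.0)` yields a ceiling of 84 seconds, so the first iteration's 85.8% coverage and 118s would not pass the gate even if every completion check were satisfied.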
117129### Strategies
118130
@@ -183,7 +195,7 @@ Codex support is manual by design, but it is now a real shipped driver.
183195
184196Typical flow:
185197
186- 1. Run `forge-init "scope" ...` in the target project.
198+ 1. Run `forge-init "scope" [--done-when "TEXT"] ...` in the target project.
1871992. Paste the printed prompt into Codex.
1882003. After each iteration, run `forge-continue` to print the next prompt.
1892014. Use `forge-status` to inspect the active session.
@@ -209,15 +221,22 @@ Driver safety:
209221/forge "LiveView components" --coverage 95 --speed -20%
210222```
211223
224+ #### Open-text task with explicit success
225+
226+ ```
227+ /forge "password reset flow" --done-when "users can request, receive, and complete a reset end-to-end" --coverage 90 --quality strict
228+ ```
229+
212230#### All options
213231
214232```
215- /forge "SCOPE" --coverage N --speed -N% --quality strict|moderate|lax --max-iterations N
233+ /forge "SCOPE" [--done-when "TEXT"] --coverage N --speed -N% --quality strict|moderate|lax --max-iterations N
216234```
217235
218236| Option | Default | Description |
219237|--------|---------|-------------|
220238| `SCOPE` | (required) | What to improve, as a quoted string |
239+ | `--done-when "TEXT"` | task-derived | Explicit success contract. If omitted, derive completion checks from the task itself |
221240| `--coverage N` | baseline + 2 | Minimum coverage % target |
222241| `--speed -N%` | -20% | Speed reduction from baseline |
223242| ` --quality ` | moderate | strict (0 high, 0 med) / moderate (0 high, ≤3 med) / lax (0 high, ≤5 med) |
@@ -253,6 +272,13 @@ Other runtimes can reuse the same format in a different state root. Each iterati
253272---
254273session_id: "0320-1430-a3b2"
255274scope: "API controllers"
275+ success:
276+   mode: "task-derived"
277+   task: "API controllers"
278+   done_when: null
279+   completion_checks:
280+     - "controller edge cases covered and passing"
281+     - "no controller path regresses current behavior"
256282baseline:
257283  coverage: 85.2
258284  speed_seconds: 120
@@ -347,7 +373,7 @@ Distilled from studying autoresearch, Ralph Wiggum, pi-autoresearch, SICA, and a
347373| Strategy | Single prompt | 8 named strategies, auto-rotation on stagnation |
348374| Evaluation | Self-evaluation (anchoring bias) | Fresh-context audits every 3 iterations |
349375| Memory | Context window only | Persistent state file survives compaction |
350- | Completion | Manual / hope | Exact completion marker after protocol checks |
376+ | Completion | Manual / hope | Exact completion marker after task success plus protocol checks |
351377| Lessons | Lost between iterations | Accumulated, inform strategy selection |
352378| Stagnation | Repeats same approach | Detects + rotates after low-delta iterations |
353379| Portability | Rebuild per runtime | Portable protocol, Claude and Codex drivers bundled |
@@ -357,7 +383,7 @@ Distilled from studying autoresearch, Ralph Wiggum, pi-autoresearch, SICA, and a
357383## Claims We Are Willing To Make
358384
359385- Forge packages proven loop patterns into a reusable protocol with first-class Claude Code and Codex/manual drivers.
360- - Forge improves repeatability versus ad-hoc prompting when you care about KPI targets, iteration memory, and strategy rotation.
386+ - Forge improves repeatability versus ad-hoc prompting when you care about task success, KPI guardrails, iteration memory, and strategy rotation.
361387- Forge does **not** yet provide universal runtime adapter parity beyond the shipped drivers.
362388- Forge is more preconfigured than raw hooks. It is not a new primitive.
363389