DjinnFoundry
diff --git a/‎CHANGELOG.md‎
Lines changed: 11 additions & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 11 additions & 0 deletions
diff --git a/‎README.md‎
Lines changed: 38 additions & 12 deletions b/‎README.md‎
Lines changed: 38 additions & 12 deletions
diff --git a/‎VERSION‎
Lines changed: 1 addition & 1 deletion b/‎VERSION‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎commands/forge-status.md‎
Lines changed: 3 additions & 0 deletions b/‎commands/forge-status.md‎
Lines changed: 3 additions & 0 deletions
diff --git a/‎commands/forge.md‎
Lines changed: 18 additions & 12 deletions b/‎commands/forge.md‎
Lines changed: 18 additions & 12 deletions
diff --git a/‎drivers/codex/README.md‎
Lines changed: 16 additions & 0 deletions b/‎drivers/codex/README.md‎
Lines changed: 16 additions & 0 deletions
diff --git a/‎drivers/codex/bin/forge-continue‎
Lines changed: 5 additions & 4 deletions b/‎drivers/codex/bin/forge-continue‎
Lines changed: 5 additions & 4 deletions
diff --git a/‎drivers/codex/bin/forge-init‎
Lines changed: 22 additions & 3 deletions b/‎drivers/codex/bin/forge-init‎
Lines changed: 22 additions & 3 deletions
@@ -5,6 +5,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.0] - 2026-03-20
+
+### Added
+- Optional explicit `--done-when "TEXT"` success contract for Forge task scopes
+
+### Changed
+- Forge now treats open-text task completion as the primary objective and KPI targets as guardrails
+- Codex driver status and prompt rendering now surface success mode and `done_when` text
+- README and driver docs now describe task-derived completion checks when no explicit success override is provided
+
 ## [0.4.2] - 2026-03-20
 
 ### Added
@@ -99,6 +109,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Multi-language support in MEASURE phase (Elixir, Python, JavaScript, Ruby, Go)
 - Simultaneous multi-KPI completion gate
 
+[0.5.0]: https://github.com/DjinnFoundry/forge-loop/releases/tag/v0.5.0
 [0.4.2]: https://github.com/DjinnFoundry/forge-loop/releases/tag/v0.4.2
 [0.4.1]: https://github.com/DjinnFoundry/forge-loop/releases/tag/v0.4.1
 [0.4.0]: https://github.com/DjinnFoundry/forge-loop/releases/tag/v0.4.0
 
@@ -21,17 +21,18 @@
 **Forge Core with first-class drivers for [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and Codex/manual workflows.**
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
-[![Version](https://img.shields.io/badge/version-0.4.2-green.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.5.0-green.svg)](CHANGELOG.md)
 
-Forge is a protocol plus an adapter. The protocol defines KPI tracking, state, strategy rotation, evaluation cadence, and completion rules. The bundled adapter makes that protocol run inside Claude Code with commands, agents, and a stop hook.
+Forge is a protocol plus adapters. The protocol defines task success, KPI guardrails, state, strategy rotation, evaluation cadence, and completion rules. The bundled drivers make that protocol run inside Claude Code and Codex/manual workflows.
 
 ```
-You: /forge "API controllers" --coverage 90 --speed -30%
+You: /forge "password reset flow" --done-when "users can request and complete a reset end-to-end" --coverage 90 --speed -30%
 
 Forge: Measuring baseline... 85.2% coverage, 120s
+       Success contract: password reset works end-to-end
        Strategy: coverage-push → 15 tests for edge cases
        85.8% (+0.6%), 118s (-2s) ✓
-       ...iterates until all targets met simultaneously...
+       ...iterates until task success and KPI targets are both satisfied...
 ```
 
 ---
@@ -43,6 +44,7 @@ Forge: Measuring baseline... 85.2% coverage, 120s
 The portable part of the system:
 
 - iteration protocol (Orient → Measure → Evaluate → Decide → Execute → Verify → Record → Complete)
+- task-driven success contract with optional explicit `done_when`
 - state format and autoregressive memory
 - KPI targets (coverage, speed, quality)
 - strategy selection and stagnation handling
@@ -71,7 +73,7 @@ The bundled Codex/manual adapter in this repo:
 - `.codex/forge/` state layout for per-project sessions
 - shared shell state helpers reused across drivers
 
-Both drivers are first-class in `v0.4.2`. The difference is automation depth:
+Both drivers are first-class in `v0.5.0`. The difference is automation depth:
 Claude gets hook-driven iteration; Codex gets manual driver scripts that print
 the next prompt and manage session state.
 
@@ -83,7 +85,7 @@ the next prompt and manage session state.
 | Codex CLI | First-class manual driver | Install script, `forge-init`, `forge-continue`, `forge-cancel`, project-local state |
 | Other agents / plain shell | Protocol-only | Reuse the protocol and state model manually |
 
-Forge is not claiming native parity across agent runtimes. `v0.4.2` ships two real drivers with different control surfaces.
+Forge is not claiming native parity across agent runtimes. `v0.5.0` ships two real drivers with different control surfaces.
 
 ---
 
@@ -105,14 +107,24 @@ Each iteration executes one complete eight-phase cycle:
 
 | Phase | What happens |
 |-------|-------------|
-| **A. Orient** | Read forge-state file, check position + trends + stagnation count |
+| **A. Orient** | Read forge-state file, check task success contract + KPI trends + stagnation count |
 | **B. Measure** | Run tests with coverage, capture KPIs |
 | **C. Evaluate** | Every 3rd iteration: spawn fresh-context subagent for unbiased audit |
 | **D. Decide** | Pick strategy from KPI gaps + findings + lessons |
 | **E. Execute** | Apply ONE focused transformation |
 | **F. Verify** | Tests must be green, re-measure KPIs |
 | **G. Record** | Update forge-state with deltas + lessons (the autoregressive step) |
-| **H. Complete** | All targets met simultaneously? Done. Otherwise, next iteration. |
+| **H. Complete** | Task success contract satisfied and KPI targets met? Done. Otherwise, next iteration. |
+
+### Success Contract
+
+Forge is built for open-text work, not just KPI chasing.
+
+- The task scope is the primary objective.
+- `--done-when "TEXT"` is an optional explicit success override.
+- If `--done-when` is omitted, Forge derives concrete completion checks from the task scope and records them in Forge state.
+- Coverage, speed, and quality stay as guardrails alongside the task itself.
+- Completion means both the task and the guardrails are satisfied.
 
 ### Strategies
 
@@ -183,7 +195,7 @@ Codex support is manual by design, but it is now a real shipped driver.
 
 Typical flow:
 
-1. Run `forge-init "scope" ...` in the target project.
+1. Run `forge-init "scope" [--done-when "TEXT"] ...` in the target project.
 2. Paste the printed prompt into Codex.
 3. After each iteration, run `forge-continue` to print the next prompt.
 4. Use `forge-status` to inspect the active session.
@@ -209,15 +221,22 @@ Driver safety:
 /forge "LiveView components" --coverage 95 --speed -20%
 ```
 
+#### Open-text task with explicit success
+
+```
+/forge "password reset flow" --done-when "users can request, receive, and complete a reset end-to-end" --coverage 90 --quality strict
+```
+
 #### All options
 
 ```
-/forge "SCOPE" --coverage N --speed -N% --quality strict|moderate|lax --max-iterations N
+/forge "SCOPE" [--done-when "TEXT"] --coverage N --speed -N% --quality strict|moderate|lax --max-iterations N
 ```
 
 | Option | Default | Description |
 |--------|---------|-------------|
 | `SCOPE` | (required) | What to improve — quoted string |
+| `--done-when "TEXT"` | task-derived | Explicit success contract. If omitted, derive completion checks from the task itself |
 | `--coverage N` | baseline + 2 | Minimum coverage % target |
 | `--speed -N%` | -20% | Speed reduction from baseline |
 | `--quality` | moderate | strict (0 high, 0 med) / moderate (0 high, ≤3 med) / lax (0 high, ≤5 med) |
@@ -253,6 +272,13 @@ Other runtimes can reuse the same format in a different state root. Each iterati
 ---
 session_id: "0320-1430-a3b2"
 scope: "API controllers"
+success:
+  mode: "task-derived"
+  task: "API controllers"
+  done_when: null
+  completion_checks:
+    - "controller edge cases covered and passing"
+    - "no controller path regresses current behavior"
 baseline:
   coverage: 85.2
   speed_seconds: 120
@@ -347,7 +373,7 @@ Distilled from studying autoresearch, Ralph Wiggum, pi-autoresearch, SICA, and a
 | Strategy | Single prompt | 8 named strategies, auto-rotation on stagnation |
 | Evaluation | Self-evaluation (anchoring bias) | Fresh-context audits every 3 iterations |
 | Memory | Context window only | Persistent state file survives compaction |
-| Completion | Manual / hope | Exact completion marker after protocol checks |
+| Completion | Manual / hope | Exact completion marker after task success plus protocol checks |
 | Lessons | Lost between iterations | Accumulated, inform strategy selection |
 | Stagnation | Repeats same approach | Detects + rotates after low-delta iterations |
 | Portability | Rebuild per runtime | Portable protocol, Claude and Codex drivers bundled |
@@ -357,7 +383,7 @@ Distilled from studying autoresearch, Ralph Wiggum, pi-autoresearch, SICA, and a
 ## Claims We Are Willing To Make
 
 - Forge packages proven loop patterns into a reusable protocol with first-class Claude Code and Codex/manual drivers.
-- Forge improves repeatability versus ad-hoc prompting when you care about KPI targets, iteration memory, and strategy rotation.
+- Forge improves repeatability versus ad-hoc prompting when you care about task success, KPI guardrails, iteration memory, and strategy rotation.
 - Forge does **not** yet provide universal runtime adapter parity beyond the shipped drivers.
 - Forge is more preconfigured than raw hooks. It is not a new primitive.
 
 
@@ -1 +1 @@
-0.4.2
+0.5.0
@@ -22,6 +22,9 @@ Show the current Claude Code driver status for this project.
    - forge-state path if present
 5. If the forge-state file exists, also show:
    - scope
+   - success mode
+   - done_when text, or that it is task-derived if no explicit override exists
    - current strategy
    - stagnation_count
+   - whether completion checks have been recorded yet if the success block is present
 6. Do not mutate any files. This command is read-only.
@@ -1,6 +1,6 @@
 ---
-description: "KPI-driven codebase improvement loop"
-argument-hint: '"SCOPE" --coverage N --speed -N% --quality strict|moderate|lax [--max-iterations N]'
+description: "Task-driven Forge loop with KPI guardrails"
+argument-hint: '"SCOPE" [--done-when "TEXT"] --coverage N --speed -N% --quality strict|moderate|lax [--max-iterations N]'
 ---
 
 # Forge Command
@@ -15,28 +15,33 @@ This command is the Claude Code driver for Forge Core.
 
 ## Argument Parsing
 
-1. **SCOPE**: Quoted string describing what to improve (e.g., "LiveView components", "API controllers")
-2. `--coverage N`: Minimum coverage % target (default: current baseline + 2)
-3. `--speed -N%`: Speed reduction target as percentage (default: -20% from baseline)
-4. `--quality strict|moderate|lax`: Quality gate level (default: moderate)
+1. **SCOPE**: Quoted string describing the primary task or area to improve (e.g., "Build password reset flow", "LiveView components")
+2. `--done-when "TEXT"`: Optional explicit success override. If omitted, derive concrete completion checks from the task scope and persist them in forge-state.
+3. `--coverage N`: Minimum coverage % target (default: current baseline + 2)
+4. `--speed -N%`: Speed reduction target as percentage (default: -20% from baseline)
+5. `--quality strict|moderate|lax`: Quality gate level (default: moderate)
    - strict: 0 high, 0 medium findings
    - moderate: 0 high, <= 3 medium
    - lax: 0 high, <= 5 medium
-5. `--max-iterations N`: Safety limit (default: 20)
+6. `--max-iterations N`: Safety limit (default: 20)
 
 If SCOPE is missing, ask what area to focus on.
 
 ## Launch Sequence
 
 1. **Measure baseline**: Run test suite with coverage, parse coverage/speed/tests/failures
 2. **Generate session ID**: `MMDD-HHMM-XXXX` format
-3. **Compute targets** from arguments:
+3. **Establish success contract**:
+   - primary task: `SCOPE`
+   - explicit success override: `--done-when "TEXT"` if provided
+   - otherwise: derive concrete completion checks from the task itself during iteration 1 and persist them in forge-state
+4. **Compute targets** from arguments:
    - coverage: `--coverage N` if provided, else `baseline_coverage + 2`
    - speed: `--speed -N%` if provided, compute `baseline_speed * (1 - N/100)`, else `baseline_speed * 0.8`
    - quality: `--quality` value or "moderate"
-4. **Create forge state file**: `.claude/forge-state.SESSION.md` with baseline + targets
-5. **Create loop state file**: `.claude/ralph-loop.SESSION.local.md` with forge prompt
-6. **Report** baseline, targets, and begin first iteration
+5. **Create forge state file**: `.claude/forge-state.SESSION.md` with success contract + baseline + targets
+6. **Create loop state file**: `.claude/ralph-loop.SESSION.local.md` with forge prompt
+7. **Report** baseline, targets, and begin first iteration
 
 ## Loop Prompt (written to state file)
 
@@ -45,6 +50,7 @@ Read .claude/forge-state.SESSION.md and follow The Forge Protocol (A through H).
 
 SCOPE: {parsed scope}
 SESSION: {session_id}
+DONE WHEN: {explicit done_when or derive from task and record in forge-state}
 
 You are in a forge loop. Each iteration:
 A. ORIENT - Read forge-state, check position + trends + stagnation
@@ -54,7 +60,7 @@ D. DECIDE - Pick strategy from KPI gaps + findings + lessons
 E. EXECUTE - ONE focused change using appropriate subagent
 F. VERIFY - Tests must be green, re-measure with coverage
 G. RECORD - Update forge-state with deltas + lessons (autoregressive step)
-H. COMPLETE - ALL targets met simultaneously? → output RALPH_COMPLETE on its own line
+H. COMPLETE - Task success contract satisfied AND KPI targets met? → output RALPH_COMPLETE on its own line
 
 Refer to the forge skill for the full protocol.
 
 
@@ -6,6 +6,20 @@ It reuses Forge Core, but does not depend on Claude Code commands or stop hooks.
 Instead it ships small shell entrypoints that manage loop state and print the
 next prompt to run in Codex.
 
+Forge is task-driven, not just KPI-driven. Each session stores:
+
+- the open-text task scope
+- either an explicit `--done-when "TEXT"` success contract or a task-derived one
+- the normal KPI guardrails for coverage, speed, quality, and max iterations
+
+Typical flow:
+
+1. `forge-init "scope" --done-when "what finished means"`
+2. Paste the printed prompt into Codex
+3. Record iteration results in Forge state
+4. Run `forge-continue` for the next prompt
+5. Use `forge-status` to inspect scope, success mode, and next iteration
+
 ## Files
 
 - `bin/forge-init` — create a new Forge session for the current project
@@ -31,3 +45,5 @@ location differs from the Claude Code adapter.
   entries in Forge state instead of blindly incrementing loop metadata.
 - If multiple active Codex sessions exist, `forge-continue` and `forge-cancel`
   require an explicit session id instead of guessing.
+- Open-text `scope` and `--done-when` values are persisted in Forge state and
+  rendered back into prompts and status output.
@@ -29,11 +29,12 @@ if [[ ! -f "$loop_state" ]] || [[ ! -f "$forge_state" ]]; then
   exit 1
 fi
 
-active="$(forge_frontmatter_value "$loop_state" "active")"
-max_iterations="$(forge_frontmatter_value "$loop_state" "max_iterations")"
-scope="$(forge_frontmatter_value "$forge_state" "scope")"
+active="$(forge_strip_quotes "$(forge_frontmatter_value "$loop_state" "active")")"
+max_iterations="$(forge_strip_quotes "$(forge_frontmatter_value "$loop_state" "max_iterations")")"
+scope="$(forge_strip_quotes "$(forge_frontmatter_value "$forge_state" "scope")")"
 recorded_iteration="$(forge_recorded_iteration "$forge_state")"
 iteration=$((recorded_iteration + 1))
+done_when_text="$(forge_done_when_text "$forge_state")"
 
 if [[ "$active" != "true" ]]; then
   echo "Error: Session ${session_id} is not active." >&2
@@ -50,7 +51,7 @@ if (( iteration > max_iterations )); then
   exit 1
 fi
 
-prompt_text="$(forge_render_prompt "$PROMPT_TEMPLATE" "$session_id" "$scope" "$iteration")"
+prompt_text="$(forge_render_prompt "$PROMPT_TEMPLATE" "$session_id" "$scope" "$iteration" "$done_when_text")"
 
 echo "Forge Codex driver session ${session_id}, iteration ${iteration}."
 echo "Forge state: .codex/forge/forge-state.${session_id}.md"
 
@@ -10,7 +10,7 @@ source "$LIB_PATH"
 
 usage() {
   cat <<'EOF'
-Usage: forge-init "SCOPE" [--coverage N] [--speed -N%] [--quality strict|moderate|lax] [--max-iterations N]
+Usage: forge-init "SCOPE" [--done-when "TEXT"] [--coverage N] [--speed -N%] [--quality strict|moderate|lax] [--max-iterations N]
 EOF
 }
 
@@ -34,6 +34,7 @@ coverage_target=""
 speed_target="-20%"
 quality_target="moderate"
 max_iterations="20"
+done_when=""
 
 while [[ $# -gt 0 ]]; do
   case "$1" in
@@ -52,6 +53,11 @@ while [[ $# -gt 0 ]]; do
       quality_target="$2"
       shift 2
       ;;
+    --done-when)
+      require_value "$1" "${2:-}"
+      done_when="$2"
+      shift 2
+      ;;
     --max-iterations)
       require_value "$1" "${2:-}"
       max_iterations="$2"
@@ -96,16 +102,27 @@ timestamp="$(date -u +"%m%d-%H%M")"
 rand="$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n' | cut -c1-4)"
 session_id="${timestamp}-${rand}"
 started_at="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
+success_mode="$(if [[ -n "$done_when" ]]; then printf 'explicit'; else printf 'task-derived'; fi)"
+done_when_yaml="null"
+if [[ -n "$done_when" ]]; then
+  done_when_yaml="$(forge_yaml_quote "$done_when")"
+fi
+scope_yaml="$(forge_yaml_quote "$scope")"
 
 forge_state="${state_dir}/forge-state.${session_id}.md"
 loop_state="${state_dir}/loop-state.${session_id}.md"
 
 cat > "$forge_state" <<EOF
 ---
 session_id: "${session_id}"
-scope: "${scope}"
+scope: ${scope_yaml}
 driver: "codex"
 state_root: ".codex/forge"
+success:
+  mode: "${success_mode}"
+  task: ${scope_yaml}
+  done_when: ${done_when_yaml}
+  completion_checks: []
 baseline:
   coverage: null
   speed_seconds: null
@@ -127,6 +144,7 @@ ideas: []
 ## Session
 - Driver: Codex
 - Scope: ${scope}
+- Done when: ${done_when:-Derive concrete completion checks from the task scope and record them during iteration 1.}
 - Started at: ${started_at}
 - Notes: Baseline not measured yet. First iteration must establish it from real test output.
 EOF
@@ -144,7 +162,8 @@ started_at: "${started_at}"
 ---
 EOF
 
-prompt_text="$(forge_render_prompt "$PROMPT_TEMPLATE" "$session_id" "$scope" "1")"
+done_when_text="${done_when:-Derive concrete completion checks from the task scope and record them in forge-state before EXECUTE.}"
+prompt_text="$(forge_render_prompt "$PROMPT_TEMPLATE" "$session_id" "$scope" "1" "$done_when_text")"
 
 echo "Initialized Forge Codex driver session ${session_id}."
 echo "Forge state: .codex/forge/forge-state.${session_id}.md"