| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +This file is for future maintainers working inside this repository. |
| 4 | +Treat it as the high-signal maintenance note for how the repo is supposed to evolve. |
| 5 | + |
| 6 | +## Purpose |
| 7 | + |
| 8 | +`strict-agent-loop` is a Codex skill plus a small stdlib-only runtime for enforcing strict iterative execution. |
| 9 | +The repository has two jobs at the same time: |
| 10 | + |
| 11 | +- define the skill behavior and user-facing guidance |
| 12 | +- provide reliable helper scripts that make the loop durable and recoverable |
| 13 | + |
| 14 | +If you change one side, review the other side too. |
| 15 | + |
| 16 | +## Core Design |
| 17 | + |
| 18 | +The design is intentionally split into three layers: |
| 19 | + |
| 20 | +1. Controller protocol |
| 21 | + The current Codex session is the controller. |
| 22 | + It owns scope, verification, stop checks, and recovery decisions. |
| 23 | +2. Durable runtime |
| 24 | + The scripts under `scripts/` persist loop state, append-only logs, status broadcasts, and unattended supervision metadata. |
| 25 | +3. User guidance |
| 26 | + `SKILL.md`, `README.md`, `README_zh.md`, and `references/` must stay aligned with the actual runtime behavior. |
| 27 | + |
| 28 | +The repository is not trying to build a generic scheduler. |
| 29 | +It is trying to make Codex noticeably less likely to skip the middle of a long task. |
| 30 | + |
| 31 | +## Stability Rules |
| 32 | + |
| 33 | +These are effectively part of the public contract. |
| 34 | +Do not change them casually. |
| 35 | + |
| 36 | +- The default durable workspace layout lives under `<workspace_root>/.codex-loop/`. |
| 37 | +- `state.json` is the current authoritative state. |
| 38 | +- `events.jsonl`, `iterations.jsonl`, `status-history.jsonl`, `rounds/`, and `run-summary.md` are the durable trail. |
| 39 | +- The runtime must remain stdlib-only. |
| 40 | +- The helper scripts must stay compatible with Python `3.7` through `3.14`. |
| 41 | +- The skill should prefer default conventions over asking the user to specify many storage paths. |
| 42 | + |
| 43 | +If a change requires breaking one of these assumptions, update the docs and call it out explicitly in the commit message. |
| 44 | + |
| 45 | +## Default Storage Convention |
| 46 | + |
| 47 | +Unless the user explicitly overrides it, assume: |
| 48 | + |
| 49 | +- workspace root is the target repo root |
| 50 | +- manager registry path is `<workspace_root>/.codex-loop/registry.json` |
| 51 | +- each task gets its own root at `<workspace_root>/.codex-loop/tasks/<task-id>/` |
| 52 | +- task state path is `<workspace_root>/.codex-loop/tasks/<task-id>/state.json` |
| 53 | +- all task-local durable artifacts live under the same task root |
| 54 | + |
| 55 | +Future changes should preserve this default-first behavior. |
| 56 | +Prompt examples should say "use the default managed `.codex-loop/` layout" instead of making users spell out every file path. |
| 57 | + |
| 58 | +The management helpers are intentionally simple: |
| 59 | + |
| 60 | +- `init_state.py` may derive the default task-local state path automatically |
| 61 | +- mutation scripts still require `--state` so operators cannot accidentally update the wrong task |
| 62 | +- `list_tasks.py` and `show_task.py` are the intended low-friction management entrypoints |
| 63 | + |
| 64 | +## Recovery Model |
| 65 | + |
| 66 | +Recovery is disk-first, not memory-first. |
| 67 | + |
| 68 | +When something goes wrong, the intended recovery order is: |
| 69 | + |
| 70 | +1. `registry.json` to find the right task |
| 71 | +2. the task's `state.json` |
| 72 | +3. the task's `run-summary.md` |
| 73 | +4. the task's `iterations.jsonl` |
| 74 | +5. the task's `events.jsonl` |
| 75 | +6. the task's `status-history.jsonl` |
| 76 | +7. the task's `rounds/` |
| 77 | +8. unattended only: the task's `supervisor/` |
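As a rough sketch of the order above, a recovering controller could walk the task-local artifacts like this. `recovery_candidates` is a hypothetical helper, and step 1 (`registry.json`) is left out because it lives at the workspace level and is only used to locate the task root.

```python
from pathlib import Path

# Task-local artifacts in the recovery order listed above (steps 2 to 7).
RECOVERY_ORDER = [
    "state.json",
    "run-summary.md",
    "iterations.jsonl",
    "events.jsonl",
    "status-history.jsonl",
    "rounds",
]


def recovery_candidates(task_root, unattended=False):
    """Return (path, exists) pairs in the order they would be consulted."""
    names = list(RECOVERY_ORDER)
    if unattended:
        names.append("supervisor")  # step 8 applies to unattended runs only
    root = Path(task_root)
    return [(root / name, (root / name).exists()) for name in names]
```

A caller would read the first existing artifact that answers its question and only fall through to later ones when the earlier ones are missing or inconsistent.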
| 78 | + |
| 79 | +If you change the schema or artifact set, make sure this recovery order still makes sense and update `references/recovery.md`. |
| 80 | + |
| 81 | +## Script Responsibilities |
| 82 | + |
| 83 | +- `init_state.py` |
| 84 | + Initializes the state and writes the first status artifacts. |
| 85 | + It should stay easy to use, with sensible defaults. |
| 86 | +- `update_state.py` |
| 87 | + Records one verified iteration. |
| 88 | + This is the most important script for correctness. |
| 89 | +- `append_event.py` |
| 90 | + Records controller or supervisor events without mutating the semantic history. |
| 91 | +- `check_stop.py` |
| 92 | + Evaluates stop conditions and writes the latest stop report. |
| 93 | +- `report_status.py` |
| 94 | + Refreshes human-readable and machine-readable progress outputs. |
| 95 | +- `compact_state.py` |
| 96 | + Shrinks the rolling in-memory history window without destroying the append-only trail. |
| 97 | +- `supervise.py` |
| 98 | + Owns unattended outer-loop execution and heartbeat-style broadcasting. |
| 99 | +- `list_tasks.py` |
| 100 | + Lists managed tasks from the workspace registry. |
| 101 | +- `show_task.py` |
| 102 | + Resolves one managed task and prints its canonical paths and latest registry metadata. |
| 103 | +- `state_tools.py` |
| 104 | + Shared schema, paths, rendering, and append-only helpers. |
| 105 | +- `stop_tools.py` |
| 106 | + Machine-checkable stop evaluation. |
| 107 | + |
| 108 | +If you add a new runtime behavior, decide clearly which script owns it. |
| 109 | +Avoid smearing one responsibility across many files. |
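To illustrate the split between the append-only trail and the compactable window described for `compact_state.py`, here is a minimal sketch. The function names and the `recent_iterations` field are assumptions made for the example, not the real schema.

```python
import json
import os
import tempfile


def append_jsonl(path, record):
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def compact_window(state, keep):
    """Return a copy of state whose rolling window holds only the newest `keep` items."""
    compacted = dict(state)
    # "recent_iterations" is a hypothetical field name, not the real schema.
    window = list(state.get("recent_iterations", []))
    compacted["recent_iterations"] = window[-keep:] if keep > 0 else []
    return compacted


with tempfile.TemporaryDirectory() as task_root:
    log_path = os.path.join(task_root, "iterations.jsonl")
    state = {"recent_iterations": []}
    for number in range(1, 6):
        record = {"iteration": number}
        append_jsonl(log_path, record)  # durable trail
        state["recent_iterations"].append(record)  # rolling window
    state = compact_window(state, keep=2)
    with open(log_path, encoding="utf-8") as handle:
        logged = [json.loads(line) for line in handle]

print(len(state["recent_iterations"]), len(logged))  # 2 5
```

Compaction only touches the window held in state; the log on disk still contains all five records.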
| 110 | + |
| 111 | +## Documentation Sync Rules |
| 112 | + |
| 113 | +When changing runtime behavior, check all of these: |
| 114 | + |
| 115 | +- `SKILL.md` |
| 116 | +- `README.md` |
| 117 | +- `README_zh.md` |
| 118 | +- `references/protocol.md` |
| 119 | +- `references/recovery.md` |
| 120 | +- `references/state_schema.md` |
| 121 | +- `references/stop_checks.md` |
| 122 | +- `agents/openai.yaml` |
| 123 | + |
| 124 | +If the change affects default usage, update the copy-paste install prompt in both READMEs. |
| 125 | + |
| 126 | +## Validation Expectations |
| 127 | + |
| 128 | +Before pushing, at minimum do all of these: |
| 129 | + |
| 130 | +- `python -m py_compile scripts/*.py` |
| 131 | +the same check under the oldest supported interpreter (Python 3.7), for example `~/.pyenv/versions/3.7.6/bin/python -m py_compile scripts/*.py` |
| 132 | +- run the lifecycle smoke flow or the GitHub Actions equivalent |
| 133 | +- forward-test a real strict-loop task when the runtime semantics changed |
| 134 | + |
| 135 | +The canonical real-world test in this repo is a hailstone / Collatz sequence task where: |
| 136 | + |
| 137 | +- each round appends exactly one new number |
| 138 | +- the total round count is not obvious up front |
| 139 | +- the final report must aggregate the full history from disk |
| 140 | + |
| 141 | +That scenario is useful because it catches fake batching and weak finalization behavior. |
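A minimal sketch of that scenario, assuming a plain text file as the durable history (the file name and helper names are hypothetical): each round reads the last value from disk, appends exactly one successor, and the final report is rebuilt from the file rather than from memory.

```python
import os
import tempfile


def next_hailstone(n):
    """Return the Collatz successor of n: n // 2 if even, 3 * n + 1 if odd."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def run_round(history_path):
    """Read the last number from disk, append exactly one successor, return it."""
    with open(history_path, encoding="utf-8") as handle:
        last = int(handle.read().split()[-1])
    value = next_hailstone(last)
    with open(history_path, "a", encoding="utf-8") as handle:
        handle.write("{}\n".format(value))
    return value


def final_report(history_path):
    """Aggregate the full sequence from disk, not from in-process state."""
    with open(history_path, encoding="utf-8") as handle:
        sequence = [int(token) for token in handle.read().split()]
    return {"rounds": len(sequence) - 1, "peak": max(sequence), "sequence": sequence}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "sequence.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("6\n")
    while run_round(path) != 1:  # stop check: the sequence has reached 1
        pass
    report = final_report(path)

print(report["sequence"])  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

A controller that batches several numbers into one round, or writes the report from memory, would produce a history file that disagrees with the round count.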
| 142 | + |
| 143 | +## When Touching The Schema |
| 144 | + |
| 145 | +If you change persistent fields or artifact semantics: |
| 146 | + |
| 147 | +- update `references/state_schema.md` |
| 148 | +- decide whether `schema_version` should change |
| 149 | +- keep old fields readable when reasonable |
| 150 | +- make sure append-only logs remain queryable after compaction |
| 151 | +- confirm `run-summary.md` still points to the right durable artifacts |
| 152 | + |
| 153 | +## What Not To Do |
| 154 | + |
| 155 | +- Do not add third-party Python dependencies for convenience. |
| 156 | +- Do not require users to enumerate storage paths in their prompts. |
| 157 | +- Do not let unattended mode rely solely on natural-language claims of success. |
| 158 | +- Do not silently remove durable artifacts from the default layout. |
| 159 | +- Do not let docs drift away from the actual runtime behavior. |