Commit 7ddb656

Add managed multi-task loop layout
1 parent e7c744c commit 7ddb656

23 files changed

Lines changed: 1045 additions & 271 deletions

.github/workflows/python-compat.yml

Lines changed: 15 additions & 10 deletions
@@ -43,18 +43,22 @@ jobs:
 run: |
 set -euo pipefail
 tmpdir="$(mktemp -d)"
-state_path="$tmpdir/.codex-loop/state.json"
+task_id="compat-smoke"
+state_path="$tmpdir/.codex-loop/tasks/$task_id/state.json"
 marker_path="$tmpdir/done.txt"
 
 python scripts/init_state.py \
-  --state "$state_path" \
   --goal "Verify compatibility smoke test" \
   --global-stop-condition "Stop only when the lifecycle scripts run cleanly and done.txt contains done." \
   --workspace-root "$tmpdir" \
+  --task-id "$task_id" \
   --success-evidence "check_stop.py returns success" \
   --require-path done.txt \
   --require-text "done.txt::done"
 
+python scripts/list_tasks.py --workspace-root "$tmpdir"
+python scripts/show_task.py --workspace-root "$tmpdir" --task-id "$task_id"
+
 python scripts/append_event.py \
   --state "$state_path" \
   --kind round.started \
@@ -93,11 +97,12 @@ jobs:
 python scripts/report_status.py --state "$state_path" --label compat.smoke
 python scripts/compact_state.py --state "$state_path" --keep-last 1
 
-test -f "$tmpdir/.codex-loop/events.jsonl"
-test -f "$tmpdir/.codex-loop/iterations.jsonl"
-test -f "$tmpdir/.codex-loop/status-history.jsonl"
-test -f "$tmpdir/.codex-loop/latest-status.txt"
-test -f "$tmpdir/.codex-loop/latest-stop-report.json"
-test -f "$tmpdir/.codex-loop/run-summary.md"
-test -f "$tmpdir/.codex-loop/rounds/iteration-0001.md"
-test -f "$tmpdir/.codex-loop/rounds/iteration-0002.md"
+test -f "$tmpdir/.codex-loop/registry.json"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/events.jsonl"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/iterations.jsonl"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/status-history.jsonl"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/latest-status.txt"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/latest-stop-report.json"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/run-summary.md"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/rounds/iteration-0001.md"
+test -f "$tmpdir/.codex-loop/tasks/$task_id/rounds/iteration-0002.md"
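The `test -f` assertions in this hunk could also be expressed in Python, which may be convenient when the smoke check is run outside a shell. This is a rough sketch only: `expected_artifacts` and `missing_artifacts` are hypothetical helper names, not functions in this repository, and the artifact list is simply copied from the workflow step.

```python
from pathlib import Path
from typing import List

# Task-local artifacts that the workflow step checks with `test -f`.
TASK_ARTIFACTS = [
    "events.jsonl",
    "iterations.jsonl",
    "status-history.jsonl",
    "latest-status.txt",
    "latest-stop-report.json",
    "run-summary.md",
    "rounds/iteration-0001.md",
    "rounds/iteration-0002.md",
]


def expected_artifacts(workspace_root, task_id):
    # type: (Path, str) -> List[Path]
    """Return every path the smoke test expects (hypothetical helper)."""
    loop_root = Path(workspace_root) / ".codex-loop"
    task_root = loop_root / "tasks" / task_id
    return [loop_root / "registry.json"] + [task_root / name for name in TASK_ARTIFACTS]


def missing_artifacts(workspace_root, task_id):
    # type: (Path, str) -> List[Path]
    """Return the expected paths that are not regular files on disk."""
    return [p for p in expected_artifacts(workspace_root, task_id) if not p.is_file()]
```

An empty result from `missing_artifacts(tmpdir, "compat-smoke")` would correspond to all nine `test -f` lines passing.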

AGENTS.md

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
# AGENTS.md

This file is for future maintainers working inside this repository.
Treat it as the high-signal maintenance note for how the repo is supposed to evolve.

## Purpose

`strict-agent-loop` is a Codex skill plus a small stdlib-only runtime for enforcing strict iterative execution.
The repository has two jobs at the same time:

- define the skill behavior and user-facing guidance
- provide reliable helper scripts that make the loop durable and recoverable

If you change one side, review the other side too.

## Core Design

The design is intentionally split into three layers:

1. Controller protocol
   The current Codex session is the controller.
   It owns scope, verification, stop checks, and recovery decisions.
2. Durable runtime
   The scripts under `scripts/` persist loop state, append-only logs, status broadcasts, and unattended supervision metadata.
3. User guidance
   `SKILL.md`, `README.md`, `README_zh.md`, and `references/` must stay aligned with the actual runtime behavior.

The repository is not trying to build a generic scheduler.
It is trying to make Codex noticeably less likely to skip the middle of a long task.

## Stability Rules

These are effectively part of the public contract.
Do not change them casually.

- The default durable workspace layout lives under `<workspace_root>/.codex-loop/`.
- `state.json` is the current authoritative state.
- `events.jsonl`, `iterations.jsonl`, `status-history.jsonl`, `rounds/`, and `run-summary.md` are the durable trail.
- The runtime must remain stdlib-only.
- The helper scripts must stay compatible with Python `3.7` through `3.14`.
- The skill should prefer default conventions over asking the user to specify many storage paths.

If a change requires breaking one of these assumptions, update the docs and call it out explicitly in the commit.

## Default Storage Convention

Unless the user explicitly overrides it, assume:

- workspace root is the target repo root
- manager registry path is `<workspace_root>/.codex-loop/registry.json`
- each task gets its own root at `<workspace_root>/.codex-loop/tasks/<task-id>/`
- task state path is `<workspace_root>/.codex-loop/tasks/<task-id>/state.json`
- all task-local durable artifacts live under the same task root

Future changes should preserve this default-first behavior.
Prompt examples should say "use the default managed `.codex-loop/` layout" instead of making users spell out every file path.
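As an illustration of the convention above, the default paths can be derived from just a workspace root and a task id. This is a minimal sketch, not the code in `state_tools.py` or `init_state.py`; `TaskPaths` and `default_task_paths` are hypothetical names introduced here.

```python
from pathlib import Path
from typing import NamedTuple


class TaskPaths(NamedTuple):
    """Default locations for one managed task (hypothetical container)."""

    registry: Path
    task_root: Path
    state: Path


def default_task_paths(workspace_root, task_id):
    # type: (str, str) -> TaskPaths
    """Derive the default managed layout described in this section."""
    loop_root = Path(workspace_root) / ".codex-loop"
    task_root = loop_root / "tasks" / task_id
    return TaskPaths(
        registry=loop_root / "registry.json",
        task_root=task_root,
        state=task_root / "state.json",
    )


paths = default_task_paths("/repo", "compat-smoke")
print(paths.state.as_posix())  # /repo/.codex-loop/tasks/compat-smoke/state.json
```

Deriving everything from two inputs is what lets prompts refer to "the default managed layout" without listing individual files.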

The management helpers are intentionally simple:

- `init_state.py` may derive the default task-local state path automatically
- mutation scripts still require `--state` so operators cannot accidentally update the wrong task
- `list_tasks.py` and `show_task.py` are the intended low-friction management entrypoints

## Recovery Model

Recovery is disk-first, not memory-first.

When something goes wrong, the intended recovery order is:

1. `registry.json` to find the right task
2. the task's `state.json`
3. the task's `run-summary.md`
4. the task's `iterations.jsonl`
5. the task's `events.jsonl`
6. the task's `status-history.jsonl`
7. the task's `rounds/`
8. unattended only: the task's `supervisor/`

If you change the schema or artifact set, make sure this recovery order still makes sense and update `references/recovery.md`.
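The recovery order above can be written down as data, which makes it easier to notice when a schema change breaks it. The following is a hedged sketch under the default layout; `recovery_order` and `available_for_recovery` are hypothetical helpers, not part of the runtime.

```python
from pathlib import Path
from typing import List

# Task-local artifacts, in the order a recovering controller should read them.
TASK_RECOVERY_ORDER = [
    "state.json",
    "run-summary.md",
    "iterations.jsonl",
    "events.jsonl",
    "status-history.jsonl",
    "rounds",
]


def recovery_order(workspace_root, task_id, unattended=False):
    # type: (str, str, bool) -> List[Path]
    """Return the paths to consult, registry first, supervisor last if unattended."""
    loop_root = Path(workspace_root) / ".codex-loop"
    task_root = loop_root / "tasks" / task_id
    ordered = [loop_root / "registry.json"]
    ordered.extend(task_root / name for name in TASK_RECOVERY_ORDER)
    if unattended:
        ordered.append(task_root / "supervisor")
    return ordered


def available_for_recovery(workspace_root, task_id, unattended=False):
    # type: (str, str, bool) -> List[Path]
    """Keep only the entries that actually exist on disk, preserving order."""
    return [p for p in recovery_order(workspace_root, task_id, unattended) if p.exists()]
```

Because the sketch only inspects the filesystem, it stays consistent with the disk-first rule: nothing depends on what the controller remembers.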

## Script Responsibilities

- `init_state.py`
  Initializes the state and writes the first status artifacts.
  It should stay easy to use, with sensible defaults.
- `update_state.py`
  Records one verified iteration.
  This is the most important script for correctness.
- `append_event.py`
  Records controller or supervisor events without mutating the semantic history.
- `check_stop.py`
  Evaluates stop conditions and writes the latest stop report.
- `report_status.py`
  Refreshes human-readable and machine-readable progress outputs.
- `compact_state.py`
  Shrinks the rolling in-memory history window without destroying the append-only trail.
- `supervise.py`
  Owns unattended outer-loop execution and heartbeat-style broadcasting.
- `list_tasks.py`
  Lists managed tasks from the workspace registry.
- `show_task.py`
  Resolves one managed task and prints its canonical paths and latest registry metadata.
- `state_tools.py`
  Shared schema, paths, rendering, and append-only helpers.
- `stop_tools.py`
  Machine-checkable stop evaluation.

If you add a new runtime behavior, decide clearly which script owns it.
Avoid smearing one responsibility across many files.
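To make the split between the append-only trail and the compactable state concrete, here is a small sketch of the two operations side by side. It is an assumption-heavy illustration: the `history` field name, `append_jsonl`, and `compact_history` are invented for this example and may not match `state_tools.py` or the real schema in `references/state_schema.md`.

```python
import json
from pathlib import Path
from typing import Any, Dict


def append_jsonl(path, record):
    # type: (Path, Dict[str, Any]) -> None
    """Append one JSON object as a single line; existing lines are never rewritten."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def compact_history(state, keep_last):
    # type: (Dict[str, Any], int) -> Dict[str, Any]
    """Return a copy of the state whose rolling window keeps only the newest entries.

    Assumes the window lives under a hypothetical "history" key.
    The JSONL trail on disk is deliberately not touched here.
    """
    history = list(state.get("history", []))
    kept = history[-keep_last:] if keep_last > 0 else []
    compacted = dict(state)
    compacted["history"] = kept
    return compacted
```

Keeping the two functions separate mirrors the ownership rule above: one piece of code appends to the trail, another trims the window, and neither does the other's job.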

## Documentation Sync Rules

When changing runtime behavior, check all of these:

- `SKILL.md`
- `README.md`
- `README_zh.md`
- `references/protocol.md`
- `references/recovery.md`
- `references/state_schema.md`
- `references/stop_checks.md`
- `agents/openai.yaml`

If the change affects default usage, update the copy-paste install prompt in both READMEs.

## Validation Expectations

Before pushing, at minimum do all of these:

- `python -m py_compile scripts/*.py`
- `~/.pyenv/versions/3.7.6/bin/python -m py_compile scripts/*.py`
- run the lifecycle smoke flow or the GitHub Actions equivalent
- forward-test a real strict-loop task when the runtime semantics changed

The canonical real-world test in this repo is a hailstone / Collatz sequence task where:

- each round appends exactly one new number
- the total round count is not obvious up front
- the final report must aggregate the full history from disk

That scenario is useful because it catches fake batching and weak finalization behavior.
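For readers unfamiliar with the scenario, the sketch below shows roughly what one round and the final report could look like. It is not the repository's test harness: `next_hailstone`, `run_round`, and `final_report` are hypothetical names, and the one-number-per-line file format is an assumption made for the example.

```python
from pathlib import Path
from typing import Any, Dict


def next_hailstone(n):
    # type: (int) -> int
    """Collatz step: halve even numbers, map odd n to 3n + 1."""
    if n < 1:
        raise ValueError("hailstone numbers must be positive")
    return n // 2 if n % 2 == 0 else 3 * n + 1


def run_round(sequence_path):
    # type: (Path) -> int
    """Read the last number from disk and append exactly one successor."""
    numbers = Path(sequence_path).read_text(encoding="utf-8").split()
    successor = next_hailstone(int(numbers[-1]))
    with Path(sequence_path).open("a", encoding="utf-8") as handle:
        handle.write("{}\n".format(successor))
    return successor


def final_report(sequence_path):
    # type: (Path) -> Dict[str, Any]
    """Aggregate the full history from disk rather than from memory."""
    numbers = [int(x) for x in Path(sequence_path).read_text(encoding="utf-8").split()]
    return {"rounds": len(numbers) - 1, "peak": max(numbers), "sequence": numbers}
```

Starting from a file that contains only `6`, calling `run_round` until it returns `1` takes eight rounds, and the report's peak is `16`. A controller that batches several steps into one round, or that reports from memory, would produce a file or summary that disagrees with this.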

## When Touching The Schema

If you change persistent fields or artifact semantics:

- update `references/state_schema.md`
- decide whether `schema_version` should change
- keep old fields readable when reasonable
- make sure append-only logs still remain queryable after compaction
- confirm `run-summary.md` still points to the right durable artifacts

## What Not To Do

- Do not add third-party Python dependencies for convenience.
- Do not turn user prompts into a requirement to enumerate storage paths.
- Do not make the unattended mode depend only on natural-language claims of success.
- Do not silently remove durable artifacts from the default layout.
- Do not let docs drift away from the actual runtime behavior.
