Commit caefec0

docs: add agent session operations spec
1 parent 7a179ca

1 file changed

Lines changed: 166 additions & 0 deletions

# Agent Session Operations Spec

This spec defines the desired operating loop for Codex sessions that need to
move through repo work, host configuration, runtime services, memory context,
and pull-request follow-up without losing track of the current truth.

The intent is not to add a new runtime feature but to make future sessions
predictable: begin from live state, name the active lane, keep work visible,
verify the result through the surface that matters, and close with a compact
handoff.

## Goals

- Start every non-trivial turn from a small live-state readback instead of
  relying on remembered context alone.
- Keep repo source, host configuration, runtime services, memory context, and
  publication state separate until there is direct proof that connects them.
- Use the plan tool as the visible work ledger throughout multi-step tasks.
- Delegate independent reads, implementation slices, test diagnosis, and review
  work to focused subagents while keeping final decisions in the root session.
- Use explicit timeouts for commands that are expected to be slow, while
  keeping simple read commands fast.
- End each substantial turn with the smallest useful proof summary and one clear
  next move.

## Non-Goals

- Do not change Codex runtime scheduling, tool semantics, or model provider
  behavior through this spec.
- Do not use this spec as a replacement for live repo, process, network, test,
  or git verification.
- Do not add broad product documentation or user-facing docs from this work.
- Do not encode host-specific secrets, private paths that contain credentials, or
  raw secret-bearing logs in session artifacts.
- Do not treat memory context, injected context, or prior summaries as current
  truth without checking the relevant live surface.

## Session Start Card

For a non-trivial task, the session should establish a compact state card before
making changes:

```text
cwd:
branch:
dirty state:
remote / publication target:
lane:
applicable instructions:
proof surface:
done condition:
```

The card does not need to be verbose or shown to the user in every turn, but the
agent should collect enough of it to avoid editing the wrong surface. Typical
inputs are `pwd`, `git status --short --branch`, `git remote -v`, reads of the
applicable `AGENTS.md` files, and the narrowest command or file read that
confirms the reported problem.

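The following minimal Python sketch fills in the repository fields of the card above. It parses captured `git status --short --branch` text rather than calling git, so it runs anywhere. `StartCard` and `parse_status` are hypothetical names, not part of Codex. The parser assumes the `## branch...upstream` header line that `--branch` prints.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StartCard:
    """Hypothetical container for the spec's session start card fields."""
    cwd: str
    branch: str
    upstream: Optional[str]  # stands in for "remote / publication target"
    dirty_paths: List[str] = field(default_factory=list)
    lane: str = "unclassified"
    instructions: List[str] = field(default_factory=list)  # e.g. AGENTS.md files read
    proof_surface: str = ""
    done_condition: str = ""

    @property
    def dirty(self) -> bool:
        return bool(self.dirty_paths)


def parse_status(cwd: str, status_output: str) -> StartCard:
    """Fill branch, upstream, and dirty state from `git status --short --branch` text."""
    lines = status_output.splitlines()
    branch, upstream = "unknown", None
    if lines and lines[0].startswith("## "):
        ref = lines[0][3:]
        # An unborn branch is printed as "## No commits yet on <branch>".
        prefix = "No commits yet on "
        if ref.startswith(prefix):
            ref = ref[len(prefix):]
        # Drop any " [ahead N, behind M]" suffix, then split "branch...upstream".
        # A detached HEAD ("## HEAD (no branch)") comes out as branch "HEAD".
        ref = ref.split(" ", 1)[0]
        branch, _, tracking = ref.partition("...")
        upstream = tracking or None
    # Remaining lines are "XY path": two status columns, a space, then the path.
    dirty = [line[3:] for line in lines[1:] if line.strip()]
    return StartCard(cwd=cwd, branch=branch, upstream=upstream, dirty_paths=dirty)
```

Only the git-derived fields are mechanical. The session still has to choose `lane`, `proof_surface`, and `done_condition` from what the task actually asks for.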
## Lane Classification

Every task should identify its primary lane before making changes:

| Lane | Truth surface | Typical proof |
| --- | --- | --- |
| Repo code | Source tree and tests | Focused diff, formatter, crate or package tests |
| Host config | Files under `~/.codex` | Live file readback and feature registry checks |
| Runtime service | Running process, container, socket, or API | Health endpoint, logs, listener, process tree |
| Memory context | Honcho or memory files | Source-labeled retrieval plus live verification |
| Publication | Git remotes, branches, PRs, CI | `git status`, `git ls-remote`, PR checks, workflow state |

When a task crosses lanes, the answer should say which lane was changed and
which was only inspected. For example, editing `~/.codex/AGENTS.md` changes
agent guidance, not Codex runtime behavior; changing repo source may still need
a rebuild or deployment before it affects a running service.

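The following Python sketch encodes the lane table above and the "changed versus only inspected" report. `Lane`, `classify_path`, and `lane_report` are hypothetical helpers. The path rule is an assumed heuristic: it can only separate the two file-based lanes (host config under `~/.codex` versus repo code).

```python
import os
from enum import Enum
from typing import Set


class Lane(Enum):
    """Lanes from the spec's table, stored as (truth surface, typical proof)."""
    REPO_CODE = ("Source tree and tests", "Focused diff, formatter, crate or package tests")
    HOST_CONFIG = ("Files under ~/.codex", "Live file readback and feature registry checks")
    RUNTIME_SERVICE = ("Running process, container, socket, or API",
                       "Health endpoint, logs, listener, process tree")
    MEMORY_CONTEXT = ("Honcho or memory files", "Source-labeled retrieval plus live verification")
    PUBLICATION = ("Git remotes, branches, PRs, CI",
                   "git status, git ls-remote, PR checks, workflow state")


def classify_path(path: str, codex_home: str = "~/.codex") -> Lane:
    """Assumed heuristic: a file edit under the Codex home is host config, else repo code."""
    home = os.path.realpath(os.path.expanduser(codex_home))
    target = os.path.realpath(os.path.expanduser(path))
    if target == home or target.startswith(home + os.sep):
        return Lane.HOST_CONFIG
    return Lane.REPO_CODE


def lane_report(changed: Set[Lane], inspected: Set[Lane]) -> str:
    """Say which lanes were changed and which were only inspected."""
    def names(lanes: Set[Lane]) -> str:
        return ", ".join(sorted(lane.name.lower() for lane in lanes)) or "none"

    return f"changed: {names(changed)}; inspected only: {names(inspected - changed)}"
```

Runtime, memory, and publication lanes still have to be named by the session based on what it actually touched. Suppose a session edited `~/.codex/AGENTS.md` and only inspected the running service. In that case, `lane_report({Lane.HOST_CONFIG}, {Lane.HOST_CONFIG, Lane.RUNTIME_SERVICE})` returns `changed: host_config; inspected only: runtime_service`.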
## Plan Discipline

The plan is the session's work ledger:

- Start a plan before multi-step work.
- Keep only one item in progress.
- Update the plan when the task changes lanes, when edits begin, and when
  verification starts.
- Treat terse follow-ups such as `continue`, `.`, `ok check`, `eval`, and
  `health` as instructions to resume from the current live state rather than to
  restart broad discovery.

Plans should stay small. A good default shape is:

1. Inspect live state and constraints.
2. Make the scoped change.
3. Verify the relevant proof surface.
4. Commit, push, or hand off if requested.

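The plan rules above can be sketched as a tiny Python ledger that refuses a second in-progress step and recognizes the terse resume prompts. `Plan`, `Step`, and `is_resume_prompt` are hypothetical. The status strings are an assumption for this sketch, not a documented plan-tool contract.

```python
from dataclasses import dataclass, field
from typing import List

# Status strings are an assumption for this sketch, not a documented plan-tool contract.
PENDING, IN_PROGRESS, COMPLETED = "pending", "in_progress", "completed"

# Terse follow-ups that the spec says should resume from live state.
RESUME_PROMPTS = {"continue", ".", "ok check", "eval", "health"}


@dataclass
class Step:
    text: str
    status: str = PENDING


@dataclass
class Plan:
    steps: List[Step] = field(default_factory=list)

    @classmethod
    def default(cls) -> "Plan":
        """The spec's default four-step shape."""
        return cls([Step("Inspect live state and constraints."),
                    Step("Make the scoped change."),
                    Step("Verify the relevant proof surface."),
                    Step("Commit, push, or hand off if requested.")])

    def start(self, index: int) -> None:
        """Mark a step in progress, refusing if another step already is."""
        busy = [i for i, s in enumerate(self.steps) if s.status == IN_PROGRESS and i != index]
        if busy:
            raise ValueError(f"step {busy[0]} is still in progress; complete it first")
        self.steps[index].status = IN_PROGRESS

    def complete(self, index: int) -> None:
        self.steps[index].status = COMPLETED


def is_resume_prompt(message: str) -> bool:
    """True for terse follow-ups that should resume rather than restart discovery."""
    return message.strip().lower() in RESUME_PROMPTS
```

The ledger refuses a second `in_progress` step rather than silently completing the first. This forces the session to record that a step finished before moving on.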
## Subagent Discipline

Use subagents when work splits cleanly:

- Independent code or document reading.
- Implementation in bounded files or modules.
- Test failure diagnosis.
- Review of a completed diff.
- Memory or prior-context lookup.

Each subagent should receive an explicit scope, no-go areas, evidence
requirements, and an output shape. The root session remains responsible for
decisions, staging, commits, user-visible claims, and live verification of risky
agent conclusions.

The root session should also close agents once their evidence has been
collected, rather than leaving them open and stale. A useful root loop is:

```text
spawn focused agents -> continue independent work -> wait/list agents -> verify risky claims -> close agents
```

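The root loop above can be approximated with Python's `concurrent.futures`, treating each subagent as a callable that receives a brief. This is an analogy, not the Codex agent API. `Brief`, `Finding`, `root_loop`, and the callables passed to it are hypothetical, and shutting down a thread pool stands in for closing agents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Brief:
    """What each subagent receives: scope, no-go areas, evidence rules, output shape."""
    scope: str
    no_go: Tuple[str, ...]
    evidence: str
    output_shape: str


@dataclass(frozen=True)
class Finding:
    claim: str
    risky: bool  # risky claims get re-verified by the root session


def root_loop(briefs: Dict[str, Brief],
              run_agent: Callable[[Brief], Finding],
              root_work: Callable[[], None],
              verify: Callable[[Finding], bool]) -> Dict[str, bool]:
    """spawn -> continue independent work -> wait -> verify risky claims -> close."""
    accepted: Dict[str, bool] = {}
    # Leaving the `with` block shuts the pool down, standing in for closing agents.
    with ThreadPoolExecutor(max_workers=max(1, len(briefs))) as pool:
        futures = {name: pool.submit(run_agent, brief) for name, brief in briefs.items()}
        root_work()  # root continues independent work while agents run
        for name, future in futures.items():
            finding = future.result()  # wait for this agent's evidence
            # Non-risky findings are accepted; risky ones must pass root verification.
            accepted[name] = verify(finding) if finding.risky else True
    return accepted
```

In this design, verification happens before the pool closes, matching the loop's order. Decisions about which findings are risky stay with the caller, which plays the root session.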
## Timeout Policy

For simple commands, omit the shell timeout and keep the default short. Set an
explicit `timeout_ms` for commands that are known or expected to be slow:

- Tests, builds, formatters that may compile, and package installs.
- Cargo, Bazel, Docker, network probes, and service health checks.
- Long-running audits, PR polling, and CI inspection.

Do not raise generic command timeouts to hide hangs. Prefer a deliberate timeout
that matches the command's expected duration.

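The following minimal Python sketch applies this policy, assuming a runner built on `subprocess.run`. The spec's `timeout_ms` belongs to the agent's shell tool, so this sketch converts it to the seconds that `subprocess.run` expects. The command classes and budget values are illustrative assumptions, not Codex defaults.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical budgets in milliseconds, keyed by command class; tune each to the
# command's expected duration rather than raising one generic ceiling.
SLOW_BUDGETS_MS = {"test": 600_000, "build": 900_000, "network": 30_000, "ci": 300_000}
SHORT_DEFAULT_MS = 10_000  # assumed default used when a simple command omits a timeout


def run(cmd: List[str], kind: str = "read",
        timeout_ms: Optional[int] = None) -> subprocess.CompletedProcess:
    """Run a command with a deliberate timeout; a timeout is reported, not hidden."""
    if timeout_ms is None:
        timeout_ms = SLOW_BUDGETS_MS.get(kind, SHORT_DEFAULT_MS)
    try:
        # subprocess.run takes seconds and kills the child when the timeout expires.
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_ms / 1000)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"{kind} command exceeded {timeout_ms} ms: {cmd[0]}") from exc


if __name__ == "__main__":
    print(run([sys.executable, "-c", "print('ok')"]).stdout.strip())
```

A timeout surfaces as an explicit error naming the command class and budget, so a hang becomes visible instead of being absorbed by a larger generic limit.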
## Health And Eval Answers

For prompts such as `health`, `working`, `eval`, `quality`, and `maxed`, report
separate proof buckets instead of blending them:

```text
setup wiring:
runtime/API health:
quality/eval signal:
queue freshness:
remaining boundary:
```

A reachable endpoint does not prove useful retrieval quality. A green hook
configuration does not prove that a downstream service is healthy. Queue status
is an observability signal and is not, by itself, a blocker.

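The bucket separation can be sketched as a small Python report in which each bucket carries its own evidence. `HealthReport`, `Evidence`, and the choice to treat only queue freshness as non-blocking are assumptions drawn from the paragraph above, not an existing Codex interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Pass/fail buckets from the spec's template; "remaining boundary" is free text.
BUCKETS = ["setup wiring", "runtime/API health", "quality/eval signal", "queue freshness"]
# Queue state is observability: reported, but never a blocker on its own.
NON_BLOCKING = {"queue freshness"}


@dataclass
class Evidence:
    ok: Optional[bool]  # None means "not checked", never "fine"
    detail: str = ""


@dataclass
class HealthReport:
    buckets: Dict[str, Evidence] = field(default_factory=dict)
    remaining_boundary: str = ""

    def render(self) -> str:
        lines = []
        for name in BUCKETS:
            ev = self.buckets.get(name)
            if ev is None or ev.ok is None:
                lines.append(f"{name}: not checked")
            else:
                lines.append(f"{name}: {'ok' if ev.ok else 'failing'} ({ev.detail})")
        lines.append(f"remaining boundary: {self.remaining_boundary or 'none stated'}")
        return "\n".join(lines)

    def blockers(self) -> List[str]:
        """Failing buckets that block, excluding observability-only ones."""
        return [name for name in BUCKETS
                if name not in NON_BLOCKING
                and self.buckets.get(name, Evidence(None)).ok is False]
```

An unchecked bucket renders as `not checked`, so a reachable endpoint fills only the runtime/API bucket and cannot quietly stand in for a quality signal.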
## Final Handoff

Substantial turns should close with a compact proof footer:

```text
Changed:
Verified:
Not verified:
Next move:
```

The exact labels are optional, but the answer should make the same information
clear. Prefer direct readbacks, command names, commit hashes, PR URLs, health
endpoints, listener checks, and focused tests over long narrative recaps.

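The following Python sketch renders the footer with the spec's labels and refuses a handoff that has no next move. `proof_footer` is a hypothetical helper. Joining items with semicolons and printing `none` for empty lists are arbitrary formatting choices.

```python
from typing import Sequence


def proof_footer(changed: Sequence[str], verified: Sequence[str],
                 not_verified: Sequence[str], next_move: str) -> str:
    """Render the compact handoff footer using the spec's labels."""
    if not next_move.strip():
        # The spec asks for one clear next move, so an empty one is rejected.
        raise ValueError("a handoff needs a next move")

    def join(items: Sequence[str]) -> str:
        return "; ".join(items) if items else "none"

    return "\n".join([
        f"Changed: {join(changed)}",
        f"Verified: {join(verified)}",
        f"Not verified: {join(not_verified)}",
        f"Next move: {next_move.strip()}",
    ])
```

The footer lists `Not verified` even when it is `none`, which keeps the limit of the proof visible next to what was checked.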
## Rollout

This spec can be adopted as operating guidance immediately. Any later runtime
work should be proposed separately and tied to a concrete behavior gap, such as
agent wake semantics, transcript visibility, or command timeout defaults.
