| 1 | +# Agent Session Operations Spec |
| 2 | + |
| 3 | +This spec defines the desired operating loop for Codex sessions that need to |
| 4 | +move through repo work, host configuration, runtime services, memory context, |
| 5 | +and pull-request follow-up without losing track of the current truth. |
| 6 | + |
| 7 | +The intent is not to add a new runtime feature. The intent is to make future |
| 8 | +sessions predictable: begin from live state, name the active lane, keep work |
| 9 | +visible, verify the result through the surface that matters, and close with a |
| 10 | +compact handoff. |
| 11 | + |
| 12 | +## Goals |
| 13 | + |
| 14 | +- Start every non-trivial turn from a small live-state readback instead of |
| 15 | + relying on remembered context alone. |
| 16 | +- Keep repo source, host configuration, runtime services, memory context, and |
| 17 | + publication state separate until there is direct proof that connects them. |
| 18 | +- Use the plan tool as the visible work ledger throughout multi-step tasks. |
| 19 | +- Delegate independent reads, implementation slices, test diagnosis, and review |
| 20 | + work to focused subagents while keeping final decisions in the root session. |
| 21 | +- Set explicit timeouts on commands that are expected to be slow, and keep |
| 22 | + simple read commands fast. |
| 23 | +- End each substantial turn with the smallest useful proof summary and one clear |
| 24 | + next move. |
| 25 | + |
| 26 | +## Non-Goals |
| 27 | + |
| 28 | +- Do not change Codex runtime scheduling, tool semantics, or model provider |
| 29 | + behavior through this spec. |
| 30 | +- Do not use this spec as a replacement for live repo, process, network, test, |
| 31 | + or git verification. |
| 32 | +- Do not add broad product documentation or user-facing docs from this work. |
| 33 | +- Do not encode host-specific secrets, private paths that contain credentials, or |
| 34 | + raw secret-bearing logs in session artifacts. |
| 35 | +- Do not treat memory context, injected context, or prior summaries as current |
| 36 | + truth without checking the relevant live surface. |
| 37 | + |
| 38 | +## Session Start Card |
| 39 | + |
| 40 | +For a non-trivial task, the session should establish a compact state card before |
| 41 | +making changes: |
| 42 | + |
| 43 | +```text |
| 44 | +cwd: |
| 45 | +branch: |
| 46 | +dirty state: |
| 47 | +remote / publication target: |
| 48 | +lane: |
| 49 | +applicable instructions: |
| 50 | +proof surface: |
| 51 | +done condition: |
| 52 | +``` |
| 53 | + |
| 54 | +The card does not need to be verbose or user-visible in every turn, but the |
| 55 | +agent should collect enough of it to avoid editing the wrong surface. Typical |
| 56 | +commands are `pwd`, `git status --short --branch`, `git remote -v`, applicable |
| 57 | +`AGENTS.md` reads, and the narrowest command or file read that proves the |
| 58 | +reported problem. |
| 59 | + |
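As an illustration only, the readback could be gathered with a small helper like the one below. The function names (`parse_status`, `render_card`, `read_live_state`) and the 10-second timeout are assumptions made for this sketch, not part of the spec; only the card labels and the `git status --short --branch` command come from the text above.

```python
import os
import subprocess

# Card labels in the order given by the spec.
CARD_LABELS = [
    "cwd",
    "branch",
    "dirty state",
    "remote / publication target",
    "lane",
    "applicable instructions",
    "proof surface",
    "done condition",
]


def parse_status(output: str) -> dict:
    """Extract branch and dirty state from `git status --short --branch` output."""
    branch = "(unknown)"
    changed = []
    for line in output.splitlines():
        if line.startswith("## "):
            # Header looks like "## main...origin/main [ahead 1]" or "## main".
            branch = line[3:].split("...")[0]
        elif line.strip():
            changed.append(line)
    dirty = "clean" if not changed else f"{len(changed)} changed path(s)"
    return {"branch": branch, "dirty state": dirty}


def render_card(fields: dict) -> str:
    """Render the card; fields that were not collected stay blank."""
    return "\n".join(
        f"{label}: {fields.get(label, '')}".rstrip() for label in CARD_LABELS
    )


def read_live_state(timeout_s: float = 10.0) -> dict:
    """Read cwd and git state from the live checkout (requires git on PATH)."""
    status = subprocess.run(
        ["git", "status", "--short", "--branch"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    ).stdout
    return {"cwd": os.getcwd(), **parse_status(status)}
```

Calling `render_card(read_live_state())` inside a checkout would fill the first three fields from live state. The remaining fields (lane, proof surface, done condition) are judgment calls, so the sketch leaves them blank instead of guessing.
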
| 60 | +## Lane Classification |
| 61 | + |
| 62 | +Every task should identify its primary lane before mutation: |
| 63 | + |
| 64 | +| Lane | Truth surface | Typical proof | |
| 65 | +| --- | --- | --- | |
| 66 | +| Repo code | Source tree and tests | Focused diff, formatter, crate or package tests | |
| 67 | +| Host config | Files under `~/.codex` | Live file readback and feature registry checks | |
| 68 | +| Runtime service | Running process, container, socket, or API | Health endpoint, logs, listener, process tree | |
| 69 | +| Memory context | Honcho or memory files | Source-labeled retrieval plus live verification | |
| 70 | +| Publication | Git remotes, branches, PRs, CI | `git status`, `git ls-remote`, PR checks, workflow state | |
| 71 | + |
| 72 | +When a task crosses lanes, the answer should say which lane was changed and |
| 73 | +which lane was only inspected. For example, editing `~/.codex/AGENTS.md` changes |
| 74 | +agent guidance, not Codex runtime behavior; changing repo source may still need |
| 75 | +a rebuild or deployment before it affects a running service. |
| 76 | + |
| 77 | +## Plan Discipline |
| 78 | + |
| 79 | +The plan is the session's work ledger: |
| 80 | + |
| 81 | +- Start a plan before multi-step work. |
| 82 | +- Keep only one item in progress. |
| 83 | +- Update the plan when the task changes lanes, when edits begin, and when |
| 84 | + verification starts. |
| 85 | +- Treat terse follow-ups such as `continue`, `.`, `ok check`, `eval`, and |
| 86 | + `health` as instructions to resume from the current live state rather than to |
| 87 | + restart broad discovery. |
| 88 | + |
| 89 | +Plans should stay small. A good default shape is: |
| 90 | + |
| 91 | +1. Inspect live state and constraints. |
| 92 | +2. Make the scoped change. |
| 93 | +3. Verify the relevant proof surface. |
| 94 | +4. Commit, push, or hand off if requested. |
| 95 | + |
| 96 | +## Subagent Discipline |
| 97 | + |
| 98 | +Use subagents when work splits cleanly: |
| 99 | + |
| 100 | +- Independent code or document reading. |
| 101 | +- Implementation in bounded files or modules. |
| 102 | +- Test failure diagnosis. |
| 103 | +- Review of a completed diff. |
| 104 | +- Memory or prior-context lookup. |
| 105 | + |
| 106 | +Each subagent should receive an explicit scope, no-go areas, evidence |
| 107 | +requirements, and output shape. The root session remains responsible for |
| 108 | +decisions, staging, commits, user-visible claims, and live verification of risky |
| 109 | +subagent conclusions. |
| 110 | + |
| 111 | +The root session should also close stale agents once their evidence is collected. A useful |
| 112 | +root loop is: |
| 113 | + |
| 114 | +```text |
| 115 | +spawn focused agents -> continue independent work -> wait/list agents -> verify risky claims -> close agents |
| 116 | +``` |
| 117 | + |
| 118 | +## Timeout Policy |
| 119 | + |
| 120 | +For simple commands, omit the timeout and rely on a short default. Use explicit |
| 121 | +`timeout_ms` for commands that are known or expected to be slow: |
| 122 | + |
| 123 | +- Tests, builds, formatters that may compile, and package installs. |
| 124 | +- Cargo, Bazel, Docker, network probes, and service health checks. |
| 125 | +- Long-running audits, PR polling, and CI inspection. |
| 126 | + |
| 127 | +Do not raise generic command timeouts to hide hangs. Prefer a deliberate timeout |
| 128 | +that matches the command's expected duration. |
| 129 | + |
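One way to picture the policy is a per-command lookup that falls back to a short default. This is a hedged sketch: the table name `SLOW_TIMEOUT_MS`, the helper names, and every millisecond value are illustrative assumptions, and only the `timeout_ms` concept and the examples of slow tools come from the spec.

```python
import os
import subprocess

# Assumed short default for simple read commands (illustrative value).
DEFAULT_TIMEOUT_MS = 10_000

# Assumed budgets for tools the spec names as slow (illustrative values).
SLOW_TIMEOUT_MS = {
    "cargo": 600_000,
    "bazel": 900_000,
    "docker": 300_000,
}


def pick_timeout_ms(argv: list) -> int:
    """Return a deliberate timeout for known-slow tools, else the short default."""
    tool = os.path.basename(argv[0])
    return SLOW_TIMEOUT_MS.get(tool, DEFAULT_TIMEOUT_MS)


def run_with_timeout(argv: list) -> dict:
    """Run a command under its chosen timeout and report a hang explicitly."""
    timeout_ms = pick_timeout_ms(argv)
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_ms / 1000
        )
    except subprocess.TimeoutExpired:
        # Surface the hang instead of retrying with a larger generic timeout.
        return {"timed_out": True, "timeout_ms": timeout_ms, "returncode": None}
    return {
        "timed_out": False,
        "timeout_ms": timeout_ms,
        "returncode": done.returncode,
        "stdout": done.stdout,
    }
```

Keeping the budgets in one table means a hang shows up as a timeout against a number chosen for that command, which is what the policy asks for in place of a raised generic limit.
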
| 130 | +## Health And Eval Answers |
| 131 | + |
| 132 | +For prompts such as `health`, `working`, `eval`, `quality`, and `maxed`, report |
| 133 | +separate proof buckets instead of blending them: |
| 134 | + |
| 135 | +```text |
| 136 | +setup wiring: |
| 137 | +runtime/API health: |
| 138 | +quality/eval signal: |
| 139 | +queue freshness: |
| 140 | +remaining boundary: |
| 141 | +``` |
| 142 | + |
| 143 | +A reachable endpoint does not prove useful retrieval quality. A green hook |
| 144 | +configuration does not prove that a downstream service is healthy. Queue status |
| 145 | +is an observability signal; it is not, by itself, a blocker. |
| 146 | + |
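The separation of buckets can be sketched as a renderer that never infers one bucket from another. The names `BUCKETS` and `render_health` and the `not verified` placeholder are assumptions for this example; the bucket labels are the ones listed above.

```python
# Bucket labels in the order given by the spec.
BUCKETS = [
    "setup wiring",
    "runtime/API health",
    "quality/eval signal",
    "queue freshness",
    "remaining boundary",
]


def render_health(evidence: dict) -> str:
    """Render one line per bucket; buckets without evidence say so explicitly."""
    unknown = sorted(set(evidence) - set(BUCKETS))
    if unknown:
        raise KeyError(f"unknown bucket(s): {unknown}")
    # There is deliberately no roll-up into a single overall status.
    return "\n".join(
        f"{bucket}: {evidence.get(bucket, 'not verified')}" for bucket in BUCKETS
    )
```

For example, passing evidence only for `runtime/API health` leaves `quality/eval signal` reported as `not verified`, so a reachable endpoint is not mistaken for proof of retrieval quality.
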
| 147 | +## Final Handoff |
| 148 | + |
| 149 | +Substantial turns should close with a compact proof footer: |
| 150 | + |
| 151 | +```text |
| 152 | +Changed: |
| 153 | +Verified: |
| 154 | +Not verified: |
| 155 | +Next move: |
| 156 | +``` |
| 157 | + |
| 158 | +The exact labels are optional, but the answer should make the same information |
| 159 | +clear. Prefer direct readbacks, command names, commit hashes, PR URLs, health |
| 160 | +endpoints, listener checks, and focused tests over long narrative recaps. |
| 161 | + |
| 162 | +## Rollout |
| 163 | + |
| 164 | +This spec can be adopted as operating guidance immediately. Any later runtime |
| 165 | +work should be proposed separately and tied to a concrete behavior gap, such as |
| 166 | +agent wake semantics, transcript visibility, or command timeout defaults. |