Commit c97ed96

author
jgstern-agent
committed
docs: operator guide for scripts/agent-supervisor
Adds docs/agent-supervisor.md, a user-facing guide for the tmux-session watchdog introduced in WI-razub. The original design doc's "Concrete user UX" section lived in the tracker item's discussion thread, not anywhere a workstation operator would look, so existing documentation was limited to --help text and script docstrings.

Covers:
- What the supervisor solves + what signal it relies on
- First-time setup (the loop-toggle + agent-supervisor run two-step)
- Normal operation (attach / detach / pause / resume / shutdown)
- status JSON field semantics for debugging
- Edge cases: two supervisors, human attached, rate-limited, crashed, missing tmux, CLI refuses graceful exit
- Troubleshooting matrix
- State directory layout
- What the supervisor does NOT do
- Deferred follow-ups (WI-sipov wrapper heartbeats, WI-batob keystroke verification)
- Related reading cross-references

Linked from README.md Links section. Follow-up to WI-razub-duluf-nobun-rulit-dapam-jipal-dafud-nahob.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
1 parent 8848be3 commit c97ed96

5 files changed

Lines changed: 156 additions & 3 deletions

.ci/affected-tests.txt

Lines changed: 3 additions & 3 deletions
@@ -1,9 +1,9 @@
11
# Test selection manifest
2-
# Generated by smart-test at 2026-04-18T04:12:04-04:00
2+
# Generated by smart-test at 2026-04-18T04:36:14-04:00
33
# Mode: targeted
4-
# Baseline: 58710465f29011d0eb024e7839e19d5e20df121c
4+
# Baseline: c01512b82b712fcdf8352f9a9f487d9c624927c8
55
# Reason: no Python source files changed
6-
# Changed files: 32
6+
# Changed files: 10
77
# Changed source files: 0
88
# Selected tests: 0
99
#

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ This changelog tracks the **tool version** (package releases). The **schema vers
1212

1313
### Added
1414

15+
- **`docs/agent-supervisor.md` operator guide** (follow-up to WI-razub): net-new user-facing doc covering the `scripts/agent-supervisor` daemon's operator workflow — prerequisites, first-time setup (`loop-toggle DEEP` + `agent-supervisor run &`), daily operations (attach / detach / pause / resume / shutdown), `status` JSON field semantics, edge cases (two supervisors, human attached, rate-limited, crashed, missing tmux), a troubleshooting matrix, the state-directory layout, an explicit "what the supervisor does NOT do" list, and cross-references to AGENTS.md + the script docstring. Fills the documentation gap the design doc left — the WI-razub "Concrete user UX" section lived in the tracker thread, not anywhere a workstation operator would find it. Linked from the main `README.md` Links section.
16+
1517
- **Vendor Parity for Respawn AGENTS.md section** (WI-batob, sub-item of WI-razub respawn mechanism): new authoritative table in `AGENTS.md` documenting, for each of Claude Code / Codex CLI / Cursor / Gemini CLI, the per-turn hook path (WI-sipov heartbeat), the session-start hook path (WI-sakod respawn branch), the graceful-exit keystroke the supervisor sends via `tmux send-keys`, the non-interactive CLI invocation for `tmux new-session`, and any vendor-specific quirks. Verification status is explicit: Claude Code's `/quit` is verified; the other three are marked "unverified — FIXME WI-batob" with a documented verification procedure (throwaway tmux session, send the keystroke, confirm the CLI process exits within 30s). Adding a new vendor requires four coordinated changes in the same PR (table row, `VENDOR_TABLE` entry in `scripts/agent-supervisor`, per-turn hook sourcing `touch_heartbeat.sh`, session-start hook sourcing `session_start_logic.sh`); the existing structural-guard tests in `tests/test_touch_heartbeat.py` and `tests/test_session_start_respawn.py` will fire if any hook wire-up is missed. Completes the last open sub-item of WI-razub.
1618

1719
- **Respawn-aware session-start hook** (WI-sakod, sub-item of WI-razub respawn mechanism): when the agent-supervisor daemon spawns a fresh CLI via `tmux new-session -e HYPERGUMBO_RESPAWN=1`, the vendor session-start hooks (via the shared `session_start_logic.sh`) now branch on the env var to auto-enable autonomous mode for this session per `autonomous_intent.txt` (narrow-write via `loop-toggle --set-session-mode`; project-level intent is left untouched) and emit the generic seed prompt ("Please familiarize yourself with this repo. Once you have done so, please set autonomous mode to DEEP."). All four vendors pick this up automatically since each hook sources the shared logic and surfaces `SESSION_START_MESSAGE` through its vendor-native path (plain stdout for Claude Code / Codex / Cursor; JSON `decision: allow, reason: ...` for Gemini). Defensive fall-through: `HYPERGUMBO_RESPAWN=1` + intent=OFF (or missing intent file / garbage value) takes the existing human-prompt path rather than force autonomous mode on. `HYPERGUMBO_RESPAWN` values other than exactly `1` are ignored so env var leakage can't accidentally trigger autonomous bootstrap. Never writes to `autonomous_intent.txt` — the hook is strictly a mirror from intent → session mode. 18 new tests cover the respawn branch for DEEP/BROAD/OFF/garbage/missing intent, the case-insensitive normalization, the "wrong env value is ignored" guard, a regression test that intent file mtime is unchanged, and structural guards that every vendor hook sources the shared logic and surfaces the message.

README.md

Lines changed: 1 addition & 0 deletions
@@ -226,6 +226,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for PR workflow (including fork-based wor
226226
- [docs/hypergumbo-spec.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/hypergumbo-spec.md) — Detailed specification
227227
- [docs/CITATIONS.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/CITATIONS.md) — Paper citations for embedding models
228228
- [docs/CACHE.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/CACHE.md) — Caching architecture
229+
- [docs/agent-supervisor.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/agent-supervisor.md) — Operator guide for `scripts/agent-supervisor` (the tmux-session watchdog for autonomous agents)
229230
- [SECURITY.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/SECURITY.md) — Vulnerability reporting
230231
- [hypergumbo-tracker README](packages/hypergumbo-tracker/README.md) — Standalone tracker for AI agent governance
231232

docs/agent-supervisor.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
1+
<!-- SPDX-License-Identifier: AGPL-3.0-or-later -->
2+
# Agent Supervisor — Operator Guide
3+
4+
`scripts/agent-supervisor` is a long-running daemon that monitors tmux sessions running hypergumbo-aware agent CLIs (Claude Code, Codex CLI, Cursor, Gemini CLI) and replaces a stuck session with a fresh one when your project-level intent says autonomous work is desired but the current session has stopped making progress.
5+
6+
This guide covers the operator workflow. For the design rationale, see tracker item `WI-razub` and the related vendor-contract documentation in [`AGENTS.md` → Vendor Parity for Respawn](../AGENTS.md).
7+
8+
## What the supervisor solves
9+
10+
The stop-hook circuit breaker (5 consecutive no-progress hashes) correctly detects a stagnating autonomous session, but it does so by permitting the session to terminate, which gives a stuck agent a one-way exit out of long-running work. The supervisor closes that loop: when a session trips the breaker (or crashes, or exits cleanly), the supervisor spawns a fresh CLI with a clean context, seeded with a generic "familiarize yourself with this repo" prompt, so forward progress resumes automatically.
11+
12+
The supervisor's authoritative signal is tmux pane-byte delta over a rolling 15-minute window — "is the pane actually scrolling?" — NOT any file the agent itself writes. Per-session heartbeat files exist (touched by every vendor's per-turn hook) but are telemetry only; they surface in `status` output but are never consulted for spawn/replace decisions.
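As an illustration of the pane-byte signal described above, here is a minimal sketch of a frozen-pane check. It is not taken from `scripts/agent-supervisor`: the names `PaneObservation`, `observe`, and `is_frozen` are hypothetical, and the sketch assumes the supervisor samples one byte count per pane per tick and tracks when that count last changed.

```python
from dataclasses import dataclass
from typing import Optional

FROZEN_WINDOW_SEC = 15 * 60  # the 15-minute window named in this guide


@dataclass
class PaneObservation:
    """Hypothetical record: last byte count seen, and when it last changed."""
    pane_bytes: int
    last_change_ts: float


def observe(prev: Optional[PaneObservation], pane_bytes: int, now: float) -> PaneObservation:
    """Fold a new byte-count sample into the running observation."""
    if prev is None or pane_bytes != prev.pane_bytes:
        # First sample for this pane, or the pane scrolled: restart the clock.
        return PaneObservation(pane_bytes, now)
    # No change in byte count: keep the earlier "last changed" timestamp.
    return prev


def is_frozen(obs: PaneObservation, now: float, window: float = FROZEN_WINDOW_SEC) -> bool:
    """True when the byte count has not changed for at least `window` seconds."""
    return now - obs.last_change_ts >= window
```

Because the first sample always seeds a fresh observation, a pane can only be judged frozen after a full window of unchanged samples, which matches the in-memory behaviour this guide describes for supervisor restarts.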
13+
14+
## Prerequisites
15+
16+
- `tmux` installed on the workstation.
17+
- One or more vendor CLIs installed and on `$PATH`: `claude`, `codex`, `cursor`, `gemini`.
18+
- `python3` (standard library only — no extra dependencies).
19+
- You have run `./scripts/dev-install` in this repo so the hooks and scripts are wired up.
20+
21+
> **Verification status note.** The exit-keystroke for Claude Code is verified. For Codex / Cursor / Gemini the supervisor's table is best-effort and marked `FIXME WI-batob` in both `scripts/agent-supervisor::VENDOR_TABLE` and the AGENTS.md parity table. Before relying on the supervisor to respawn those vendors in production, do the one-time verification step documented in AGENTS.md.
22+
23+
## First-time setup
24+
25+
Two commands per workstation. Run them once and the supervisor owns the lifecycle from then on.
26+
27+
```bash
28+
./scripts/loop-toggle DEEP # writes autonomous_intent.txt = DEEP
29+
# (also writes AUTONOMOUS_MODE.txt = DEEP
30+
# for today's session — preserves old UX)
31+
32+
./scripts/agent-supervisor run & # starts the daemon in background
33+
```
34+
35+
The supervisor creates `~/hypergumbo_lab_notebook/agent-supervisor/` if it doesn't exist. Override the default with `AGENT_SUPERVISOR_STATE_DIR=<path>` if you need the state elsewhere.
36+
37+
Substitute `BROAD` for `DEEP` if you want breadth / linker-coverage work instead of feature-quality work — see [AGENTS.md § Mode Selection](../AGENTS.md).
38+
39+
## Normal operation
40+
41+
Once the supervisor is running, it polls every 60 seconds (tunable via `--interval N`). On each tick it:
42+
43+
1. Reads `autonomous_intent.txt`. If OFF, does nothing.
44+
2. Enumerates tmux sessions whose name starts with `hypergumbo-session-` (reserved prefix — human-managed tmux sessions are never touched).
45+
3. For each such session, checks: is a tmux client attached? is the recorded CLI PID alive? has the pane scrolled in the last 15 minutes?
46+
4. Acts: if no session exists, spawn one. If a session is attached, do nothing (human is watching). If the CLI is dead OR the pane has been frozen for ≥ 15 minutes, run the replacement sequence.
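The four steps above can be read as a small pure decision function. The sketch below is illustrative only: `SessionView`, `decide`, and the action strings are hypothetical names, rate limiting is left out, and treating any intent other than `DEEP` or `BROAD` like `OFF` is an assumption rather than documented supervisor behaviour.

```python
from dataclasses import dataclass
from typing import List

ACTIVE_INTENTS = {"DEEP", "BROAD"}  # assumption: anything else behaves like OFF


@dataclass
class SessionView:
    """Hypothetical per-session snapshot gathered in step 3."""
    name: str
    clients_attached: int
    cli_alive: bool
    frozen: bool  # pane unchanged for >= 15 minutes


def decide(intent: str, sessions: List[SessionView]) -> List[str]:
    """Return the actions one tick would take, as plain strings."""
    if intent.strip().upper() not in ACTIVE_INTENTS:
        return []  # step 1: intent is OFF, do nothing
    if not sessions:
        return ["spawn"]  # step 4: no supervised session exists
    actions = []
    for s in sessions:
        if s.clients_attached > 0:
            continue  # a human is watching; never replace
        if not s.cli_alive or s.frozen:
            actions.append(f"replace:{s.name}")
    return actions
```

For example, `decide("DEEP", [])` yields `["spawn"]`, while a frozen session with an attached client yields no action at all.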
47+
48+
### Watching a live session
49+
50+
The supervisor launches sessions in detached mode. To observe one:
51+
52+
```bash
53+
./scripts/agent-supervisor status # lists live sessions + pane bytes + heartbeat ages
54+
tmux attach -t hypergumbo-session-<UTC-timestamp>
55+
```
56+
57+
Detach without killing the CLI with `Ctrl-B D`.
58+
59+
**Important:** while you are attached, the supervisor will NOT replace the session even if the pane freezes — an attached client blocks replacement, by design. Detach when you're done watching so the watchdog can do its job.
60+
61+
### Pausing the loop
62+
63+
```bash
64+
./scripts/loop-toggle OFF # flips intent to OFF (and today's session mode, too)
65+
```
66+
67+
The supervisor continues running but its decision matrix short-circuits on OFF: no spawns, no replacements. Any live CLI finishes its current work and idles. Resume with another `loop-toggle DEEP` / `BROAD`.
68+
69+
Prefer the narrow form if you want to temporarily disable autonomous mode on *just* the currently-running CLI without flipping project intent:
70+
71+
```bash
72+
./scripts/loop-toggle --set-session-mode OFF # session only; intent stays on
73+
```
74+
75+
### Shutting down for the day
76+
77+
```bash
78+
./scripts/agent-supervisor stop # writes supervisor.stop-sentinel
79+
```
80+
81+
The running daemon consumes the sentinel on its next poll tick (≤ 60 s) and exits cleanly. Your live CLIs keep running until you close them; the supervisor just stops respawning. Re-arm with another `agent-supervisor run &` whenever you come back.
82+
83+
## `status` output
84+
85+
```bash
86+
./scripts/agent-supervisor status | jq .
87+
```
88+
89+
Returns a JSON object with:
90+
91+
- `intent` — current value of `autonomous_intent.txt`.
92+
- `rate_limit` — rolling 24h spawn count, the cap (default 8), and whether a spawn is currently allowed.
93+
- `sessions[]` — one entry per hypergumbo-prefixed tmux session, with `meta` (the stored session-id / CLI pid / vendor / start UTC), `clients_attached`, `pane_bytes` (raw scrollback size in bytes), and `heartbeat_age_sec` (seconds since the per-turn hooks last touched the heartbeat file).
94+
- `stop_requested` — true if a stop sentinel is in flight.
95+
96+
Use `pane_bytes` + `heartbeat_age_sec` together to debug "is this session actually working?" — if pane bytes haven't grown but the heartbeat is fresh, the CLI is stuck in a tool that's not emitting output. If both are stale, the CLI itself is frozen.
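The debugging rule above needs two `status` samples taken some time apart, since growth in `pane_bytes` is only visible as a difference. A minimal sketch of that triage follows. The function name `triage`, the returned labels, and the 120-second freshness threshold are all assumptions made for illustration.

```python
def triage(prev_pane_bytes: int, cur_pane_bytes: int,
           heartbeat_age_sec: float, fresh_sec: float = 120.0) -> str:
    """Classify a session from two pane_bytes samples plus heartbeat age.

    `fresh_sec` is a hypothetical threshold for calling a heartbeat fresh.
    """
    pane_grew = cur_pane_bytes > prev_pane_bytes
    heartbeat_fresh = heartbeat_age_sec <= fresh_sec
    if pane_grew:
        return "working"
    if heartbeat_fresh:
        # Hooks are still firing but nothing is printed to the pane.
        return "stuck-in-quiet-tool"
    return "frozen"
```

Feed it the `pane_bytes` values from two consecutive `status` calls and the `heartbeat_age_sec` from the later one.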
97+
98+
## Edge cases
99+
100+
- **Two supervisors for the same project.** The second `agent-supervisor run` invocation fails `fcntl.flock` acquisition on `supervisor.lock` and exits with "another supervisor is already running". This is the enforced single-instance invariant; don't work around it.
101+
- **You want to run a vendor CLI by hand.** Either launch it in a tmux session whose name does NOT start with `hypergumbo-session-` (the supervisor will ignore it entirely), or `loop-toggle OFF` first and it won't get touched.
102+
- **Rate-limited.** If the supervisor has spawned 8 sessions in the last 24 hours (default soft cap), the next spawn is skipped with a log entry in `respawn_log.log` instead of proceeding. Fix the underlying problem: repeatedly hitting the cap indicates a deeper issue that more respawns will not solve.
103+
- **Supervisor crashes.** Nothing gets auto-spawned until you restart it with `agent-supervisor run &`. The daemon is not self-restarting by design.
104+
- **Tmux is not installed.** The `run_subprocess` seam returns rc=127 for every tmux call, so `status` works and reports zero sessions. The `run` loop no-ops each tick. Install tmux to unstick.
105+
- **CLI refuses graceful exit.** The supervisor polls `kill -0 <cli_pid>` for 30 seconds after sending the vendor exit keystroke. If the CLI is still alive, it falls back to `tmux kill-session` + direct invocation of `kill-transcript-sync.sh` / `rotate-on-session-end.sh` (the per-session cleanup scripts are already idempotent). An entry appears in `respawn_log.log` as `forced-kill fallback for session <name>`.
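The graceful-exit wait in the last item can be sketched with `os.kill(pid, 0)`, the Python equivalent of `kill -0`. This is an illustration under assumptions, not the supervisor's code: `pid_alive` and `wait_for_exit` are hypothetical names, the 1-second poll interval is a guess, and the injectable `alive` / `sleep` / `clock` parameters exist only to make the sketch testable.

```python
import os
import time


def pid_alive(pid: int) -> bool:
    """Signal 0 performs the existence check without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def wait_for_exit(pid: int, timeout: float = 30.0, poll: float = 1.0,
                  alive=pid_alive, sleep=time.sleep, clock=time.monotonic) -> bool:
    """Return True if the process exits within `timeout` seconds, else False."""
    deadline = clock() + timeout
    while alive(pid):
        if clock() >= deadline:
            return False  # caller would fall back to a forced kill here
        sleep(poll)
    return True
```

A `False` result corresponds to the point where the guide's forced-kill fallback (`tmux kill-session` plus the cleanup scripts) takes over.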
106+
107+
## Troubleshooting
108+
109+
| Symptom | Likely cause | Fix |
110+
| --- | --- | --- |
111+
| `agent-supervisor run` fails with "another supervisor is already running" | flock still held by a supervisor PID | `agent-supervisor status` to confirm, then `ps -fp <pid>` on the PID in `supervisor.lock`; if that PID is dead, remove the lock file and retry |
112+
| Live session not getting replaced despite being stuck | You're attached to it, or the pane has scrolled within 15 min | Detach (`Ctrl-B D`); or wait out the 15-minute frozen window |
113+
| `respawn_log.log` shows repeated "rate-limit reached" | 8 spawns in 24h — usually indicates a loop somewhere upstream | Read the log tail + `agent_notes.json` for a pattern; don't just raise the cap |
114+
| Fresh CLI launches but doesn't enable autonomous mode | `autonomous_intent.txt` is OFF or missing | `loop-toggle --set-intent DEEP` (narrow-write, doesn't touch the current session) |
115+
| Fresh CLI launches but the session-start hook doesn't inject the seed prompt | Vendor's hook file missing or unwired | Verify `.agent/hooks/<vendor>/session-start.sh` exists and sources `_shared/session_start_logic.sh` |
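
For context on the first row of the table, the single-instance check is described in this guide as an `fcntl.flock` on `supervisor.lock`. The sketch below shows one common way to take such a lock without blocking. The function name `acquire_single_instance` and the choice to write the PID into the same file are assumptions for illustration, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import os
from typing import IO, Optional


def acquire_single_instance(lock_path: str) -> Optional[IO[str]]:
    """Try to take an exclusive, non-blocking flock on `lock_path`.

    Returns the open handle on success (keep it open for the daemon's
    lifetime), or None if another holder already owns the lock.
    """
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # "another supervisor is already running"
    # Lock held: record our PID so operators can inspect the file.
    handle.seek(0)
    handle.truncate()
    handle.write(f"{os.getpid()}\n")
    handle.flush()
    return handle
```

The kernel drops an flock when the holding process exits, so in this sketch a leftover lock file would not by itself block a new instance.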
116+
117+
## State directory layout
118+
119+
`~/hypergumbo_lab_notebook/agent-supervisor/` (override with `AGENT_SUPERVISOR_STATE_DIR`):
120+
121+
- `supervisor.lock` — flock + pid-file for single-instance enforcement.
122+
- `supervisor.stop-sentinel` — present when a stop is requested; consumed on the next tick.
123+
- `<session>.meta.json` — written on spawn: session_id, cli_pid, vendor, project_dir, tmux session name, start_utc.
124+
- `<session>.heartbeat` — touched by the per-turn hooks (telemetry only; never a spawn/replace input).
125+
- `respawn_log.log` — append-only audit of every spawn / replace / rate-limit event.
126+
- `rate_limit.json` — rolling 24h spawn timestamps.
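As an illustration of the rolling 24-hour cap that `rate_limit.json` backs, here is a minimal sketch of the check. The on-disk format of `rate_limit.json` is not specified in this guide, so the sketch works on a plain list of epoch timestamps; `spawn_allowed` is a hypothetical name, and only the cap of 8 and the 24-hour window come from the text.

```python
from typing import List, Tuple

WINDOW_SEC = 24 * 60 * 60  # rolling 24 hours
DEFAULT_CAP = 8            # default soft cap named in this guide


def spawn_allowed(timestamps: List[float], now: float,
                  cap: int = DEFAULT_CAP,
                  window: float = WINDOW_SEC) -> Tuple[bool, List[float]]:
    """Prune spawn timestamps older than the window, then compare to the cap.

    Returns (allowed, pruned_timestamps); the caller would append `now`
    to the pruned list after a successful spawn and persist it.
    """
    recent = [t for t in timestamps if now - t < window]
    return len(recent) < cap, recent
```

Returning the pruned list alongside the verdict keeps the stored history bounded no matter how long the daemon runs.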
127+
128+
## What the supervisor does NOT do
129+
130+
- **Decide mode.** The human still picks BROAD vs DEEP via `loop-toggle`. The supervisor only mirrors project intent into each spawned session.
131+
- **Self-heal tmux.** If tmux is down, the supervisor waits silently for it to come back.
132+
- **Restart after crash.** No systemd / cron wiring by default — you launch `agent-supervisor run &` manually (or add it to your shell rc).
133+
- **Persist pane history across restarts.** Pane-byte observations are in-memory only; after a supervisor restart, the first tick per session seeds a new observation and the 15-minute frozen clock restarts.
134+
- **Consult the heartbeat.** Heartbeats are for your debugging / retrospective metrics, not the spawn/replace decision. See WI-sipov.
135+
136+
## Deferred follow-ups
137+
138+
These are noted on their tracker items and would extend the supervisor's reach without changing today's contract:
139+
140+
- **Stop-hook + long-running-command heartbeats.** Today the heartbeat is only touched by per-turn hooks. Wrappers like `auto-pr` / `bakeoff-*` / `smart-test` don't yet have a supervisor-exported session_id env var to key their heartbeat touches. (Tracked as a follow-up on WI-sipov.)
141+
- **Codex / Cursor / Gemini exit keystrokes.** Marked `FIXME WI-batob` in the supervisor's `VENDOR_TABLE` and in the AGENTS.md parity table. Claude Code is verified; the others need a one-time "start the CLI in a throwaway tmux, send the keystroke, confirm exit within 30s" verification.
142+
143+
## Related reading
144+
145+
- [AGENTS.md § Vendor Parity for Respawn](../AGENTS.md) — the per-vendor contract table (hook paths, exit keystrokes, CLI invocations).
146+
- [AGENTS.md § Premature Stopping Prevention](../AGENTS.md) — the autonomous-mode framework the supervisor plugs into.
147+
- `scripts/agent-supervisor` — inline design notes in the script's docstring.
148+
- `scripts/loop-toggle --help` — the intent/mode split (`--set-intent` / `--set-session-mode`).
149+
- Tracker item `WI-razub-duluf-nobun-rulit-dapam-jipal-dafud-nahob` — the full design discussion and resolution notes.

packages/hypergumbo/README.md

Lines changed: 1 addition & 0 deletions
@@ -226,6 +226,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for PR workflow (including fork-based wor
226226
- [docs/hypergumbo-spec.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/hypergumbo-spec.md) — Detailed specification
227227
- [docs/CITATIONS.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/CITATIONS.md) — Paper citations for embedding models
228228
- [docs/CACHE.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/CACHE.md) — Caching architecture
229+
- [docs/agent-supervisor.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/docs/agent-supervisor.md) — Operator guide for `scripts/agent-supervisor` (the tmux-session watchdog for autonomous agents)
229230
- [SECURITY.md](https://codeberg.org/iterabloom/hypergumbo/src/branch/dev/SECURITY.md) — Vulnerability reporting
230231
- [hypergumbo-tracker README](packages/hypergumbo-tracker/README.md) — Standalone tracker for AI agent governance
231232
