Purpose: Define the policy, context, and guardrails for running autonomous coding agents (Claude Code or similar) against this project's GitHub issues while the maintainer is AFK.
Companion to: STRATEGIC-ROADMAP-2026-05-29.md (the what; this is the how, with what guardrails).
OpenClaw operating layer: OPENCLAW-FORGE-PROTOCOL.md defines how Vision, Gilfoyle, and Heimdall apply this policy. This project has no UI, so Saga is not part of the default loop.
Adopted: 2026-05-29
- Agents work in branches, never on
main. - Every PR requires independent verification before merge. Vision may merge verified PRs when checks and review triage are green.
- Agents declare their scope explicitly and stay inside it.
- The canonical validation gate (§5) must pass before any PR is opened. Failing gate → no PR, just a
WORKING-NOTES.mdon the branch + comment on the issue. - Automated review tools such as CodeRabbit provide review signal only. They do not approve or merge; Vision uses them as review signal before merging.
- Forbidden territory (§2) is non-negotiable. Any drift triggers a hard stop.
- Recovery is always stop and post a comment, never silently expand scope.
The goal is to keep the forge moving while preserving explicit stop points for money, secrets, external communication, and unresolved architectural judgment.
Autonomous agents may NOT modify the following without explicit Vision approval in the issue comments first:
| Path / Concern | Reason |
|---|---|
| Any tool name, parameter, or return shape | Public API surface; semver-significant |
schema.sql and migrations |
Index schema; rebuilds existing user caches |
.github/workflows/ (any workflow) |
CI/CD and supply chain |
.github/workflows/release.yml specifically |
Release path; Trusted Publishing config |
pyproject.toml [project] (anything other than version) |
Identity, dependencies, classifiers |
| Major dependency bumps (anything ≥1 major) | Compatibility risk |
.planning/POSITIONING.md |
Load-bearing brand asset |
README.md hero section (above the first install code block) |
Load-bearing brand asset |
LICENSE |
Permanent commitment (MIT, always free) |
CHANGELOG.md (creating entries is fine; rewriting history is not) |
Release history |
SECURITY.md |
Trust posture; requires deliberate review |
| Existing tests (deletion or weakening assertions) | Regression cover |
.planning/ROADMAP.md historical phase records |
Archival history |
AGENT-EXECUTION-PIPELINE.md, OPENCLAW-FORGE-PROTOCOL.md, and STRATEGIC-ROADMAP-2026-05-29.md |
Governing policy and strategy docs |
If an agent's task appears to require touching any of these:
- Stop work.
- Post a comment on the issue explaining the conflict.
- Tag with
supervisor-review. - Wait for guidance.
Every issue intended for an autonomous agent must contain these sections, in this order:
| Section | Purpose | Required content |
|---|---|---|
| Title | Routability | [v0.X.Y] <scope> — <verb> <thing> e.g., [v0.3.0] cache — add zstd codec layer |
| Context | Self-containment | Links to: this pipeline doc, the strategic roadmap, any specific ADR or .planning/phases/0X-* directory, prior related issues |
| Goal | Single sentence | What outcome counts as success |
| Acceptance criteria | Testable definition of done | Checkbox list per §4 |
| Scope boundaries | Prevents creep | "In scope:" and "Out of scope:" subsections |
| Forbidden-territory reminder | Belt and suspenders | Repeat the §2 items relevant to this issue |
| Validation commands | Pre-PR gate | The exact canonical commands per §5 |
| PR template | What the PR description must include | §6 checklist |
| Recovery | What to do if blocked | Pointer to §8 |
| Effort estimate | Sanity check | Rough hours; agent should bail and escalate if work exceeds 2× |
An issue missing any of these is not agent-ready. The pre-flight checklist (§10) gates this.
Each criterion must be: testable, atomic, achievable without touching forbidden territory, and sized so a competent dev could verify it in <5 minutes.
Good examples:
- "
uv run pytest tests/cache/test_codec.py -qpasses with at least 4 new tests covering: codec round-trip for'none', codec round-trip for'zstd', codec round-trip for'zstd-dict-v1', and graceful read of pre-existingcompression='none'rows." - "After cherry-picking this branch,
python -c 'from mcp_server_python_docs.cache.codec import list_supported; print(list_supported())'prints exactly['none', 'zstd', 'zstd-dict-v1']." - "
grep -rn 'yaml.load(' src/ tests/returns zero hits.grep -rn 'yaml.safe_load(' src/returns at least the expected call sites inserver.pyandingestion/sphinx_json.py." - "README.md
## Toolssection lists exactly six rows in the table, includingcompare_versions, and the row order matches the@mcp.tooldeclaration order insrc/mcp_server_python_docs/server.py."
Bad examples (do not allow):
- "Improve cache performance." — not testable
- "Make it production-ready." — not specific
- "Refactor for clarity." — invites scope creep
- "Add tests." — what tests, asserting what?
Must pass, in this order, before any PR is opened:
uv run ruff check src/ tests/
uv run pyright src/
uv run pytest --tb=short -q
uv run python-docs-mcp-server doctorAdditional gates for specific change types:
- Any change touching the MCP wire protocol or tool registration:
uv run pytest tests/test_stdio_smoke.py -q
- Any change to ingestion or storage:
uv run python-docs-mcp-server validate-corpus
- Any change touching dependencies:
uv lock --check uv pip compile --quiet pyproject.toml -o /tmp/requirements-check.txt
Failure handling: If any gate fails, the agent writes the full output into a file WORKING-NOTES.md at the branch root, commits it as agent: validation-gate-failed, posts a comment on the issue with a link to the failing commit, and stops. No PR is opened.
- Branch name:
agent/{issue-number}-{kebab-case-summary}(e.g.,agent/47-zstd-cache-codec) - Commit prefix:
agent:followed by a short, imperative summary. Conventional-commit scopes optional but encouraged:agent: cache(codec): add zstd round-trip path - Atomic commits. One logical change per commit. No squash-and-force-push during agent work.
- PR title matches the issue title verbatim
- PR description must include:
Closes #<issue-number>(orRefs #if intentionally not closing)- Each acceptance criterion as a checked or unchecked box, with a one-line explanation if unchecked
- Output (or link to artifact) for the §5 validation gate
- CodeRabbit triage summary when CodeRabbit comments on the PR: blocking, follow-up, false positive, or pending/unavailable
- A short "Why this approach" paragraph if the design wasn't fully prescribed in the issue
- The §7 "Supervisor review" disclosure (which doubles as a forbidden-territory near-miss log when applicable; CODEOWNERS is the mechanical enforcement)
- PR is opened against the milestone integration branch (e.g.,
release/v0.3.0) when one exists, otherwisemain. Never self-merge.
The agent must open the PR but NOT request merge — and must add the supervisor-review label — if any of these are true:
| Trigger | Why |
|---|---|
| Any forbidden-territory item (§2) was modified | By definition |
| Any existing test was deleted | Possible regression-cover loss |
| Diff exceeds 500 lines of source code (excluding generated and lockfiles) | Bigger than a single agent task should be |
| A new third-party runtime dependency was introduced | Trust-posture and footprint review |
Any pyproject.toml field changed (other than version during a release issue) |
Identity / metadata review |
.github/workflows/ was modified |
CI/release-path review |
| The PR introduces network access at runtime | Violates principle 2.2 (offline-first) |
| The PR introduces async code in a previously-sync code path | Concurrency review |
| The agent's "Why this approach" paragraph cites a design choice not in the issue | Verify scope |
For each trigger, the PR description must include a ## Why this triggered supervisor review section explaining what changed and why the agent believes it was necessary.
If the agent encounters any of these conditions, stop work and post a comment on the issue:
- A previously passing test now fails for an unclear reason
- A change to forbidden territory appears necessary to complete the task
- The acceptance criteria turn out to be ambiguous or contradictory
- An upstream dependency is broken or unavailable
- Work appears to exceed 2× the original effort estimate
The stop comment must contain:
- What was attempted (1–3 sentences)
- What failed or blocked (with error output if applicable)
- The agent's best read on the path forward
- An explicit "I am stopping pending guidance" line
Forbidden recovery moves:
- Silently expanding scope
- Trying alternative implementations not specified in the issue
- Merging to
main - Deleting tests to make others pass
- Suppressing warnings or skipping tests as a "workaround"
These files must exist on main before the v0.3.0 issues are unleashed to autonomous agents.
| File | Purpose | Status |
|---|---|---|
AGENTS.md |
Existing repo-conventions doc; should reference this pipeline | Needs update — add a one-paragraph link to this file |
STRATEGIC-ROADMAP-2026-05-29.md |
The what and why; mandatory reading | Exists |
AGENT-EXECUTION-PIPELINE.md (this file) |
The how, with what guardrails | Exists |
OPENCLAW-FORGE-PROTOCOL.md |
OpenClaw role split and MCP-specific execution loop | Exists |
.github/ISSUE_TEMPLATE/autonomous-agent.yml |
Issue template enforcing §3 structure | Create — see §11 sketch |
.github/PULL_REQUEST_TEMPLATE/agent.md |
PR template enforcing §6 | Create — see §11 sketch |
.github/CODEOWNERS |
Requests owner attention on forbidden-territory paths | Create — see §11 sketch |
docs/architecture/TOKEN-STUDY-METHODOLOGY.md |
Methodology spec for the v0.3.0 first issue | Create as part of that issue spec |
GitHub label: supervisor-review |
Marks PRs paused at §7 triggers | Create |
GitHub label: agent-ready |
Confirms issue passed §10 pre-flight | Create |
Branch protection on main |
Allows Vision to merge after verification and green checks | Confirm enabled |
Branch protection on release/v0.3.0 (when created) |
Same | Configure at branch creation |
Run this checklist before pushing the first agent-ready issue to the queue.
- All §9 context files exist on
main. - The §5 canonical validation gate passes on
main(clean baseline). - Each issue has been pre-flighted by Vision and labeled
agent-ready. - Each issue includes its §3 sections in full.
- The
supervisor-reviewandagent-readylabels exist in the repo. - CODEOWNERS forces review on at least:
pyproject.toml,.github/workflows/,LICENSE,README.md,.planning/POSITIONING.md,schema.sql. - Branch protection on
mainkeeps deletion and force-push protection active without review deadlock. - At least one issue is small enough (≤4 hours) to serve as a confidence-building first run.
The §9 templates live as checked-in files; this section just points at them so this doc never drifts from the implementations:
.github/ISSUE_TEMPLATE/autonomous-agent.yml.github/PULL_REQUEST_TEMPLATE/agent.md.github/CODEOWNERS
For the first wave of agent-targetable issues, Claude Code should include — as linked references in each issue — a dedicated .planning/agent-context/<issue-slug>.md file that captures the agent's working notes, design decisions to honor, and any test fixtures or code excerpts the agent will need to look at.
Each per-issue context file should contain:
- The relevant excerpt from
STRATEGIC-ROADMAP-2026-05-29.md(don't make the agent re-derive the goal). - Pointer to the existing code touch-points (file paths + symbols).
- Existing test patterns to follow (one or two example tests from the same area).
- Known pitfalls specific to this task.
- Decision log placeholder for the agent to fill in.
The point is to give the agent everything it needs in one read, so it doesn't go fishing across the repo and pick up incorrect patterns from .planning/ archive material.
Mapping the v0.3.0 deliverables to agent-friendliness, to help prioritize issue generation:
| Deliverable | Agent-friendly? | Why | Recommended owner |
|---|---|---|---|
| Workstream J — zstd cache codec layer | Yes (high) | Well-bounded, testable, no API surface change | Agent |
| README + PyPI + glama.json refresh to 6-tool surface | Yes (high) | Mechanical, easily verified, low risk | Agent |
| PyYAML safe-loader audit | Yes (medium) | Simple grep + fix; need agent to surface findings before changing | Agent |
| ADR-001 (Source Adapters) draft | Yes (medium) | Writing task; needs clear template + style guide | Agent with strict template |
| ADR-006 (Serialization) draft | Yes (medium) | Same as ADR-001 | Agent with strict template |
| Build-time supply-chain hardening (CPython SHA pin + SECURITY.md update) | Partial | Pinning is mechanical; SECURITY.md text needs judgment | Agent for pinning; Vision for SECURITY.md |
| 30-minute TOON Python port audit | No | Requires subjective quality judgment | Human |
| Empirical token study | No | Methodology choices and corpus selection require judgment | Human (with agent scaffolding the harness) |
The v0.3.0 issue wave should therefore lead with the high-confidence agent issues so the overnight run produces obvious wins, then escalate to the partial / Vision-judgment items the following day under Vision supervision.
Append amendments below as ## Amendment YYYY-MM-DD sections. Do not edit historical content above this line; the locked sections are the authoritative current policy.
OpenClaw execution for this repo is governed by OPENCLAW-FORGE-PROTOCOL.md.
The default loop is Vision → Gilfoyle → Heimdall → Vision/Aymen:
- Vision owns issue pre-flight,
agent-ready, review synthesis, branch protection, and pause/resume decisions. - Gilfoyle owns scoped implementation on exactly one issue at a time.
- Heimdall owns independent verification, packaging/install smoke, security-sensitive checks, and release-readiness checks.
- CodeRabbit findings are mandatory review signal when present. Vision/Heimdall must triage them as blocking, follow-up, or false positive before
verified. - Saga is not in the default loop because this MCP has no UI.
- Pipeline Monitor remains disabled unless Aymen explicitly asks for assisted merge checks; no Vision-owned merge is allowed.