Autonomous Agent Execution Pipeline

Purpose: Define the policy, context, and guardrails for running autonomous coding agents (Claude Code or similar) against this project's GitHub issues while the maintainer is AFK.

Companion to: STRATEGIC-ROADMAP-2026-05-29.md (the what; this is the how, with what guardrails).

OpenClaw operating layer: OPENCLAW-FORGE-PROTOCOL.md defines how Vision, Gilfoyle, and Heimdall apply this policy. This project has no UI, so Saga is not part of the default loop.

Adopted: 2026-05-29

1. Operating Principles

Agents work in branches, never on main.
Every PR requires independent verification before merge. Vision may merge verified PRs when checks and review triage are green.
Agents declare their scope explicitly and stay inside it.
The canonical validation gate (§5) must pass before any PR is opened. Failing gate → no PR, just a WORKING-NOTES.md on the branch + comment on the issue.
Automated review tools such as CodeRabbit provide review signal only. They do not approve or merge; Vision uses them as review signal before merging.
Forbidden territory (§2) is non-negotiable. Any drift triggers a hard stop.
Recovery is always stop and post a comment, never silently expand scope.

The goal is to keep the forge moving while preserving explicit stop points for money, secrets, external communication, and unresolved architectural judgment.

2. Forbidden Territory (hard stop)

Autonomous agents may NOT modify the following without explicit Vision approval in the issue comments first:

Path / Concern	Reason
Any tool name, parameter, or return shape	Public API surface; semver-significant
`schema.sql` and migrations	Index schema; rebuilds existing user caches
`.github/workflows/` (any workflow)	CI/CD and supply chain
`.github/workflows/release.yml` specifically	Release path; Trusted Publishing config
`pyproject.toml` `[project]` (anything other than `version`)	Identity, dependencies, classifiers
Major dependency bumps (anything ≥1 major)	Compatibility risk
`.planning/POSITIONING.md`	Load-bearing brand asset
`README.md` hero section (above the first install code block)	Load-bearing brand asset
`LICENSE`	Permanent commitment (MIT, always free)
`CHANGELOG.md` (creating entries is fine; rewriting history is not)	Release history
`SECURITY.md`	Trust posture; requires deliberate review
Existing tests (deletion or weakening assertions)	Regression cover
`.planning/ROADMAP.md` historical phase records	Archival history
`AGENT-EXECUTION-PIPELINE.md`, `OPENCLAW-FORGE-PROTOCOL.md`, and `STRATEGIC-ROADMAP-2026-05-29.md`	Governing policy and strategy docs

If an agent's task appears to require touching any of these:

Stop work.
Post a comment on the issue explaining the conflict.
Tag with supervisor-review.
Wait for guidance.

3. Issue Structure (required for every agent-targetable issue)

Every issue intended for an autonomous agent must contain these sections, in this order:

Section	Purpose	Required content
Title	Routability	`[v0.X.Y] <scope> — <verb> <thing>` e.g., `[v0.3.0] cache — add zstd codec layer`
Context	Self-containment	Links to: this pipeline doc, the strategic roadmap, any specific ADR or `.planning/phases/0X-*` directory, prior related issues
Goal	Single sentence	What outcome counts as success
Acceptance criteria	Testable definition of done	Checkbox list per §4
Scope boundaries	Prevents creep	"In scope:" and "Out of scope:" subsections
Forbidden-territory reminder	Belt and suspenders	Repeat the §2 items relevant to this issue
Validation commands	Pre-PR gate	The exact canonical commands per §5
PR template	What the PR description must include	§6 checklist
Recovery	What to do if blocked	Pointer to §8
Effort estimate	Sanity check	Rough hours; agent should bail and escalate if work exceeds 2×

An issue missing any of these is not agent-ready. The pre-flight checklist (§10) gates this.

4. Acceptance-Criteria Patterns

Each criterion must be: testable, atomic, achievable without touching forbidden territory, and sized so a competent dev could verify it in <5 minutes.

Good examples:

"uv run pytest tests/cache/test_codec.py -q passes with at least 4 new tests covering: codec round-trip for 'none', codec round-trip for 'zstd', codec round-trip for 'zstd-dict-v1', and graceful read of pre-existing compression='none' rows."
"After cherry-picking this branch, python -c 'from mcp_server_python_docs.cache.codec import list_supported; print(list_supported())' prints exactly ['none', 'zstd', 'zstd-dict-v1']."
"grep -rn 'yaml.load(' src/ tests/ returns zero hits. grep -rn 'yaml.safe_load(' src/ returns at least the expected call sites in server.py and ingestion/sphinx_json.py."
"README.md ## Tools section lists exactly six rows in the table, including compare_versions, and the row order matches the @mcp.tool declaration order in src/mcp_server_python_docs/server.py."

Bad examples (do not allow):

"Improve cache performance." — not testable
"Make it production-ready." — not specific
"Refactor for clarity." — invites scope creep
"Add tests." — what tests, asserting what?

5. Canonical Validation Gate

Must pass, in this order, before any PR is opened:

uv run ruff check src/ tests/
uv run pyright src/
uv run pytest --tb=short -q
uv run python-docs-mcp-server doctor

Additional gates for specific change types:

Any change touching the MCP wire protocol or tool registration:
```
uv run pytest tests/test_stdio_smoke.py -q
```

Any change to ingestion or storage:

uv run python-docs-mcp-server validate-corpus

Any change touching dependencies:

uv lock --check
uv pip compile --quiet pyproject.toml -o /tmp/requirements-check.txt

Failure handling: If any gate fails, the agent writes the full output into a file WORKING-NOTES.md at the branch root, commits it as agent: validation-gate-failed, posts a comment on the issue with a link to the failing commit, and stops. No PR is opened.

6. Branch, Commit, PR Conventions

Branch name: agent/{issue-number}-{kebab-case-summary} (e.g., agent/47-zstd-cache-codec)
Commit prefix: agent: followed by a short, imperative summary. Conventional-commit scopes optional but encouraged: agent: cache(codec): add zstd round-trip path
Atomic commits. One logical change per commit. No squash-and-force-push during agent work.
PR title matches the issue title verbatim
PR description must include:
- Closes #<issue-number> (or Refs # if intentionally not closing)
- Each acceptance criterion as a checked or unchecked box, with a one-line explanation if unchecked
- Output (or link to artifact) for the §5 validation gate
- CodeRabbit triage summary when CodeRabbit comments on the PR: blocking, follow-up, false positive, or pending/unavailable
- A short "Why this approach" paragraph if the design wasn't fully prescribed in the issue
- The §7 "Supervisor review" disclosure (which doubles as a forbidden-territory near-miss log when applicable; CODEOWNERS is the mechanical enforcement)
PR is opened against the milestone integration branch (e.g., release/v0.3.0) when one exists, otherwise main. Never self-merge.

7. Supervisor-Review Triggers (always pause)

The agent must open the PR but NOT request merge — and must add the supervisor-review label — if any of these are true:

Trigger	Why
Any forbidden-territory item (§2) was modified	By definition
Any existing test was deleted	Possible regression-cover loss
Diff exceeds 500 lines of source code (excluding generated and lockfiles)	Bigger than a single agent task should be
A new third-party runtime dependency was introduced	Trust-posture and footprint review
Any `pyproject.toml` field changed (other than `version` during a release issue)	Identity / metadata review
`.github/workflows/` was modified	CI/release-path review
The PR introduces network access at runtime	Violates principle 2.2 (offline-first)
The PR introduces async code in a previously-sync code path	Concurrency review
The agent's "Why this approach" paragraph cites a design choice not in the issue	Verify scope

For each trigger, the PR description must include a ## Why this triggered supervisor review section explaining what changed and why the agent believes it was necessary.

8. Recovery Procedures

If the agent encounters any of these conditions, stop work and post a comment on the issue:

A previously passing test now fails for an unclear reason
A change to forbidden territory appears necessary to complete the task
The acceptance criteria turn out to be ambiguous or contradictory
An upstream dependency is broken or unavailable
Work appears to exceed 2× the original effort estimate

The stop comment must contain:

What was attempted (1–3 sentences)
What failed or blocked (with error output if applicable)
The agent's best read on the path forward
An explicit "I am stopping pending guidance" line

Forbidden recovery moves:

Silently expanding scope
Trying alternative implementations not specified in the issue
Merging to main
Deleting tests to make others pass
Suppressing warnings or skipping tests as a "workaround"

9. Context Files Required Before Agents Run

These files must exist on main before the v0.3.0 issues are unleashed to autonomous agents.

File	Purpose	Status
`AGENTS.md`	Existing repo-conventions doc; should reference this pipeline	Needs update — add a one-paragraph link to this file
`STRATEGIC-ROADMAP-2026-05-29.md`	The what and why; mandatory reading	Exists
`AGENT-EXECUTION-PIPELINE.md` (this file)	The how, with what guardrails	Exists
`OPENCLAW-FORGE-PROTOCOL.md`	OpenClaw role split and MCP-specific execution loop	Exists
`.github/ISSUE_TEMPLATE/autonomous-agent.yml`	Issue template enforcing §3 structure	Create — see §11 sketch
`.github/PULL_REQUEST_TEMPLATE/agent.md`	PR template enforcing §6	Create — see §11 sketch
`.github/CODEOWNERS`	Requests owner attention on forbidden-territory paths	Create — see §11 sketch
`docs/architecture/TOKEN-STUDY-METHODOLOGY.md`	Methodology spec for the v0.3.0 first issue	Create as part of that issue spec
GitHub label: `supervisor-review`	Marks PRs paused at §7 triggers	Create
GitHub label: `agent-ready`	Confirms issue passed §10 pre-flight	Create
Branch protection on `main`	Allows Vision to merge after verification and green checks	Confirm enabled
Branch protection on `release/v0.3.0` (when created)	Same	Configure at branch creation

10. Pre-flight Checklist (run before unleashing agents on a milestone)

Run this checklist before pushing the first agent-ready issue to the queue.

All §9 context files exist on main.
The §5 canonical validation gate passes on main (clean baseline).
Each issue has been pre-flighted by Vision and labeled agent-ready.
Each issue includes its §3 sections in full.
The supervisor-review and agent-ready labels exist in the repo.
CODEOWNERS forces review on at least: pyproject.toml, .github/workflows/, LICENSE, README.md, .planning/POSITIONING.md, schema.sql.
Branch protection on main keeps deletion and force-push protection active without review deadlock.
At least one issue is small enough (≤4 hours) to serve as a confidence-building first run.

11. Templates (authoritative implementations)

The §9 templates live as checked-in files; this section just points at them so this doc never drifts from the implementations:

.github/ISSUE_TEMPLATE/autonomous-agent.yml
.github/PULL_REQUEST_TEMPLATE/agent.md
.github/CODEOWNERS

12. Per-Issue Context Files (v0.3.0 wave)

For the first wave of agent-targetable issues, Claude Code should include — as linked references in each issue — a dedicated .planning/agent-context/<issue-slug>.md file that captures the agent's working notes, design decisions to honor, and any test fixtures or code excerpts the agent will need to look at.

Each per-issue context file should contain:

The relevant excerpt from STRATEGIC-ROADMAP-2026-05-29.md (don't make the agent re-derive the goal).
Pointer to the existing code touch-points (file paths + symbols).
Existing test patterns to follow (one or two example tests from the same area).
Known pitfalls specific to this task.
Decision log placeholder for the agent to fill in.

The point is to give the agent everything it needs in one read, so it doesn't go fishing across the repo and pick up incorrect patterns from .planning/ archive material.

13. Suggested Agent-Targetable Issues for v0.3.0

Mapping the v0.3.0 deliverables to agent-friendliness, to help prioritize issue generation:

Deliverable	Agent-friendly?	Why	Recommended owner
Workstream J — zstd cache codec layer	Yes (high)	Well-bounded, testable, no API surface change	Agent
README + PyPI + glama.json refresh to 6-tool surface	Yes (high)	Mechanical, easily verified, low risk	Agent
PyYAML safe-loader audit	Yes (medium)	Simple grep + fix; need agent to surface findings before changing	Agent
ADR-001 (Source Adapters) draft	Yes (medium)	Writing task; needs clear template + style guide	Agent with strict template
ADR-006 (Serialization) draft	Yes (medium)	Same as ADR-001	Agent with strict template
Build-time supply-chain hardening (CPython SHA pin + SECURITY.md update)	Partial	Pinning is mechanical; SECURITY.md text needs judgment	Agent for pinning; Vision for SECURITY.md
30-minute TOON Python port audit	No	Requires subjective quality judgment	Human
Empirical token study	No	Methodology choices and corpus selection require judgment	Human (with agent scaffolding the harness)

The v0.3.0 issue wave should therefore lead with the high-confidence agent issues so the overnight run produces obvious wins, then escalate to the partial / Vision-judgment items the following day under Vision supervision.

Amendments

Append amendments below as ## Amendment YYYY-MM-DD sections. Do not edit historical content above this line; the locked sections are the authoritative current policy.

Amendment 2026-05-29 — OpenClaw Role Split

OpenClaw execution for this repo is governed by OPENCLAW-FORGE-PROTOCOL.md. The default loop is Vision → Gilfoyle → Heimdall → Vision/Aymen:

Vision owns issue pre-flight, agent-ready, review synthesis, branch protection, and pause/resume decisions.
Gilfoyle owns scoped implementation on exactly one issue at a time.
Heimdall owns independent verification, packaging/install smoke, security-sensitive checks, and release-readiness checks.
CodeRabbit findings are mandatory review signal when present. Vision/Heimdall must triage them as blocking, follow-up, or false positive before verified.
Saga is not in the default loop because this MCP has no UI.
Pipeline Monitor remains disabled unless Aymen explicitly asks for assisted merge checks; no Vision-owned merge is allowed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autonomous Agent Execution Pipeline

1. Operating Principles

2. Forbidden Territory (hard stop)

3. Issue Structure (required for every agent-targetable issue)

4. Acceptance-Criteria Patterns

5. Canonical Validation Gate

6. Branch, Commit, PR Conventions

7. Supervisor-Review Triggers (always pause)

8. Recovery Procedures

9. Context Files Required Before Agents Run

10. Pre-flight Checklist (run before unleashing agents on a milestone)

11. Templates (authoritative implementations)

12. Per-Issue Context Files (v0.3.0 wave)

13. Suggested Agent-Targetable Issues for v0.3.0

Amendments

Amendment 2026-05-29 — OpenClaw Role Split

FilesExpand file tree

AGENT-EXECUTION-PIPELINE.md

Latest commit

History

AGENT-EXECUTION-PIPELINE.md

File metadata and controls

Autonomous Agent Execution Pipeline

1. Operating Principles

2. Forbidden Territory (hard stop)

3. Issue Structure (required for every agent-targetable issue)

4. Acceptance-Criteria Patterns

5. Canonical Validation Gate

6. Branch, Commit, PR Conventions

7. Supervisor-Review Triggers (always pause)

8. Recovery Procedures

9. Context Files Required Before Agents Run

10. Pre-flight Checklist (run before unleashing agents on a milestone)

11. Templates (authoritative implementations)

12. Per-Issue Context Files (v0.3.0 wave)

13. Suggested Agent-Targetable Issues for v0.3.0

Amendments

Amendment 2026-05-29 — OpenClaw Role Split