| 1 | +# CLAUDE.md — ControlFlow for Claude Code |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +`AGENTS.md` manages Copilot-agent contracts in VS Code (P.A.R.T. format, model routing, governance). |
| 6 | +This file, `CLAUDE.md`, controls Claude Code behavior for this repository. |
| 7 | +When both are active: follow the source matching your tool — AGENTS.md for Copilot agents, CLAUDE.md for Claude Code sessions. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## When to Generate a Plan |
| 12 | + |
| 13 | +Generate a structured plan **before** implementation when the task is SMALL or larger; TRIVIAL tasks need no plan artifact: |
| 14 | + |
| 15 | +| Tier | Criteria | Action | |
| 16 | +|------|----------|--------| |
| 17 | +| TRIVIAL | 1–2 files, single concern, low blast radius | No plan artifact (describe steps inline) | |
| 18 | +| SMALL | 3–5 files, one subsystem | Plan + plan-audit | |
| 19 | +| MEDIUM | 6–14 files or multiple concerns | Full plan + audit + assumption-verifier | |
| 20 | +| LARGE | 15+ files OR any high-impact risk | Full plan + all three verifiers | |
| 21 | + |
| 22 | +**Override rule:** if ANY unresolved semantic risk is both *applicable* and *HIGH impact*, treat as LARGE regardless of file count. |
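
The tier table and override rule can be read as a small decision function. The sketch below is only one possible reading: `classify_tier`, its parameter names, and the `Tier` enum are hypothetical, and "multiple concerns" and "high-impact risk" are reduced to boolean flags that the caller must determine.

```python
from enum import Enum


class Tier(Enum):
    TRIVIAL = "TRIVIAL"
    SMALL = "SMALL"
    MEDIUM = "MEDIUM"
    LARGE = "LARGE"


def classify_tier(file_count: int,
                  multiple_concerns: bool = False,
                  high_impact_risk: bool = False) -> Tier:
    """Map a task onto a complexity tier (hypothetical helper).

    Assumes the caller has already decided whether the task spans
    multiple concerns and whether any unresolved semantic risk is
    both applicable and HIGH impact.
    """
    # Override rule: an applicable HIGH-impact risk forces LARGE,
    # regardless of file count.
    if high_impact_risk or file_count >= 15:
        return Tier.LARGE
    # MEDIUM: 6-14 files or multiple concerns.
    if file_count >= 6 or multiple_concerns:
        return Tier.MEDIUM
    # SMALL: 3-5 files, one subsystem.
    if file_count >= 3:
        return Tier.SMALL
    # TRIVIAL: 1-2 files, single concern.
    return Tier.TRIVIAL
```

For example, `classify_tier(2, high_impact_risk=True)` yields `Tier.LARGE` even though only two files are touched.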
| 23 | + |
| 24 | +--- |
| 25 | + |
| 26 | +## Plan Generation Contract (v2) |
| 27 | + |
| 28 | +Save every non-trivial plan at: `plans/<task-slug>-plan.md` (kebab-case, under 4 words). |
| 29 | + |
| 30 | +### YAML Header (exact fields — no fence) |
| 31 | + |
| 32 | +```yaml |
| 33 | +Status: READY_FOR_EXECUTION | ABSTAIN | REPLAN_REQUIRED |
| 34 | +Agent: controlflow-planning |
| 35 | +Schema Version: 2.0.0 |
| 36 | +Complexity Tier: TRIVIAL | SMALL | MEDIUM | LARGE |
| 37 | +Confidence: 0.0–1.0 (computed; below 0.87 auto-NEEDS_REVISION) |
| 38 | +Abstain: is_abstaining: false or [ true, reasons: [...] ] |
| 39 | +Summary: One paragraph describing task and approach |
| 40 | +``` |
| 41 | + |
| 42 | +### Sections (in order — never skip or reorder) |
| 43 | + |
| 44 | +| # | Section | Mandatory for | Key requirements | |
| 45 | +|---|---------|---------------|-----------------| |
| 46 | +| 1 | Context & Analysis | ALL | Verified facts only; separate assumptions clearly with bounded scope statement | |
| 47 | +| 2 | Design Decisions | MEDIUM+ | Arch choices + rejected alternatives, boundary/integration points, constraints/trade-offs, temporal flow diagram (sequenceDiagram for non-trivial orchestration) | |
| 48 | +| 3 | Implementation Phases | ALL | Phase count: 3–10. Each phase: Objective, Executor Agent, Wave#, Dependencies, Files create/modify, Tests add/update, Acceptance Criteria (measurable), Quality Gates, Failure Expectations + mitigation | |
| 49 | +| 4 | Inter-Phase Contracts | MEDIUM+ | Deliverable format from upstream + exactly how downstream validates it | |
| 50 | +| 5 | Open Questions | ALL | Explicitly listed; if any could change scope → stop and ask user | |
| 51 | +| 6 | Risks | ALL | Table: Risk \| Impact \| Likelihood \| Mitigation | |
| 52 | +| 7 | Semantic Risk Review | ALL (all 7 rows) | data_volume, performance, concurrency, access_control, migration_rollback, dependency, operability — every row present, even if `not_applicable` with justification | |
| 53 | +| 8 | Architecture Visualization | MEDIUM+ | Flowchart TD DAG base; sequenceDiagram for non-trivial orchestration; each ≤30 lines source | |
| 54 | +| 9 | Success Criteria | ALL | Measurable system-level indicators tied back to phase acceptance criteria | |
| 55 | +| 10 | Handoff & Execution Notes | ALL | Target Agent, Prompt, execution order, parallelization opportunities, max parallel agents (default: 3) | |
| 56 | + |
| 57 | +**Lifecycle Sections** (append for SMALL+ only): Progress → Discoveries → Decision Log → Outcomes → Idempotence & Recovery. One sentence per entry, evidence-backed. Omit for draft/abandoned plans. |
| 58 | + |
| 59 | +### Non-negotiable rules |
| 60 | + |
| 61 | +- Steps in numbered prose — **no code blocks** inside plan document |
| 62 | +- Acceptance criteria MUST include at least one measurable condition referencing observable outcome (test pass, file produced, CI check clears) |
| 63 | +- Every phase declares exactly one `executor_agent` from: `CodeMapper-subagent`, `Researcher-subagent`, `CoreImplementer-subagent`, `UIImplementer-subagent`, `PlatformEngineer-subagent`, `TechnicalWriter-subagent` |
| 64 | +- All verification must be automatable — no manual testing steps |
| 65 | +- Quality Gates use standard values only: `tests_pass`, `lint_clean`, `schema_valid`, `safety_clear`, `human_approved_if_required` |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## Embedded Audit Pipeline (v2) |
| 70 | + |
| 71 | +### Pre-flight Check (run once at top of Step 1) |
| 72 | + |
| 73 | +| Action | Command / Method | Why | |
| 74 | +|--------|-----------------|-----| |
| 75 | +| Unfinished plans | `ls plans/` — flag any non-APPROVED/non-REJECTED files | May interfere with current work | |
| 76 | +| Branch divergence | `git log --oneline -5 && git status --short` | Flag early as open question if dirty | |
| 77 | +| Dependency conflicts | Read manifest vs what plan proposes to modify | Version mismatches = risk items | |
| 78 | + |
| 79 | +If clean: log "No interference detected" and continue. |
| 80 | + |
| 81 | +### Step 1: Spec Capture & Fact Verification |
| 82 | + |
| 83 | +Confirm scope by anchoring to explicit requirements: |
| 84 | +- What user-visible behavior changes? |
| 85 | +- Which files, functions, tables, APIs are affected? (read relevant manifest or import chain) |
| 86 | +- What constraints exist — dependencies, coding standards, API contracts? |
| 87 | + |
| 88 | +If vague on any → **ask the user**. Do not infer scope from chat alone. |
| 89 | + |
| 90 | +### Step 2: Structural Validation (before adversarial review) |
| 91 | + |
| 92 | +1. Header keys present; Status is one of exactly `READY_FOR_EXECUTION`, `ABSTAIN`, or `REPLAN_REQUIRED` |
| 93 | +2. Sections 1–10 exist in order; lifecycle sections in exact specified order (if present) |
| 94 | +3. Section 7 has exactly seven categories — no more, no fewer |
| 95 | +4. All gate values are from the standard set of five |
| 96 | +5. Executor agents match ControlFlow sub-agent enum strings |
| 97 | +6. LARGE tier requires both flowchart + sequenceDiagram; each ≤30 lines |
| 98 | + |
| 99 | +If any check fails → `NEEDS_REVISION` immediately (structural layer). |
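
A minimal sketch of how checks 1, 3, 4, and 5 might be automated. The enum values come from this file; the `plan` dictionary shape (`status`, `risk_categories`, `phases`, `executor_agent`, `quality_gates`) and the function name `structural_findings` are assumptions made for illustration, not the format defined by `schemas/planner.plan.schema.json`.

```python
VALID_STATUSES = {"READY_FOR_EXECUTION", "ABSTAIN", "REPLAN_REQUIRED"}
QUALITY_GATES = {
    "tests_pass", "lint_clean", "schema_valid",
    "safety_clear", "human_approved_if_required",
}
EXECUTOR_AGENTS = {
    "CodeMapper-subagent", "Researcher-subagent", "CoreImplementer-subagent",
    "UIImplementer-subagent", "PlatformEngineer-subagent",
    "TechnicalWriter-subagent",
}
RISK_CATEGORIES = [
    "data_volume", "performance", "concurrency", "access_control",
    "migration_rollback", "dependency", "operability",
]


def structural_findings(plan: dict) -> list[str]:
    """Return structural problems; an empty list means the layer passes.

    The plan dict layout is hypothetical.
    """
    findings = []
    # Check 1: Status must be one of the three allowed values.
    if plan.get("status") not in VALID_STATUSES:
        findings.append(f"invalid status: {plan.get('status')!r}")
    # Check 3: exactly the seven risk categories, each exactly once.
    if sorted(plan.get("risk_categories", [])) != sorted(RISK_CATEGORIES):
        findings.append("section 7 must list each of the 7 categories once")
    for index, phase in enumerate(plan.get("phases", []), start=1):
        # Check 5: executor agent must be a known enum string.
        if phase.get("executor_agent") not in EXECUTOR_AGENTS:
            findings.append(f"phase {index}: unknown executor agent")
        # Check 4: quality gates must come from the standard set of five.
        unknown = set(phase.get("quality_gates", [])) - QUALITY_GATES
        if unknown:
            findings.append(f"phase {index}: non-standard gates {sorted(unknown)}")
    return findings
```

Any non-empty result would correspond to an immediate `NEEDS_REVISION` at the structural layer.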
| 100 | + |
| 101 | +### Step 3: Adversarial Review — 10-Point Safety Checklist |
| 102 | + |
| 103 | +| # | Check | Verdict criterion | |
| 104 | +|---|-------|------------------| |
| 105 | +| 1 | All referenced files/paths real? | Verified in repo | |
| 106 | +| 2 | Clear objectives, no scope overlap between same-wave phases? | Each phase independent | |
| 107 | +| 3 | Acceptance criteria objectively testable? | Not vague ("handle errors", "make it secure") | |
| 108 | +| 4 | Verification commands concrete enough for fresh executor? | No guessing required | |
| 109 | +| 5 | Destructive/migration-heavy phase has rollback/recovery guidance? | HIGH blast radius → `human_approved_if_required`; MEDIUM → `safety_clear` | |
| 110 | +| 6 | Missing dependency assumptions, version constraints, unpinned APIs? | All pinned or flagged | |
| 111 | +| 7 | Data volume concerns documented (bulk ops, pagination)? | If applicable | |
| 112 | +| 8 | Concurrency surface: shared mutable state, race windows? | Ownership/ordering explicit; no "should be safe" hand-waving | |
| 113 | +| 9 | Fresh executor blocked by ambiguity in Phase 1–3? | Should execute Phase 1 without asking | |
| 114 | +| 10 | Security/operability (permissions, auth, deploy configs, monitoring)? | Stronger gates if needed | |
| 115 | + |
| 116 | +**Deep dimensions:** data_volume triggers → add discovery step; performance hot paths → name expected bottleneck + verification method; concurrency → explicit locking order map. |
| 117 | + |
| 118 | +### Step 4: Confidence Scoring & Verdict |
| 119 | + |
| 120 | +Score each checklist item as `confirmed` / `uncertain` / `refuted`. |
| 121 | + |
| 122 | +``` |
| 123 | +confidence = confirmed_count / total_items_with_any_actionable_question |
| 124 | +Round to two decimal digits. |
| 125 | +``` |
| 126 | + |
| 127 | +**Capping rules:** |
| 128 | +- If uncertain ≥ 2 → auto-cap at 0.85; insert research phases for those items |
| 129 | +- Any HIGH-impact row marked `open_question` → cap at 0.7 + add research spike phase |
| 130 | +- confirmed < 6 out of ≥10 total → NEEDS_REVISION (insufficient evidence) |
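
The formula and capping rules can be sketched as follows. This is one interpretation under stated assumptions: every scored checklist item counts toward the denominator, the `high_impact_open_question` flag is supplied by the caller, and the 0.87 threshold is taken from the `Confidence` header field. The name `score_confidence` is hypothetical.

```python
def score_confidence(verdicts: list[str],
                     high_impact_open_question: bool = False) -> tuple[float, bool]:
    """Return (confidence, needs_revision) for a list of item verdicts.

    Each verdict is 'confirmed', 'uncertain', or 'refuted'.
    Assumption: all scored items count as actionable.
    """
    total = len(verdicts)
    if total == 0:
        raise ValueError("at least one checklist item is required")
    confirmed = verdicts.count("confirmed")
    uncertain = verdicts.count("uncertain")

    confidence = round(confirmed / total, 2)
    # Cap 1: two or more uncertain items limit confidence to 0.85.
    if uncertain >= 2:
        confidence = min(confidence, 0.85)
    # Cap 2: a HIGH-impact row left as open_question limits it to 0.7.
    if high_impact_open_question:
        confidence = min(confidence, 0.7)

    # Insufficient evidence, or below the 0.87 header threshold.
    needs_revision = (total >= 10 and confirmed < 6) or confidence < 0.87
    return confidence, needs_revision
```

Under this reading, nine confirmed items plus one uncertain item give `(0.9, False)`, while eight confirmed plus two uncertain give `(0.8, True)`.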
| 131 | + |
| 132 | +**Verdict:** |
| 133 | + |
| 134 | +| Condition | Verdict | Action | |
| 135 | +|-----------|---------|--------| |
| 136 | +| All checks pass, first-phase fully actionable, criteria measurable | `APPROVED` | Proceed to execution with handoff section | |
| 137 | +| Critical gap: ambiguous Phase 1, no rollback on destructive change, unverified paths, vague criteria | `NEEDS_REVISION` | Update plan sections listed by finding; re-audit until pass or escalation threshold breached | |
| 138 | +| Structural flaw in architecture; scope not deliverable as authored | `REJECTED` | Explain blockers; ask user for direction. Do NOT start coding. | |
| 139 | + |
| 140 | +### Step 5: Resolution & Handoff |
| 141 | + |
| 142 | +- **APPROVED** → embed handoff prompt into Section 10 of plan artifact |
| 143 | +- **NEEDS_REVISION** → list each finding with exact section reference; edit in-place; re-audit |
| 144 | +- **REJECTED/ESCALATED** → explain blockers, provide next concrete step to unblock. Pause. |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## Semantic Risk Review — 7 Mandatory Categories |
| 149 | + |
| 150 | +Each plan MUST include all 7 categories exactly once: |
| 151 | + |
| 152 | +| Category | Applicability | Impact | Evidence Source | Disposition | |
| 153 | +|----------|---------------|--------|-----------------|-------------| |
| 154 | +| data_volume | applicable / not_applicable / uncertain | HIGH / MEDIUM / LOW / UNKNOWN | file, command, or repo evidence | resolved / open_question / research_phase_added / not_applicable | |
| 155 | +| performance | ... | ... | ... | ... | |
| 156 | +| concurrency | ... | ... | ... | ... | |
| 157 | +| access_control | ... | ... | ... | ... | |
| 158 | +| migration_rollback | ... | ... | ... | ... | |
| 159 | +| dependency | ... | ... | ... | ... | |
| 160 | +| operability | ... | ... | ... | ... | |
| 161 | + |
| 162 | +Never skip a category — if not applicable, set `not_applicable` + justification. |
| 163 | + |
| 164 | +--- |
| 165 | + |
| 166 | +## Tier-gated Review Pipeline |
| 167 | + |
| 168 | +Transition from planning → execution requires review per tier: |
| 169 | + |
| 170 | +| Tier | Skill (slash command) | |
| 171 | +|------|----------------------| |
| 172 | +| TRIVIAL | Skip | |
| 173 | +| SMALL | `/controlflow-claude-code:controlflow-plan-audit` | |
| 174 | +| MEDIUM | plan-audit + `/controlflow-claude-code:controlflow-assumption-verifier` | |
| 175 | +| LARGE | plan-audit + assumption-verifier + `/controlflow-claude-code:controlflow-executability-verifier` | |
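
The tier-to-skill mapping is cumulative, which a lookup table makes explicit. In this sketch the slash-command prefix and skill names are copied from the table above; the names `REVIEW_SKILLS` and `review_commands` are hypothetical.

```python
PLUGIN = "controlflow-claude-code"

# Each tier adds one verifier on top of the tier below it.
REVIEW_SKILLS = {
    "TRIVIAL": [],
    "SMALL": ["controlflow-plan-audit"],
    "MEDIUM": ["controlflow-plan-audit", "controlflow-assumption-verifier"],
    "LARGE": [
        "controlflow-plan-audit",
        "controlflow-assumption-verifier",
        "controlflow-executability-verifier",
    ],
}


def review_commands(tier: str) -> list[str]:
    """Return the slash commands to run, in order, for a given tier."""
    return [f"/{PLUGIN}:{skill}" for skill in REVIEW_SKILLS[tier]]
```

An unknown tier raises `KeyError`, which seems preferable to silently skipping review.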
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Workflow Entry Points — Available Skills |
| 180 | + |
| 181 | +Located in `plugins/controlflow-claude-code/skills/`: |
| 182 | + |
| 183 | +| Skill | Purpose | |
| 184 | +|-------|---------| |
| 185 | +| `controlflow-spec` | Requirements capture before planning (when task is ambiguous) | |
| 186 | +| `controlflow-planning` | Plan generation in ControlFlow format | |
| 187 | +| `controlflow-plan-audit` | Pre-execution plan audit | |
| 188 | +| `controlflow-assumption-verifier` | Mirage detection — assumption verification | |
| 189 | +| `controlflow-executability-verifier` | Cold-start simulation for LARGE plans | |
| 190 | +| `controlflow-strict-workflow` | Full pipeline (plan → audit → execute → review) | |
| 191 | +| `controlflow-orchestration` | Approved plan execution by phase | |
| 192 | +| `controlflow-review` | Post-implementation code review | |
| 193 | +| `controlflow-memory-hygiene` | Memory cleanup for long sessions | |
| 194 | +| `controlflow-router` | Entry point dispatcher (usually inline in CLAUDE.md) | |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## Artifact Paths |
| 199 | + |
| 200 | +``` |
| 201 | +plans/<task-slug>-plan.md # main plan |
| 202 | +plans/artifacts/<task-slug>/plan-audit-report.md # audit report |
| 203 | +plans/artifacts/<task-slug>/assumption-verifier.md # assumption verification |
| 204 | +plans/artifacts/<task-slug>/executability-verifier.md # executability verification |
| 205 | +plans/artifacts/<task-slug>/research-packet.md # research packet (if applicable) |
| 206 | +``` |
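
The artifact layout can be derived from a single task slug. The helper below is a hypothetical sketch: `artifact_paths` is not part of ControlFlow, and the slug check encodes one reading of "kebab-case, under 4 words" (lowercase alphanumeric words joined by hyphens, at most three words).

```python
import re
from pathlib import PurePosixPath

_SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def artifact_paths(task_slug: str) -> dict[str, str]:
    """Build the plan and artifact paths for a task slug (hypothetical helper)."""
    # Assumed slug rule: kebab-case with fewer than 4 words.
    if not _SLUG_PATTERN.fullmatch(task_slug) or len(task_slug.split("-")) >= 4:
        raise ValueError(f"invalid task slug: {task_slug!r}")
    plans = PurePosixPath("plans")
    artifacts = plans / "artifacts" / task_slug
    return {
        "plan": str(plans / f"{task_slug}-plan.md"),
        "plan_audit": str(artifacts / "plan-audit-report.md"),
        "assumption_verifier": str(artifacts / "assumption-verifier.md"),
        "executability_verifier": str(artifacts / "executability-verifier.md"),
        "research_packet": str(artifacts / "research-packet.md"),
    }
```

For the slug `add-login`, the `plan` entry is `plans/add-login-plan.md` and the audit report lands under `plans/artifacts/add-login/`.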
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## Reference Files |
| 211 | + |
| 212 | +- `plugins/controlflow-shared-source/skills/` — **Source of truth** for all skill distributions (generates claude-code, codex, cursor plugins) |
| 213 | +- `schemas/planner.plan.schema.json` — JSON Schema for plan validation (draft 2020-12) |
| 214 | +- `schemas/runtime-policy.schema.json` — Runtime execution policy schema |
| 215 | +- `plans/templates/plan-document-template.md` — Full plan document template |
| 216 | +- `.github/workflows/ci.yml` — CI pipeline runs `evals` suite on push |
| 217 | + |
| 218 | +--- |
| 219 | + |
| 220 | +## Changes from v1 → v2 |
| 221 | + |
| 222 | +| Change | Reason | |
| 223 | +|--------|--------| |
| 224 | +| Merged old Steps 4→5 (adversarial review + risk scoring) into unified pipeline with clear sub-steps | Reduced cognitive load; structural validation now a distinct layer before content audit | |
| 225 | +| Removed redundant "Plan Quality Standards" section (11 standards) and "Anti-Rationalization Checklist" | Overlap with checklist items; the 10-point safety check covers all 11 + anti-rationalization concerns | |
| 226 | +| Moved quality gate enum into the inline "Non-negotiable rules" list instead of a separate section | One source of truth for gate/agent values |
| 227 | +| Confidence scoring formula simplified (was embedded in prose, now explicit) | Transparency: users see exactly how the number is computed | |
| 228 | +| Pre-flight check extracted from Step 1 into its own table | Runs once before any planning; avoids token waste on empty reports | |