Commit d4edb27

Smith authored and committed
feat: Introduce Minimum Viable Change Ladder and enhance review processes
- Added Minimum Viable Change Ladder to core portability matrix and relevant documentation to promote simplicity in code changes.
- Updated generation manifest to include cursor as a target host for plugin generation.
- Enhanced audit and review checklists to include checks for over-engineering and adherence to the Minimum Viable Change Ladder.
- Introduced new planning rules and settings for Claude Code to streamline project-level decision-making and complexity assessment.
- Created CLAUDE.md to define behavior and guidelines for Claude Code sessions, including structured plan generation and mandatory review processes.
- Implemented tests to ensure compliance with new guidelines and verify the integration of the Minimum Viable Change Ladder across various skills and plugins.
- Added new decision challenge and source grounding references to support high-risk decision-making and external claim verification in orchestration skills.
1 parent b6ba8a6 commit d4edb27

41 files changed

Lines changed: 694 additions & 39 deletions

.claude/rules/planning.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+---
+paths:
+  - "**/*"
+---
+
+# Planning Rules (ControlFlow x Claude Code) — Project-Level
+
+`CLAUDE.md` in the project root takes priority. This file is a supplement for sessions where CLAUDE.md might not load.
+
+## Complexity Tiers
+
+| Tier | Criteria | Review Required |
+|------|----------|-----------------|
+| TRIVIAL | 1-2 files, single concern, low blast radius | Skip (no plan artifact) |
+| SMALL | 3-5 files, one subsystem | controlflow-plan-audit |
+| MEDIUM | 6-14 files or multiple concerns | plan-audit + assumption-verifier |
+| LARGE | 15+ files OR high-risk any size | plan-audit + assumption-verifier + executability-verifier |
+
+**Override:** Any unresolved HIGH semantic risk → automatically LARGE.
+
+## Semantic Risk Review — 7 mandatory categories
+
+Every plan MUST include all 7: data_volume, performance, concurrency, access_control, migration_rollback, dependency, operability. Never skip — use `not_applicable` if irrelevant.
+
+## Skill Invocations
+
+| Tier | Command |
+|------|---------|
+| TRIVIAL | Skip |
+| SMALL | `/controlflow-claude-code:controlflow-plan-audit` |
+| MEDIUM | plan-audit + `/controlflow-claude-code:controlflow-assumption-verifier` |
+| LARGE | plan-audit + assumption-verifier + `/controlflow-claude-code:controlflow-executability-verifier` |
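
The tier table, the override rule, and the skill table above can be read as one small decision function. The following is a minimal sketch under stated assumptions: `classify_tier`, its parameters, and `REVIEW_SKILLS` are hypothetical names that do not appear in the commit, and the rules file does not say how a 1-2 file change with a non-low blast radius should be handled, so that case is not modelled.

```python
from enum import Enum


class Tier(Enum):
    TRIVIAL = "TRIVIAL"
    SMALL = "SMALL"
    MEDIUM = "MEDIUM"
    LARGE = "LARGE"


# Slash commands copied from the Skill Invocations table.
PLAN_AUDIT = "/controlflow-claude-code:controlflow-plan-audit"
ASSUMPTION = "/controlflow-claude-code:controlflow-assumption-verifier"
EXECUTABILITY = "/controlflow-claude-code:controlflow-executability-verifier"

# Review skills required before execution, per tier.
REVIEW_SKILLS = {
    Tier.TRIVIAL: [],
    Tier.SMALL: [PLAN_AUDIT],
    Tier.MEDIUM: [PLAN_AUDIT, ASSUMPTION],
    Tier.LARGE: [PLAN_AUDIT, ASSUMPTION, EXECUTABILITY],
}


def classify_tier(file_count, multiple_concerns=False, unresolved_high_risk=False):
    """Hypothetical helper: map change size and risk to a complexity tier."""
    # Override: an unresolved HIGH semantic risk is LARGE at any size.
    if unresolved_high_risk or file_count >= 15:
        return Tier.LARGE
    # Assumption: "multiple concerns" lifts any smaller change to MEDIUM.
    if file_count >= 6 or multiple_concerns:
        return Tier.MEDIUM
    if file_count >= 3:
        return Tier.SMALL
    return Tier.TRIVIAL
```

For example, `REVIEW_SKILLS[classify_tier(4)]` yields only the plan-audit command, while a one-file change with an unresolved HIGH risk yields all three verifiers.
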

.claude/settings.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "rules": [
+    ".claude/rules/planning.md"
+  ],
+  "env": {
+    "CONTROLFLOW_PLAN_DIR": "plans",
+    "CONTROLFLOW_SKILL_PREFIX": "/controlflow-claude-code"
+  }
+}
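
As a hedged illustration of how the `CONTROLFLOW_SKILL_PREFIX` value might be consumed, the sketch below joins it with a skill name. The helper `skill_command` is hypothetical, and the `<prefix>:<skill>` shape is inferred from the command strings in the planning rules, not from documented Claude Code behavior.

```python
import json

# Settings content as added by this commit.
SETTINGS_JSON = """
{
  "rules": [".claude/rules/planning.md"],
  "env": {
    "CONTROLFLOW_PLAN_DIR": "plans",
    "CONTROLFLOW_SKILL_PREFIX": "/controlflow-claude-code"
  }
}
"""


def skill_command(env, skill):
    """Hypothetical helper: build a slash command from the prefix and a skill name."""
    return f"{env['CONTROLFLOW_SKILL_PREFIX']}:{skill}"


env = json.loads(SETTINGS_JSON)["env"]
audit_command = skill_command(env, "controlflow-plan-audit")
```
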

.cursor/skills/controlflow-plan-audit/references/audit-checklist.md

Lines changed: 4 additions & 3 deletions
@@ -9,6 +9,7 @@ Review these areas before approving a plan:
 5. Are validation commands or verification mechanisms concrete enough?
 6. Does any destructive or migration-heavy phase have rollback or recovery guidance?
 7. Are there missing dependency assumptions, version assumptions, or unpinned external contracts?
-8. Would a fresh executor be blocked by ambiguity in the first 1-3 phases?
-9. Does the plan promise scope that the listed phases never actually implement?
-10. Do security, access control, or operability concerns deserve stronger gates than the plan currently shows?
+8. Did the plan apply the Minimum Viable Change Ladder before proposing a new abstraction, new dependency, or new generated surface?
+9. Would a fresh executor be blocked by ambiguity in the first 1-3 phases?
+10. Does the plan promise scope that the listed phases never actually implement?
+11. Do security, access control, or operability concerns deserve stronger gates than the plan currently shows?

.cursor/skills/controlflow-planning/references/llm-behavior-guidelines.md

Lines changed: 9 additions & 0 deletions
@@ -23,6 +23,14 @@ Portable ControlFlow-Codex guardrails for avoiding common LLM coding anti-patter
 
 - Build only the requested behavior.
 - Avoid one-use abstractions, speculative configurability, and defensive branches that cannot happen under current constraints.
+- Apply the Minimum Viable Change Ladder before adding code, phases, dependencies, or abstractions:
+  1. Does this need to exist for the accepted goal?
+  2. Can existing project behavior cover it?
+  3. Can the standard library or native platform cover it?
+  4. Can an already-installed dependency cover it?
+  5. Can one localized line or existing helper cover it?
+  6. Only then write the minimum new code.
+- Do not simplify away validation, data-loss prevention, security, accessibility, rollback, or explicitly requested behavior; record the ceiling or upgrade trigger when choosing a smaller implementation.
 - If the solution is much larger than the task warrants, stop and simplify before continuing.
 
 ### 3. Surgical Changes
@@ -43,6 +51,7 @@ Portable ControlFlow-Codex guardrails for avoiding common LLM coding anti-patter
 | ------- | --------------- |
 | Assume a missing requirement because the likely answer seems obvious | Ask when the answer changes scope, behavior, or file set; otherwise record the bounded assumption. |
 | Add abstraction because a future task might need it | Build the requested behavior only; record future options separately. |
+| Add a dependency or custom helper before checking existing options | Check existing project behavior, standard library, native platform, and already-installed dependency options first. |
 | Clean up adjacent code while editing nearby lines | Keep edits tied to task scope and report unrelated observations without changing them. |
 | Skip verification because a change is prompt-only or documentation-only | Run the smallest relevant check plus any workflow-required gate. |
 | Treat a narrow pass as proof of a broad claim | Match the command to the claim and state what remains unverified. |
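
The Minimum Viable Change Ladder added in the first hunk is an ordered walk in which the first rung that applies decides the outcome. One way to picture it, as a sketch only: the rung keys, the `choose_rung` helper, and the action wording below are hypothetical and are not part of the guideline file.

```python
# Rungs in ladder order; each key names a condition the reviewer has confirmed.
LADDER = [
    ("not_needed", "Skip it: not needed for the accepted goal"),
    ("existing_behavior", "Reuse existing project behavior"),
    ("stdlib_or_platform", "Use the standard library or native platform"),
    ("installed_dependency", "Use an already-installed dependency"),
    ("one_line_or_helper", "Use one localized line or an existing helper"),
]
FALLBACK = "Write the minimum new code"


def choose_rung(satisfied):
    """Return the action for the first rung whose condition is in `satisfied`."""
    for key, action in LADDER:
        if key in satisfied:
            return action
    # No cheaper rung applies, so new code is justified (rung 6).
    return FALLBACK
```

Because the loop follows ladder order, a change that could use either the standard library or an installed dependency resolves to the standard library.
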

.cursor/skills/controlflow-review/references/review-checklist.md

Lines changed: 6 additions & 0 deletions
@@ -16,6 +16,12 @@ Review in this order:
 - Ask for a split when a diff mixes unrelated behaviors, generated output, policy changes, and implementation edits.
 - If a large review cannot be split, review by file area and risk axis and state confidence limits.
 
+## Over-Engineering Pass
+
+- After correctness, security, data integrity, and scope checks, ask what can delete, inline, or replace with a standard library or native platform feature.
+- Flag one-use abstractions, speculative configuration, wrappers with no policy value, and new dependencies that an already-installed dependency or platform primitive covers.
+- Treat over-engineering as a maintainability signal. Block only when it creates real review, behavior, test, dependency, or operability risk; otherwise report it as a non-blocking simplification opportunity.
+
 ## Stop-the-Line Decision Points
 
 Halt for security, authorization, secret-handling, destructive-action, or data-integrity defects; failed core behavior or acceptance criteria; migration/schema/contract changes without rollback or compatibility evidence; or scope drift that prevents comparison to the approved plan.

CLAUDE.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
+# CLAUDE.md — ControlFlow for Claude Code
+
+## Overview
+
+`AGENTS.md` manages Copilot-agent contracts in VS Code (P.A.R.T. format, model routing, governance).
+This file `CLAUDE.md` controls Claude Code behavior for this repository.
+When both are active: follow the source matching your tool — AGENTS.md for Copilot agents, CLAUDE.md for Claude Code sessions.
+
+---
+
+## When to Generate a Plan
+
+Generate a structured plan **before** implementation when the task is MEDIUM/LARGE:
+
+| Tier | Criteria | Action |
+|------|----------|--------|
+| TRIVIAL | 1–2 files, single concern, low blast radius | No plan artifact (describe steps inline) |
+| SMALL | 3–5 files, one subsystem | Plan + plan-audit |
+| MEDIUM | 6–14 files or multiple concerns | Full plan + audit + assumption-verifier |
+| LARGE | 15+ files OR any high-impact risk | Full plan + all three verifiers |
+
+**Override rule:** if ANY unresolved semantic risk is both *applicable* and *HIGH impact*, treat as LARGE regardless of file count.
+
+---
+
+## Plan Generation Contract (v2)
+
+Save every non-trivial plan at: `plans/<task-slug>-plan.md` (kebab-case, under 4 words).
+
+### YAML Header (exact fields — no fence)
+
+```yaml
+Status: READY_FOR_EXECUTION | ABSTAIN | REPLAN_REQUIRED
+Agent: controlflow-planning
+Schema Version: 2.0.0
+Complexity Tier: TRIVIAL | SMALL | MEDIUM | LARGE
+Confidence: 0.0–1.0 (computed; below 0.87 auto-NEEDS_REVISION)
+Abstain: is_abstaining: false or [ true, reasons: [...] ]
+Summary: One paragraph describing task and approach
+```
+
+### Sections (in order — never skip or reorder)
+
+| # | Section | Mandatory for | Key requirements |
+|---|---------|---------------|-----------------|
+| 1 | Context & Analysis | ALL | Verified facts only; separate assumptions clearly with bounded scope statement |
+| 2 | Design Decisions | MEDIUM+ | Arch choices + rejected alternatives, boundary/integration points, constraints/trade-offs, temporal flow diagram (sequenceDiagram for non-trivial orchestration) |
+| 3 | Implementation Phases | ALL | Phase count: 3–10. Each phase: Objective, Executor Agent, Wave#, Dependencies, Files create/modify, Tests add/update, Acceptance Criteria (measurable), Quality Gates, Failure Expectations + mitigation |
+| 4 | Inter-Phase Contracts | MEDIUM+ | Deliverable format from upstream + exactly how downstream validates it |
+| 5 | Open Questions | ALL | Explicitly listed; if any could change scope → stop and ask user |
+| 6 | Risks | ALL | Table: Risk \| Impact \| Likelihood \| Mitigation |
+| 7 | Semantic Risk Review | ALL (all 7 rows) | data_volume, performance, concurrency, access_control, migration_rollback, dependency, operability — every row present, even if `not_applicable` with justification |
+| 8 | Architecture Visualization | MEDIUM+ | Flowchart TD DAG base; sequenceDiagram for non-trivial orchestration; each ≤30 lines source |
+| 9 | Success Criteria | ALL | Measurable system-level indicators tied back to phase acceptance criteria |
+| 10 | Handoff & Execution Notes | ALL | Target Agent, Prompt, execution order, parallelization opportunities, max parallel agents (default: 3) |
+
+**Lifecycle Sections** (append for SMALL+ only): Progress → Discoveries → Decision Log → Outcomes → Idempotence & Recovery. One sentence per entry, evidence-backed. Omit for draft/abandoned plans.
+
+### Non-negotiable rules
+
+- Steps in numbered prose — **no code blocks** inside plan document
+- Acceptance criteria MUST include at least one measurable condition referencing observable outcome (test pass, file produced, CI check clears)
+- Every phase declares exactly one `executor_agent` from: `CodeMapper-subagent`, `Researcher-subagent`, `CoreImplementer-subagent`, `UIImplementer-subagent`, `PlatformEngineer-subagent`, `TechnicalWriter-subagent`
+- All verification must be automatable — no manual testing steps
+- Quality Gates use standard values only: `tests_pass`, `lint_clean`, `schema_valid`, `safety_clear`, `human_approved_if_required`
+
+---
+
+## Embedded Audit Pipeline (v2)
+
+### Pre-flight Check (run once at top of Step 1)
+
+| Action | Command / Method | Why |
+|--------|-----------------|-----|
+| Unfinished plans | `ls plans/` — flag any non-APPROVED/non-REJECTED files | May interfere with current work |
+| Branch divergence | `git log --oneline -5 && git status --short` | Flag early as open question if dirty |
+| Dependency conflicts | Read manifest vs what plan proposes to modify | Version mismatches = risk items |
+
+If clean: log "No interference detected" and continue.
+
+### Step 1: Spec Capture & Fact Verification
+
+Confirm scope by anchoring to explicit requirements:
+- What user-visible behavior changes?
+- Which files, functions, tables, APIs are affected? (read relevant manifest or import chain)
+- What constraints exist — dependencies, coding standards, API contracts?
+
+If vague on any → **ask the user**. Do not infer scope from chat alone.
+
+### Step 2: Structural Validation (before adversarial review)
+
+1. Header keys present; Status is one of exactly `READY_FOR_EXECUTION`, `ABSTAIN`, or `REPLAN_REQUIRED`
+2. Sections 1–10 exist in order; lifecycle sections in exact specified order (if present)
+3. Section 7 has exactly seven categories — no more, no fewer
+4. All gate values are from the standard set of five
+5. Executor agents match ControlFlow sub-agent enum strings
+6. LARGE tier requires both flowchart + sequenceDiagram; each ≤30 lines
+
+If any check fails → `NEEDS_REVISION` immediately (structural layer).
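
Several of the Step 2 checks are plain set-membership tests, so they are easy to automate. The following is a minimal sketch, assuming a plan has already been parsed into a dictionary: the dictionary shape, the function names, and the `PASS` label are hypothetical, and the section-order and diagram checks (items 2 and 6) are left out.

```python
VALID_STATUS = {"READY_FOR_EXECUTION", "ABSTAIN", "REPLAN_REQUIRED"}
QUALITY_GATES = {
    "tests_pass", "lint_clean", "schema_valid",
    "safety_clear", "human_approved_if_required",
}
EXECUTOR_AGENTS = {
    "CodeMapper-subagent", "Researcher-subagent", "CoreImplementer-subagent",
    "UIImplementer-subagent", "PlatformEngineer-subagent", "TechnicalWriter-subagent",
}
RISK_CATEGORIES = [
    "data_volume", "performance", "concurrency", "access_control",
    "migration_rollback", "dependency", "operability",
]


def structural_findings(plan):
    """Return structural problems; an empty list means this layer passes."""
    findings = []
    if plan.get("status") not in VALID_STATUS:
        findings.append("status is not one of the allowed values")
    if sorted(plan.get("risk_categories", [])) != sorted(RISK_CATEGORIES):
        findings.append("Section 7 must list exactly the seven categories")
    for phase in plan.get("phases", []):
        if phase.get("executor_agent") not in EXECUTOR_AGENTS:
            findings.append(f"{phase.get('name')}: unknown executor agent")
        bad_gates = set(phase.get("quality_gates", [])) - QUALITY_GATES
        if bad_gates:
            findings.append(f"{phase.get('name')}: non-standard gates {sorted(bad_gates)}")
    return findings


def structural_verdict(plan):
    """Any finding at this layer means NEEDS_REVISION, as Step 2 requires."""
    return "NEEDS_REVISION" if structural_findings(plan) else "PASS"
```
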
+
+### Step 3: Adversarial Review — 10-Point Safety Checklist
+
+| # | Check | Verdict criterion |
+|---|-------|------------------|
+| 1 | All referenced files/paths real? | Verified in repo |
+| 2 | Clear objectives, no scope overlap between same-wave phases? | Each phase independent |
+| 3 | Acceptance criteria objectively testable? | Not vague ("handle errors", "make it secure") |
+| 4 | Verification commands concrete enough for fresh executor? | No guessing required |
+| 5 | Destructive/migration-heavy phase has rollback/recovery guidance? | HIGH blast radius → `human_approved_if_required`; MEDIUM → `safety_clear` |
+| 6 | Missing dependency assumptions, version constraints, unpinned APIs? | All pinned or flagged |
+| 7 | Data volume concerns documented (bulk ops, pagination)? | If applicable |
+| 8 | Concurrency surface: shared mutable state, race windows? | Ownership/ordering explicit; no "should be safe" hand-waving |
+| 9 | Fresh executor blocked by ambiguity in Phase 1–3? | Should execute Phase 1 without asking |
+| 10 | Security/operability (permissions, auth, deploy configs, monitoring)? | Stronger gates if needed |
+
+**Deep dimensions:** data_volume triggers → add discovery step; performance hot paths → name expected bottleneck + verification method; concurrency → explicit locking order map.
+
+### Step 4: Confidence Scoring & Verdict
+
+Score each checklist item as `confirmed` / `uncertain` / `refuted`.
+
+```
+confidence = confirmed_count / total_items_with_any_actionable_question
+Round to two decimal digits.
+```
+
+**Capping rules:**
+- If uncertain ≥ 2 → auto-cap at 0.85; insert research phases for those items
+- Any HIGH-impact row marked `open_question` → cap at 0.7 + add research spike phase
+- confirmed < 6 out of ≥10 total → NEEDS_REVISION (insufficient evidence)
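
The formula and the three capping rules can be worked through in a few lines. This is a sketch only: `score_confidence` is a hypothetical name, it reads "total items with any actionable question" as the number of items that received a score, and it applies only the rules listed in Step 4.

```python
def score_confidence(verdicts, high_impact_open_question=False):
    """Score a checklist.

    verdicts: one of 'confirmed' / 'uncertain' / 'refuted' per actionable item.
    Returns (confidence, needs_revision).
    """
    total = len(verdicts)
    if total == 0:
        raise ValueError("no actionable checklist items to score")
    confirmed = verdicts.count("confirmed")
    uncertain = verdicts.count("uncertain")

    confidence = round(confirmed / total, 2)
    # Two or more uncertain items cap the score at 0.85.
    if uncertain >= 2:
        confidence = min(confidence, 0.85)
    # A HIGH-impact risk row left as open_question caps it at 0.7.
    if high_impact_open_question:
        confidence = min(confidence, 0.7)

    # Fewer than 6 confirmed out of at least 10 items is insufficient evidence.
    needs_revision = total >= 10 and confirmed < 6
    return confidence, needs_revision
```

With 18 confirmed and 2 uncertain items the raw ratio is 0.9, which the first cap lowers to 0.85; with 5 confirmed and 5 refuted the score is 0.5 and the plan is flagged for revision.
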
+
+**Verdict:**
+
+| Condition | Verdict | Action |
+|-----------|---------|--------|
+| All checks pass, first-phase fully actionable, criteria measurable | `APPROVED` | Proceed to execution with handoff section |
+| Critical gap: ambiguous Phase 1, no rollback on destructive change, unverified paths, vague criteria | `NEEDS_REVISION` | Update plan sections listed by finding; re-audit until pass or escalation threshold breached |
+| Structural flaw in architecture; scope not deliverable as authored | `REJECTED` | Explain blockers; ask user for direction. Do NOT start coding. |
+
+### Step 5: Resolution & Handoff
+
+- **APPROVED** → embed handoff prompt into Section 10 of plan artifact
+- **NEEDS_REVISION** → list each finding with exact section reference; edit in-place; re-audit
+- **REJECTED/ESCALATED** → explain blockers, provide next concrete step to unblock. Pause.
+
+---
+
+## Semantic Risk Review — 7 Mandatory Categories
+
+Each plan MUST include all 7 categories exactly once:
+
+| Category | Applicability | Impact | Evidence Source | Disposition |
+|----------|---------------|--------|-----------------|-------------|
+| data_volume | applicable / not_applicable / uncertain | HIGH / MEDIUM / LOW / UNKNOWN | file, command, or repo evidence | resolved / open_question / research_phase_added / not_applicable |
+| performance | ... | ... | ... | ... |
+| concurrency | ... | ... | ... | ... |
+| access_control | ... | ... | ... | ... |
+| migration_rollback | ... | ... | ... | ... |
+| dependency | ... | ... | ... | ... |
+| operability | ... | ... | ... | ... |
+
+Never skip a category — if not applicable, set `not_applicable` + justification.
+
+---
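
The risk table lends itself to a row-level check plus the LARGE override from the planning section. The sketch below is illustrative: the row dictionaries and both function names are hypothetical, and treating every disposition other than `resolved` as "unresolved" is an assumption the file does not state.

```python
CATEGORIES = (
    "data_volume", "performance", "concurrency", "access_control",
    "migration_rollback", "dependency", "operability",
)
ALLOWED = {
    "applicability": {"applicable", "not_applicable", "uncertain"},
    "impact": {"HIGH", "MEDIUM", "LOW", "UNKNOWN"},
    "disposition": {"resolved", "open_question", "research_phase_added", "not_applicable"},
}


def check_risk_review(rows):
    """Return problems with a Section 7 table; an empty list means it is well formed."""
    problems = []
    seen = [row.get("category") for row in rows]
    # Every category must appear exactly once.
    for category in CATEGORIES:
        if seen.count(category) != 1:
            problems.append(f"{category}: expected 1 row, found {seen.count(category)}")
    for row in rows:
        if row.get("category") not in CATEGORIES:
            problems.append(f"unknown category: {row.get('category')}")
        for field, allowed in ALLOWED.items():
            if row.get(field) not in allowed:
                problems.append(f"{row.get('category')}: invalid {field}")
    return problems


def forces_large(rows):
    """Override: an applicable, HIGH-impact row that is not resolved forces LARGE."""
    return any(
        row.get("applicability") == "applicable"
        and row.get("impact") == "HIGH"
        and row.get("disposition") != "resolved"  # assumption, see lead-in
        for row in rows
    )
```
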
+
+## Tier-gated Review Pipeline
+
+Transition from planning → execution requires review per tier:
+
+| Tier | Skill (slash command) |
+|------|----------------------|
+| TRIVIAL | Skip |
+| SMALL | `/controlflow-claude-code:controlflow-plan-audit` |
+| MEDIUM | plan-audit + `/controlflow-claude-code:controlflow-assumption-verifier` |
+| LARGE | plan-audit + assumption-verifier + `/controlflow-claude-code:controlflow-executability-verifier` |
+
+---
+
+## Workflow Entry Points — Available Skills
+
+Located in `plugins/controlflow-claude-code/skills/`:
+
+| Skill | Purpose |
+|-------|---------|
+| `controlflow-spec` | Requirements capture before planning (when task is ambiguous) |
+| `controlflow-planning` | Plan generation in ControlFlow format |
+| `controlflow-plan-audit` | Pre-execution plan audit |
+| `controlflow-assumption-verifier` | Mirage detection — assumption verification |
+| `controlflow-executability-verifier` | Cold-start simulation for LARGE plans |
+| `controlflow-strict-workflow` | Full pipeline (plan → audit → execute → review) |
+| `controlflow-orchestration` | Approved plan execution by phase |
+| `controlflow-review` | Post-implementation code review |
+| `controlflow-memory-hygiene` | Memory cleanup for long sessions |
+| `controlflow-router` | Entry point dispatcher (usually inline in CLAUDE.md) |
+
+---
+
+## Artifact Paths
+
+```
+plans/<task-slug>-plan.md # main plan
+plans/artifacts/<task-slug>/plan-audit-report.md # audit report
+plans/artifacts/<task-slug>/assumption-verifier.md # assumption verification
+plans/artifacts/<task-slug>/executability-verifier.md # executability verification
+plans/artifacts/<task-slug>/research-packet.md # research packet (if applicable)
+```
+
+---
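
The artifact layout follows mechanically from the task slug, so a small helper can derive all five paths and reject slugs that break the naming rule. This is a hedged sketch: `artifact_paths` and its dictionary keys are hypothetical, and "under 4 words" is read here as at most three hyphen-separated words.

```python
import re
from pathlib import PurePosixPath

# Lowercase kebab-case: words of letters/digits joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def artifact_paths(task_slug, plan_dir="plans"):
    """Return the plan and artifact paths for a task slug."""
    if not SLUG_RE.fullmatch(task_slug):
        raise ValueError(f"slug is not kebab-case: {task_slug!r}")
    if len(task_slug.split("-")) >= 4:
        raise ValueError(f"slug must be under 4 words: {task_slug!r}")
    root = PurePosixPath(plan_dir)
    artifacts = root / "artifacts" / task_slug
    return {
        "plan": str(root / f"{task_slug}-plan.md"),
        "plan_audit_report": str(artifacts / "plan-audit-report.md"),
        "assumption_verifier": str(artifacts / "assumption-verifier.md"),
        "executability_verifier": str(artifacts / "executability-verifier.md"),
        "research_packet": str(artifacts / "research-packet.md"),
    }
```
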
+
+## Reference Files
+
+- `plugins/controlflow-shared-source/skills/` — **Source of truth** for all skill distributions (generates claude-code, codex, cursor plugins)
+- `schemas/planner.plan.schema.json` — JSON Schema for plan validation (draft 2020-12)
+- `schemas/runtime-policy.schema.json` — Runtime execution policy schema
+- `plans/templates/plan-document-template.md` — Full plan document template
+- `.github/workflows/ci.yml` — CI pipeline runs `evals` suite on push
+
+---
+
+## Changes from v1 → v2
+
+| Change | Reason |
+|--------|--------|
+| Merged old Steps 4→5 (adversarial review + risk scoring) into unified pipeline with clear sub-steps | Reduced cognitive load; structural validation now a distinct layer before content audit |
+| Removed redundant "Plan Quality Standards" section (11 standards) and "Anti-Rationalization Checklist" | Overlap with checklist items; the 10-point safety check covers all 11 + anti-rationalization concerns |
+| Moved quality gate enum to inline rules table instead of separate section | One source of truth for gate/agent values |
+| Confidence scoring formula simplified (was embedded in prose, now explicit) | Transparency: users see exactly how the number is computed |
+| Pre-flight check extracted from Step 1 into its own table | Runs once before any planning; avoids token waste on empty reports |

evals/package.json

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
   "description": "Structural validation harness for Copilot ControlFlow agent system",
   "type": "module",
   "scripts": {
-    "test": "node validate.mjs && node tests/cursor-rules.test.mjs && node tests/prompt-behavior-contract.test.mjs && node tests/orchestration-handoff-contract.test.mjs && node tests/drift-detection.test.mjs && node tests/notes-md-drift.test.mjs && node tests/archive-script.test.mjs && node tests/fingerprint.test.mjs && node tests/report-health.test.mjs && node tests/skill-discoverability.test.mjs && node tests/capability-matrix.test.mjs",
+    "test": "node validate.mjs && node tests/cursor-rules.test.mjs && node tests/prompt-behavior-contract.test.mjs && node tests/orchestration-handoff-contract.test.mjs && node tests/drift-detection.test.mjs && node tests/notes-md-drift.test.mjs && node tests/archive-script.test.mjs && node tests/fingerprint.test.mjs && node tests/report-health.test.mjs && node tests/skill-discoverability.test.mjs && node tests/ponytail-adaptation.test.mjs && node tests/capability-matrix.test.mjs",
     "capability-matrix": "node capability-matrix.mjs",
     "archive:dry": "node archive-completed-plans.mjs",
     "archive:apply": "node archive-completed-plans.mjs --apply",
