Commit 1b776f5

alexeyv and claude authored
feat: add bmad-checkpoint-preview skill (#2145)
* feat: add bmad-checkpoint skill for guided human change review

  Copies the av-human-review experiment skill into BMAD-METHOD as bmad-checkpoint, following established multi-step skill conventions (SKILL.md → workflow.md → step chain). Registered in module-help.csv under 4-implementation phase.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: rename bmad-checkpoint to bmad-checkpoint-preview

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(checkpoint): inline workflow into SKILL.md and add global step rules

  Remove separate workflow.md; its content now lives directly in SKILL.md with merged frontmatter. Replace scattered standing rules with a structured Global Step Rules section (path:line format, front-load output, comm style).

* refactor(checkpoint): reference global step rules from SKILL.md in step-01

* refactor(checkpoint): deduplicate step rules against global step rules

  Steps 2–4 now reference Global Step Rules in SKILL.md instead of restating path:line format, front-load, and silence rules locally. Step-specific rules (concern-based org, design judgment, risk awareness, experiential testing) are preserved.

* fix(checkpoint): move main_config out of SKILL.md frontmatter

  SKILL.md frontmatter should only contain name and description. Hardcode the config path inline in the INITIALIZATION section.

* docs(checkpoint): update skill description and trigger phrases

  Rewrite description to reflect the skill's purpose as an LLM-assisted human-in-the-loop review. Add checkpoint trigger, drop stale triggers.

* fix(checkpoint): align trail format with global step rules and add token budget

  Use CWD-relative path:line in fallback trail (not markdown links), cap full-file reads at ~50k tokens, remove over-prompted empty-tree SHA.

  Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor(checkpoint): rewrite FIND THE CHANGE as numbered priority cascade

  Replace the ad-hoc change-finding logic with a clean 1-5 cascade modeled after quick-dev Intent Check: explicit argument, recent conversation, sprint tracking, current git state, ask. Extract spec/commit pairing into a separate ENRICH step that runs after any cascade level resolves. Add planning_artifacts to SKILL.md initialization.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): clarify review_mode and terse-commit instructions in step-01

  Replace opaque Review Mode table with explicit set-variable instructions. Scope terse commit message handling to bare-commit mode only.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): make review_mode a numbered cascade, not independent bullets

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): simplify change_type from table to one-liner

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): make link-to-source conditional on source existing

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): make surface area stats best-effort with baseline cascade

  Replace rigid with-spec/bare-commit split with a 4-level fallback: baseline_commit, merge-base, HEAD~1, skip. Omit metrics that cannot be computed rather than failing.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(checkpoint): extract fallback trail generation into generate-trail.md

  Reduce step-01 bloat by moving the conditional trail generation sub-routine into its own file, loaded only when review mode is not full-trail.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(checkpoint): add early-exit routing and wrap-up step

  Replace undefined "I've seen enough" exits with proper early-exit handling across steps 02-04. Extract wrap-up logic into dedicated step-05-wrapup.md. Fix step-02 menu text that incorrectly promised "code review" when step-03 does risk surfacing.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1aa0903 commit 1b776f5

8 files changed

Lines changed: 461 additions & 0 deletions

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
---
2+
name: bmad-checkpoint-preview
3+
description: 'LLM-assisted human-in-the-loop review. Make sense of a change, focus attention where it matters, test. Use when the user says "checkpoint", "human review", or "walk me through this change".'
4+
---
5+
6+
# Checkpoint Review Workflow
7+
8+
**Goal:** Guide a human through reviewing a change — from purpose and context into details.
9+
10+
You are assisting the user in reviewing a change.
11+
12+
## Global Step Rules (apply to every step)
13+
14+
- **Path:line format** — Every code reference must use CWD-relative `path:line` format (no leading `/`) so it is clickable in IDE-embedded terminals (e.g., `src/auth/middleware.ts:42`).
15+
- **Front-load then shut up** — Present the entire output for the current step in a single coherent message. Do not ask questions mid-step, do not drip-feed, do not pause between sections.
16+
- **Communication style** — Always output using the exact Agent communication style defined in SKILL.md and the loaded config.
17+
18+
## INITIALIZATION
19+
20+
Load and read full config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
21+
22+
- `implementation_artifacts`
23+
- `planning_artifacts`
24+
- `communication_language`
25+
26+
## FIRST STEP
27+
28+
Read fully and follow `./step-01-orientation.md` to begin.
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# Generate Review Trail
2+
3+
Generate a review trail from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than none.
4+
5+
## Follow Global Step Rules in SKILL.md
6+
7+
## INSTRUCTIONS
8+
9+
1. Get the full diff against the appropriate baseline (same rules as Surface Area Stats in step-01).
10+
2. Read changed files in full — not just diff hunks. Surrounding code reveals intent that hunks alone miss. If total file content exceeds ~50k tokens, read only the files with the largest diff hunks in full and use hunks for the rest.
11+
3. If a spec exists, use its Intent section to anchor concern identification.
12+
4. Identify 2–5 concerns: cohesive design intents that each explain *why* behind a cluster of changes. Prefer functional groupings and architectural boundaries over file-level splits. A single-concern change is fine — don't invent groupings.
13+
5. For each concern, select 1–4 `path:line` stops — locations where the concern is most visible. Prefer entry points, decision points, and boundary crossings over mechanical changes.
14+
6. Lead with the entry point — the highest-leverage stop a reviewer should see first. Inside each concern, order stops so each builds on the previous. End with peripherals (tests, config, types).
15+
7. Format each stop using `path:line` per the global step rules:
16+
17+
```
18+
**{Concern name}**
19+
20+
- {one-line framing, ≤15 words}
21+
`src/path/to/file.ts:42`
22+
```
23+
24+
When there is only one concern, omit the bold label — just list the stops directly.
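
The ~50k-token budget in instruction 2 could be applied roughly as follows. This is a sketch under stated assumptions: the instruction only says to prefer the files with the largest diff hunks, so the greedy fill, the `plan_reads` name, and the pre-computed token and diff-size estimates are all hypothetical.

```python
def plan_reads(files, budget=50_000):
    """Split changed files into (read_in_full, hunks_only).

    `files` is a list of (path, estimated_file_tokens, changed_line_count).
    Token counts are caller-supplied estimates; nothing here tokenizes text.
    """
    total = sum(tokens for _, tokens, _ in files)
    if total <= budget:
        # Everything fits: read every changed file in full.
        return [path for path, _, _ in files], []

    full, hunks_only, used = [], [], 0
    # Assumed policy: visit files from largest diff to smallest and
    # read each one in full while it still fits in the budget.
    for path, tokens, _ in sorted(files, key=lambda f: f[2], reverse=True):
        if used + tokens <= budget:
            full.append(path)
            used += tokens
        else:
            hunks_only.append(path)
    return full, hunks_only


changed = [("a.py", 30_000, 120), ("b.py", 30_000, 10), ("c.py", 5_000, 40)]
print(plan_reads(changed))
```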
25+
26+
## PRESENT
27+
28+
Output after the orientation:
29+
30+
```
31+
I built a review trail for this {change_type} (no author-produced trail was found):
32+
33+
{generated trail}
34+
```
35+
36+
Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.
37+
38+
If git is unavailable or the diff cannot be retrieved, return to step-01 with: "Could not generate trail — git unavailable."
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
# Step 1: Orientation
2+
3+
Display: `[Orientation] → Walkthrough → Detail Pass → Testing`
4+
5+
## Follow Global Step Rules in SKILL.md
6+
7+
## FIND THE CHANGE
8+
9+
The conversation context before this skill was triggered IS your starting point — not a blank slate. Check in this order — stop as soon as the change is identified:
10+
11+
1. **Explicit argument**
12+
Did the user pass a PR, commit SHA, branch, or spec file in this message?
13+
- PR reference → resolve to branch/commit via `gh pr view`. If resolution fails, ask for a SHA or branch.
14+
- Spec file, commit, or branch → use directly.
15+
16+
2. **Recent conversation**
17+
Do the last few messages reveal what change the user wants reviewed? Look for spec paths, commit refs, branches, PRs, or descriptions of a change. Use the same routing as above.
18+
19+
3. **Sprint tracking**
20+
Check for a sprint status file (`*sprint-status*`) in `{implementation_artifacts}` or `{planning_artifacts}`. If found, scan for stories with status `review`:
21+
- Exactly one → suggest it and confirm with the user.
22+
- Multiple → present as numbered options.
23+
- None → fall through.
24+
25+
4. **Current git state**
26+
Check current branch and HEAD. Confirm: "I see HEAD is `<short-sha>` on `<branch>` — is this the change you want to review?"
27+
28+
5. **Ask**
29+
If none of the above identified a change, ask:
30+
- What changed and why?
31+
- Which commit, branch, or PR should I look at?
32+
- Do you have a spec, bug report, or anything else that explains what this change is supposed to do?
33+
34+
If after 3 exchanges you still can't identify a change, HALT.
35+
36+
Never ask extra questions beyond what the cascade prescribes. If a step above already identified the change, skip the remaining steps.
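
The cascade above is a first-match-wins chain. A minimal sketch of that control flow, assuming each level is wrapped in a zero-argument callable that returns a change identifier or `None` (the resolver names below are illustrative, not part of the skill):

```python
from typing import Callable, Optional

Resolver = Callable[[], Optional[str]]


def find_change(resolvers: list[tuple[str, Resolver]]) -> Optional[tuple[str, str]]:
    """Try each cascade level in order; stop at the first that identifies a change."""
    for level, resolve in resolvers:
        change = resolve()
        if change:
            # Later levels are never called once a change is found.
            return level, change
    return None


result = find_change([
    ("explicit argument", lambda: None),
    ("recent conversation", lambda: "feature/login"),
    ("current git state", lambda: "HEAD"),
])
print(result)
```

Because the loop returns on the first hit, the sprint-tracking and git-state checks are skipped whenever an earlier level succeeds, which matches the "skip the remaining steps" rule.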
37+
38+
## ENRICH
39+
40+
Once a change is identified from any source above, fill in the complementary artifact:
41+
42+
- If you have a spec, look for `baseline_commit` in its frontmatter to determine the diff baseline.
43+
- If you have a commit or branch, check `{implementation_artifacts}` for a spec whose `baseline_commit` is an ancestor of that commit/branch (i.e., the spec describes work done on top of that baseline).
44+
- If you found both a spec and a commit/branch, use both.
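
The ancestor test in the second bullet maps onto `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. A sketch, assuming git is on the PATH; the `is_ancestor` name and the injectable `run` parameter are conveniences for this example:

```python
import subprocess


def is_ancestor(baseline: str, target: str, run=subprocess.run) -> bool:
    """Return True if `baseline` is an ancestor of `target`.

    Any non-zero exit (including git errors such as an unknown ref)
    is treated as "not an ancestor" in this sketch.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", baseline, target],
        capture_output=True,
    )
    return result.returncode == 0
```

A spec would then be a candidate match when `is_ancestor(spec_baseline_commit, reviewed_commit)` is true.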
45+
46+
## DETERMINE WHAT YOU HAVE
47+
48+
Set `change_type` to match how the user referred to the change — `PR`, `commit`, `branch`, or their own words (e.g. `auth refactor`). Default to `change` if ambiguous.
49+
50+
Set `review_mode` — pick the first match:
51+
52+
1. **`full-trail`** — ENRICH found a spec with a `## Suggested Review Order` section. Intent source: spec's Intent section.
53+
2. **`spec-only`** — ENRICH found a spec but it has no Suggested Review Order. Intent source: spec's Intent section.
54+
3. **`bare-commit`** — no spec found. Intent source: commit message. If the commit message is terse (under 10 words), scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
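
The `review_mode` cascade reduces to two checks. A minimal sketch, assuming the spec is available as a string (or `None` when ENRICH found no spec); both function names are hypothetical:

```python
from typing import Optional


def pick_review_mode(spec_text: Optional[str]) -> str:
    """Pick the first matching review mode for the given spec text."""
    if spec_text is None:
        return "bare-commit"
    has_trail = any(
        line.strip() == "## Suggested Review Order"
        for line in spec_text.splitlines()
    )
    return "full-trail" if has_trail else "spec-only"


def is_terse(commit_message: str) -> bool:
    """A commit message counts as terse when it has under 10 words."""
    return len(commit_message.split()) < 10


print(pick_review_mode("# Spec\n\n## Intent\n...\n\n## Suggested Review Order\n- a.py:1"))
```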
55+
56+
## PRODUCE ORIENTATION
57+
58+
### Intent Summary
59+
60+
- If intent comes from a spec's Intent section, display it verbatim regardless of length — it's already written to be concise.
61+
- For other sources (commit messages, bug reports, user description): if ≤200 tokens, display verbatim. If longer, distill to ≤200 tokens. Link to the full source when one exists (e.g. a file path or URL).
62+
- Format: `> **Intent:** {summary}`
63+
64+
### Surface Area Stats
65+
66+
Best-effort stats from `git diff --stat`. Try these baselines in order:
67+
68+
1. `baseline_commit` from the spec's frontmatter.
69+
2. Branch merge-base against `main` (or the default branch).
70+
3. `HEAD~1..HEAD` (latest commit only — tell the user).
71+
4. If git is unavailable or all of the above fail, skip stats and note: "Could not compute stats."
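
The baseline fallback could be scripted along these lines. This is a sketch, not the skill's implementation: the `diff_stat` name, the `<base>..HEAD` range form, and the choice to treat an empty diff as a failed level are assumptions.

```python
import subprocess
from typing import Optional


def diff_stat(baseline_commit: Optional[str], default_branch: str = "main",
              run=subprocess.run) -> Optional[tuple[str, str]]:
    """Return (baseline_label, `git diff --stat` output), or None if every level fails."""

    def git(*args: str) -> Optional[str]:
        try:
            result = run(["git", *args], capture_output=True, text=True)
        except OSError:
            return None  # git is not installed or cannot be executed
        return result.stdout.strip() if result.returncode == 0 else None

    candidates = []
    if baseline_commit:
        candidates.append(("baseline_commit", f"{baseline_commit}..HEAD"))
    merge_base = git("merge-base", default_branch, "HEAD")
    if merge_base:
        candidates.append(("merge-base", f"{merge_base}..HEAD"))
    candidates.append(("HEAD~1", "HEAD~1..HEAD"))

    for label, rev_range in candidates:
        stat = git("diff", "--stat", rev_range)
        if stat:
            return label, stat
    return None  # caller notes "Could not compute stats."
```

The returned label lets the caller tell the user when only the latest commit was measured.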
72+
73+
Display as:
74+
75+
```
76+
N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
77+
```
78+
79+
- **Files changed**: from `git diff --stat`.
80+
- **Modules touched**: distinct top-level directories with changes.
81+
- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
82+
- **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
83+
- **New public interfaces**: new exports, endpoints, public methods. `0` if none.
84+
85+
Omit any metric you cannot compute rather than guessing.
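
The omit-rather-than-guess rule can be sketched as a formatter that drops unknown metrics. The `stats_line` name is hypothetical, and treating files at the repository root as belonging to no module is an assumption; the other metrics are passed in by the caller because they need judgment the code cannot supply.

```python
from typing import Optional


def stats_line(paths: list[str],
               logic_lines: Optional[int] = None,
               boundary_crossings: Optional[int] = None,
               new_public_interfaces: Optional[int] = None) -> str:
    """Build the stats line, leaving out any metric that is None."""
    # Distinct top-level directories; root-level files are not counted.
    modules = {p.split("/", 1)[0] for p in paths if "/" in p}
    parts = [f"{len(paths)} files changed", f"{len(modules)} modules touched"]
    if logic_lines is not None:
        parts.append(f"~{logic_lines} lines of logic")
    if boundary_crossings is not None:
        parts.append(f"{boundary_crossings} boundary crossings")
    if new_public_interfaces is not None:
        parts.append(f"{new_public_interfaces} new public interfaces")
    return " · ".join(parts)


print(stats_line(["src/a.ts", "src/b.ts", "docs/x.md", "README.md"], logic_lines=40))
```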
86+
87+
### Present
88+
89+
```
90+
[Orientation] → Walkthrough → Detail Pass → Testing
91+
92+
> **Intent:** {intent_summary}
93+
94+
{stats line}
95+
```
96+
97+
## FALLBACK TRAIL GENERATION
98+
99+
If review mode is not `full-trail`, read fully and follow `./generate-trail.md` to build one from the diff. Then return here and continue to NEXT.
100+
101+
## NEXT
102+
103+
Read fully and follow `./step-02-walkthrough.md`
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
# Step 2: Walkthrough
2+
3+
Display: `Orientation → [Walkthrough] → Detail Pass → Testing`
4+
5+
## Follow Global Step Rules in SKILL.md
6+
7+
- Organize by **concern**, not by file. A concern is a cohesive design intent — e.g., "input validation," "state management," "API contract." One file may appear under multiple concerns; one concern may span multiple files.
8+
- The walkthrough activates **design judgment**, not correctness checking. Frame each concern as "here's what this change does and why" — the human evaluates whether it's the right approach for the system.
9+
10+
## BUILD THE WALKTHROUGH
11+
12+
### Identify Concerns
13+
14+
**With Suggested Review Order** (`full-trail` mode):
15+
16+
1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
17+
2. Resolve each stop to a file in the current repo. Output in `path:line` format per the global step rules.
18+
3. Read the diff to understand what each stop actually does.
19+
4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.
20+
21+
**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):
22+
23+
1. Get the diff against the appropriate baseline (same rules as Surface Area Stats in step-01).
24+
2. Identify concerns by reading the diff for cohesive design intents:
25+
- Functional groupings — what user-facing behavior does each cluster of changes support?
26+
- Architectural layers — does the change cross boundaries (API → service → data)?
27+
- Design decisions — where did the author choose between alternatives?
28+
3. For each concern, identify the key code locations as `path:line` stops.
29+
30+
### Order for Comprehension
31+
32+
Sequence concerns top-down: start with the highest-level intent (the "what and why"), then drill into supporting implementation. Within each concern, order stops so each one builds on the previous. The reader should never encounter a reference to something they haven't seen yet.
33+
34+
If the change has a natural entry point (e.g., a new public API, a config change, a UI entry point), lead with it.
35+
36+
### Write Each Concern
37+
38+
For each concern, produce:
39+
40+
1. **Heading** — a short phrase naming the design intent (not a file name, not a module name).
41+
2. **Why** — 1–2 sentences: what problem this concern addresses, why this approach was chosen over alternatives. If the spec documents rejected alternatives, reference them here.
42+
3. **Stops** — each stop on its own line: `path:line` followed by a brief phrase (not a sentence) describing what this location does for the concern. Keep framing under 15 words per stop.
43+
44+
Target 2–5 concerns for a typical change. A single-concern change is fine — don't invent groupings. A change with more than 7 concerns is a signal the scope may be too large, but present it anyway.
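
One way to picture the heading / why / stops structure is as a small data model with a renderer. This is a sketch only; the `Stop` and `Concern` classes and the hard error on long framings are assumptions, not something the skill defines.

```python
from dataclasses import dataclass, field

DASH = "\u2014"  # separator character used between a stop and its framing


@dataclass
class Stop:
    ref: str      # CWD-relative path:line
    framing: str  # brief phrase, under 15 words


@dataclass
class Concern:
    heading: str
    why: str
    stops: list[Stop] = field(default_factory=list)


def render_concern(concern: Concern) -> str:
    """Render one concern as heading, why, and one bullet per stop."""
    for stop in concern.stops:
        if len(stop.framing.split()) >= 15:
            raise ValueError(f"framing too long for {stop.ref}")
    lines = [f"### {concern.heading}", "", concern.why, ""]
    lines += [f"- `{s.ref}` {DASH} {s.framing}" for s in concern.stops]
    return "\n".join(lines)


print(render_concern(Concern(
    "Input validation",
    "Rejects malformed payloads before they reach the service layer.",
    [Stop("src/api/validate.ts:12", "schema check at the request boundary")],
)))
```

Since a stop may serve several purposes, the same `Stop` can be listed under more than one `Concern`.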
45+
46+
## PRESENT
47+
48+
Output the full walkthrough as a single message with this structure:
49+
50+
```
51+
Orientation → [Walkthrough] → Detail Pass → Testing
52+
```
53+
54+
Then each concern group using this format:
55+
56+
```
57+
### {Concern Heading}
58+
59+
{Why — 1–2 sentences}
60+
61+
- `path:line` — {brief framing}
62+
- `path:line` — {brief framing}
63+
- ...
64+
```
65+
66+
End the message with:
67+
68+
```
69+
---
70+
71+
Take your time — click through the stops, read the diff, trace the logic. While you are reviewing, you can:
72+
- "run advanced elicitation on the error handling"
73+
- "party mode on whether this schema migration is safe"
74+
- or just ask anything
75+
76+
When you're ready, say **next** and I'll surface the highest-risk spots.
77+
```
78+
79+
## EARLY EXIT
80+
81+
If at any point the human signals they want to make a decision about this {change_type} (e.g., "let's ship it", "this needs a rethink", "I'm done reviewing", or anything suggesting they're ready to decide), confirm their intent:
82+
83+
- If they want to **approve and ship** → read fully and follow `./step-05-wrapup.md`
84+
- If they want to **reject and rework** → read fully and follow `./step-05-wrapup.md`
85+
- If you misread them → acknowledge and continue the current step.
86+
87+
## NEXT
88+
89+
Default: read fully and follow `./step-03-detail-pass.md`
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
# Step 3: Detail Pass
2+
3+
Display: `Orientation → Walkthrough → [Detail Pass] → Testing`
4+
5+
## Follow Global Step Rules in SKILL.md
6+
7+
- The detail pass surfaces what the human should **think about**, not what the code got wrong. Machine hardening already handled correctness. This activates risk awareness.
8+
- The LLM detects risk category by pattern. The human judges significance. Do not assign severity scores or numeric rankings — ordering by blast radius (below) is sequencing for readability, not a severity judgment.
9+
- If no high-risk spots exist, say so explicitly. Do not invent findings.
10+
11+
## IDENTIFY RISK SPOTS
12+
13+
Scan the diff for changes touching risk-sensitive patterns. Look for 2–5 spots where a mistake would have the highest blast radius — not the most complex code, but the code where being wrong costs the most.
14+
15+
Risk categories to detect:
16+
17+
- `[auth]` — authentication, authorization, session, token, permission, access control
18+
- `[public API]` — new/changed endpoints, exports, public methods, interface contracts
19+
- `[schema]` — database migrations, schema changes, data model modifications, serialization
20+
- `[billing]` — payment, pricing, subscription, metering, usage tracking
21+
- `[infra]` — deployment, CI/CD, environment variables, config files, infrastructure
22+
- `[security]` — input validation, sanitization, crypto, secrets, CORS, CSP
23+
- `[config]` — feature flags, environment-dependent behavior, defaults
24+
- `[other]` — anything risk-sensitive that doesn't fit the above (e.g., concurrency, data privacy, backwards compatibility). Use a descriptive tag.
25+
26+
Sequence spots so the highest blast radius comes first (how much breaks if this is wrong), not by diff order or file order. If more than 5 spots qualify, show the top 5 and note: "N additional spots omitted — ask if you want the full list."
27+
28+
If the change has no spots matching these patterns, state: "No high-risk spots found in this change — the diff speaks for itself." Do not force findings.
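
The top-5 cap and the empty case could be handled as below. This is a sketch that assumes the spots arrive already ordered by blast radius, since that ordering is a judgment the code does not attempt; the `present_risk_spots` name and the tuple shape are hypothetical.

```python
DASH = "\u2014"  # separator character used in the skill's output format


def present_risk_spots(spots, limit=5):
    """Format risk spots as lines, capping the list at `limit`.

    `spots` is a list of (path_line, tag, reason) tuples, highest
    blast radius first.
    """
    if not spots:
        return f"No high-risk spots found in this change {DASH} the diff speaks for itself."
    lines = [f"- `{ref}` {DASH} [{tag}] {reason}" for ref, tag, reason in spots[:limit]]
    omitted = len(spots) - limit
    if omitted > 0:
        lines.append(f"{omitted} additional spots omitted {DASH} ask if you want the full list.")
    return "\n".join(lines)


print(present_risk_spots([
    ("src/auth/middleware.ts:42", "auth", "New token validation bypasses rate limiter"),
]))
```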
29+
30+
## SURFACE MACHINE HARDENING FINDINGS
31+
32+
Check whether the spec has a `## Spec Change Log` section with entries (populated by adversarial review loops).
33+
34+
- **If entries exist:** Read them. Surface findings that are instructive for the human reviewer — not bugs that were already fixed, but decisions the review loop flagged that the human should be aware of. Format: brief summary of what was flagged and what was decided.
35+
- **If no entries or no spec:** Skip this section entirely. Do not mention it.
36+
37+
## PRESENT
38+
39+
Output as a single message:
40+
41+
```
42+
Orientation → Walkthrough → [Detail Pass] → Testing
43+
```
44+
45+
### Risk Spots
46+
47+
For each spot, one line:
48+
49+
```
50+
- `path:line` — [tag] reason-phrase
51+
```
52+
53+
Example:
54+
55+
```
56+
- `src/auth/middleware.ts:42` — [auth] New token validation bypasses rate limiter
57+
- `migrations/003_add_index.sql:7` — [schema] Index on high-write table, check lock behavior
58+
- `api/routes/billing.ts:118` — [billing] Metering calculation changed, verify idempotency
59+
```
60+
61+
### Machine Hardening (only if findings exist)
62+
63+
```
64+
### Machine Hardening
65+
66+
- Finding summary — what was flagged, what was decided
67+
- ...
68+
```
69+
70+
### Closing menu
71+
72+
End the message with:
73+
74+
```
75+
---
76+
77+
You've seen the design and the risk landscape. From here:
78+
- **"dig into [area]"** — I'll deep-dive that specific area with correctness focus
79+
- **"next"** — I'll suggest how to observe the behavior
80+
```
81+
82+
## EARLY EXIT
83+
84+
If at any point the human signals they want to make a decision about this {change_type} (e.g., "let's ship it", "this needs a rethink", "I'm done reviewing", or anything suggesting they're ready to decide), confirm their intent:
85+
86+
- If they want to **approve and ship** → read fully and follow `./step-05-wrapup.md`
87+
- If they want to **reject and rework** → read fully and follow `./step-05-wrapup.md`
88+
- If you misread them → acknowledge and continue the current step.
89+
90+
## TARGETED RE-REVIEW
91+
92+
When the human says "dig into [area]" (e.g., "dig into the auth changes", "dig into the schema migration"):
93+
94+
1. If the specified area does not map to any code in the diff, say so: "I don't see [area] in this change — did you mean something else?" Return to the closing menu.
95+
2. Identify all code locations in the diff relevant to the specified area.
96+
3. Read each location in full context (not just the diff hunk — read surrounding code).
97+
4. Shift to **correctness mode**: trace edge cases, check boundary conditions, verify error handling, look for off-by-one errors, race conditions, resource leaks.
98+
5. Present findings as a compact list — each finding is `path:line` + what you found + why it matters.
99+
6. If nothing concerning is found, say so: "Looked closely at [area] — nothing concerning. The implementation is solid."
100+
7. After presenting, show only the closing menu (not the full risk spots list again).
101+
102+
The human can trigger multiple targeted re-reviews. Each time, present new findings and the closing menu only.
103+
104+
## NEXT
105+
106+
Read fully and follow `./step-04-testing.md`
