Commit ec91a87

docs(pb-build): update recovery to use task-local rollback and pre-task snapshot
1 parent 8ff37ba commit ec91a87

22 files changed

Lines changed: 1910 additions & 97 deletions

.github/prompts/pb-build.prompt.md

Lines changed: 9 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -51,6 +51,7 @@ For each unfinished task, in order:
5151

5252
1. **Extract** the full task block (Context, Steps, Verification).
5353
2. **Gather context** — read `design.md` and `AGENTS.md`.
54+
- Record a pre-task workspace snapshot (`git status --porcelain` + tracked/untracked file lists) for safe rollback.
5455
3. **Spawn a fresh subagent** with the Implementer Prompt (below), filled in with the task content and project context.
5556
**Context Hygiene:** Do NOT pass the entire chat history. Pass ONLY:
5657
- The specific Task Description from `tasks.md`.
@@ -68,9 +69,12 @@ For each unfinished task, in order:
6869
If a subagent fails:
6970

7071
1. **Analyze the diff:** Run `git diff` to see what the failed agent changed.
71-
2. **Revert the workspace:** Run `git checkout .` to reset to the last known-good state (Harness Reset).
72-
3. **Report** the failure — which task, what went wrong, specific error output.
73-
4. Prompt the user:
72+
2. **Compute task-local change set:** Compare against the pre-task snapshot to identify only files changed by this failed attempt.
73+
3. **Safe recovery (file-scoped):**
74+
- If pre-task workspace was clean: restore only changed tracked files with `git restore --worktree --staged -- <files>` and remove only newly created files from this task.
75+
- If pre-task workspace was dirty: do NOT run workspace-wide restore commands. Report file-level cleanup options and wait for user choice.
76+
4. **Report** the failure — which task, what went wrong, specific error output.
77+
5. Prompt the user:
7478
- **Retry** — new subagent, fresh context, pass previous error as a hint constraint. Maximum 2 retries per task.
7579
- **Skip** — mark as `⏭️ SKIPPED`, move to next task.
7680
- **Abort** — stop the build, report progress so far.
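
The task-local change set described in the new step 2 could be computed along these lines. This is a minimal sketch, not the project's implementation: `parse_porcelain` and `task_local_changes` are hypothetical helper names, and the parser assumes plain `git status --porcelain` (v1) text without quoted paths.

```python
from __future__ import annotations


def parse_porcelain(output: str) -> dict[str, str]:
    """Map path -> two-character status code from `git status --porcelain` text."""
    entries: dict[str, str] = {}
    for line in output.splitlines():
        if len(line) < 4:
            continue  # not a "XY path" status line
        code, path = line[:2], line[3:]
        if " -> " in path:
            # Rename/copy lines read "old -> new"; keep the new path.
            path = path.split(" -> ", 1)[1]
        entries[path] = code
    return entries


def task_local_changes(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (tracked, untracked) paths whose status differs from the snapshot."""
    pre, post = parse_porcelain(before), parse_porcelain(after)
    changed = {p: c for p, c in post.items() if pre.get(p) != c}
    untracked = sorted(p for p, c in changed.items() if c == "??")
    tracked = sorted(p for p, c in changed.items() if c != "??")
    return tracked, untracked
```

Comparing status codes alone cannot detect a second edit to a file that was already modified before the task started. That blind spot is one reason the dirty-tree branch defers to the user; a stricter snapshot would also record content hashes.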
@@ -165,6 +169,7 @@ Update `tasks.md` in-place after each task using **precise edits** (target the s
165169
### ALWAYS
166170

167171
- Mark completed tasks in `tasks.md` immediately.
172+
- Capture a pre-task workspace snapshot before spawning subagents.
168173
- Self-review before submitting each task.
169174
- Run full test suite after each task.
170175
- Report failures with retry/skip/abort options.
@@ -182,13 +187,11 @@ Update `tasks.md` in-place after each task using **precise edits** (target the s
182187
4. **Grounding before action.** Verify workspace state before writing code.
183188
5. **Self-review catches over-engineering.** Audit before submit.
184189
6. **State lives on disk.** Checkboxes and code are the only persistent state.
185-
7. **Fail fast, recover cleanly.** Revert workspace before retry. Each attempt starts from a known-good state.
190+
7. **Fail fast, recover cleanly.** Use task-local rollback from the pre-task snapshot. Avoid workspace-wide resets in dirty trees.
186191
8. **Context hygiene.** Pass minimal, relevant context. Summarize — don't dump.
187192

188193
---
189194

190-
---
191-
192195
## IMPLEMENTER PROMPT TEMPLATE
193196

194197
> This is the instruction template passed to each subagent. Fill in the `{{placeholders}}` with actual values per task.

.github/prompts/pb-plan.prompt.md

Lines changed: 0 additions & 4 deletions
@@ -205,8 +205,6 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
205205

206206
---
207207

208-
---
209-
210208
## DESIGN TEMPLATE
211209

212210
> Fill this template and write to `specs/<spec-dir>/design.md`.
@@ -359,8 +357,6 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
359357

360358
---
361359

362-
---
363-
364360
## TASKS TEMPLATE
365361

366362
> Fill this template and write to `specs/<spec-dir>/tasks.md`.

.github/prompts/pb-refine.prompt.md

Lines changed: 5 additions & 5 deletions
@@ -1,4 +1,3 @@
1-
````prompt
21
# pb-refine — Design & Plan Refinement
32

43
You are the **pb-refine** agent. Your job is to read user feedback on an existing spec (`design.md` and/or `tasks.md`) and update them accordingly. This closes the gap between one-shot planning and iterative refinement.
@@ -10,12 +9,14 @@ Run this when the user invokes `/pb-refine <feature-name>` with feedback or chan
109
## Step 1: Resolve Spec Directory & Load Existing Spec
1110

1211
**Resolve `<feature-name>` → `<spec-dir>`:**
12+
1313
1. List all directories under `specs/`.
1414
2. Find the directory whose name ends with `-<feature-name>` (e.g., `2026-02-15-01-add-websocket-auth` for feature-name `add-websocket-auth`).
1515
3. If exactly one match is found, use it as `<spec-dir>`. All `specs/<spec-dir>/` paths below refer to this resolved directory.
1616
4. If multiple matches exist, use the most recent one (latest date prefix).
1717
5. If no match is found, stop and report:
18-
```
18+
19+
```text
1920
❌ No spec directory found for feature "<feature-name>" in specs/.
2021
Run /pb-plan <requirement> first to generate the spec.
2122
```
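
The resolution steps above amount to a suffix match plus a tie-break. A minimal sketch, assuming the date prefix sorts lexicographically (as `2026-02-15-01-...` does); `resolve_spec_dir` is a hypothetical name, and it takes the directory names as a list rather than reading `specs/` itself.

```python
from __future__ import annotations


def resolve_spec_dir(dir_names: list[str], feature_name: str) -> str | None:
    """Pick the spec directory for a feature, or None when nothing matches."""
    suffix = "-" + feature_name
    matches = [name for name in dir_names if name.endswith(suffix)]
    # With several matches, the latest date prefix is the lexicographic maximum.
    return max(matches) if matches else None
```

Because the rule is a pure suffix match, a short feature name such as `auth` would also match `...-add-websocket-auth`; a caller may want to report that ambiguity instead of resolving it silently.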
@@ -37,6 +38,7 @@ The user's feedback may include:
3738
- **General feedback** — "this approach won't work because..." or "we should use X instead of Y".
3839

3940
Categorize the feedback into:
41+
4042
1. **Design changes** — modifications to `design.md`.
4143
2. **Task changes** — modifications to `tasks.md`.
4244
3. **Both** — changes that affect design and cascade to tasks.
@@ -82,7 +84,7 @@ After making changes, verify:
8284

8385
## Step 6: Output Summary
8486

85-
```
87+
```text
8688
🔄 Spec refined: specs/<spec-dir>/
8789
8890
Changes to design.md:
@@ -131,5 +133,3 @@ Next steps:
131133
- **Feedback invalidates completed tasks:** Flag this in the summary as a warning. Do not automatically undo completed tasks.
132134
- **Feedback requires entirely new design:** Recommend the user run `/pb-plan <feature-name>` instead with the new requirements. Only use `/pb-refine` for incremental changes.
133135
- **Multiple conflicting feedback items:** Apply them in the order given. Note conflicts in the Revision History.
134-
135-
````

.opencode/skills/pb-build/SKILL.md

Lines changed: 11 additions & 6 deletions
@@ -1,6 +1,6 @@
11
---
22
name: pb-build
3-
description: "Subagent-Driven Implementation"
3+
description: "Use when tasks.md is ready and you need sequential TDD implementation with recovery loops."
44
---
55

66
# pb-build — Subagent-Driven Implementation
@@ -67,6 +67,7 @@ Extract the full task block from `tasks.md` — including Context, Steps, and Ve
6767
- Read `specs/<spec-dir>/design.md` for design context.
6868
- Read `AGENTS.md` (if it exists) for project conventions.
6969
- Identify files most relevant to this task.
70+
- Record a pre-task workspace snapshot (`git status --porcelain` + tracked/untracked file lists). This baseline is used for safe recovery if the task fails.
7071

7172
#### 3c. Spawn Subagent
7273

@@ -115,14 +116,17 @@ After the subagent succeeds, update `tasks.md`:
115116
If a subagent fails (tests don't pass, implementation blocked, etc.):
116117

117118
1. **Analyze the diff:** Run `git diff` to see exactly what the failed agent changed. Understanding the attempted approach is essential before retrying.
118-
2. **Revert the workspace:** Run `git checkout .` to clean the workspace back to the last known-good state. This is the "Harness Reset" — it prevents broken code from one attempt polluting the next.
119-
3. **Report** the failure with details — which task, what went wrong, the specific error output.
120-
4. **Prompt the user** to choose:
119+
2. **Compute task-local change set:** Compare with the pre-task snapshot to identify only files changed by this failed attempt (tracked diffs + newly created untracked files).
120+
3. **Safe recovery (file-scoped):**
121+
- If the pre-task workspace was clean: restore only the task-local changed tracked files with `git restore --worktree --staged -- <files>` and remove only the new files created by this task.
122+
- If the pre-task workspace was dirty: **do not run any workspace-wide restore command**. Report file-level cleanup steps and ask the user before reverting anything.
123+
4. **Report** the failure with details — which task, what went wrong, the specific error output.
124+
5. **Prompt the user** to choose:
121125
- **Retry** — Spawn a new subagent with fresh context. Pass the previous failure's error message as a "Constraint" hint (e.g., "Previous attempt failed with 'circular import in auth.py'. Avoid importing types directly — use string annotations or TYPE_CHECKING block."). Maximum 2 retries per task.
122126
- **Skip** — Mark the task as skipped (`⏭️ SKIPPED`) and continue to the next task.
123127
- **Abort** — Stop the entire build. Report progress so far.
124128

125-
> **Why revert before retry:** If the failed agent left partially-written code, a new agent may try to build on top of broken foundations. A clean revert ensures each retry starts from a known-good state — this is the core principle of an observable, resettable harness.
129+
> **Why file-scoped recovery before retry:** Failed attempts can leave broken partial edits, but global resets can wipe unrelated in-progress work. Task-local rollback preserves harness reliability without destroying user state.
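
The clean/dirty branching in the safe-recovery step can be sketched as a small planning function. This is an illustration under assumptions, not the skill's actual code: `plan_recovery` and the returned dictionary shape are hypothetical, and `git restore` requires Git 2.23 or later.

```python
from __future__ import annotations


def plan_recovery(pre_task_clean: bool, tracked: list[str], new_files: list[str]) -> dict:
    """Return file-scoped commands to run, or defer to the user for a dirty tree."""
    report = sorted(tracked + new_files)
    if not pre_task_clean:
        # Dirty pre-task tree: never modify the workspace automatically.
        return {"commands": [], "ask_user": True, "report": report}
    commands: list[list[str]] = []
    if tracked:
        # Restore only the tracked files this task touched (index and worktree).
        commands.append(["git", "restore", "--worktree", "--staged", "--", *tracked])
    # Delete only files that did not exist before the task started.
    commands.extend(["rm", "--", path] for path in new_files)
    return {"commands": commands, "ask_user": False, "report": report}
```

Returning a plan rather than executing it keeps the decision testable and lets the harness show the exact commands in the failure report before anything is reverted.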
126130
127131
#### Design Change Requests (DCR)
128132

@@ -223,6 +227,7 @@ While executing, display progress after each task:
223227
### ALWAYS
224228

225229
- **ALWAYS** mark completed tasks in `tasks.md` immediately after success.
230+
- **ALWAYS** capture a pre-task workspace snapshot before spawning a subagent.
226231
- **ALWAYS** self-review before submitting a task's work.
227232
- **ALWAYS** run the full test suite after each task to catch regressions.
228233
- **ALWAYS** report failures clearly with actionable options (retry/skip/abort).
@@ -240,7 +245,7 @@ While executing, display progress after each task:
240245
4. **Grounding before action.** Every subagent verifies workspace state before writing code — preventing path hallucination and stale assumptions.
241246
5. **Self-review catches over-engineering.** Every subagent audits its own work before submitting.
242247
6. **State lives on disk.** `tasks.md` checkboxes and committed code are the only persistent state.
243-
7. **Fail fast, recover cleanly.** Failures trigger workspace revert (`git checkout .`) before retry — ensuring each attempt starts from a known-good state.
248+
7. **Fail fast, recover cleanly.** Failures trigger task-local rollback using the pre-task snapshot. Never run workspace-wide reset commands in a dirty tree.
244249
8. **Context hygiene.** Only pass relevant, minimal context to subagents. Error logs from failed attempts are summarized as hints, not passed verbatim.
245250

246251
---

.opencode/skills/pb-build/references/implementer_prompt.md

Lines changed: 1 addition & 4 deletions
@@ -33,7 +33,7 @@ Execute the following steps in strict order. **You must output your reasoning fo
3333
3. **Check Dependencies:** Verify that any modules you plan to import actually exist. Check `pyproject.toml`, `package.json`, `Cargo.toml`, or equivalent before importing third-party libraries.
3434
4. **Confirm Test Infrastructure:** Verify the test directory exists and check how existing tests are structured (test runner, naming conventions, fixture patterns).
3535

36-
> **Why this step is mandatory:** Long-running agents are prone to "path hallucination" — assuming files exist at locations they don't oratethat code has a structure it doesn't. This grounding step synchronizes your mental model with the actual workspace state.
36+
> **Why this step is mandatory:** Long-running agents are prone to "path hallucination" — assuming files exist at locations they don't or that code has a structure it doesn't. This grounding step synchronizes your mental model with the actual workspace state.
3737
3838
### 2. TDD Cycle
3939

@@ -149,6 +149,3 @@ These rules act as your safety harness — they prevent common failure modes in
149149
4. **Quote Errors:** When a test or command fails, always quote the specific error message in your reasoning before attempting a fix.
150150
5. **One Fix at a Time:** When debugging a failure, make exactly one change, then re-run. Do not stack multiple speculative fixes.
151151
6. **Path Verification:** Never hardcode or assume file paths. Use `ls`, `find`, or file search to confirm paths before using them.
152-
153-
````text
154-
```

.opencode/skills/pb-init/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
name: pb-init
3-
description: "Project State Initialization"
3+
description: "Use when onboarding a repo or after major structural changes to regenerate AGENTS.md project context."
44
---
55

66
# pb-init — Project Initialization

.opencode/skills/pb-plan/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
name: pb-plan
3-
description: "Design & Task Planning"
3+
description: "Use when converting a requirement into a design proposal and executable tasks before coding."
44
---
55

66
# pb-plan — Design & Task Planning

.opencode/skills/pb-refine/SKILL.md

Lines changed: 6 additions & 6 deletions
@@ -1,9 +1,8 @@
11
---
22
name: pb-refine
3-
description: "Design & Plan Refinement"
3+
description: "Use when feedback or a Design Change Request requires incremental updates to design.md and tasks.md."
44
---
55

6-
````skill
76
# pb-refine — Design & Plan Refinement
87

98
You are the **pb-refine** agent. Your job is to read user feedback on an existing spec (`design.md` and/or `tasks.md`) and update them accordingly. This closes the gap between one-shot planning and iterative refinement.
@@ -19,12 +18,14 @@ Execute the following steps in order.
1918
### Step 1: Resolve Spec Directory & Load Existing Spec
2019

2120
**Resolve `<feature-name>` → `<spec-dir>`:**
21+
2222
1. List all directories under `specs/`.
2323
2. Find the directory whose name ends with `-<feature-name>` (e.g., `2026-02-15-01-add-websocket-auth` for feature-name `add-websocket-auth`).
2424
3. If exactly one match is found, use it as `<spec-dir>`. All `specs/<spec-dir>/` paths below refer to this resolved directory.
2525
4. If multiple matches exist, use the most recent one (latest date prefix).
2626
5. If no match is found, stop and report:
27-
```
27+
28+
```text
2829
❌ No spec directory found for feature "<feature-name>" in specs/.
2930
Run /pb-plan <requirement> first to generate the spec.
3031
```
@@ -46,6 +47,7 @@ The user's feedback may include:
4647
- **General feedback** — "this approach won't work because..." or "we should use X instead of Y".
4748

4849
Categorize the feedback into:
50+
4951
1. **Design changes** — modifications to `design.md`.
5052
2. **Task changes** — modifications to `tasks.md`.
5153
3. **Both** — changes that affect design and cascade to tasks.
@@ -91,7 +93,7 @@ After making changes, verify:
9193

9294
### Step 6: Output Summary
9395

94-
```
96+
```text
9597
🔄 Spec refined: specs/<spec-dir>/
9698
9799
Changes to design.md:
@@ -140,5 +142,3 @@ Next steps:
140142
- **Feedback invalidates completed tasks:** Flag this in the summary as a warning. Do not automatically undo completed tasks.
141143
- **Feedback requires entirely new design:** Recommend the user run `/pb-plan <feature-name>` instead.
142144
- **Multiple conflicting feedback items:** Apply them in the order given. Note conflicts in the Revision History.
143-
144-
````

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@ resolver = "3"
1414
agent-skills = "0.2"
1515
async-trait = "0.1"
1616
futures-core = "0.3"
17+
futures-util = "0.3"
1718
genai = "=0.6.0-beta.1"
1819
rmcp = { version = "0.16", features = [
1920
"client",
@@ -31,6 +32,7 @@ tokio = { version = "1", features = [
3132
"time",
3233
"process",
3334
] }
35+
tokio-stream = "0.1"
3436
tokio-util = "0.7"
3537
tracing = "0.1.44"
3638
tracing-subscriber = "0.3.22"

agent.toml

Lines changed: 18 additions & 0 deletions
@@ -15,6 +15,9 @@ max_steps = 12
1515
# Turn timeout in milliseconds (default: 90000 = 90s).
1616
turn_timeout_ms = 90000
1717

18+
# Optional context window used for ratio-based skills budgets.
19+
# model_context_tokens = 128000
20+
1821
# ── LLM Settings (optional, reserved for v2) ────────────────────────
1922

2023
# [llm]
@@ -25,6 +28,7 @@ turn_timeout_ms = 90000
2528

2629
# [policy]
2730
# deny_tools = []
31+
# allow_tools = ["local/read_file"]
2832

2933
# ── MCP Servers (optional) ───────────────────────────────────────────
3034
# Uncomment and configure to enable tool use via MCP.
@@ -35,3 +39,17 @@ turn_timeout_ms = 90000
3539
# command = "npx"
3640
# args = ["-y", "@modelcontextprotocol/server-filesystem", "."]
3741
# tool_timeout_ms = 15000
42+
43+
# ── Skills Sources (optional) ───────────────────────────────────────
44+
# When configured, Bob loads skill directories and injects selected
45+
# skill instructions into each turn based on user input.
46+
47+
# [skills]
48+
# max_selected = 3
49+
# token_budget_tokens = 1800
50+
# token_budget_ratio = 0.10
51+
#
52+
# [[skills.sources]]
53+
# type = "directory"
54+
# path = "./skills"
55+
# recursive = false
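
With the commented example values, the ratio-based budget works out as shown below. The source does not say how `token_budget_tokens` and `token_budget_ratio` interact when both are set, so this sketch only computes the ratio-derived figure; `ratio_budget` is a hypothetical helper, not part of the project's API.

```python
def ratio_budget(model_context_tokens: int, token_budget_ratio: float) -> int:
    """Skills token budget implied by a context window and a ratio."""
    # Round to the nearest whole token to absorb floating-point noise.
    return round(model_context_tokens * token_budget_ratio)
```

For `model_context_tokens = 128000` and `token_budget_ratio = 0.10` this gives 12800 tokens, considerably more than the fixed `token_budget_tokens = 1800` in the same example.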
