README.md
Lines changed: 10 additions & 4 deletions
@@ -22,6 +22,8 @@ pb-spec follows a **harness-first** philosophy: reliability comes from process d
 | [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091) | Plan first to reduce missing-step errors | `design.md` + `tasks.md` are mandatory artifacts |
 | [ReAct](https://arxiv.org/abs/2210.03629) | Interleave reasoning and actions with environment feedback | `/pb-build` executes task-by-task with test/tool feedback loops |
 | [Reflexion](https://arxiv.org/abs/2303.11366) | Learn from failure signals via iterative retries | Retry/skip/abort and DCR flow in `pb-build` |
+| [Harness Engineering (OpenAI, 2026-02-11)](https://openai.com/index/harness-engineering/) | Treat runtime signals and checklists as first-class harness inputs | `pb-plan` requires runtime verification hooks; `pb-build` validates logs/health evidence before task closure |
+| [openai/symphony](https://github.com/openai/symphony) | Long-running agents need explicit observability and deterministic escalation | `pb-build` enforces bounded retries and emits standardized DCR packets for `pb-refine` |
 | [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) | Grounding, context hygiene, recovery, observability | State checks, minimal context handoff, task-local rollback guidance |
 | [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) | Prefer simple composable workflows over framework complexity | Small adapter-based CLI + explicit workflow prompts |
 | [Stop Using /init for AGENTS.md](https://addyosmani.com/blog/agents-md/) | Keep AGENTS.md focused and maintainable | `/pb-init` updates a managed snapshot block in `AGENTS.md` while preserving all user-authored constraints outside that block |
@@ -30,7 +32,9 @@ pb-spec follows a **harness-first** philosophy: reliability comes from process d

 - **Context Before Code:** `/pb-init` and `/pb-plan` establish project and requirement context before implementation starts.
 - **Verification by Design:** Planning requires explicit verification commands so completion is measurable.
+- **Observability as Context:** Service-facing tasks must capture runtime evidence (log tails and/or health probes), not only test output.
 - **Strict TDD Execution:** `/pb-build` enforces Red → Green → Refactor with per-task status tracking.
+- **Escalation Over Thrashing:** Three consecutive failures suspend the current task and route a standardized DCR packet to `/pb-refine`.
 - **Safe Failure Recovery:** Failed attempts use scoped recovery guidance to avoid polluting unrelated workspace state.
 - **Composable Architecture:** Platform differences stay in adapters; workflow semantics stay in shared templates.

@@ -140,11 +144,11 @@ The spec directory follows the naming format `YYYY-MM-DD-NO-feature-name` (e.g.,
-Reads user feedback or Design Change Requests (from failed builds) and intelligently updates `design.md` and `tasks.md`. It maintains a revision history and cascades design changes to the task list without overwriting completed work. `AGENTS.md` remains read-only in this phase.
+Reads user feedback or Design Change Requests (from failed builds, including standardized 3-failure build-block packets) and intelligently updates `design.md` and `tasks.md`. It maintains a revision history and cascades design changes to the task list without overwriting completed work. `AGENTS.md` remains read-only in this phase.
-Reads `specs/<YYYY-MM-DD-NO-feature-name>/tasks.md` and implements each task sequentially. Every task is executed by a fresh subagent following strict TDD (Red → Green → Refactor). Supports **Design Change Requests** if the planned design proves infeasible during implementation. Only the `<feature-name>` part is needed when invoking — the agent resolves the full directory automatically. `AGENTS.md` is read-only unless the user explicitly requests an `AGENTS.md` change.
+Reads `specs/<YYYY-MM-DD-NO-feature-name>/tasks.md` and implements each task sequentially. Every task is executed by a fresh subagent following strict TDD (Red → Green → Refactor), then runtime verification (log/health evidence when applicable). Supports **Design Change Requests** if the planned design proves infeasible during implementation, and auto-escalates to DCR after three consecutive task failures. Only the `<feature-name>` part is needed when invoking — the agent resolves the full directory automatically. `AGENTS.md` is read-only unless the user explicitly requests an `AGENTS.md` change.

 ## Skills Overview

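Reviewer sketch for the hunk above: the `/pb-build` description says only the `<feature-name>` part is passed and the full `YYYY-MM-DD-NO-feature-name` directory is resolved automatically. One way that lookup might work is shown below. This is not pb-spec's actual implementation; `resolve_spec_dir`, the regex, the reading of `NO` as a numeric sequence number, and the "latest match wins" tie-break are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumed shape of a spec directory name: YYYY-MM-DD-NO-feature-name,
# with NO read as a numeric sequence number (an assumption, not confirmed).
SPEC_DIR_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-(\d+)-(.+)$")


def resolve_spec_dir(specs_root: Path, feature_name: str) -> Path:
    """Return the spec directory whose trailing part equals feature_name.

    Hypothetical helper: when several directories match, the one with the
    latest (date, sequence number) pair is chosen.
    """
    matches = []
    for entry in specs_root.iterdir():
        m = SPEC_DIR_RE.match(entry.name)
        if entry.is_dir() and m and m.group(3) == feature_name:
            matches.append(((m.group(1), int(m.group(2))), entry))
    if not matches:
        raise FileNotFoundError(f"no spec directory for {feature_name!r}")
    return max(matches, key=lambda item: item[0])[1]


def demo() -> str:
    """Build a throwaway specs/ tree and resolve one feature by name."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in (
            "2026-01-05-01-add-login",
            "2026-02-11-02-add-login",
            "2026-02-11-01-fix-cache",
        ):
            (root / name).mkdir()
        return resolve_spec_dir(root, "add-login").name
```

Under these assumptions, `demo()` picks the later of the two `add-login` directories. A real resolver might instead refuse to guess and ask the user when more than one directory matches.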
@@ -168,13 +172,15 @@ pb-spec's prompt design is inspired by Anthropic's research on [Effective Harnes
 | **Context Hygiene** | Orchestrator passes only minimal, relevant context to each subagent — preventing context window pollution |
 | **Recovery Loop** | Failed tasks use pre-task snapshots + file-scoped recovery (`git restore` + task-local cleanup), and avoid workspace-wide restore in dirty trees |
 | **Verification Harness** | Design docs define explicit verification commands at planning time — subagents execute, not invent, verification |
+| **Observability as Context** | Task verification includes runtime signals (logs/health) for service-facing work, and build closure requires command-backed evidence |
+| **Escalation Loop** | Three consecutive failures trigger task suspension + standardized DCR handoff to `pb-refine` |
 | **Agent Rules** | `AGENTS.md` is treated as free-form policy context: `pb-init` manages only its marker block; `pb-plan`/`pb-refine`/`pb-build` read it without rewriting |

 ### Where Each Principle Lives

 - **Worker (Implementer):** `implementer_prompt.md` enforces grounding-first workflow and error quoting
src/pb_spec/templates/prompts/pb-build.prompt.md
Lines changed: 45 additions & 8 deletions
@@ -8,8 +8,8 @@ Run this when the user invokes `/pb-build <feature-name>`.

 - Complete unfinished tasks in `tasks.md` sequentially until done or explicitly blocked.
 - Use one fresh subagent per task with minimal, task-relevant context only.
-- Mark a task as done only after verification passes and task-scoped requirements are satisfied.
-- If blocked, fail clearly with exact task ID, failed command, and concrete next options (retry/skip/abort or DCR).
+- Mark a task as done only after tests pass, task verification passes, and runtime evidence is captured when applicable.
+- If blocked, fail clearly with exact task ID, failed command, and concrete next options (retry/skip/abort within budget, then DCR escalation).

 ---

@@ -69,10 +69,10 @@ For each unfinished task, in order:
    - The `AGENTS.md` (project constraints and hard rules; do not assume any fixed template layout).
    - The `design.md` (Feature Spec).
    - **Summary of previous tasks** — a one-line-per-task summary (e.g., "Task 1.1 created `models.py` with `User` class."). Do NOT pass raw logs or full outputs.
-4. **Subagent executes** the TDD cycle (see Implementer Prompt section).
+4. **Subagent executes** the TDD + runtime verification cycle (see Implementer Prompt section).
 5. **Mark completed** — update `- [ ]` to `- [x]` and Status to `🟢 DONE` in `tasks.md`.
    - **Use precise editing:** Use `sed`, string-replacement, or line-targeted edits to update the specific Task ID heading and its checkboxes. Do NOT rewrite the entire `tasks.md` file — this risks truncation and content loss in large files.
-   - **Completion gate:** Mark done only when task Verification is satisfied and tests are green.
+   - **Completion gate:** Mark done only when task Verification is satisfied, tests are green, and runtime checks (when applicable) are evidence-backed.

 > **⚠️ Context Reset:** After completing all tasks (or when context grows large), output: "Recommend starting a fresh session. Run `/pb-build <feature-name>` again to continue from where you left off."

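Reviewer sketch for the revised completion gate in the hunk above: the new wording adds a third condition (evidence-backed runtime checks, when applicable) to the two existing ones. A minimal way to express that as a predicate is shown below. The `TaskResult` record and its field names are hypothetical; the prompt itself is natural-language guidance for an agent and defines no such type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    """Hypothetical record of what a subagent reported for one task."""

    tests_green: bool
    verification_passed: bool
    runtime_applicable: bool
    # Captured log tail or probe output; only meaningful for runtime-facing tasks.
    runtime_evidence: Optional[str] = None


def can_mark_done(result: TaskResult) -> bool:
    """Apply the completion gate: tests, task Verification, then runtime evidence."""
    if not (result.tests_green and result.verification_passed):
        return False
    if result.runtime_applicable:
        # Runtime-facing work must carry captured evidence, not just a claim.
        return bool(result.runtime_evidence and result.runtime_evidence.strip())
    return True
```

With this shape, a runtime-facing task that passes its tests but reports no log or probe output stays open, which appears to be the behavior the new gate is aiming for.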
@@ -87,10 +87,36 @@ If a subagent fails:
    - If pre-task workspace was dirty: do NOT run workspace-wide restore commands. Report file-level cleanup options and wait for user choice.
 4. **Report** the failure — which task, what went wrong, specific error output.
    - Include the exact failing command and a short quoted error excerpt.
-5. Prompt the user:
-   - **Retry** — new subagent, fresh context, pass previous error as a hint constraint. Maximum 2 retries per task.
+5. **Track consecutive failures per task** (same task, same build run).
+   - Allowed budget is **3 consecutive failures total**: initial attempt + up to 2 retries.
+6. **If failure count is 1 or 2**, prompt the user:
+   - **Retry** — new subagent, fresh context, pass previous error as a hint constraint.
    - **Skip** — mark as `⏭️ SKIPPED`, move to next task.
    - **Abort** — stop the build, report progress so far.
+7. **If failure count reaches 3**, suspend the task and stop the build loop. Do not continue to later tasks. Output a standardized DCR packet:
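Reviewer sketch: the DCR packet format is cut off in this excerpt, so the code below does not attempt to reproduce it. It illustrates only the failure-budget routing from steps 5 to 7: count consecutive failures per task within one build run, prompt the user on the first and second failure, and escalate on the third. `FailureTracker`, `NextStep`, and `route_failure` are hypothetical names; the prompt describes this behavior in prose and does not prescribe an implementation.

```python
from enum import Enum
from typing import Dict

# Budget stated in the prompt: initial attempt + up to 2 retries.
MAX_CONSECUTIVE_FAILURES = 3


class NextStep(Enum):
    PROMPT_USER = "prompt_user"    # offer Retry / Skip / Abort
    ESCALATE_DCR = "escalate_dcr"  # suspend task, stop build loop, emit DCR packet


def route_failure(consecutive_failures: int) -> NextStep:
    """Map a per-task consecutive failure count to the next orchestrator step."""
    if consecutive_failures < 1:
        raise ValueError("failure count must be at least 1")
    if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        return NextStep.ESCALATE_DCR
    return NextStep.PROMPT_USER


class FailureTracker:
    """Counts consecutive failures per task within a single build run."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def record_failure(self, task_id: str) -> NextStep:
        self._counts[task_id] = self._counts.get(task_id, 0) + 1
        return route_failure(self._counts[task_id])

    def record_success(self, task_id: str) -> None:
        # A success ends the streak, so the count for that task is cleared.
        self._counts.pop(task_id, None)
```

Keeping the counter keyed by task ID matches the "same task, same build run" wording; a fresh `FailureTracker` per `/pb-build` invocation would reset the budget between runs.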
src/pb_spec/templates/prompts/pb-plan.prompt.md
Lines changed: 16 additions & 1 deletion
@@ -158,6 +158,12 @@ Write a **flat task list** to `specs/<spec-dir>/tasks.md`:
 - [ ] Verification: ...
 ```

+For lightweight tasks that introduce or change runtime behavior (service startup, UI runtime flow, API availability, performance-critical paths), include runtime observability checks in `Verification`:
+
+- Capture recent runtime logs (for example `tail -n 50 app.log` or project-equivalent command).
+- Capture a live probe result (for example `curl http://localhost:8080/health` or project-equivalent endpoint).
+- If runtime checks are not applicable, explicitly write `N/A` with the reason.
+
 **Skip** phases, Summary & Timeline table, and Definition of Done boilerplate for lightweight specs.
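Reviewer sketch for the runtime observability checks added in the hunk above: the prompt asks for a log tail and a live probe result to be captured as evidence. One way an orchestrator might collect that is to run each command and record its exit code and output, as below. `capture_evidence` and the result layout are assumptions for illustration; the log path and health URL in the comment are the prompt's own examples and would be replaced by project-specific commands.

```python
import shlex
import subprocess
import sys
from typing import Dict, List


def capture_evidence(
    commands: Dict[str, List[str]], timeout: float = 10.0
) -> Dict[str, dict]:
    """Run each verification command and record its exit code and output."""
    evidence: Dict[str, dict] = {}
    for label, argv in commands.items():
        record = {"command": shlex.join(argv), "exit_code": None, "output": ""}
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            record["exit_code"] = proc.returncode
            record["output"] = (proc.stdout + proc.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung probe is itself evidence worth reporting.
            record["output"] = f"{type(exc).__name__}: {exc}"
        evidence[label] = record
    return evidence


def demo() -> Dict[str, dict]:
    """Stand-in run that avoids needing a real service or log file.

    In a project, the commands would look more like the prompt's examples:
        {"logs": ["tail", "-n", "50", "app.log"],
         "probe": ["curl", "http://localhost:8080/health"]}
    """
    return capture_evidence(
        {"probe": [sys.executable, "-c", "print('healthy')"]}
    )
```

Recording the exact command next to its output keeps the evidence "command-backed", so a reviewer can rerun the same check rather than trust a summary.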
@@ -173,6 +179,10 @@ Remove all instructional placeholder text (such as bracket examples) in the fina
 - **Task ID format:** Each task MUST have a unique ID: `Task X.Y` (e.g., `Task 1.1`, `Task 2.3`).
 - Ordered by dependency — no task references work from a later task.
 - Every task has a concrete **Verification** criterion.
+- For tasks that introduce or change runtime behavior (service startup, UI runtime flow, API/network availability, performance-sensitive code paths), **Verification must include runtime observability checks**:
+  - Recent runtime logs (for example `tail -n 50 app.log` or equivalent).
+  - A live health/probe command (for example `curl http://localhost:8080/health` or equivalent).
+  - If not applicable, explicitly mark `N/A` with a reason.
 - **Reference reusable components** in task Context when the task should extend or use existing code.
 - Ensure every requirement from the Step 1 checklist is covered by at least one task or explicitly marked out-of-scope.

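Reviewer sketch for the task-list rules in the hunk above: the `Task X.Y` format and uniqueness requirement are easy to check mechanically. The snippet below is one possible lint, assuming `X` and `Y` are non-negative integers (inferred from the examples `Task 1.1` and `Task 2.3`, not stated in the prompt). `validate_task_ids` is a hypothetical helper, not part of pb-spec.

```python
import re
from typing import List

# Assumes X and Y are integers, as in the prompt's examples.
TASK_ID_RE = re.compile(r"^Task (\d+)\.(\d+)$")


def validate_task_ids(task_ids: List[str]) -> List[str]:
    """Return human-readable problems: malformed IDs and duplicates."""
    problems: List[str] = []
    seen = set()
    for task_id in task_ids:
        if not TASK_ID_RE.match(task_id):
            problems.append(f"malformed ID: {task_id!r}")
        elif task_id in seen:
            problems.append(f"duplicate ID: {task_id!r}")
        seen.add(task_id)
    return problems
```

An empty result means every ID is well-formed and unique; dependency ordering and the presence of a Verification criterion would need separate checks against the task bodies.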
@@ -202,7 +212,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 3. **Right-sized output (YAGNI).** Match output detail to requirement complexity. Simple changes get compact specs; complex features get full specs.
 4. **Live codebase analysis.** Always search the actual codebase. Use `AGENTS.md` as complementary policy context, not a replacement for code inspection.
 5. **Task granularity: Logical Unit of Work.** Each task is a self-contained, meaningful change. Do not split based on arbitrary time estimates.
-6. **Verification per task.** Every task defines how to prove it is done.
+6. **Verification per task.** Every task defines how to prove it is done; runtime-facing tasks include runtime observability evidence.
 7. **Dependency order.** Phases and tasks flow foundational → dependent.
 8. **Project-aware.** Use existing conventions, patterns, and tech stack. Reuse existing components — do not reinvent.
 9. **Requirements coverage.** Track every requirement from input to design sections and tasks.
@@ -432,6 +442,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - [ ] **Step 1:** ...
 - [ ] **Step 2:** ...
 - [ ] **Verification:** [Concrete check]
+- [ ] **Runtime Verification (if applicable):** [Capture runtime signals — e.g., `tail -n 50 app.log` and `curl http://localhost:8080/health`; if not applicable, write `N/A` with reason]

 ---

@@ -449,6 +460,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - [ ] **Step 1:** ...
 - [ ] **Step 2:** ...
 - [ ] **Verification:** ...
+- [ ] **Runtime Verification (if applicable):** [Logs + probe result, or `N/A` with reason]

 ---

@@ -466,6 +478,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - [ ] **Step 1:** ...
 - [ ] **Step 2:** ...
 - [ ] **Verification:** ...
+- [ ] **Runtime Verification (if applicable):** [Logs + probe result, or `N/A` with reason]

 ---

@@ -483,6 +496,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - [ ] **Step 1:** ...
 - [ ] **Step 2:** ...
 - [ ] **Verification:** ...
+- [ ] **Runtime Verification (if applicable):** [Logs + probe result, or `N/A` with reason]

 ---

@@ -502,4 +516,5 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 2. [ ] **Tested:** Unit tests covering added logic.
 3. [ ] **Formatted:** Code formatter applied.
 4. [ ] **Verified:** Task's specific Verification criterion met.
+5. [ ] **Runtime-Evidenced (when applicable):** Runtime logs and health/probe results are captured, or `N/A` is explicitly justified.