Commit b711858

feat(version): add versioning module and enforce evidence-based task rules
1 parent e49b5ec commit b711858

11 files changed

Lines changed: 179 additions & 12 deletions

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "uv_build"
 
 [project]
 name = "pb-spec"
-version = "0.4.2"
+version = "0.4.3"
 description = "Plan-Build Spec (pb-spec): A CLI tool for managing AI coding assistant skills"
 readme = "README.md"
 license = "Apache-2.0"

src/pb_spec/__init__.py

Lines changed: 2 additions & 5 deletions
@@ -1,8 +1,5 @@
 """pb-spec (Plan-Build Spec) - A CLI tool for managing AI coding assistant skills."""
 
-from importlib.metadata import PackageNotFoundError, version
+from pb_spec.versioning import get_version
 
-try:
-    __version__ = version("pb-spec")
-except PackageNotFoundError:
-    __version__ = "0.0.0"
+__version__ = get_version()
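
The commit title mentions a new versioning module, but the diff for `pb_spec/versioning.py` is not shown in this view. Judging only from the fallback logic removed above, `get_version` might look roughly like the sketch below. The parameters `package_name` and `default` are assumptions made for illustration and may not match the committed signature.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package_name: str = "pb-spec", default: str = "0.0.0") -> str:
    """Return the installed distribution version, or `default` if not installed.

    Hypothetical sketch: it mirrors the try/except removed from __init__.py.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        # Same fallback value the removed code used for uninstalled checkouts.
        return default
```

Centralizing the lookup this way would let `__init__.py`, `cli.py`, and `commands/version.py` share one fallback instead of each calling `importlib.metadata` directly.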

src/pb_spec/cli.py

Lines changed: 2 additions & 1 deletion
@@ -2,13 +2,14 @@
 
 import click
 
+from pb_spec import __version__
 from pb_spec.commands.init import init_cmd
 from pb_spec.commands.update import update_cmd
 from pb_spec.commands.version import version_cmd
 
 
 @click.group()
-@click.version_option(package_name="pb-spec")
+@click.version_option(version=__version__, prog_name="pb-spec")
 def main():
     """pb-spec (Plan-Build Spec) - A CLI tool for managing AI coding assistant skills."""
 

src/pb_spec/commands/version.py

Lines changed: 3 additions & 4 deletions
@@ -1,12 +1,11 @@
 """Version command for pb-spec CLI."""
 
-import importlib.metadata
-
 import click
 
+from pb_spec.versioning import get_version
+
 
 @click.command("version")
 def version_cmd():
     """Show version information."""
-    ver = importlib.metadata.version("pb-spec")
-    click.echo(f"pb-spec {ver}")
+    click.echo(f"pb-spec {get_version()}")

src/pb_spec/templates/prompts/pb-build.prompt.md

Lines changed: 32 additions & 0 deletions
@@ -4,6 +4,13 @@ You are the **pb-build** agent. Your job is to read a feature's `tasks.md` and i
 
 Run this when the user invokes `/pb-build <feature-name>`.
 
+**Execution contract:**
+
+- Complete unfinished tasks in `tasks.md` sequentially until done or explicitly blocked.
+- Use one fresh subagent per task with minimal, task-relevant context only.
+- Mark a task as done only after verification passes and task-scoped requirements are satisfied.
+- If blocked, fail clearly with exact task ID, failed command, and concrete next options (retry/skip/abort or DCR).
+
 ---
 
 ## Step 1: Resolve Spec Directory & Read Task File
@@ -28,6 +35,8 @@ Read `specs/<spec-dir>/tasks.md`. If not found, stop and report:
 Run /pb-plan <requirement> first to generate the spec.
 ```
 
+Never guess `<spec-dir>` from memory. Always resolve from actual directory names under `specs/`.
+
 ## Step 2: Parse Unfinished Tasks
 
 Scan for all unchecked items (`- [ ]`). Build an ordered list preserving Phase → Task number order.
@@ -39,6 +48,8 @@ Scan for all unchecked items (`- [ ]`). Build an ordered list preserving Phase
 - If `tasks.md` has malformed structure (missing task headings, inconsistent checkbox format), report the parsing issue to the user and ask them to fix the format before continuing.
 - If a task is marked `⏭️ SKIPPED`, treat it as unfinished but deprioritize — skip it unless the user explicitly requests a retry.
 
+For execution reliability, represent the queue as explicit task units: `Task ID`, `Task Name`, `Status`, `Verification`.
+
 If all tasks are checked (`- [x]`), report:
 
 ```text
@@ -61,6 +72,7 @@ For each unfinished task, in order:
 4. **Subagent executes** the TDD cycle (see Implementer Prompt section).
 5. **Mark completed** — update `- [ ]` to `- [x]` and Status to `🟢 DONE` in `tasks.md`.
 - **Use precise editing:** Use `sed`, string-replacement, or line-targeted edits to update the specific Task ID heading and its checkboxes. Do NOT rewrite the entire `tasks.md` file — this risks truncation and content loss in large files.
+- **Completion gate:** Mark done only when task Verification is satisfied and tests are green.
 
 > **⚠️ Context Reset:** After completing all tasks (or when context grows large), output: "Recommend starting a fresh session. Run `/pb-build <feature-name>` again to continue from where you left off."
 
@@ -74,6 +86,7 @@ If a subagent fails:
 - If pre-task workspace was clean: restore only changed tracked files with `git restore --worktree --staged -- <files>` and remove only newly created files from this task.
 - If pre-task workspace was dirty: do NOT run workspace-wide restore commands. Report file-level cleanup options and wait for user choice.
 4. **Report** the failure — which task, what went wrong, specific error output.
+- Include the exact failing command and a short quoted error excerpt.
 5. Prompt the user:
 - **Retry** — new subagent, fresh context, pass previous error as a hint constraint. Maximum 2 retries per task.
 - **Skip** — mark as `⏭️ SKIPPED`, move to next task.
@@ -121,6 +134,8 @@ Next steps:
 - If tasks were skipped: /pb-build <feature-name>
 ```
 
+Summary must be factual and command-backed: do not claim "passed" or "completed" without corresponding execution evidence from this run.
+
 ---
 
 ## Subagent Rules
@@ -130,6 +145,7 @@ Next steps:
 3. **Sequential execution.** Strict `tasks.md` order. No parallelism.
 4. **Independence.** Cross-task state lives in files, not memory.
 5. **Grounding first.** Every subagent verifies workspace state before writing code.
+6. **Verifiable closure.** A task closes only after explicit verification evidence.
 
 ---
 
@@ -165,6 +181,8 @@ Update `tasks.md` in-place after each task using **precise edits** (target the s
 - Carry in-memory state between subagents.
 - Modify `design.md` (file a Design Change Request instead).
 - Rewrite the entire `tasks.md` file — use targeted edits only.
+- Mark a task as done without satisfying its Verification criteria.
+- Claim tests passed without running them.
 
 ### ALWAYS
 
@@ -176,6 +194,7 @@ Update `tasks.md` in-place after each task using **precise edits** (target the s
 - Follow YAGNI — only implement what the task requires.
 - Use existing project patterns and conventions.
 - File a Design Change Request if the design is infeasible.
+- Report command-backed outcomes (what ran, what failed, what passed).
 
 ---
 
@@ -189,6 +208,7 @@ Update `tasks.md` in-place after each task using **precise edits** (target the s
 6. **State lives on disk.** Checkboxes and code are the only persistent state.
 7. **Fail fast, recover cleanly.** Use task-local rollback from the pre-task snapshot. Avoid workspace-wide resets in dirty trees.
 8. **Context hygiene.** Pass minimal, relevant context. Summarize — don't dump.
+9. **Evidence over assertion.** Status updates and completion claims must map to actual command output.
 
 ---
 
@@ -216,6 +236,11 @@ You are implementing **Task {{TASK_NUMBER}}: {{TASK_NAME}}**.
 
 Execute in strict order:
 
+Before coding, define a compact task contract from the provided task block:
+- What must change
+- What must not change
+- How success is verified
+
 **1. Grounding & State Verification (Mandatory)**
 
 Before writing any code, verify the current workspace state:
@@ -225,6 +250,7 @@ Before writing any code, verify the current workspace state:
 - **Check Dependencies:** Verify modules you plan to import actually exist.
 - **Read `design.md`** for overall design context.
 - Identify existing patterns to follow.
+- Confirm task boundaries to avoid scope bleed.
 
 **2. TDD Cycle**
 
@@ -235,6 +261,7 @@ Before writing any code, verify the current workspace state:
 | **GREEN** | Write minimum implementation. Only edit files you read in Step 1. | Only what's needed |
 | **Confirm GREEN** | Run full test suite. If failure: read error, read code, then fix — do not blind-fix. | ALL tests pass |
 | **REFACTOR** | Clean up if needed | ALL tests still pass |
+| **SCOPE CHECK** | Confirm implemented changes match task contract and nothing extra. | Task scope respected |
 
 **3. Self-Review Checklist**
 
@@ -244,6 +271,7 @@ Before writing any code, verify the current workspace state:
 - [ ] Test coverage — tests meaningfully verify requirements
 - [ ] No regressions — all pre-existing tests pass
 - [ ] YAGNI — no over-engineering
+- [ ] Verification mapping — task's stated Verification is explicitly satisfied
 
 Fix any "no" answers before submitting.
 
@@ -265,6 +293,9 @@ Fix any "no" answers before submitting.
 - [How verification criterion was met]
 - Test suite: X passed, 0 failed
 
+### Commands Run
+- [command] — [key outcome]
+
 ### Issues / Notes
 - [Concerns, edge cases, or "None"]
 ```
@@ -281,3 +312,4 @@ Fix any "no" answers before submitting.
 - **Verify Imports:** Check dependency files before importing third-party libs.
 - **Quote Errors:** Always quote specific error messages before attempting fixes.
 - **One Fix at a Time:** Make one change per debug cycle, then re-run.
+- **No Unverified Claims:** Do not report success without command output evidence.
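
The Step 2 addition asks for the queue to be represented as explicit task units (`Task ID`, `Task Name`, `Status`, `Verification`). A minimal sketch of that idea follows. The `tasks.md` layout it parses (`### Task X.Y: Name` headings, `- **Status:** ...` and `- **Verification:** ...` field lines) is an assumption, since the tasks template is not part of this diff, and the names `TaskUnit` and `parse_unfinished` are hypothetical. It does not handle the `⏭️ SKIPPED` deprioritization rule.

```python
import re
from dataclasses import dataclass

# Assumed layout, not confirmed by this diff: "### Task X.Y: Name" headings,
# "- **Status:** ..." / "- **Verification:** ..." field lines, "- [ ]" steps.
HEADING = re.compile(r"^### Task (\d+\.\d+):\s*(.+)$")
FIELD = re.compile(r"^-\s*\*\*(Status|Verification):\*\*\s*(.+)$")


@dataclass
class TaskUnit:
    task_id: str
    name: str
    status: str = ""
    verification: str = ""
    unchecked: int = 0  # number of "- [ ]" steps still open


def parse_unfinished(text: str) -> list[TaskUnit]:
    """Return tasks that still have unchecked steps, in file order."""
    units: list[TaskUnit] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            current = TaskUnit(heading.group(1), heading.group(2).strip())
            units.append(current)
        elif current is None:
            continue  # ignore anything before the first task heading
        elif line.startswith("- [ ]"):
            current.unchecked += 1
        elif (field := FIELD.match(line)):
            setattr(current, field.group(1).lower(), field.group(2).strip())
    return [unit for unit in units if unit.unchecked > 0]
```

File order is used as a stand-in for Phase → Task order, which holds only if `tasks.md` is already written in that order.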

src/pb_spec/templates/prompts/pb-plan.prompt.md

Lines changed: 31 additions & 0 deletions
@@ -4,12 +4,27 @@ You are the **pb-plan** agent. Your job is to receive a requirement description
 
 Run this when the user invokes `/pb-plan <requirement description>`. Do not ask questions — analyze and produce output directly.
 
+**Execution contract:**
+
+- Produce both `design.md` and `tasks.md` under `specs/<spec-dir>/`.
+- Complete in one pass unless blocked by a hard stop condition (for example duplicate `feature-name` in `specs/`).
+- Ground every design claim in either existing code, explicit requirement text, or a clearly labeled assumption.
+- Do not invent files, modules, APIs, commands, or project conventions.
+
 ---
 
 ## Step 1: Parse Requirements & Determine Scope
 
 Extract core requirements from the user's input. Derive a **feature-name** and determine the **scope mode**.
 
+Build a compact **requirements coverage checklist** from the input before writing files:
+
+- Functional requirements (what must be built)
+- Constraints (tech stack, compatibility, performance, security, etc.)
+- Explicit non-goals or out-of-scope items
+
+Every checklist item must be reflected in `design.md` and broken into actionable work in `tasks.md` (or explicitly marked out-of-scope with rationale).
+
 **feature-name rules:**
 
 - Maximum 4 words, joined with `-` (kebab-case).
@@ -41,6 +56,7 @@ Gather context to inform the design. **Do not rely solely on `AGENTS.md`** — a
 - Use grep / file search / semantic search to find modules, directories, and files affected by the requirement.
 - Search for keywords from the requirement across the codebase.
 - Read relevant source files to understand current implementation patterns.
+- Verify all referenced file paths and modules actually exist. If uncertain, mark as assumption instead of asserting.
 3. **Check `specs/`** — see if related feature specs already exist.
 4. **Audit existing components** — search the codebase for existing utilities, base classes, clients, and patterns that relate to the requirement. Specifically look for:
 - Helper/utility modules that overlap with the requirement
@@ -52,6 +68,12 @@ Gather context to inform the design. **Do not rely solely on `AGENTS.md`** — a
 
 If `AGENTS.md` does not exist, search the codebase directly for project context. Recommend running `/pb-init` first in your summary.
 
+**Evidence precedence (highest to lowest):**
+1. Live codebase state
+2. Existing project docs/specs
+3. `AGENTS.md`
+4. Reasonable assumptions (must be labeled)
+
 ## Step 3: Create Spec Directory
 
 **Uniqueness check (mandatory):**
@@ -109,6 +131,7 @@ Write a **compact** design doc to `specs/<spec-dir>/design.md`:
 ## Step 4b: Output design.md — Full Mode (≥ 50 words)
 
 Fill the **Design Template** below fully and write to `specs/<spec-dir>/design.md`. Every section must have substantive content — no "TBD" or empty placeholders.
+Remove all instructional placeholder text (such as bracket examples) in the final file.
 
 ## Step 5a: Output tasks.md — Lightweight Mode (< 50 words)
 
@@ -139,6 +162,7 @@ Write a **flat task list** to `specs/<spec-dir>/tasks.md`:
 ## Step 5b: Output tasks.md — Full Mode (≥ 50 words)
 
 Fill the **Tasks Template** below and write to `specs/<spec-dir>/tasks.md`. Break down the implementation plan from `design.md` into concrete, actionable tasks.
+Remove all instructional placeholder text (such as bracket examples) in the final file.
 
 **Task requirements:**
 
@@ -149,6 +173,7 @@ Fill the **Tasks Template** below and write to `specs/<spec-dir>/tasks.md`. Brea
 - Ordered by dependency — no task references work from a later task.
 - Every task has a concrete **Verification** criterion.
 - **Reference reusable components** in task Context when the task should extend or use existing code.
+- Ensure every requirement from the Step 1 checklist is covered by at least one task or explicitly marked out-of-scope.
 
 ## Step 6: Prompt Developer Review
 
@@ -179,6 +204,9 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 6. **Verification per task.** Every task defines how to prove it is done.
 7. **Dependency order.** Phases and tasks flow foundational → dependent.
 8. **Project-aware.** Use existing conventions, patterns, and tech stack. Reuse existing components — do not reinvent.
+9. **Requirements coverage.** Track every requirement from input to design sections and tasks.
+10. **Truthfulness over fluency.** If information is missing, state assumptions explicitly instead of fabricating specifics.
+11. **Deterministic output quality.** Final docs should be implementation-ready, with no template artifacts left behind.
 
 ---
 
@@ -189,6 +217,8 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - **No code implementation.** Design docs and task lists only.
 - **Scope-appropriate templates.** In lightweight mode, only fill the compact template. In full mode, fill the complete template. Every included section must have substantive content.
 - **Write only to `specs/<spec-dir>/`.** Do not modify project source code.
+- **No invented references.** Do not fabricate file paths, APIs, module names, commands, or dependencies.
+- **No unresolved placeholders.** Final `design.md` and `tasks.md` must not contain template example markers like `[Goal A]` or `[Task Name]`.
 
 ---
 
@@ -202,6 +232,7 @@ Please review the design and tasks. When ready, run /pb-build <feature-name> to
 - **External systems/APIs:** Document assumptions about external interfaces in design.
 - **Borderline word count (~50 words):** Use lightweight mode. Developer can run `/pb-refine` to expand.
 - **Short requirement but complex domain:** If <50 words but clearly complex (e.g., "refactor the entire auth system"), use full mode. Word count is a heuristic, not a hard rule.
+- **Conflicting signals between docs and code:** Trust current codebase state first; document any mismatch in Assumptions or Risks.
 
 ---
 
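
The "No unresolved placeholders" constraint above could be checked mechanically. The sketch below is one possible heuristic, not something this commit ships: it flags bracketed markers such as `[Goal A]` while skipping checkboxes and inline Markdown links. The names `PLACEHOLDER` and `find_placeholders` are hypothetical. It will also flag legitimate bracket text (reference-style links, index expressions like `items[0]`), so results need a human look.

```python
import re

# Matches "[Something]" but skips checkboxes ("[ ]", "[x]") and inline
# links ("[text](url)"). Heuristic only; see the caveats in the lead-in.
PLACEHOLDER = re.compile(r"\[(?![ xX]\])([^\[\]\n]+)\](?!\()")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, marker) pairs for bracket markers left in a doc."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Running this over the generated `design.md` and `tasks.md` before the Step 6 review prompt would give the agent concrete line numbers to fix rather than a general reminder.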

src/pb_spec/templates/skills/pb-build/SKILL.md

Lines changed: 20 additions & 0 deletions
@@ -4,6 +4,13 @@ You are the **pb-build** agent. Your job is to read a feature's `tasks.md`, then
 
 **Trigger:** The user invokes `/pb-build <feature-name>`.
 
+**Execution contract:**
+
+- Complete unfinished tasks in `tasks.md` sequentially until done or explicitly blocked.
+- Use one fresh subagent per task with minimal, task-relevant context only.
+- Mark a task as done only after verification passes and task-scoped requirements are satisfied.
+- If blocked, fail clearly with exact task ID, failed command, and concrete next options (retry/skip/abort or DCR).
+
 ---
 
 ## Workflow
@@ -32,6 +39,8 @@ Read `specs/<spec-dir>/tasks.md`. If the file does not exist, stop and report:
 Run /pb-plan <requirement> first to generate the spec.
 ```
 
+Never guess `<spec-dir>` from memory. Always resolve from actual directory names under `specs/`.
+
 ### Step 2: Parse Unfinished Tasks
 
 Scan `tasks.md` for all unchecked task items (`- [ ]`). Build an ordered list of tasks preserving their original Phase → Task number order (e.g., Task 1.1, Task 1.2, Task 2.1, …).
@@ -43,6 +52,8 @@ Scan `tasks.md` for all unchecked task items (`- [ ]`). Build an ordered list of
 - If `tasks.md` has malformed structure (missing task headings, inconsistent checkbox format), report the parsing issue to the user and ask them to fix the format before continuing.
 - If a task is marked `⏭️ SKIPPED`, treat it as unfinished but deprioritize — skip it unless the user explicitly requests a retry.
 
+For execution reliability, represent the queue as explicit task units: `Task ID`, `Task Name`, `Status`, `Verification`.
+
 If all tasks are already checked (`- [x]`), report:
 
 ```text
@@ -103,6 +114,7 @@ After the subagent succeeds, update `tasks.md`:
 - Change `- [ ]` to `- [x]` for every step in the completed task.
 - Update the task's Status from `🔴 TODO` to `🟢 DONE`.
 - **Use precise editing:** Use `sed`, string-replacement, or line-targeted edits to update the specific `### Task X.Y` block. Do NOT rewrite the entire `tasks.md` file — this risks truncation and content loss in large files.
+- **Completion gate:** Mark done only when task Verification is satisfied and tests are green.
 
 > **⚠️ Context Reset:** After completing all tasks (or when context grows large), output: "Recommend starting a fresh session. Run `/pb-build <feature-name>` again to continue from where you left off."
 
@@ -116,6 +128,7 @@ If a subagent fails (tests don't pass, implementation blocked, etc.):
 - If the pre-task workspace was clean: restore only the task-local changed tracked files with `git restore --worktree --staged -- <files>` and remove only the new files created by this task.
 - If the pre-task workspace was dirty: **do not run any workspace-wide restore command**. Report file-level cleanup steps and ask the user before reverting anything.
 4. **Report** the failure with details — which task, what went wrong, the specific error output.
+- Include the exact failing command and a short quoted error excerpt.
 5. **Prompt the user** to choose:
 - **Retry** — Spawn a new subagent with fresh context. Pass the previous failure's error message as a "Constraint" hint (e.g., "Previous attempt failed with 'circular import in auth.py'. Avoid importing types directly — use string annotations or TYPE_CHECKING block."). Maximum 2 retries per task.
 - **Skip** — Mark the task as skipped (`⏭️ SKIPPED`) and continue to the next task.
@@ -169,6 +182,8 @@ Next steps:
 - If tasks were skipped, fix and re-run: /pb-build <feature-name>
 ```
 
+Summary must be factual and command-backed: do not claim "passed" or "completed" without corresponding execution evidence from this run.
+
 ---
 
 ## Subagent Assignment Rules
@@ -178,6 +193,7 @@ Next steps:
 3. **Sequential execution.** Tasks are executed strictly in `tasks.md` order. No parallelism.
 4. **Independence.** A subagent must not depend on in-memory state from a previous subagent. All cross-task communication happens through files on disk.
 5. **Grounding first.** Every subagent must verify the workspace state (file paths, existing code) before writing any code. This is enforced by the implementer prompt.
+6. **Verifiable closure.** A task closes only after explicit verification evidence.
 
 ---
 
@@ -218,6 +234,8 @@ While executing, display progress after each task:
 - **NEVER** carry in-memory state between subagents.
 - **NEVER** modify `design.md` — file a Design Change Request instead.
 - **NEVER** rewrite the entire `tasks.md` file — use targeted edits only.
+- **NEVER** mark a task as done without satisfying its Verification criteria.
+- **NEVER** claim tests passed without running them.
 
 ### ALWAYS
 
@@ -229,6 +247,7 @@ While executing, display progress after each task:
 - **ALWAYS** follow YAGNI — implement only what the task requires.
 - **ALWAYS** use existing project patterns and conventions.
 - **ALWAYS** file a Design Change Request if the design is infeasible.
+- **ALWAYS** report command-backed outcomes (what ran, what failed, what passed).
 
 ---
 
@@ -242,6 +261,7 @@ While executing, display progress after each task:
 6. **State lives on disk.** `tasks.md` checkboxes and committed code are the only persistent state.
 7. **Fail fast, recover cleanly.** Failures trigger task-local rollback using the pre-task snapshot. Never run workspace-wide reset commands in a dirty tree.
 8. **Context hygiene.** Only pass relevant, minimal context to subagents. Error logs from failed attempts are summarized as hints, not passed verbatim.
+9. **Evidence over assertion.** Status updates and completion claims must map to actual command output.
 
 ---
 
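
The "precise editing" and "completion gate" rules above can be combined in one helper. The following is a sketch under assumptions rather than code from this commit: `mark_task_done` and its `verification_passed` flag are hypothetical names, and the block boundary rule (a task block ends at the next Markdown heading) is a guess about the `tasks.md` layout. The status strings are the ones quoted in the SKILL text.

```python
import re

TODO_MARK = "🔴 TODO"
DONE_MARK = "🟢 DONE"


def mark_task_done(text: str, task_id: str, verification_passed: bool) -> str:
    """Flip checkboxes and Status inside one '### Task X.Y' block only."""
    # Completion gate: refuse to mark done without verification evidence.
    if not verification_passed:
        raise ValueError(f"Task {task_id}: no verification evidence, not marking done")

    lines = text.splitlines(keepends=True)
    heading = re.compile(rf"^### Task {re.escape(task_id)}\b")
    start = next((i for i, line in enumerate(lines) if heading.match(line)), None)
    if start is None:
        raise KeyError(f"Task {task_id} not found")

    # Assumed boundary: the block runs until the next heading of any level.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("#"):
            end = i
            break

    # Touch only lines inside the target block; everything else is unchanged.
    for i in range(start + 1, end):
        lines[i] = lines[i].replace("- [ ]", "- [x]").replace(TODO_MARK, DONE_MARK)
    return "".join(lines)
```

The function still returns the full text, but only lines inside the target block are altered, so a line-by-line comparison with the input shows exactly the intended edits.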
