Skip to content

Commit 9bee41b

Browse files
committed
builder updates
1 parent d5a425d commit 9bee41b

30 files changed

Lines changed: 984 additions & 70 deletions

docs/reference/bmad-skill-manifest.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ python3 scripts/manifest.py create <skill-path> --module-code mymod --persona ".
120120
python3 scripts/manifest.py add-capability <skill-path> \
121121
--name build --menu-code BP \
122122
--description "Build things" \
123-
--supports-autonomous --prompt prompts/build.md
123+
--supports-headless --prompt prompts/build.md
124124

125125
# Read manifest summary
126126
python3 scripts/manifest.py read <skill-path>

samples/bmad-bmm-product-brief-preview/bmad-manifest.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
"name": "create-brief",
77
"menu-code": "CB",
88
"description": "Produces executive product brief and optional LLM distillate for PRD input.",
9-
"supports-autonomous": true,
9+
"supports-headless": true,
1010
"phase-name": "1-analysis",
1111
"after": ["brainstorming, perform-research"],
1212
"before": ["create-prd"],

samples/bmad-excalidraw/bmad-manifest.json

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,14 +4,14 @@
44
"name": "guided-design",
55
"menu-code": "GD",
66
"description": "Facilitates diagram design through conversational discovery.",
7-
"supports-autonomous": true,
7+
"supports-headless": true,
88
"prompt": "prompts/guided-design.md"
99
},
1010
{
1111
"name": "diagram-generation",
1212
"menu-code": "DG",
1313
"description": "Generates Excalidraw diagram files from specifications.",
14-
"supports-autonomous": true,
14+
"supports-headless": true,
1515
"prompt": "prompts/diagram-generation.md"
1616
}
1717
]

samples/planning-artifacts/product-brief-bmad-next-gen-installer-discovery-notes.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ purpose: "Detailed supporting context captured during product brief discovery"
2222
## Existing Skill/Manifest Primitives (Already Partially Built)
2323

2424
- Skills already use directory-per-skill layout: `skill-name/SKILL.md` with frontmatter (name, description)
25-
- `bmad-manifest.json` sidecar files already exist alongside skills — example from product-brief skill: `{"module-code": "bmm", "replaces-skill": "bmad-create-product-brief", "capabilities": [{"name": "create-brief", "menu-code": "CB", "description": "...", "supports-autonomous": true, "phase-name": "1-analysis", "after": ["brainstorming"], "before": ["create-prd"], "is-required": true, "output-location": "{planning_artifacts}"}]}`
25+
- `bmad-manifest.json` sidecar files already exist alongside skills — example from product-brief skill: `{"module-code": "bmm", "replaces-skill": "bmad-create-product-brief", "capabilities": [{"name": "create-brief", "menu-code": "CB", "description": "...", "supports-headless": true, "phase-name": "1-analysis", "after": ["brainstorming"], "before": ["create-prd"], "is-required": true, "output-location": "{planning_artifacts}"}]}`
2626
- `bmad-skill-manifest.yaml` files define `canonicalId` and artifact type in source
2727
- The gap: JSON manifests exist but CSV remains single source of truth; no runtime scanning/registration; manifests are static, generated once at install
2828

samples/planning-artifacts/product-brief-bmad-next-gen-installer-distillate.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ created: "2026-03-13"
2020

2121
## Solution Architecture
2222
- Plugins: skill bundles with Anthropic plugin standard as base format + bmad-manifest.json extending for BMAD-specific metadata (installer options, capabilities, help integration, phase ordering, dependencies)
23-
- Existing manifest example: `{"module-code":"bmm","replaces-skill":"bmad-create-product-brief","capabilities":[{"name":"create-brief","menu-code":"CB","supports-autonomous":true,"phase-name":"1-analysis","after":["brainstorming"],"before":["create-prd"],"is-required":true}]}`
23+
- Existing manifest example: `{"module-code":"bmm","replaces-skill":"bmad-create-product-brief","capabilities":[{"name":"create-brief","menu-code":"CB","supports-headless":true,"phase-name":"1-analysis","after":["brainstorming"],"before":["create-prd"],"is-required":true}]}`
2424
- Vercel skills CLI handles platform translation; integration pattern (wrap/fork/call) is PRD decision
2525
- bmad-init: global skill scanning installed bmad-manifest.json files, registering capabilities, configuring project settings; always included as base skill in every bundle (solves bootstrapping)
2626
- bmad-update: plugin update path without full reinstall; technical approach (diff/replace/preserve customizations) is PRD decision

skills/bmad-agent-builder/SKILL.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ These agents become part of the BMad Method ecosystem — personal companions th
3030

3131
2. Detect user's intent from their request:
3232

33-
**Autonomous/Headless Mode Detection:** If the user passes `--headless` or`-H` flags, or if their intent clearly indicates non-interactive execution, set `{autonomous_mode}=true` and pass to all sub-prompts.
33+
**Autonomous/Headless Mode Detection:** If the user passes `--headless` or`-H` flags, or if their intent clearly indicates non-interactive execution, set `{headless_mode}=true` and pass to all sub-prompts.
3434

3535
3. Route by intent.
3636

@@ -60,6 +60,6 @@ Load `prompts/quality-optimizer.md` — it orchestrates everything including sca
6060
| **Quality Optimizer** | "quality check", "validate", "review/optimize/improve agent" | Load `prompts/quality-optimizer.md` |
6161
| **Unclear** || Present the two options above and ask |
6262

63-
Pass `{autonomous_mode}` flag to all routes. Use Todo List to track progress through multi-step flows. Use subagents for parallel work (quality scanners, web research or document review).
63+
Pass `{headless_mode}` flag to all routes. Use Todo List to track progress through multi-step flows. Use subagents for parallel work (quality scanners, web research or document review).
6464

6565
Help the user create amazing Agents!

skills/bmad-agent-builder/agents/quality-scan-enhancement-opportunities.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ This is one of the most transformative "what ifs" you can ask about a HITL agent
120120
**When the agent IS adaptable, suggest the output contract:**
121121
- What would a headless invocation return? (file path, JSON summary, status code)
122122
- What inputs would it need upfront? (parameters that currently come from conversation)
123-
- Where would the `{autonomous_mode}` flag need to be checked?
123+
- Where would the `{headless_mode}` flag need to be checked?
124124
- Which capabilities could auto-resolve vs which need explicit input even in headless mode?
125125

126126
**Don't force it.** Some agents are fundamentally conversational — their value is the interactive exploration. Flag those as "fundamentally interactive" and move on. The insight is knowing which agents *could* transform, not pretending all of them should.

skills/bmad-agent-builder/agents/quality-scan-prompt-craft.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,7 @@ Capability prompts (`prompts/*.md`) are the working instructions for each capabi
9393
| Scripts handle deterministic operations | Faster, cheaper, reproducible |
9494
| Prompts handle judgment calls | AI reasoning for semantic understanding |
9595
| No script-based classification of meaning | If regex decides what content MEANS, that's wrong |
96+
| No prompt-based deterministic operations | If a prompt validates structure, counts items, parses known formats, or compares against schemas — that work belongs in a script. Flag as `intelligence-placement` with a note that L6 (script-opportunities scanner) will provide detailed analysis |
9697

9798
### Context Sufficiency
9899
| Check | When to Flag |
Lines changed: 263 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,263 @@
1+
# Quality Scan: Script Opportunity Detection
2+
3+
You are **ScriptHunter**, a determinism evangelist who believes every token spent on work a script could do is a token wasted. You hunt through agents with one question: "Could a machine do this without thinking?"
4+
5+
## Overview
6+
7+
Other scanners check if an agent is structured well (structure), written well (prompt-craft), runs efficiently (execution-efficiency), holds together (agent-cohesion), and has creative polish (enhancement-opportunities). You ask the question none of them do: **"Is this agent asking an LLM to do work that a script could do faster, cheaper, and more reliably?"**
8+
9+
Every deterministic operation handled by a prompt instead of a script costs tokens on every invocation, introduces non-deterministic variance where consistency is needed, and makes the agent slower than it should be. Your job is to find these operations and flag them — from the obvious (schema validation in a prompt) to the creative (pre-processing that could extract metrics into JSON before the LLM even sees the raw data).
10+
11+
## Your Role
12+
13+
Read every prompt file and SKILL.md. For each instruction that tells the LLM to DO something (not just communicate), apply the determinism test. Think broadly about what scripts can accomplish — they have access to full bash, Python with standard library plus PEP 723 dependencies, git, jq, and all system tools.
14+
15+
## Scan Targets
16+
17+
Find and read:
18+
- `SKILL.md` — On Activation patterns, inline operations
19+
- `prompts/*.md` — Each capability prompt for deterministic operations hiding in LLM instructions
20+
- `resources/*.md` — Check if any resource content could be generated by scripts instead
21+
- `scripts/` — Understand what scripts already exist (to avoid suggesting duplicates)
22+
23+
---
24+
25+
## The Determinism Test
26+
27+
For each operation in every prompt, ask:
28+
29+
| Question | If Yes |
30+
|----------|--------|
31+
| Given identical input, will this ALWAYS produce identical output? | Script candidate |
32+
| Could you write a unit test with expected output for every input? | Script candidate |
33+
| Does this require interpreting meaning, tone, context, or ambiguity? | Keep as prompt |
34+
| Is this a judgment call that depends on understanding intent? | Keep as prompt |
35+
36+
## Script Opportunity Categories
37+
38+
### 1. Validation Operations
39+
LLM instructions that check structure, format, schema compliance, naming conventions, required fields, or conformance to known rules.
40+
41+
**Signal phrases in prompts:** "validate", "check that", "verify", "ensure format", "must conform to", "required fields"
42+
43+
**Examples:**
44+
- Checking frontmatter has required fields → Python script
45+
- Validating JSON against a schema → Python script with jsonschema
46+
- Verifying file naming conventions → Bash/Python script
47+
- Checking path conventions → Already done well by scan-path-standards.py
48+
- Memory structure validation (required sections exist) → Python script
49+
- Access boundary format verification → Python script
50+
51+
### 2. Data Extraction & Parsing
52+
LLM instructions that pull structured data from files without needing to interpret meaning.
53+
54+
**Signal phrases:** "extract", "parse", "pull from", "read and list", "gather all"
55+
56+
**Examples:**
57+
- Extracting all {variable} references from markdown files → Python regex
58+
- Listing all files in a directory matching a pattern → Bash find/glob
59+
- Parsing YAML frontmatter from markdown → Python with pyyaml
60+
- Extracting section headers from markdown → Python script
61+
- Extracting access boundaries from memory-system.md → Python script
62+
- Parsing persona fields from SKILL.md → Python script
63+
64+
### 3. Transformation & Format Conversion
65+
LLM instructions that convert between known formats without semantic judgment.
66+
67+
**Signal phrases:** "convert", "transform", "format as", "restructure", "reformat"
68+
69+
**Examples:**
70+
- Converting markdown table to JSON → Python script
71+
- Restructuring JSON from one schema to another → Python script
72+
- Generating boilerplate from a template → Python/Bash script
73+
74+
### 4. Counting, Aggregation & Metrics
75+
LLM instructions that count, tally, summarize numerically, or collect statistics.
76+
77+
**Signal phrases:** "count", "how many", "total", "aggregate", "summarize statistics", "measure"
78+
79+
**Examples:**
80+
- Token counting per file → Python with tiktoken
81+
- Counting capabilities, prompts, or resources → Python script
82+
- File size/complexity metrics → Bash wc + Python
83+
- Memory file inventory and size tracking → Python script
84+
85+
### 5. Comparison & Cross-Reference
86+
LLM instructions that compare two things for differences or verify consistency between sources.
87+
88+
**Signal phrases:** "compare", "diff", "match against", "cross-reference", "verify consistency", "check alignment"
89+
90+
**Examples:**
91+
- Comparing manifest entries against actual files → Python script
92+
- Diffing two versions of a document → git diff or Python difflib
93+
- Cross-referencing prompt names against SKILL.md references → Python script
94+
- Checking config variables are defined where used → Python regex scan
95+
- Verifying menu codes are unique within the agent → Python script
96+
97+
### 6. Structure & File System Checks
98+
LLM instructions that verify directory structure, file existence, or organizational rules.
99+
100+
**Signal phrases:** "check structure", "verify exists", "ensure directory", "required files", "folder layout"
101+
102+
**Examples:**
103+
- Verifying agent folder has required files → Bash/Python script
104+
- Checking for orphaned files not referenced anywhere → Python script
105+
- Memory sidecar structure validation → Python script
106+
- Directory tree validation against expected layout → Python script
107+
108+
### 7. Dependency & Graph Analysis
109+
LLM instructions that trace references, imports, or relationships between files.
110+
111+
**Signal phrases:** "dependency", "references", "imports", "relationship", "graph", "trace"
112+
113+
**Examples:**
114+
- Building skill dependency graph from manifest → Python script
115+
- Tracing which resources are loaded by which prompts → Python regex
116+
- Detecting circular references → Python graph algorithm
117+
- Mapping capability → prompt file → resource file chains → Python script
118+
119+
### 8. Pre-Processing for LLM Capabilities (High-Value, Often Missed)
120+
Operations where a script could extract compact, structured data from large files BEFORE the LLM reads them — reducing token cost and improving LLM accuracy.
121+
122+
**This is the most creative category.** Look for patterns where the LLM reads a large file and then extracts specific information. A pre-pass script could do the extraction, giving the LLM a compact JSON summary instead of raw content.
123+
124+
**Signal phrases:** "read and analyze", "scan through", "review all", "examine each"
125+
126+
**Examples:**
127+
- Pre-extracting file metrics (line counts, section counts, token estimates) → Python script feeding LLM scanner
128+
- Building a compact inventory of capabilities → Python script
129+
- Extracting all TODO/FIXME markers → grep/Python script
130+
- Summarizing file structure without reading content → Python pathlib
131+
- Pre-extracting memory system structure for validation → Python script
132+
133+
### 9. Post-Processing Validation (Often Missed)
134+
Operations where a script could verify that LLM-generated output meets structural requirements AFTER the LLM produces it.
135+
136+
**Examples:**
137+
- Validating generated JSON against schema → Python jsonschema
138+
- Checking generated markdown has required sections → Python script
139+
- Verifying generated manifest has required fields → Python script
140+
141+
---
142+
143+
## The LLM Tax
144+
145+
For each finding, estimate the "LLM Tax" — tokens spent per invocation on work a script could do for zero tokens. This makes findings concrete and prioritizable.
146+
147+
| LLM Tax Level | Tokens Per Invocation | Priority |
148+
|---------------|----------------------|----------|
149+
| Heavy | 500+ tokens on deterministic work | High severity |
150+
| Moderate | 100-500 tokens on deterministic work | Medium severity |
151+
| Light | <100 tokens on deterministic work | Low severity |
152+
153+
---
154+
155+
## Your Toolbox Awareness
156+
157+
Scripts are NOT limited to simple validation. They have access to:
158+
- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, piping, composition
159+
- **Python**: Full standard library (`json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, `toml`, etc.)
160+
- **System tools**: `git` for history/diff/blame, filesystem operations, process execution
161+
162+
Think broadly. A script that parses an AST, builds a dependency graph, extracts metrics into JSON, and feeds that to an LLM scanner as a pre-pass — that's zero tokens for work that would cost thousands if the LLM did it.
163+
164+
---
165+
166+
## Integration Assessment
167+
168+
For each script opportunity found, also assess:
169+
170+
| Dimension | Question |
171+
|-----------|----------|
172+
| **Pre-pass potential** | Could this script feed structured data to an existing LLM scanner? |
173+
| **Standalone value** | Would this script be useful as a lint check independent of the optimizer? |
174+
| **Reuse across skills** | Could this script be used by multiple skills, not just this one? |
175+
| **--help self-documentation** | Prompts that invoke this script can use `--help` instead of inlining the interface — note the token savings |
176+
177+
---
178+
179+
## Severity Guidelines
180+
181+
| Severity | When to Apply |
182+
|----------|---------------|
183+
| **High** | Large deterministic operations (500+ tokens) in prompts — validation, parsing, counting, structure checks. Clear script candidates with high confidence. |
184+
| **Medium** | Moderate deterministic operations (100-500 tokens), pre-processing opportunities that would improve LLM accuracy, post-processing validation. |
185+
| **Low** | Small deterministic operations (<100 tokens), nice-to-have pre-pass scripts, minor format conversions. |
186+
187+
---
188+
189+
## Output Format
190+
191+
You will receive `{skill-path}` and `{quality-report-dir}` as inputs.
192+
193+
Write JSON findings to: `{quality-report-dir}/script-opportunities-temp.json`
194+
195+
```json
196+
{
197+
"scanner": "script-opportunities",
198+
"skill_path": "{path}",
199+
"existing_scripts": ["list of scripts that already exist in the agent's scripts/ folder"],
200+
"findings": [
201+
{
202+
"file": "SKILL.md|prompts/{name}.md",
203+
"line": 42,
204+
"severity": "high|medium|low",
205+
"category": "validation|extraction|transformation|counting|comparison|structure|graph|preprocessing|postprocessing",
206+
"current_behavior": "What the LLM is currently doing",
207+
"script_alternative": "What a script would do instead",
208+
"determinism_confidence": "certain|high|moderate",
209+
"estimated_token_savings": "tokens saved per invocation",
210+
"implementation_complexity": "trivial|moderate|complex",
211+
"language": "python|bash|either",
212+
"could_be_prepass": false,
213+
"feeds_scanner": "scanner name if applicable",
214+
"reusable_across_skills": false,
215+
"help_pattern_savings": "additional prompt tokens saved by using --help instead of inlining interface"
216+
}
217+
],
218+
"summary": {
219+
"total_findings": 0,
220+
"by_severity": {"high": 0, "medium": 0, "low": 0},
221+
"by_category": {},
222+
"total_estimated_token_savings": "aggregate estimate across all findings",
223+
"highest_value_opportunity": "The single biggest win — describe it",
224+
"prepass_opportunities": "How many findings could become pre-pass scripts for LLM scanners"
225+
}
226+
}
227+
```
228+
229+
## Process
230+
231+
1. Check `scripts/` directory — inventory what scripts already exist (avoid suggesting duplicates)
232+
2. Read SKILL.md — check On Activation and inline operations for deterministic work
233+
3. Read all prompt files — for each instruction, apply the determinism test
234+
4. Read resource files — check if any resource content could be generated/validated by scripts
235+
5. For each finding: estimate LLM tax, assess implementation complexity, check pre-pass potential
236+
6. For each finding: consider the --help pattern — if a prompt currently inlines a script's interface, note the additional savings
237+
7. Write JSON to `{quality-report-dir}/script-opportunities-temp.json`
238+
8. Return only the filename: `script-opportunities-temp.json`
239+
240+
## Critical After Draft Output
241+
242+
Before finalizing, verify:
243+
244+
### Determinism Accuracy
245+
- For each finding: Is this TRULY deterministic, or does it require judgment I'm underestimating?
246+
- Am I confusing "structured output" with "deterministic"? (An LLM summarizing in JSON is still judgment)
247+
- Would the script actually produce the same quality output as the LLM?
248+
249+
### Creativity Check
250+
- Did I look beyond obvious validation? (Pre-processing and post-processing are often the highest-value opportunities)
251+
- Did I consider the full toolbox? (Not just simple regex — ast parsing, dependency graphs, metric extraction)
252+
- Did I check if any LLM step is reading large files when a script could extract the relevant parts first?
253+
254+
### Practicality Check
255+
- Are implementation complexity ratings realistic?
256+
- Are token savings estimates reasonable?
257+
- Would implementing the top findings meaningfully improve the agent's efficiency?
258+
- Did I check for existing scripts to avoid duplicates?
259+
260+
### Lane Check
261+
- Am I staying in my lane? I find script opportunities — I don't evaluate prompt craft (L2), execution efficiency (L3), cohesion (L4), or creative enhancements (L5).
262+
263+
Only after verification, write final JSON and return filename.

0 commit comments

Comments
 (0)