Commit 72ce8b6

feat: add script creation standards and cross-platform guidance (#46)
Add script-standards reference docs to both builders enforcing Python-first, PEP 723 metadata, uv run invocation, and cross-platform portability. Update build processes to load standards when scripts are involved and require explicit user approval for external dependencies. Add explanation doc covering why deterministic scripts improve skill quality.
1 parent 9a4ae85 commit 72ce8b6

8 files changed

Lines changed: 325 additions & 10 deletions

File tree

docs/explanation/index.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -21,6 +21,7 @@ Create world-class AI agents and workflows with the BMad Builder.
2121
| **[Progressive Disclosure](/explanation/progressive-disclosure.md)** | Four layers of context loading — from frontmatter through step files |
2222
| **[Subagent Patterns](/explanation/subagent-patterns.md)** | Six orchestration patterns for parallel and hierarchical work |
2323
| **[Skill Authoring Best Practices](/explanation/skill-authoring-best-practices.md)** | Core principles, common patterns, quality dimensions, and anti-patterns |
24+
| **[Scripts in Skills](/explanation/scripts-in-skills.md)** | Why deterministic scripts make skills faster, cheaper, and more reliable |
2425

2526
## Reference
2627

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
1+
---
2+
title: "Scripts in Skills"
3+
description: Why deterministic scripts make skills faster, cheaper, and more reliable — and the technical choices behind portable script design
4+
---
5+
6+
Scripts are the reliability backbone of a well-built skill. They handle work that has clear right-and-wrong answers — validation, transformation, extraction, counting — so the LLM can focus on what it does best: judgment, synthesis, and creative reasoning.
7+
8+
## The Problem: LLMs Do Too Much
9+
10+
Without scripts, every operation in a skill runs through the LLM. That means:
11+
12+
- **Non-deterministic results.** Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time.
13+
- **Wasted tokens and time.** Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain.
14+
- **Harder to test.** A script can be unit-tested against a fixed expected output. An LLM prompt cannot, because its output may vary between runs.
15+
16+
The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts.
17+
18+
## The Determinism Boundary
19+
20+
The core design principle is **intelligence placement** — put each operation where it belongs.
21+
22+
| Scripts Handle | LLM Handles |
23+
| -------------- | ----------- |
24+
| Validate structure, format, schema | Interpret meaning, evaluate quality |
25+
| Count, parse, extract, transform | Classify ambiguous input, make judgment calls |
26+
| Compare, diff, check consistency | Synthesize insights, generate creative output |
27+
| Pre-process data into compact form | Analyze pre-processed data with domain reasoning |
28+
29+
**The test:** Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt.
30+
31+
:::tip[The Pre-Processing Pattern]
32+
One of the highest-value script uses is pre-processing. A script extracts compact metrics from large files into a small JSON summary. The LLM then reasons over the summary instead of reading raw files, which cuts token usage and tends to improve analysis quality because the data is clean and structured.
33+
:::
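
As a minimal sketch of the pre-processing pattern, assume a skill that needs basic metrics for Markdown files. The function names `summarize` and `build_summary` and the chosen metrics (lines, words, headings) are illustrative assumptions, not part of the BMad builders.

```python
import json
import re
from pathlib import Path

# A Markdown heading: one to six "#" characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s", re.MULTILINE)


def summarize(text: str) -> dict:
    """Reduce a document to a few deterministic metrics."""
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "headings": len(HEADING.findall(text)),
    }


def build_summary(paths: list[str]) -> str:
    """Return one compact JSON document covering every input file."""
    summary = {p: summarize(Path(p).read_text(encoding="utf-8")) for p in paths}
    return json.dumps(summary, indent=2, sort_keys=True)
```

The LLM would then receive the output of `build_summary` rather than the files themselves, so the counts are identical on every run and the prompt stays small.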
34+
35+
## Why Python, Not Bash
36+
37+
Skills must work across macOS, Linux, and Windows. Bash is not portable.
38+
39+
| Factor | Bash | Python |
40+
| ------ | ---- | ------ |
41+
| **macOS / Linux** | Works | Works |
42+
| **Windows (native)** | Fails or behaves inconsistently | Works identically |
43+
| **Windows (WSL)** | Works, but can conflict with Git Bash on PATH | Works identically |
44+
| **Error handling** | Limited, fragile | Rich exception handling |
45+
| **Testing** | Difficult | Standard unittest/pytest |
46+
| **Complex logic** | Quickly becomes unreadable | Clean, maintainable |
47+
48+
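Even basic commands like `sed -i` behave differently on macOS and Linux: BSD `sed` on macOS requires a backup-suffix argument after `-i` (for example `sed -i ''`), while GNU `sed` on Linux does not. Piping, `jq`, `grep`, and `awk` have similar cross-platform pitfalls (`jq` is often not installed at all), which Python's standard library avoids.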
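
For comparison, a rough Python equivalent of an in-place `sed` substitution might look like the sketch below. The helper name `replace_in_file` is hypothetical. It uses only `re` and `pathlib` and behaves the same on every platform.

```python
import re
from pathlib import Path


def replace_in_file(path: Path, pattern: str, replacement: str) -> int:
    """Replace every regex match in a file and return the number of substitutions."""
    # newline="" disables newline translation, so CRLF and LF files keep their endings.
    with path.open("r", encoding="utf-8", newline="") as handle:
        original = handle.read()
    updated, count = re.subn(pattern, replacement, original)
    if count:
        # Write only when something changed, so untouched files are left alone.
        with path.open("w", encoding="utf-8", newline="") as handle:
            handle.write(updated)
    return count
```

Returning the substitution count gives the caller a deterministic value to report or assert on, which a shell one-liner does not provide without extra work.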
49+
50+
**Safe bash commands** that work everywhere and remain fine to use directly:
51+
52+
| Command | Purpose |
53+
| ------- | ------- |
54+
| `git`, `gh` | Version control and GitHub CLI |
55+
| `uv run` | Python script execution |
56+
| `npm`, `npx`, `pnpm` | Node.js ecosystem |
57+
| `mkdir -p` | Directory creation |
58+
59+
Everything beyond that list should be a Python script.
60+
61+
## Standard Library First
62+
63+
Python's standard library covers most script needs without any external dependencies. Stdlib-only scripts run with plain `python3`, need no special tooling, and have zero supply-chain risk.
64+
65+
| Need | Standard Library |
66+
| ---- | ---------------- |
67+
| JSON parsing | `json` |
68+
| Path handling | `pathlib` |
69+
| Pattern matching | `re` |
70+
| CLI interface | `argparse` |
71+
| Text comparison | `difflib` |
72+
| Counting, grouping | `collections` |
73+
| Source analysis | `ast` |
74+
| Data formats | `csv`, `xml.etree` |
75+
76+
Only reach for external dependencies when the stdlib genuinely cannot do the job — `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation. Each external dependency adds install-time cost, requires `uv` to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process.
77+
78+
## Zero-Friction Dependencies with PEP 723
79+
80+
Python scripts in skills use [PEP 723](https://peps.python.org/pep-0723/) inline metadata to declare their dependencies directly in the file. Combined with `uv run`, this gives you `npx`-like behavior — dependencies are silently cached in an isolated environment, no global installs, no user prompts.
81+
82+
```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0"]
# ///

import yaml
# script logic here
```
92+
93+
When a skill invokes this script with `uv run scripts/analyze.py`, the dependency (`pyyaml` in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation.
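
To make the metadata block less opaque, here is a simplified sketch of how a tool could locate and read it. This is not `uv`'s implementation. The regular expression is a reduced version of the block format described in PEP 723, and `extract_metadata` and `read_dependencies` are hypothetical names.

```python
import re
from typing import Optional

# "# /// script", one or more comment lines, then a closing "# ///".
BLOCK = re.compile(r"(?m)^# /// script$\n(?P<content>(?:^#(?: .*)?$\n)+)^# ///$")


def extract_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` block, or None if there is none."""
    match = BLOCK.search(source)
    if match is None:
        return None
    lines = match.group("content").splitlines()
    # Drop the leading "# " (or a bare "#") from each comment line.
    return "\n".join(line[2:] if line.startswith("# ") else line[1:] for line in lines)


def read_dependencies(source: str) -> list[str]:
    """Parse the block as TOML and return its dependency list."""
    import tomllib  # in the standard library from Python 3.11 onward

    text = extract_metadata(source)
    return tomllib.loads(text).get("dependencies", []) if text else []
```

Because the metadata lives in comments, the script stays valid Python for tools that know nothing about PEP 723.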
94+
95+
**Why this matters for skill authoring:** Without PEP 723, skills that needed libraries like `pyyaml` or `tiktoken` would force users to run `pip install` — a jarring, trust-breaking experience that makes users hesitate to adopt the skill.
96+
97+
## Graceful Degradation
98+
99+
Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts.
100+
101+
The principle: **scripts are the fast, reliable path — but the skill must still deliver its outcome when execution is unavailable.**
102+
103+
When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script's `--help` text documents what it checks, which makes the fallback natural: the LLM reads that text (from the script source when even `--help` cannot be executed) to understand the script's purpose and replicates the logic.
104+
105+
Frame script steps as outcomes in the SKILL.md, not just commands:
106+
107+
| Approach | Example |
108+
| -------- | ------- |
109+
| **Good** | "Validate path conventions (run `scripts/scan-paths.py --help` for details)" |
110+
| **Fragile** | "Execute `python3 scripts/scan-paths.py`" with no context |
111+
112+
The good version tells the LLM both what to accomplish and where to find the details — enabling graceful degradation without additional instructions.
113+
114+
## When to Reach for a Script
115+
116+
Look for these signal verbs in a skill's requirements — they indicate script opportunities:
117+
118+
| Signal | Script Type |
119+
| ------ | ----------- |
120+
| "validate", "check", "verify" | Validation |
121+
| "count", "tally", "aggregate" | Metrics |
122+
| "extract", "parse", "pull from" | Data extraction |
123+
| "convert", "transform", "format" | Transformation |
124+
| "compare", "diff", "match against" | Comparison |
125+
| "scan for", "find all", "list all" | Pattern scanning |
126+
127+
The builders guide you through script opportunity discovery during the build process. The key insight: if you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead.
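
The signal table above is itself a deterministic lookup, so it can be sketched as a script. The function `script_opportunities` is a hypothetical helper, and prefix matching is a deliberately rough heuristic (for example, "diff" also matches "different"), so its output should be treated as candidates for review rather than decisions.

```python
import re

# Signal phrases from the table above, keyed by the script type they suggest.
SIGNALS = {
    "Validation": ("validate", "check", "verify"),
    "Metrics": ("count", "tally", "aggregate"),
    "Data extraction": ("extract", "parse", "pull from"),
    "Transformation": ("convert", "transform", "format"),
    "Comparison": ("compare", "diff", "match against"),
    "Pattern scanning": ("scan for", "find all", "list all"),
}


def script_opportunities(requirements: str) -> dict[str, list[str]]:
    """Map each script type to the signal phrases found in the requirements text."""
    text = requirements.lower()
    found: dict[str, list[str]] = {}
    for script_type, phrases in SIGNALS.items():
        # The leading \b keeps "format" from matching inside "information".
        hits = [p for p in phrases if re.search(rf"\b{re.escape(p)}", text)]
        if hits:
            found[script_type] = hits
    return found
```

Called on a requirement such as "Validate the frontmatter, then count tokens and list all broken links", it flags validation, metrics, and pattern-scanning candidates.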

skills/bmad-agent-builder/build-process.md

Lines changed: 3 additions & 1 deletion
@@ -46,7 +46,7 @@ Early check: internal capabilities only, external skills, both, or unclear?
4646

4747
**Script Opportunity Discovery** (active probing — do not skip):
4848

49-
Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding.
49+
Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval — dependencies add install-time cost and require `uv` to be available.
5050

5151
## Phase 3: Gather Requirements
5252

@@ -125,6 +125,8 @@ Activation is a single flow regardless of mode. It should:
125125
- If headless, route to `./references/autonomous-wake.md`
126126
- If interactive, greet the user and continue from memory context or offer capabilities
127127

128+
**If the built agent includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.
129+
128130
**Lint gate** — after building, validate and auto-fix:
129131

130132
If subagents available, delegate lint-fix to a subagent. Otherwise run inline.

skills/bmad-agent-builder/references/script-opportunities-reference.md

Lines changed: 6 additions & 4 deletions
@@ -48,10 +48,12 @@ Beyond obvious validation, consider:
4848
- Could metric collection feed into LLM decision-making without the LLM doing the counting?
4949

5050
### Your Toolbox
51-
Scripts have access to full capabilities — think broadly:
52-
- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition
53-
- **Python**: Standard library (`json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
54-
- **System tools**: `git` commands for history/diff/blame, filesystem operations, process execution
51+
52+
**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands.
53+
54+
- **Python:** Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
55+
- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
56+
- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
5557

5658
If you can express the logic as deterministic code, it's a script candidate.
5759

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
# Script Creation Standards
2+
3+
When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL).
4+
5+
## Python Over Bash
6+
7+
**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms.
8+
9+
**Safe bash commands** — these work reliably across all environments and are fine to use directly:
10+
- `git`, `gh` — version control and GitHub CLI
11+
- `uv run` — Python script execution with automatic dependency handling
12+
- `npm`, `npx`, `pnpm` — Node.js ecosystem
13+
- `mkdir -p` — directory creation
14+
15+
**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script.
16+
17+
## Favor the Standard Library
18+
19+
Always prefer Python's standard library over external dependencies. The stdlib ships with every Python installation, requires no `uv run`, and adds no third-party supply-chain risk. Common stdlib modules that cover most script needs:
20+
21+
- `json` — JSON parsing and output
22+
- `pathlib` — cross-platform path handling
23+
- `re` — pattern matching
24+
- `argparse` — CLI interface
25+
- `collections` — counters, defaultdicts
26+
- `difflib` — text comparison
27+
- `ast` — Python source analysis
28+
- `csv`, `xml.etree` — data formats
29+
30+
Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available.
31+
32+
## PEP 723 Inline Metadata (Required)
33+
34+
Every Python script MUST include a PEP 723 metadata block. For scripts with external dependencies, use the `uv run` shebang:
35+
36+
```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"]
# ///
```
43+
44+
For scripts using only the standard library, use a plain Python shebang but still include the metadata block:
45+
46+
```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
```
52+
53+
**Key rules:**
54+
- The shebang MUST be line 1 — before the metadata block
55+
- Always include `requires-python`
56+
- List all external dependencies with version constraints
57+
- Never use `requirements.txt`, `pip install`, or expect global package installs
58+
- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py`
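
Several of these rules are mechanical enough to check with a script. The sketch below is one possible approach, assuming a hypothetical `check_script_header` helper. It covers only the shebang position and the presence of the metadata block and `requires-python`, not the full PEP 723 grammar.

```python
def check_script_header(source: str) -> list[str]:
    """Return the rule violations found in a script header (empty means compliant)."""
    problems = []
    lines = source.splitlines()
    if not lines or not lines[0].startswith("#!"):
        problems.append("shebang must be line 1")
    if "# /// script" not in lines:
        problems.append("missing PEP 723 `# /// script` block")
    elif not any(line.startswith("# requires-python") for line in lines):
        problems.append("metadata block lacks `requires-python`")
    return problems
```

Returning a list of messages rather than a boolean lets a lint step report every violation in a single pass.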
59+
60+
## Invocation in SKILL.md
61+
62+
How a built skill's SKILL.md should reference its scripts:
63+
64+
- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}`
65+
- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency)
66+
67+
`uv run` reads the PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python.
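
The invocation rules above can be expressed as a small decision function. This is only an illustration. `build_command` is a hypothetical helper, not something the builders ship, and it assumes the caller knows whether the script declares external dependencies.

```python
import shutil
import sys
from typing import Optional


def build_command(script: str, args: list[str], needs_uv: bool) -> Optional[list[str]]:
    """Return the command line for a script, or None if it cannot run here."""
    uv = shutil.which("uv")
    if uv:
        # `uv run` reads the PEP 723 block and resolves dependencies itself.
        return [uv, "run", script, *args]
    if not needs_uv:
        # Stdlib-only scripts can fall back to the current interpreter.
        return [sys.executable, script, *args]
    # No uv and external dependencies required: the script cannot run here.
    return None
```

A `None` result corresponds to the degraded path described in the next section, where the work is done without the script.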
68+
69+
## Graceful Degradation
70+
71+
Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible.
72+
73+
**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context.
74+
75+
In SKILL.md, frame script steps as outcomes, not just commands:
76+
- Good: "Validate path conventions (run `scripts/scan-paths.py --help` for details)"
77+
- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does
78+
79+
## Script Interface Standards
80+
81+
- Implement `--help` via `argparse` (single source of truth for the script's API)
82+
- Accept target path as a positional argument
83+
- `-o` flag for output file (default to stdout)
84+
- Diagnostics and progress to stderr
85+
- Exit codes: 0=pass, 1=fail, 2=error
86+
- `--verbose` flag for debugging
87+
- Output valid JSON to stdout
88+
- No interactive prompts, no network dependencies
89+
- Tests in `scripts/tests/`
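
Put together, a script that follows this interface could look roughly like the skeleton below. The check itself (flagging file names that contain spaces) and the `--output` long alias for `-o` are placeholders chosen for illustration. Only the interface shape comes from the list above.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Optional


def scan(target: Path) -> list[dict]:
    """Hypothetical check: flag file names that contain spaces."""
    return [
        {"path": p.relative_to(target).as_posix(), "issue": "space in file name"}
        for p in sorted(target.rglob("*"))
        if p.is_file() and " " in p.name
    ]


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Check that no file name under TARGET contains a space."
    )
    parser.add_argument("target", type=Path, help="directory to scan")
    parser.add_argument("-o", "--output", type=Path, help="write JSON here instead of stdout")
    parser.add_argument("--verbose", action="store_true", help="print progress to stderr")
    args = parser.parse_args(argv)

    if not args.target.is_dir():
        print(f"error: {args.target} is not a directory", file=sys.stderr)
        return 2  # error: the check could not run

    findings = scan(args.target)
    if args.verbose:
        print(f"scanned {args.target}: {len(findings)} finding(s)", file=sys.stderr)

    report = json.dumps({"status": "fail" if findings else "pass", "findings": findings}, indent=2)
    if args.output:
        args.output.write_text(report + "\n", encoding="utf-8")
    else:
        print(report)
    return 1 if findings else 0  # 1 = check failed, 0 = check passed
```

When saved as a script, the entry point would be `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. `argparse` already exits with status 2 on usage errors, which lines up with the 2=error convention.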

skills/bmad-workflow-builder/build-process.md

Lines changed: 3 additions & 1 deletion
@@ -67,7 +67,7 @@ Work through conversationally, adapted per skill type. Glean from what the user
6767
- **Role guidance:** Brief "Act as a [role/expert]" primer
6868
- **Design rationale:** Non-obvious choices the executing agent should understand
6969
- **External skills used:** Which skills does this invoke?
70-
- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan.
70+
- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval before proceeding — dependencies add install-time cost and require `uv` to be available.
7171
- **Creates output documents?** If yes, will use `{document_output_language}`
7272

7373
**Simple Utility additional:**
@@ -130,6 +130,8 @@ Load the template from `./assets/SKILL-template.md` and `./references/template-s
130130
| **`./assets/`** | Templates, starter files | Copied/transformed into output |
131131
| **`./scripts/`** | Python, shell scripts with tests | Invoked for deterministic operations |
132132

133+
**If the built skill includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.
134+
133135
**Lint gate** — after building, validate and auto-fix:
134136

135137
If subagents available, delegate lint-fix to a subagent. Otherwise run inline.

skills/bmad-workflow-builder/references/script-opportunities-reference.md

Lines changed: 7 additions & 4 deletions
@@ -1,5 +1,7 @@
11
# Script Opportunities Reference — Workflow Builder
22

3+
**Reference: `references/script-standards.md` for script creation guidelines.**
4+
35
## Core Principle
46

57
Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.
@@ -42,10 +44,11 @@ When you see these in a workflow's requirements, think scripts first: "validate"
4244

4345
### Your Toolbox
4446

45-
Scripts have access to the full execution environment:
46-
- **Bash:** `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, piping and composition
47-
- **Python:** Full standard library plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
48-
- **System tools:** `git` for history/diff/blame, filesystem operations
47+
**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands.
48+
49+
- **Python:** Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
50+
- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
51+
- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
4952

5053
### The --help Pattern
5154
