---
title: "Scripts in Skills"
description: Why deterministic scripts make skills faster, cheaper, and more reliable — and the technical choices behind portable script design
---

Scripts are the reliability backbone of a well-built skill. They handle work that has clear right-and-wrong answers — validation, transformation, extraction, counting — so the LLM can focus on what it does best: judgment, synthesis, and creative reasoning.

## The Problem: LLMs Do Too Much

Without scripts, every operation in a skill runs through the LLM. That means:

- **Non-deterministic results.** Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time.
- **Wasted tokens and time.** Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain.
- **Harder to test.** You can write unit tests for a script. You cannot write unit tests for an LLM prompt.

The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts.
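A minimal sketch makes the contrast concrete (the file path and the particular stats are illustrative): given the same file, this returns the same numbers on every run, which is exactly what an LLM cannot guarantee.

```python
#!/usr/bin/env python3
"""Count lines, words, and characters in a file -- same answer every run."""
import sys
from pathlib import Path


def count_stats(path: Path) -> dict:
    """Return deterministic text metrics for the file at `path`."""
    text = path.read_text(encoding="utf-8")
    return {
        "lines": text.count("\n"),
        "words": len(text.split()),
        "chars": len(text),
    }


# The argv guard lets the module be imported for testing.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_stats(Path(sys.argv[1])))
```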
| 17 | + |
## The Determinism Boundary

The core design principle is **intelligence placement** — put each operation where it belongs.

| Scripts Handle | LLM Handles |
| -------------- | ----------- |
| Validate structure, format, schema | Interpret meaning, evaluate quality |
| Count, parse, extract, transform | Classify ambiguous input, make judgment calls |
| Compare, diff, check consistency | Synthesize insights, generate creative output |
| Pre-process data into compact form | Analyze pre-processed data with domain reasoning |

**The test:** Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt.
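Applying the test: a naming-convention check passes all three criteria, so it belongs in a script. A minimal sketch, with the kebab-case rule and all names invented for illustration — note how naturally it admits a unit test:

```python
import re
import unittest


def is_kebab_case(name: str) -> bool:
    """Deterministic check: lowercase words separated by single hyphens."""
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None


class TestKebabCase(unittest.TestCase):
    def test_valid(self):
        self.assertTrue(is_kebab_case("scan-paths"))

    def test_invalid(self):
        self.assertFalse(is_kebab_case("Scan_Paths"))
        self.assertFalse(is_kebab_case("double--hyphen"))
```

The test class runs under `python -m unittest`; the same check asked of an LLM could drift between runs.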

:::tip[The Pre-Processing Pattern]
One of the highest-value script uses is pre-processing. A script extracts compact metrics from large files into a small JSON summary. The LLM then reasons over the summary instead of reading raw files — dramatically reducing token usage while improving analysis quality because the data is clean and structured.
:::
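A minimal sketch of the pattern, assuming a hypothetical log file whose lines carry `ERROR`/`WARN`/`INFO` markers: the script reduces an arbitrarily large file to a few dozen bytes of JSON for the LLM to reason over.

```python
#!/usr/bin/env python3
"""Summarize a large log file into a compact JSON the LLM can reason over."""
import json
import sys
from collections import Counter
from pathlib import Path


def summarize(path: Path) -> dict:
    """Count log lines by severity level; emit a compact summary."""
    levels = Counter()
    total = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        total += 1
        for level in ("ERROR", "WARN", "INFO"):
            if level in line:
                levels[level] += 1
                break
    return {"file": path.name, "total_lines": total, "by_level": dict(levels)}


# The argv guard lets the module be imported for testing.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(json.dumps(summarize(Path(sys.argv[1])), indent=2))
```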
| 34 | + |
## Why Python, Not Bash

Skills must work across macOS, Linux, and Windows. Bash is not portable.

| Factor | Bash | Python |
| ------ | ---- | ------ |
| **macOS / Linux** | Works | Works |
| **Windows (native)** | Fails or behaves inconsistently | Works identically |
| **Windows (WSL)** | Works, but can conflict with Git Bash on PATH | Works identically |
| **Error handling** | Limited, fragile | Rich exception handling |
| **Testing** | Difficult | Standard unittest/pytest |
| **Complex logic** | Quickly becomes unreadable | Clean, maintainable |

Even basic commands like `sed -i` behave differently on macOS (BSD sed) and Linux (GNU sed). Piping, `jq`, `grep`, `awk` — all of these have cross-platform pitfalls that Python's standard library avoids entirely.
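As an illustration of the stdlib alternative, a portable stand-in for the common `sed -i 's/old/new/g'` use case might look like this (the function name is ours, not from any library) — identical behavior on every OS:

```python
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str) -> int:
    """In-place text replacement; returns the number of occurrences replaced."""
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count
```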

**Safe bash commands** that work everywhere and remain fine to use directly:

| Command | Purpose |
| ------- | ------- |
| `git`, `gh` | Version control and GitHub CLI |
| `uv run` | Python script execution |
| `npm`, `npx`, `pnpm` | Node.js ecosystem |
| `mkdir -p` | Directory creation |

Everything beyond that list should be a Python script.
| 60 | + |
## Standard Library First

Python's standard library covers most script needs without any external dependencies. Stdlib-only scripts run with plain `python3`, need no special tooling, and have zero supply-chain risk.

| Need | Standard Library |
| ---- | ---------------- |
| JSON parsing | `json` |
| Path handling | `pathlib` |
| Pattern matching | `re` |
| CLI interface | `argparse` |
| Text comparison | `difflib` |
| Counting, grouping | `collections` |
| Source analysis | `ast` |
| Data formats | `csv`, `xml.etree` |

Only reach for external dependencies when the stdlib genuinely cannot do the job — `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation. Each external dependency adds install-time cost, requires `uv` to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process.
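A stdlib-only sketch tying several of these modules together: a manifest check built from `argparse`, `json`, and `pathlib`, runnable with plain `python3`. The required keys are hypothetical, chosen only for illustration:

```python
#!/usr/bin/env python3
"""Stdlib-only manifest check: no installs, no virtual environment."""
import argparse
import json
import sys
from pathlib import Path

REQUIRED_KEYS = {"name", "description", "version"}  # hypothetical manifest keys


def missing_keys(path: Path) -> set:
    """Return the required keys absent from the JSON manifest at `path`."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return REQUIRED_KEYS - data.keys()


# The argv guard lets the module be imported for testing.
if __name__ == "__main__" and len(sys.argv) > 1:
    parser = argparse.ArgumentParser(description="Check a manifest for required keys.")
    parser.add_argument("manifest", type=Path)
    args = parser.parse_args()
    missing = missing_keys(args.manifest)
    if missing:
        sys.exit(f"missing keys: {sorted(missing)}")
    print("ok")
```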
| 77 | + |
## Zero-Friction Dependencies with PEP 723

Python scripts in skills use [PEP 723](https://peps.python.org/pep-0723/) inline metadata to declare their dependencies directly in the file. Combined with `uv run`, this gives you `npx`-like behavior — dependencies are silently cached in an isolated environment, no global installs, no user prompts.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0"]
# ///

import yaml
# script logic here
```

When a skill invokes this script with `uv run scripts/analyze.py`, the dependency (`pyyaml` in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation.

**Why this matters for skill authoring:** Without PEP 723, skills that needed libraries like `pyyaml` or `tiktoken` would force users to run `pip install` — a jarring, trust-breaking experience that makes users hesitate to adopt the skill.
| 96 | + |
## Graceful Degradation

Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts.

The principle: **scripts are the fast, reliable path — but the skill must still deliver its outcome when execution is unavailable.**

When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script's `--help` output documents what it checks, making the fallback natural — the LLM reads the help to understand the script's purpose and replicates the logic.
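One way to make `--help` carry that fallback spec, sketched with `argparse` (the script name and check list are hypothetical): the `description` and `epilog` spell out exactly what the script verifies, so an LLM reading the help can replicate the checks by hand.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --help doubles as a spec of the checks performed."""
    parser = argparse.ArgumentParser(
        prog="scan-paths.py",  # hypothetical script name
        description="Validate path conventions in a skill directory.",
        epilog=(
            "Checks performed: (1) all paths are relative, "
            "(2) file names are kebab-case, "
            "(3) referenced files exist on disk."
        ),
    )
    parser.add_argument("root", help="Skill directory to scan")
    return parser
```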

Frame script steps as outcomes in the SKILL.md, not just commands:

| Approach | Example |
| -------- | ------- |
| **Good** | "Validate path conventions (run `scripts/scan-paths.py --help` for details)" |
| **Fragile** | "Execute `python3 scripts/scan-paths.py`" with no context |

The good version tells the LLM both what to accomplish and where to find the details — enabling graceful degradation without additional instructions.
| 113 | + |
## When to Reach for a Script

Look for these signal verbs in a skill's requirements — they indicate script opportunities:

| Signal | Script Type |
| ------ | ----------- |
| "validate", "check", "verify" | Validation |
| "count", "tally", "aggregate" | Metrics |
| "extract", "parse", "pull from" | Data extraction |
| "convert", "transform", "format" | Transformation |
| "compare", "diff", "match against" | Comparison |
| "scan for", "find all", "list all" | Pattern scanning |

The builders guide you through discovering script opportunities during the build process. The key insight: if you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead.