Commit 1037469

docs(agents): adopt diagnose + write-a-skill skills (mattpocock)
Two more Tier 3 maintainer-only skills sourced from mattpocock/skills:

- diagnose: disciplined 6-phase loop for hard bugs and perf regressions (reproduce → minimise → hypothesise → instrument → fix → cleanup). Core thesis: "build the right feedback loop, and the bug is 90% fixed." Translation: explore-the-codebase step now points at codemap (per the codemap rule's STOP-before-grep) and docs/glossary.md (per Rule 9 canonical terms); ADR mention dropped (no ADR infra in this repo); Phase-5 seam discipline cross-references improve-codebase-architecture (adopted in 906ecba); Phase-6 cleanup includes a one-line lessons.md append per the lessons-rule discipline. scripts/hitl-loop.template.sh ships verbatim, with no codemap-specific assumptions.
- write-a-skill: meta-skill for creating new skills. Translation: front section explicitly cites our agents-first-convention (file layout) and agents-tier-system (tier choice + durability), plus the maintainer-only-vs-shipped distinction (precedent: PR #25). Examples cite codemap precedents (improve-codebase-architecture for the companion-files split; pr-comment-fact-check for single-file). Review checklist adapted: tier choice + rule-pairing decision + tier-list update added.

Both adopted as maintainer-only (.agents/skills/ + .cursor/skills/ symlinks per agents-first-convention). Not added to templates/agents/; the consumer surface stays codemap-skill-only.

agents-tier-system Tier 3 list updated: diagnose, write-a-skill added (alongside grill-me + improve-codebase-architecture from the prior commit).

Skipped grill-with-docs (requires standing up CONTEXT.md / docs/adr/ infra; conflicts with codemap's lift-to-architecture-and-delete-the-plan lifecycle).
1 parent: 906ecba; commit: 1037469

6 files changed

Lines changed: 336 additions & 1 deletion

.agents/rules/agents-tier-system.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Today's Tier-2 rules:
 
 Pure intent-triggered. The skill description is detailed enough that Cursor surfaces it on relevant phrases. No always-on cost.
 
-Skills stay rule-less when the work is **explicitly invoked** by the user, not pattern-triggered. Today: `audit-pr-architecture`, `docs-governance`, `docs-lifecycle-sweep`, `grill-me`, `improve-codebase-architecture`. (Skills like `gritql-codemods` and `ubiquitous-language` would also fit this tier if adopted.)
+Skills stay rule-less when the work is **explicitly invoked** by the user, not pattern-triggered. Today: `audit-pr-architecture`, `diagnose`, `docs-governance`, `docs-lifecycle-sweep`, `grill-me`, `improve-codebase-architecture`, `write-a-skill`. (Skills like `gritql-codemods` and `ubiquitous-language` would also fit this tier if adopted.)
 
 ## Authoring guidelines
 
.agents/skills/diagnose/SKILL.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
1+
---
2+
name: diagnose
3+
description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
4+
---
5+
6+
# Diagnose
7+
8+
A discipline for hard bugs. Skip phases only when explicitly justified.
9+
10+
When exploring the codebase, query [`codemap`](../codemap/SKILL.md) (the structural SQLite index) before reaching for `Grep` or `Read` per the [`codemap` rule](../../rules/codemap.md) — symbol-shaped questions ("where is X defined?", "what calls X?") have direct answers in the `symbols` / `calls` tables. Read the relevant section of [`docs/architecture.md`](../../../docs/architecture.md) to ground the mental model of layering, and check [`docs/glossary.md`](../../../docs/glossary.md) for canonical domain terms (file types, recipe ids, schema columns).
11+
12+
## Phase 1 — Build a feedback loop
13+
14+
**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.
15+
16+
Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
17+
18+
### Ways to construct one — try them in roughly this order
19+
20+
1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e. Codemap convention: `src/**/<name>.test.ts` for unit + integration; `fixtures/golden/` for query-shape regressions; `bun test <file>` runs them.
21+
2. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot. Examples: `bun src/index.ts query --json …` against `fixtures/minimal/`, golden runner under `scripts/query-golden.ts`.
22+
3. **Replay a captured trace.** Save a real `.codemap.db` / config / fixture file to disk; replay it through the code path in isolation.
23+
4. **Throwaway harness.** Spin up a minimal subset (one parser, one DB connection) that exercises the bug code path with a single function call.
24+
5. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
25+
6. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
26+
7. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs. The B.6 baseline machinery (`codemap query --save-baseline` / `--baseline`) is built for exactly this — use it.
27+
8. **HITL bash script.** Last resort. If a human must click or copy a value out of the IDE, drive _them_ with [`scripts/hitl-loop.template.sh`](scripts/hitl-loop.template.sh) so the loop is still structured. Captured output feeds back to you.
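Options 2 and 7 above both reduce to "run a command, then diff its output against a known-good file". The sketch below shows one way to wrap that in a pass/fail signal. `check_against_snapshot` is a hypothetical helper name for illustration, not something codemap ships.

```bash
#!/usr/bin/env bash
# Sketch of a snapshot-diff feedback loop (hypothetical helper, not a codemap API).
# Usage: check_against_snapshot <snapshot-file> <command> [args...]
check_against_snapshot() {
  local snapshot="$1"; shift
  local actual status=0
  actual="$(mktemp)"
  # Capture stdout and stderr; a crashing command still yields a diffable result.
  "$@" >"$actual" 2>&1 || true
  # diff -u prints nothing and exits 0 when the two files match.
  diff -u "$snapshot" "$actual" || status=1
  rm -f "$actual"
  if [ "$status" -eq 0 ]; then echo "PASS"; else echo "FAIL"; fi
  return "$status"
}
```

The command can be any invocation from the list above, for instance the CLI query run against a fixture. On drift, the unified diff is printed before the final `FAIL` line, so the symptom is captured along with the verdict.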
28+
29+
Build the right feedback loop, and the bug is 90% fixed.
30+
31+
### Iterate on the loop itself
32+
33+
Treat the loop as a product. Once you have _a_ loop, ask:
34+
35+
- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
36+
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
37+
- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)
38+
39+
A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
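Seeding the RNG is what makes a property / fuzz loop replayable. The sketch below assumes bash, where assigning to `RANDOM` seeds its generator. `under_test` is a stand-in for the real code path and `fuzz_loop` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Stand-in for the code under test: "fails" on multiples of 97 (assumption for the demo).
under_test() { (( $1 % 97 != 0 )); }

# Usage: fuzz_loop <seed> <runs>
# Prints the first failing input, or PASS if every generated input succeeds.
fuzz_loop() {
  local seed="$1" runs="$2" i input
  RANDOM="$seed"   # assigning to RANDOM seeds bash's generator, so the sequence repeats
  for (( i = 0; i < runs; i++ )); do
    input=$RANDOM
    if ! under_test "$input"; then
      echo "FAIL seed=$seed run=$i input=$input"
      return 1
    fi
  done
  echo "PASS seed=$seed runs=$runs"
}
```

Because the seed is printed with the failure, re-running `fuzz_loop` with the same seed in the same bash version reproduces the same failing input.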
40+
41+
### Non-deterministic bugs
42+
43+
The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
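Raising the rate is easier when it is measured. The following is a minimal sketch of a rate counter, with `repro_rate` as a hypothetical helper name. It runs the trigger command a fixed number of times and reports how often it failed.

```bash
#!/usr/bin/env bash
# Usage: repro_rate <runs> <command> [args...]
# Prints "<failures>/<runs>" so successive tweaks to the trigger can be compared.
repro_rate() {
  local runs="$1"; shift
  local failures=0 i
  for (( i = 0; i < runs; i++ )); do
    # A non-zero exit status counts as one reproduction of the bug.
    if ! "$@" >/dev/null 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  echo "$failures/$runs"
}
```

Run it before and after each change to the trigger (added stress, injected sleeps) and keep only the changes that push the ratio up.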
44+
45+
### When you genuinely cannot build a loop
46+
47+
Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps, broken `.codemap.db`), or (c) permission to add temporary instrumentation. Do **not** proceed to hypothesise without a loop.
48+
49+
Do not proceed to Phase 2 until you have a loop you believe in.
50+
51+
## Phase 2 — Reproduce
52+
53+
Run the loop. Watch the bug appear.
54+
55+
Confirm:
56+
57+
- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
58+
- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
59+
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.
60+
61+
Do not proceed until you reproduce the bug.
62+
63+
## Phase 3 — Hypothesise
64+
65+
Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.
66+
67+
Each hypothesis must be **falsifiable**: state the prediction it makes.
68+
69+
> Format: "If `<X>` is the cause, then `<Y>` will make the bug disappear / `<Z>` will make it worse."
70+
71+
If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
72+
73+
**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just changed #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.
74+
75+
## Phase 4 — Instrument
76+
77+
Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**
78+
79+
Tool preference:
80+
81+
1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
82+
2. **Targeted logs** at the boundaries that distinguish hypotheses.
83+
3. Never "log everything and grep".
84+
85+
**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
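The "single grep" can double as a cleanup gate. The sketch below uses a hypothetical helper, `assert_no_debug_tags`, and takes the tag and the directory to scan as arguments.

```bash
#!/usr/bin/env bash
# Usage: assert_no_debug_tags <tag> <dir>
# Fails (and lists the offending lines) if any tagged instrumentation is left.
assert_no_debug_tags() {
  local tag="$1" dir="$2"
  # -r recurse, -n show line numbers, -F treat the tag as a fixed string
  # (so the square brackets are matched literally, not as a character class).
  if grep -rnF -- "$tag" "$dir"; then
    echo "leftover instrumentation found" >&2
    return 1
  fi
  echo "clean"
}
```

For example, `assert_no_debug_tags "[DEBUG-a4f2]" src` prints `clean` only once every tagged log has been removed.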
86+
87+
**Perf branch.** For performance regressions, logs are usually the wrong tool. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan, `--performance` flag for index runs), then bisect. Measure first, fix second.
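The "then bisect" step can be automated with `git bisect run`, which reads the check script's exit code: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below maps a setup command and a check command onto that contract. `bisect_verdict` and the example commands are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Usage: bisect_verdict "<setup command>" "<check command>"
# Returns 0 (good), 1 (bad) or 125 (skip), matching the git bisect run contract.
bisect_verdict() {
  local setup_cmd="$1" check_cmd="$2"
  # A commit that cannot even be set up is skipped rather than blamed.
  if ! bash -c "$setup_cmd" >/dev/null 2>&1; then
    return 125
  fi
  # The check is the Phase 1 loop: it passes on good commits and fails on bad ones.
  if bash -c "$check_cmd" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}

# Hypothetical wiring: save this file as check.sh with a final line such as
#   bisect_verdict "bun install" "bun test src/example.test.ts"; exit $?
# then run: git bisect start <bad-commit> <good-commit> && git bisect run bash check.sh
```

For a performance regression, the check command would be a timing assertion against the measured baseline rather than a functional test.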
88+
89+
## Phase 5 — Fix + regression test
90+
91+
Write the regression test **before the fix** — but only if there is a **correct seam** for it (per the [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md) vocabulary).
92+
93+
A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.
94+
95+
**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.
96+
97+
If a correct seam exists:
98+
99+
1. Turn the minimised repro into a failing test at that seam.
100+
2. Watch it fail.
101+
3. Apply the fix.
102+
4. Watch it pass.
103+
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.
104+
105+
## Phase 6 — Cleanup + post-mortem
106+
107+
Required before declaring done:
108+
109+
- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
110+
- [ ] Regression test passes (or absence of seam is documented)
111+
- [ ] All `[DEBUG-…]` instrumentation removed (`grep` the prefix)
112+
- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
113+
- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns
114+
- [ ] If the post-mortem yields a permanent insight, append a one-line entry to [`.agents/lessons.md`](../../lessons.md) per the lessons-rule discipline
115+
116+
**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling), hand off to [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md) with the specifics. Make the recommendation **after** the fix is in, not before: you have more information now than when you started.

.agents/skills/diagnose/scripts/hitl-loop.template.sh

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
# Human-in-the-loop reproduction loop.
# Copy this file, edit the steps below, and run it.
# The agent runs the script; the user follows prompts in their terminal.
#
# Usage:
#   bash hitl-loop.template.sh
#
# Two helpers:
#   step "<instruction>"          → show instruction, wait for Enter
#   capture VAR "<question>"      → show question, read response into VAR
#
# At the end, captured values are printed as KEY=VALUE for the agent to parse.

set -euo pipefail

# Print one instruction and block until the user presses Enter.
step() {
  printf '\n>>> %s\n' "$1"
  read -r -p "    [Enter when done] " _
}

# Ask a question and store the reply in the global variable named by $1.
capture() {
  local var="$1" question="$2" answer
  printf '\n>>> %s\n' "$question"
  read -r -p "    > " answer
  # printf -v assigns to the variable whose name is held in $var.
  printf -v "$var" '%s' "$answer"
}

# --- edit below ---------------------------------------------------------

step "Open the app at http://localhost:3000 and sign in."

capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"

capture ERROR_MSG "Paste the error message (or 'none'):"

# --- edit above ---------------------------------------------------------

# Emit the captured values in a form the agent can parse line by line.
printf '\n--- Captured ---\n'
printf 'ERRORED=%s\n' "$ERRORED"
printf 'ERROR_MSG=%s\n' "$ERROR_MSG"

.agents/skills/write-a-skill/SKILL.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
1+
---
2+
name: write-a-skill
3+
description: Create new agent skills with proper structure, progressive disclosure, and bundled resources. Use when user wants to create, write, or build a new skill (or asks "how do I write a skill?", "draft a SKILL.md for X").
4+
---
5+
6+
# Writing Skills
7+
8+
Discipline for authoring `.agents/skills/<name>/SKILL.md` files in this repo.
9+
10+
## Repo conventions you must respect
11+
12+
Before drafting any skill in codemap, internalise these (they trump anything in this skill):
13+
14+
- **File layout** (per [`agents-first-convention`](../../rules/agents-first-convention.md)): the source-of-truth file is `.agents/skills/<name>/SKILL.md`; the `.cursor/skills/<name>` entry is a **symlink** back. Never put original content under `.cursor/`.
15+
- **Tier choice** (per [`agents-tier-system`](../../rules/agents-tier-system.md)): every new skill is Tier 1 (always-on, paired with a rule), Tier 2 (auto-attached to a glob, paired with a rule), or Tier 3 (discoverable, no rule). **Skills with `NEVER` / `ALWAYS` clauses deserve a rule pairing.** Pure intent-trigger skills (no hard "must" clauses) stay Tier 3.
16+
- **Maintainer-only vs shipped**: `.agents/skills/` is the dev-side mirror; `templates/agents/skills/` is what `codemap agents init` ships to npm consumers. The bundled template surface today is **only** the `codemap` skill; every other skill in `.agents/skills/` is maintainer-only (precedent: PR #25). Don't add a skill to `templates/agents/` unless it's something every consumer of the published package would want.
17+
18+
## Process
19+
20+
### 1. Gather requirements
21+
22+
Ask the user:
23+
24+
- What task / domain does the skill cover?
25+
- What specific use cases should it handle?
26+
- Does it need executable scripts (under `scripts/`) or just instructions?
27+
- Any reference materials to include?
28+
- **Tier choice**: does the skill have always-on principles (any `NEVER` / `ALWAYS` clauses)? If yes, it deserves a Tier-1 or Tier-2 rule pairing per [`agents-tier-system`](../../rules/agents-tier-system.md).
29+
30+
### 2. Draft the skill
31+
32+
Create:
33+
34+
- `SKILL.md` with concise instructions (under 100 lines if possible; see "When to split files" below)
35+
- Companion files (`LANGUAGE.md`, `REFERENCE.md`, `EXAMPLES.md`, etc.) when content exceeds 100 lines or has distinct domains
36+
- `scripts/<name>.{sh,ts}` when a deterministic operation is invoked repeatedly (saves tokens vs generated code)
37+
38+
Use [`grill-me`](../grill-me/SKILL.md) on yourself to surface decisions before you write — what's the trigger phrase shape? What's the boundary with adjacent skills? What's the durability test (does this skill still read correctly six months from now)?
39+
40+
### 3. Wire the file layout
41+
42+
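```bash
# Source of truth (replace my-skill with the new skill's name)
name="my-skill"
mkdir -p ".agents/skills/$name" .cursor/skills
touch ".agents/skills/$name/SKILL.md"

# Cursor symlink (per agents-first-convention); the target is relative to .cursor/skills/
ln -s "../../.agents/skills/$name" ".cursor/skills/$name"
```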
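A quick check that the layout came out as intended can save a review round. The sketch below is a hypothetical helper (`check_skill_layout`, not part of codemap) that verifies the source file exists and that the Cursor entry is a symlink with the expected relative target.

```bash
#!/usr/bin/env bash
# Usage: check_skill_layout <repo-root> <skill-name>
check_skill_layout() {
  local root="$1" name="$2"
  local src="$root/.agents/skills/$name/SKILL.md"
  local link="$root/.cursor/skills/$name"
  # The source of truth must be a regular file under .agents/.
  [ -f "$src" ] || { echo "missing $src"; return 1; }
  # The .cursor entry must be a symlink, never original content.
  [ -L "$link" ] || { echo "$link is not a symlink"; return 1; }
  # readlink prints the stored target without resolving it.
  [ "$(readlink "$link")" = "../../.agents/skills/$name" ] \
    || { echo "unexpected target: $(readlink "$link")"; return 1; }
  echo "ok"
}
```

Calling `check_skill_layout . <name>` from the repo root prints `ok` when both halves of the convention hold.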
49+
50+
### 4. Update the tier list
51+
52+
Add the skill to the relevant list in [`agents-tier-system.md`](../../rules/agents-tier-system.md) so the inventory stays accurate.
53+
54+
### 5. Review
55+
56+
Ask the user:
57+
58+
- Does this cover your use cases?
59+
- Anything missing or unclear?
60+
- Should any section be more / less detailed?
61+
62+
Run the [Review checklist](#review-checklist) before declaring done.
63+
64+
## Skill structure
65+
66+
```text
.agents/skills/<name>/
├── SKILL.md           # Main instructions (required)
├── LANGUAGE.md        # Vocabulary the skill enforces (if any)
├── REFERENCE.md       # Detailed docs (if SKILL.md exceeds ~100 lines)
├── EXAMPLES.md        # Usage examples (if needed)
└── scripts/           # Utility scripts (if needed)
    └── helper.sh
```
75+
76+
## SKILL.md template
77+
78+
```md
---
name: skill-name
description: Brief description of capability. Use when [specific triggers, i.e. verbs and nouns the user is likely to say, plus contexts where the skill applies].
---

# Skill Name

## Quick start

[Minimal working example of what the user does on first invocation]

## Workflows

[Step-by-step processes with checklists for complex tasks]

## Advanced features

[Link to companion files: See [REFERENCE.md](REFERENCE.md) / [LANGUAGE.md](LANGUAGE.md)]
```
98+
99+
## Description requirements
100+
101+
The description is **the only thing the agent sees** when deciding which skill to load. It's surfaced in the discoverable-skills list alongside every other installed skill. Get this right or your skill never fires.
102+
103+
**Goal**: Give the agent just enough info to know:
104+
105+
1. What capability this skill provides
106+
2. When / why to trigger it (specific keywords, contexts, file types)
107+
108+
**Format**:
109+
110+
- Max ~1024 chars
111+
- Write in third person
112+
- First sentence: what it does
113+
- Second sentence: "Use when [specific triggers]"
114+
- Include the verbs and nouns the user is likely to say (per [`agents-tier-system` § Tier 3 description](../../rules/agents-tier-system.md))
115+
116+
**Good example**:
117+
118+
```text
119+
Triage and fact-check PR review comments against the actual codebase, project rules, and skills. Use when the user asks to address PR comments, respond to reviewer feedback, check if a comment is correct, fact-check a reviewer's claim, decide which comments to push back on, or sort hallucinated suggestions from real ones. Triggers on phrases like "check PR comments", "are these comments right".
120+
```
121+
122+
**Bad example**:
123+
124+
```text
125+
Helps with PRs.
126+
```
127+
128+
The bad example gives the agent no way to distinguish this from any other PR-adjacent skill.
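The two mechanical parts of the format (the length cap and the "Use when" sentence) are easy to lint. The sketch below is a hypothetical `lint_description` helper and treats the ~1024-character guideline as a hard limit, which is an assumption.

```bash
#!/usr/bin/env bash
# Usage: lint_description "<description text>"
# Prints one line per problem, or "ok" when both checks pass.
lint_description() {
  local desc="$1" status=0
  # Length cap from the format list (treated as a hard limit here).
  if [ "${#desc}" -gt 1024 ]; then
    echo "too long: ${#desc} chars (limit 1024)"
    status=1
  fi
  # The second sentence should name the triggers.
  case "$desc" in
    *"Use when"*) ;;
    *) echo 'missing "Use when" trigger sentence'; status=1 ;;
  esac
  if [ "$status" -eq 0 ]; then echo "ok"; fi
  return "$status"
}
```

The bad example above fails the second check. A lint like this cannot judge whether the triggers are specific enough, so it complements the review rather than replacing it.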
129+
130+
## When to add scripts
131+
132+
Add utility scripts under `scripts/` when:
133+
134+
- Operation is deterministic (validation, formatting, bisection harness)
135+
- Same code would be generated repeatedly across invocations
136+
- Errors need explicit handling that's tedious to re-derive
137+
138+
Scripts save tokens and improve reliability vs generated code.
139+
140+
## When to split files
141+
142+
Split into companion files when:
143+
144+
- `SKILL.md` exceeds ~100 lines
145+
- Content has distinct domains (vocabulary vs process vs templates)
146+
- Advanced features are rarely needed and would balloon the main file
147+
148+
Cite codemap precedents:
149+
150+
- [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md) splits into `LANGUAGE.md` (vocab), `DEEPENING.md` (sub-rules), `INTERFACE-DESIGN.md` (parallel-sub-agent pattern).
151+
- [`pr-comment-fact-check`](../pr-comment-fact-check/SKILL.md) stays single-file because every section is in-flow process.
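The size criterion can be checked mechanically. The sketch below is a hypothetical helper (`check_skill_size`) that accepts a skill directory when `SKILL.md` is within the limit or when at least one companion `.md` file sits beside it. The 100-line default mirrors the guideline above.

```bash
#!/usr/bin/env bash
# Usage: check_skill_size <skill-dir> [line-limit]
check_skill_size() {
  local dir="$1" limit="${2:-100}" lines companions
  # Arithmetic expansion strips the padding some wc implementations add.
  lines=$(( $(wc -l < "$dir/SKILL.md") ))
  # Companion files are the other top-level .md files in the skill directory.
  companions=$(find "$dir" -maxdepth 1 -name '*.md' ! -name 'SKILL.md' | wc -l)
  companions=$(( companions ))
  if (( lines <= limit )); then
    echo "ok: $lines lines"
  elif (( companions > 0 )); then
    echo "ok: $lines lines, split across $companions companion file(s)"
  else
    echo "too long: $lines lines and no companion files"
    return 1
  fi
}
```

A long `SKILL.md` with companions still passes, since the check only confirms that a split exists, not that the split follows distinct domains.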
152+
153+
## Durability discipline
154+
155+
Per [`agents-tier-system` § Authoring discipline: durability](../../rules/agents-tier-system.md):
156+
157+
- **Don't cite specific audit / plan / research filenames as canonical examples.** Plans are mortal under [`docs-lifecycle-sweep`](../docs-lifecycle-sweep/SKILL.md). Use shape placeholders (`<topic>.md`) instead.
158+
- **Don't cite specific commit hashes or PR numbers as the only path to context.** Summarise inline.
159+
- **Don't cite source-code line numbers.** Reference symbols by name.
160+
161+
If the skill still reads correctly six months from now after every doc you didn't write got rewritten, it's durable.
162+
163+
## Review checklist
164+
165+
After drafting, verify:
166+
167+
- [ ] Description includes triggers ("Use when…")
168+
- [ ] `SKILL.md` under 100 lines OR has split companion files
169+
- [ ] No time-sensitive info (no "as of 2026-04…")
170+
- [ ] Consistent terminology — drift kills clarity
171+
- [ ] Concrete examples included
172+
- [ ] Cross-references one level deep (don't chain `SKILL.md → REFERENCE.md → DEEP-DIVE.md → REFERENCE2.md`)
173+
- [ ] File layout follows [`agents-first-convention`](../../rules/agents-first-convention.md) (`.agents/` source + `.cursor/` symlink)
174+
- [ ] Tier choice documented per [`agents-tier-system`](../../rules/agents-tier-system.md); rule pairing added if the skill has `NEVER` / `ALWAYS` clauses
175+
- [ ] Skill listed in the appropriate tier section of `agents-tier-system.md`
176+
- [ ] Decision recorded in the PR description: maintainer-only (`.agents/` only) vs shipped (`templates/agents/` too)

.cursor/skills/diagnose

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
../../.agents/skills/diagnose

.cursor/skills/write-a-skill

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
../../.agents/skills/write-a-skill
