
Commit a4f9a2e

feat(skills): add /mandoc-fix orchestrator
Wraps the recurring scope → dispatch → validate → promote cycle exercised across c1dbbf9, 84ed928, 4aa2187, c3c42c7. Calls /eval-render and /eval-llm as substeps; uses the audit subcommand (e8cd231) for the load-bearing absolute check. The subagent brief template captures the recurring shape: HEAD + local-history context, repro CLI + real-page reference, acceptance tests, invariants the fix must preserve, and a do-not list (no push, no promote, no reintroducing reverted approaches).
1 parent 2fc626d commit a4f9a2e

2 files changed

Lines changed: 228 additions & 0 deletions

.claude/skills/mandoc-fix/SKILL.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
1+
---
2+
name: mandoc-fix
3+
description: Drive a mandoc rendering-bug fix end-to-end — scope the bug class, dispatch a subagent in the mandoc source tree, validate via /eval-render audit + compare, and decide whether to promote. Use when the user has identified (or suspects) a class of `-T markdown` rendering bug and wants the full diagnose → fix → validate → promote cycle.
4+
user_invocable: true
5+
---
6+
7+
# mandoc-fix
8+
9+
You orchestrate the recurring mandoc-fix cycle: scope a bug class, dispatch a fresh subagent in `~/dev/vibe/mandoc-1.14.6/` (or the configured tree), validate the result with the existing render eval (`audit` for absolute check, `compare` for regression net), and decide promote / iterate / defer.
10+
11+
This skill calls `/eval-render` and `/eval-llm` as substeps — do not re-implement what they do.
12+
13+
## Usage
14+
15+
```
16+
/mandoc-fix <bug-description> [--rule <audit-rule>] [--pages <file>] [--mandoc-worktree <path>]
17+
```
18+
19+
## Arguments
20+
21+
- **bug-description** (required): Free-text description of the bug class, written so a subagent in the mandoc tree can understand it without prior context. Include a roff repro if possible.
22+
- **rule** (optional): One of the audit rule ids (`quad_star_run`, `empty_emphasis_tag`, `roff_named_escape`, `roff_two_letter_escape`, `roff_font_escape`, `visible_zwnj_entity`, `visible_nbsp_entity`, `visible_double_amp`, `giant_markdown_line`, `synopsis_no_spaces_run`). When provided, the rule's count over the corpus is the load-bearing acceptance metric. When omitted, ask the user which rule (or rules) define success — guess only if the bug-description maps unambiguously to one rule.
23+
- **pages** (optional): Path to a file listing repo-relative manpage paths affected by the bug (one per line). When provided, the absolute baseline is rendered against this list specifically; otherwise the standard `tests/evals/render/corpus.txt` is used.
24+
- **mandoc-worktree** (optional): Path to the mandoc source tree. Defaults to `/home/idank/dev/vibe/mandoc-1.14.6/`.
25+
26+
## Step 1: Scope
27+
28+
Confirm what's being measured before you spend any agent time. Show the user:
29+
30+
- The audit rule that will gate promotion.
31+
- The page set the rule will be applied to (corpus vs `--pages` file, with row count).
32+
- The current mandoc HEAD commit and the binary md5 of `tools/mandoc-md` so they know what "baseline" means.
33+
34+
If the user gave a free-text bug-description without a rule, name one and ask them to confirm. If neither a rule nor `--pages` makes the success metric concrete, stop and ask. Don't dispatch the subagent without a measurable target.
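The scope summary from Step 1 can be assembled mechanically. The following is a minimal sketch, assuming the rule ids listed under Arguments are the complete set and that the page list is a plain text file with one path per line. `AUDIT_RULES`, `file_md5`, and `scope_summary` are hypothetical names for illustration, not part of the skill or of `render_eval.py`; the mandoc HEAD commit is left out because it requires a `git` call in the worktree.

```python
import hashlib
from pathlib import Path

# Rule ids copied from the Arguments section; assumed (not verified) to be complete.
AUDIT_RULES = {
    "quad_star_run", "empty_emphasis_tag", "roff_named_escape",
    "roff_two_letter_escape", "roff_font_escape", "visible_zwnj_entity",
    "visible_nbsp_entity", "visible_double_amp", "giant_markdown_line",
    "synopsis_no_spaces_run",
}


def file_md5(path: Path) -> str:
    """Return the hex md5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scope_summary(rule: str, pages_file: Path, binary: Path) -> dict:
    """Collect what Step 1 shows the user; refuse an unmeasurable target."""
    if rule not in AUDIT_RULES:
        raise ValueError(f"unknown audit rule: {rule!r}")
    # Blank lines in the page list are ignored when counting rows.
    pages = [ln.strip() for ln in pages_file.read_text().splitlines() if ln.strip()]
    if not pages:
        raise ValueError("page set is empty; nothing to measure")
    return {"rule": rule, "page_count": len(pages), "binary_md5": file_md5(binary)}
```

Raising on an unknown rule or an empty page set mirrors the instruction not to dispatch without a measurable target.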
35+
36+
## Step 2: Capture absolute baseline
37+
38+
Render the chosen page set with the current `tools/mandoc-md`, then audit:
39+
40+
```bash
41+
source .venv/bin/activate
42+
# Standard corpus
43+
python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md
44+
# OR custom page list
45+
python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md $(cat <pages-file>)
46+
python tests/evals/render/render_eval.py audit <run-dir> --rules <rule>
47+
```
48+
49+
Record the rule's `pages × occurrences` baseline number. This is what the candidate must beat.
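One way to keep the baseline in a single comparable value is sketched below. The output format of the `audit` subcommand is not shown in this document, so the per-page dictionary used as input is an assumption, and `RuleCount` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleCount:
    pages: int        # pages with at least one hit for the rule
    occurrences: int  # total hits across those pages

    def __str__(self) -> str:
        return f"{self.pages} pages × {self.occurrences} occurrences"


def summarize(per_page_hits: dict[str, int]) -> RuleCount:
    """Collapse per-page hit counts (assumed input shape) into one baseline value."""
    hit_pages = {page: n for page, n in per_page_hits.items() if n > 0}
    return RuleCount(pages=len(hit_pages), occurrences=sum(hit_pages.values()))
```

Recording the same structure for the candidate run keeps the later comparison a plain field-by-field check.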
50+
51+
## Step 3: Dispatch subagent
52+
53+
Use the `templates/subagent-brief.md` template. Fill in every placeholder. Spawn a fresh general-purpose agent (the mandoc tree is a separate working directory; the subagent will operate there).
54+
55+
Brief the agent **not to push or promote** — those are your job after validation.
56+
57+
Wait for the subagent to return before continuing. Run it foreground (default), not background — the validation steps depend on its output.
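Filling the brief can be sketched as a strict substitution that fails when any placeholder is left unset. This assumes the `{{NAME}}` placeholder syntax used in the template's substitution list, and it assumes the leading HTML comment is documentation that should be dropped before dispatch; `fill_brief` is a hypothetical helper, not something the skill ships.

```python
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")
LEADING_COMMENT = re.compile(r"<!--.*?-->\s*", re.DOTALL)


def fill_brief(template: str, values: dict[str, str]) -> str:
    """Substitute every {{NAME}} placeholder; fail loudly if any is missing."""
    # Assumption: the first HTML comment is author-facing and is not sent to the subagent.
    body = LEADING_COMMENT.sub("", template, count=1)
    missing = sorted({m.group(1) for m in PLACEHOLDER.finditer(body)} - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    # A function replacement avoids backslash processing in the substituted values.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], body)
```

Failing on a missing key, rather than emitting a literal `{{...}}`, matches the instruction to fill in every placeholder.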
58+
59+
## Step 4: Validate (three layers)
60+
61+
After the subagent reports a commit + rebuilt binary:
62+
63+
a. **Absolute check (load-bearing).** Render the same page set with the candidate; audit; compare counts to the baseline. Target: rule's `pages × occurrences` strictly down. Acceptable residue is content-driven (e.g. literal `*` in source roff); the subagent's report should distinguish such residue from genuine misses.
64+
65+
b. **Regression net.** Invoke `/eval-render <candidate-binary>` to run the standard compare. Read the verdict. Suspicious deltas unrelated to the targeted rule are regressions.
66+
67+
c. **Spot-check.** Render and visually diff 2–3 of the most-affected pages from the baseline, confirming the fix matches the subagent's repro and doesn't introduce new visual artifacts.
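Choosing the spot-check pages can be sketched as a ranking over baseline hit counts. The per-page dictionary is an assumed shape (the audit output format is not shown here), and `most_affected` is a hypothetical helper.

```python
def most_affected(per_page_hits: dict[str, int], k: int = 3) -> list[str]:
    """Return up to k pages with the highest baseline hit counts (ties broken by path)."""
    ranked = sorted(per_page_hits.items(), key=lambda item: (-item[1], item[0]))
    # Pages with zero hits are never worth a spot-check for this rule.
    return [page for page, hits in ranked[:k] if hits > 0]
```

Breaking ties by path keeps the selection deterministic, so the same pages are compared before and after the fix.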
68+
69+
## Step 5: Apply the rubric
70+
71+
- **merge** ⇢ absolute count strictly down, `/eval-render` verdict is merge, spot-checks clean. Recommend: `cp <candidate> tools/mandoc-md`, commit referencing the upstream commit hash, suggest `/eval-llm` and re-extraction of the affected pages.
72+
- **regression** ⇢ any layer fails. Re-dispatch the subagent (Step 3) with a delta brief that names the specific regression and a concrete acceptance test. Iterate until the verdict is merge or defer.
73+
- **defer** ⇢ ambiguous cases (e.g. absolute count drops but `/eval-render` flags structural changes). Surface evidence to the user and ask.
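The rubric above can be read as a small decision function. This is a sketch under stated assumptions: the gating counts are compared as total occurrences, and the `/eval-render` verdict is modelled as a string where `"merge"` and `"regression"` are the two clear outcomes and anything else counts as ambiguous. The actual verdict vocabulary is not given in this document, and `decide` is a hypothetical name.

```python
def decide(baseline: int, candidate: int, render_verdict: str, spot_checks_clean: bool) -> str:
    """Map the three validation layers onto merge / regression / defer."""
    # Layer (a): the gating count must be strictly down.
    if candidate >= baseline:
        return "regression"
    # Layers (b) and (c): an explicit failure in either one is a regression.
    if render_verdict == "regression" or not spot_checks_clean:
        return "regression"
    if render_verdict == "merge":
        return "merge"
    # The count dropped but the regression net is not a clean merge: ask the user.
    return "defer"
```

Only the combination of all three layers passing yields `merge`; an improved count with an unclear regression-net verdict is surfaced as `defer` instead of being guessed.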
74+
75+
## Step 6: Promote and propose downstream
76+
77+
On merge:
78+
79+
```bash
80+
cp <candidate-binary> tools/mandoc-md
81+
git add tools/mandoc-md
82+
git commit -m "feat(tools): promote mandoc-md with <one-line summary>
83+
84+
Picks up mandoc <upstream-commit> (\"<upstream-subject>\"). <impact line>.
85+
<rule> drops from <baseline> to <candidate> across the <page-set>."
86+
```
87+
88+
Then propose (don't run without explicit go-ahead):
89+
90+
1. **`/eval-llm`** as a sanity check that cleaner markdown doesn't perturb extraction quality.
91+
2. **Re-extract the affected pages** with `--reason` populated:
92+
```
93+
python -m explainshell.manager extract --mode llm:<model> --overwrite \
94+
-j 10 --reason "<one-line: what fixed, eval verdict>" \
95+
$(tr '\n' ' ' < <pages-file>)
96+
```
97+
3. **Upload the live DB** with `make upload-live-db` once the user confirms the re-extract looked clean.
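Since these steps are proposed and not run, the re-extract command can be built as text for the user to review. A minimal sketch, using only the flags shown in the command above; `extract_argv` and `extract_command` are hypothetical helpers, and the model name is whatever the user supplies.

```python
import shlex
from pathlib import Path


def extract_argv(model: str, reason: str, pages_file: Path, jobs: int = 10) -> list[str]:
    """Build the re-extract command as an argv list; nothing is executed here."""
    # Whitespace splitting matches the effect of `tr '\n' ' '` plus shell word splitting.
    pages = pages_file.read_text().split()
    return [
        "python", "-m", "explainshell.manager", "extract",
        "--mode", f"llm:{model}", "--overwrite",
        "-j", str(jobs), "--reason", reason, *pages,
    ]


def extract_command(model: str, reason: str, pages_file: Path) -> str:
    """Render the argv as a shell-quoted string suitable for showing in chat."""
    return shlex.join(extract_argv(model, reason, pages_file))
```

Keeping the command as an argv list until the last moment means the `--reason` text is quoted once, by `shlex.join`, and not by hand.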
98+
99+
## Reporting back
100+
101+
Final user-facing report (in chat, not a file):
102+
103+
- One-line verdict.
104+
- Baseline → candidate counts for the gating rule, with `pages × occurrences`.
105+
- `/eval-render` aggregate verdict (one line).
106+
- Spot-checked pages (one bullet each, before → after).
107+
- If **merge**: the promotion commands above, ready to run.
108+
- If **regression**: the redispatched subagent prompt fenced and ready.
109+
- If **defer**: the specific evidence and a concrete question.
110+
111+
## What NOT to do
112+
113+
- Don't dispatch a subagent without a measurable success metric (rule + page set).
114+
- Don't promote without all three validation layers passing.
115+
- Don't run `/eval-llm`, re-extract, or `make upload-live-db` without explicit user confirmation — those are downstream actions with cost or production impact.
116+
- Don't push or merge changes in the mandoc tree from this session — the subagent commits in its own tree; the user pushes those upstream when they choose.
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
<!--
2+
Subagent brief template for /mandoc-fix.
3+
4+
The orchestrator fills the placeholders in {{DOUBLE_BRACES}} and dispatches
5+
the rendered text as a fresh general-purpose agent. Every section pulls its
6+
weight; if a section is empty, drop it rather than emit a placeholder.
7+
8+
Required substitutions:
9+
- {{WORKTREE}} absolute path, e.g. /home/idank/dev/vibe/mandoc-1.14.6
10+
- {{HEAD_COMMIT}} output of `git log --oneline -1` in the worktree
11+
- {{LOCAL_HISTORY}} bullet list of recent local commits the subagent must
12+
not break (subject lines + one-sentence intent each)
13+
- {{BUG_NAME}} short label for the bug class
14+
- {{BUG_DESCRIPTION}} why the bug matters in plain English
15+
- {{REPRO_CLI}} printf | mandoc invocation that demonstrates the bug
16+
and the desired output
17+
- {{REPRO_PAGE}} at least one real manpage path (under
18+
../explainshell/manpages/) that contains the pattern
19+
- {{ACCEPTANCE_TESTS}} numbered list of what the candidate must produce
20+
- {{INVARIANTS}} list of behaviors that must NOT change (e.g. "italic
21+
still emits *...*", "&zwnj; insertion intact",
22+
"intraword italic still works")
23+
- {{AUDIT_RULE}} audit rule id whose count must drop (e.g. quad_star_run)
24+
- {{AUDIT_PAGE_SET}} "the standard corpus" or "the {{N}}-page list at
25+
<path>"
26+
- {{BASELINE_COUNT}} "<P> pages, <N> occurrences" measured before dispatch
27+
- {{REPORTING_FIELDS}} pass-through fields the orchestrator needs back
28+
(commit hash, smoke test outputs, audit count, regress tally, judgment
29+
calls)
30+
-->
31+
32+
You're working in {{WORKTREE}}, a vendored mandoc 1.14.6 source tree with a
33+
stack of local fixes. Build with `make`; output is `./mandoc`.
34+
35+
## Context
36+
37+
`HEAD` is at `{{HEAD_COMMIT}}`. Recent local commits you must preserve:
38+
39+
{{LOCAL_HISTORY}}
40+
41+
## The bug — {{BUG_NAME}}
42+
43+
{{BUG_DESCRIPTION}}
44+
45+
### Repro
46+
47+
CLI:
48+
49+
```
50+
{{REPRO_CLI}}
51+
```
52+
53+
Real-world example: `{{REPRO_PAGE}}` (under `../explainshell/manpages/`).
54+
55+
## What I want
56+
57+
{{ACCEPTANCE_TESTS}}
58+
59+
## Invariants — these MUST keep working
60+
61+
{{INVARIANTS}}
62+
63+
## Process
64+
65+
1. Read `git log -p <range>` for the recent local commits in this tree
66+
before patching anything in `mdoc_markdown.c` — they share data
67+
structures with what you're about to change.
68+
2. Implement the fix. Prefer extending existing machinery
69+
(`pending_close_marker`, `marker_stack`, `outer_marker`, font-mode
70+
helpers) over inventing new globals.
71+
3. Build (`make`).
72+
4. Run the CLI repro above and confirm the desired output.
73+
5. Run `make regress` and confirm a 100% pass. Update fixtures only when the change
74+
legitimately changes their expected output. Add a new fixture under
75+
`regress/man/B/` (or `regress/mdoc/`) covering the new case, in the
76+
style of recent additions like `regress/man/B/emphasis_transitions`.
77+
6. Cross-check on the audit page set. From `/home/idank/dev/vibe/explainshell`:
78+
79+
```
80+
source .venv/bin/activate
81+
python tests/evals/render/render_eval.py render \
82+
--label candidate-{{BUG_NAME}} --mandoc {{WORKTREE}}/mandoc <CORPUS_OR_LIST>
83+
python tests/evals/render/render_eval.py audit <run-dir> --rules {{AUDIT_RULE}}
84+
```
85+
86+
Target: `{{AUDIT_RULE}}` count strictly below the baseline of
87+
`{{BASELINE_COUNT}}`. Acceptable residue is content-driven (literal
88+
`*` in source roff, etc.) — call those out so the orchestrator can
89+
verify rather than guessing.
90+
7. Commit with a subject of this shape:
91+
`Fix -T markdown: <one-line>`. Body explains the mechanism, references
92+
the canonical motivating page (`{{REPRO_PAGE}}`), and notes any
93+
accepted residue.
94+
95+
## What NOT to do
96+
97+
- Don't reintroduce `*` for italic if a prior local commit switched to
98+
`_` (or vice versa). Check `{{LOCAL_HISTORY}}`.
99+
- Don't remove `&zwnj;` insertion machinery; it's load-bearing for
100+
bold↔italic abutment.
101+
- Don't touch any file outside `{{WORKTREE}}`.
102+
- Don't promote the binary into `../explainshell/tools/`.
103+
- Don't push the commit.
104+
105+
## Reporting back
106+
107+
When done, give a short summary:
108+
109+
{{REPORTING_FIELDS}}
110+
111+
If anything blocks the fix (e.g. it would regress an invariant), stop
112+
and report — don't ship a partial fix.
