| 1 | +--- |
| 2 | +name: mandoc-fix |
| 3 | +description: Drive a mandoc rendering-bug fix end-to-end — scope the bug class, dispatch a subagent in the mandoc source tree, validate via /eval-render audit + compare, and decide whether to promote. Use when the user has identified (or suspects) a class of `-T markdown` rendering bug and wants the full diagnose → fix → validate → promote cycle. |
| 4 | +user_invocable: true |
| 5 | +--- |
| 6 | + |
| 7 | +# mandoc-fix |
| 8 | + |
| 9 | +You orchestrate the recurring mandoc-fix cycle: scope a bug class, dispatch a fresh subagent in `~/dev/vibe/mandoc-1.14.6/` (or the configured tree), validate the result with the existing render eval (`audit` for absolute check, `compare` for regression net), and decide promote / iterate / defer. |
| 10 | + |
| 11 | +This skill invokes `/eval-render` as a substep and proposes `/eval-llm` as a follow-up; do not re-implement what they do. |
| 12 | + |
| 13 | +## Usage |
| 14 | + |
| 15 | +``` |
| 16 | +/mandoc-fix <bug-description> [--rule <audit-rule>] [--pages <file>] [--mandoc-worktree <path>] |
| 17 | +``` |
| 18 | + |
| 19 | +## Arguments |
| 20 | + |
| 21 | +- **bug-description** (required): Free-text description of the bug class, written so a subagent in the mandoc tree can understand it without prior context. Include a roff repro if possible. |
| 22 | +- **rule** (optional): One of the audit rule ids (`quad_star_run`, `empty_emphasis_tag`, `roff_named_escape`, `roff_two_letter_escape`, `roff_font_escape`, `visible_zwnj_entity`, `visible_nbsp_entity`, `visible_double_amp`, `giant_markdown_line`, `synopsis_no_spaces_run`). When provided, the rule's count over the corpus is the load-bearing acceptance metric. When omitted, ask the user which rule (or rules) define success — guess only if the bug-description maps unambiguously to one rule. |
| 23 | +- **pages** (optional): Path to a file listing repo-relative manpage paths affected by the bug (one per line). When provided, the absolute baseline is rendered against this list specifically; otherwise the standard `tests/evals/render/corpus.txt` is used. |
| 24 | +- **mandoc-worktree** (optional): Path to the mandoc source tree. Defaults to `~/dev/vibe/mandoc-1.14.6/`. |
| 25 | + |
| 26 | +## Step 1: Scope |
| 27 | + |
| 28 | +Confirm what's being measured before you spend any agent time. Show the user: |
| 29 | + |
| 30 | +- The audit rule that will gate promotion. |
| 31 | +- The page set the rule will be applied to (corpus vs `--pages` file, with row count). |
| 32 | +- The current mandoc HEAD commit and the binary md5 of `tools/mandoc-md` so they know what "baseline" means. |
| 33 | + |
| 34 | +If the user gave a free-text bug-description without a rule, propose the rule it maps to and ask them to confirm. If neither a rule nor `--pages` makes the success metric concrete, stop and ask. Don't dispatch the subagent without a measurable target. |
| 35 | + |
| 36 | +## Step 2: Capture absolute baseline |
| 37 | + |
| 38 | +Render the chosen page set with the current `tools/mandoc-md`, then audit: |
| 39 | + |
| 40 | +```bash |
| 41 | +source .venv/bin/activate |
| 42 | +# Standard corpus |
| 43 | +python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md |
| 44 | +# OR custom page list |
| 45 | +python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md $(cat <pages-file>) |
| 46 | +python tests/evals/render/render_eval.py audit <run-dir> --rules <rule> |
| 47 | +``` |
| 48 | + |
| 49 | +Record the rule's `pages × occurrences` baseline number. This is what the candidate must beat. |
| 50 | + |
| 51 | +## Step 3: Dispatch subagent |
| 52 | + |
| 53 | +Use the `templates/subagent-brief.md` template. Fill in every placeholder. Spawn a fresh general-purpose agent (the mandoc tree is a separate working directory; the subagent will operate there). |
| 54 | + |
| 55 | +Brief the agent **not to push or promote** — those are your job after validation. |
| 56 | + |
| 57 | +Wait for the subagent to return before continuing. Run it foreground (default), not background — the validation steps depend on its output. |
| 58 | + |
| 59 | +## Step 4: Validate (three layers) |
| 60 | + |
| 61 | +After the subagent reports a commit + rebuilt binary: |
| 62 | + |
| 63 | +a. **Absolute check (load-bearing).** Render the same page set with the candidate, audit it, and compare the counts to the baseline. Target: the rule's `pages × occurrences` strictly down. Any remaining occurrences must be content-driven (e.g. literal `*` in the source roff); the subagent's report should separate that residue from occurrences the fix missed. |
| 64 | + |
| 65 | +b. **Regression net.** Invoke `/eval-render <candidate-binary>` to run the standard compare. Read the verdict. Suspicious deltas unrelated to the targeted rule are regressions. |
| 66 | + |
| 67 | +c. **Spot-check.** Render and visually diff 2–3 of the most-affected pages from the baseline, confirming the fix matches the subagent's repro and doesn't introduce new visual artifacts. |
| 68 | + |
| 69 | +## Step 5: Apply the rubric |
| 70 | + |
| 71 | +- **merge** ⇢ absolute count strictly down, `/eval-render` verdict is merge, spot-checks clean. Recommend: `cp <candidate> tools/mandoc-md`, commit referencing the upstream commit hash, suggest `/eval-llm` and re-extraction of the affected pages. |
| 72 | +- **regression** ⇢ any layer fails. Re-dispatch the subagent (Step 3) with a delta brief that names the specific regression and a concrete acceptance test. Repeat until the outcome is merge or defer. |
| 73 | +- **defer** ⇢ ambiguous cases (e.g. absolute count drops but `/eval-render` flags structural changes). Surface evidence to the user and ask. |
| 74 | + |
| 75 | +## Step 6: Promote and propose downstream |
| 76 | + |
| 77 | +On merge: |
| 78 | + |
| 79 | +```bash |
| 80 | +cp <candidate-binary> tools/mandoc-md |
| 81 | +git add tools/mandoc-md |
| 82 | +git commit -m "feat(tools): promote mandoc-md with <one-line summary> |
| 83 | + |
| 84 | +Picks up mandoc <upstream-commit> (\"<upstream-subject>\"). <impact line>. |
| 85 | +<rule> drops from <baseline> to <candidate> across the <page-set>." |
| 86 | +``` |
| 87 | + |
| 88 | +Then propose (don't run without explicit go-ahead): |
| 89 | + |
| 90 | +1. **`/eval-llm`** as a sanity check that cleaner markdown doesn't perturb extraction quality. |
| 91 | +2. **Re-extract the affected pages** with `--reason` populated: |
| 92 | + ``` |
| 93 | + python -m explainshell.manager extract --mode llm:<model> --overwrite \ |
| 94 | + -j 10 --reason "<one-line: what fixed, eval verdict>" \ |
| 95 | + $(tr '\n' ' ' < <pages-file>) |
| 96 | + ``` |
| 97 | +3. **Upload the live DB** with `make upload-live-db` once the user confirms the re-extract looked clean. |
| 98 | + |
| 99 | +## Reporting back |
| 100 | + |
| 101 | +Final user-facing report (in chat, not a file): |
| 102 | + |
| 103 | +- One-line verdict. |
| 104 | +- Baseline → candidate counts for the gating rule, with `pages × occurrences`. |
| 105 | +- `/eval-render` aggregate verdict (one line). |
| 106 | +- Spot-checked pages (one bullet each, before → after). |
| 107 | +- If **merge**: the promotion commands above, ready to run. |
| 108 | +- If **regression**: the redispatched subagent prompt fenced and ready. |
| 109 | +- If **defer**: the specific evidence and a concrete question. |
| 110 | + |
| 111 | +## What NOT to do |
| 112 | + |
| 113 | +- Don't dispatch a subagent without a measurable success metric (rule + page set). |
| 114 | +- Don't promote without all three validation layers passing. |
| 115 | +- Don't run `/eval-llm`, re-extract, or `make upload-live-db` without explicit user confirmation — those are downstream actions with cost or production impact. |
| 116 | +- Don't push or merge changes in the mandoc tree from this session — the subagent commits in its own tree; the user pushes those upstream when they choose. |