feat(skills): add /mandoc-fix orchestrator

idank · idank · commit a4f9a2e73e95 · 2026-05-05T12:03:03.000+03:00
Wraps the recurring scope → dispatch → validate → promote cycle exercised across c1dbbf9, 84ed928, 4aa2187, c3c42c7. Calls /eval-render and /eval-llm as substeps; uses the audit subcommand (e8cd231) for the load-bearing absolute check. The subagent brief template captures the recurring shape: HEAD + local-history context, repro CLI + real-page reference, acceptance tests, invariants the fix must preserve, and a do-not list (no push, no promote, no reintroducing reverted approaches).
diff --git a/.claude/skills/mandoc-fix/SKILL.md b/.claude/skills/mandoc-fix/SKILL.md
@@ -0,0 +1,116 @@
+---
+name: mandoc-fix
+description: Drive a mandoc rendering-bug fix end-to-end — scope the bug class, dispatch a subagent in the mandoc source tree, validate via /eval-render audit + compare, and decide whether to promote. Use when the user has identified (or suspects) a class of `-T markdown` rendering bug and wants the full diagnose → fix → validate → promote cycle.
+user_invocable: true
+---
+
+# mandoc-fix
+
+You orchestrate the recurring mandoc-fix cycle: scope a bug class, dispatch a fresh subagent in `~/dev/vibe/mandoc-1.14.6/` (or the configured tree), validate the result with the existing render eval (`audit` for absolute check, `compare` for regression net), and decide promote / iterate / defer.
+
+This skill calls `/eval-render` and `/eval-llm` as substeps — do not re-implement what they do.
+
+## Usage
+
+```
+/mandoc-fix <bug-description> [--rule <audit-rule>] [--pages <file>] [--mandoc-worktree <path>]
+```
+
+## Arguments
+
+- **bug-description** (required): Free-text description of the bug class, written so a subagent in the mandoc tree can understand it without prior context. Include a roff repro if possible.
+- **rule** (optional): One of the audit rule ids (`quad_star_run`, `empty_emphasis_tag`, `roff_named_escape`, `roff_two_letter_escape`, `roff_font_escape`, `visible_zwnj_entity`, `visible_nbsp_entity`, `visible_double_amp`, `giant_markdown_line`, `synopsis_no_spaces_run`). When provided, the rule's count over the corpus is the load-bearing acceptance metric. When omitted, ask the user which rule (or rules) define success — guess only if the bug-description maps unambiguously to one rule.
+- **pages** (optional): Path to a file listing repo-relative manpage paths affected by the bug (one per line). When provided, the absolute baseline is rendered against this list specifically; otherwise the standard `tests/evals/render/corpus.txt` is used.
+- **mandoc-worktree** (optional): Path to the mandoc source tree. Defaults to `/home/idank/dev/vibe/mandoc-1.14.6/`.
+
+## Step 1: Scope
+
+Confirm what's being measured before you spend any agent time. Show the user:
+
+- The audit rule that will gate promotion.
+- The page set the rule will be applied to (corpus vs `--pages` file, with row count).
+- The current mandoc HEAD commit and the binary md5 of `tools/mandoc-md` so they know what "baseline" means.
+
+If the user gave a free-text bug-description without a rule, name one and ask them to confirm. If neither a rule nor `--pages` makes the success metric concrete, stop and ask. Don't dispatch the subagent without a measurable target.
+
+## Step 2: Capture absolute baseline
+
+Render the chosen page set with the current `tools/mandoc-md`, then audit:
+
+```bash
+source .venv/bin/activate
+# Standard corpus
+python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md
+# OR custom page list
+python tests/evals/render/render_eval.py render --label baseline-<rule> --mandoc tools/mandoc-md $(cat <pages-file>)
+python tests/evals/render/render_eval.py audit <run-dir> --rules <rule>
+```
+
+Record the rule's `pages × occurrences` baseline number. This is what the candidate must beat.
+
+## Step 3: Dispatch subagent
+
+Use the `templates/subagent-brief.md` template. Fill in every placeholder. Spawn a fresh general-purpose agent (the mandoc tree is a separate working directory; the subagent will operate there).
+
+Brief the agent **not to push or promote** — those are your job after validation.
+
+Wait for the subagent to return before continuing. Run it foreground (default), not background — the validation steps depend on its output.
+
+## Step 4: Validate (three layers)
+
+After the subagent reports a commit + rebuilt binary:
+
+a. **Absolute check (load-bearing).** Render the same page set with the candidate, audit it, and compare counts to the baseline. Target: the rule's `pages × occurrences` strictly down. Acceptable residue is content-driven (e.g. literal `*` in source roff); the subagent's report should distinguish such residue from remaining rendering bugs.
+
+b. **Regression net.** Invoke `/eval-render <candidate-binary>` to run the standard compare. Read the verdict. Suspicious deltas unrelated to the targeted rule are regressions.
+
+c. **Spot-check.** Render and visually diff 2–3 of the most-affected pages from the baseline, confirming the fix matches the subagent's repro and doesn't introduce new visual artifacts.
+
+## Step 5: Apply the rubric
+
+- **merge** ⇢ absolute count strictly down, `/eval-render` verdict is merge, spot-checks clean. Recommend: `cp <candidate> tools/mandoc-md`, commit referencing the upstream commit hash, suggest `/eval-llm` and re-extraction of the affected pages.
+- **regression** ⇢ any layer fails outright. Re-dispatch the subagent (Step 3) with a delta brief that names the specific regression and a concrete acceptance test. Repeat until the verdict is merge or defer.
+- **defer** ⇢ ambiguous cases (e.g. absolute count drops but `/eval-render` flags structural changes). Surface evidence to the user and ask.
+
+## Step 6: Promote and propose downstream
+
+On merge:
+
+```bash
+cp <candidate-binary> tools/mandoc-md
+git add tools/mandoc-md
+git commit -m "feat(tools): promote mandoc-md with <one-line summary>
+
+Picks up mandoc <upstream-commit> (\"<upstream-subject>\"). <impact line>.
+<rule> drops from <baseline> to <candidate> across the <page-set>."
+```
+
+Then propose (don't run without explicit go-ahead):
+
+1. **`/eval-llm`** as a sanity check that cleaner markdown doesn't perturb extraction quality.
+2. **Re-extract the affected pages** with `--reason` populated:
+   ```
+   python -m explainshell.manager extract --mode llm:<model> --overwrite \
+     -j 10 --reason "<one-line: what fixed, eval verdict>" \
+     $(tr '\n' ' ' < <pages-file>)
+   ```
+3. **Upload the live DB** with `make upload-live-db` once the user confirms the re-extract looked clean.
+
+## Reporting back
+
+Final user-facing report (in chat, not a file):
+
+- One-line verdict.
+- Baseline → candidate counts for the gating rule, with `pages × occurrences`.
+- `/eval-render` aggregate verdict (one line).
+- Spot-checked pages (one bullet each, before → after).
+- If **merge**: the promotion commands above, ready to run.
+- If **regression**: the redispatched subagent prompt fenced and ready.
+- If **defer**: the specific evidence and a concrete question.
+
+## What NOT to do
+
+- Don't dispatch a subagent without a measurable success metric (rule + page set).
+- Don't promote without all three validation layers passing.
+- Don't run `/eval-llm`, re-extract, or `make upload-live-db` without explicit user confirmation — those are downstream actions with cost or production impact.
+- Don't push or merge changes in the mandoc tree from this session — the subagent commits in its own tree; the user pushes those upstream when they choose.
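The Step 5 rubric in `SKILL.md` can be read as a small decision function. The sketch below is an illustration only and is not part of the commit: `Layer`, `absolute_layer`, and `apply_rubric` are hypothetical names, each validation layer is assumed to reduce to pass / fail / unclear, and "strictly down" is assumed to mean fewer occurrences of the gating rule.

```python
from enum import Enum


class Layer(Enum):
    """Outcome of one validation layer (hypothetical tri-state model)."""
    PASS = "pass"
    FAIL = "fail"
    UNCLEAR = "unclear"


def absolute_layer(baseline_occurrences: int, candidate_occurrences: int) -> Layer:
    # Assumption: "strictly down" means the candidate has fewer occurrences.
    if candidate_occurrences < baseline_occurrences:
        return Layer.PASS
    return Layer.FAIL


def apply_rubric(absolute: Layer, regression_net: Layer, spot_check: Layer) -> str:
    layers = (absolute, regression_net, spot_check)
    # Any outright failure sends the work back to the subagent.
    if any(layer is Layer.FAIL for layer in layers):
        return "regression"
    # Promotion requires all three layers to pass.
    if all(layer is Layer.PASS for layer in layers):
        return "merge"
    # Nothing failed, but something is ambiguous: ask the user.
    return "defer"
```

Under these assumptions, an absolute count that drops while the regression net is unclear yields `defer`, which matches the example given for that verdict in Step 5.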
diff --git a/.claude/skills/mandoc-fix/templates/subagent-brief.md b/.claude/skills/mandoc-fix/templates/subagent-brief.md
@@ -0,0 +1,112 @@
+<!--
+Subagent brief template for /mandoc-fix.
+
+The orchestrator fills the {{DOUBLE_BRACE}} placeholders (and the few
+<angle-bracket> ones under Process) and dispatches the rendered text to a fresh
+general-purpose agent. Every section pulls its weight; if a section is empty, drop it rather than emit a placeholder.
+
+Required substitutions:
+- {{WORKTREE}}         absolute path, e.g. /home/idank/dev/vibe/mandoc-1.14.6
+- {{HEAD_COMMIT}}      output of `git log --oneline -1` in the worktree
+- {{LOCAL_HISTORY}}    bullet list of recent local commits the subagent must
+                       not break (subject lines + one-sentence intent each)
+- {{BUG_NAME}}         short label for the bug class
+- {{BUG_DESCRIPTION}}  why the bug matters in plain English
+- {{REPRO_CLI}}        printf | mandoc invocation that demonstrates the bug
+                       and the desired output
+- {{REPRO_PAGE}}       at least one real manpage path (under
+                       ../explainshell/manpages/) that contains the pattern
+- {{ACCEPTANCE_TESTS}} numbered list of what the candidate must produce
+- {{INVARIANTS}}       list of behaviors that must NOT change (e.g. "italic
+                       still emits *...*", "&zwnj; insertion intact",
+                       "intraword italic still works")
+- {{AUDIT_RULE}}       audit rule id whose count must drop (e.g. quad_star_run)
+- {{AUDIT_PAGE_SET}}   "the standard corpus" or "the {{N}}-page list at
+                       <path>"
+- {{BASELINE_COUNT}}   "<P> pages, <N> occurrences" measured before dispatch
+- {{REPORTING_FIELDS}} pass-through fields the orchestrator needs back
+                       (commit hash, smoke test outputs, audit count,
+                       regress tally, judgment calls)
+-->
+
+You're working in {{WORKTREE}}, a vendored mandoc 1.14.6 source tree with a
+stack of local fixes. Build with `make`; output is `./mandoc`.
+
+## Context
+
+`HEAD` is at `{{HEAD_COMMIT}}`. Recent local commits you must preserve:
+
+{{LOCAL_HISTORY}}
+
+## The bug — {{BUG_NAME}}
+
+{{BUG_DESCRIPTION}}
+
+### Repro
+
+CLI:
+
+```
+{{REPRO_CLI}}
+```
+
+Real-world example: `{{REPRO_PAGE}}` (under `../explainshell/manpages/`).
+
+## What I want
+
+{{ACCEPTANCE_TESTS}}
+
+## Invariants — these MUST keep working
+
+{{INVARIANTS}}
+
+## Process
+
+1. Read `git log -p <range>` for the recent local commits in this tree
+   before patching anything in `mdoc_markdown.c` — they share data
+   structures with what you're about to change.
+2. Implement the fix. Prefer extending existing machinery
+   (`pending_close_marker`, `marker_stack`, `outer_marker`, font-mode
+   helpers) over inventing new globals.
+3. Build (`make`).
+4. Run the CLI repro above and confirm the desired output.
+5. Run `make regress`; it must pass 100%. Update fixtures only when the change
+   legitimately changes their expected output. Add a new fixture under
+   `regress/man/B/` (or `regress/mdoc/`) covering the new case, in the
+   style of recent additions like `regress/man/B/emphasis_transitions`.
+6. Cross-check on the audit page set. From `/home/idank/dev/vibe/explainshell`:
+
+   ```
+   source .venv/bin/activate
+   python tests/evals/render/render_eval.py render \
+     --label candidate-{{BUG_NAME}} --mandoc {{WORKTREE}}/mandoc <CORPUS_OR_LIST>
+   python tests/evals/render/render_eval.py audit <run-dir> --rules {{AUDIT_RULE}}
+   ```
+
+   Target: `{{AUDIT_RULE}}` count strictly below the baseline of
+   `{{BASELINE_COUNT}}`. Acceptable residue is content-driven (literal
+   `*` in source roff, etc.) — call those out so the orchestrator can
+   verify rather than guessing.
+7. Commit with a subject of the form
+   `Fix -T markdown: <one-line>`. Body explains the mechanism, references
+   the canonical motivating page (`{{REPRO_PAGE}}`), and notes any
+   accepted residue.
+
+## What NOT to do
+
+- Don't reintroduce `*` for italic if a prior local commit switched to
+  `_` (or vice versa). Check the local commits listed under Context.
+- Don't remove `&zwnj;` insertion machinery; it's load-bearing for
+  bold↔italic abutment.
+- Don't touch any file outside `{{WORKTREE}}`.
+- Don't promote the binary into `../explainshell/tools/`.
+- Don't push the commit.
+
+## Reporting back
+
+When done, give a short summary:
+
+{{REPORTING_FIELDS}}
+
+If anything blocks the fix (e.g. it would regress an invariant), stop
+and report — don't ship a partial fix.
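The substitution step described in the template header can be sketched in Python as follows. This is an illustration only and not part of the commit: `fill_brief` is a hypothetical helper, and it assumes placeholders are always `{{UPPER_SNAKE}}` and that the leading HTML comment should not reach the subagent. It does not drop empty sections; it refuses to render a brief that still has an unfilled placeholder.

```python
import re

# Matches {{UPPER_SNAKE}} placeholders as used in the template body.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")
# Matches the leading <!-- ... --> block that documents the substitutions.
LEADING_COMMENT = re.compile(r"\A\s*<!--.*?-->\s*", re.DOTALL)


def fill_brief(template: str, values: dict[str, str]) -> str:
    """Render a subagent brief, failing loudly on any unfilled placeholder."""
    # Strip the orchestrator-facing instructions before substitution.
    body = LEADING_COMMENT.sub("", template, count=1)
    needed = {match.group(1) for match in PLACEHOLDER.finditer(body)}
    # A missing or blank value counts as unfilled.
    missing = sorted(name for name in needed if not values.get(name, "").strip())
    if missing:
        raise ValueError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], body)
```

Failing on a missing value, instead of leaving `{{...}}` in the output, keeps a half-filled brief from being dispatched; removing a section that is legitimately empty is left to the caller.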