docs(skills): tighten review handoff checks

GiggleLiu · GiggleLiu · commit d34477e810e4 · 2026-03-15T15:23:59.000+08:00
diff --git a/.claude/skills/final-review/SKILL.md b/.claude/skills/final-review/SKILL.md
@@ -58,6 +58,32 @@ Collect all information needed for the review:
 
 1f. **Check for conflicts with main**: Run `gh pr view <number> --json mergeable`. If there are merge conflicts, launch a subagent to merge `origin/main` into the PR branch (in a worktree) and push the merge commit.
 
+1g. **PR / issue comment audit (REQUIRED)**: Final review must check the comment history before recommending merge.
+  - Set `REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner)`
+  - Fetch and read:
+    - PR conversation comments: `gh api repos/$REPO/issues/<number>/comments`
+    - PR inline review comments: `gh api repos/$REPO/pulls/<number>/comments`
+    - PR review bodies: `gh api repos/$REPO/pulls/<number>/reviews`
+    - linked issue comments, if an issue exists
+  - Build a list of every actionable comment and classify each as:
+    - `addressed`
+    - `superseded / no longer applicable`
+    - `still open`
+  - Pay special attention to the `## Review Pipeline Report` comment. If it contains a `Remaining issues for final review` section, those items must be reviewed explicitly here.
+  - Do **not** recommend merge until every actionable comment has been dispositioned.
+
+1h. **Comment status summary**: Prepare a short summary for later steps:
+
+> **Comment Audit**
+>
+> [N addressed, M superseded, K still open]
+>
+> Open items:
+> - [comment / issue summary]
+> - ...
+
+If no actionable comments remain, report `No open actionable comments`.
+
 ### Step 2: Usefulness assessment
 
 Think critically about whether this model/rule is genuinely useful. Consider:
@@ -212,6 +238,7 @@ Present a summary table:
 
 | Aspect | Result |
 |--------|--------|
+| Comments | [All addressed / Open: X, Y] |
 | Usefulness | [Useful/Marginal/Not useful] |
 | Safety | [Safe/Concerns found] |
 | Completeness | [Complete/Missing: X, Y] |
@@ -229,6 +256,8 @@ Then present all numbered issues from Step 5 as a multi-select `AskUserQuestion`
 
 This lets the reviewer cherry-pick exactly which issues to fix. If the reviewer selects fixes, proceed to Step 7 Quick fix. If "Merge as-is", proceed to Step 7 Merge.
 
+If any actionable PR / issue comment from Step 1g is still open, `Merge as-is` must **not** be your recommendation. Recommend either **Quick fix** or **OnHold** instead.
+
 ### Step 7: Execute decision
 
 **If Merge:**
diff --git a/.claude/skills/review-pipeline/SKILL.md b/.claude/skills/review-pipeline/SKILL.md
@@ -245,9 +245,21 @@ Run agentic feature tests on the modified feature:
 
 2. **Invoke `/agentic-tests:test-feature`** with the identified feature. This simulates a downstream user exercising the feature from docs and examples. You MUST use the Skill tool to invoke `agentic-tests:test-feature`.
 
-3. **If test-feature reports issues:** fix them, commit, and push.
-
-4. **If test-feature passes:** continue to next step.
+3. **If test-feature reports issues:** treat every reported issue as real until you have checked it in the **current PR worktree/branch**.
+   - Reproduce each issue from the current PR branch/worktree before acting. If it does not reproduce there, classify it as `not reproducible in current worktree`.
+   - Auto-fix every objective issue you reasonably can: code bugs, tests, docs, help text, examples, discoverability gaps, and validation/error-message problems. Do **not** leave "minor docs issues" unfixed by default.
+   - If you changed user-facing behavior, docs, or CLI help, re-run `/agentic-tests:test-feature`.
+   - Classify every reported issue as exactly one of:
+     - `fixed`
+     - `not reproducible in current worktree`
+     - `needs human decision`
+
+4. **Only `needs human decision` issues may remain unresolved.** If any remain:
+   - continue to the next step only after you have written them down for the final PR report
+   - include why they were not auto-fixed
+   - include your recommended maintainer decision
+
+5. **If test-feature passes with no remaining issues:** continue to next step.
 
 ### 4. Fix Loop (max 3 retries)
 
@@ -325,15 +337,30 @@ gh pr comment $PR --body "$(cat <<'EOF'
 | Structural review | 17/17 passed |
 | CI | green |
 | Agentic test | passed |
+| Needs human decision | none |
 | Board | Review pool → Under review → Final review |
 
+### Remaining issues for final review
+
+- None.
+
 🤖 Generated by review-pipeline
 EOF
 )"
 ```
 
 Adapt the table values to match the actual results for the PR. If CI failed after 3 retries, report `failed (3 retries)` instead of `green`.
 
+This section is **mandatory**:
+- `Needs human decision` must be `none` or `N item(s)`
+- `### Remaining issues for final review` must always be present
+- If there are unresolved issues, list each one as a bullet with:
+  - the concrete problem
+  - why it was not auto-fixed
+  - the recommended maintainer decision
+
+If unresolved issues remain, do **not** write `Agentic test | passed`. Use `passed with notes` or another accurate status.
+
 ### 8. Batch Mode (`--all`)
 
 If `--all` was specified, repeat Steps 1-7 for each PR (including posting a PR comment per Step 7). After all PRs, print a batch summary to console:
@@ -364,3 +391,4 @@ Completed: 2/2 | All moved to Final review
 | Skipping merge with main | Always merge origin/main in Step 1a to catch conflicts before fixing comments |
 | Ignoring issue comments | Always check the linked issue (`Fix #N`) for human feedback in Step 2a |
 | Only checking Copilot comments | Step 2a checks human PR reviews and linked issue comments too — bot-only review is insufficient |
+| Saying "passed" while deferring issues | If anything remains for maintainer judgment, list it explicitly under `Remaining issues for final review` and mark the agentic result accordingly |