cblecker · Jun 19, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -44,7 +44,15 @@ structure.
 
 - Use kebab-case for all names
 - Use `${CLAUDE_PLUGIN_ROOT}` for portable paths in hooks/MCP configs
-- When editing plugin files (other than README.md or CLAUDE.md), bump the version in that plugin's `.claude-plugin/plugin.json`
+- When editing plugin files (other than README.md or CLAUDE.md), bump the version in
+  that plugin's `.claude-plugin/plugin.json` following semver:
+  - **patch**: bug fixes, typo corrections, minor wording changes
+  - **minor**: new skills, commands, hooks, agents, or backward-compatible behavior changes
+  - **major**: breaking changes (renamed/removed skills, changed hook behavior, restructured plugin)
+- Only bump once per PR branch. Before bumping, check `git diff main -- <plugin>/.claude-plugin/plugin.json`
+  to see if the version was already bumped. Skip if it was, unless the accumulated
+  changes now warrant a higher semver level (e.g., patch already bumped but a new
+  skill was added — upgrade to minor)
 - Use plugin-dev skills: `/plugin-dev:create-plugin`, `/plugin-dev:skill-reviewer`, `/plugin-dev:plugin-validator`
 
 ## Documentation
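
The bump rules added to CLAUDE.md in the hunk above can be sketched as a small helper. This is an illustration only, not code from the repository: `Bump`, `existing_bump`, and `next_version` are hypothetical names, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings with no pre-release suffix. `base` stands for the version on `main` and `current` for the version on the PR branch, which is what the `git diff main` check compares.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Bump(IntEnum):
    """Semver levels, ordered so a higher level compares greater."""
    PATCH = 1
    MINOR = 2
    MAJOR = 3


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers (assumes no suffix)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def apply_bump(base: str, level: Bump) -> str:
    """Return the version that results from bumping `base` by `level`."""
    major, minor, patch = parse(base)
    if level is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if level is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def existing_bump(base: str, current: str) -> Optional[Bump]:
    """Level already bumped on the branch, or None if the version is unchanged."""
    b, c = parse(base), parse(current)
    if c == b:
        return None
    if c[0] != b[0]:
        return Bump.MAJOR
    if c[1] != b[1]:
        return Bump.MINOR
    return Bump.PATCH


def next_version(base: str, current: str, needed: Bump) -> str:
    """Bump once per branch; upgrade only if a higher level is now warranted."""
    already = existing_bump(base, current)
    if already is not None and already >= needed:
        return current  # already bumped at this level or higher: skip
    return apply_bump(base, needed)
```

Under these assumptions, `next_version("1.3.0", "1.3.1", Bump.MINOR)` upgrades an earlier patch bump to `1.4.0`, while `next_version("1.3.0", "1.4.0", Bump.PATCH)` leaves `1.4.0` untouched.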

diff --git a/pr-review-toolkit/.claude-plugin/plugin.json b/pr-review-toolkit/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "pr-review-toolkit",
-  "version": "1.3.0",
+  "version": "1.4.0",
   "description": "Comprehensive PR review using parallel workflow agents",
   "author": {
     "name": "cblecker",

diff --git a/pr-review-toolkit/README.md b/pr-review-toolkit/README.md
@@ -16,10 +16,10 @@ upstream agents + command architecture.
 | Agent | When it runs | What it does |
 |-------|-------------|--------------|
 | code-reviewer | Always | Reviews code for bugs, style, and guideline adherence (runs on Opus) |
-| silent-failure-hunter | Code files changed | Identifies silent failures and inadequate error handling |
-| pr-test-analyzer | Code files changed | Analyzes test coverage completeness |
-| comment-analyzer | Docs changed or >= 3 files | Checks comment accuracy and maintainability |
-| type-design-analyzer | Typed-language files changed | Evaluates type design and invariant quality |
+| silent-failure-hunter | Changes touch error handling, try/catch, or fallback logic | Identifies silent failures and inadequate error handling |
+| pr-test-analyzer | Functional code that should have corresponding tests | Analyzes test coverage completeness |
+| comment-analyzer | Changes add or modify comments, docstrings, or docs | Checks comment accuracy and maintainability |
+| type-design-analyzer | Changes introduce or modify type definitions in typed languages | Evaluates type design and invariant quality |
 
 Agent selection is liberal: when in doubt, the agent runs. All agents execute in
 parallel within a single workflow, and their lifecycle is managed automatically.

diff --git a/pr-review-toolkit/skills/review-pr/SKILL.md b/pr-review-toolkit/skills/review-pr/SKILL.md
@@ -89,6 +89,8 @@ Wait for the workflow to complete. It returns a JSON object:
       "confidence": 85,
       "title": "Short title",
       "description": "Detailed explanation",
+      "verificationStatus": "verified | unverified",
+      "verificationRationale": "What was checked and confirmed",
       "status": "new | duplicate | partial_overlap",
       "matchedThreadId": "thread-id",
       "existingCoverage": "What the existing thread covers",
@@ -121,22 +123,17 @@ Wait for the workflow to complete. It returns a JSON object:
 ```
 
 `line` may be absent for findings that apply to an entire file or PR.
-Each finding has a `status` from the contextualization phase:
-
-- `new` — no existing thread covers this issue
-- `duplicate` — an existing thread fully covers the same concern
-- `partial_overlap` — an existing thread touches the same area but our
-  finding adds something; `delta` describes the addition and
-  `adjustedSeverity`/`adjustedConfidence` rescore the incremental value
-
-`threadVerifications` is non-empty when `hasOwnResolvedThreads` is true
-(we left comments in a previous review that have since been resolved).
-Each entry assesses whether the author addressed the concern.
+False positives are filtered out before this output. Each remaining
+finding has a `verificationStatus` of `verified` or `unverified` (the
+latter when the verifier was unavailable).
+`threadVerifications` is non-empty only when `hasOwnResolvedThreads` is
+true, meaning we left comments in a previous review that have since been
+resolved.
 
 ## Phase 3: Present findings
 
 The workflow returns classified findings and thread verifications.
-Present them to the user via AskUserQuestion using the template below.
+Present them to the user in three steps: text output first, then
+recommendations, then a selection prompt.
 
 ### Score resolution
 
@@ -145,99 +142,147 @@ For each finding, use the effective severity and confidence:
 - If `adjustedSeverity` is present, use it; otherwise use `severity`
 - If `adjustedConfidence` is present, use it; otherwise use `confidence`
 
-### Presentation template
+### Step 1: Output findings as text
 
-Build the AskUserQuestion body using this structure. Omit any section
-that has no items. `[:{line}]` means include `:{line}` only when line
-is present; omit the colon and line number for file-level findings.
+Output findings as plain text before any selection prompt. This step
+is mandatory — do not skip or compress it into AskUserQuestion. Omit
+any section that has no items. `[:{line}]` means include `:{line}`
+only when line is present.
 
 ```
-## Review Summary
+## PR Review: owner/repo#123
 
-{reviewMeta.existingThreadCount} existing thread(s) on this PR.
-{reviewMeta.newCount} new finding(s), {reviewMeta.partialOverlapCount}
-partial overlap(s), {reviewMeta.duplicateCount} duplicate(s).
+{for each severity in [critical, important, suggestion]}
+### {Severity} Issues
 
----
+{for each finding where status = "new" and effective severity = {severity}}
 
-## New Findings
+N. `{file}[:{line}]` -- **{title}**
+   {description}
+   {if verificationStatus = "verified"}_Verified: {verificationRationale}_{end if}
 
-{for each finding where status = "new", grouped by effective severity}
+{end for}
 
-### Critical
+### Partial Overlaps
 
-1. **[critical/{effectiveConfidence}]** `{file}[:{line}]` — {title}
-   {description}
+{for each finding where status = "partial_overlap"}
 
-### Important
+N. `{file}[:{line}]` -- **{title}**
+   Extends existing review comment: {existingCoverage}.
+   New insight: {delta}.
+   {if verificationStatus = "verified"}_Verified: {verificationRationale}_{end if}
 
-2. **[important/{effectiveConfidence}]** `{file}[:{line}]` — {title}
-   {description}
+{if any findings have status = "duplicate"}
+_N findings omitted as duplicates of existing review threads._
+{end if}
 
-### Suggestions
+### Strengths
 
-3. **[suggestion/{effectiveConfidence}]** `{file}[:{line}]` — {title}
-   {description}
+{for each positiveObservation}
 
----
+- {observation}
 
-## Partial Overlaps
+### Previous Review Status
 
-{for each finding where status = "partial_overlap"}
+{for each threadVerification, only if threadVerifications is non-empty}
 
-4. **[{effectiveSeverity}/{effectiveConfidence}]** `{file}[:{line}]`
-   — {title}
-   Existing comment covers: {existingCoverage}
-   Our addition: {delta}
+{if fixed + adequate}Resolved{else if fixed + inadequate}Fix incomplete{else if fixed + newIssue}Fix introduced new issue: {newIssueDescription}{else if pushed_back + adequate}Author disagrees -- reasoning valid{else if pushed_back + inadequate}Author disagrees -- {assessment}{else if unaddressed}Still unresolved{end if} `{file}:{line}` -- {originalConcern}
+   {assessment}
+```
 
----
+### Step 2: Recommendation
 
-## Duplicates (will not post unless selected)
+After presenting the findings, analyze each one and recommend which to
+include in the posted review. For each numbered finding, output a
+one-line recommendation:
 
-{for each finding where status = "duplicate"}
+```
+## Recommendations
 
-5. `{file}[:{line}]` — {existingCoverage}
-   Independently flagged the same issue.
+1. **Include** -- nil pointer panic is a real crash risk in the error path
+2. **Skip** -- sync.Pool is a performance optimization, not a correctness
+   issue; low value as a review comment on this PR
+```
 
----
+Consider these factors when making recommendations:
 
-## Previous Review Status
+- **Severity and verification status** — verified critical/important
+  findings are strong includes; overstated suggestions are candidates
+  to skip
+- **Signal-to-noise ratio** — a review with 3 strong findings is more
+  useful than one with 10 of mixed quality; fewer, higher-impact
+  comments make a better review
+- **PR context** — a suggestion that's valid but tangential to the PR's
+  purpose is noise; a finding central to what the PR is doing is signal
+- **Actionability** — include findings the author can act on; skip
+  findings that are observations without a clear next step
 
-{for each threadVerification, only if threadVerifications is non-empty}
+### Step 3: Selection prompt
 
-- {icon} `{file}:{line}` — {originalConcern}
-  {assessment}
+Number findings sequentially across all actionable sections (new and
+partial overlaps) so each has a unique number. After the recommendations,
+ask via AskUserQuestion:
 
-Icons:
-  fixed + adequate:        ✅ Resolved
-  fixed + inadequate:      ⚠️ Fix incomplete
-  fixed + newIssue:        🔴 Fix introduced new issue: {newIssueDescription}
-  pushed_back + adequate:  ✅ Author disagrees — reasoning valid
-  pushed_back + inadequate:⚠️ Author disagrees — {assessment}
-  unaddressed:             ❌ Still unresolved
+> "Which findings should I include in the review? Enter numbers
+> (e.g. 1,3,5), 'all', 'none', or 'recommended' to accept my
+> recommendations above."
 
----
+Free-text response, not option buttons.
 
-## Positive Observations
+## Phase 4: Draft and preview comments
 
-{for each positiveObservation}
+After the user selects findings, draft and preview the exact GitHub
+comments before posting.
 
-- {observation}
+### Step 1: Draft each comment
+
+For each approved finding, generate the exact text that will be posted
+as a GitHub review comment. Comments should be:
+
+- Written in first-person, natural voice (as if the user wrote them)
+- Free of boilerplate headers, severity tags, and "AI-generated" markers
+
+### Step 2: Present draft comments for approval
+
+Output all drafted comments grouped by file:
+
+```
+## Draft Review Comments
+
+### path/to/file.go
+
+**Line 42:**
+> Comment text exactly as it will be posted.
+
+**Line 128:**
+> Comment text exactly as it will be posted.
+
+---
+
+Review event: **REQUEST_CHANGES** / **COMMENT**
+(REQUEST_CHANGES if any critical findings selected, COMMENT otherwise)
 ```
 
-### Final prompt
+Then ask via AskUserQuestion:
+
+> "Ready to post these comments? Reply 'post', 'edit N' to modify a
+> specific comment, or 'cancel'."
+
+Free-text response, not option buttons.
+
+### Step 3: Handle edits
 
-Number findings sequentially across all sections (new, partial overlaps,
-duplicates) so each has a unique number. After the template, ask:
-"Which findings should I include in the review? Select by number
-(e.g. 1,3,5), or reply 'all new' / 'all new + overlaps' / 'none'."
+If the user replies "edit N", show the current text of comment N and
+let them provide a replacement. Re-present the updated comment set and
+repeat the approval prompt. Loop until the user replies 'post' or
+'cancel'.
 
-## Phase 4: Post review (after user approval)
+## Phase 5: Post review
 
 1. Create a pending review: `pull_request_review_write` with method `create`
 2. For each approved finding with a file and line number:
-   - Call `add_comment_to_pending_review` with the file, line, and a
-     natural-language comment written as if by the user
+   - Call `add_comment_to_pending_review` with the file, line, and the
+     drafted comment text from Phase 4
    - Use `subjectType: "LINE"` and `side: "RIGHT"`
 3. Write the review body as a brief summary of the review. Include any approved
    findings that lack a file or line number as inline items in the body.
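
The score resolution, selection prompt, and review-event rules described in SKILL.md can be sketched roughly as follows. This is a hedged illustration, not the skill's implementation: the function names are hypothetical, findings are assumed to be plain dicts shaped like the workflow JSON, a missing or null adjusted field is treated as absent, and "critical" is judged on the effective severity, which the source does not state explicitly.

```python
from typing import Dict, Iterable, List, Tuple


def effective_scores(finding: Dict) -> Tuple[str, int]:
    """Prefer adjustedSeverity/adjustedConfidence when present, else the originals."""
    severity = finding.get("adjustedSeverity")
    if severity is None:
        severity = finding["severity"]
    confidence = finding.get("adjustedConfidence")
    if confidence is None:
        confidence = finding["confidence"]
    return severity, confidence


def parse_selection(reply: str, total: int, recommended: Iterable[int]) -> List[int]:
    """Turn the free-text reply into a sorted list of finding numbers (1-based)."""
    text = reply.strip().strip("'\"").lower()
    if text == "all":
        return list(range(1, total + 1))
    if text == "none":
        return []
    if text == "recommended":
        return sorted(set(recommended))
    chosen = set()
    for part in text.split(","):
        number = int(part.strip())  # raises ValueError on non-numeric input
        if not 1 <= number <= total:
            raise ValueError(f"finding {number} is out of range 1..{total}")
        chosen.add(number)
    return sorted(chosen)


def review_event(selected: Iterable[Dict]) -> str:
    """REQUEST_CHANGES if any selected finding is critical, COMMENT otherwise."""
    has_critical = any(effective_scores(f)[0] == "critical" for f in selected)
    return "REQUEST_CHANGES" if has_critical else "COMMENT"
```

For example, a reply of `1, 3` with four numbered findings yields `[1, 3]`, and a selection containing only a finding rescored from `important` to `suggestion` produces a `COMMENT` review.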