Skip to content

feat(review-pr): add verification phase, comment preview, and model tuning#52

Merged
cblecker merged 4 commits into
mainfrom
feat/review-pr-verify-and-preview
Jun 19, 2026
Merged

feat(review-pr): add verification phase, comment preview, and model tuning#52
cblecker merged 4 commits into
mainfrom
feat/review-pr-verify-and-preview

Conversation

@cblecker

@cblecker cblecker commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Summary

  • Adds an adversarial Verify phase between Analyze and Contextualize — each finding is independently checked against the diff, filtering false positives and adjusting severity
  • Restructures Phase 3 presentation to output findings as plain text first (with verification rationale per finding), then prompt for selection separately via AskUserQuestion
  • Adds a new Phase 4 comment drafting step with preview/edit loop — users see exact comment text before anything is posted to GitHub
  • Appends a standardization suffix to all agent prompts for uniform finding descriptions
  • Right-sizes model and effort per workflow agent: lowers effort from max→high for analyze agents, uses sonnet+high for verify agents (binary output), downgrades fetch-threads to haiku+low (pure MCP tool calling)
  • Clarifies version bump conventions in CLAUDE.md with semver guidelines (patch/minor/major) and a "bump once per branch" rule
  • Bumps plugin version to 1.4.0

Test plan

  • Run claude plugin validate ./pr-review-toolkit — passes
  • Run uvx skillsaw --strict — A+ grade, 0 errors
  • Invoke /pr-review-toolkit:review-pr against a real PR and confirm all 5 phases execute in order
  • Confirm findings display as plain text before selection prompt
  • Confirm verification lines appear on each finding
  • Confirm selected findings produce draft comments for preview
  • Confirm "edit N" flow works for modifying draft comments
  • Confirm "post" creates the review with correct comments
  • Test PR with no findings (should report clean review, skip phases 4-5)
  • Test non-local checkout (isLocal = false) to verify remote diff collection

Summary by CodeRabbit

  • New Features

    • Added an independent Verify phase that checks each finding and filters false positives.
    • Extended findings with verification status and rationale, and re-presents results in a clearer two-step flow.
    • Introduced an end-user Recommendation step (Include/Skip) and free-text selection by numbering (supports all/none/recommended).
    • Added a draft review comments step with an approval/edit loop before comments are posted.
  • Documentation

    • Updated review agent trigger rules for more targeted analysis.
  • Chores

    • Bumped plugin version to 1.4.0.

Copilot AI review requested due to automatic review settings June 19, 2026 17:51
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 90ca5687-19e9-41eb-afe7-06f6ee3af1b2

📥 Commits

Reviewing files that changed from the base of the PR and between 4414a19 and 8e2057f.

📒 Files selected for processing (4)
  • CLAUDE.md
  • pr-review-toolkit/README.md
  • pr-review-toolkit/skills/review-pr/SKILL.md
  • pr-review-toolkit/skills/review-pr/review-pr.js

📝 Walkthrough

Walkthrough

Adds a new adversarial Verify phase to the review-pr skill between Analyze and Contextualize. A verifier agent runs per-finding in parallel, filters false positives, attaches verificationStatus and verificationRationale to surviving findings. SKILL.md updates the output schema and rewrites the Phase 3/4/5 presentation, draft, edit, and posting flow into a mandatory two-step selection (plain text then numbered prompt) followed by draft-comment grouping and edit-loop approval. Plugin version bumped to 1.4.0 and contribution conventions expanded to document semver bumping rules.

Changes

Adversarial Verify Phase and Workflow Updates

Layer / File(s) Summary
Verify phase schema and implementation
pr-review-toolkit/skills/review-pr/review-pr.js
Registers a new Verify phase in meta.phases between Analyze and Contextualize, defines VERIFY_SCHEMA for verifier output, introduces STANDARDIZATION_SUFFIX appended to Analyze-phase prompts, updates Analyze to use per-agent AGENT_OPTS and accumulate sorted findings, implements parallel verifier-agent execution per finding to filter false_positive findings and record verificationStatus/verificationRationale, and adjusts Contextualize's fetch-threads and classify-findings agent invocations with explicit model and effort parameters.
SKILL.md verification schema and revised workflow
pr-review-toolkit/skills/review-pr/SKILL.md
Adds verificationStatus and verificationRationale to the findings result schema, replaces Phase 3 with a mandatory Step 1 (plain-text sections for Critical/Important/Suggestions/Partial Overlaps/Strengths/Previous Review Status) and Step 2 (one-line recommendations and free-text numbered selection), rewrites Phase 4 as a draft/preview/edit/approve loop with first-person comment text grouped by file, specifies edit N loop mechanics, and updates Phase 5 posting instructions to create a pending review and add approved comments via add_comment_to_pending_review.
Version bump and contribution conventions
pr-review-toolkit/.claude-plugin/plugin.json, CLAUDE.md
Increments plugin version from 1.3.0 to 1.4.0 and expands contribution conventions to document semver level mapping (patch/minor/major), enforce one version bump per PR branch, and require a pre-bump check using git diff main to avoid duplicate bumps.
Review Agents table documentation updates
pr-review-toolkit/README.md
Refines trigger conditions for silent-failure-hunter, pr-test-analyzer, comment-analyzer, and type-design-analyzer with more descriptive content-focused criteria in place of generic file-count/type-based conditions.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(70, 130, 180, 0.5)
    Note over AnalyzePhase: Analyze
    AnalyzePhase->>AnalyzePhase: Append STANDARDIZATION_SUFFIX to prompts
    AnalyzePhase->>AnalyzePhase: Aggregate findings from parallel agents
  end
  rect rgba(180, 100, 70, 0.5)
    Note over VerifyPhase: Verify (new)
    VerifyPhase->>VerifierAgent: Run per-finding in parallel with VERIFY_SCHEMA
    VerifierAgent-->>VerifyPhase: verificationStatus, adjustedSeverity, rationale
    VerifyPhase->>VerifyPhase: Filter false_positive findings
    VerifyPhase->>VerifyPhase: Record verificationStatus + verificationRationale
    VerifyPhase->>VerifyPhase: Re-sort by severity then confidence
  end
  rect rgba(70, 180, 100, 0.5)
    Note over ContextualizePhase: Contextualize
    ContextualizePhase->>ContextualizePhase: Process verified findings with model/effort overrides
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cblecker/claude-plugins#45: Introduced the initial review-pr workflow that this PR extends with the new Verify phase and adversarial-verification output fields.
  • cblecker/claude-plugins#49: Established the diff preamble and diff-setup logic that the new Verify phase's verifier-agent prompts consume to validate findings.
  • cblecker/claude-plugins#51: Also modifies review-pr.js Phase 3/4/5 flow and extends the findings output contract with adjacent verification/classification metadata.

Poem

🐇 Hop, hop — before a bug slips through,
A verifier digs to check what's true.
False positives? Filtered with a bound!
Each finding verified, each flaw is sound.
The diff can't hide from these keen eyes,
Version 1.4.0 — the rabbit certifies! 🔍

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: it mentions the new verification phase, comment preview functionality, and model tuning optimizations that are central to this PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/review-pr-verify-and-preview

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.5.0)
pr-review-toolkit/skills/review-pr/review-pr.js

File contains syntax errors that prevent linting: Line 672: Illegal return statement outside of a function; Line 788: Illegal return statement outside of a function


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the pr-review-toolkit plugin's review-pr skill by inserting an adversarial Verify phase into the workflow and restructuring how findings are presented and posted. The workflow script now independently checks each finding from the Analyze phase against the diff (filtering false positives and re-scoring overstated/understated findings) before contextualization. The accompanying SKILL.md is reworked so findings are first rendered as plain text (with per-finding verification rationale), then selected, then drafted/previewed as exact comment text before posting, shifting the old "Post review" step from Phase 4 to Phase 5.

Changes:

  • Adds a Verify workflow phase plus VERIFY_SCHEMA, spawning a per-finding verifier that filters false positives and adjusts severity, and appends a STANDARDIZATION_SUFFIX to analyze-agent prompts.
  • Restructures SKILL.md Phase 3 into text-output + selection steps, adds a new Phase 4 draft/preview/edit loop, renumbers posting to Phase 5, and documents the new verificationStatus/verificationRationale fields.
  • Bumps the plugin version from 1.3.0 to 1.4.0.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
pr-review-toolkit/skills/review-pr/review-pr.js Adds the Verify phase, verification schema, per-finding verifier agents, and standardization suffix for analyze prompts.
pr-review-toolkit/skills/review-pr/SKILL.md Documents verification fields and restructures presentation/drafting/posting into Phases 3–5.
pr-review-toolkit/.claude-plugin/plugin.json Version bump to 1.4.0 reflecting the new feature.

Notes from the review: I flagged a redundant identical if/else if pair in the severity-adjustment logic, an optional operational note about the per-finding verifier re-collecting the full diff (cost scales with finding count), and a documentation inaccuracy attributing confidence filtering to the verify phase rather than the analyze phase. All are nit-level.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pr-review-toolkit/skills/review-pr/review-pr.js Outdated
Comment thread pr-review-toolkit/skills/review-pr/review-pr.js
Comment thread pr-review-toolkit/skills/review-pr/SKILL.md Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pr-review-toolkit/skills/review-pr/review-pr.js`:
- Around line 617-620: The condition in the falsePositiveCount block incorrectly
treats verification failures (!v) the same as explicit false positives
(v.verification === 'false_positive'), which silently drops valid findings when
verification calls fail or timeout. Separate the logic to only filter when
v.verification explicitly equals 'false_positive', and handle the !v case
(verification failure) differently by either keeping the original finding with a
fallback verification note or explicitly failing the phase. This ensures
verification failures do not silently drop important findings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 66a6d6da-edd9-4b4a-be51-a9bd8d3283dd

📥 Commits

Reviewing files that changed from the base of the PR and between 08fd171 and 7a16d77.

📒 Files selected for processing (3)
  • pr-review-toolkit/.claude-plugin/plugin.json
  • pr-review-toolkit/skills/review-pr/SKILL.md
  • pr-review-toolkit/skills/review-pr/review-pr.js

Comment thread pr-review-toolkit/skills/review-pr/review-pr.js Outdated
cblecker added 2 commits June 19, 2026 11:22
Add adversarial verification of findings between Analyze and
Contextualize phases, filtering false positives and adjusting
severity. Restructure presentation to show findings as plain text
before selection, and add a comment drafting phase with preview/edit
loop before posting.

Assisted-by: Claude:claude-opus-4-6
Lower effort from max to high for specialized analyze agents (silent-failure-hunter,
pr-test-analyzer, comment-analyzer, type-design-analyzer) whose structured prompts
compensate for reduced reasoning. Use sonnet+high for verify agents (binary output,
narrow scope). Downgrade fetch-threads to haiku+low (pure MCP tool calling). Add
explicit effort:high to classify-findings.

Assisted-by: Claude:claude-opus-4-6
@cblecker cblecker force-pushed the feat/review-pr-verify-and-preview branch from 7a16d77 to 4414a19 Compare June 19, 2026 18:26
@cblecker cblecker changed the title feat(review-pr): add verification phase and comment preview feat(review-pr): add verification phase, comment preview, and model tuning Jun 19, 2026
@cblecker

Copy link
Copy Markdown
Owner Author

@CodeRabbit full review

@cblecker cblecker requested a review from Copilot June 19, 2026 18:29
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread pr-review-toolkit/skills/review-pr/review-pr.js
Comment thread pr-review-toolkit/skills/review-pr/SKILL.md Outdated
@cblecker

Copy link
Copy Markdown
Owner Author

@CodeRabbit full review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pr-review-toolkit/skills/review-pr/review-pr.js`:
- Line 556: The code-reviewer configuration on line 556 is forcing effort:
'max', which bypasses the high-effort optimization and unnecessarily increases
cost and latency. Change the effort value for the code-reviewer object from
'max' to a right-sized effort level that aligns with the optimization strategy,
removing the unnecessary forced maximum effort setting.
- Around line 591-619: The parallel() function call starting at verifyResults is
launching all verifier agents without a concurrency limit, which can overload
quotas and cause timeouts when processing large finding sets. Modify the
parallel() call that executes allFindings.map() to include a concurrency limit
parameter that caps the number of simultaneous agent verifications running at
once. Check the parallel() function's API or existing patterns in the codebase
to determine the correct parameter syntax for setting the maximum concurrent
operation limit.

In `@pr-review-toolkit/skills/review-pr/SKILL.md`:
- Around line 171-199: The output template currently hardcodes `_Verified:
{verificationRationale}_` for all findings regardless of their verification
status. When a finding has verificationStatus set to unverified, this creates
factually incorrect output. Make the verification line conditional by only
including `_Verified: {verificationRationale}_` when the finding has a verified
status. Apply this conditional logic consistently across all four sections of
the template: the unnumbered critical findings section at the top, the Important
Issues section, the Suggestions section, and the Partial Overlaps section (which
also appears at line 227-228). Ensure that unverified findings do not display
any verification statement in the output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 489397c7-7d34-41bc-8c5c-dfa3e26b3500

📥 Commits

Reviewing files that changed from the base of the PR and between 08fd171 and 4414a19.

📒 Files selected for processing (4)
  • CLAUDE.md
  • pr-review-toolkit/.claude-plugin/plugin.json
  • pr-review-toolkit/skills/review-pr/SKILL.md
  • pr-review-toolkit/skills/review-pr/review-pr.js

Comment thread pr-review-toolkit/skills/review-pr/review-pr.js
Comment thread pr-review-toolkit/skills/review-pr/review-pr.js
Comment thread pr-review-toolkit/skills/review-pr/SKILL.md Outdated
Add semver level guidance (patch/minor/major) and once-per-branch
rule to avoid redundant bumps while ensuring the bump level matches
the scope of accumulated changes.

Assisted-by: Claude:claude-opus-4-6
@cblecker cblecker force-pushed the feat/review-pr-verify-and-preview branch from 4414a19 to df1bcb1 Compare June 19, 2026 20:14
@cblecker cblecker enabled auto-merge June 19, 2026 20:14
Condense SKILL.md by removing prose that duplicates the JSON schema,
collapsing three identical severity section templates into a loop,
folding status labels and duplicate handling into the template, and
trimming redundant examples and guidance.

In review-pr.js, extract duplicated sort logic into a sortFindings
helper, remove dead string-args fallback, clean up no-op alias, and
tighten the classify and fetch-threads prompts.

Update README agent trigger descriptions to match SKILL.md criteria.

Assisted-by: Claude:claude-opus-4-6
Copilot AI review requested due to automatic review settings June 19, 2026 20:52
@cblecker cblecker merged commit b02a2eb into main Jun 19, 2026
11 of 12 checks passed
@cblecker cblecker deleted the feat/review-pr-verify-and-preview branch June 19, 2026 20:53

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines +641 to +644
verifiedFindings.push(Object.assign({}, finding, {
verificationStatus: 'verified',
verificationRationale: v.rationale
}))
log('Fetching existing review threads')

const fetchPrompt = `Fetch the authenticated user login via \`get_me\`, and all review comment threads via \`pull_request_read\` with method \`get_review_comments\` for ${config.owner}/${config.repo} PR #${config.pullNumber}. Paginate if needed to get all threads. For each thread, populate \`id\` (the thread node ID), \`author\` (the GitHub login of the first comment's author, matching the format returned by \`get_me\`), \`isResolved\`, and include the thread's replies. Return \`threads\` and \`myUsername\` as structured output.`
const fetchPrompt = `Fetch the authenticated user login via \`get_me\`, and all review comment threads via \`pull_request_read\` with method \`get_review_comments\` for ${config.owner}/${config.repo} PR #${config.pullNumber}. Paginate if needed to get all threads. For each thread, populate \`id\` (the thread node ID), \`author\` (the GitHub login of the first comment's author, matching the format returned by \`get_me\`), \`isResolved\`, and include the thread's replies.`
{for each finding where status = "partial_overlap"}

### Important
4. `{file}[:{line}]` -- **{title}**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants