feat(review-pr): add verification phase, comment preview, and model tuning#52
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds a new adversarial Verify phase to the ChangesAdversarial Verify Phase and Workflow Updates
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
Note over AnalyzePhase: Analyze
AnalyzePhase->>AnalyzePhase: Append STANDARDIZATION_SUFFIX to prompts
AnalyzePhase->>AnalyzePhase: Aggregate findings from parallel agents
end
rect rgba(180, 100, 70, 0.5)
Note over VerifyPhase: Verify (new)
VerifyPhase->>VerifierAgent: Run per-finding in parallel with VERIFY_SCHEMA
VerifierAgent-->>VerifyPhase: verificationStatus, adjustedSeverity, rationale
VerifyPhase->>VerifyPhase: Filter false_positive findings
VerifyPhase->>VerifyPhase: Record verificationStatus + verificationRationale
VerifyPhase->>VerifyPhase: Re-sort by severity then confidence
end
rect rgba(70, 180, 100, 0.5)
Note over ContextualizePhase: Contextualize
ContextualizePhase->>ContextualizePhase: Process verified findings with model/effort overrides
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Biome (2.5.0)pr-review-toolkit/skills/review-pr/review-pr.jsFile contains syntax errors that prevent linting: Line 672: Illegal return statement outside of a function; Line 788: Illegal return statement outside of a function Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR enhances the pr-review-toolkit plugin's review-pr skill by inserting an adversarial Verify phase into the workflow and restructuring how findings are presented and posted. The workflow script now independently checks each finding from the Analyze phase against the diff (filtering false positives and re-scoring overstated/understated findings) before contextualization. The accompanying SKILL.md is reworked so findings are first rendered as plain text (with per-finding verification rationale), then selected, then drafted/previewed as exact comment text before posting, shifting the old "Post review" step from Phase 4 to Phase 5.
Changes:
- Adds a
Verifyworkflow phase plusVERIFY_SCHEMA, spawning a per-finding verifier that filters false positives and adjusts severity, and appends aSTANDARDIZATION_SUFFIXto analyze-agent prompts. - Restructures
SKILL.mdPhase 3 into text-output + selection steps, adds a new Phase 4 draft/preview/edit loop, renumbers posting to Phase 5, and documents the newverificationStatus/verificationRationalefields. - Bumps the plugin version from
1.3.0to1.4.0.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pr-review-toolkit/skills/review-pr/review-pr.js | Adds the Verify phase, verification schema, per-finding verifier agents, and standardization suffix for analyze prompts. |
| pr-review-toolkit/skills/review-pr/SKILL.md | Documents verification fields and restructures presentation/drafting/posting into Phases 3–5. |
| pr-review-toolkit/.claude-plugin/plugin.json | Version bump to 1.4.0 reflecting the new feature. |
Notes from the review: I flagged a redundant identical if/else if pair in the severity-adjustment logic, an optional operational note about the per-finding verifier re-collecting the full diff (cost scales with finding count), and a documentation inaccuracy attributing confidence filtering to the verify phase rather than the analyze phase. All are nit-level.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pr-review-toolkit/skills/review-pr/review-pr.js`:
- Around line 617-620: The condition in the falsePositiveCount block incorrectly
treats verification failures (!v) the same as explicit false positives
(v.verification === 'false_positive'), which silently drops valid findings when
verification calls fail or timeout. Separate the logic to only filter when
v.verification explicitly equals 'false_positive', and handle the !v case
(verification failure) differently by either keeping the original finding with a
fallback verification note or explicitly failing the phase. This ensures
verification failures do not silently drop important findings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 66a6d6da-edd9-4b4a-be51-a9bd8d3283dd
📒 Files selected for processing (3)
pr-review-toolkit/.claude-plugin/plugin.jsonpr-review-toolkit/skills/review-pr/SKILL.mdpr-review-toolkit/skills/review-pr/review-pr.js
Add adversarial verification of findings between Analyze and Contextualize phases, filtering false positives and adjusting severity. Restructure presentation to show findings as plain text before selection, and add a comment drafting phase with preview/edit loop before posting. Assisted-by: Claude:claude-opus-4-6
Lower effort from max to high for specialized analyze agents (silent-failure-hunter, pr-test-analyzer, comment-analyzer, type-design-analyzer) whose structured prompts compensate for reduced reasoning. Use sonnet+high for verify agents (binary output, narrow scope). Downgrade fetch-threads to haiku+low (pure MCP tool calling). Add explicit effort:high to classify-findings. Assisted-by: Claude:claude-opus-4-6
7a16d77 to
4414a19
Compare
|
@CodeRabbit full review |
✅ Action performedFull review finished. |
|
@CodeRabbit full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pr-review-toolkit/skills/review-pr/review-pr.js`:
- Line 556: The code-reviewer configuration on line 556 is forcing effort:
'max', which bypasses the high-effort optimization and unnecessarily increases
cost and latency. Change the effort value for the code-reviewer object from
'max' to a right-sized effort level that aligns with the optimization strategy,
removing the unnecessary forced maximum effort setting.
- Around line 591-619: The parallel() function call starting at verifyResults is
launching all verifier agents without a concurrency limit, which can overload
quotas and cause timeouts when processing large finding sets. Modify the
parallel() call that executes allFindings.map() to include a concurrency limit
parameter that caps the number of simultaneous agent verifications running at
once. Check the parallel() function's API or existing patterns in the codebase
to determine the correct parameter syntax for setting the maximum concurrent
operation limit.
In `@pr-review-toolkit/skills/review-pr/SKILL.md`:
- Around line 171-199: The output template currently hardcodes `_Verified:
{verificationRationale}_` for all findings regardless of their verification
status. When a finding has verificationStatus set to unverified, this creates
factually incorrect output. Make the verification line conditional by only
including `_Verified: {verificationRationale}_` when the finding has a verified
status. Apply this conditional logic consistently across all four sections of
the template: the unnumbered critical findings section at the top, the Important
Issues section, the Suggestions section, and the Partial Overlaps section (which
also appears at line 227-228). Ensure that unverified findings do not display
any verification statement in the output.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 489397c7-7d34-41bc-8c5c-dfa3e26b3500
📒 Files selected for processing (4)
CLAUDE.mdpr-review-toolkit/.claude-plugin/plugin.jsonpr-review-toolkit/skills/review-pr/SKILL.mdpr-review-toolkit/skills/review-pr/review-pr.js
Add semver level guidance (patch/minor/major) and once-per-branch rule to avoid redundant bumps while ensuring the bump level matches the scope of accumulated changes. Assisted-by: Claude:claude-opus-4-6
4414a19 to
df1bcb1
Compare
Condense SKILL.md by removing prose that duplicates the JSON schema, collapsing three identical severity section templates into a loop, folding status labels and duplicate handling into the template, and trimming redundant examples and guidance. In review-pr.js, extract duplicated sort logic into a sortFindings helper, remove dead string-args fallback, clean up no-op alias, and tighten the classify and fetch-threads prompts. Update README agent trigger descriptions to match SKILL.md criteria. Assisted-by: Claude:claude-opus-4-6
| verifiedFindings.push(Object.assign({}, finding, { | ||
| verificationStatus: 'verified', | ||
| verificationRationale: v.rationale | ||
| })) |
| log('Fetching existing review threads') | ||
|
|
||
| const fetchPrompt = `Fetch the authenticated user login via \`get_me\`, and all review comment threads via \`pull_request_read\` with method \`get_review_comments\` for ${config.owner}/${config.repo} PR #${config.pullNumber}. Paginate if needed to get all threads. For each thread, populate \`id\` (the thread node ID), \`author\` (the GitHub login of the first comment's author, matching the format returned by \`get_me\`), \`isResolved\`, and include the thread's replies. Return \`threads\` and \`myUsername\` as structured output.` | ||
| const fetchPrompt = `Fetch the authenticated user login via \`get_me\`, and all review comment threads via \`pull_request_read\` with method \`get_review_comments\` for ${config.owner}/${config.repo} PR #${config.pullNumber}. Paginate if needed to get all threads. For each thread, populate \`id\` (the thread node ID), \`author\` (the GitHub login of the first comment's author, matching the format returned by \`get_me\`), \`isResolved\`, and include the thread's replies.` |
| {for each finding where status = "partial_overlap"} | ||
|
|
||
| ### Important | ||
| 4. `{file}[:{line}]` -- **{title}** |
Summary
Test plan
claude plugin validate ./pr-review-toolkit— passesuvx skillsaw --strict— A+ grade, 0 errors/pr-review-toolkit:review-pragainst a real PR and confirm all 5 phases execute in orderSummary by CodeRabbit
New Features
all/none/recommended).Documentation
Chores