Skip to content

Commit f901632

Browse files
MrFlounderclaude
andauthored
feat(court): enhance reviewer and judge prompts to catch silent data corruption (#31)
## Summary - Expands both Reviewer A (Claude) and Reviewer B (Codex) prompts with 4 high-risk pattern checks derived from the PR #3232 incident - Adds a "Judge's Own Investigation" step to Phase 3 so the judge proactively checks for these patterns even if neither reviewer flags them - The 4 patterns: called-but-not-changed code contracts, mutations on read paths, blast radius from null/default initial state, data contract completeness at merge boundaries ## Context PR #3232 (`refreshMetricsIfStale()`) caused mass data corruption because: 1. The bug lived in `MetricsCalculationService.calculate()` which was **called but not changed** in the diff 2. A GET endpoint silently mutated data via fire-and-forget 3. All existing evals had `null` metricsRefreshedAt, so every record got hit on first view 4. Object spread merged incomplete data over existing metrics, zeroing out fields None of these were visible in the diff alone. The current reviewer prompts only say "review for bugs, security issues, and code quality" — too generic to catch this class of issue. ## Test plan - [ ] Run `crab court` on a test PR and verify both reviewers mention the new checks in their findings - [ ] Verify the judge's Phase 3 output includes the "Judge's Own Investigation" section - [ ] Confirm no regression in general review quality (reviewers still catch standard bugs) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b303372 commit f901632

1 file changed

Lines changed: 28 additions & 2 deletions

File tree

src/crabcode

Lines changed: 28 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8107,14 +8107,34 @@ Spawn two reviewer agents to analyze the PR independently:
81078107
```
81088108
Use Task tool:
81098109
subagent_type: "general-purpose"
8110-
prompt: "You are Reviewer A. Review this PR for bugs, security issues, and code quality. Be thorough but avoid false positives. Structure findings as Critical/Warning/Suggestion with file:line references. Here is the context: [include PR diff]"
8110+
prompt: "You are Reviewer A. Review this PR for bugs, security issues, and code quality. Be thorough but avoid false positives. Structure findings as Critical/Warning/Suggestion with file:line references.
8111+
8112+
Beyond standard review, pay special attention to these high-risk patterns:
8113+
8114+
1. **Called-but-not-changed code**: When the diff introduces a new call to existing code (especially data mutation), use the Read tool to trace INTO the called function. Verify its return value and side effects match what the caller assumes. Do not trust functions you haven't read.
8115+
8116+
2. **Mutations on read paths**: Flag any write operation (DB update, cache mutation, queue publish) triggered by a GET/read endpoint. Fire-and-forget patterns that silently mutate data are high risk — the failure mode is silent data corruption with no error signal.
8117+
8118+
3. **Blast radius from initial state**: When new code processes existing records conditionally (e.g. 'refresh if stale'), check the initial state of existing data. If a column starts as null/default for all existing rows, the new path hits EVERY record on first access — that is a mass data rewrite disguised as normal traffic.
8119+
8120+
4. **Data contract completeness at merge boundaries**: When code spreads/merges one data source onto another (e.g. {...existing, ...fresh}), verify the fresh source returns ALL expected fields. Missing fields silently zero out or null existing data.
8121+
8122+
Here is the context: [include PR diff]"
81118123
```
81128124
81138125
**Reviewer B (Codex):**
81148126
```
81158127
Use Task tool:
81168128
subagent_type: "Bash"
8117-
prompt: "Run: codex --print -p 'You are Reviewer B. Review this code for bugs, security issues, and quality problems. Structure as Critical/Warning/Suggestion with file:line. Diff: [include relevant sections]'"
8129+
prompt: "Run: codex --print -p 'You are Reviewer B. Review this code for bugs, security issues, and quality problems. Structure as Critical/Warning/Suggestion with file:line.
8130+
8131+
Beyond standard review, check for these high-risk patterns:
8132+
1. Called-but-not-changed code: If the diff calls existing functions (especially data mutation), verify those functions return/do what the caller assumes.
8133+
2. Mutations on read paths: Flag any DB write or cache mutation triggered by a GET/read endpoint. Fire-and-forget data rewrites are high risk.
8134+
3. Blast radius from initial state: If new code processes records conditionally (refresh if stale), check if null/default initial state means ALL existing records get hit at once.
8135+
4. Data contract completeness: When code spreads/merges data sources ({...existing, ...fresh}), verify the fresh source returns ALL expected fields. Missing fields silently zero out existing data.
8136+
8137+
Diff: [include relevant sections]'"
81188138
```
81198139
81208140
Wait for BOTH reviewers to complete before proceeding.
@@ -8135,6 +8155,12 @@ For EACH finding from either reviewer:
81358155
3. **Check context**: Does surrounding code explain/mitigate it?
81368156
4. **When reviewers disagree**: Investigate deeper, examine edge cases
81378157
8158+
**Judge's Own Investigation** (do this even if neither reviewer flagged it):
8159+
- For any new call path to existing code that mutates data, Read the called function and verify its contract
8160+
- If you see a write/update triggered by a read endpoint, flag it
8161+
- If you see conditional processing of existing records (e.g. "refresh if stale"), check the initial state — does null/default mean all records get hit at once?
8162+
- If you see object spread/merge of data sources, verify field completeness of the source
8163+
81388164
**Critical Rule**: Do NOT include any finding you haven't personally verified in the code.
81398165
81408166
### Phase 4: Deliberate

0 commit comments

Comments
 (0)