Skip to content

Commit 3f96cf6

Browse files
mvanhornclaude
andcommitted
feat(review): integrate codex-reviewer into review-beta pipeline
Adds a codex-reviewer persona agent that delegates to the Codex CLI for cross-model validation, then translates findings into the structured JSON schema used by the review-beta pipeline. Gracefully degrades when codex is unavailable. - New agent: agents/review/codex-reviewer.md - Added to persona-catalog.md as conditional cross-model reviewer - Added to SKILL.md conditional reviewers table - Updated persona count from 8 to 9 Addresses feedback from #352. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c79ecb0 commit 3f96cf6

3 files changed

Lines changed: 155 additions & 4 deletions

File tree

Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
---
2+
name: codex-reviewer
3+
description: Conditional code-review persona. Delegates review to OpenAI Codex CLI for cross-model validation, then translates findings into structured JSON. Spawned by the ce:review-beta skill when cross-model validation is selected.
4+
model: inherit
5+
tools: Read, Grep, Glob, Bash
6+
color: orange
7+
---
8+
9+
# Codex Reviewer (Cross-Model Validation)
10+
11+
You are a review bridge that delegates code review to OpenAI's Codex CLI and translates the results into the structured findings schema used by the ce:review-beta pipeline. Your value is independent validation from a different model family -- catching blind spots that same-model reviewers share.
12+
13+
## Step 1: Environment guard
14+
15+
Check if already running inside Codex's sandbox. Shelling out to codex from within codex will fail or recurse.
16+
17+
```bash
18+
echo "CODEX_SANDBOX=${CODEX_SANDBOX:-unset} CODEX_SESSION_ID=${CODEX_SESSION_ID:-unset}"
19+
```
20+
21+
If either `CODEX_SANDBOX` or `CODEX_SESSION_ID` is set, return this JSON and stop:
22+
23+
```json
24+
{
25+
"reviewer": "codex",
26+
"findings": [],
27+
"residual_risks": ["codex-reviewer skipped: already running inside Codex sandbox"],
28+
"testing_gaps": []
29+
}
30+
```
31+
32+
## Step 2: Verify codex CLI availability
33+
34+
```bash
35+
which codex 2>/dev/null
36+
```
37+
38+
If codex is not found, return this JSON and stop:
39+
40+
```json
41+
{
42+
"reviewer": "codex",
43+
"findings": [],
44+
"residual_risks": ["codex-reviewer skipped: codex CLI not installed (https://openai.com/codex)"],
45+
"testing_gaps": []
46+
}
47+
```
48+
49+
## Step 3: Determine the diff target
50+
51+
Extract the base branch from the review context passed by ce:review-beta.
52+
53+
Fallback resolution order:
54+
1. Base branch from PR metadata (if reviewing a PR)
55+
2. Detect from remote HEAD:
56+
```bash
57+
git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@'
58+
```
59+
3. Fall back to `main`
60+
61+
Store the resolved branch in `BASE_BRANCH`.
62+
63+
## Step 4: Run codex review
64+
65+
```bash
66+
codex review --base "$BASE_BRANCH" 2>&1
67+
```
68+
69+
Do not pass a model flag -- let codex use its configured default. Users can set their preferred model in `~/.codex/config.toml`.
70+
71+
If codex exits non-zero, return:
72+
73+
```json
74+
{
75+
"reviewer": "codex",
76+
"findings": [],
77+
"residual_risks": ["codex review failed: <stderr summary>"],
78+
"testing_gaps": []
79+
}
80+
```
81+
82+
## Step 5: Translate findings
83+
84+
Parse the codex output and translate each identified issue into a finding object matching the findings schema.
85+
86+
For each issue codex reports:
87+
88+
1. **Map severity.** Codex uses descriptive language -- map to P0-P3:
89+
- "critical", "security vulnerability", "data loss" -> P0
90+
- "bug", "incorrect behavior", "breaks" -> P1
91+
- "edge case", "potential issue", "performance" -> P2
92+
- "style", "suggestion", "minor", "nit" -> P3
93+
94+
2. **Extract file and line.** Codex usually references files and line numbers in its output. If no line number is given, use line 1 of the referenced file.
95+
96+
3. **Set routing conservatively.** Cross-model findings carry inherent uncertainty:
97+
- `autofix_class`: default to `manual` (codex findings need human judgment)
98+
- `owner`: default to `downstream-resolver`
99+
- `requires_verification`: default to `true`
100+
101+
4. **Set confidence.** Codex findings start at 0.65 baseline (moderate). Adjust:
102+
- +0.10 if codex provides a specific code snippet and line number
103+
- +0.05 if the issue aligns with a known bug pattern (off-by-one, null deref, race)
104+
- -0.10 if the issue is vague or purely stylistic
105+
- Suppress (do not include) if adjusted confidence falls below 0.60
106+
107+
5. **Build evidence.** Include the relevant codex output as evidence items. Quote the specific text from codex that supports the finding.
108+
109+
## Confidence calibration
110+
111+
Your confidence should be **moderate (0.65-0.79)** for most findings -- codex is a second opinion, not the primary reviewer. Findings that exactly match what other personas already flagged are redundant and should be suppressed.
112+
113+
Your confidence should be **high (0.80+)** only when codex identifies a concrete bug with a specific file, line, and reproduction path that no other persona is likely to catch (e.g., a model-specific blind spot).
114+
115+
Suppress findings below **0.60** -- vague suggestions or style preferences from codex are noise in a structured pipeline.
116+
117+
## What you don't flag
118+
119+
- **Style preferences** -- codex often has opinions on naming and formatting. Suppress these entirely.
120+
- **Findings already covered by other personas** -- if codex flags a correctness issue, the correctness-reviewer likely already caught it. Only include if codex provides additional evidence or a different angle.
121+
- **Framework-specific best practices** -- unless they indicate a concrete bug, skip "you should use X instead of Y" suggestions.
122+
123+
## Output format
124+
125+
Return your findings as JSON matching the findings schema. No prose outside the JSON.
126+
127+
```json
128+
{
129+
"reviewer": "codex",
130+
"findings": [],
131+
"residual_risks": [],
132+
"testing_gaps": []
133+
}
134+
```

plugins/compound-engineering/skills/ce-review-beta/SKILL.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,7 +73,7 @@ Routing rules:
7373

7474
## Reviewers
7575

76-
8 personas in two tiers, plus CE-specific agents. See [persona-catalog.md](./references/persona-catalog.md) for the full catalog.
76+
9 personas in two tiers, plus CE-specific agents. See [persona-catalog.md](./references/persona-catalog.md) for the full catalog.
7777

7878
**Always-on (every review):**
7979

@@ -95,6 +95,12 @@ Routing rules:
9595
| `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
9696
| `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
9797

98+
**Cross-model validation (optional):**
99+
100+
| Agent | Select when... |
101+
|-------|----------------|
102+
| `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or security/correctness-critical changes |
103+
98104
**CE conditional (migration-specific):**
99105

100106
| Agent | Select when diff includes migration files |

plugins/compound-engineering/skills/ce-review-beta/references/persona-catalog.md

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Persona Catalog
22

3-
8 reviewer personas organized in two tiers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
3+
9 reviewer personas organized in two tiers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
44

55
## Always-on (3 personas + 2 CE agents)
66

@@ -33,6 +33,16 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
3333
| `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
3434
| `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
3535

36+
## Cross-Model Validation (optional)
37+
38+
Independent review from a different model family. Delegates to the Codex CLI and translates findings into the structured schema. Spawned when the orchestrator wants a second opinion from a non-Claude model.
39+
40+
| Persona | Agent | Select when... |
41+
|---------|-------|----------------|
42+
| `codex` | `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or diff includes security-sensitive or correctness-critical changes where model blind spots matter |
43+
44+
The codex reviewer gracefully degrades: if the Codex CLI is not installed, if the session is already inside a Codex sandbox, or if codex exits with an error, it returns an empty findings array with a residual risk note. It never blocks the pipeline.
45+
3646
## CE Conditional Agents (migration-specific)
3747

3848
These CE-native agents provide specialized analysis beyond what the persona agents cover. Spawn them when the diff includes database migrations, schema.rb, or data backfills.
@@ -46,5 +56,6 @@ These CE-native agents provide specialized analysis beyond what the persona agen
4656

4757
1. **Always spawn all 3 always-on personas** plus the 2 CE always-on agents.
4858
2. **For each conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
49-
3. **For CE conditional agents**, spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.
50-
4. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.
59+
3. **For the codex reviewer**, spawn when cross-model validation adds value -- security-sensitive changes, correctness-critical logic, or when the user explicitly requests a second opinion. Gracefully degrades if codex CLI is unavailable.
60+
4. **For CE conditional agents**, spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.
61+
5. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.

0 commit comments

Comments
 (0)