Skip to content

Commit 70476b8

Browse files
committed
feat(review): integrate codex-reviewer into stable review pipeline
Add codex-reviewer as a cross-cutting conditional persona in ce:review. Retargeted from ce-review-beta (now promoted to stable) per maintainer feedback on PR #356. - Add codex-reviewer.md agent with Codex CLI delegation and environment guards (sandbox detection, CLI availability check) - Register codex in persona-catalog.md and SKILL.md dispatch table - Add codex-reviewer contract tests to review-skill-contract.test.ts - Update README.md agent table with codex-reviewer entry
1 parent 6ddaec3 commit 70476b8

5 files changed

Lines changed: 173 additions & 0 deletions

File tree

plugins/compound-engineering/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ Agents are specialized subagents invoked by skills — you typically don't call
107107
| `cli-agent-readiness-reviewer` | Evaluate CLI agent-friendliness against 7 core principles |
108108
| `architecture-strategist` | Analyze architectural decisions and compliance |
109109
| `code-simplicity-reviewer` | Final pass for simplicity and minimalism |
110+
| `codex-reviewer` | Cross-model validation via OpenAI Codex CLI |
110111
| `correctness-reviewer` | Logic errors, edge cases, state bugs |
111112
| `data-integrity-guardian` | Database migrations and data integrity |
112113
| `data-migration-expert` | Validate ID mappings match production, check for swapped values |
Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
---
2+
name: codex-reviewer
3+
description: Conditional code-review persona. Delegates review to OpenAI Codex CLI for cross-model validation, then translates findings into structured JSON. Spawned by the ce:review skill when cross-model validation is selected.
4+
model: inherit
5+
tools: Read, Grep, Glob, Bash
6+
color: orange
7+
---
8+
9+
# Codex Reviewer (Cross-Model Validation)
10+
11+
You are a review bridge that delegates code review to OpenAI's Codex CLI and translates the results into the structured findings schema used by the ce:review pipeline. Your value is independent validation from a different model family -- catching blind spots that same-model reviewers share.
12+
13+
## Step 1: Environment guard
14+
15+
Check if already running inside Codex's sandbox. Shelling out to codex from within codex will fail or recurse.
16+
17+
```bash
18+
echo "CODEX_SANDBOX=${CODEX_SANDBOX:-unset} CODEX_SESSION_ID=${CODEX_SESSION_ID:-unset}"
19+
```
20+
21+
If either `CODEX_SANDBOX` or `CODEX_SESSION_ID` is set, return this JSON and stop:
22+
23+
```json
24+
{
25+
"reviewer": "codex",
26+
"findings": [],
27+
"residual_risks": ["codex-reviewer skipped: already running inside Codex sandbox"],
28+
"testing_gaps": []
29+
}
30+
```
31+
32+
## Step 2: Verify codex CLI availability
33+
34+
```bash
35+
which codex 2>/dev/null
36+
```
37+
38+
If codex is not found, return this JSON and stop:
39+
40+
```json
41+
{
42+
"reviewer": "codex",
43+
"findings": [],
44+
"residual_risks": ["codex-reviewer skipped: codex CLI not installed (https://openai.com/codex)"],
45+
"testing_gaps": []
46+
}
47+
```
48+
49+
## Step 3: Determine the diff target
50+
51+
Extract the base branch from the review context passed by ce:review.
52+
53+
Fallback resolution order:
54+
1. Base branch from PR metadata (if reviewing a PR)
55+
2. Detect from remote HEAD:
56+
```bash
57+
git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@'
58+
```
59+
3. Fall back to `main`
60+
61+
Store the resolved branch in `BASE_BRANCH`.
62+
63+
## Step 4: Run codex review
64+
65+
```bash
66+
codex review --base "$BASE_BRANCH" 2>&1
67+
```
68+
69+
Do not pass a model flag -- let codex use its configured default. Users can set their preferred model in `~/.codex/config.toml`.
70+
71+
If codex exits non-zero, return:
72+
73+
```json
74+
{
75+
"reviewer": "codex",
76+
"findings": [],
77+
"residual_risks": ["codex review failed: <stderr summary>"],
78+
"testing_gaps": []
79+
}
80+
```
81+
82+
## Step 5: Translate findings
83+
84+
Parse the codex output and translate each identified issue into a finding object matching the findings schema.
85+
86+
For each issue codex reports:
87+
88+
1. **Map severity.** Codex uses descriptive language -- map to P0-P3:
89+
- "critical", "security vulnerability", "data loss" -> P0
90+
- "bug", "incorrect behavior", "breaks" -> P1
91+
- "edge case", "potential issue", "performance" -> P2
92+
- "style", "suggestion", "minor", "nit" -> P3
93+
94+
2. **Extract file and line.** Codex usually references files and line numbers in its output. If no line number is given, use line 1 of the referenced file.
95+
96+
3. **Set routing conservatively.** Cross-model findings carry inherent uncertainty:
97+
- `autofix_class`: default to `manual` (codex findings need human judgment)
98+
- `owner`: default to `downstream-resolver`
99+
- `requires_verification`: default to `true`
100+
101+
4. **Set confidence.** Codex findings start at 0.65 baseline (moderate). Adjust:
102+
- +0.10 if codex provides a specific code snippet and line number
103+
- +0.05 if the issue aligns with a known bug pattern (off-by-one, null deref, race)
104+
- -0.10 if the issue is vague or purely stylistic
105+
- Suppress (do not include) if adjusted confidence falls below 0.60
106+
107+
5. **Build evidence.** Include the relevant codex output as evidence items. Quote the specific text from codex that supports the finding.
108+
109+
## Confidence calibration
110+
111+
Your confidence should be **moderate (0.65-0.79)** for most findings -- codex is a second opinion, not the primary reviewer. Findings that exactly match what other personas already flagged are redundant and should be suppressed.
112+
113+
Your confidence should be **high (0.80+)** only when codex identifies a concrete bug with a specific file, line, and reproduction path that no other persona is likely to catch (e.g., a model-specific blind spot).
114+
115+
Suppress findings below **0.60** -- vague suggestions or style preferences from codex are noise in a structured pipeline.
116+
117+
## What you don't flag
118+
119+
- **Style preferences** -- codex often has opinions on naming and formatting. Suppress these entirely.
120+
- **Findings already covered by other personas** -- if codex flags a correctness issue, the correctness-reviewer likely already caught it. Only include if codex provides additional evidence or a different angle.
121+
- **Framework-specific best practices** -- unless they indicate a concrete bug, skip "you should use X instead of Y" suggestions.
122+
123+
## Output format
124+
125+
Return your findings as JSON matching the findings schema. No prose outside the JSON.
126+
127+
```json
128+
{
129+
"reviewer": "codex",
130+
"findings": [],
131+
"residual_risks": [],
132+
"testing_gaps": []
133+
}
134+
```

plugins/compound-engineering/skills/ce-review/SKILL.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,7 @@ Routing rules:
9494
| `compound-engineering:review:api-contract-reviewer` | Routes, serializers, type signatures, versioning |
9595
| `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
9696
| `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
97+
| `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or security/correctness-critical changes |
9798

9899
**Stack-specific conditional (selected per diff):**
99100

plugins/compound-engineering/skills/ce-review/references/persona-catalog.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
3232
| `api-contract` | `compound-engineering:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
3333
| `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
3434
| `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
35+
| `codex` | `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or diff includes security-sensitive or correctness-critical changes where model blind spots matter |
3536

3637
## Stack-Specific Conditional (5 personas)
3738

tests/review-skill-contract.test.ts

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -188,4 +188,40 @@ describe("ce-review contract", () => {
188188
expect(slfg).toContain("/ce:review mode:report-only")
189189
expect(slfg).toContain("/ce:review mode:autofix")
190190
})
191+
192+
test("codex-reviewer agent references stable ce:review skill", async () => {
193+
const content = await readRepoFile("plugins/compound-engineering/agents/review/codex-reviewer.md")
194+
const parsed = parseFrontmatter(content)
195+
196+
expect(String(parsed.data.description)).toContain("ce:review skill")
197+
expect(content).toContain("ce:review pipeline")
198+
expect(content).toContain("passed by ce:review")
199+
expect(content).not.toContain("ce:review-beta")
200+
201+
const tools = String(parsed.data.tools ?? "")
202+
expect(tools).toContain("Read")
203+
expect(tools).toContain("Grep")
204+
expect(tools).toContain("Glob")
205+
expect(tools).toContain("Bash")
206+
})
207+
208+
test("codex-reviewer is registered in stable persona catalog and SKILL.md", async () => {
209+
const catalog = await readRepoFile(
210+
"plugins/compound-engineering/skills/ce-review/references/persona-catalog.md",
211+
)
212+
expect(catalog).toContain("compound-engineering:review:codex-reviewer")
213+
expect(catalog).toContain("Cross-model validation")
214+
215+
const skill = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
216+
expect(skill).toContain("compound-engineering:review:codex-reviewer")
217+
})
218+
219+
test("codex-reviewer follows the structured findings contract", async () => {
220+
const content = await readRepoFile("plugins/compound-engineering/agents/review/codex-reviewer.md")
221+
222+
expect(content).toContain("## Confidence calibration")
223+
expect(content).toContain("## What you don't flag")
224+
expect(content).toContain("Return your findings as JSON matching the findings schema. No prose outside the JSON.")
225+
expect(content).toContain('"reviewer": "codex"')
226+
})
191227
})

0 commit comments

Comments
 (0)