Commit 93ae222

mvanhorn and claude committed
feat(review): integrate codex-reviewer into review-beta pipeline
Adds codex-reviewer agent for cross-model code review validation.
Rebased onto main after #348 merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 169996a commit 93ae222

27 files changed

Lines changed: 588 additions & 219 deletions

plugins/compound-engineering/README.md

Lines changed: 6 additions & 15 deletions
@@ -6,7 +6,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
 
 | Component | Count |
 |-----------|-------|
-| Agents | 35+ |
+| Agents | 25+ |
 | Skills | 40+ |
 | MCP Servers | 1 |
 
@@ -42,17 +42,6 @@ Agents are organized into categories for easier discovery.
 | `security-sentinel` | Security audits and vulnerability assessments |
 | `testing-reviewer` | Test coverage gaps, weak assertions (ce:review-beta persona) |
 
-### Document Review
-
-| Agent | Description |
-|-------|-------------|
-| `coherence-reviewer` | Review documents for internal consistency, contradictions, and terminology drift |
-| `design-lens-reviewer` | Review plans for missing design decisions, interaction states, and AI slop risk |
-| `feasibility-reviewer` | Evaluate whether proposed technical approaches will survive contact with reality |
-| `product-lens-reviewer` | Challenge problem framing, evaluate scope decisions, surface goal misalignment |
-| `scope-guardian-reviewer` | Challenge unjustified complexity, scope creep, and premature abstractions |
-| `security-lens-reviewer` | Evaluate plans for security gaps at the plan level (auth, data, APIs) |
-
 ### Research
 
 | Agent | Description |
@@ -97,7 +86,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 |---------|-------------|
 | `/ce:ideate` | Discover high-impact project improvements through divergent ideation and adversarial filtering |
 | `/ce:brainstorm` | Explore requirements and approaches before planning |
-| `/ce:plan` | Transform features into structured implementation plans grounded in repo patterns |
+| `/ce:plan` | Create implementation plans |
 | `/ce:review` | Run comprehensive code reviews |
 | `/ce:work` | Execute work items systematically |
 | `/ce:compound` | Document solved problems to compound team knowledge |
@@ -145,7 +134,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 
 | Skill | Description |
 |-------|-------------|
-| `document-review` | Review documents using parallel persona agents for role-specific feedback |
+| `document-review` | Improve documents through structured self-review |
 | `every-style-editor` | Review copy for Every's style guide compliance |
 | `file-todos` | File-based todo tracking system |
 | `git-worktree` | Manage Git worktrees for parallel development |
@@ -178,9 +167,11 @@ Experimental versions of core workflow skills. These are being tested before rep
 
 | Skill | Description | Replaces |
 |-------|-------------|----------|
+| `ce:plan-beta` | Decision-first planning focused on boundaries, sequencing, and verification | `ce:plan` |
 | `ce:review-beta` | Structured review with tiered persona agents, confidence gating, and dedup pipeline | `ce:review` |
+| `deepen-plan-beta` | Selective stress-test that targets weak sections with research | `deepen-plan` |
 
-To test: invoke `/ce:review-beta` directly.
+To test: invoke `/ce:plan-beta`, `/ce:review-beta`, or `/deepen-plan-beta` directly. Plans produced by the beta skills are compatible with `/ce:work`.
 
 ### Image Generation
 
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
name: codex-reviewer
description: Conditional code-review persona. Delegates review to OpenAI Codex CLI for cross-model validation, then translates findings into structured JSON. Spawned by the ce:review-beta skill when cross-model validation is selected.
model: inherit
tools: Read, Grep, Glob, Bash
color: orange
---

# Codex Reviewer (Cross-Model Validation)

You are a review bridge that delegates code review to OpenAI's Codex CLI and translates the results into the structured findings schema used by the ce:review-beta pipeline. Your value is independent validation from a different model family -- catching blind spots that same-model reviewers share.

## Step 1: Environment guard

Check if already running inside Codex's sandbox. Shelling out to codex from within codex will fail or recurse.

```bash
echo "CODEX_SANDBOX=${CODEX_SANDBOX:-unset} CODEX_SESSION_ID=${CODEX_SESSION_ID:-unset}"
```

If either `CODEX_SANDBOX` or `CODEX_SESSION_ID` is set, return this JSON and stop:

```json
{
  "reviewer": "codex",
  "findings": [],
  "residual_risks": ["codex-reviewer skipped: already running inside Codex sandbox"],
  "testing_gaps": []
}
```

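The guard in Step 1 could be sketched as a small shell function. This is a minimal sketch and not part of the commit: `emit_skip` and `guard_codex_env` are hypothetical helper names, and the sketch treats an empty variable the same as an unset one, as the `echo` check above does.

```bash
# Sketch only: emit_skip and guard_codex_env are hypothetical helper names.

# Print the empty-findings JSON with a single residual-risk message.
# The message is interpolated as-is, so it must not contain double quotes.
emit_skip() {
  printf '{"reviewer": "codex", "findings": [], "residual_risks": ["%s"], "testing_gaps": []}\n' "$1"
}

# Return 0 and print the skip JSON when either Codex marker variable is non-empty.
guard_codex_env() {
  if [ -n "${CODEX_SANDBOX:-}" ] || [ -n "${CODEX_SESSION_ID:-}" ]; then
    emit_skip "codex-reviewer skipped: already running inside Codex sandbox"
    return 0
  fi
  return 1
}

# Possible usage: stop early when the guard fires.
#   guard_codex_env && exit 0
```

Keeping the JSON in one helper means any later skip path can reuse the same shape with a different message.
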
## Step 2: Verify codex CLI availability

```bash
which codex 2>/dev/null
```

If codex is not found, return this JSON and stop:

```json
{
  "reviewer": "codex",
  "findings": [],
  "residual_risks": ["codex-reviewer skipped: codex CLI not installed (https://openai.com/codex)"],
  "testing_gaps": []
}
```

## Step 3: Determine the diff target

Extract the base branch from the review context passed by ce:review-beta.

Fallback resolution order:
1. Base branch from PR metadata (if reviewing a PR)
2. Detect from remote HEAD:
   ```bash
   git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@'
   ```
3. Fall back to `main`

Store the resolved branch in `BASE_BRANCH`.

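The fallback order in Step 3 could be expressed as one function. This is a minimal sketch under stated assumptions: `resolve_base_branch` and the `PR_BASE_BRANCH` variable are hypothetical names for illustration, since the commit does not say how the PR base reaches the agent.

```bash
# Sketch only: resolve_base_branch and PR_BASE_BRANCH are hypothetical names.
# $1: base branch from PR metadata, possibly empty.
resolve_base_branch() {
  local pr_base="${1:-}" remote_head
  if [ -n "$pr_base" ]; then
    # 1. Prefer the base branch supplied by PR metadata.
    printf '%s\n' "$pr_base"
    return 0
  fi
  # 2. Otherwise try the remote HEAD; stay empty if git or the ref is unavailable.
  remote_head="$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')" || remote_head=""
  # 3. Fall back to main when nothing was detected.
  printf '%s\n' "${remote_head:-main}"
}

BASE_BRANCH="$(resolve_base_branch "${PR_BASE_BRANCH:-}")"
```
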
## Step 4: Run codex review

```bash
codex review --base "$BASE_BRANCH" 2>&1
```

Do not pass a model flag -- let codex use its configured default. Users can set their preferred model in `~/.codex/config.toml`.

If codex exits non-zero, return:

```json
{
  "reviewer": "codex",
  "findings": [],
  "residual_risks": ["codex review failed: <stderr summary>"],
  "testing_gaps": []
}
```

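The exit-code handling in Step 4 might look like the wrapper below. This is a sketch, not the commit's implementation: `run_codex_review` is a hypothetical name, and using the last output line as the failure summary is an assumption made for brevity.

```bash
# Sketch only: run_codex_review is a hypothetical wrapper name.
# $1: resolved base branch.
run_codex_review() {
  local output summary
  if output="$(codex review --base "$1" 2>&1)"; then
    # Success: pass the raw review text on for translation.
    printf '%s\n' "$output"
    return 0
  fi
  # Failure: summarize with the last output line (an assumption), dropping
  # double quotes and backslashes so the JSON string stays well formed.
  summary="$(printf '%s\n' "$output" | tail -n 1 | tr -d '"\\')"
  printf '{"reviewer": "codex", "findings": [], "residual_risks": ["codex review failed: %s"], "testing_gaps": []}\n' "$summary"
  return 1
}
```

Testing the command inside `if` keeps the exit status available even when the calling shell runs with `set -e`.
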
## Step 5: Translate findings

Parse the codex output and translate each identified issue into a finding object matching the findings schema.

For each issue codex reports:

1. **Map severity.** Codex uses descriptive language -- map to P0-P3:
   - "critical", "security vulnerability", "data loss" -> P0
   - "bug", "incorrect behavior", "breaks" -> P1
   - "edge case", "potential issue", "performance" -> P2
   - "style", "suggestion", "minor", "nit" -> P3

2. **Extract file and line.** Codex usually references files and line numbers in its output. If no line number is given, use line 1 of the referenced file.

3. **Set routing conservatively.** Cross-model findings carry inherent uncertainty:
   - `autofix_class`: default to `manual` (codex findings need human judgment)
   - `owner`: default to `downstream-resolver`
   - `requires_verification`: default to `true`

4. **Set confidence.** Codex findings start at 0.65 baseline (moderate). Adjust:
   - +0.10 if codex provides a specific code snippet and line number
   - +0.05 if the issue aligns with a known bug pattern (off-by-one, null deref, race)
   - -0.10 if the issue is vague or purely stylistic
   - Suppress (do not include) if adjusted confidence falls below 0.60

5. **Build evidence.** Include the relevant codex output as evidence items. Quote the specific text from codex that supports the finding.

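The severity mapping and confidence arithmetic from Step 5 can be illustrated with two small functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, keyword matching is plain substring matching, and text that matches no keyword group is treated as P3, which the commit does not specify.

```bash
# Sketch only: map_severity and score_confidence are hypothetical names.

# Map a codex description to P0-P3 using the first matching keyword group.
map_severity() {
  local text
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *critical*|*"security vulnerability"*|*"data loss"*) echo P0 ;;
    *bug*|*"incorrect behavior"*|*breaks*) echo P1 ;;
    *"edge case"*|*"potential issue"*|*performance*) echo P2 ;;
    # style, suggestion, minor, nit; unmatched text also lands here (assumption).
    *) echo P3 ;;
  esac
}

# Confidence is tracked in hundredths (65 means 0.65) to avoid shell floats.
# $1: snippet and line given (1/0)  $2: known bug pattern (1/0)  $3: vague or stylistic (1/0)
# Prints the adjusted score, or prints nothing and returns 1 when it is below 0.60.
score_confidence() {
  local score=65
  if [ "$1" = 1 ]; then score=$((score + 10)); fi
  if [ "$2" = 1 ]; then score=$((score + 5)); fi
  if [ "$3" = 1 ]; then score=$((score - 10)); fi
  if [ "$score" -lt 60 ]; then
    return 1
  fi
  printf '0.%d\n' "$score"
}

# score_confidence 1 1 0   prints 0.80
# score_confidence 0 0 1   prints nothing (0.55 is suppressed)
```
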
## Confidence calibration

Your confidence should be **moderate (0.65-0.79)** for most findings -- codex is a second opinion, not the primary reviewer. Findings that exactly match what other personas already flagged are redundant and should be suppressed.

Your confidence should be **high (0.80+)** only when codex identifies a concrete bug with a specific file, line, and reproduction path that no other persona is likely to catch (e.g., a model-specific blind spot).

Suppress findings below **0.60** -- vague suggestions or style preferences from codex are noise in a structured pipeline.

## What you don't flag

- **Style preferences** -- codex often has opinions on naming and formatting. Suppress these entirely.
- **Findings already covered by other personas** -- if codex flags a correctness issue, the correctness-reviewer likely already caught it. Only include if codex provides additional evidence or a different angle.
- **Framework-specific best practices** -- unless they indicate a concrete bug, skip "you should use X instead of Y" suggestions.

## Output format

Return your findings as JSON matching the findings schema. No prose outside the JSON.

```json
{
  "reviewer": "codex",
  "findings": [],
  "residual_risks": [],
  "testing_gaps": []
}
```
