Skip to content

Commit f883bc6

Browse files
xuiocodex
andcommitted
Clarify native Codex routing guidance
Co-Authored-By: OpenAI Codex <noreply@openai.com>
1 parent 92297ca commit f883bc6

5 files changed

Lines changed: 149 additions & 37 deletions

File tree

README.md

Lines changed: 39 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,19 @@
66

77
Run OpenAI Codex agents from Claude Code through a daemonless MCP plugin.
88

9-
`claude-code-codex-subagents` lets Claude Code ask Codex for read-only reviews,
10-
parallel investigations, Spark checks, native follow-ups, live steering, and
11-
explicit full-access Codex work when the user asks for it.
9+
`claude-code-codex-subagents` lets Claude Code ask Codex, an independent
10+
frontier model like Claude, for read-only reviews, parallel investigations, Spark
11+
checks, native follow-ups, live steering, and explicit full-access Codex work
12+
when the user asks for it. It is especially useful when Claude needs a more
13+
technical subagent for deep codebase work, server/deployment tasks, difficult
14+
debugging, or adversarial review.
1215

1316
![Terminal demo of Claude launching parallel Codex reviewers](docs/assets/demo.svg)
1417

1518
## Why Use It?
1619

1720
- **Native Claude Code workflow:** Claude gets a small Task-like MCP surface: `codex_task`, `codex_task_group`, `codex_followup`, and `codex_wait_any`.
21+
- **Frontier-model second opinion:** Codex gives Claude an independent technical reviewer for adversarial checks and hard engineering work.
1822
- **Read-only by default:** Codex starts with `--sandbox read-only` and non-interactive approvals.
1923
- **No daemon:** Claude launches the MCP server over stdio for the active session.
2024
- **Fast parallel review:** Claude can launch several independent Codex agents with bounded concurrency.
@@ -30,17 +34,38 @@ Requirements:
3034
- Claude Code
3135
- Codex CLI, preferably the Codex desktop app
3236

37+
Install the plugin once:
38+
3339
```sh
3440
git clone https://github.com/xuio/claude-code-codex-subagents.git
3541
cd claude-code-codex-subagents
3642
npm run install:local
37-
claude --plugin-dir .
3843
```
3944

40-
Then ask Claude something like:
45+
Then, in any project where Claude Code should use Codex subagents, add the MCP
46+
server:
47+
48+
```sh
49+
mkdir -p .claude
50+
cat > .claude/mcp.json <<'JSON'
51+
{
52+
"mcpServers": {
53+
"codex-subagents": { "command": "codex-subagents-mcp" }
54+
}
55+
}
56+
JSON
57+
```
58+
59+
Then ask Claude:
4160

4261
```text
43-
Use Codex to review this repository read-only. Focus on reliability risks and missing tests.
62+
Use Codex to review this PR for correctness, security, and deployment risks.
63+
```
64+
65+
For plugin development, run Claude directly against this checkout:
66+
67+
```sh
68+
claude --plugin-dir .
4469
```
4570

4671
For local development against Claude's installed plugin cache:
@@ -59,15 +84,15 @@ after `dist/index.js` is rebuilt.
5984
### Ask One Codex Agent
6085

6186
```text
62-
Ask Codex for a second opinion on the session recovery code. Keep it read-only and return concrete findings with file paths.
87+
Ask Codex for an adversarial second opinion on the session recovery code. Keep it read-only and return concrete findings with file paths.
6388
```
6489

6590
Claude should use `codex_task`.
6691

6792
### Run Parallel Codex Agents
6893

6994
```text
70-
Launch three Codex subagents in parallel: one for API behavior, one for tests, and one for security. Keep all of them read-only.
95+
Launch three Codex subagents in parallel: one for API behavior, one for tests, and one adversarial security reviewer. Keep all of them read-only.
7196
```
7297

7398
Claude can call `codex_task` three times in parallel, or use `codex_task_group`
@@ -146,6 +171,12 @@ writes and DNS/network remain disabled unless `full_access: true` is set.
146171
| Session progress | MCP resource `codex://sessions/{session_id}` |
147172
| Diagnostics | MCP resources `codex://status`, `codex://doctor`, `codex://usage` |
148173

174+
Prefer Codex over native Task when the user wants an independent frontier-model
175+
second opinion, deep technical review, complex codebase exploration, server or
176+
deployment investigation, or adversarial validation. Prefer native Task when the
177+
work depends heavily on Claude's conversation history or Claude-only built-in
178+
tools.
179+
149180
Legacy tools such as `ask_codex`, `run_agent`, `run_agents`, `start_session`, and
150181
`send_session_prompt` are hidden by default. Set
151182
`CODEX_SUBAGENTS_ENABLE_LEGACY_TOOLS=1` only for older clients that still call the

dist/index.js

Lines changed: 30 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -26350,18 +26350,28 @@ var sessionManager = new CodexSessionManager({ persist: true });
2635026350
var usageGuide = [
2635126351
"Claude Code integration guide for codex-subagents:",
2635226352
"",
26353-
"Use Codex subagents like Claude's native Task tool when the user needs an independent OpenAI Codex worker. Use this MCP server whenever the user asks Claude to use Codex, OpenAI Codex, Codex subagents, Codex Spark, a Codex second opinion, parallel Codex review, or independent Codex codebase analysis. You do not need the user to name an MCP tool.",
26353+
"Use Codex subagents like Claude's native Task tool when the user needs an independent OpenAI Codex worker. Codex is a frontier model, like Claude, and is especially useful as a more technical, reliability-focused subagent for deep codebase work, server/deployment tasks, complex debugging, and adversarial review. Use this MCP server whenever the user asks Claude to use Codex, OpenAI Codex, Codex subagents, Codex Spark, a Codex second opinion, parallel Codex review, independent Codex codebase analysis, deep technical review, or adversarial validation. You do not need the user to name an MCP tool.",
2635426354
"",
2635526355
"Tool choice:",
26356-
"- Use codex_task for one delegated Codex task. It is the native Claude-like front door: description plus prompt, read-only by default, and answer-first result. Call codex_task multiple times in one assistant turn when independent investigations can run in parallel.",
26357-
"- Use codex_task_group when the work can be split into independent concurrent tasks and Claude wants one combined response with rolled-up per-task findings.",
26358-
"- Use codex_followup when Claude already has a session_id from codex_task or codex_task_group and wants to continue, steer, wait on, or cancel that same Codex context.",
26356+
"- Use codex_task when you want one independent Codex subagent: a frontier-model second opinion, deep technical implementation/review, complex codebase exploration, server/deployment work, or adversarial analysis. It is the Codex equivalent of native Task: description plus prompt, read-only by default, and answer-first result.",
26357+
"- Use multiple codex_task calls in parallel when investigations are independent and Claude can synthesize the answers itself.",
26358+
"- Use codex_task_group when the work can be split into independent concurrent Codex tasks and Claude wants one combined response with rolled-up per-task findings.",
26359+
"- Use codex_followup when Claude already has a session_id from codex_task or codex_task_group and wants to continue, steer, wait on, or cancel that same Codex context. This is a Codex-specific multi-turn extension; wait/cancel correspond to native TaskOutput/TaskStop-style operations.",
2635926360
"- Set codex_task background true for long-running work so Claude gets a session_id immediately, then use codex_wait_any for several sessions or codex_followup mode wait, steer, or cancel for one session.",
26360-
"- Subscribe to or read codex://sessions/{session_id} for background progress milestones and completion state. The server emits notifications/resources/updated when meaningful Codex output changes.",
26361+
"- Subscribe to or read codex://sessions/{session_id} for background progress milestones and completion state. This is the Codex TaskGet/TaskList-style resource surface. The server emits notifications/resources/updated when meaningful Codex output changes.",
2636126362
"- Diagnostics are resources by default: read codex://status, codex://doctor, or codex://usage when a prior call failed or availability is uncertain.",
2636226363
"- Debug tools such as codex_status, codex_doctor, codex_usage_guide, codex_choose_tool, and codex_export_debug_bundle are hidden unless CODEX_SUBAGENTS_ENABLE_DEBUG_TOOLS=1.",
2636326364
"- Legacy/manual tools such as ask_codex, run_agent, run_agents, and old session names are hidden unless CODEX_SUBAGENTS_ENABLE_LEGACY_TOOLS=1.",
2636426365
"",
26366+
"Prefer Codex over native Task when:",
26367+
"- The user explicitly asks for Codex, OpenAI Codex, Codex Spark, an adversarial review, or a second opinion from another frontier model.",
26368+
"- The work is deep technical work: managing a complex codebase, debugging a difficult failure, reviewing architecture, validating correctness, preparing a server deployment, investigating infrastructure, or checking a high-risk change.",
26369+
"- Claude wants a more technical and independent reviewer that does not share Claude's scratchpad or recent assumptions.",
26370+
"- The task is independent of Claude's recent conversation and would waste Claude's context window.",
26371+
"- The work is long-running and should proceed while Claude does other work; use background true.",
26372+
"- The goal is adversarial validation, security review, or formal challenge of Claude's reasoning.",
26373+
"Prefer native Task when the work depends heavily on Claude's conversation history or Claude-only built-in tools.",
26374+
"",
2636526375
"Default operating rules:",
2636626376
"- Do not use Codex for simple file reads, simple grep/search, tiny local commands, or work Claude can do directly faster.",
2636726377
"- Keep sandbox read-only unless the user explicitly asks for a different sandbox.",
@@ -26383,6 +26393,13 @@ var usageGuide = [
2638326393
"- Put uncommon settings such as exact model, Codex binary path, timeout, MCP sharing, nested Codex subagents, and output contracts under advanced.",
2638426394
"- Ask Codex for concise results with file paths, line references, and actionable findings when reviewing code.",
2638526395
"",
26396+
"Canonical recipes:",
26397+
"- Adversarial code review: Claude does its own review with native tools, then calls codex_task with subagent_type code-reviewer or security-reviewer for an independent frontier-model review. Claude compares both sets of findings and reports the merged result.",
26398+
"- Parallel codebase exploration: launch 3-4 codex_task background true calls with subagent_type explorer, each scoped to a different subsystem. Use codex_wait_any to harvest results as they finish.",
26399+
"- Long-context offload: delegate broad code reading or multi-file reasoning to codex_task so Codex uses its own context and Claude receives only the concise technical summary.",
26400+
"- Deployment or server hardening: use codex_task for technical deployment plans, server configuration review, CI/CD checks, rollback analysis, and operational failure modes.",
26401+
"- Security sweep before merge: call codex_task with subagent_type security-reviewer and ask Codex to audit staged changes, auth boundaries, secrets handling, and unsafe defaults.",
26402+
"",
2638626403
"Nested Codex subagents:",
2638726404
"- When the user wants Codex to use its own subagents, pass complete custom definitions in advanced.codex_subagents and explicit work items in advanced.subagent_tasks.",
2638826405
"- Keep advanced.subagent_runtime.max_depth at 1 unless recursive delegation is intentionally requested."
@@ -27606,7 +27623,7 @@ server.registerResource(
2760627623
}),
2760727624
{
2760827625
title: "Codex Session",
27609-
description: "Per-session Codex progress milestones and completion state for background tasks.",
27626+
description: "Codex TaskGet/TaskList-style resource: per-session progress milestones and completion state for background tasks.",
2761027627
mimeType: "application/json"
2761127628
},
2761227629
async (uri, variables) => {
@@ -27704,9 +27721,10 @@ registerDebugTool(
2770427721
recommendedTool,
2770527722
request: args.request,
2770627723
rules: [
27707-
"Use codex_task for one normal Codex task; it is the most native Claude-like front door.",
27724+
"Use codex_task when Claude wants an independent Codex frontier-model second opinion, deep technical subagent, complex codebase review, server/deployment investigation, or adversarial validation.",
2770827725
"Use multiple codex_task calls for ordinary independent parallel work; use codex_task_group when Claude wants one combined rollup.",
27709-
"Use codex_followup for follow-ups, waits, and steering when Claude has a session_id.",
27726+
"Use codex_followup for follow-ups, waits, cancellation, and steering when Claude has a session_id.",
27727+
"Use codex_wait_any when Claude has several background Codex session_ids and wants to harvest results as they finish.",
2771027728
"Use codex_task with background true for slow work that should not hold a blocking request open.",
2771127729
"Use codex_task_group when Claude needs multiple independent answers returned as one merged response.",
2771227730
"Pass project_dir whenever Claude knows the active project directory.",
@@ -27722,7 +27740,7 @@ registerTool(
2772227740
"codex_task",
2772327741
{
2772427742
title: "Task",
27725-
description: "Delegate a task to a fresh OpenAI Codex agent that does not share Claude's scratchpad or tool repertoire. Best for independent second opinions, isolated codebase exploration, or work Claude does not need to interleave with its own tool calls. Read-only by default. Use multiple parallel codex_task calls when investigations are independent.",
27743+
description: "Use this when you want an independent OpenAI Codex frontier-model subagent: a technical second opinion, deep complex-codebase work, server/deployment investigation, difficult debugging, or adversarial review. This is the Codex equivalent of native Task. It does not share Claude's scratchpad, defaults to read-only, and returns an answer-first result. Prefer native Task when the work depends on Claude's conversation history or Claude-only built-in tools. Use multiple parallel codex_task calls when investigations are independent.",
2772627744
inputSchema: {
2772727745
description: external_exports.string().trim().min(1).describe("Short human-readable task label, like Claude's native Task description."),
2772827746
prompt: external_exports.string().trim().min(1).describe(
@@ -27989,7 +28007,7 @@ registerTool(
2798928007
"codex_task_group",
2799028008
{
2799128009
title: "Task Group",
27992-
description: "Run multiple independent Codex tasks in parallel and return one combined response with per-task results. For ordinary native-style parallelism, Claude can also call codex_task multiple times in one turn; use this group tool when a single rolled-up response is useful.",
28010+
description: "Use this when you have several independent Codex tasks and want one rolled-up response. For ordinary native-style parallelism, Claude can also call codex_task multiple times in one turn. Best for parallel deep technical reviews, subsystem exploration, adversarial review lanes, or deployment-readiness checks where a single merged result is useful.",
2799328011
inputSchema: {
2799428012
tasks: external_exports.array(nativeTaskGroupTaskSchema).min(1).max(12).describe("Independent Codex tasks, each with a short description and prompt."),
2799528013
max_parallel: external_exports.number().int().min(1).max(8).default(4).describe("Maximum concurrent Codex processes. Use 2-4 for most responsive parallel reviews."),
@@ -28054,7 +28072,7 @@ registerTool(
2805428072
"codex_followup",
2805528073
{
2805628074
title: "Followup",
28057-
description: "Continue, steer, poll, or cancel a Codex session from a prior background or keep_session task. Use queue for another prompt, steer to redirect active work, wait to check completion, and cancel to stop running work.",
28075+
description: "Use this when you have a session_id and want to continue Codex's reasoning across turns, steer active work, collect output, or stop a run. wait/cancel are Codex's TaskOutput/TaskStop-style operations; queue/steer are Codex-specific multi-turn extensions that native Task does not provide.",
2805828076
inputSchema: {
2805928077
session_id: external_exports.string().trim().min(1).describe("session_id returned by codex_task or codex_task_group."),
2806028078
prompt: external_exports.string().min(1).optional().describe("Follow-up or steering prompt. Required for mode queue and mode steer; omit for mode wait or cancel."),
@@ -28255,7 +28273,7 @@ registerTool(
2825528273
"codex_wait_any",
2825628274
{
2825728275
title: "Wait For Any Task",
28258-
description: "Block until any listed Codex background session reaches a terminal state, then return that session's result. Use after firing multiple codex_task background: true calls to collect results one at a time without busy-polling.",
28276+
description: "Use this when you have several background Codex session_ids and want to harvest results as they finish. Blocks until any listed Codex background session reaches a terminal state, then returns that session's result and remaining_session_ids. This is a Codex extension beyond native Task.",
2825928277
inputSchema: {
2826028278
session_ids: external_exports.array(external_exports.string().trim().min(1)).min(1).max(32).describe("Session ids returned by previous codex_task or codex_task_group calls."),
2826128279
wait_timeout_ms: external_exports.number().int().positive().max(864e5).default(6e5).describe("Maximum total wait. If no session finishes in this window, returns completed=false.")

docs/USAGE.md

Lines changed: 43 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -24,8 +24,8 @@ subdirectory that Claude is working in. If omitted, the server uses
2424

2525
Prefer these tools in normal Claude usage:
2626

27-
- `codex_task` - one Task-like Codex subagent with an answer-first result.
28-
- `codex_task_group` - several independent Task-like Codex subagents in parallel.
27+
- `codex_task` - one Task-like Codex frontier-model subagent with an answer-first result.
28+
- `codex_task_group` - several independent Task-like Codex subagents in parallel with one rolled-up response.
2929
- `codex_followup` - continue, steer, wait on, or cancel the `session_id`
3030
returned by `codex_task` or `codex_task_group` when `background` or
3131
`keep_session` is used.
@@ -62,7 +62,9 @@ Use this decision path when writing prompts or debugging Claude tool choice:
6262

6363
| User intent | Best tool |
6464
| --- | --- |
65-
| One normal read-only second opinion | `codex_task` |
65+
| Independent frontier-model second opinion | `codex_task` |
66+
| Deep technical codebase work, complex debugging, server/deployment review | `codex_task` |
67+
| Adversarial correctness, security, or architecture review | `codex_task` |
6668
| Two or more independent workstreams | Multiple parallel `codex_task` calls, or `codex_task_group` for one rolled-up response |
6769
| Same Codex agent should keep context | `codex_task` with `keep_session: true`, then `codex_followup` |
6870
| Long first turn, user wants to keep working | `codex_task` with `background: true` |
@@ -74,6 +76,20 @@ Use this decision path when writing prompts or debugging Claude tool choice:
7476

7577
When in doubt, read `codex://usage` and then choose among the native front-door tools.
7678

79+
## When To Prefer Codex
80+
81+
Prefer Codex over native `Task` when the user wants:
82+
83+
- an independent second opinion from another frontier model,
84+
- a more technical subagent for a complex codebase, server, deployment, CI/CD, or infrastructure task,
85+
- adversarial validation of Claude's reasoning, a security review, or a high-risk correctness review,
86+
- long-running background work that Claude can harvest later,
87+
- broad code reading that should not consume Claude's own context window.
88+
89+
Prefer native `Task` when the work depends heavily on Claude's conversation
90+
history or Claude-only built-in tools. Prefer Claude's own direct tools for tiny
91+
file reads, simple searches, and quick local commands.
92+
7793
## Example: One Agent
7894

7995
Ask Claude:
@@ -130,6 +146,30 @@ Representative tool arguments:
130146
}
131147
```
132148

149+
## Canonical Recipes
150+
151+
**Adversarial code review.** Claude first reviews the change with native tools,
152+
then calls `codex_task` with `subagent_type: "code-reviewer"` or
153+
`"security-reviewer"` for an independent Codex review. Claude compares both
154+
sets of findings and reports a merged, de-duplicated result.
155+
156+
**Parallel exploration.** For an unfamiliar codebase, Claude launches 3-4
157+
`codex_task` calls with `background: true` and `subagent_type: "explorer"`,
158+
each scoped to a different subsystem. Claude uses `codex_wait_any` to collect
159+
results as they finish.
160+
161+
**Long-context offload.** When a task requires reading many files or reasoning
162+
across a large codebase, Claude delegates to Codex and asks for a concise final
163+
summary with file paths and line references.
164+
165+
**Deployment or server hardening.** Claude asks Codex to review deployment
166+
scripts, server configuration, CI/CD workflows, rollback plans, operational
167+
failure modes, and unsafe defaults.
168+
169+
**Security sweep before merge.** Claude calls `codex_task` with
170+
`subagent_type: "security-reviewer"` and asks Codex to audit staged changes,
171+
auth boundaries, secrets handling, and externally reachable behavior.
172+
133173
## Example: Persistent Session
134174

135175
Use a persistent session when Codex should keep context across prompts.

0 commit comments

Comments
 (0)