Improve Claude autodiscovery for Codex subagents

xuio · xuio · commit f80f9130fa47 · 2026-05-09T19:58:27.000+02:00
diff --git a/.claude-plugin/mcp.json b/.claude-plugin/mcp.json
@@ -1,10 +1,7 @@
 {
   "mcpServers": {
     "codex-subagents": {
-      "command": "${CLAUDE_PLUGIN_ROOT}/dist/index.js",
-      "env": {
-        "CLAUDE_PROJECT_DIR": "${CLAUDE_PROJECT_DIR}"
-      }
+      "command": "${CLAUDE_PLUGIN_ROOT}/dist/index.js"
     }
   }
 }
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "codex-subagents",
   "version": "0.1.0",
-  "description": "Launch OpenAI Codex agents from Claude Code through a daemonless MCP server.",
+  "description": "Launch OpenAI Codex agents, Spark agents, and parallel Codex subagents from Claude Code through a daemonless read-only MCP server.",
   "author": {
     "name": "xuio"
   },
diff --git a/README.md b/README.md
@@ -63,6 +63,8 @@ npm run test:claude-desktop
 
 `test:claude-real-codex` is the full opt-in live path: Claude Code loads the plugin and calls real Codex through the desktop app binary, including one single agent, one parallel run, and one nested Spark subagent run. It spends both Claude and Codex tokens, so it is intentionally not part of the default suite.
 
+`test:claude-autodiscovery` is an opt-in live Claude Code test for automatic tool selection. It gives Claude a natural "ask Codex" request, uses the installed plugin and fake Codex binary, and verifies that Claude chooses the Codex MCP tool without being told the exact tool name.
+
 Run Claude Code with the local plugin:
 
 ```sh
@@ -77,6 +79,8 @@ After startup, ask Claude to use Codex subagents, or invoke the plugin skill:
 
 ## MCP Tools
 
+`codex_usage_guide` returns the operating guide and example calls Claude can use when deciding how to delegate to Codex.
+
 `run_agent` launches one Codex `exec` process.
 
 `run_agents` launches multiple Codex `exec` processes concurrently with a bounded `max_parallel` setting.
diff --git a/dist/index.js b/dist/index.js
@@ -21703,13 +21703,36 @@ async function probeCodexVersion(codexBin, env = process.env) {
 }
 
 // src/index.ts
+var usageGuide = [
+  "Claude Code integration guide for codex-subagents:",
+  "",
+  "Use this MCP server whenever the user asks Claude to use Codex, OpenAI Codex, Codex subagents, Codex Spark, a Codex second opinion, parallel Codex review, or independent Codex codebase analysis. You do not need the user to name an MCP tool.",
+  "",
+  "Tool choice:",
+  "- Use run_agent for one delegated Codex task.",
+  "- Use run_agents when the work can be split into independent concurrent tasks, for example separate reviewers for API flow, tests, security, performance, UI, docs, or migration risk.",
+  "- Use codex_status only for diagnostics or when you need to confirm the Codex binary/version.",
+  "- Use codex_usage_guide if you are unsure how to structure a Codex delegation.",
+  "",
+  "Default operating rules:",
+  "- Keep sandbox read-only unless the user explicitly asks for a different sandbox.",
+  "- Approvals are non-interactive; do not expect Codex to ask permission.",
+  '- Prefer model_preset "spark" for fast focused checks, small reviews, UI iteration, and responsive sidecar analysis.',
+  '- Use reasoning_effort "medium" by default, "low" for simple checks, and "high" or "xhigh" only for difficult analysis.',
+  "- Pass project_dir whenever Claude knows the active project directory so Codex works in the same tree as Claude Code.",
+  "- Ask Codex for concise results with file paths, line references, and actionable findings when reviewing code.",
+  "",
+  "Nested Codex subagents:",
+  "- When the user wants Codex to use its own subagents, pass complete custom definitions in codex_subagents and explicit work items in subagent_tasks.",
+  "- Keep subagent_runtime.max_depth at 1 unless recursive delegation is intentionally requested."
+].join("\n");
 var server = new McpServer(
   {
     name: "codex-subagents",
     version: "0.1.0"
   },
   {
-    instructions: "Launch OpenAI Codex agents from Claude Code. Use run_agent for one read-only Codex delegation and run_agents for parallel delegations. Defaults are daemonless stdio, Codex desktop app binary when installed, read-only sandbox, approval_policy=never, and fast service tier."
+    instructions: usageGuide
   }
 );
 var reasoningEffortSchema = external_exports.enum(reasoningEfforts);
@@ -21743,10 +21766,16 @@ var subagentRuntimeSchema = external_exports.object({
   job_max_runtime_seconds: external_exports.number().int().positive().max(86400).optional()
 });
 var commonInputSchema = {
-  model: external_exports.string().trim().min(1).optional().describe("Codex model, for example gpt-5.3-codex. Omit to use the plugin or Codex default."),
-  model_preset: modelPresetSchema.optional().describe("Convenience model preset. `spark` maps to gpt-5.3-codex-spark."),
-  reasoning_effort: reasoningEffortSchema.optional().describe("Codex model reasoning effort. Defaults to plugin config or medium."),
-  sandbox: sandboxModeSchema.default("read-only").describe("Codex sandbox mode. Defaults to read-only."),
+  model: external_exports.string().trim().min(1).optional().describe(
+    "Exact Codex model, for example gpt-5.3-codex. Omit to use model_preset, the plugin default, or Codex default."
+  ),
+  model_preset: modelPresetSchema.optional().describe(
+    "Convenience model preset. Use `spark` for fast Codex Spark work; it maps to gpt-5.3-codex-spark."
+  ),
+  reasoning_effort: reasoningEffortSchema.optional().describe(
+    "Codex model reasoning effort. Prefer medium by default, low for simple checks, high/xhigh only for difficult analysis."
+  ),
+  sandbox: sandboxModeSchema.default("read-only").describe("Codex sandbox mode. Keep read-only unless the user explicitly asks otherwise."),
   service_tier: serviceTierSchema.default("fast").describe("Codex service tier. Defaults to fast for responsiveness."),
   model_verbosity: modelVerbositySchema.optional().describe("Optional GPT-5 model verbosity override."),
   reasoning_summary: reasoningSummarySchema.optional().describe("Optional Codex reasoning summary setting."),
@@ -21757,14 +21786,18 @@ var commonInputSchema = {
   codex_bin: external_exports.string().trim().min(1).optional().describe("Explicit Codex CLI path. Overrides default binary resolution for this call."),
   profile: external_exports.string().trim().min(1).optional().describe("Optional Codex config profile."),
   timeout_ms: external_exports.number().int().positive().max(864e5).default(6e5).describe("Maximum runtime per Codex process in milliseconds."),
-  max_output_chars: external_exports.number().int().positive().max(5e5).default(6e4).describe("Maximum final message/stdout characters retained per agent."),
+  max_output_chars: external_exports.number().int().positive().max(5e5).default(6e4).describe("Maximum final message/stdout characters retained per agent. Lower this for concise parallel reviews."),
   include_events: external_exports.boolean().default(false).describe("Include parsed Codex JSONL events in the result. Usually leave false."),
   ephemeral: external_exports.boolean().default(true).describe("Run Codex without persisting session rollout files."),
   skip_git_repo_check: external_exports.boolean().default(false).describe("Allow Codex to run outside a Git repository."),
   ignore_rules: external_exports.boolean().default(false).describe("Skip Codex execpolicy .rules files."),
-  codex_subagents: external_exports.array(codexSubagentSchema).max(24).optional().describe("Custom Codex subagents available inside this Codex run."),
-  subagent_tasks: external_exports.array(subagentTaskSchema).max(24).optional().describe("Specific built-in or custom Codex subagents the parent Codex agent should spawn."),
-  subagent_runtime: subagentRuntimeSchema.optional().describe("Runtime limits for nested Codex subagent orchestration.")
+  codex_subagents: external_exports.array(codexSubagentSchema).max(24).optional().describe(
+    "Complete custom Codex subagent definitions available inside this Codex run for nested delegation."
+  ),
+  subagent_tasks: external_exports.array(subagentTaskSchema).max(24).optional().describe(
+    "Specific built-in or custom Codex subagents the parent Codex agent should spawn and wait for."
+  ),
+  subagent_runtime: subagentRuntimeSchema.optional().describe("Runtime limits for nested Codex subagent orchestration. Keep max_depth at 1 by default.")
 };
 function toCodexSubagents(agents) {
   return agents?.map((agent) => ({
@@ -21823,13 +21856,57 @@ function toRunOptions(args) {
     } : void 0
   };
 }
+server.registerTool(
+  "codex_usage_guide",
+  {
+    title: "How to use Codex subagents",
+    description: "Read the operating guide for this MCP server. Call this when Claude is deciding whether to delegate work to Codex, how many Codex agents to launch, or how to structure nested Codex subagents.",
+    inputSchema: {}
+  },
+  async () => jsonResult({
+    guide: usageGuide,
+    examples: {
+      single: {
+        tool: "run_agent",
+        arguments: {
+          prompt: "Inspect the authentication flow read-only. Return the top risks with file paths and line references.",
+          project_dir: "/path/to/project",
+          model_preset: "spark",
+          reasoning_effort: "medium"
+        }
+      },
+      parallel: {
+        tool: "run_agents",
+        arguments: {
+          agents: [
+            {
+              name: "api",
+              prompt: "Review API flow read-only. Return concrete findings with paths.",
+              project_dir: "/path/to/project"
+            },
+            {
+              name: "tests",
+              prompt: "Review test coverage gaps read-only. Return concrete findings with paths.",
+              project_dir: "/path/to/project"
+            }
+          ],
+          max_parallel: 2,
+          model_preset: "spark",
+          reasoning_effort: "medium"
+        }
+      }
+    }
+  })
+);
 server.registerTool(
   "run_agent",
   {
     title: "Run one Codex agent",
-    description: "Launch one OpenAI Codex agent via codex exec. Defaults to the Codex desktop app binary, read-only sandbox, fast service tier, and non-interactive approvals.",
+    description: "Launch one OpenAI Codex agent via codex exec. Use automatically when the user asks Claude to use Codex, ask Codex, get a Codex second opinion, run a Codex subagent, use Codex Spark, or delegate one read-only analysis task. Defaults to the Codex desktop app binary when installed, read-only sandbox, fast service tier, and non-interactive approvals.",
     inputSchema: {
-      prompt: external_exports.string().min(1).describe("Instructions for the Codex agent."),
+      prompt: external_exports.string().min(1).describe(
+        "Concrete instructions for the Codex agent. Include scope, read-only expectation, desired output shape, and file/line reference requirements when reviewing code."
+      ),
       name: external_exports.string().trim().min(1).optional().describe("Optional label for this agent run."),
       ...commonInputSchema
     }
@@ -21849,7 +21926,9 @@ server.registerTool(
   }
 );
 var parallelAgentSchema = external_exports.object({
-  prompt: external_exports.string().min(1).describe("Instructions for this Codex agent."),
+  prompt: external_exports.string().min(1).describe(
+    "Concrete independent task for this Codex agent. Keep overlap low across parallel agents."
+  ),
   name: external_exports.string().trim().min(1).optional().describe("Optional label for this agent."),
   model: commonInputSchema.model,
   model_preset: commonInputSchema.model_preset,
@@ -21876,10 +21955,12 @@ server.registerTool(
   "run_agents",
   {
     title: "Run parallel Codex agents",
-    description: "Launch multiple independent OpenAI Codex agents concurrently and return one structured result per agent. Defaults are read-only and daemonless.",
+    description: "Launch multiple independent OpenAI Codex agents concurrently and return one structured result per agent. Use automatically when the user asks for parallel Codex agents, multiple Codex subagents, broad review by independent agents, or several concurrent Codex workstreams. Split work by clear ownership, pass project_dir, keep defaults read-only, and use max_parallel to bound concurrency.",
     inputSchema: {
-      agents: external_exports.array(parallelAgentSchema).min(1).max(12),
-      max_parallel: external_exports.number().int().min(1).max(8).default(4),
+      agents: external_exports.array(parallelAgentSchema).min(1).max(12).describe(
+        "Independent Codex agent tasks. Use names like api, tests, security, docs, performance, or ui when helpful."
+      ),
+      max_parallel: external_exports.number().int().min(1).max(8).default(4).describe("Maximum concurrent Codex processes. Use 2-4 for most responsive parallel reviews."),
       ...commonInputSchema
     }
   },
@@ -21948,7 +22029,7 @@ server.registerTool(
   "codex_status",
   {
     title: "Codex status",
-    description: "Report Codex binary resolution, version, server working directory, and default execution settings.",
+    description: "Report Codex binary resolution, version, server working directory, and default execution settings. Use for diagnostics before delegation only when Codex availability is uncertain or a prior tool call failed.",
     inputSchema: {
       codex_bin: commonInputSchema.codex_bin
     }
diff --git a/package.json b/package.json
@@ -25,6 +25,7 @@
     "test:reliability": "node test/reliability-matrix.mjs",
     "test:codex-runtime": "node test/codex-runtime-probe.mjs",
     "test:claude-orchestration": "node test/claude-orchestration.mjs",
+    "test:claude-autodiscovery": "node test/claude-autodiscovery.mjs",
     "test:claude-real-codex": "node test/claude-real-codex.mjs",
     "test:ci": "npm run build && npm test && npm run smoke:mcp && npm run test:reliability",
     "test:comprehensive": "npm run build && npm test && npm run smoke:mcp && npm run test:reliability && npm run test:codex-runtime && npm run validate:plugin && npm run test:claude-desktop",
diff --git a/skills/codex-subagents/SKILL.md b/skills/codex-subagents/SKILL.md
@@ -1,10 +1,10 @@
 ---
-description: Launch one or more OpenAI Codex agents from Claude Code for read-only exploration, review, planning, or parallel codebase analysis. Use when the user asks for Codex subagents, parallel Codex agents, or wants Claude Code to delegate work to Codex.
+description: Launch one or more OpenAI Codex agents from Claude Code for read-only exploration, review, planning, second opinions, Spark checks, or parallel codebase analysis. Use automatically when the user asks for Codex, OpenAI Codex, Codex Spark, Codex subagents, parallel Codex agents, a Codex second opinion, or for Claude Code to delegate work to Codex.
 ---
 
 # Codex Subagents
 
-Use the `codex-subagents` MCP server when the task benefits from delegating read-only work to OpenAI Codex from inside Claude Code.
+Use the `codex-subagents` MCP server when the task benefits from delegating read-only work to OpenAI Codex from inside Claude Code. If the user asks to "use Codex", "ask Codex", "launch Codex subagents", "use Spark", "get a Codex second opinion", or "run parallel Codex agents", call the MCP tools directly. Do not wait for the user to name the MCP tool.
 
 Default behavior:
 
@@ -16,16 +16,51 @@ Default behavior:
 - Supports `model_preset: "spark"` for Codex Spark (`gpt-5.3-codex-spark`) without requiring Claude to remember the exact model string.
 - Supports nested Codex subagents by passing `codex_subagents`, `subagent_tasks`, and `subagent_runtime`; custom agents are sent as Codex `agents.<name>...` config overrides for the child run.
 
-For one delegated task, call `run_agent`.
+For one delegated task, call `run_agent`. Make the prompt self-contained: include the scope, the expected read-only behavior, and the output shape Claude needs. For code review and exploration, ask for concise findings with file paths and line references.
 
-For independent tasks that can run concurrently, call `run_agents` with one agent object per task. Keep each prompt concrete and bounded, and ask Codex to return concise findings with file paths and line references when relevant.
+For independent tasks that can run concurrently, call `run_agents` with one agent object per task. Split by ownership such as API flow, tests, security, performance, UI, docs, or migration risk. Keep prompts concrete and bounded, and set `max_parallel` to the smaller of the useful agent count and `4` unless the user asks for more.
 
 When Claude wants Codex to work in the same repository or folder as the active Claude Code session, pass that folder as `project_dir`. Use `cwd` only as a compatibility alias.
 
 Prefer `reasoning_effort: "medium"` for exploration and `high` or `xhigh` only when the task is complex enough to justify the extra latency and token usage.
 
 Use `model_preset: "spark"` for fast, focused work such as UI iteration, narrow exploration, small reviews, and quick sidecar checks.
 
+Use `codex_status` only when diagnosing installation or binary resolution, or after a failed Codex tool call. Normal delegation should start with `run_agent` or `run_agents`.
+
+Example single-agent call:
+
+```json
+{
+  "prompt": "Review the MCP server implementation read-only. Return the top risks with file paths and line references, then a brief summary.",
+  "project_dir": "/path/to/project",
+  "model_preset": "spark",
+  "reasoning_effort": "medium"
+}
+```
+
+Example parallel call:
+
+```json
+{
+  "agents": [
+    {
+      "name": "api",
+      "prompt": "Review the MCP tool schemas and runtime options read-only. Return concrete risks with paths.",
+      "project_dir": "/path/to/project"
+    },
+    {
+      "name": "tests",
+      "prompt": "Review the test coverage read-only. Identify missing scenarios with paths.",
+      "project_dir": "/path/to/project"
+    }
+  ],
+  "max_parallel": 2,
+  "model_preset": "spark",
+  "reasoning_effort": "medium"
+}
+```
+
 When nested Codex delegation is needed:
 
 - Put complete custom subagent definitions in `codex_subagents`.
diff --git a/src/index.ts b/src/index.ts
diff --git a/test/claude-autodiscovery.mjs b/test/claude-autodiscovery.mjs
diff --git a/test/smoke-mcp.mjs b/test/smoke-mcp.mjs

Original file line number	Diff line number	Diff line change
`@@ -1,10 +1,7 @@`
`1`	`1`	`{`
`2`	`2`	`"mcpServers": {`
`3`	`3`	`"codex-subagents": {`
`4`		`- "command": "${CLAUDE_PLUGIN_ROOT}/dist/index.js",`
`5`		`- "env": {`
`6`		`- "CLAUDE_PROJECT_DIR": "${CLAUDE_PROJECT_DIR}"`
`7`		`- }`
	`4`	`+ "command": "${CLAUDE_PLUGIN_ROOT}/dist/index.js"`
`8`	`5`	`}`
`9`	`6`	`}`
`10`	`7`	`}`
Original file line number	Diff line number	Diff line change
`@@ -1,7 +1,7 @@`
`1`	`1`	`{`
`2`	`2`	`"name": "codex-subagents",`
`3`	`3`	`"version": "0.1.0",`
`4`		`- "description": "Launch OpenAI Codex agents from Claude Code through a daemonless MCP server.",`
	`4`	`+ "description": "Launch OpenAI Codex agents, Spark agents, and parallel Codex subagents from Claude Code through a daemonless read-only MCP server.",`
`5`	`5`	`"author": {`
`6`	`6`	`"name": "xuio"`
`7`	`7`	`},`