communicate: add codex adapter via app-server JSON-RPC (parity with claude-sdk) #803

@willwashburn

Problem

Communicate-mode adapters in packages/sdk/src/communicate/adapters/ cover
claude-sdk, pi, ai-sdk, crewai, google-adk, langgraph, and
openai-agents — but there is no codex adapter. Codex is currently
reachable only through tier-1 PTY mode (relay.codex.spawn(...) in
packages/sdk/src/relay.ts:463), which means:

  • Inbound messages are delivered as keystroke injection into a PTY, racing
    with TUI state and dependent on per-platform node-pty / conpty / winpty
    bindings (see the per-platform broker binaries broker-darwin-arm64,
    broker-linux-arm64, broker-win32-x64, …).
  • "Agent ready" detection is prompt-sniffing in broker/src/helpers.rs
    (cf. the broader fix in #800, "broker: composable wait-conditions for CLI readiness (steal from ht)").
  • There is no structured signal for turn started / turn completed; the
    delivery_injected → active → verified state machine has to infer from
    PTY output.
  • thread/fork-style multi-agent patterns (branch N workers from one
    checkpoint) aren't expressible at all.

The reason claude-sdk is in the structured tier and codex isn't is purely
that Anthropic shipped an embeddable library
(@anthropic-ai/claude-agent-sdk) and OpenAI hadn't shipped a stable
control plane. That has now changed.

Prior art

OpenAI now ships codex app-server:
a JSON-RPC 2.0 control plane over stdio (default), unix socket, or
experimental websocket. It is what the official Codex VS Code extension
talks to. Surface area relevant to a relay adapter:

  • thread/start | resume | fork — synchronous threadId at spawn time,
    plus ephemeral: true for in-memory forks.
  • turn/start | steer | interrupt — structured input delivery and clean
    cancel/steer mid-flight.
  • item/started, item/completed, turn/started, turn/completed
    notifications — the structured analog of PostToolUse / Stop hooks.
  • mcpServerStatus/list, config/mcpServer/reload — register
    relaycast as an MCP server on the agent without re-spawning.
  • initialize / initialized handshake as the readiness signal (replaces
    prompt sniffing for the codex case).
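
A minimal sketch of the client side of such a control plane, assuming newline-delimited JSON over stdio. `LinePeer` and its method names are hypothetical, not part of the codex schema, and whether app-server expects the literal `jsonrpc: "2.0"` member on the wire should be confirmed against its docs. The sketch correlates responses to requests by id and routes id-less messages as notifications:

```typescript
type Json = Record<string, unknown>;
type Notify = (method: string, params: unknown) => void;
type Waiter = { resolve: (value: unknown) => void; reject: (err: Error) => void };

// Hypothetical helper: a JSON-RPC peer fed with raw stdout chunks.
export class LinePeer {
  private nextId = 1;
  private buffer = '';
  private pending = new Map<number, Waiter>();

  constructor(
    private write: (line: string) => void,
    private onNotify: Notify,
  ) {}

  // Send a request and resolve when the response with the same id arrives.
  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.write(JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n');
    });
  }

  // Send a notification (no id, no response expected), e.g. `initialized`.
  notify(method: string, params?: unknown): void {
    this.write(JSON.stringify({ jsonrpc: '2.0', method, params }) + '\n');
  }

  // Accept a chunk of any size; only complete lines are parsed.
  feed(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf('\n');
    while (newline >= 0) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line) this.route(JSON.parse(line) as Json);
      newline = this.buffer.indexOf('\n');
    }
  }

  private route(msg: Json): void {
    if (typeof msg.id === 'number' && !('method' in msg)) {
      const waiter = this.pending.get(msg.id);
      if (!waiter) return;
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(JSON.stringify(msg.error)));
      else waiter.resolve(msg.result);
    } else if (typeof msg.method === 'string' && !('id' in msg)) {
      this.onNotify(msg.method, msg.params);
    }
    // Server-initiated requests (id + method) are ignored in this sketch.
  }
}
```

In the adapter, `write` would wrap the child process's stdin and `feed` would be called from its stdout `data` events; the readiness signal is then just the resolution of the `initialize` request.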

Schema is regenerated per Codex version
(codex app-server generate-ts); we already pin
version: '0.124.0' in packages/shared/cli-registry.yaml.

Proposal

Add packages/sdk/src/communicate/adapters/codex.ts, modeled on
claude-sdk.ts. The shape mapping is direct:

| claude-sdk.ts does | codex adapter equivalent |
| --- | --- |
| Add relaycast to options.mcpServers | Ensure relaycast MCP server present in config.toml (or register via config/mcpServer/reload) before thread/start |
| PostToolUse hook returns systemMessage | Subscribe to item/completed; on relevant items, drain inbox via relay.inbox() and call turn/steer with the formatted messages |
| Stop hook returns { continue: true, systemMessage } | On turn/completed, drain inbox; if non-empty, prepend formatted messages to the next turn/start input |

Unlike claude-sdk, this adapter is out-of-process — it spawns or attaches
to a codex app-server over stdio JSON-RPC. The wiring is more code than
an in-process hook adapter, but it sits in the same architectural tier and
removes the PTY dependency for codex.
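
The two hook analogs can be sketched as follows. This is an illustration under assumptions, not the adapter's actual code: `InboxMessage`, `RelayInbox`, `formatMessages`, and `createInboxBridge` are hypothetical names, the `turn/steer` parameter shape is a guess to be checked against the generated schema, and filtering for "relevant" items is omitted.

```typescript
// Hypothetical shapes; the real ones come from relay.inbox() and the codex schema.
interface InboxMessage { from: string; text: string }
interface RelayInbox { inbox(): Promise<InboxMessage[]> }
type Request = (method: string, params: unknown) => Promise<unknown>;

export function formatMessages(messages: InboxMessage[]): string {
  return messages.map((m) => `[relay] ${m.from}: ${m.text}`).join('\n');
}

export function createInboxBridge(relay: RelayInbox, request: Request, threadId: string) {
  let carryOver: string[] = [];

  return {
    // Analog of the PostToolUse hook: inject pending messages mid-turn.
    async onItemCompleted(): Promise<void> {
      const messages = await relay.inbox();
      if (messages.length === 0) return;
      await request('turn/steer', {
        threadId,
        input: [{ type: 'text', text: formatMessages(messages) }], // assumed shape
      });
    },

    // Analog of the Stop hook: hold pending messages for the next turn.
    async onTurnCompleted(): Promise<void> {
      const messages = await relay.inbox();
      if (messages.length > 0) carryOver.push(formatMessages(messages));
    },

    // Build the text for the next turn/start, prepending anything held back.
    nextTurnInput(text: string): string {
      const input = [...carryOver, text].join('\n\n');
      carryOver = [];
      return input;
    },
  };
}
```

Keeping the bridge independent of the transport (it only sees a `request` function) means it can be unit-tested without spawning codex.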

Sketch

// packages/sdk/src/communicate/adapters/codex.ts
export interface CodexAdapterOptions {
  cwd?: string;
  model?: string;
  permissionProfile?: string;
  // Stdio transport by default; ws/unix-socket as future opt-ins.
}

export function onRelay(
  name: string,
  options: CodexAdapterOptions,
  relay: RelayLike = new Relay(name),
): CodexHandle {
  // 1. Spawn `codex app-server` with stdio transport, do initialize
  //    handshake with clientInfo.name = 'agent_relay'.
  // 2. Ensure relaycast MCP server registered.
  // 3. thread/start → record threadId synchronously.
  // 4. Subscribe to item/* and turn/* notifications.
  // 5. On item/completed: drainInbox(relay) → turn/steer if non-empty.
  // 6. On turn/completed: drainInbox(relay) → store for next turn/start.
  // 7. Expose .send(text) → turn/start, .interrupt() → turn/interrupt,
  //    .fork(opts) → thread/fork, .close() → graceful shutdown.
  throw new Error('codex adapter: not implemented'); // placeholder so the sketch type-checks
}
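
Steps 1, 3, and 7 of the sketch could look roughly like this once a JSON-RPC client exists. The `Rpc` interface, `openThread`, the `clientInfo.version` value, and every parameter and result shape (for example `{ thread: { id } }`) are assumptions made for illustration; the generated schema is the authority.

```typescript
// Hypothetical transport-agnostic client interface.
interface Rpc {
  request(method: string, params?: unknown): Promise<unknown>;
  notify(method: string, params?: unknown): void;
}

interface ThreadHandle {
  threadId: string;
  send(text: string): Promise<unknown>;
  interrupt(turnId: string): Promise<unknown>;
  fork(): Promise<unknown>;
}

export async function openThread(
  rpc: Rpc,
  opts: { cwd?: string; model?: string },
): Promise<ThreadHandle> {
  // Handshake: the resolved initialize response is the readiness signal.
  await rpc.request('initialize', {
    clientInfo: { name: 'agent_relay', version: '0.0.0' }, // version is a placeholder
  });
  rpc.notify('initialized');

  // thread/start returns the thread id in its response (assumed result shape).
  const started = (await rpc.request('thread/start', {
    cwd: opts.cwd,
    model: opts.model,
  })) as { thread: { id: string } };
  const threadId = started.thread.id;

  return {
    threadId,
    send: (text) =>
      rpc.request('turn/start', { threadId, input: [{ type: 'text', text }] }),
    interrupt: (turnId) => rpc.request('turn/interrupt', { threadId, turnId }),
    fork: () => rpc.request('thread/fork', { threadId }),
  };
}
```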

Files to touch

  • New: packages/sdk/src/communicate/adapters/codex.ts
  • New: packages/sdk/src/communicate/adapters/codex-jsonrpc.ts
    (transport + handshake; reusable if we ever talk to other JSON-RPC agents
    the same way — acp-bridge is precedent)
  • Update: packages/sdk/src/communicate/index.ts
    (add onCodexRelay export and a discriminator branch in onRelay())
  • Update: packages/sdk/src/communicate/adapters/index.ts
  • Update: packages/sdk/package.json — no new runtime dep; codex is a
    binary we shell out to (already in cli-registry.yaml)
  • Tests: mirror tests/communicate/adapters/test_claude_sdk.py against a
    fake JSON-RPC peer
  • Docs: short note in the communicate README about the new adapter and the
    PTY-vs-app-server tradeoff
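
For the tests, a fake JSON-RPC peer might be as small as the following. `FakeAppServer` is a hypothetical test double, not something that exists in the repo; it answers requests from scripted handlers, records every call, and lets a test push notifications at the adapter.

```typescript
type Handler = (params: unknown) => unknown;
type Listener = (method: string, params: unknown) => void;

// Hypothetical in-memory stand-in for `codex app-server`.
export class FakeAppServer {
  readonly calls: Array<{ method: string; params: unknown }> = [];
  private handlers = new Map<string, Handler>();
  private listeners: Listener[] = [];

  // Script the result for a method; returns `this` so calls can be chained.
  on(method: string, handler: Handler): this {
    this.handlers.set(method, handler);
    return this;
  }

  // Adapter-facing side: record the call and answer from the script.
  async request(method: string, params?: unknown): Promise<unknown> {
    this.calls.push({ method, params });
    const handler = this.handlers.get(method);
    if (!handler) throw new Error(`unexpected request: ${method}`);
    return handler(params);
  }

  onNotification(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Test-facing side: simulate a server notification such as turn/completed.
  emit(method: string, params: unknown): void {
    for (const listener of this.listeners) listener(method, params);
  }
}
```

A test would then script `initialize` and `thread/start`, emit `item/completed` or `turn/completed`, and assert on `calls` to check that the adapter issued `turn/steer` or deferred to the next `turn/start`.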

Caveats / scope

  • Doesn't replace PTY mode. Tier-1 PTY remains for foreground
    "user-watching" sessions and for claude / gemini / cursor. This adapter
    targets SDK-driven background workers and headless/CI flows where TTY
    allocation is awkward.
  • No user-facing TUI. The app-server is a backend; if relay wants to
    surface activity to a watching human, it has to render the
    item/* stream itself (out of scope here).
  • Schema versioning. Pin against codex 0.124.0 from
    cli-registry.yaml; add a version probe at handshake time and fail fast
    if the connected app-server is older than what the adapter expects. Some
    methods (dynamic tools, realtime, webrtc) require
    capabilities.experimentalApi — adapter opts in only for what it uses.
  • WebSocket transport is documented as experimental/unsupported in the
    codex README; this adapter uses stdio. Unix socket later if useful.
  • Auth / config inheritance. The adapter should run codex app-server
    with the user's existing $CODEX_HOME, so ChatGPT/API auth and
    config.toml settings carry over without extra wiring.
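
The fail-fast version check could be sketched like this. The helper names are hypothetical, and how the adapter obtains the version string (from the handshake response or some other probe) is left open; the sketch only covers the comparison against the pinned 0.124.0.

```typescript
interface Version { major: number; minor: number; patch: number }

// Extract the first x.y.z triple from a version string.
export function parseVersion(raw: string): Version {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(raw);
  if (!match) throw new Error(`unrecognized codex version: ${raw}`);
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// True when `actual` is strictly older than `minimum`.
export function isOlder(actual: string, minimum: string): boolean {
  const a = parseVersion(actual);
  const b = parseVersion(minimum);
  if (a.major !== b.major) return a.major < b.major;
  if (a.minor !== b.minor) return a.minor < b.minor;
  return a.patch < b.patch;
}

// Throw at handshake time rather than failing later on a missing method.
export function assertSupported(actual: string, minimum = '0.124.0'): void {
  if (isOlder(actual, minimum)) {
    throw new Error(`codex app-server ${actual} is older than required ${minimum}`);
  }
}
```

Comparing numerically per component matters here: a plain string comparison would rank 0.99.0 above 0.124.0.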

Why now

  • Codex now has a stable control plane to bind to; communicate mode has had
    a codex-shaped hole since it was introduced.
  • It removes one of the two things keeping the broker on the hot path for
    codex (the other being foreground PTY UX).
  • It's a clean prerequisite for richer telemetry / trajectory consumption
    per turn — turn/completed includes structured token usage and item
    history that PTY parsing has to reconstruct heuristically.

Effort

Medium. Adapter + JSON-RPC client + tests is ~500–1000 LOC. No new runtime
deps. Risk concentrated in the lifecycle / reconnection logic; the
event-shape mapping itself is well-defined by the codex schema.
