[codex] Prototype Codex Apps as virtual HTTP MCP servers#30000
aibrahim-oai wants to merge 1 commit into
I would make the runtime-server move prove one boundary before landing: generic MCP layers should never need to know that a server came from Apps. The risk in this shape is not the loopback adapter itself. It is a partial migration where Apps are exposed as ordinary MCP servers, but approval identity, file-argument rewriting, resource access, auth revision, and install eligibility still have hidden connector-shaped branches in core or the connection manager. That would preserve the old coupling under a new transport name. Useful regression shape:
That keeps the architecture honest: Apps can own product policy and auth, but the model/runtime boundary should consume only ordinary MCP registrations plus trusted generic metadata. Boundary: architecture and regression-test feedback only; no claim about running this branch or validating implementation behavior.
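The reviewer's boundary can be expressed as a name-invariance regression check. This is a minimal sketch. It assumes a hypothetical `Registration` type and a hypothetical `approval` metadata key; neither is the branch's API. The generic decision is computed only from trusted metadata, and the check confirms that the result does not change when only the server name changes.

```rust
use std::collections::BTreeMap;

/// Hypothetical generic registration: a name, a URL, and trusted metadata.
/// Generic layers may read the metadata but must not interpret the name.
#[derive(Clone, Debug)]
pub struct Registration {
    pub name: String,
    pub url: String,
    pub trusted_metadata: BTreeMap<String, String>,
}

/// A generic policy decision that consults only trusted metadata.
/// The "approval" key is an assumption; a real surface would be typed.
pub fn requires_approval(reg: &Registration) -> bool {
    reg.trusted_metadata
        .get("approval")
        .map(|v| v == "always")
        .unwrap_or(false)
}

/// Regression helper: the decision must be identical under every renaming.
pub fn decision_is_name_invariant(reg: &Registration, other_names: &[&str]) -> bool {
    let baseline = requires_approval(reg);
    other_names.iter().all(|name| {
        let mut renamed = reg.clone();
        renamed.name = (*name).to_string();
        requires_approval(&renamed) == baseline
    })
}
```

Suppose a generic decision matched on a name such as `codex_apps__calendar`. Under these assumptions it would fail this check as soon as a renamed copy is evaluated.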
Important
This is an architecture prototype, not a landing-sized change. The review ask is: is this the right ownership boundary, and did the prototype preserve every behavior worth preserving at the right layer? If the answer is yes, the final section proposes a dependency-ordered landing stack.
Problem
Hosted Codex Apps already speak MCP, but Codex does not currently treat them like ordinary MCP servers.
Today they enter the client as one reserved `codex_apps` server. Generic MCP and core code then unpack connector annotations from that server's tool list and reconstruct product concepts that MCP should not need to understand: `codex-mcp` and the connection manager recognize the reserved server, parse connector identity, create connector-specific namespaces, and carry Apps-specific cache, reconnect, auth, approval, file, and refresh behavior.

That inversion makes every Apps behavior a cross-cutting change. It also creates two lifecycle paths for what is fundamentally the same thing: an MCP server with tools, resources, authentication, approvals, and a connection lifecycle.
The target invariant is:
One compatibility seam remains in protocol serialization: generic `McpToolSource` accessors still serialize the historical `connector_id`/`connector_name`/`connector_description` fields consumed by Guardian and rollout clients. The prototype does not branch on those fields in generic layers, but the boundary verifier cannot prove that semantic distinction.

Product-aware hosts may still use the shared Apps service directly. In particular, app-server should list Apps and read auth/install state from the Apps owner rather than reverse-engineering them from generic MCP tool state. Runtime Apps authentication remains an Apps-owned elicitation flow.
Review priorities, in order
These priorities are ordered. A lower item should not compromise a higher one.
Intentional or stricter behavior changes
These are not described as code motion. They need explicit reviewer agreement before landing.
Cold discovery no longer blocks the first model request. With no cached/published generation, the first cold turn can proceed without Apps tools or Apps instructions. Background discovery publishes a generation for a later safe boundary. This removes startup latency and failure coupling, but it is a visible availability tradeoff.
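The non-blocking shape can be sketched as a publication cell that readers sample without waiting. This is a minimal sketch, not the branch's code. `Publication`, `Generation`, and the single-flight flag are hypothetical names, and the sketch assumes one such cell per connection key.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical immutable Apps generation (only tool namespaces, for brevity).
#[derive(Debug)]
pub struct Generation {
    pub namespaces: Vec<String>,
}

/// Hypothetical publication cell: readers never wait for discovery.
#[derive(Default)]
pub struct Publication {
    current: RwLock<Option<Arc<Generation>>>,
    discovery_started: AtomicBool,
}

impl Publication {
    /// Returns whatever is published right now; `None` on a cold start.
    pub fn current(&self) -> Option<Arc<Generation>> {
        self.current.read().unwrap().clone()
    }

    /// Returns true for exactly one caller, which should run discovery
    /// in the background instead of on the model-request path.
    pub fn try_begin_discovery(&self) -> bool {
        self.discovery_started
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Called by the background task. Later safe boundaries observe the new
    /// generation, and the flag is cleared so a later refresh can start.
    pub fn publish(&self, generation: Generation) {
        *self.current.write().unwrap() = Some(Arc::new(generation));
        self.discovery_started.store(false, Ordering::Release);
    }
}
```

A cold first turn sees `None` and proceeds without Apps tools. Whichever caller wins `try_begin_discovery` runs discovery off the request path, and a later safe boundary observes the published generation.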
The current prototype exposes the real MCP topology through app-server. `mcpServerStatus/list` changes from one `codex_apps` entry with grouped tools to a resource-only `codex_apps` entry plus one `codex_apps__<connector>` entry per materialized connector namespace. The core/MCP ownership boundary does not require this, because app-server is allowed to understand Apps; preserving the aggregate public shape is an open landing decision. Connector resource reads already remain restricted to declared URIs.

Generic OAuth does not claim runtime Apps servers. `mcpServer/oauth/login` rejects these runtime-only registrations. Apps auth remains the private auth-elicitation/install flow owned by the Apps adapter and extension.

The current prototype makes generic install suggestion tools plugin-only. Standalone connectors are no longer returned by `list_available_plugins_to_install` or accepted by `request_plugin_install`. This also means codex-tui has no model-driven install candidate when only standalone connectors are available. The architecture does not require that regression: if standalone model-driven installation remains required, it should move behind an Apps-owned extension surface rather than disappear or return as a core branch.

Plugin and Apps guidance follows callable MCP namespaces. Model-facing plugin instructions no longer render a separate “Apps from this plugin” list; declared Apps appear through their contributed MCP namespaces and attribution. Apps instructions retain `include_apps_instructions` but now require at least one policy-enabled tool in the published snapshot, which avoids instructions for an App with nothing callable.

Explicit/implicit Apps analytics become turn-local. The previous session-global connector selection made every later use look explicit. The extension now classifies explicit use from raw mentions in the current turn. This fixes sticky classification, but an App mention introduced by a skill no longer counts as an explicit user mention.
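Turn-local classification amounts to rebuilding the mention set at each turn instead of carrying it through the session. In this minimal sketch, `TurnMentions` and `Invocation` are hypothetical. The sketch assumes mentions have already been parsed into connector IDs from the current user turn.

```rust
use std::collections::HashSet;

/// How a connector use is classified for analytics.
#[derive(Debug, PartialEq, Eq)]
pub enum Invocation {
    Explicit,
    Implicit,
}

/// Hypothetical per-turn state, rebuilt from raw user mentions at every turn,
/// so a mention in an earlier turn cannot make later uses look explicit.
pub struct TurnMentions {
    mentioned: HashSet<String>,
}

impl TurnMentions {
    /// `raw_mentions` are connector IDs parsed from the current user turn only.
    pub fn from_current_turn<'a>(raw_mentions: impl IntoIterator<Item = &'a str>) -> Self {
        Self {
            mentioned: raw_mentions.into_iter().map(str::to_owned).collect(),
        }
    }

    /// Explicit only if the user mentioned this connector in this turn.
    pub fn classify(&self, connector_id: &str) -> Invocation {
        if self.mentioned.contains(connector_id) {
            Invocation::Explicit
        } else {
            Invocation::Implicit
        }
    }
}
```

Mentions added by a skill never reach `from_current_turn`, so they classify as implicit. This matches the tradeoff described above.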
Malformed inventory is handled more strictly. Blank tool names and incomplete connector identity are omitted; inconsistent display names for one stable connector ID reject the generation. Synthetic-only connectors remain available for auth/install checks but do not become installed Apps or callable connector servers.
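The stricter inventory rules can be expressed as a filter that omits bad rows but rejects the whole generation on an identity conflict. This is a minimal sketch. `RawTool`, `InventoryError`, and `validate_inventory` are hypothetical, and the real inventory carries more than these fields. For example, the sketch does not model synthetic-only connectors.

```rust
use std::collections::HashMap;

/// Hypothetical raw inventory row as discovered upstream.
#[derive(Clone, Debug)]
pub struct RawTool {
    pub tool_name: String,
    pub connector_id: Option<String>,
    pub connector_name: Option<String>,
}

/// Reasons a whole generation is rejected rather than partially published.
#[derive(Debug, PartialEq, Eq)]
pub enum InventoryError {
    InconsistentDisplayName { connector_id: String },
}

/// Omits blank tool names and incomplete identities. Rejects the generation
/// if one stable connector ID is reported with two different display names.
pub fn validate_inventory(raw: &[RawTool]) -> Result<Vec<RawTool>, InventoryError> {
    let mut names_by_id: HashMap<&str, &str> = HashMap::new();
    let mut accepted = Vec::new();
    for tool in raw {
        if tool.tool_name.trim().is_empty() {
            continue; // blank tool name: omitted
        }
        let id = tool.connector_id.as_deref().filter(|s| !s.trim().is_empty());
        let name = tool.connector_name.as_deref().filter(|s| !s.trim().is_empty());
        let (Some(id), Some(name)) = (id, name) else {
            continue; // incomplete connector identity: omitted
        };
        match names_by_id.insert(id, name) {
            Some(previous) if previous != name => {
                return Err(InventoryError::InconsistentDisplayName {
                    connector_id: id.to_string(),
                });
            }
            _ => accepted.push(tool.clone()),
        }
    }
    Ok(accepted)
}
```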
Legacy provenance-free Apps caches are not migrated. Old entries cannot prove their upstream URL or SKU. Ignoring them avoids cross-origin/account reuse, but the first run after upgrade may require live discovery.
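A provenance check of this kind can be sketched as follows. The sketch assumes hypothetical `upstream_url` and `sku` fields on a cache entry. Legacy entries have neither field, so they never match and the caller falls back to live discovery.

```rust
/// Hypothetical on-disk cache entry. Legacy entries lack provenance fields.
#[derive(Debug, Default)]
pub struct CachedInventory {
    pub upstream_url: Option<String>,
    pub sku: Option<String>,
    pub payload: String,
}

/// An entry is reusable only if it proves the same upstream URL and SKU.
/// Provenance-free (legacy) entries are ignored rather than migrated.
pub fn usable_cache<'a>(entry: &'a CachedInventory, upstream_url: &str, sku: &str) -> Option<&'a str> {
    match (entry.upstream_url.as_deref(), entry.sku.as_deref()) {
        (Some(url), Some(s)) if url == upstream_url && s == sku => Some(entry.payload.as_str()),
        _ => None, // legacy or mismatched provenance: use live discovery instead
    }
}
```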
File handling is stricter. The old path could read from the primary environment without exact sandbox context. The new path requires the pinned environment generation, rejects stale instances and malformed arrays, and checks actual buffered size. Cancellation prevents partial argument rewriting or an upstream invocation, but does not roll back uploads that already completed.
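One way to check the actual buffered size, rather than trusting file metadata, is to read at most one byte beyond the limit. This is a minimal sketch using only `std::io`. `read_bounded` and `FileArgError` are hypothetical, and the sketch does not cover the pinned-environment and sandbox checks.

```rust
use std::io::{self, Read};

/// Why a file argument was rejected; the call fails closed instead of forwarding.
#[derive(Debug)]
pub enum FileArgError {
    TooLarge { limit: u64 },
    Io(io::Error),
}

/// Reads at most `limit` bytes and judges size by what was actually buffered,
/// not by metadata that could change between stat and read.
pub fn read_bounded(reader: impl Read, limit: u64) -> Result<Vec<u8>, FileArgError> {
    let mut buf = Vec::new();
    // Request one byte more than allowed so an oversized input is detectable.
    reader
        .take(limit.saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(FileArgError::Io)?;
    if buf.len() as u64 > limit {
        return Err(FileArgError::TooLarge { limit });
    }
    Ok(buf)
}
```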
Collision precedence is explicit. An enabled configured server loses to a generated Apps registration with the same connector name. An explicitly disabled same-name server remains a veto for that registration; a disabled configured `codex_apps` singleton vetoes the whole Apps bundle for compatibility.

Literal-loopback proxy/redirect hardening is generic. The no-proxy rule and literal-loopback redirect restriction live in the shared exec-server HTTP path, so their safety benefit and compatibility impact extend beyond Apps.
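The collision precedence rules reduce to a small decision table. This is a minimal sketch with hypothetical `Configured` and `Resolution` enums. The text only specifies the disabled case for the `codex_apps` singleton, so treating an enabled singleton as no veto is an assumption.

```rust
/// Hypothetical view of a user-configured MCP server entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Configured {
    Absent,
    Enabled,
    Disabled,
}

/// Outcome for one generated Apps registration.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    UseApps,
    Vetoed,
}

/// `singleton` is the configured `codex_apps` entry; `same_name` is the
/// configured entry that shares the generated connector server name.
pub fn resolve_collision(singleton: Configured, same_name: Configured) -> Resolution {
    if singleton == Configured::Disabled {
        return Resolution::Vetoed; // a disabled `codex_apps` vetoes the whole bundle
    }
    match same_name {
        Configured::Disabled => Resolution::Vetoed, // an explicit disable still vetoes
        Configured::Enabled | Configured::Absent => Resolution::UseApps, // enabled config loses
    }
}
```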
Known blockers and open decisions
These are prototype findings, not claims that the branch is landing-ready.
- … `connectors` while the body previously claimed `codex-apps` owned names.

Architecture at a glance
Names and endpoints
- `https://chatgpt.com/backend-api/ps/mcp` … the `/ps/mcp` URL.
- `codex_apps` before this PR …
- `codex_apps_upstream` after this PR …
- `codex_apps` after this PR …
- `codex_apps__<connector>` after this PR … `mcp__codex_apps__<connector>`.

Before and after
```mermaid
flowchart LR
  subgraph BEFORE["Before: one remote endpoint, one reserved local name"]
    direction TB
    B_CLIENT["Codex MCP client"]
    B_NAME["local registration: codex_apps"]
    B_REMOTE["actual remote HTTP MCP: /backend-api/ps/mcp"]
    B_TOOLS["one logical, potentially paginated tool inventory"]
    B_SPECIAL["codex-mcp + core decode connector metadata and synthesize namespaces"]
    B_CLIENT --> B_NAME --> B_REMOTE --> B_TOOLS --> B_SPECIAL
  end
  subgraph AFTER["After: the product adapter publishes ordinary MCP registrations"]
    direction TB
    A_REMOTE["same Hosted Apps upstream: /backend-api/ps/mcp"]
    A_OWNER["codex-apps + Apps extension: product-aware owner"]
    A_HTTP["one loopback listener per immutable generation"]
    A_ROUTES["HTTP routes: codex_apps resources; codex_apps__calendar tools; codex_apps__gmail tools"]
    A_GENERIC["generic catalog → connection manager → core"]
    A_REMOTE <-->|"inventory + forwarding"| A_OWNER
    A_OWNER --> A_HTTP --> A_ROUTES --> A_GENERIC
    A_OWNER -. "direct inventory + auth state" .-> A_APP["app-server"]
  end
```

There is one `127.0.0.1:0` listener per immutable generation, not one listener or process per connector. Each route becomes a separate `EffectiveMcpServer` registration with its own URL path, runtime-only bearer, policy, and trusted metadata.

Ownership boundary
```mermaid
flowchart TB
  subgraph PRODUCT["Product-aware zone"]
    direction LR
    APP["app-server: direct list, auth, and install APIs"]
    EXT["Apps extension: eligibility, lifecycle, policy, presentation, analytics"]
    APPS["codex-apps: inventory, cache, HTTP adapter, files, resources, auth"]
    SUPPORT["plugin + connectors: product inputs"]
    APP -->|"direct service"| EXT -->|"owns"| APPS
    SUPPORT -->|"declarations + policy"| EXT
    SUPPORT -->|"naming helpers"| APPS
  end
  subgraph GENERIC["Product-agnostic zone"]
    direction LR
    API["extension-api: revisioned effective servers"]
    CORE["core: safe-boundary orchestration"]
    MCP["codex-mcp: catalog + client reconciliation"]
    PATH["ordinary MCP tool, resource, and approval path"]
    API --> CORE --> MCP --> PATH
  end
  EXT -->|"server contributions"| API
  MCP -->|"Streamable HTTP calls to loopback routes"| APPS
```

- `mcp-server` only constructs and shuts down an opaque host-extension bundle, then passes the registry to core.
- `exec-server` and `rmcp-client` provide generic loopback safety and bounded HTTP responses.

Exposed MCP topology
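| Server | `tools/list` | `resources/read` |
|---|---|---|
| `codex_apps` | No tools | Proxies global resources and templates |
| `codex_apps__<connector>` | That connector's tools | Only URIs declared by that connector's tools |

Cost and isolation tradeoff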
… `/ps/mcp`. Afterward, inventory uses a shared discovery connection, and every downstream MCP session that is used lazily opens its own upstream session. That increases connection fanout, but it prevents an elicitation in one connector/session from blocking another.

Changed-module behavior map
This table covers every production module group in the diff. Test fixtures, manifests, generated build metadata, and `Cargo.lock` follow the owner they validate or register.

| Module group | Behavior |
|---|---|
| `codex-apps` (new) | `/ps/mcp` protocol adaptation, bounded raw inventory/cache, immutable generations, loopback route authentication, connector/resource proxying, file uploads, auth and standard elicitation bridges, approval presentation, cancellation, and shutdown. … `codex-mcp`. It consumes connector naming helpers that still live in `connectors`; naming is not yet fully consolidated in this crate. |
| `ext/mcp::apps` | … |
| `ext/extension-api` | `Current` versus `Discover` contribution modes, captured contributor revisions, runtime-effective server contributions, thread-data initialization, and composable install verification while reusing existing item hooks. |
| `codex-mcp` | `EffectiveMcpServer`, runtime-only bearer/owner/metadata, catalog precedence, generic elicitation, launch, status/resources, and per-client reconciliation. … `codex_apps` detection, connector parsing, Apps cache/hard refresh, Apps auth parsing, and Apps file/tool normalization. |
| … | … `McpRuntimeSnapshot`, and consumes generic trusted metadata for tool calls, approvals, Guardian, telemetry, resources, and presentation hooks. |
| `connectors` | … `codex_apps` compatibility constant, and connector server/tool/title naming helpers. … `codex-apps` or be documented as a stable product split. |
| `plugin` | `app_config` moves here from `connectors`; the old path is temporarily re-exported for compatibility. |
| `core-plugins` | … |
| `chatgpt::connectors` | … |
| `config` | … `McpToolApproval` while keeping `AppToolApproval` as an alias; retains generic discoverable-type parsing. |
| `protocol` | … `McpToolSource`; supports pinned item presentation and opaque elicitation IDs. … `connector_*` Guardian fields as a compatibility seam. |
| `exec-server` + `rmcp-client` | … |
| `login` | … `auth_with_revision()` so credentials and revision-scoped runtime state cannot be paired across a refresh race. |
| `analytics` | … `McpToolSource` accessors while retaining compatibility fields. |
| `tools` | … |
| `ext/skills` | … `codex_apps` resource server. |
| `mcp-server` | … `McpHostExtensions` bundle and passes only its registry to core. … `codex-apps`/`codex-connectors` dependency, reserved name, or product branch. |
| `utils/string` | … |
| `github` boundary check | … |

Build manifests register the new crate and dependency direction; the approval template asset moves from core to `codex-apps` unchanged.

Why these structural choices
| Choice | Reason |
|---|---|
| `EffectiveMcpServer` registrations | … |
| `Current` and `Discover`, capturing contributor revision before resolving contributions | `Current` projections do not initiate external Apps discovery or refresh, discovery stays non-blocking, and a publication race converges before a later model request. |

Runtime flows
Cold start and discovery
Step-by-step behavior
At a model safe boundary, core samples contributor revisions and resolves contributions when the runtime inputs changed. The Apps extension returns its current publication immediately and starts one background initialization per connection key when no usable generation exists.
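Sampling the revision before resolving is what lets a publication race converge. This is a minimal sketch with hypothetical `Contributor` and `SafeBoundary` types. It models a single contributor with a single revision counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical contributor: bumps its revision whenever it publishes.
#[derive(Default)]
pub struct Contributor {
    revision: AtomicU64,
}

impl Contributor {
    pub fn revision(&self) -> u64 {
        self.revision.load(Ordering::Acquire)
    }

    pub fn publish(&self) {
        self.revision.fetch_add(1, Ordering::AcqRel);
    }
}

/// Hypothetical core-side state kept across model safe boundaries.
#[derive(Default)]
pub struct SafeBoundary {
    resolved_at: Option<u64>,
}

impl SafeBoundary {
    /// Samples the revision *before* resolving. If a publication lands while
    /// `resolve` runs, the stored revision is already stale, so the next
    /// boundary resolves again instead of missing the update.
    pub fn maybe_resolve(&mut self, c: &Contributor, resolve: impl FnOnce()) -> bool {
        let sampled = c.revision();
        if self.resolved_at == Some(sampled) {
            return false; // inputs unchanged: keep the current projection
        }
        resolve();
        self.resolved_at = Some(sampled);
        true
    }
}
```

If the revision were sampled after resolving instead, the newer revision would be recorded even though its contributions were never resolved, and the update would be skipped.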
A failed cold initialization retries once immediately, then becomes eligible again after a capped `1/2/4/8/16/30s` cooldown. Cooldown timers perform no network work, and a failed refresh preserves the last-good generation.

Connector tool call
```mermaid
sequenceDiagram
  participant Core as core generic tool path
  participant MCP as codex-mcp client
  participant App as connector loopback MCP
  participant Env as pinned environment
  participant Backend as ChatGPT backend
  Core->>MCP: call server codex_apps__calendar / tool create_event
  Note over Core,MCP: Generic policy and approval use trusted runtime metadata
  MCP->>App: HTTP MCP call + route-specific bearer
  App->>App: verify route, generation, Origin, and auth guard
  opt Schema declares file parameters
    App->>Env: read with pinned instance + sandbox context
    Env-->>App: bytes or fail closed
    App->>Backend: upload through file API
  end
  App->>App: routed name → stable upstream tool identity
  App->>Backend: /ps/mcp tools/call with sanitized metadata
  Backend-->>App: result or elicitation request
  opt Elicitation required
    App->>MCP: bridge to the initiating downstream session
    MCP-->>App: elicitation response
  end
  App-->>MCP: ordinary result + trusted effective input when rewritten
  MCP-->>Core: ordinary MCP result
  Note over App,Backend: Each downstream session lazily owns one upstream MCP session
```

Step-by-step behavior
The routed name resolves to the generation's stable upstream identity. Trusted runtime metadata supplies approval, telemetry, plugin attribution, and effective-input capabilities; all response and presentation handling then follows generic MCP/tool lifecycle hooks.
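The per-generation raw-name map can be sketched as an immutable lookup from the routed server and tool name to the stable upstream identity. `GenerationRoutes` and `UpstreamTool` are hypothetical. The connector ID and upstream tool name used in the usage are made-up values.

```rust
use std::collections::HashMap;

/// Hypothetical stable upstream identity for one tool.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UpstreamTool {
    pub connector_id: String,
    pub raw_name: String,
}

/// Hypothetical immutable generation: (route server, routed tool) -> upstream identity.
pub struct GenerationRoutes {
    map: HashMap<(String, String), UpstreamTool>,
}

impl GenerationRoutes {
    pub fn new(entries: Vec<(&str, &str, UpstreamTool)>) -> Self {
        let map = entries
            .into_iter()
            .map(|(server, tool, up)| ((server.to_owned(), tool.to_owned()), up))
            .collect();
        Self { map }
    }

    /// Resolves only names this generation published; anything else fails closed.
    pub fn resolve(&self, server: &str, tool: &str) -> Option<&UpstreamTool> {
        self.map.get(&(server.to_owned(), tool.to_owned()))
    }
}
```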
Refresh, auth change, and in-flight work
```mermaid
flowchart TB
  START["Published generation G1: listener P1, bearer B1, auth revision R1"]
  TRIGGER{"What changed?"}
  AUTH_EVENT["Re-evaluate auth and Apps eligibility"]
  BUILD["Build G2 off to the side: new listener P2 and new bearers"]
  REMOVE["If no longer eligible: publish removal contributions"]
  PUBLISH["Atomic Apps publication: G2 becomes available for contribution"]
  RECONCILE["Next safe boundary: catalog adopts the change and reconciles clients"]
  OLD["Pinned G1 remains alive while snapshot/runtime owners retain it"]
  KIND{"Did auth revision change?"}
  NORMAL["No: pinned G1 sessions remain valid until released"]
  AUTH["Yes: reject new G1 requests and recheck before upstream forward"]
  INFLIGHT["A call already forwarded upstream may finish"]
  DROP["Last G1 owner drops: cancel sessions and stop listener"]
  START --> TRIGGER
  TRIGGER -->|"inventory refresh"| BUILD
  TRIGGER -->|"login, logout, or token change"| AUTH_EVENT
  AUTH_EVENT -->|"still eligible"| BUILD
  AUTH_EVENT -->|"ineligible or logged out"| REMOVE --> RECONCILE
  BUILD --> PUBLISH --> RECONCILE
  START -. "old snapshots" .-> OLD --> KIND
  KIND -->|"no"| NORMAL --> DROP
  KIND -->|"yes"| AUTH --> INFLIGHT --> DROP
  AUTH --> DROP
```

Step-by-step behavior
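The auth-revision branch of this flow can be sketched as a generation that remembers the revision it was built under and compares it on admission. This is a minimal sketch with hypothetical `AuthRevision` and `PinnedGeneration` types. The recheck before forwarding upstream is the same `admit` call made a second time. The sketch does not model session cancellation when the last owner drops.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Shared, monotonically increasing auth revision (bumped on login, logout, or token change).
#[derive(Default)]
pub struct AuthRevision(AtomicU64);

impl AuthRevision {
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }

    pub fn bump(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    Forward,
    RejectStaleAuth,
}

/// Hypothetical pinned generation that remembers its build-time auth revision.
pub struct PinnedGeneration {
    built_with: u64,
    auth: Arc<AuthRevision>,
}

impl PinnedGeneration {
    pub fn new(auth: Arc<AuthRevision>) -> Self {
        let built_with = auth.current();
        Self { built_with, auth }
    }

    /// Checked when a request arrives and again just before forwarding upstream.
    /// A request that was already forwarded is allowed to finish.
    pub fn admit(&self) -> Admission {
        if self.auth.current() == self.built_with {
            Admission::Forward
        } else {
            Admission::RejectStaleAuth
        }
    }
}
```

An inventory-only refresh does not bump the auth revision, so under this sketch pinned G1 sessions keep being admitted until they are released.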
Behavior audit
Detailed preserved behavior
Inventory, identity, and lifecycle
- … `codex_apps__<connector>` MCP server names; stable connector IDs disambiguate collisions. Tools route through a per-generation raw-name map.

Auth, approval, and trust
- … `openai/form` requests bridge to the originating downstream session. Unsupported capabilities cancel rather than leak across sessions.

Files, resources, plugins, and presentation
- … `openai/fileParams` become local-path schemas. Reads use the exact pinned environment/sandbox state, validate type and buffered size, upload before forwarding, and preserve array constraints. Cancellation prevents partial argument rewriting/upstream invocation; completed uploads are not rolled back.
- … `codex_apps` server proxies global resources/templates and exposes no tools. Connector routes can read only URIs declared by their tools and cannot enumerate the global set. The skills extension intentionally retains that resource-server compatibility name for orchestrator resources.
- … `include_apps_instructions`, require a policy-enabled tool in the published snapshot, and describe per-connector namespaces.
- … `tools/list` avoids duplicate physical-transport accounting.

Security invariants
- … `127.0.0.1:0`.
- … `Origin`, even with a valid bearer.

Reviewer map
Smallest useful file-reading order
1. `.github/scripts/verify_codex_apps_mcp_boundary.py`: the invariant being enforced.
2. `codex-rs/ext/extension-api/src/contributors/mcp.rs`: the generic contribution contract.
3. `codex-rs/codex-mcp/src/server.rs`, `catalog.rs`, `runtime_metadata.rs`: runtime registration, precedence, and trusted capabilities.
4. `codex-rs/apps/src/lib.rs`, `generation.rs`, `http.rs`, `connector_server.rs`, `resource_server.rs`: immutable HTTP generations and protocol forwarding.
5. `codex-rs/ext/mcp/src/apps/`: eligibility, lifecycle, policy, presentation, analytics, and install verification.
6. `codex-rs/core/src/mcp.rs`, `session/mcp.rs`, `session/mcp_runtime.rs`: generic projection and safe-boundary reconciliation.
7. `codex-rs/app-server/src/request_processors/apps_processor.rs`, `mcp_processor.rs`, `plugins.rs`: the intentional product-aware direct service boundary.
8. `codex-rs/ext/mcp/src/lib.rs` and `codex-rs/mcp-server/src/message_processor.rs`: opaque host composition.

Validation
The published prototype head `1043025e` passed its GitHub checks on June 26, 2026. The branch is now behind `main`, and the local rebase is unfinished with conflicts, so this description does not claim that the rebased tree is green. The test strategy follows the risks rather than the file layout:

- … `codex-apps` and its extension.
- … `Origin` rejection, literal-loopback redirect/proxy rules, response/inventory/upload bounds, private metadata stripping, and cancellation.
- … `mcp-server` Hosted Apps end-to-end test.

Revised proposed landing stack
Each stage should be independently green. A staged adapter may exist behind tests, but no stage should leave two production implementations active.
- … `EffectiveMcpServer`, the smallest justified runtime bearer/owner/metadata surface, deterministic catalog semantics, and non-Apps proofs.
- … `Current`/`Discover`, revisions, publication-race handling, safe-boundary projection, stable semantic identity, and non-Apps unchanged-client tests.
- … `mcp-server` to generic host composition, enable the boundary verifier, and retain the real-host end-to-end proof.

The app-server status shape, standalone connector suggestions, cold-first-turn semantics, and other intentional behavior changes should land separately or be explicitly approved; they should not inflate the structural cutover.