feat(adapters): Codex engine adapter with app-server JSON-RPC protocol#437
Conversation
…tocol (#412) Implement CodexAdapter that speaks the Codex app-server JSON-RPC protocol over stdio, enabling `cf work start <id> --execute --engine codex`. - Full 4-step handshake: initialize → initialized → thread/start → turn/start - Turn streaming with event routing (session_started, notification, tool_call, turn/completed, turn/failed, turn/cancelled) - Auto-approval of tool calls (configurable via approval_policy) - Timeout enforcement: turn, read, and stall timeouts - Token usage tracking from turn_completed events - Registered in engine_registry (VALID_ENGINES + EXTERNAL_ENGINES) - OPENAI_API_KEY validator for CLI pre-flight checks - 23 unit tests covering protocol, handshake, streaming, approval, and full run
- Enforce read timeout via selectors (prevents readline() deadlock) - Handle non-auto approval policies by rejecting tool calls - Extract _detect_modified_files to shared git_utils module - Add test for non-auto approval policy rejection
WalkthroughAdds OpenAI Codex engine support: new CodexAdapter (app-server JSON-RPC over stdio), OpenAI API-key validator, engine_registry wiring for "codex", shared git-file detection utility, CLI integration for Changes
Sequence DiagramsequenceDiagram
participant CLI
participant EngineRegistry
participant CodexAdapter
participant Codex_Process as Codex App-Server
participant Git as Workspace/Git
CLI->>EngineRegistry: request adapter for "codex"
EngineRegistry->>CodexAdapter: return CodexAdapter
CLI->>CodexAdapter: run(task_id, prompt, workspace)
CodexAdapter->>Codex_Process: spawn subprocess (codex_command)
CodexAdapter->>Codex_Process: send initialize
Codex_Process->>CodexAdapter: initialized
CodexAdapter->>Codex_Process: send thread/start
CodexAdapter->>Codex_Process: send turn/start
Codex_Process->>CodexAdapter: session_started / notifications
loop Turn streaming
Codex_Process->>CodexAdapter: notification / progress / tool_call
alt tool_call
CodexAdapter->>Codex_Process: send approval (per policy)
end
CodexAdapter->>CLI: emit AgentEvent(progress)
end
Codex_Process->>CodexAdapter: turn/completed
CodexAdapter->>Git: detect_modified_files(workspace)
CodexAdapter->>CLI: return AgentResult(status, token_usage, modified_files)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Code Review — Codex Engine Adapter (#437)This is a clean, well-structured implementation. The JSON-RPC protocol handling, Issues1. The parameter is accepted and stored as 2. Double-kill in The 3. API key validation is hardcoded rather than registry-driven ( The 4. Test-specific branch in production The comment says "Mock objects (in tests) won't have fileno(), so fall back to direct readline for those." Production code branching on mock-object characteristics is a code smell. Minor Observations
Stall timeout test is testing EOF, not stall ( What's Working Well
The |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
codeframe/cli/app.py (1)
2063-2069: Centralize engine credential preflight.This engine-to-credential branching now exists in two commands, and
work_retry()still has its own Anthropic-only check at Lines 2523-2524. A shared helper would keep Codex auth behavior from drifting across start, batch, and retry paths.Also applies to: 2994-3000
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/cli/app.py` around lines 2063 - 2069, The engine-to-credential branching is duplicated across commands (start, batch and work_retry) which causes inconsistent Codex/Anthropic checks; create a centralized helper (e.g., preflight_engine_credentials(engine)) that imports and uses is_external_engine, require_openai_api_key and require_anthropic_api_key to perform the correct check for "codex" vs non-external engines, then replace the duplicated blocks in the start/batch command handlers and in work_retry() with a call to this helper (also update the occurrences around the other noted region). Ensure the helper is invoked before any engine-specific actions so all paths share the same auth preflight behavior.tests/core/adapters/test_codex.py (1)
223-231: This is exercising EOF, not a stall.
_FakeStdout([])returns""immediately, so this test only covers the EOF branch. It will not catch regressions where a per-read timeout fires beforestall_timeout_ms.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/core/adapters/test_codex.py` around lines 223 - 231, The test currently uses _FakeStdout([]) which returns EOF immediately and only exercises the EOF branch; change the test_stall_timeout to use a fake stdout that simulates a blocking read (i.e., does not raise StopIteration and does not yield data) so adapter._stream_turn actually hits the stall timeout path; update or extend _FakeStdout (or create a new _BlockingStdout) to support a blocking/never-yielding mode and use that in test_stall_timeout with stall_timeout_ms=100 and on_event=None so the assertion that result.status == "failed" and the error mentions "stall" validates the per-read stall behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 48-57: Constructor stores sandbox_mode into self._sandbox_mode but
run() never forwards it, so setting sandbox_mode has no effect; update the code
that builds the command or JSON-RPC payload in the run() method (and any other
place that assembles cmd/params, e.g., where cmd variable or payload dict is
created) to include the sandbox flag from self._sandbox_mode (e.g., add a
"--sandbox" CLI arg when constructing codex command or add "sandbox_mode":
self._sandbox_mode to the JSON payload), and ensure downstream code
serializes/handles a None value correctly so behavior remains unchanged when
sandbox_mode is not provided.
In `@tests/core/adapters/test_codex.py`:
- Around line 1-8: This test module is missing the v2 mark; add a module-level
pytestmark = pytest.mark.v2 so pytest selects it as a v2 test. Edit the top of
the test module to define pytestmark (using the existing imported pytest) and
assign pytest.mark.v2, ensuring the module-level symbol pytestmark is present
before any tests run.
---
Nitpick comments:
In `@codeframe/cli/app.py`:
- Around line 2063-2069: The engine-to-credential branching is duplicated across
commands (start, batch and work_retry) which causes inconsistent Codex/Anthropic
checks; create a centralized helper (e.g., preflight_engine_credentials(engine))
that imports and uses is_external_engine, require_openai_api_key and
require_anthropic_api_key to perform the correct check for "codex" vs
non-external engines, then replace the duplicated blocks in the start/batch
command handlers and in work_retry() with a call to this helper (also update the
occurrences around the other noted region). Ensure the helper is invoked before
any engine-specific actions so all paths share the same auth preflight behavior.
In `@tests/core/adapters/test_codex.py`:
- Around line 223-231: The test currently uses _FakeStdout([]) which returns EOF
immediately and only exercises the EOF branch; change the test_stall_timeout to
use a fake stdout that simulates a blocking read (i.e., does not raise
StopIteration and does not yield data) so adapter._stream_turn actually hits the
stall timeout path; update or extend _FakeStdout (or create a new
_BlockingStdout) to support a blocking/never-yielding mode and use that in
test_stall_timeout with stall_timeout_ms=100 and on_event=None so the assertion
that result.status == "failed" and the error mentions "stall" validates the
per-read stall behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 051eb633-10c0-4409-b93e-3efc6983c2d1
📒 Files selected for processing (8)
codeframe/cli/app.pycodeframe/cli/validators.pycodeframe/core/adapters/__init__.pycodeframe/core/adapters/codex.pycodeframe/core/adapters/git_utils.pycodeframe/core/adapters/subprocess_adapter.pycodeframe/core/engine_registry.pytests/core/adapters/test_codex.py
- Differentiate read timeout vs EOF in _recv_line using _TIMEOUT sentinel (prevents premature failure when stall/turn timeouts should still apply) - Forward sandbox_mode to initialize params (was a no-op) - Add pytestmark = pytest.mark.v2 to test module
Follow-up Review — Codex Adapter (#437)Reviewing the updated commit. Checking status of previous findings and flagging a new issue. Correction to My Previous Review
New Bug:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 231-238: The _handshake function assumes response is a dict but
_recv_line can return the _TIMEOUT sentinel (an object) causing an
AttributeError when calling response.get; update _handshake to explicitly check
for the _TIMEOUT sentinel (or non-dict) immediately after calling _recv_line and
return False on timeout before accessing response.get, referencing _handshake,
_recv_line and the _TIMEOUT symbol to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f71f3d6-e696-4e26-b3e6-f2e494474d83
📒 Files selected for processing (2)
codeframe/core/adapters/codex.pytests/core/adapters/test_codex.py
Prevents AttributeError when _recv_line returns _TIMEOUT during the initialized wait step (CodeRabbit finding).
|
Follow-up Review: Codex Adapter (cdbda79 latest commit) Previously Flagged - Now Resolved:
Still Open:
Summary: The critical runtime bug (_TIMEOUT crash on slow handshake) is fixed. Remaining items are one dead-code issue (double-kill), one correctness gap (work_retry missing Codex preflight - the only item that causes a silent user-visible failure), and one code smell (mock detection). The work_retry gap is worth addressing before merge; the others can follow. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 142-145: The stderr drain thread (stderr_thread) can block on
process.stderr.read() until the subprocess exits, so move the call to
self._kill(process) to occur before stderr_thread.join(timeout=5) so the
subprocess is terminated first and the read unblocks; ensure you still call
stderr_thread.join(timeout=5) afterward to wait for the thread to finish and
preserve the existing finally cleanup behavior that currently references
stderr_thread and self._kill(process).
- Around line 127-148: The adapter currently only uses the optional on_event
callback and omits emitting core audit events; update the Codex adapter to call
the central event pipeline (import and use the emitter from core/events.py) at
handshake start, handshake failure, approval decisions, and terminal turn
states: emit a "handshake_started" before calling self._handshake, emit
"handshake_failed" if not ok or in the except block (include error text), emit
approval/decision events from within self._stream_turn where approvals occur,
and emit a terminal "turn_finished" (with status and duration) just before
returning result; place these emissions alongside the existing on_event uses
(refer to self._handshake, self._stream_turn, and _kill) so all state
transitions flow through core/events.py as well as the on_event callback.
- Around line 268-290: The loop currently uses a fixed per-read timeout
(read_timeout_s) which can exceed remaining stall/turn deadlines; compute the
remaining times using self._stall_timeout_ms and last_event_time and
self._turn_timeout_ms and turn_start, convert to seconds, take the min positive
remaining_deadline = min(remaining_stall, remaining_turn, read_timeout_s), if
remaining_deadline <= 0 return the appropriate AgentResult failure immediately,
and call self._recv_line(stdout, timeout_s=remaining_deadline) so each read is
bounded by the nearest deadline; adjust references to
stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26f9a0bc-1a54-42b9-816b-43460f8e4295
📒 Files selected for processing (1)
codeframe/core/adapters/codex.py
| try: | ||
| ok = self._handshake( | ||
| process.stdin, process.stdout, prompt=prompt, workspace_path=workspace_path | ||
| ) | ||
| if not ok: | ||
| self._kill(process) | ||
| return AgentResult( | ||
| status="failed", | ||
| error="Codex app-server handshake failed (no initialized response)", | ||
| ) | ||
|
|
||
| result = self._stream_turn(process.stdout, on_event=on_event, stdin=process.stdin) | ||
| except Exception as exc: | ||
| self._kill(process) | ||
| return AgentResult(status="failed", error=str(exc)) | ||
| finally: | ||
| stderr_thread.join(timeout=5) | ||
| self._kill(process) | ||
|
|
||
| result.modified_files = self._detect_modified_files(workspace_path) | ||
| result.duration_ms = int((time.monotonic() - start) * 1000) | ||
| return result |
There was a problem hiding this comment.
State transitions are not reaching the core audit event pipeline.
This adapter only uses the optional on_event callback. Handshake start/failure, approval decisions, and terminal turn states never emit via core/events.py, so those transitions disappear from the standard audit/observability path.
As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".
Also applies to: 203-252, 258-361
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/codex.py` around lines 127 - 148, The adapter
currently only uses the optional on_event callback and omits emitting core audit
events; update the Codex adapter to call the central event pipeline (import and
use the emitter from core/events.py) at handshake start, handshake failure,
approval decisions, and terminal turn states: emit a "handshake_started" before
calling self._handshake, emit "handshake_failed" if not ok or in the except
block (include error text), emit approval/decision events from within
self._stream_turn where approvals occur, and emit a terminal "turn_finished"
(with status and duration) just before returning result; place these emissions
alongside the existing on_event uses (refer to self._handshake,
self._stream_turn, and _kill) so all state transitions flow through
core/events.py as well as the on_event callback.
| finally: | ||
| stderr_thread.join(timeout=5) | ||
| self._kill(process) | ||
|
|
There was a problem hiding this comment.
Kill the child before joining the stderr drain thread.
stderr_thread is blocked on process.stderr.read() until the subprocess exits. On the success path the app-server is still alive here, so join(timeout=5) waits the full timeout before _kill() finally unblocks it. That inflates duration_ms and makes completed turns look slow.
🧹 Proposed fix
finally:
- stderr_thread.join(timeout=5)
self._kill(process)
+ stderr_thread.join(timeout=5)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/codex.py` around lines 142 - 145, The stderr drain
thread (stderr_thread) can block on process.stderr.read() until the subprocess
exits, so move the call to self._kill(process) to occur before
stderr_thread.join(timeout=5) so the subprocess is terminated first and the read
unblocks; ensure you still call stderr_thread.join(timeout=5) afterward to wait
for the thread to finish and preserve the existing finally cleanup behavior that
currently references stderr_thread and self._kill(process).
| stall_timeout_s = self._stall_timeout_ms / 1000 | ||
| turn_timeout_s = self._turn_timeout_ms / 1000 | ||
| read_timeout_s = self._read_timeout_ms / 1000 | ||
|
|
||
| while True: | ||
| # Check stall timeout | ||
| if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s: | ||
| return AgentResult( | ||
| status="failed", | ||
| error=f"Stall timeout: no events for {self._stall_timeout_ms}ms", | ||
| ) | ||
|
|
||
| # Check turn timeout | ||
| if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s: | ||
| return AgentResult( | ||
| status="failed", | ||
| error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms", | ||
| ) | ||
|
|
||
| msg = self._recv_line(stdout, timeout_s=read_timeout_s) | ||
| if msg is _TIMEOUT: | ||
| # Read timed out — loop back to check stall/turn timeouts | ||
| continue |
There was a problem hiding this comment.
read_timeout_ms can mask shorter stall/turn deadlines.
This loop only re-checks stall and turn timeouts after _recv_line() returns. If callers configure stall_timeout_ms or turn_timeout_ms below read_timeout_ms, the selector can sleep past the real deadline and fail much later than requested. Bound each read by the smallest remaining deadline instead of the fixed per-read timeout.
⏱️ Proposed fix
while True:
# Check stall timeout
if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s:
return AgentResult(
status="failed",
@@
if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s:
return AgentResult(
status="failed",
error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms",
)
- msg = self._recv_line(stdout, timeout_s=read_timeout_s)
+ stall_remaining = (
+ float("inf")
+ if stall_timeout_s <= 0
+ else max(0.0, stall_timeout_s - (time.monotonic() - last_event_time))
+ )
+ turn_remaining = (
+ float("inf")
+ if turn_timeout_s <= 0
+ else max(0.0, turn_timeout_s - (time.monotonic() - turn_start))
+ )
+ msg = self._recv_line(
+ stdout,
+ timeout_s=min(read_timeout_s, stall_remaining, turn_remaining),
+ )
if msg is _TIMEOUT:
# Read timed out — loop back to check stall/turn timeouts
continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/codex.py` around lines 268 - 290, The loop currently
uses a fixed per-read timeout (read_timeout_s) which can exceed remaining
stall/turn deadlines; compute the remaining times using self._stall_timeout_ms
and last_event_time and self._turn_timeout_ms and turn_start, convert to
seconds, take the min positive remaining_deadline = min(remaining_stall,
remaining_turn, read_timeout_s), if remaining_deadline <= 0 return the
appropriate AgentResult failure immediately, and call self._recv_line(stdout,
timeout_s=remaining_deadline) so each read is bounded by the nearest deadline;
adjust references to
stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms
accordingly.
Summary
Implements #412: Codex/OpenAI Engine Adapter (App-Server Protocol)
AgentAdapterprotocol with full JSON-RPC over stdioinitialize→initialized→thread/start→turn/startselectors(turn, read, stall — no blocking readline deadlocks)VALID_ENGINES+EXTERNAL_ENGINES)OPENAI_API_KEYvalidator for CLI pre-flight checksgit_utils.detect_modified_files()extracted from duplicated codeAcceptance Criteria
CodexAdapterimplementsAgentAdapterprotocolcf work start <id> --execute --engine codexTest Plan
Implementation Notes
codeframe/core/adapters/codex.py(notcore/engines/as original plan suggested — adapters package is the established location)SubprocessAdapter, Codex uses bidirectional JSON-RPC requiring a custom implementation_recv_lineuses OS-level I/O multiplexing to enforce read timeouts, preventing the classicreadline()deadlock_detect_modified_filesintocodeframe/core/adapters/git_utils.pyto eliminate code duplication betweenSubprocessAdapterandCodexAdapterCloses #412
Summary by CodeRabbit
New Features
Other
Tests