feat(adapters): Codex engine adapter with app-server JSON-RPC protocol by frankbria · Pull Request #437 · frankbria/codeframe

frankbria · 2026-03-13T16:35:52Z

Summary

Implements #412: Codex/OpenAI Engine Adapter (App-Server Protocol)

CodexAdapter implementing AgentAdapter protocol with full JSON-RPC over stdio
4-step handshake: initialize → initialized → thread/start → turn/start
Turn streaming with event routing (session_started, notification, tool_call, turn/completed, turn/failed, turn/cancelled)
Auto-approval of tool calls (configurable; non-auto policies reject with notification)
Timeout enforcement via selectors (turn, read, stall — no blocking readline deadlocks)
Token usage tracking from turn_completed events
Registered in engine registry (VALID_ENGINES + EXTERNAL_ENGINES)
OPENAI_API_KEY validator for CLI pre-flight checks
Shared git_utils.detect_modified_files() extracted from duplicated code

Acceptance Criteria

CodexAdapter implements AgentAdapter protocol
Full app-server handshake (initialize → thread/start → turn/start)
Turn streaming with proper line-delimited JSON parsing
Auto-approval of command/file change requests (configurable)
Timeout enforcement (turn, read, stall)
Token usage tracked from Codex events
Works with cf work start <id> --execute --engine codex
Integration test with mock app-server

Test Plan

24 unit tests covering protocol conformance, handshake, streaming, approval policies, timeouts, and full run
All 1681 core tests passing (0 regressions)
Ruff linting clean
Code review passed (addressed all critical/important findings)

Implementation Notes

Location: codeframe/core/adapters/codex.py (not core/engines/ as original plan suggested — adapters package is the established location)
Direct AgentAdapter implementation: Unlike Claude Code/OpenCode which extend SubprocessAdapter, Codex uses bidirectional JSON-RPC requiring a custom implementation
selectors-based timeout: _recv_line uses OS-level I/O multiplexing to enforce read timeouts, preventing the classic readline() deadlock
Shared git_utils: Extracted _detect_modified_files into codeframe/core/adapters/git_utils.py to eliminate code duplication between SubprocessAdapter and CodexAdapter

Closes #412

Summary by CodeRabbit

New Features
- Added Codex engine as an OpenAI-powered execution option.
- Codex available in interactive runs, batch runs, and work sessions.
- CLI now validates OpenAI API key when using Codex.
Other
- Improved detection of workspace file changes after runs.
Tests
- Added comprehensive tests for Codex integration and protocol behavior.

…tocol (#412) Implement CodexAdapter that speaks the Codex app-server JSON-RPC protocol over stdio, enabling `cf work start <id> --execute --engine codex`. - Full 4-step handshake: initialize → initialized → thread/start → turn/start - Turn streaming with event routing (session_started, notification, tool_call, turn/completed, turn/failed, turn/cancelled) - Auto-approval of tool calls (configurable via approval_policy) - Timeout enforcement: turn, read, and stall timeouts - Token usage tracking from turn_completed events - Registered in engine_registry (VALID_ENGINES + EXTERNAL_ENGINES) - OPENAI_API_KEY validator for CLI pre-flight checks - 23 unit tests covering protocol, handshake, streaming, approval, and full run

- Enforce read timeout via selectors (prevents readline() deadlock) - Handle non-auto approval policies by rejecting tool calls - Extract _detect_modified_files to shared git_utils module - Add test for non-auto approval policy rejection

coderabbitai · 2026-03-13T16:36:11Z

Walkthrough

Adds OpenAI Codex engine support: new CodexAdapter (app-server JSON-RPC over stdio), OpenAI API-key validator, engine_registry wiring for "codex", shared git-file detection utility, CLI integration for --engine codex, and comprehensive adapter tests.

Changes

Cohort / File(s)	Summary
CLI & Validators `codeframe/cli/app.py`, `codeframe/cli/validators.py`	Add `require_openai_api_key()` and invoke it when `--engine codex` across task, batch, and work-start flows; extend engine help text to list `codex`.
Codex External Adapter `codeframe/core/adapters/codex.py`	New `CodexAdapter` implementing app-server JSON-RPC over stdio: initialize/thread/turn handshake, streaming turn events, approval policies, token-usage extraction, subprocess lifecycle, timeouts, and `run()` → `AgentResult`.
Adapter Exports & Engine Registry `codeframe/core/adapters/__init__.py`, `codeframe/core/engine_registry.py`	Export `CodexAdapter`; add `"codex"` to `VALID_ENGINES` and `EXTERNAL_ENGINES`; resolve `"codex"` to `CodexAdapter`.
Git utils & Subprocess adapter `codeframe/core/adapters/git_utils.py`, `codeframe/core/adapters/subprocess_adapter.py`	Introduce `detect_modified_files(workspace_path)` and replace SubprocessAdapter's inline git-diff logic with a call to the new utility.
Tests `tests/core/adapters/test_codex.py`	Add extensive tests covering JSON-RPC framing, handshake, streaming, approval policies, timeouts, token usage extraction, modified-file detection, and full `run()` integration with a mocked subprocess.

Sequence Diagram

sequenceDiagram
    participant CLI
    participant EngineRegistry
    participant CodexAdapter
    participant Codex_Process as Codex App-Server
    participant Git as Workspace/Git

    CLI->>EngineRegistry: request adapter for "codex"
    EngineRegistry->>CodexAdapter: return CodexAdapter
    CLI->>CodexAdapter: run(task_id, prompt, workspace)
    CodexAdapter->>Codex_Process: spawn subprocess (codex_command)
    CodexAdapter->>Codex_Process: send initialize
    Codex_Process->>CodexAdapter: initialized
    CodexAdapter->>Codex_Process: send thread/start
    CodexAdapter->>Codex_Process: send turn/start
    Codex_Process->>CodexAdapter: session_started / notifications
    loop Turn streaming
        Codex_Process->>CodexAdapter: notification / progress / tool_call
        alt tool_call
            CodexAdapter->>Codex_Process: send approval (per policy)
        end
        CodexAdapter->>CLI: emit AgentEvent(progress)
    end
    Codex_Process->>CodexAdapter: turn/completed
    CodexAdapter->>Git: detect_modified_files(workspace)
    CodexAdapter->>CLI: return AgentResult(status, token_usage, modified_files)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(adapters): detect modified files via git diff after subprocess execution #436: Adds/refactors modified-files detection used by subprocess adapters — directly related to git_utils.detect_modified_files extraction.
feat(core): engine registry and runtime selection (#414) #433: Engine registry and runtime selection changes — related to adding codex to VALID_ENGINES/EXTERNAL_ENGINES and adapter resolution.
fix: fail loudly when ANTHROPIC_API_KEY is missing (#376) #383: CLI API-key validator additions — related by adding OpenAI API-key validation parallel to existing anthropic validator.

Poem

🐰✨ I hopped in with a JSON-RPC song,
Initialize, thread, and turn—dance along,
I watched approvals and tokens flow,
Drummed the subprocess, then checked files below,
A codex hop — CodeFrame leaps long!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding a CodexAdapter implementing the app-server JSON-RPC protocol as a new engine adapter.
Linked Issues check	✅ Passed	The PR implements all acceptance criteria from issue `#412`: CodexAdapter with protocol handshake, JSON-RPC streaming, approval policies, timeouts, token tracking, CLI integration, and test coverage.
Out of Scope Changes check	✅ Passed	All changes are within scope: CodexAdapter implementation, supporting validators/registry updates, git utility extraction, and comprehensive tests directly support the Codex adapter objective.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/issue-412-codex-engine-adapter

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-03-13T16:46:41Z

Code Review — Codex Engine Adapter (#437)

This is a clean, well-structured implementation. The JSON-RPC protocol handling, selectors-based timeout enforcement, and test coverage are all solid. A few items worth addressing before merge:

Issues

1. sandbox_mode is accepted but never used (codex.py, __init__)

The parameter is accepted and stored as self._sandbox_mode but never passed to the codex app-server subprocess command. Either forward it as a CLI flag (e.g. --sandbox-mode) or remove it until the feature is ready. Accepting unused configuration is confusing.

2. Double-kill in run() (codex.py)

The except block calls self._kill(process) and the finally block also calls self._kill(process) unconditionally. Since finally always runs after except, the call inside except is redundant. Either remove it from except or restructure. Current code is harmless but slightly misleading.

3. API key validation is hardcoded rather than registry-driven (app.py)

The if engine == "codex": require_openai_api_key() pattern appears in two places (work_start and batch_run). As more non-Anthropic engines are added this will grow into a long if/elif chain. CodexAdapter.requirements() already returns {"OPENAI_API_KEY": ...} — consider a registry helper like get_required_env_vars(engine) that app.py can call uniformly instead.

4. Test-specific branch in production _recv_line (codex.py)

The comment says "Mock objects (in tests) won't have fileno(), so fall back to direct readline for those." Production code branching on mock-object characteristics is a code smell. _FakeStdout in the test file could raise io.UnsupportedOperation from fileno() (which is what real non-file streams like StringIO do), and the production code can catch that exception rather than duck-typing for the mock. This would test the real fallback path rather than a mock-detection shortcut.

Minor Observations

require_openai_api_key duplicates require_anthropic_api_key (validators.py): The two functions are structurally identical. A private helper _require_api_key(var_name, friendly_name) would eliminate ~35 lines of duplication.

Stall timeout test is testing EOF, not stall (test_codex.py, test_stall_timeout): The test sets stall_timeout_ms=100 but passes an empty _FakeStdout, so the EOF branch fires immediately without waiting 100ms. The test name implies time-based stall behaviour but it is actually testing EOF. Consider renaming to test_eof_returns_stall_error or adding a companion test that patches time.monotonic to exercise the real stall path.

What's Working Well

selectors-based timeout in _recv_line correctly avoids readline() deadlocks — good use of OS-level I/O multiplexing
Extracting detect_modified_files to git_utils.py eliminates real duplication cleanly
The 4-step handshake accepts both method="initialized" and a result response — correct for JSON-RPC 2.0 which allows both notification and response forms
24 tests with good protocol coverage and an end-to-end TestCodexFullRun
Engine registry registration is clean and consistent with existing adapters

The sandbox_mode item (1) is the one that should be resolved before merge — the rest are improvement opportunities.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

codeframe/cli/app.py (1)

2063-2069: Centralize engine credential preflight.

This engine-to-credential branching now exists in two commands, and work_retry() still has its own Anthropic-only check at Lines 2523-2524. A shared helper would keep Codex auth behavior from drifting across start, batch, and retry paths.

Also applies to: 2994-3000

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 2063 - 2069, The engine-to-credential
branching is duplicated across commands (start, batch and work_retry) which
causes inconsistent Codex/Anthropic checks; create a centralized helper (e.g.,
preflight_engine_credentials(engine)) that imports and uses is_external_engine,
require_openai_api_key and require_anthropic_api_key to perform the correct
check for "codex" vs non-external engines, then replace the duplicated blocks in
the start/batch command handlers and in work_retry() with a call to this helper
(also update the occurrences around the other noted region). Ensure the helper
is invoked before any engine-specific actions so all paths share the same auth
preflight behavior.

tests/core/adapters/test_codex.py (1)

223-231: This is exercising EOF, not a stall.

_FakeStdout([]) returns "" immediately, so this test only covers the EOF branch. It will not catch regressions where a per-read timeout fires before stall_timeout_ms.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/core/adapters/test_codex.py` around lines 223 - 231, The test currently
uses _FakeStdout([]) which returns EOF immediately and only exercises the EOF
branch; change the test_stall_timeout to use a fake stdout that simulates a
blocking read (i.e., does not raise StopIteration and does not yield data) so
adapter._stream_turn actually hits the stall timeout path; update or extend
_FakeStdout (or create a new _BlockingStdout) to support a
blocking/never-yielding mode and use that in test_stall_timeout with
stall_timeout_ms=100 and on_event=None so the assertion that result.status ==
"failed" and the error mentions "stall" validates the per-read stall behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 48-57: Constructor stores sandbox_mode into self._sandbox_mode but
run() never forwards it, so setting sandbox_mode has no effect; update the code
that builds the command or JSON-RPC payload in the run() method (and any other
place that assembles cmd/params, e.g., where cmd variable or payload dict is
created) to include the sandbox flag from self._sandbox_mode (e.g., add a
"--sandbox" CLI arg when constructing codex command or add "sandbox_mode":
self._sandbox_mode to the JSON payload), and ensure downstream code
serializes/handles a None value correctly so behavior remains unchanged when
sandbox_mode is not provided.

In `@tests/core/adapters/test_codex.py`:
- Around line 1-8: This test module is missing the v2 mark; add a module-level
pytestmark = pytest.mark.v2 so pytest selects it as a v2 test. Edit the top of
the test module to define pytestmark (using the existing imported pytest) and
assign pytest.mark.v2, ensuring the module-level symbol pytestmark is present
before any tests run.

---

Nitpick comments:
In `@codeframe/cli/app.py`:
- Around line 2063-2069: The engine-to-credential branching is duplicated across
commands (start, batch and work_retry) which causes inconsistent Codex/Anthropic
checks; create a centralized helper (e.g., preflight_engine_credentials(engine))
that imports and uses is_external_engine, require_openai_api_key and
require_anthropic_api_key to perform the correct check for "codex" vs
non-external engines, then replace the duplicated blocks in the start/batch
command handlers and in work_retry() with a call to this helper (also update the
occurrences around the other noted region). Ensure the helper is invoked before
any engine-specific actions so all paths share the same auth preflight behavior.

In `@tests/core/adapters/test_codex.py`:
- Around line 223-231: The test currently uses _FakeStdout([]) which returns EOF
immediately and only exercises the EOF branch; change the test_stall_timeout to
use a fake stdout that simulates a blocking read (i.e., does not raise
StopIteration and does not yield data) so adapter._stream_turn actually hits the
stall timeout path; update or extend _FakeStdout (or create a new
_BlockingStdout) to support a blocking/never-yielding mode and use that in
test_stall_timeout with stall_timeout_ms=100 and on_event=None so the assertion
that result.status == "failed" and the error mentions "stall" validates the
per-read stall behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 051eb633-10c0-4409-b93e-3efc6983c2d1

📥 Commits

Reviewing files that changed from the base of the PR and between 81fe3fc and b2b47e5.

📒 Files selected for processing (8)

codeframe/cli/app.py
codeframe/cli/validators.py
codeframe/core/adapters/__init__.py
codeframe/core/adapters/codex.py
codeframe/core/adapters/git_utils.py
codeframe/core/adapters/subprocess_adapter.py
codeframe/core/engine_registry.py
tests/core/adapters/test_codex.py

- Differentiate read timeout vs EOF in _recv_line using _TIMEOUT sentinel (prevents premature failure when stall/turn timeouts should still apply) - Forward sandbox_mode to initialize params (was a no-op) - Add pytestmark = pytest.mark.v2 to test module

claude · 2026-03-13T16:53:23Z

Follow-up Review — Codex Adapter (#437)

Reviewing the updated commit. Checking status of previous findings and flagging a new issue.

Correction to My Previous Review

sandbox_mode IS forwarded — I was wrong. The _handshake method already passes it in the JSON-RPC initialize params via init_params["sandbox_mode"] = self._sandbox_mode. Disregard prior issue #1.

New Bug: `_handshake` doesn't handle `_TIMEOUT` sentinel

_recv_line returns one of three values: a parsed dict, None (EOF), or the _TIMEOUT sentinel. _stream_turn handles all three correctly. But _handshake only guards for None:

response = self._recv_line(stdout, timeout_s=timeout_s)
if response is None:
    return False
method = response.get("method", "")  # AttributeError if response is _TIMEOUT

If the subprocess is slow to respond and the read times out, _recv_line returns the _TIMEOUT sentinel and the next line raises AttributeError: 'object' object has no attribute 'get'. The fix is:

if response is None or response is _TIMEOUT:
    return False

This is masked in tests because _FakeStdout has no fileno, so the selector/_TIMEOUT path is never exercised in handshake tests.

Still Open from Previous Review

Double-kill in run(): The except block calls self._kill(process) and the finally block also calls it unconditionally. Harmless due to idempotency but still redundant.

API key validation hardcoded: The if engine == "codex": require_openai_api_key() pattern appears in work_start and batch_run. CodeRabbit also noted work_retry() has an Anthropic-only check. CodexAdapter.requirements() already returns {"OPENAI_API_KEY": ...} — a shared preflight_engine_credentials(engine) helper would prevent drift as more engines are added.

Mock detection in _recv_line: The hasattr(stdout, "fileno") check is still present. Minor code smell — the selector fallback should be triggered by capability (catching io.UnsupportedOperation), not mock-object detection.

What's Working

Stall test renamed to test_eof_detected_as_failure — addresses the naming concern I raised.
pytestmark = pytest.mark.v2 is present — CodeRabbit's flag resolved.
_stream_turn correctly handles all three _recv_line return values; the asymmetry with _handshake is the isolated gap.

The _TIMEOUT handling in _handshake should be addressed before merge — it's a latent crash on slow Codex startup. The other items are clean-up opportunities.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 231-238: The _handshake function assumes response is a dict but
_recv_line can return the _TIMEOUT sentinel (an object) causing an
AttributeError when calling response.get; update _handshake to explicitly check
for the _TIMEOUT sentinel (or non-dict) immediately after calling _recv_line and
return False on timeout before accessing response.get, referencing _handshake,
_recv_line and the _TIMEOUT symbol to locate the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f71f3d6-e696-4e26-b3e6-f2e494474d83

📥 Commits

Reviewing files that changed from the base of the PR and between b2b47e5 and 95b8024.

📒 Files selected for processing (2)

codeframe/core/adapters/codex.py
tests/core/adapters/test_codex.py

Prevents AttributeError when _recv_line returns _TIMEOUT during the initialized wait step (CodeRabbit finding).

claude · 2026-03-13T17:00:28Z

Follow-up Review: Codex Adapter (cdbda79 latest commit)

Previously Flagged - Now Resolved:

_TIMEOUT bug in _handshake: Fixed. The diff shows the sentinel check before response.get() is now in place.
Stall test renamed to test_eof_detected_as_failure: still present and correct.
pytestmark = pytest.mark.v2: present in test file, resolved.
sandbox_mode forwarded in handshake: confirmed correct.

Still Open:

Double-kill in run() (codex.py): The except block calls self._kill(process) and the finally block also calls it unconditionally. Since finally always executes after except, the call inside except is dead code on the error path. Harmless due to _kill idempotency, but obscures intent.
API key validation hardcoded in CLI (app.py lines 2063-2069 and 2994-3000): The Codex preflight block is duplicated in work_start and batch_run. CodeRabbit also flagged that work_retry at lines 2523-2524 still has an Anthropic-only check with no Codex equivalent - meaning cf work resume --engine codex bypasses the preflight entirely. CodexAdapter.requirements() already returns the needed metadata. A preflight_engine_credentials(engine) helper in validators.py would fix both the duplication and the work_retry gap in one change.
Mock detection in _recv_line (codex.py): hasattr(stdout, fileno) is still the selector-fallback trigger. The correct signal for non-file streams is io.UnsupportedOperation raised by fileno() - which is what StringIO and similar types raise. Catching that exception instead of duck-typing for attribute absence would exercise the real fallback path in both production and tests, removing the implicit mock-coupling.

Summary: The critical runtime bug (_TIMEOUT crash on slow handshake) is fixed. Remaining items are one dead-code issue (double-kill), one correctness gap (work_retry missing Codex preflight - the only item that causes a silent user-visible failure), and one code smell (mock detection). The work_retry gap is worth addressing before merge; the others can follow.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 142-145: The stderr drain thread (stderr_thread) can block on
process.stderr.read() until the subprocess exits, so move the call to
self._kill(process) to occur before stderr_thread.join(timeout=5) so the
subprocess is terminated first and the read unblocks; ensure you still call
stderr_thread.join(timeout=5) afterward to wait for the thread to finish and
preserve the existing finally cleanup behavior that currently references
stderr_thread and self._kill(process).
- Around line 127-148: The adapter currently only uses the optional on_event
callback and omits emitting core audit events; update the Codex adapter to call
the central event pipeline (import and use the emitter from core/events.py) at
handshake start, handshake failure, approval decisions, and terminal turn
states: emit a "handshake_started" before calling self._handshake, emit
"handshake_failed" if not ok or in the except block (include error text), emit
approval/decision events from within self._stream_turn where approvals occur,
and emit a terminal "turn_finished" (with status and duration) just before
returning result; place these emissions alongside the existing on_event uses
(refer to self._handshake, self._stream_turn, and _kill) so all state
transitions flow through core/events.py as well as the on_event callback.
- Around line 268-290: The loop currently uses a fixed per-read timeout
(read_timeout_s) which can exceed remaining stall/turn deadlines; compute the
remaining times using self._stall_timeout_ms and last_event_time and
self._turn_timeout_ms and turn_start, convert to seconds, take the min positive
remaining_deadline = min(remaining_stall, remaining_turn, read_timeout_s), if
remaining_deadline <= 0 return the appropriate AgentResult failure immediately,
and call self._recv_line(stdout, timeout_s=remaining_deadline) so each read is
bounded by the nearest deadline; adjust references to
stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms
accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26f9a0bc-1a54-42b9-816b-43460f8e4295

📥 Commits

Reviewing files that changed from the base of the PR and between 95b8024 and cdbda79.

📒 Files selected for processing (1)

codeframe/core/adapters/codex.py

coderabbitai · 2026-03-13T17:04:23Z

+        try:
+            ok = self._handshake(
+                process.stdin, process.stdout, prompt=prompt, workspace_path=workspace_path
+            )
+            if not ok:
+                self._kill(process)
+                return AgentResult(
+                    status="failed",
+                    error="Codex app-server handshake failed (no initialized response)",
+                )
+
+            result = self._stream_turn(process.stdout, on_event=on_event, stdin=process.stdin)
+        except Exception as exc:
+            self._kill(process)
+            return AgentResult(status="failed", error=str(exc))
+        finally:
+            stderr_thread.join(timeout=5)
+            self._kill(process)
+
+        result.modified_files = self._detect_modified_files(workspace_path)
+        result.duration_ms = int((time.monotonic() - start) * 1000)
+        return result


⚠️ Potential issue | 🟠 Major

State transitions are not reaching the core audit event pipeline.

This adapter only uses the optional on_event callback. Handshake start/failure, approval decisions, and terminal turn states never emit via core/events.py, so those transitions disappear from the standard audit/observability path.

As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".

Also applies to: 203-252, 258-361

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@codeframe/core/adapters/codex.py` around lines 127 - 148, The adapter currently only uses the optional on_event callback and omits emitting core audit events; update the Codex adapter to call the central event pipeline (import and use the emitter from core/events.py) at handshake start, handshake failure, approval decisions, and terminal turn states: emit a "handshake_started" before calling self._handshake, emit "handshake_failed" if not ok or in the except block (include error text), emit approval/decision events from within self._stream_turn where approvals occur, and emit a terminal "turn_finished" (with status and duration) just before returning result; place these emissions alongside the existing on_event uses (refer to self._handshake, self._stream_turn, and _kill) so all state transitions flow through core/events.py as well as the on_event callback.

coderabbitai · 2026-03-13T17:04:23Z

+        finally:
+            stderr_thread.join(timeout=5)
+            self._kill(process)
+


⚠️ Potential issue | 🟠 Major

Kill the child before joining the stderr drain thread.

stderr_thread is blocked on process.stderr.read() until the subprocess exits. On the success path the app-server is still alive here, so join(timeout=5) waits the full timeout before _kill() finally unblocks it. That inflates duration_ms and makes completed turns look slow.

🧹 Proposed fix

finally: - stderr_thread.join(timeout=5) self._kill(process) + stderr_thread.join(timeout=5)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@codeframe/core/adapters/codex.py` around lines 142 - 145, The stderr drain thread (stderr_thread) can block on process.stderr.read() until the subprocess exits, so move the call to self._kill(process) to occur before stderr_thread.join(timeout=5) so the subprocess is terminated first and the read unblocks; ensure you still call stderr_thread.join(timeout=5) afterward to wait for the thread to finish and preserve the existing finally cleanup behavior that currently references stderr_thread and self._kill(process).

coderabbitai · 2026-03-13T17:04:23Z

+        stall_timeout_s = self._stall_timeout_ms / 1000
+        turn_timeout_s = self._turn_timeout_ms / 1000
+        read_timeout_s = self._read_timeout_ms / 1000
+
+        while True:
+            # Check stall timeout
+            if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s:
+                return AgentResult(
+                    status="failed",
+                    error=f"Stall timeout: no events for {self._stall_timeout_ms}ms",
+                )
+
+            # Check turn timeout
+            if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s:
+                return AgentResult(
+                    status="failed",
+                    error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms",
+                )
+
+            msg = self._recv_line(stdout, timeout_s=read_timeout_s)
+            if msg is _TIMEOUT:
+                # Read timed out — loop back to check stall/turn timeouts
+                continue


⚠️ Potential issue | 🔴 Critical

read_timeout_ms can mask shorter stall/turn deadlines.

This loop only re-checks stall and turn timeouts after _recv_line() returns. If callers configure stall_timeout_ms or turn_timeout_ms below read_timeout_ms, the selector can sleep past the real deadline and fail much later than requested. Bound each read by the smallest remaining deadline instead of the fixed per-read timeout.

⏱️ Proposed fix

while True: # Check stall timeout if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s: return AgentResult( status="failed", @@ if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s: return AgentResult( status="failed", error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms", ) - msg = self._recv_line(stdout, timeout_s=read_timeout_s) + stall_remaining = ( + float("inf") + if stall_timeout_s <= 0 + else max(0.0, stall_timeout_s - (time.monotonic() - last_event_time)) + ) + turn_remaining = ( + float("inf") + if turn_timeout_s <= 0 + else max(0.0, turn_timeout_s - (time.monotonic() - turn_start)) + ) + msg = self._recv_line( + stdout, + timeout_s=min(read_timeout_s, stall_remaining, turn_remaining), + ) if msg is _TIMEOUT: # Read timed out — loop back to check stall/turn timeouts continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@codeframe/core/adapters/codex.py` around lines 268 - 290, The loop currently uses a fixed per-read timeout (read_timeout_s) which can exceed remaining stall/turn deadlines; compute the remaining times using self._stall_timeout_ms and last_event_time and self._turn_timeout_ms and turn_start, convert to seconds, take the min positive remaining_deadline = min(remaining_stall, remaining_turn, read_timeout_s), if remaining_deadline <= 0 return the appropriate AgentResult failure immediately, and call self._recv_line(stdout, timeout_s=remaining_deadline) so each read is bounded by the nearest deadline; adjust references to stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms accordingly.

Test User added 2 commits March 13, 2026 09:28

coderabbitai Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread codeframe/core/adapters/codex.py

Comment thread codeframe/core/adapters/codex.py Outdated

Comment thread tests/core/adapters/test_codex.py Outdated

coderabbitai Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread codeframe/core/adapters/codex.py

fix(adapters): handle _TIMEOUT sentinel in handshake response check

cdbda79

Prevents AttributeError when _recv_line returns _TIMEOUT during the initialized wait step (CodeRabbit finding).

coderabbitai Bot reviewed Mar 13, 2026

View reviewed changes

frankbria merged commit ad9ea52 into main Mar 13, 2026
17 checks passed

coderabbitai Bot mentioned this pull request Mar 19, 2026

feat(engines): add Kilocode engine adapter (#417) #451

Merged

4 tasks

frankbria deleted the feature/issue-412-codex-engine-adapter branch March 24, 2026 23:28

This was referenced Mar 26, 2026

Backend: Streaming chat adapter for Anthropic SDK (token-by-token relay) #503

Closed

feat(core): StreamingChatAdapter for token-by-token Anthropic relay (#503) #512

Merged

coderabbitai Bot mentioned this pull request Apr 3, 2026

[Phase 4.B] E2BAgentAdapter: cloud execution in isolated Linux VM #534

Closed

8 tasks

Conversation

frankbria commented Mar 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Acceptance Criteria

Test Plan

Implementation Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

claude Bot commented Mar 13, 2026

Code Review — Codex Engine Adapter (#437)

Issues

Minor Observations

What's Working Well

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Mar 13, 2026

Follow-up Review — Codex Adapter (#437)

Correction to My Previous Review

New Bug: _handshake doesn't handle _TIMEOUT sentinel

Still Open from Previous Review

What's Working

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented Mar 13, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

frankbria commented Mar 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 13, 2026 •

edited

Loading

New Bug: `_handshake` doesn't handle `_TIMEOUT` sentinel