Skip to content

feat(adapters): Codex engine adapter with app-server JSON-RPC protocol#437

Merged
frankbria merged 4 commits into
mainfrom
feature/issue-412-codex-engine-adapter
Mar 13, 2026
Merged

feat(adapters): Codex engine adapter with app-server JSON-RPC protocol#437
frankbria merged 4 commits into
mainfrom
feature/issue-412-codex-engine-adapter

Conversation

@frankbria

@frankbria frankbria commented Mar 13, 2026

Copy link
Copy Markdown
Owner

Summary

Implements #412: Codex/OpenAI Engine Adapter (App-Server Protocol)

  • CodexAdapter implementing AgentAdapter protocol with full JSON-RPC over stdio
  • 4-step handshake: initializeinitializedthread/startturn/start
  • Turn streaming with event routing (session_started, notification, tool_call, turn/completed, turn/failed, turn/cancelled)
  • Auto-approval of tool calls (configurable; non-auto policies reject with notification)
  • Timeout enforcement via selectors (turn, read, stall — no blocking readline deadlocks)
  • Token usage tracking from turn_completed events
  • Registered in engine registry (VALID_ENGINES + EXTERNAL_ENGINES)
  • OPENAI_API_KEY validator for CLI pre-flight checks
  • Shared git_utils.detect_modified_files() extracted from duplicated code

Acceptance Criteria

  • CodexAdapter implements AgentAdapter protocol
  • Full app-server handshake (initialize → thread/start → turn/start)
  • Turn streaming with proper line-delimited JSON parsing
  • Auto-approval of command/file change requests (configurable)
  • Timeout enforcement (turn, read, stall)
  • Token usage tracked from Codex events
  • Works with cf work start <id> --execute --engine codex
  • Integration test with mock app-server

Test Plan

  • 24 unit tests covering protocol conformance, handshake, streaming, approval policies, timeouts, and full run
  • All 1681 core tests passing (0 regressions)
  • Ruff linting clean
  • Code review passed (addressed all critical/important findings)

Implementation Notes

  • Location: codeframe/core/adapters/codex.py (not core/engines/ as original plan suggested — adapters package is the established location)
  • Direct AgentAdapter implementation: Unlike Claude Code/OpenCode which extend SubprocessAdapter, Codex uses bidirectional JSON-RPC requiring a custom implementation
  • selectors-based timeout: _recv_line uses OS-level I/O multiplexing to enforce read timeouts, preventing the classic readline() deadlock
  • Shared git_utils: Extracted _detect_modified_files into codeframe/core/adapters/git_utils.py to eliminate code duplication between SubprocessAdapter and CodexAdapter

Closes #412

Summary by CodeRabbit

  • New Features

    • Added Codex engine as an OpenAI-powered execution option.
    • Codex available in interactive runs, batch runs, and work sessions.
    • CLI now validates OpenAI API key when using Codex.
  • Other

    • Improved detection of workspace file changes after runs.
  • Tests

    • Added comprehensive tests for Codex integration and protocol behavior.

Test User added 2 commits March 13, 2026 09:28
…tocol (#412)

Implement CodexAdapter that speaks the Codex app-server JSON-RPC protocol
over stdio, enabling `cf work start <id> --execute --engine codex`.

- Full 4-step handshake: initialize → initialized → thread/start → turn/start
- Turn streaming with event routing (session_started, notification, tool_call,
  turn/completed, turn/failed, turn/cancelled)
- Auto-approval of tool calls (configurable via approval_policy)
- Timeout enforcement: turn, read, and stall timeouts
- Token usage tracking from turn_completed events
- Registered in engine_registry (VALID_ENGINES + EXTERNAL_ENGINES)
- OPENAI_API_KEY validator for CLI pre-flight checks
- 23 unit tests covering protocol, handshake, streaming, approval, and full run
- Enforce read timeout via selectors (prevents readline() deadlock)
- Handle non-auto approval policies by rejecting tool calls
- Extract _detect_modified_files to shared git_utils module
- Add test for non-auto approval policy rejection
@coderabbitai

coderabbitai Bot commented Mar 13, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Adds OpenAI Codex engine support: new CodexAdapter (app-server JSON-RPC over stdio), OpenAI API-key validator, engine_registry wiring for "codex", shared git-file detection utility, CLI integration for --engine codex, and comprehensive adapter tests.

Changes

Cohort / File(s) Summary
CLI & Validators
codeframe/cli/app.py, codeframe/cli/validators.py
Add require_openai_api_key() and invoke it when --engine codex across task, batch, and work-start flows; extend engine help text to list codex.
Codex External Adapter
codeframe/core/adapters/codex.py
New CodexAdapter implementing app-server JSON-RPC over stdio: initialize/thread/turn handshake, streaming turn events, approval policies, token-usage extraction, subprocess lifecycle, timeouts, and run()AgentResult.
Adapter Exports & Engine Registry
codeframe/core/adapters/__init__.py, codeframe/core/engine_registry.py
Export CodexAdapter; add "codex" to VALID_ENGINES and EXTERNAL_ENGINES; resolve "codex" to CodexAdapter.
Git utils & Subprocess adapter
codeframe/core/adapters/git_utils.py, codeframe/core/adapters/subprocess_adapter.py
Introduce detect_modified_files(workspace_path) and replace SubprocessAdapter's inline git-diff logic with a call to the new utility.
Tests
tests/core/adapters/test_codex.py
Add extensive tests covering JSON-RPC framing, handshake, streaming, approval policies, timeouts, token usage extraction, modified-file detection, and full run() integration with a mocked subprocess.

Sequence Diagram

sequenceDiagram
    participant CLI
    participant EngineRegistry
    participant CodexAdapter
    participant Codex_Process as Codex App-Server
    participant Git as Workspace/Git

    CLI->>EngineRegistry: request adapter for "codex"
    EngineRegistry->>CodexAdapter: return CodexAdapter
    CLI->>CodexAdapter: run(task_id, prompt, workspace)
    CodexAdapter->>Codex_Process: spawn subprocess (codex_command)
    CodexAdapter->>Codex_Process: send initialize
    Codex_Process->>CodexAdapter: initialized
    CodexAdapter->>Codex_Process: send thread/start
    CodexAdapter->>Codex_Process: send turn/start
    Codex_Process->>CodexAdapter: session_started / notifications
    loop Turn streaming
        Codex_Process->>CodexAdapter: notification / progress / tool_call
        alt tool_call
            CodexAdapter->>Codex_Process: send approval (per policy)
        end
        CodexAdapter->>CLI: emit AgentEvent(progress)
    end
    Codex_Process->>CodexAdapter: turn/completed
    CodexAdapter->>Git: detect_modified_files(workspace)
    CodexAdapter->>CLI: return AgentResult(status, token_usage, modified_files)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰✨ I hopped in with a JSON-RPC song,
Initialize, thread, and turn—dance along,
I watched approvals and tokens flow,
Drummed the subprocess, then checked files below,
A codex hop — CodeFrame leaps long!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a CodexAdapter implementing the app-server JSON-RPC protocol as a new engine adapter.
Linked Issues check ✅ Passed The PR implements all acceptance criteria from issue #412: CodexAdapter with protocol handshake, JSON-RPC streaming, approval policies, timeouts, token tracking, CLI integration, and test coverage.
Out of Scope Changes check ✅ Passed All changes are within scope: CodexAdapter implementation, supporting validators/registry updates, git utility extraction, and comprehensive tests directly support the Codex adapter objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/issue-412-codex-engine-adapter
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Code Review — Codex Engine Adapter (#437)

This is a clean, well-structured implementation. The JSON-RPC protocol handling, selectors-based timeout enforcement, and test coverage are all solid. A few items worth addressing before merge:

Issues

1. sandbox_mode is accepted but never used (codex.py, __init__)

The parameter is accepted and stored as self._sandbox_mode but never passed to the codex app-server subprocess command. Either forward it as a CLI flag (e.g. --sandbox-mode) or remove it until the feature is ready. Accepting unused configuration is confusing.

2. Double-kill in run() (codex.py)

The except block calls self._kill(process) and the finally block also calls self._kill(process) unconditionally. Since finally always runs after except, the call inside except is redundant. Either remove it from except or restructure. Current code is harmless but slightly misleading.

3. API key validation is hardcoded rather than registry-driven (app.py)

The if engine == "codex": require_openai_api_key() pattern appears in two places (work_start and batch_run). As more non-Anthropic engines are added this will grow into a long if/elif chain. CodexAdapter.requirements() already returns {"OPENAI_API_KEY": ...} — consider a registry helper like get_required_env_vars(engine) that app.py can call uniformly instead.

4. Test-specific branch in production _recv_line (codex.py)

The comment says "Mock objects (in tests) won't have fileno(), so fall back to direct readline for those." Production code branching on mock-object characteristics is a code smell. _FakeStdout in the test file could raise io.UnsupportedOperation from fileno() (which is what real non-file streams like StringIO do), and the production code can catch that exception rather than duck-typing for the mock. This would test the real fallback path rather than a mock-detection shortcut.

Minor Observations

require_openai_api_key duplicates require_anthropic_api_key (validators.py): The two functions are structurally identical. A private helper _require_api_key(var_name, friendly_name) would eliminate ~35 lines of duplication.

Stall timeout test is testing EOF, not stall (test_codex.py, test_stall_timeout): The test sets stall_timeout_ms=100 but passes an empty _FakeStdout, so the EOF branch fires immediately without waiting 100ms. The test name implies time-based stall behaviour but it is actually testing EOF. Consider renaming to test_eof_returns_stall_error or adding a companion test that patches time.monotonic to exercise the real stall path.

What's Working Well

  • selectors-based timeout in _recv_line correctly avoids readline() deadlocks — good use of OS-level I/O multiplexing
  • Extracting detect_modified_files to git_utils.py eliminates real duplication cleanly
  • The 4-step handshake accepts both method="initialized" and a result response — correct for JSON-RPC 2.0 which allows both notification and response forms
  • 24 tests with good protocol coverage and an end-to-end TestCodexFullRun
  • Engine registry registration is clean and consistent with existing adapters

The sandbox_mode item (1) is the one that should be resolved before merge — the rest are improvement opportunities.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
codeframe/cli/app.py (1)

2063-2069: Centralize engine credential preflight.

This engine-to-credential branching now exists in two commands, and work_retry() still has its own Anthropic-only check at Lines 2523-2524. A shared helper would keep Codex auth behavior from drifting across start, batch, and retry paths.

Also applies to: 2994-3000

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 2063 - 2069, The engine-to-credential
branching is duplicated across commands (start, batch and work_retry) which
causes inconsistent Codex/Anthropic checks; create a centralized helper (e.g.,
preflight_engine_credentials(engine)) that imports and uses is_external_engine,
require_openai_api_key and require_anthropic_api_key to perform the correct
check for "codex" vs non-external engines, then replace the duplicated blocks in
the start/batch command handlers and in work_retry() with a call to this helper
(also update the occurrences around the other noted region). Ensure the helper
is invoked before any engine-specific actions so all paths share the same auth
preflight behavior.
tests/core/adapters/test_codex.py (1)

223-231: This is exercising EOF, not a stall.

_FakeStdout([]) returns "" immediately, so this test only covers the EOF branch. It will not catch regressions where a per-read timeout fires before stall_timeout_ms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/adapters/test_codex.py` around lines 223 - 231, The test currently
uses _FakeStdout([]) which returns EOF immediately and only exercises the EOF
branch; change the test_stall_timeout to use a fake stdout that simulates a
blocking read (i.e., does not raise StopIteration and does not yield data) so
adapter._stream_turn actually hits the stall timeout path; update or extend
_FakeStdout (or create a new _BlockingStdout) to support a
blocking/never-yielding mode and use that in test_stall_timeout with
stall_timeout_ms=100 and on_event=None so the assertion that result.status ==
"failed" and the error mentions "stall" validates the per-read stall behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 48-57: Constructor stores sandbox_mode into self._sandbox_mode but
run() never forwards it, so setting sandbox_mode has no effect; update the code
that builds the command or JSON-RPC payload in the run() method (and any other
place that assembles cmd/params, e.g., where cmd variable or payload dict is
created) to include the sandbox flag from self._sandbox_mode (e.g., add a
"--sandbox" CLI arg when constructing codex command or add "sandbox_mode":
self._sandbox_mode to the JSON payload), and ensure downstream code
serializes/handles a None value correctly so behavior remains unchanged when
sandbox_mode is not provided.

In `@tests/core/adapters/test_codex.py`:
- Around line 1-8: This test module is missing the v2 mark; add a module-level
pytestmark = pytest.mark.v2 so pytest selects it as a v2 test. Edit the top of
the test module to define pytestmark (using the existing imported pytest) and
assign pytest.mark.v2, ensuring the module-level symbol pytestmark is present
before any tests run.

---

Nitpick comments:
In `@codeframe/cli/app.py`:
- Around line 2063-2069: The engine-to-credential branching is duplicated across
commands (start, batch and work_retry) which causes inconsistent Codex/Anthropic
checks; create a centralized helper (e.g., preflight_engine_credentials(engine))
that imports and uses is_external_engine, require_openai_api_key and
require_anthropic_api_key to perform the correct check for "codex" vs
non-external engines, then replace the duplicated blocks in the start/batch
command handlers and in work_retry() with a call to this helper (also update the
occurrences around the other noted region). Ensure the helper is invoked before
any engine-specific actions so all paths share the same auth preflight behavior.

In `@tests/core/adapters/test_codex.py`:
- Around line 223-231: The test currently uses _FakeStdout([]) which returns EOF
immediately and only exercises the EOF branch; change the test_stall_timeout to
use a fake stdout that simulates a blocking read (i.e., does not raise
StopIteration and does not yield data) so adapter._stream_turn actually hits the
stall timeout path; update or extend _FakeStdout (or create a new
_BlockingStdout) to support a blocking/never-yielding mode and use that in
test_stall_timeout with stall_timeout_ms=100 and on_event=None so the assertion
that result.status == "failed" and the error mentions "stall" validates the
per-read stall behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 051eb633-10c0-4409-b93e-3efc6983c2d1

📥 Commits

Reviewing files that changed from the base of the PR and between 81fe3fc and b2b47e5.

📒 Files selected for processing (8)
  • codeframe/cli/app.py
  • codeframe/cli/validators.py
  • codeframe/core/adapters/__init__.py
  • codeframe/core/adapters/codex.py
  • codeframe/core/adapters/git_utils.py
  • codeframe/core/adapters/subprocess_adapter.py
  • codeframe/core/engine_registry.py
  • tests/core/adapters/test_codex.py

Comment thread codeframe/core/adapters/codex.py
Comment thread codeframe/core/adapters/codex.py Outdated
Comment thread tests/core/adapters/test_codex.py Outdated
- Differentiate read timeout vs EOF in _recv_line using _TIMEOUT sentinel
  (prevents premature failure when stall/turn timeouts should still apply)
- Forward sandbox_mode to initialize params (was a no-op)
- Add pytestmark = pytest.mark.v2 to test module
@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Follow-up Review — Codex Adapter (#437)

Reviewing the updated commit. Checking status of previous findings and flagging a new issue.

Correction to My Previous Review

sandbox_mode IS forwarded — I was wrong. The _handshake method already passes it in the JSON-RPC initialize params via init_params["sandbox_mode"] = self._sandbox_mode. Disregard prior issue #1.

New Bug: _handshake doesn't handle _TIMEOUT sentinel

_recv_line returns one of three values: a parsed dict, None (EOF), or the _TIMEOUT sentinel. _stream_turn handles all three correctly. But _handshake only guards for None:

response = self._recv_line(stdout, timeout_s=timeout_s)
if response is None:
    return False
method = response.get("method", "")  # AttributeError if response is _TIMEOUT

If the subprocess is slow to respond and the read times out, _recv_line returns the _TIMEOUT sentinel and the next line raises AttributeError: 'object' object has no attribute 'get'. The fix is:

if response is None or response is _TIMEOUT:
    return False

This is masked in tests because _FakeStdout has no fileno, so the selector/_TIMEOUT path is never exercised in handshake tests.

Still Open from Previous Review

Double-kill in run(): The except block calls self._kill(process) and the finally block also calls it unconditionally. Harmless due to idempotency but still redundant.

API key validation hardcoded: The if engine == "codex": require_openai_api_key() pattern appears in work_start and batch_run. CodeRabbit also noted work_retry() has an Anthropic-only check. CodexAdapter.requirements() already returns {"OPENAI_API_KEY": ...} — a shared preflight_engine_credentials(engine) helper would prevent drift as more engines are added.

Mock detection in _recv_line: The hasattr(stdout, "fileno") check is still present. Minor code smell — the selector fallback should be triggered by capability (catching io.UnsupportedOperation), not mock-object detection.

What's Working

  • Stall test renamed to test_eof_detected_as_failure — addresses the naming concern I raised.
  • pytestmark = pytest.mark.v2 is present — CodeRabbit's flag resolved.
  • _stream_turn correctly handles all three _recv_line return values; the asymmetry with _handshake is the isolated gap.

The _TIMEOUT handling in _handshake should be addressed before merge — it's a latent crash on slow Codex startup. The other items are clean-up opportunities.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 231-238: The _handshake function assumes response is a dict but
_recv_line can return the _TIMEOUT sentinel (an object) causing an
AttributeError when calling response.get; update _handshake to explicitly check
for the _TIMEOUT sentinel (or non-dict) immediately after calling _recv_line and
return False on timeout before accessing response.get, referencing _handshake,
_recv_line and the _TIMEOUT symbol to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f71f3d6-e696-4e26-b3e6-f2e494474d83

📥 Commits

Reviewing files that changed from the base of the PR and between b2b47e5 and 95b8024.

📒 Files selected for processing (2)
  • codeframe/core/adapters/codex.py
  • tests/core/adapters/test_codex.py

Comment thread codeframe/core/adapters/codex.py
Prevents AttributeError when _recv_line returns _TIMEOUT during
the initialized wait step (CodeRabbit finding).
@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Follow-up Review: Codex Adapter (cdbda79 latest commit)

Previously Flagged - Now Resolved:

  • _TIMEOUT bug in _handshake: Fixed. The diff shows the sentinel check before response.get() is now in place.
  • Stall test renamed to test_eof_detected_as_failure: still present and correct.
  • pytestmark = pytest.mark.v2: present in test file, resolved.
  • sandbox_mode forwarded in handshake: confirmed correct.

Still Open:

  1. Double-kill in run() (codex.py): The except block calls self._kill(process) and the finally block also calls it unconditionally. Since finally always executes after except, the call inside except is dead code on the error path. Harmless due to _kill idempotency, but obscures intent.

  2. API key validation hardcoded in CLI (app.py lines 2063-2069 and 2994-3000): The Codex preflight block is duplicated in work_start and batch_run. CodeRabbit also flagged that work_retry at lines 2523-2524 still has an Anthropic-only check with no Codex equivalent - meaning cf work resume --engine codex bypasses the preflight entirely. CodexAdapter.requirements() already returns the needed metadata. A preflight_engine_credentials(engine) helper in validators.py would fix both the duplication and the work_retry gap in one change.

  3. Mock detection in _recv_line (codex.py): hasattr(stdout, fileno) is still the selector-fallback trigger. The correct signal for non-file streams is io.UnsupportedOperation raised by fileno() - which is what StringIO and similar types raise. Catching that exception instead of duck-typing for attribute absence would exercise the real fallback path in both production and tests, removing the implicit mock-coupling.

Summary: The critical runtime bug (_TIMEOUT crash on slow handshake) is fixed. Remaining items are one dead-code issue (double-kill), one correctness gap (work_retry missing Codex preflight - the only item that causes a silent user-visible failure), and one code smell (mock detection). The work_retry gap is worth addressing before merge; the others can follow.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/adapters/codex.py`:
- Around line 142-145: The stderr drain thread (stderr_thread) can block on
process.stderr.read() until the subprocess exits, so move the call to
self._kill(process) to occur before stderr_thread.join(timeout=5) so the
subprocess is terminated first and the read unblocks; ensure you still call
stderr_thread.join(timeout=5) afterward to wait for the thread to finish and
preserve the existing finally cleanup behavior that currently references
stderr_thread and self._kill(process).
- Around line 127-148: The adapter currently only uses the optional on_event
callback and omits emitting core audit events; update the Codex adapter to call
the central event pipeline (import and use the emitter from core/events.py) at
handshake start, handshake failure, approval decisions, and terminal turn
states: emit a "handshake_started" before calling self._handshake, emit
"handshake_failed" if not ok or in the except block (include error text), emit
approval/decision events from within self._stream_turn where approvals occur,
and emit a terminal "turn_finished" (with status and duration) just before
returning result; place these emissions alongside the existing on_event uses
(refer to self._handshake, self._stream_turn, and _kill) so all state
transitions flow through core/events.py as well as the on_event callback.
- Around line 268-290: The loop currently uses a fixed per-read timeout
(read_timeout_s) which can exceed remaining stall/turn deadlines; compute the
remaining times using self._stall_timeout_ms and last_event_time and
self._turn_timeout_ms and turn_start, convert to seconds, take the min positive
remaining_deadline = min(remaining_stall, remaining_turn, read_timeout_s), if
remaining_deadline <= 0 return the appropriate AgentResult failure immediately,
and call self._recv_line(stdout, timeout_s=remaining_deadline) so each read is
bounded by the nearest deadline; adjust references to
stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26f9a0bc-1a54-42b9-816b-43460f8e4295

📥 Commits

Reviewing files that changed from the base of the PR and between 95b8024 and cdbda79.

📒 Files selected for processing (1)
  • codeframe/core/adapters/codex.py

Comment on lines +127 to +148
try:
ok = self._handshake(
process.stdin, process.stdout, prompt=prompt, workspace_path=workspace_path
)
if not ok:
self._kill(process)
return AgentResult(
status="failed",
error="Codex app-server handshake failed (no initialized response)",
)

result = self._stream_turn(process.stdout, on_event=on_event, stdin=process.stdin)
except Exception as exc:
self._kill(process)
return AgentResult(status="failed", error=str(exc))
finally:
stderr_thread.join(timeout=5)
self._kill(process)

result.modified_files = self._detect_modified_files(workspace_path)
result.duration_ms = int((time.monotonic() - start) * 1000)
return result

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

State transitions are not reaching the core audit event pipeline.

This adapter only uses the optional on_event callback. Handshake start/failure, approval decisions, and terminal turn states never emit via core/events.py, so those transitions disappear from the standard audit/observability path.

As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".

Also applies to: 203-252, 258-361

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/adapters/codex.py` around lines 127 - 148, The adapter
currently only uses the optional on_event callback and omits emitting core audit
events; update the Codex adapter to call the central event pipeline (import and
use the emitter from core/events.py) at handshake start, handshake failure,
approval decisions, and terminal turn states: emit a "handshake_started" before
calling self._handshake, emit "handshake_failed" if not ok or in the except
block (include error text), emit approval/decision events from within
self._stream_turn where approvals occur, and emit a terminal "turn_finished"
(with status and duration) just before returning result; place these emissions
alongside the existing on_event uses (refer to self._handshake,
self._stream_turn, and _kill) so all state transitions flow through
core/events.py as well as the on_event callback.

Comment on lines +142 to +145
finally:
stderr_thread.join(timeout=5)
self._kill(process)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Kill the child before joining the stderr drain thread.

stderr_thread is blocked on process.stderr.read() until the subprocess exits. On the success path the app-server is still alive here, so join(timeout=5) waits the full timeout before _kill() finally unblocks it. That inflates duration_ms and makes completed turns look slow.

🧹 Proposed fix
         finally:
-            stderr_thread.join(timeout=5)
             self._kill(process)
+            stderr_thread.join(timeout=5)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/adapters/codex.py` around lines 142 - 145, The stderr drain
thread (stderr_thread) can block on process.stderr.read() until the subprocess
exits, so move the call to self._kill(process) to occur before
stderr_thread.join(timeout=5) so the subprocess is terminated first and the read
unblocks; ensure you still call stderr_thread.join(timeout=5) afterward to wait
for the thread to finish and preserve the existing finally cleanup behavior that
currently references stderr_thread and self._kill(process).

Comment on lines +268 to +290
stall_timeout_s = self._stall_timeout_ms / 1000
turn_timeout_s = self._turn_timeout_ms / 1000
read_timeout_s = self._read_timeout_ms / 1000

while True:
# Check stall timeout
if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s:
return AgentResult(
status="failed",
error=f"Stall timeout: no events for {self._stall_timeout_ms}ms",
)

# Check turn timeout
if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s:
return AgentResult(
status="failed",
error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms",
)

msg = self._recv_line(stdout, timeout_s=read_timeout_s)
if msg is _TIMEOUT:
# Read timed out — loop back to check stall/turn timeouts
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

read_timeout_ms can mask shorter stall/turn deadlines.

This loop only re-checks stall and turn timeouts after _recv_line() returns. If callers configure stall_timeout_ms or turn_timeout_ms below read_timeout_ms, the selector can sleep past the real deadline and fail much later than requested. Bound each read by the smallest remaining deadline instead of the fixed per-read timeout.

⏱️ Proposed fix
         while True:
             # Check stall timeout
             if stall_timeout_s > 0 and (time.monotonic() - last_event_time) > stall_timeout_s:
                 return AgentResult(
                     status="failed",
@@
             if turn_timeout_s > 0 and (time.monotonic() - turn_start) > turn_timeout_s:
                 return AgentResult(
                     status="failed",
                     error=f"Turn timeout: exceeded {self._turn_timeout_ms}ms",
                 )
 
-            msg = self._recv_line(stdout, timeout_s=read_timeout_s)
+            stall_remaining = (
+                float("inf")
+                if stall_timeout_s <= 0
+                else max(0.0, stall_timeout_s - (time.monotonic() - last_event_time))
+            )
+            turn_remaining = (
+                float("inf")
+                if turn_timeout_s <= 0
+                else max(0.0, turn_timeout_s - (time.monotonic() - turn_start))
+            )
+            msg = self._recv_line(
+                stdout,
+                timeout_s=min(read_timeout_s, stall_remaining, turn_remaining),
+            )
             if msg is _TIMEOUT:
                 # Read timed out — loop back to check stall/turn timeouts
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/adapters/codex.py` around lines 268 - 290, The loop currently
uses a fixed per-read timeout (read_timeout_s) which can exceed remaining
stall/turn deadlines; compute the remaining times using self._stall_timeout_ms
and last_event_time and self._turn_timeout_ms and turn_start, convert to
seconds, take the min positive remaining_deadline = min(remaining_stall,
remaining_turn, read_timeout_s), if remaining_deadline <= 0 return the
appropriate AgentResult failure immediately, and call self._recv_line(stdout,
timeout_s=remaining_deadline) so each read is bounded by the nearest deadline;
adjust references to
stall_timeout_s/turn_timeout_s/read_timeout_s/_recv_line/last_event_time/turn_start/_stall_timeout_ms/_turn_timeout_ms
accordingly.

@frankbria frankbria merged commit ad9ea52 into main Mar 13, 2026
17 checks passed
@frankbria frankbria deleted the feature/issue-412-codex-engine-adapter branch March 24, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 4] Codex/OpenAI Engine Adapter (App-Server Protocol)

1 participant