feat(codex-bridge): Init Gate prevents first-prompt loss on cold start #194
Open
SevenX77 wants to merge 3 commits into bfly123:main
…dy detection

Implements the generic Init Gate skeleton per Q3 Stage 1a design:
- InitGateState enum: LAUNCHED, INITIALIZING, READY, INIT_FAIL
- InitGateProbe protocol for provider-specific detection
- InitGate class with segmented polling, steady-state debounce,
  deadline-based failure detection, and diagnostic capture
- load_init_gate_env() for env var configuration (generic + per-provider)

Features:
- Fast→slow segmented polling (poll_fast_ms → poll_slow_ms @ poll_switch_s)
- Steady-state debounce (steady_count consecutive True before READY)
- Bypass mode for emergency override
- Comprehensive failure diagnostics JSON with a ring buffer of recent pane captures

All 11 unit tests pass:
- Happy path, deadline exceeded, steady-state debounce
- Bypass, segmented polling, failure JSON structure
- Ring buffer, env var loading (generic + per-provider)
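The polling behaviour listed in this commit message can be sketched as follows. This is a minimal illustration, not the InitGate class from the PR: the function name `wait_until_ready`, the `GateState` enum, and the injectable `clock`/`sleep` parameters are assumptions made here so the loop can be exercised without real delays. The defaults mirror the values quoted in the PR description (200 ms → 500 ms at 5 s, two consecutive passes, 30 s deadline).

```python
import time
from enum import Enum
from typing import Callable


class GateState(Enum):
    # Hypothetical stand-in for the InitGateState enum named in the commit.
    LAUNCHED = "launched"
    INITIALIZING = "initializing"
    READY = "ready"
    INIT_FAIL = "init_fail"


def wait_until_ready(
    probe: Callable[[], bool],
    *,
    deadline_s: float = 30.0,
    poll_fast_ms: int = 200,
    poll_slow_ms: int = 500,
    poll_switch_s: float = 5.0,
    steady_count: int = 2,
    bypass: bool = False,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> GateState:
    """Poll `probe` until it passes `steady_count` times in a row or the deadline hits."""
    if bypass:
        return GateState.READY  # emergency override: skip the gate entirely

    start = clock()
    consecutive = 0
    while True:
        elapsed = clock() - start
        if elapsed >= deadline_s:
            return GateState.INIT_FAIL

        try:
            ok = bool(probe())
        except Exception:
            ok = False  # conservative: a failing probe counts as "not ready"

        # Steady-state debounce: any False resets the streak.
        consecutive = consecutive + 1 if ok else 0
        if consecutive >= steady_count:
            return GateState.READY

        # Segmented polling: fast interval early on, slow interval afterwards.
        interval_ms = poll_fast_ms if elapsed < poll_switch_s else poll_slow_ms
        sleep(interval_ms / 1000.0)
```

Passing a fake `clock` and `sleep` makes the loop deterministic in unit tests, which is one plausible way to cover the happy path, debounce, and deadline cases without waiting in real time.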
Provider-specific init probe for Codex CLI satisfying InitGateProbe protocol:
- S1: welcome banner strings absent in visible-screen capture
- S2: last non-empty line starts with '› ' (idle input prompt)
- Uses capture-pane -p -t <pane> (visible only, no -S scrollback)
- Conservative failure mode: any tmux/parse error returns False

7 unit tests covering banner detection, prompt-on-last-line logic, visible-only capture invariant, and exception safety.

Module written by Codex (gpt-5.4 xhigh) in Q3-S1a.2 task. Commit created by Claude (master) due to CCB reply truncation on Codex side preventing it from completing the commit step itself.

Refs: .kiro/specs/q3-stage1-init-gate/DESIGN.md §4.3.1 (v2)
Co-Authored-By: Codex <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
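The S1 and S2 checks from this commit can be illustrated on plain captured text. The sketch below is an assumption-laden stand-in for CodexInitProbe: `screen_is_ready`, `probe`, and the `BANNERS` tuple are hypothetical names, and the banner list contains only the three strings quoted in the PR description (the real constant may hold more). It takes the screen text as input rather than calling tmux, so it says nothing about how the capture is obtained.

```python
from typing import Callable, Iterable

# Banner strings quoted in the PR description; assumed to be a subset of the real list.
BANNERS = ("OpenAI Codex", "Sign in with ChatGPT", "Trust this workspace")
PROMPT_PREFIX = "› "


def screen_is_ready(screen: str, banners: Iterable[str] = BANNERS) -> bool:
    """Return True only if no banner is visible and the last non-empty line is the prompt."""
    # S1: any welcome/auth banner still on the visible screen means "not ready".
    if any(banner in screen for banner in banners):
        return False

    # S2: the last non-empty line must start with the idle input prompt.
    lines = [line for line in screen.splitlines() if line.strip()]
    if not lines:
        return False
    return lines[-1].startswith(PROMPT_PREFIX)


def probe(capture: Callable[[], str]) -> bool:
    """Conservative wrapper: any capture or parse error is reported as not ready."""
    try:
        return screen_is_ready(capture())
    except Exception:
        return False
```

Because S2 only checks the prefix, hint text that follows the prompt character on the same line does not affect the result.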
Integrates Init Gate into Codex bridge lifecycle (DESIGN.md §4.6 + §4.7):
- runtime_state.py: BridgeRuntimeState mutable, init_gate + fifo_holder_fd
  fields; build_bridge_runtime_state constructs CodexInitProbe + InitGate
  with proper env loading; tmux_run_str adapter converts CompletedProcess
  to str for the probe (extracts stdout, decodes bytes, returns "" on
  any failure)
- service.py: DualBridge.__init__ creates TmuxBackend (no-arg ctor; pane
  binding happens via tmux_run_fn closure), opens FIFO holder fd with
  O_RDONLY|O_NONBLOCK before Init Gate so upstream writers don't block
  on open(O_WRONLY); run() waits on init_gate.wait_until_ready() before
  the main loop and returns 3 on INIT_FAIL; new _teardown() (idempotent)
  closes holder fd + stops binding tracker; called from _handle_signal
  and main loop finally
- init_probe.py: capture_visible_for_diagnostics() exposed for InitGate
  capture_fn
3 unit tests for DualBridge integration (FIFO holder open with correct
flags, INIT_FAIL exit code 3 path, success path enters main loop).
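The CompletedProcess-to-str adapter described above might look roughly like this. It is a sketch under assumptions: `make_run_str` is a hypothetical helper name, and the wrapped `run` callable is assumed to return a `subprocess.CompletedProcess`; the actual tmux_run_str in runtime_state.py may be shaped differently.

```python
import subprocess
from typing import Callable, Sequence

RunFn = Callable[[Sequence[str]], "subprocess.CompletedProcess"]


def make_run_str(run: RunFn) -> Callable[[Sequence[str]], str]:
    """Wrap a CompletedProcess-returning runner so callers always get a str."""

    def run_str(args: Sequence[str]) -> str:
        try:
            out = run(args).stdout
            if out is None:
                return ""
            if isinstance(out, bytes):
                # Decode leniently; a bad byte should not fail the probe.
                return out.decode("utf-8", errors="replace")
            return str(out)
        except Exception:
            # Any failure is reported as an empty capture.
            return ""

    return run_str
```

Returning an empty string on failure fits the probe's conservative mode: an empty screen has no prompt line, so it is treated as not ready.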
Implementation written by Codex (gpt-5.4 xhigh) in Q3-S1a.3 task; commit
finalized by Claude (master) under user-authorized circuit-breaker mode
after Codex CLI hit two consecutive WebSocket conversation interrupts:
- service.py:27 fix TmuxBackend signature (one-line)
- runtime_state.py: tmux_run_str adapter (CompletedProcess -> str)
- test bug fixes (O_RDONLY bitmask is 0 on POSIX; _running is instance
  attr not class attr) so the 3 unit tests pass
Refs: DESIGN.md §4.6 §4.7 + Q3-S1a.3 task spec + ccb-collaboration.md
circuit-breaker authorization (user 2026-04-24)
Co-Authored-By: Codex <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
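The FIFO holder behaviour this commit relies on can be demonstrated with standard-library calls only. The sketch is POSIX-only, and `open_fifo_holder` and `demo` are hypothetical names used for illustration rather than the code in service.py. It shows that once a non-blocking reader holds the FIFO, a writer's `open(O_WRONLY)` returns immediately and the payload waits in the kernel pipe buffer until it is read.

```python
import os
import tempfile


def open_fifo_holder(path: str) -> int:
    """Open the FIFO read-only and non-blocking; this succeeds even with no writer."""
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def demo() -> bytes:
    """Write through a FIFO while only the holder fd is reading, then drain it."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "input.fifo")
        os.mkfifo(fifo)

        holder = open_fifo_holder(fifo)
        try:
            # Without the holder, this open would block until a reader appeared.
            writer = os.open(fifo, os.O_WRONLY)
            try:
                os.write(writer, b"first prompt\n")  # queued in the pipe buffer
            finally:
                os.close(writer)

            # Later (after the gate passes) the queued payload is still there.
            return os.read(holder, 4096)
        finally:
            os.close(holder)
```

On POSIX systems `os.O_RDONLY` is 0, so a test cannot detect it with a bitwise AND against the flags; comparing the full flags value (or checking that `O_NONBLOCK` is set and neither `O_WRONLY` nor `O_RDWR` is) is one way around the test bug mentioned in the commit message.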
Summary
Adds a provider-agnostic Init Gate state machine to the Codex bridge so the first prompt sent to a freshly-spawned Codex agent is never delivered while the TUI is still rendering its welcome banner, authenticating, or loading. Today, the bridge enters its read FIFO → paste-into-pane → Enter loop the moment the Python process spawns; if a ccb ask arrives before the Codex TUI has shown its idle prompt, the keystrokes are eaten by the splash/auth modal and the message is silently lost.

Why
We hit this reliably on cold start of a1:codex agents: the first prompt has a non-trivial loss rate, and the loss is invisible because:
- process_request swallows any CcbDeliveryError into a history entry
- the _paste_via_buffer_reception_driven retry uses pane_shows_agent_activity() to detect 'delivered', but Codex's own startup spinner (Loading..., Braille frames) false-positives that check, so the retry returns successfully even though the message never reached the prompt buffer

The send-side retry was designed to handle Enter-swallow on a pane that's already idle, not to handle 'pane has never been idle yet'. We need an explicit pre-send ready gate, which is what this PR adds.
What's in this PR
- lib/provider_core/init_gate.py (new): InitGate class: state machine (LAUNCHED → INITIALIZING → READY | INIT_FAIL), segmented polling (200ms → 500ms @ 5s), steady-state debounce (2× consecutive True), deadline (default 30s), bypass flag, env var loader, failure diagnostics JSON with ring buffer of last 3 captures
- lib/provider_backends/codex/bridge_runtime/init_probe.py (new): CodexInitProbe implementing InitGateProbe: visible-screen capture only (no scrollback) + banner blacklist (OpenAI Codex, Sign in with ChatGPT, Trust this workspace, etc.) + 'last non-empty line starts with `› `' check (tolerates idle hint)
- lib/provider_backends/codex/bridge_runtime/service.py (modified): DualBridge.__init__ opens FIFO holder fd (O_RDONLY | O_NONBLOCK) so upstream writers don't block during Init Gate; run() calls init_gate.wait_until_ready() before main loop, exits with code 3 on INIT_FAIL
- lib/provider_backends/codex/bridge_runtime/runtime_state.py (modified): BridgeRuntimeState now mutable, holds init_gate + fifo_holder_fd; build_bridge_runtime_state constructs probe + gate with env-driven config
- test/test_init_gate.py (new)
- test/test_codex_init_probe.py (new)
- test/test_codex_communicator_init.py (new)

Total: 21 unit tests, all green (python -m pytest test/test_init_gate.py test/test_codex_init_probe.py test/test_codex_communicator_init.py).

Design contract
- READY requires the probe to return True for steady_count (default 2) consecutive polls.
- The deadline defaults to 30s (override with CCB_CODEX_INIT_DEADLINE_S or generic CCB_INIT_GATE_DEADLINE_S); on timeout, write init_gate_failure.json with last 3 captures + probe history, exit 3.
- Captures use capture-pane -p -t <pane> (no -S flag); historical scrollback would otherwise pin S1 (banner-gone) to false forever.
- The FIFO holder fd ensures an upstream open(O_WRONLY) doesn't block during the 0–30s gate window; payloads queue into the kernel pipe buffer (default 64KB) and the main loop drains them after Init Gate passes.
- CCB_INIT_GATE_BYPASS=1 skips the gate entirely (logs WARN); useful for debugging or if Codex CLI banner format changes break the probe before the constant can be patched.

Env vars (all optional, sensible defaults)
- CCB_INIT_GATE_DEADLINE_S
- CCB_CODEX_INIT_DEADLINE_S
- CCB_INIT_GATE_POLL_FAST_MS
- CCB_INIT_GATE_POLL_SLOW_MS
- CCB_INIT_GATE_POLL_SWITCH_S
- CCB_INIT_GATE_STEADY_COUNT
- CCB_INIT_GATE_BYPASS

Review history (private)
- [Q3-S1a.1] InitGate skeleton: 9.2/10
- [Q3-S1a.2] CodexInitProbe: 8.8/10
- [Q3-S1a.3] DualBridge wiring: 9.0/10
- … must_fix items across all reviews.

Design doc + reviews available locally; happy to share if useful.
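How the env vars listed under "Env vars" could map onto a config object is sketched below. Several things here are assumptions rather than facts from the PR: the names `GateConfig` and `load_gate_config`, the rule that the per-provider deadline variable takes precedence over the generic one, and the fallback to defaults when a value fails to parse. The defaults themselves (30 s, 200 ms, 500 ms, 5 s, 2) are the ones quoted in "What's in this PR".

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class GateConfig:
    deadline_s: float = 30.0
    poll_fast_ms: int = 200
    poll_slow_ms: int = 500
    poll_switch_s: float = 5.0
    steady_count: int = 2
    bypass: bool = False


def _read(env: Mapping[str, str], names: Sequence[str], default, cast: Callable):
    """Return the first parseable value among `names`, else the default."""
    for name in names:
        raw = env.get(name)
        if raw is None or not raw.strip():
            continue
        try:
            return cast(raw)
        except ValueError:
            continue  # assumed behaviour: ignore malformed values
    return default


def load_gate_config(
    provider: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> GateConfig:
    env = os.environ if env is None else env
    defaults = GateConfig()

    # Assumed precedence: CCB_<PROVIDER>_INIT_DEADLINE_S wins over the generic name.
    deadline_names = ["CCB_INIT_GATE_DEADLINE_S"]
    if provider:
        deadline_names.insert(0, f"CCB_{provider.upper()}_INIT_DEADLINE_S")

    return GateConfig(
        deadline_s=_read(env, deadline_names, defaults.deadline_s, float),
        poll_fast_ms=_read(env, ["CCB_INIT_GATE_POLL_FAST_MS"], defaults.poll_fast_ms, int),
        poll_slow_ms=_read(env, ["CCB_INIT_GATE_POLL_SLOW_MS"], defaults.poll_slow_ms, int),
        poll_switch_s=_read(env, ["CCB_INIT_GATE_POLL_SWITCH_S"], defaults.poll_switch_s, float),
        steady_count=_read(env, ["CCB_INIT_GATE_STEADY_COUNT"], defaults.steady_count, int),
        bypass=env.get("CCB_INIT_GATE_BYPASS", "") == "1",
    )
```

For example, `load_gate_config("codex", {"CCB_CODEX_INIT_DEADLINE_S": "60"})` would yield a 60 s deadline with every other field left at its default.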
Test plan
- … CCB_INIT_GATE_BYPASS=1 to confirm bug repro

Risks
- Codex CLI banner format changes: CODEX_INIT_BANNERS lives in one constant (4 lines); CCB_INIT_GATE_BYPASS=1 is the emergency hatch; the S2 prompt-on-last-line check is the stronger signal and works regardless of banner text.
- Default 30s deadline too short: CCB_CODEX_INIT_DEADLINE_S=60 env override.

🤖 Generated with Claude Code
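The failure diagnostics mentioned in the design contract (init_gate_failure.json with the last 3 captures plus probe history) can be sketched with a bounded deque. Only the file name and the "last 3 captures" figure come from the PR; the class name `FailureRecorder` and the JSON field names are hypothetical and chosen purely for illustration.

```python
import json
from collections import deque
from pathlib import Path
from typing import Deque, List


class FailureRecorder:
    """Keep a ring buffer of recent pane captures plus the full probe history."""

    def __init__(self, max_captures: int = 3) -> None:
        # deque(maxlen=N) silently drops the oldest entry once full.
        self._captures: Deque[str] = deque(maxlen=max_captures)
        self._history: List[bool] = []

    def record(self, probe_result: bool, capture: str) -> None:
        self._history.append(probe_result)
        self._captures.append(capture)

    def write(self, directory: Path, deadline_s: float) -> Path:
        """Dump diagnostics as JSON; field names here are illustrative only."""
        path = directory / "init_gate_failure.json"
        payload = {
            "state": "INIT_FAIL",
            "deadline_s": deadline_s,
            "probe_history": self._history,
            "recent_captures": list(self._captures),
        }
        path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
        return path
```

Bounding the capture buffer keeps the diagnostics file small even when the gate polls many times before the deadline, while the probe history still shows whether readiness was flapping or never observed.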