Path: @/codex-rs/tui-pty-e2e
- Black-box integration testing framework for the Codex TUI using PTY (pseudo-terminal) emulation
- Spawns the real `codex` binary in a simulated terminal and exercises full application stack
- Uses VT100 parser to capture and validate terminal screen output via snapshot testing
- Provides programmatic keyboard input simulation and screen state polling
- Tests the complete integration between @/codex-rs/cli, @/codex-rs/tui, @/codex-rs/core, and @/codex-rs/acp
- Complements unit tests in @/codex-rs/tui/src/chatwidget.rs by testing full application behavior
- Uses @/codex-rs/mock-acp-agent as the ACP backend for deterministic test scenarios
- Validates CLI argument parsing, TUI event loop, ACP protocol communication, and terminal rendering
- Part of the workspace at @/codex-rs/Cargo.toml:46
Test Harness: TuiSession in @/codex-rs/tui-pty-e2e/src/lib.rs
The main API provides:
- `spawn(rows, cols)` - Launch codex binary with mock-acp-agent in PTY with automatic temp directory
- `spawn_with_config(rows, cols, config)` - Launch with custom configuration and automatic temp directory
- `send_str(text)` - Simulate typing text
- `send_key(key)` - Send keyboard events (Enter, Escape, Ctrl-C, etc.)
- `wait_for_text(needle, timeout)` - Poll screen until text appears
- `wait_for(predicate, timeout)` - Poll screen until condition matches
- `screen_contents()` - Get current terminal screen as string
Debugging Aids:
TuiSession implements Drop to print screen state when tests panic, making it easier to diagnose PTY timing issues:

    impl Drop for TuiSession {
        fn drop(&mut self) {
            if std::thread::panicking() {
                eprintln!("\n=== TUI Screen State at Panic ===");
                eprintln!("{}", self.screen_contents());
                eprintln!("=================================\n");
            }
        }
    }

The crate exports helper functions for consistent test patterns:
- `TIMEOUT: Duration` - Standard 5-second timeout constant for use across all tests
- `TIMEOUT_INPUT: Duration` - 300ms timeout for input stabilization before snapshots
- `normalize_for_snapshot(contents: String) -> String` - Normalizes dynamic content for snapshot testing (see below)
- `normalize_for_input_snapshot(contents: String) -> String` - Extends normalization by stripping the startup header block (see below)
Automatic Test Isolation:
All tests run in isolated temporary directories created in /tmp/:
- Each `spawn()` or `spawn_with_config()` call creates a new temp directory
- Directory contains a `hello.py` file with `print('Hello, World!')`
- A `config.toml` is automatically generated in the temp directory (used as CODEX_HOME)
- Temp directory is automatically cleaned up when `TuiSession` is dropped
- Tests run outside the user's home directory for better isolation
Generated config.toml:
By default, each session creates a config.toml in the temp directory with:
- `model` - Set to the configured model (defaults to `"mock-model"`)
- `model_provider = "mock_provider"` - Uses a custom provider that doesn't require OpenAI auth
- `trust_level = "trusted"` for the working directory - Skips trust approval screen
- `wire_api = "acp"` - Routes through ACP registry for model resolution
Custom config.toml content can be provided via SessionConfig::with_config_toml(content)
Architecture:

    Test Code
        ↓
    TuiSession (portable_pty)
        ↓
    PTY Master ←→ PTY Slave
        ↓              ↓
    VT100 Parser   codex binary (--model mock-model)
        ↓              ↓
    Screen State   ACP registry lookup → mock-acp provider
                       ↓
                   ACP JSON-RPC over stdin/stdout
                       ↓
                   mock_acp_agent (env var configured)

Key Input Handling: Key enum in @/codex-rs/tui-pty-e2e/src/keys.rs
Converts high-level key events to ANSI escape sequences:
- `Key::Enter` → `\r`
- `Key::Escape` → `\x1b`
- `Key::Up/Down/Left/Right` → `\x1b[A/B/D/C`
- `Key::Backspace` → `\x7f`
- `Key::Ctrl('c')` → Control character encoding
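
The mapping above can be sketched as a small standalone enum. This is an illustration only: the enum below is a hypothetical mirror of `Key`, the method name `to_bytes` is an assumption, and the real definition in keys.rs may have more variants or a different shape. The Ctrl encoding shown (masking the letter down to its low five bits) is the conventional terminal encoding, assumed here rather than taken from the crate.

```rust
/// Hypothetical mirror of the crate's `Key` enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Enter,
    Escape,
    Up,
    Down,
    Left,
    Right,
    Backspace,
    Ctrl(char),
}

impl Key {
    /// Returns the bytes a terminal would send for this key.
    /// `to_bytes` is an assumed name, not necessarily the crate's API.
    pub fn to_bytes(self) -> Vec<u8> {
        match self {
            Key::Enter => b"\r".to_vec(),
            Key::Escape => b"\x1b".to_vec(),
            Key::Up => b"\x1b[A".to_vec(),
            Key::Down => b"\x1b[B".to_vec(),
            Key::Right => b"\x1b[C".to_vec(),
            Key::Left => b"\x1b[D".to_vec(),
            Key::Backspace => b"\x7f".to_vec(),
            // Conventional control encoding: keep only the low five bits
            // of the uppercase letter, so Ctrl-C becomes 0x03.
            Key::Ctrl(c) => vec![(c.to_ascii_uppercase() as u8) & 0x1f],
        }
    }
}
```

With this sketch, `Key::Ctrl('c').to_bytes()` yields the single byte `0x03`, which is what a terminal sends for Ctrl-C.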
Session Configuration: SessionConfig in @/codex-rs/tui-pty-e2e/src/lib.rs
Builder pattern for test environment setup:
- `model` field - Model name to use (defaults to `"mock-model"` which resolves to mock-acp-agent via ACP registry)
- `with_mock_response(text)` - Set `MOCK_AGENT_RESPONSE` env var
- `with_stream_until_cancel()` - Set `MOCK_AGENT_STREAM_UNTIL_CANCEL=1`
- `with_agent_env(key, value)` - Pass custom env vars to mock agent
- `with_approval_policy(policy)` - Set approval policy (defaults to `OnFailure`)
- `without_approval_policy()` - Remove approval policy to test trust screen
- `with_config_toml(content)` - Provide custom config.toml content (overrides default generation)
- `cwd` field - Optional working directory (auto-created temp directory if None)
- `config_toml` field - Optional custom config.toml content (None generates default)
Approval Policy: ApprovalPolicy enum controls when codex asks for command approval:
- `Untrusted` - Only run trusted commands without approval
- `OnFailure` - Ask for approval only when commands fail (default for tests)
- `OnRequest` - Model decides when to ask for approval
- `Never` - Never ask for approval
By default, all spawned sessions use ApprovalPolicy::OnFailure which:
- Skips the trust directory approval screen at startup
- Allows tests to run without manual intervention
- Sets both `--ask-for-approval on-failure` and `--sandbox workspace-write` flags
PTY Input Timing Pattern:
To avoid race conditions between sending input and the TUI processing it, tests add a 100ms delay after send_str() and send_key() operations when submitting prompts or navigating UI:

    session.send_str("testing!!!").unwrap();
    std::thread::sleep(Duration::from_millis(100));
    session.send_key(Key::Enter).unwrap();
    std::thread::sleep(Duration::from_millis(100));

This delay gives the PTY subprocess time to process input and update the display before assertions check for results. The delay is added in test code (not in TuiSession methods) for flexibility, since not all operations need delays.
Test Files Structure:
| File | Coverage |
|---|---|
| @/codex-rs/tui-pty-e2e/tests/startup.rs | TUI initialization, prompt display, trust screen skipping, snapshot testing for 4 startup scenarios, non-blocking PTY verification |
| @/codex-rs/tui-pty-e2e/tests/prompt_flow.rs | Prompt submission and agent responses |
| @/codex-rs/tui-pty-e2e/tests/input_handling.rs | Text editing, backspace, Ctrl-C clearing, arrow key navigation with snapshot testing |
| @/codex-rs/tui-pty-e2e/tests/streaming.rs | Prompt submission with timing delays, agent response streaming |
| @/codex-rs/tui-pty-e2e/tests/acp_mode.rs | ACP mode startup, response flow, and approval bridging - validates TUI works with ACP wire API and mock agent; includes test for permission request display |
| @/codex-rs/tui-pty-e2e/tests/agent_switching.rs | ACP agent subprocess lifecycle and event isolation - verifies subprocess spawning, cleanup on session switch, different agents use different processes, and event filtering prevents cross-agent contamination (Linux only) |
| @/codex-rs/tui-pty-e2e/tests/live_acp.rs | Live authenticated ACP tests for Gemini and Claude with real API connections (opt-in, marked `#[ignore]`) |
Snapshot Files:
| File | Test Coverage |
|---|---|
| @/codex-rs/tui-pty-e2e/tests/snapshots/startup__*.snap | Various startup screen scenarios (welcome, dimensions, temp directory, trust screen) |
| @/codex-rs/tui-pty-e2e/tests/snapshots/input_handling__*.snap | Input handling scenarios (ctrl-c clear, typing/backspace, model changed) |
| @/codex-rs/tui-pty-e2e/tests/snapshots/streaming__submit_input.snap | Prompt submission and streaming response |
| @/codex-rs/tui-pty-e2e/tests/snapshots/acp_mode__*.snap | ACP mode startup screen |
Snapshot Testing with Insta:
Tests use insta::assert_snapshot!() to capture terminal output for visual regression testing:

    assert_snapshot!("startup_screen", normalize_for_snapshot(session.screen_contents()));

Snapshots are stored in @/codex-rs/tui-pty-e2e/tests/snapshots/*.snap for regression detection. Each snapshot captures the exact terminal output state at a specific test point.
Snapshot Normalization:
Two normalization helpers in @/codex-rs/tui-pty-e2e/src/lib.rs ensure stable snapshots:
| Function | Use Case |
|---|---|
| `normalize_for_snapshot()` | General snapshots that should include the startup header |
| `normalize_for_input_snapshot()` | Input-focused tests where header visibility varies with scroll timing |
normalize_for_snapshot() - Base normalization rules:
- Temp directory paths (`/tmp/.tmpXXXXXX`) → `[TMP_DIR]` placeholder
- Random default prompts on lines starting with `›` → `[DEFAULT_PROMPT]` placeholder
  - Detects specific default prompt patterns: "Find and fix a bug", "Explain this codebase", "Write tests for", etc.
  - Preserves user-entered prompts and UI text like "? for shortcuts"
normalize_for_input_snapshot() - Extends base normalization by stripping the startup header block:
- Detects the header block (lines containing `╭──` through the `/review` and `/model` command list)
- Removes the entire header section to prevent flaky snapshots
- Used by input handling tests in @/codex-rs/tui-pty-e2e/tests/input_handling.rs
Why Two Functions: Terminal render timing can cause the startup header block to scroll partially in or out of the viewport before a snapshot is taken. For tests focused on input handling, the header presence is irrelevant - only the input area matters. By stripping the header, normalize_for_input_snapshot() produces deterministic snapshots regardless of scroll state.
This normalization allows snapshot assertions to focus on UI structure and static content rather than ephemeral runtime values.
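
As an illustration of the temp directory rule, here is a minimal std-only sketch. The function name `replace_tmp_dirs` is hypothetical, and the sketch assumes the random suffix after `/tmp/.tmp` consists of ASCII alphanumerics; the crate's actual `normalize_for_snapshot()` may be implemented differently (for example with a regex) and also handles default prompts, which this sketch omits.

```rust
/// Replaces every `/tmp/.tmpXXXXXX` path prefix with `[TMP_DIR]`.
/// Hypothetical helper; assumes the random suffix is ASCII alphanumeric.
pub fn replace_tmp_dirs(contents: &str) -> String {
    const PREFIX: &str = "/tmp/.tmp";
    let mut out = String::with_capacity(contents.len());
    let mut rest = contents;
    while let Some(pos) = rest.find(PREFIX) {
        // Copy everything before the temp path unchanged.
        out.push_str(&rest[..pos]);
        out.push_str("[TMP_DIR]");
        let after = &rest[pos + PREFIX.len()..];
        // Skip the random suffix that follows the fixed prefix.
        let suffix_len = after
            .find(|c: char| !c.is_ascii_alphanumeric())
            .unwrap_or(after.len());
        rest = &after[suffix_len..];
    }
    out.push_str(rest);
    out
}
```

Anything after the random suffix, such as `/hello.py`, is preserved, so two runs with different temp directories produce the same snapshot text.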
PTY Implementation Details:
- Uses `portable-pty` crate for cross-platform PTY support
- PTY master is set to non-blocking mode using `fcntl(O_NONBLOCK)` on Unix systems
- This prevents `read()` from blocking indefinitely when no data is available
- Sets `TERM=xterm-256color` for terminal feature detection
- NO_COLOR=1 by default for deterministic output parsing
- Terminal size configurable (default 24x80, some tests use 40x120)
Polling Pattern:
poll() method performs non-blocking read from PTY master:
- PTY file descriptor is set to non-blocking mode during session initialization
- Reads up to 8KB buffer per poll
- Intercepts and responds to terminal control sequences before parsing
- Feeds processed data to VT100 parser incrementally
- Returns immediately with `WouldBlock` error when no data is available
- `wait_for()` loops with 50ms sleep between polls, checking timeout after each iteration
- Timeout mechanism works correctly because `read()` never blocks indefinitely
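
The loop described above can be sketched in isolation. This is not the crate's `wait_for()`; `wait_until` and its single `poll_and_check` closure are hypothetical names standing in for "poll the PTY, then test the predicate against the screen". The sketch only demonstrates why a non-blocking poll matters: the timeout check is reached on every iteration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Repeatedly runs `poll_and_check` until it returns true or `timeout` elapses.
/// Hypothetical stand-in for the polling pattern, not the crate's API.
pub fn wait_until<F>(mut poll_and_check: F, timeout: Duration) -> Result<Duration, String>
where
    F: FnMut() -> bool,
{
    let start = Instant::now();
    loop {
        // The closure must return promptly (non-blocking read), otherwise
        // the timeout check below would never be reached.
        if poll_and_check() {
            return Ok(start.elapsed());
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {:?}", timeout));
        }
        // 50ms pause between polls, matching the interval described above.
        thread::sleep(Duration::from_millis(50));
    }
}
```

If the closure blocked inside `read()`, the loop would stall on its first line and the timeout would be meaningless, which is why the PTY master is put into non-blocking mode.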
Control Sequence Interception:
The intercept_control_sequences() method handles terminal queries that require responses:
- Detects cursor position query (`ESC[6n`) in output stream from codex binary
- Writes cursor position response (`ESC[1;1R`) back to PTY input
- Removes control sequences from parser stream to avoid rendering artifacts
- Enables crossterm terminal initialization without real terminal support
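
A minimal sketch of this interception, assuming the whole query arrives in a single read. The function name `intercept_cursor_queries` and its return shape are hypothetical; the crate's `intercept_control_sequences()` may differ, for instance by writing to the PTY directly or by handling a query split across two reads.

```rust
const CURSOR_QUERY: &[u8] = b"\x1b[6n";
const CURSOR_REPLY: &[u8] = b"\x1b[1;1R";

/// Strips every cursor position query from `data`.
/// Returns (bytes to feed the VT100 parser, bytes to write back to the PTY).
/// Hypothetical helper: a query split across two reads is not detected.
pub fn intercept_cursor_queries(data: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let mut cleaned = Vec::with_capacity(data.len());
    let mut replies = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i..].starts_with(CURSOR_QUERY) {
            // Answer with a fixed "row 1, column 1" position report.
            replies.extend_from_slice(CURSOR_REPLY);
            i += CURSOR_QUERY.len();
        } else {
            cleaned.push(data[i]);
            i += 1;
        }
    }
    (cleaned, replies)
}
```

Returning the reply bytes instead of writing them keeps the function pure and easy to test; the caller decides when to write them to the PTY.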
Mock Agent Integration:
Tests use the model name "mock-model" which the ACP registry (@/codex-rs/acp/src/registry.rs) resolves to the mock-acp-agent subprocess. The registry returns configuration with:
- `provider: "mock-acp"`
- `command: <path-to-mock_acp_agent-binary>`
- `args: []`
An alternate model "mock-model-alt" is also registered with provider_slug: "mock-acp-alt" for testing agent switching scenarios where different models must spawn different subprocesses.
Tests control mock agent behavior via environment variables:
- `MOCK_AGENT_RESPONSE` - Custom response text instead of defaults
- `MOCK_AGENT_DELAY_MS` - Simulate streaming delays
- `MOCK_AGENT_STREAM_UNTIL_CANCEL` - Stream until Escape pressed
- `MOCK_AGENT_REQUEST_PERMISSION` - Trigger permission request to test approval bridging
See @/codex-rs/mock-acp-agent/docs.md for full list of env vars.
Agent Subprocess Lifecycle Testing (agent_switching.rs):
Linux-only tests that verify ACP subprocess lifecycle management and event isolation:
Subprocess Management Tests:
- `acp_log_path()` method on `TuiSession` returns the path to the ACP tracing log file
- Tests extract PIDs from log lines matching `"ACP agent spawned (pid: Some(...))"`
- Uses `/proc/{pid}` filesystem to verify process existence and zombie state
- Key verified behaviors:
  - Agent subprocess spawns with unique PID
  - `/new` command spawns new subprocess with different PID
  - Old subprocess is terminated (not zombie) after session switch
  - Cleanup happens when session switches, not when individual prompt turns end
  - Different models (`mock-model` vs `mock-model-alt`) spawn different subprocesses
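
The PID extraction and `/proc` checks can be sketched with the standard library alone. All three function names below are hypothetical and the actual test helpers may look different. The zombie check relies on the documented layout of `/proc/<pid>/stat`, where the process state is the first field after the parenthesized command name.

```rust
use std::path::Path;

/// Extracts the PID from a line such as `ACP agent spawned (pid: Some(1234))`.
/// Hypothetical helper, not the test suite's actual function.
pub fn extract_pid(line: &str) -> Option<u32> {
    const MARKER: &str = "ACP agent spawned (pid: Some(";
    let start = line.find(MARKER)? + MARKER.len();
    let rest = &line[start..];
    let end = rest.find(')')?;
    rest[..end].parse().ok()
}

/// Returns true when `/proc/<pid>` exists (meaningful on Linux only).
pub fn process_exists(pid: u32) -> bool {
    Path::new("/proc").join(pid.to_string()).exists()
}

/// Given the contents of `/proc/<pid>/stat`, reports whether the state is `Z`.
/// The command name may contain spaces or parentheses, so parsing starts
/// after the last `)`.
pub fn is_zombie_stat(stat: &str) -> Option<bool> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    let state = after_comm.split_whitespace().next()?;
    Some(state == "Z")
}
```

A test would typically read the log, call `extract_pid` on each matching line, and then assert that the old PID is neither present in `/proc` nor reported as a zombie after the session switch.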
Event Isolation Tests:
- `extract_agent_messages_from_log()` helper parses `Mock agent:` log entries from ACP log file
- `test_agent_switch_message_flow_mock_to_mock_alt` verifies that after switching agents, the NEW agent receives and responds to prompts (catches race conditions where OLD agent events could leak)
- `test_agent_switch_logs_correct_sequence` verifies the expected log sequence during agent switch: agent receives prompt, logs receipt, sends response
Binary Discovery:
codex_binary_path() locates the compiled binary:

    test_exe: target/debug/deps/startup-abc123
        ↓
    target/debug/deps (parent)
        ↓
    target/debug (parent.parent)
        ↓
    target/debug/codex (join "codex")

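
The path walk above amounts to two `parent()` calls and a `join`. The sketch below takes the test executable path as a parameter so it can be checked without a build; the function name `codex_path_from_test_exe` is hypothetical, and it is an assumption that the real `codex_binary_path()` obtains the starting path from something like `std::env::current_exe()`.

```rust
use std::path::{Path, PathBuf};

/// Derives the `codex` binary path from a test executable path by walking
/// up from `target/<profile>/deps/<test>` to `target/<profile>`.
/// Hypothetical helper mirroring the diagram, not the crate's function.
pub fn codex_path_from_test_exe(test_exe: &Path) -> Option<PathBuf> {
    let deps_dir = test_exe.parent()?; // target/debug/deps
    let profile_dir = deps_dir.parent()?; // target/debug
    Some(profile_dir.join("codex")) // target/debug/codex
}
```

Returning `Option` makes the failure case explicit when the executable is not nested two directories deep, rather than panicking inside the helper.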
Live ACP Testing:
Two opt-in E2E tests in @/codex-rs/tui-pty-e2e/tests/live_acp.rs validate integration with real ACP providers:
- `test_gemini_acp_live_response` - Tests gemini-acp with real Gemini API (requires GEMINI_API_KEY environment variable)
- `test_claude_acp_live_response` - Tests claude-acp with real Claude API (requires ANTHROPIC_API_KEY environment variable)
- Both tests are marked `#[ignore]` to be opt-in and run separately: `cargo test --package tui-pty-e2e -- --ignored`
- Use 30-second timeout vs 5-second standard timeout to account for network latency and model processing time
- Generate dynamic config.toml with `wire_api = "acp"` to route through ACP registry
- Verify basic response reception without requiring specific output text
Known Limitations:
- VT100 parser may not perfectly emulate all terminal behaviors
- Terminal size changes after spawn not currently supported
- Color codes disabled (NO_COLOR=1) for test determinism
Dependencies:
- `portable-pty = "0.8"` - PTY creation and management
- `vt100 = "0.15"` - Terminal emulator/parser
- `insta = "1"` - Snapshot testing framework
- `anyhow = "1"` - Error handling
- `tempfile = "3"` - Temporary directory creation for test isolation
- `nix = "0.27"` (Unix only) - fcntl for non-blocking I/O setup
- `libc = "0.2"` (Unix only) - Low-level fcntl operations
Debugging:
Set DEBUG_TUI_PTY=1 environment variable to enable detailed logging of PTY operations:

    DEBUG_TUI_PTY=1 cargo test test_name -- --nocapture

This shows:
- Each `poll()` call and its duration
- Read results (bytes read, WouldBlock, EOF)
- `wait_for()` loop iterations and elapsed time
- Screen contents preview at each iteration
Created and maintained by Nori.