
Commit e665335

feat(pty): Add automation testing for full binary (#54)
## Summary

- Adds PTY-based integration test framework for full binary TUI testing
- Implements visual regression testing using insta snapshots
- Provides test coverage for startup, input handling, prompt flow, and cancellation flows
- Includes comprehensive documentation and Claude skills for PTY testing patterns

## Key Features

- **PTY Test Harness**: Custom test framework that spawns the actual binary in a pseudo-terminal for realistic testing
- **Visual Regression Tests**: Insta snapshot tests capture terminal output for detecting UI regressions
- **Test Coverage**: Startup flows, input handling, prompt interactions, and cancellation scenarios
- **Developer Tools**: PTY testing skill documentation to guide future test development

## Technical Details

- Intercepts cursor position queries to prevent test hangs
- Non-blocking PTY reader implementation for reliable test execution
- Normalizes screen snapshots to handle variable content
- Supports multiple terminal dimensions for responsive UI testing

## Test Plan

- [x] All existing tests pass
- [x] New PTY integration tests pass with snapshots
- [x] Tests run reliably without timeouts or hangs
- [x] Documentation accurately describes the test framework
2 parents bb90ac6 + 4df50df commit e665335

30 files changed

Lines changed: 2154 additions & 9 deletions

.claude/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json"
}
Lines changed: 345 additions & 0 deletions
@@ -0,0 +1,345 @@
---
name: pty-tui-testing
description: Use this skill to build features that require testing the TUI with the PTY-based integration test harness.
---

<required>
*CRITICAL* Add the following steps to your Todo list using TodoWrite:

<system-reminder>From this point on, ignore any existing tests until you have a working example validated through a new PTY-based integration test.</system-reminder>
1. Identify the TUI behavior to test (startup, input handling, agent responses, screen updates, etc.)
2. Write a new Rust integration test in `codex-rs/tui-integration-tests/tests/`
   - Use `TuiSession::spawn()` or `TuiSession::spawn_with_config()` to launch codex in a PTY
   - Configure terminal dimensions (rows, cols) appropriate for the test
   - Configure `SessionConfig` with mock agent behavior if needed
3. Follow these steps in a loop until the test passes:
   - Enable debug logging by setting the `DEBUG_TUI_PTY=1` environment variable
   - Run the specific test: `cargo test test_name -- --nocapture`
   - Examine the PTY polling behavior, screen contents, and timing
   - Update the test expectations or fix the TUI code
<system-reminder>If you get stuck: did you add DEBUG_TUI_PTY=1 logging?</system-reminder>
4. Review snapshots if using `insta::assert_snapshot!()` and accept them with `cargo insta review`
5. Run all TUI integration tests to ensure nothing broke: `cargo test -p tui-integration-tests`
</required>

# PTY-Based TUI Integration Testing

To test the Codex terminal user interface, write Rust integration tests using the `tui-integration-tests` harness. This framework spawns the real `codex` binary in a pseudo-terminal (PTY) and validates terminal output through screen content assertions.

## Core Workflow

**Test Structure:**

All tests follow this pattern:
1. Spawn a TUI session in a PTY with configured dimensions
2. Wait for expected screen content to appear
3. Send keyboard input to simulate user interactions
4. Poll and validate screen state changes
5. Optionally capture snapshots for regression testing

**TUI Session Lifecycle:**

```rust
use std::time::Duration;
use tui_integration_tests::{Key, TuiSession};

const TIMEOUT: Duration = Duration::from_secs(5);

#[test]
fn test_tui_behavior() {
    // Spawn codex in a 24x80 terminal with default config
    let mut session = TuiSession::spawn(24, 80)
        .expect("Failed to spawn codex");

    // Wait for welcome message to appear
    session.wait_for_text("To get started", TIMEOUT)
        .expect("Welcome message did not appear");

    // Simulate user typing
    session.send_str("Hello").unwrap();

    // Submit with Enter key
    session.send_key(Key::Enter).unwrap();

    // Wait for agent response
    session.wait_for_text("Test message", TIMEOUT)
        .expect("Agent response did not appear");

    // Assert final screen state
    let contents = session.screen_contents();
    assert!(contents.contains("expected text"));
}
```

**Session Configuration:**

Use `SessionConfig` to control the test environment:

```rust
use tui_integration_tests::{ApprovalPolicy, SessionConfig, TuiSession};

let config = SessionConfig::new()
    .with_mock_response("Custom agent response")
    .with_approval_policy(ApprovalPolicy::Never)
    .with_agent_env("MOCK_AGENT_DELAY_MS", "100");

let mut session = TuiSession::spawn_with_config(40, 120, config)
    .expect("Failed to spawn codex");
```

## Key Testing Patterns

**Pattern 1: Startup and Initialization**

Test that the TUI displays correct welcome screens and skips onboarding appropriately:

```rust
#[test]
fn test_startup_shows_welcome() {
    let mut session = TuiSession::spawn_with_config(
        24,
        80,
        SessionConfig::default()
            .without_approval_policy()
            .without_sandbox(),
    )
    .expect("Failed to spawn codex");

    session.wait_for_text("Welcome", TIMEOUT)
        .expect("Welcome did not appear");

    let contents = session.screen_contents();
    assert!(contents.contains("Welcome to Codex"));
    assert!(contents.contains("/tmp/"));
}
```

**Pattern 2: Input Handling and Screen Updates**

Test keyboard input, character echo, and text editing:

```rust
#[test]
fn test_typing_and_backspace() {
    let mut session = TuiSession::spawn(24, 80).unwrap();
    // Wait for the composer prompt (›) before typing
    session.wait_for_text("›", TIMEOUT).unwrap();

    // Type text
    session.send_str("Hello World").unwrap();
    session.wait_for_text("Hello World", TIMEOUT).unwrap();

    // Backspace to remove "World"
    for _ in 0..5 {
        session.send_key(Key::Backspace).unwrap();
    }
    std::thread::sleep(Duration::from_millis(100));
    // Read pending PTY output so screen_contents() reflects the deletions
    let _ = session.poll();

    // Verify deletion
    let contents = session.screen_contents();
    assert!(contents.contains("Hello"));
    assert!(!contents.contains("World"));
}
```

**Pattern 3: Agent Interaction and Streaming**

Test agent responses with custom mock behavior:

```rust
#[test]
fn test_agent_response_streaming() {
    let config = SessionConfig::new()
        .with_mock_response("Response line 1\nResponse line 2");

    let mut session = TuiSession::spawn_with_config(24, 80, config).unwrap();
    // Wait for the composer prompt (›) before typing
    session.wait_for_text("›", TIMEOUT).unwrap();

    session.send_str("test prompt").unwrap();
    session.send_key(Key::Enter).unwrap();

    // Wait for both lines to stream in
    session.wait_for_text("Response line 1", TIMEOUT).unwrap();
    session.wait_for_text("Response line 2", TIMEOUT).unwrap();
}
```

**Pattern 4: Cancellation and Control Flow**

Test Escape key cancellation and Ctrl-C behavior:

```rust
#[test]
fn test_cancel_streaming_with_escape() {
    let config = SessionConfig::new()
        .with_stream_until_cancel();

    let mut session = TuiSession::spawn_with_config(24, 80, config).unwrap();
    // Wait for the composer prompt (›) before typing
    session.wait_for_text("›", TIMEOUT).unwrap();

    session.send_str("test").unwrap();
    session.send_key(Key::Enter).unwrap();

    // Wait for streaming to start
    session.wait_for_text("streaming", TIMEOUT).unwrap();

    // Cancel with Escape
    session.send_key(Key::Escape).unwrap();

    // Verify cancellation message appears
    session.wait_for_text("Cancelled", TIMEOUT).unwrap();
}
```

**Pattern 5: Snapshot Testing**

Capture and validate the complete screen state:

```rust
use insta::assert_snapshot;

#[test]
fn test_screen_layout() {
    let mut session = TuiSession::spawn(40, 120).unwrap();
    // Wait for the composer prompt (›) before typing
    session.wait_for_text("›", TIMEOUT).unwrap();

    session.send_str("test prompt").unwrap();
    session.send_key(Key::Enter).unwrap();
    session.wait_for_text("Test message", TIMEOUT).unwrap();

    // Capture full screen state for regression testing
    assert_snapshot!("prompt_submitted", session.screen_contents());
}
```

Review snapshots with `cargo insta review` after the first run.

**Normalizing Dynamic Content in Snapshots**

When tests include dynamic content (temp paths, timestamps, random prompts), normalize it before snapshotting to prevent spurious failures:

```rust
/// Normalize dynamic content in screen output for snapshot testing
fn normalize_for_snapshot(contents: String) -> String {
    let mut normalized = contents;

    // Replace every /tmp/.tmpXXXXXX path with a placeholder, including a path
    // that runs to the end of the screen text (no trailing whitespace)
    while let Some(start) = normalized.find("/tmp/.tmp") {
        let end = normalized[start..]
            .find(char::is_whitespace)
            .map_or(normalized.len(), |offset| start + offset);
        normalized.replace_range(start..end, "[TMP_DIR]");
    }

    // Replace dynamic prompt text on lines starting with ›
    let lines: Vec<String> = normalized
        .lines()
        .map(|line| {
            if line.trim_start().starts_with('›') && !line.contains("for shortcuts") {
                "› [DEFAULT_PROMPT]".to_string()
            } else {
                line.to_string()
            }
        })
        .collect();

    lines.join("\n")
}

#[test]
fn test_with_normalized_snapshot() {
    let mut session = TuiSession::spawn(24, 80).unwrap();
    session.wait_for_text("Welcome", TIMEOUT).unwrap();

    // Normalize before asserting to handle dynamic temp paths
    assert_snapshot!(
        "welcome_screen",
        normalize_for_snapshot(session.screen_contents())
    );
}
```

**Common Dynamic Content to Normalize:**

- Temp directory paths: `/tmp/.tmpXXXXXX` → `[TMP_DIR]`
- Random default prompts: `› Improve documentation...` → `› [DEFAULT_PROMPT]`
- Timestamps: `2025-01-15 10:30:45` → `[TIMESTAMP]`
- Session IDs, PIDs, or other ephemeral identifiers

This pattern ensures snapshots focus on UI structure and static content rather than runtime-specific values. See `@/codex-rs/tui-integration-tests/tests/startup.rs` for the reference implementation.
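
`normalize_for_snapshot` covers temp paths and default prompts but not the timestamp case listed above. A minimal, standard-library-only sketch follows, assuming timestamps always use the `YYYY-MM-DD HH:MM:SS` layout; `normalize_timestamps` is a hypothetical helper, not part of the harness.

```rust
/// Hypothetical helper (not part of the harness): replace `YYYY-MM-DD HH:MM:SS`
/// timestamps with `[TIMESTAMP]` using only the standard library.
fn normalize_timestamps(contents: &str) -> String {
    // Timestamp shape: 'd' matches any ASCII digit; every other byte must match exactly.
    const SHAPE: &[u8] = b"dddd-dd-dd dd:dd:dd";

    let bytes = contents.as_bytes();
    let mut out = String::with_capacity(contents.len());
    let mut copied_up_to = 0; // start of text not yet copied into `out`
    let mut i = 0;

    while i + SHAPE.len() <= bytes.len() {
        let matches = SHAPE
            .iter()
            .zip(&bytes[i..])
            .all(|(&expected, &actual)| {
                if expected == b'd' {
                    actual.is_ascii_digit()
                } else {
                    expected == actual
                }
            });

        if matches {
            // Match boundaries fall on ASCII bytes, so these slices are valid UTF-8.
            out.push_str(&contents[copied_up_to..i]);
            out.push_str("[TIMESTAMP]");
            i += SHAPE.len();
            copied_up_to = i;
        } else {
            i += 1;
        }
    }

    out.push_str(&contents[copied_up_to..]);
    out
}
```

It can be chained after the existing normalizer, e.g. `normalize_timestamps(&normalize_for_snapshot(contents))`. Matching against a fixed byte shape avoids adding a regex dependency; if timestamp formats vary, a regex is the more flexible choice.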

## Configuration Options

**SessionConfig Methods:**

| Method | Purpose |
|--------|---------|
| `with_mock_response(text)` | Set a custom agent response instead of the defaults |
| `with_stream_until_cancel()` | Make the agent stream continuously until Escape is pressed |
| `with_agent_env(key, val)` | Pass environment variables to the mock agent |
| `with_approval_policy(policy)` | Control approval prompts (`Untrusted`, `OnFailure`, `OnRequest`, `Never`) |
| `without_approval_policy()` | Remove the approval policy to test trust screens |
| `with_sandbox(sandbox)` | Set the sandbox level (`ReadOnly`, `WorkspaceWrite`, `DangerFullAccess`) |
| `without_sandbox()` | Remove the sandbox setting to test trust screens |

## TuiSession API

**Spawning:**

- `TuiSession::spawn(rows, cols)` - Launch with defaults in a temp directory
- `TuiSession::spawn_with_config(rows, cols, config)` - Launch with a custom config

**Input:**

- `send_str(text)` - Simulate typing a string
- `send_key(key)` - Send a keyboard event (Enter, Escape, Backspace, arrow keys, Ctrl+key)

**Polling and Waiting:**

- `wait_for_text(needle, timeout)` - Poll until text appears on screen
- `wait_for(predicate, timeout)` - Poll until a custom condition matches
- `poll()` - Manually read available output and update screen state
- `screen_contents()` - Get the current terminal screen as a string
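
The harness's `wait_for()` internals are not shown here. As a hedged illustration of the poll-check-sleep loop it describes, assuming a fixed sleep between attempts (Common Issues mentions 50ms), a generic deadline loop needs only `std`; `wait_until` is a hypothetical name, not part of `TuiSession`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, harness-independent sketch of a `wait_for`-style loop:
/// check the condition, and sleep between attempts until the deadline passes.
fn wait_until<F>(mut check: F, timeout: Duration, interval: Duration) -> Result<Duration, String>
where
    F: FnMut() -> bool,
{
    let start = Instant::now();
    loop {
        // In a PTY harness, `check` would first poll() pending output, then
        // inspect the refreshed screen contents.
        if check() {
            return Ok(start.elapsed());
        }
        if start.elapsed() >= timeout {
            return Err(format!("condition not met within {:?}", timeout));
        }
        thread::sleep(interval);
    }
}
```

Returning an error instead of panicking leaves the choice between `.expect("...")` and a custom failure message to the test, and returning the elapsed time on success is handy when tuning timeouts.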

**Available Keys:**

- `Key::Enter`, `Key::Escape`, `Key::Backspace`
- `Key::Up`, `Key::Down`, `Key::Left`, `Key::Right`
- `Key::Ctrl('c')`, `Key::Ctrl('d')`, etc.
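
`send_key` ultimately has to write bytes into the PTY. The exact encoding the harness uses is not documented here; the sketch below assumes common xterm-style sequences (CR for Enter, DEL for Backspace, `ESC [ A` through `ESC [ D` for arrows, and Ctrl+letter as the letter's low five bits). `SketchKey` and `key_bytes` are hypothetical stand-ins for the harness's `Key` type.

```rust
/// Hypothetical stand-in for the harness's `Key` type, used only to show how
/// keys are commonly encoded as bytes written to a PTY (xterm-style).
#[derive(Debug, Clone, Copy)]
enum SketchKey {
    Enter,
    Escape,
    Backspace,
    Up,
    Down,
    Left,
    Right,
    /// Assumes an ASCII letter, e.g. `Ctrl('c')`.
    Ctrl(char),
}

fn key_bytes(key: SketchKey) -> Vec<u8> {
    match key {
        // Terminals send carriage return for Enter.
        SketchKey::Enter => vec![b'\r'],
        SketchKey::Escape => vec![0x1b],
        // Most terminals send DEL (0x7f) for Backspace.
        SketchKey::Backspace => vec![0x7f],
        // Cursor keys in normal (non-application) mode: ESC [ A/B/C/D.
        SketchKey::Up => b"\x1b[A".to_vec(),
        SketchKey::Down => b"\x1b[B".to_vec(),
        SketchKey::Right => b"\x1b[C".to_vec(),
        SketchKey::Left => b"\x1b[D".to_vec(),
        // Ctrl+letter keeps the low five bits: Ctrl-C = 0x03, Ctrl-D = 0x04.
        SketchKey::Ctrl(c) => vec![(c as u8) & 0x1f],
    }
}
```

Whether 0x03 reaches the TUI as a key press or becomes SIGINT depends on the PTY's line discipline: a TUI in raw mode receives the byte directly.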

## Debugging

**Enable Debug Logging:**

```bash
DEBUG_TUI_PTY=1 cargo test test_name -- --nocapture
```

This shows:
- Each `poll()` call and its duration
- Read results (bytes read, WouldBlock, EOF)
- `wait_for()` loop iterations and elapsed time
- A preview of the screen contents at each iteration
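
The three read results in the debug log correspond to the three ways a non-blocking read can end. A minimal sketch, assuming the PTY master has already been switched to non-blocking mode so that `read` returns `ErrorKind::WouldBlock` instead of waiting; `ReadOutcome` and `drain_available` are hypothetical names, not the harness's API.

```rust
use std::io::{self, ErrorKind, Read};

/// What one drain of the PTY reader observed (mirrors the debug log fields).
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// Some bytes were read before the reader ran dry.
    Data(usize),
    /// Nothing is available right now; the child may still be running.
    WouldBlock,
    /// The child closed its side of the PTY.
    Eof,
}

/// Hypothetical sketch: read everything currently available from a
/// non-blocking reader, appending it to `buf`.
fn drain_available<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<ReadOutcome> {
    let mut chunk = [0u8; 4096];
    let mut total = 0;
    loop {
        match reader.read(&mut chunk) {
            // EOF right after data is reported as Data now and as Eof on the next call.
            Ok(0) => return Ok(if total > 0 { ReadOutcome::Data(total) } else { ReadOutcome::Eof }),
            Ok(n) => {
                buf.extend_from_slice(&chunk[..n]);
                total += n;
            }
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                return Ok(if total > 0 { ReadOutcome::Data(total) } else { ReadOutcome::WouldBlock });
            }
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Because `WouldBlock` means "nothing yet" rather than "done", a single drain says nothing about whether expected text will still arrive; a `wait_for`-style loop is still needed on top of it.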

**Common Issues:**

1. **Test times out waiting for text**
   - Set `DEBUG_TUI_PTY=1` to see polling behavior
   - Check whether the text appears but with different formatting or spacing
   - Verify the mock agent is configured correctly
   - Increase the timeout for slower operations

2. **Snapshot differences**
   - Run `cargo insta review` to inspect changes
   - Check for timing-dependent content (e.g., timestamps)
   - Verify terminal dimensions match snapshot expectations

3. **PTY blocking issues**
   - `poll()` returns immediately even when no data is available (non-blocking mode)
   - Use `wait_for()`, which polls in a loop with a 50ms sleep between iterations
   - Don't rely on `poll()` alone for synchronization

4. **Control sequence artifacts**
   - The PTY harness intercepts cursor position queries automatically
   - If you see escape sequences in the output, additional interception may be needed
   - Check `intercept_control_sequences()` in `lib.rs`
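
Cursor position queries can hang tests because the TUI writes a Device Status Report request (`ESC [ 6 n`) and then waits for the terminal to answer with `ESC [ row ; col R`; a bare PTY has no terminal emulator to reply. The sketch below shows the idea under that assumption: strip each query from the output and queue a reply to write back to the PTY input. It is not the harness's `intercept_control_sequences()`; `intercept_cursor_queries` is a hypothetical name, and it assumes a query is never split across two reads.

```rust
/// Device Status Report request for the cursor position (CSI 6 n).
const CPR_QUERY: &[u8] = b"\x1b[6n";

/// Hypothetical sketch: remove cursor-position queries from child output and
/// build the replies to write back to the PTY. `row` and `col` are 1-based,
/// matching the `CSI row ; col R` reply format.
fn intercept_cursor_queries(output: &[u8], row: u16, col: u16) -> (Vec<u8>, Vec<u8>) {
    let mut passthrough = Vec::with_capacity(output.len());
    let mut replies = Vec::new();
    let mut i = 0;
    while i < output.len() {
        if output[i..].starts_with(CPR_QUERY) {
            // Answer the query instead of forwarding it to the screen parser.
            replies.extend_from_slice(format!("\x1b[{};{}R", row, col).as_bytes());
            i += CPR_QUERY.len();
        } else {
            passthrough.push(output[i]);
            i += 1;
        }
    }
    (passthrough, replies)
}
```

In this design, `passthrough` goes to the screen parser and `replies` is written to the PTY master. A fixed position is usually enough to unblock startup; a test that depends on the real cursor location would need the position tracked by the parsed screen instead.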

## Testing Philosophy

These are black-box integration tests that exercise the full executable stack (CLI → TUI → Core → ACP). Each test runs in isolation with deterministic mock agent responses, validating external behavior through screen content assertions.
