Skip to content

fix: resolve flaky PTY tests on macOS and Windows#166

Closed
branchseer wants to merge 1 commit intomainfrom
fix/pty-terminal-flaky-tests
Closed

fix: resolve flaky PTY tests on macOS and Windows#166
branchseer wants to merge 1 commit intomainfrom
fix/pty-terminal-flaky-tests

Conversation

@branchseer
Copy link
Copy Markdown
Member

Summary

Fixes two flaky PTY test failures by deferring slave and master cleanup to the background thread after child.wait() returns, instead of dropping them eagerly during spawn.

Bug 1: macOS is_terminal — empty output (CI #22038626101)

Root cause: The parent's slave fd was dropped immediately after spawn (drop(pty_pair.slave)). When the child exited quickly, all slave references closed before the reader's first read(). macOS returns EIO on the master fd without draining the output buffer. portable-pty converts EIO → Ok(0), so read_to_end returns empty data.

Reproduction: Adding thread::sleep(2s) before read_to_end in the is_terminal test deterministically produced empty output (child exits during the sleep, all slave refs close, reader gets EIO on first read).

Fix: Keep the slave alive in the background thread. It is dropped only after child.wait() returns, ensuring the output buffer is fully drained before EIO can occur.

Bug 2: Windows read_to_end_returns_exit_status_nonzero — 5s timeout (CI #22038894925)

Root cause: The reader pipe gets EOF only when ClosePseudoConsole is called, which requires all Arc<Mutex<Inner>> references (held by both master and slave) to reach zero. The slave was already dropped, but the master lived inside PtyWriter and was never explicitly dropped. EOF depended on undocumented ConPTY auto-close behavior that is unreliable under CI load.

Reproduction: Adding a 10s sleep in the background thread before closing the writer caused read_to_end to hang and the test to timeout at 5s — proving EOF depends on cleanup timing, not child exit. With the fix applied, the same test passes even with a 3s delay because the master is explicitly dropped.

Fix: Wrap master in Arc<Mutex<Option<...>>> (like writer) and drop it from the background thread after child.wait(). This calls ClosePseudoConsole → closes the stdout pipe → reader gets EOF deterministically.

Changes

  • PtyWriter.master: Box<dyn MasterPty + Send>Arc<Mutex<Option<Box<dyn MasterPty + Send>>>>
  • Terminal::spawn: background thread now drops writer, master, and slave in sequence after child.wait()
  • PtyWriter::resize: handles Option (returns error if master already dropped)

Validation

  • macOS: 9/9 tests pass × 20 runs (180/180)
  • Windows (cargo xtest): 9/9 tests pass × 20 runs (180/180)
  • Clippy clean, no sleeps in final code

Defer PTY slave and master cleanup to background thread after child.wait()
instead of dropping them eagerly during spawn.

macOS (is_terminal flaky empty output): dropping the slave fd immediately
after spawn meant that when the child exited quickly, all slave references
closed before the reader's first read(). macOS returns EIO without draining
the output buffer, causing portable-pty to report Ok(0) and read_to_end to
return empty data.

Windows (read_to_end_returns_exit_status_nonzero timeout): the reader pipe
only gets EOF when ClosePseudoConsole is called, which requires all
Arc<Mutex<Inner>> references (held by master and slave) to be dropped.
The master was held in PtyWriter and never explicitly dropped, so EOF
depended on undocumented ConPTY auto-close behavior that is unreliable
under CI load.

The fix wraps master in Arc<Mutex<Option<...>>> (like writer) and drops
writer, master, and slave sequentially in the background thread after
child.wait() returns. This deterministically triggers EOF on both
platforms regardless of timing.
@branchseer
Copy link
Copy Markdown
Member Author

Closing: the Windows fix in this PR was based on incorrect root cause analysis. The actual Windows deadlock is caused by child.wait() returning Err and the if let Ok skipping OnceLock::set — fixed correctly in #165. The macOS fix (deferred slave drop) is already in #164.

@branchseer branchseer closed this Feb 16, 2026
@branchseer branchseer deleted the fix/pty-terminal-flaky-tests branch February 16, 2026 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant