fix: avoid PTY wait deadlock on wait errors #165

Merged
branchseer merged 7 commits into main from fix/pty-terminal-nonzero-timeout
Feb 16, 2026

branchseer commented Feb 15, 2026

Summary

  • Fix the PTY wait deadlock by ensuring the child-monitor thread always sets the shared completion slot, including wait failures.
  • Store wait failures as Arc<std::io::Error> in the shared OnceLock (Result<ExitStatus, Arc<std::io::Error>>) so the error state can be shared without a synthetic fallback status.
  • Change ChildHandle::wait() to return anyhow::Result<ExitStatus> and propagate wait failures through callers.
  • Update downstream helpers/callers (pty_terminal_test, milestone tests, and vite_task_bin e2e harness) to consume the new anyhow::Result return type.
  • Keep PTY integration tests strict at #[timeout(5000)] and run them without a Windows serialization guard.

Stable Reproduction (before fix)

To deterministically reproduce the real deadlock path (without mocking child.wait() return values), I used a temporary Windows-only repro patch that swaps the internal WinChild process handle with a non-process handle (CreateEventW event handle). This makes child.wait() execute the real Windows wait/get-exit-code path and return Err.

With the old code path (if let Ok(status) = child.wait() { ... }), OnceLock was never set on error, and ChildHandle::wait() blocked forever.
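A minimal sketch of why the old shape leaves the slot empty. It is not the crate's code: `publish_old` is a hypothetical stand-in for the monitor thread body, and a plain `i32` stands in for `ExitStatus` so the example runs without spawning a process.

```rust
use std::io;
use std::sync::{Arc, OnceLock};

// Assumption: i32 replaces ExitStatus purely for illustration.
type WaitSlot = OnceLock<Result<i32, Arc<io::Error>>>;

/// Hypothetical stand-in for the old monitor body:
/// the slot is written only when the wait succeeds.
fn publish_old(slot: &WaitSlot, wait_result: io::Result<i32>) {
    if let Ok(status) = wait_result {
        // Ignore the "already set" case; only the first write matters.
        let _ = slot.set(Ok(status));
    }
    // On Err nothing is stored, so any reader that blocks until the
    // slot is filled has nothing to wake it up.
}
```

Calling `publish_old` with an `Err` leaves `slot.get()` returning `None`, which is the state in which `ChildHandle::wait()` blocked.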

Repro command:

cargo xtest --builder cargo-xwin --target x86_64-pc-windows-msvc -p pty_terminal --test terminal -- read_to_end_returns_exit_status_nonzero

Observed behavior under repro:

  • timed out at ~15s with #[timeout(15000)]
  • timed out again at ~60s with #[timeout(60000)]

The test timed out at both limits, so the hang does not depend on the timeout value: this is a deadlock, not a slow test.

Why this fixes it

  • Background monitor thread now always sets the shared wait result (Ok or Err).
  • ChildHandle::wait() cannot block forever on an unset OnceLock.
  • Wait failures are preserved and surfaced as real errors (no synthetic ExitStatus::with_exit_code(1)).
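The points above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `Slot`, `spawn_monitor`, and `wait_for_exit` are hypothetical names, a `Mutex` plus `Condvar` stands in for however the real code blocks on its `OnceLock`, and `i32` stands in for `ExitStatus`.

```rust
use std::io;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Assumption: i32 replaces ExitStatus so no real child process is needed.
type WaitResult = Result<i32, Arc<io::Error>>;

/// Hypothetical write-once slot that readers can block on.
#[derive(Default)]
struct Slot {
    value: Mutex<Option<WaitResult>>,
    ready: Condvar,
}

impl Slot {
    /// Store the first result and wake every waiter.
    fn set(&self, result: WaitResult) {
        let mut guard = self.value.lock().unwrap();
        if guard.is_none() {
            *guard = Some(result);
        }
        self.ready.notify_all();
    }

    /// Block until a result has been stored, then return a copy of it.
    fn wait(&self) -> WaitResult {
        let mut guard = self.value.lock().unwrap();
        while guard.is_none() {
            guard = self.ready.wait(guard).unwrap();
        }
        (*guard).clone().expect("slot is filled after the loop")
    }
}

/// Monitor thread: publishes the outcome whether it is Ok or Err.
fn spawn_monitor<F>(slot: Arc<Slot>, child_wait: F) -> thread::JoinHandle<()>
where
    F: FnOnce() -> io::Result<i32> + Send + 'static,
{
    thread::spawn(move || {
        // Wrapping the error in Arc makes the stored result cloneable.
        slot.set(child_wait().map_err(Arc::new));
    })
}

/// Caller side: returns once the monitor has published something.
fn wait_for_exit<F>(child_wait: F) -> WaitResult
where
    F: FnOnce() -> io::Result<i32> + Send + 'static,
{
    let slot = Arc::new(Slot::default());
    let monitor = spawn_monitor(Arc::clone(&slot), child_wait);
    let result = slot.wait();
    monitor.join().unwrap();
    result
}
```

Because the monitor always calls `set`, `wait_for_exit` returns for both outcomes, and a failed wait reaches the caller as the original `io::Error` rather than as a made-up exit code.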

Validation

  • cargo clippy -p pty_terminal --all-targets --all-features -- -D warnings
  • cargo test -p pty_terminal -p pty_terminal_test
  • cargo test -p vite_task_bin --test e2e_snapshots --no-run
  • cargo xtest --builder cargo-xwin --target x86_64-pc-windows-msvc -p pty_terminal --test terminal
  • cargo xtest --builder cargo-xwin --target aarch64-pc-windows-msvc -p pty_terminal --test terminal
  • cargo xtest --builder cargo-xwin --target x86_64-pc-windows-msvc -p pty_terminal --test terminal repeated 40 times (all green, without serial guard)

branchseer changed the title from "fix: stabilize flaky nonzero PTY exit test on Windows" to "fix: avoid PTY wait deadlock on wait errors" Feb 16, 2026
branchseer merged commit 9681eca into main Feb 16, 2026
6 checks passed
branchseer deleted the fix/pty-terminal-nonzero-timeout branch February 16, 2026 04:08