Skip to content

fix(eslint-plugin-runner): await stdio pipe close before terminating workers#1029

Closed
fansenze wants to merge 1 commit into
mainfrom
fix/worker-pool-terminate-pipe-race
Closed

fix(eslint-plugin-runner): await stdio pipe close before terminating workers#1029
fansenze wants to merge 1 commit into
mainfrom
fix/worker-pool-terminate-pipe-race

Conversation

@fansenze
Copy link
Copy Markdown
Contributor

Summary

On windows-latest, terminating a worker thread while its stdio pipes are still live lets libuv's concurrent pipe teardown fault below the JS layer (a native abort). Because rstest's forked test child intercepts uncaughtException / unhandledRejection / process.exit, the only way that child can "exit unexpectedly" is a native fault — so this surfaced as Worker exited unexpectedly in the high-terminate-churn worker-pool-e2e suite (e.g. multiconfig, resilience), a flaky that has been failing main CI.

terminateWorker already destroyed the pipes first, but destroy() only initiates the close (it completes on a later tick), so a terminate() called right after it still races the in-flight teardown. This PR awaits each pipe's 'close' — bounded by a 1s grace so a wedged worker can't stall — before terminating. The two are now serialized: by the time we terminate, the pipes are already gone and there is no concurrent teardown left to fault. terminateWorker is only ever called from cold paths (init/task timeout, crash respawn, shutdown grace), so normal lint throughput is unaffected.

It also routes the tests' raw worker.terminate() calls through terminateWorker, so they exercise the same safe teardown as the production timeout / crash / shutdown paths — production never bare-terminates a worker.

Verification note: the native abort only reproduces on windows-latest. Locally (macOS) all 554 tests pass but cannot exercise the abort, so effectiveness needs to be confirmed across windows CI runs.

Related Links

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to safely terminate worker threads by waiting for their piped stdout and stderr streams to fully close before calling terminate(). This is implemented via a new closePipe helper function with a 1-second grace timeout to prevent teardown from stalling. Test files are updated to use the exported terminateWorker utility. Feedback was provided to improve closePipe by ensuring that the 'close' event listener is cleaned up if the timeout fires, preventing potential memory leaks, and wrapping stream.destroy() in a try-catch block to handle synchronous errors.

Comment thread packages/eslint-plugin-runner/src/worker-pool.ts Outdated
…workers

On windows-latest, terminating a worker thread while its stdio pipes are still live lets libuv's concurrent pipe teardown fault below the JS layer (a native abort). rstest's forked test child intercepts uncaughtException/unhandledRejection/process.exit, so this surfaced as "Worker exited unexpectedly" in the high-terminate-churn worker-pool-e2e suite.

terminateWorker already destroyed the pipes first, but destroy() only initiates the close; a terminate() right after still races the in-flight teardown. Await each pipe's 'close' (bounded by a 1s grace) so the two are serialized — by the time we terminate, the pipes are gone and there is no concurrent teardown left to fault.

Also route the tests' raw worker.terminate() calls through terminateWorker so they exercise the same safe teardown as the production timeout/crash/shutdown paths.
@fansenze fansenze force-pushed the fix/worker-pool-terminate-pipe-race branch from 9044bf3 to c7e1d78 Compare May 27, 2026 07:00
@fansenze
Copy link
Copy Markdown
Contributor Author

Superseded by #1031. The pipe-teardown approach here didn't address the root cause: the windows abort comes from tearing down a worker thread that has oxc (a napi addon) loaded (nodejs/node#34567), not from stdio pipe teardown. #1031 skips the windows-flaky worker-pool e2e suites instead — a known worker_threads + napi limitation; a real fix would be process isolation (child_process), tracked separately.

@fansenze fansenze closed this May 27, 2026
@fansenze fansenze deleted the fix/worker-pool-terminate-pipe-race branch May 27, 2026 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant