Prevent process pool zombies on crash #3207

Merged
liquidsec merged 8 commits into dev from pool-pdeathsig on Jun 23, 2026

@liquidsec liquidsec commented Jun 17, 2026

Collaborator

Summary

  • When the bbot parent process is killed without cleanup (OOM kill, SIGKILL, segfault, etc.), ProcessPoolExecutor workers become orphans: they are reparented to PID 1 and block forever on dead IPC pipes. These orphans accumulate across crashes.
  • This PR calls prctl(PR_SET_PDEATHSIG, SIGTERM) in each pool worker's initializer so the kernel signals workers when the parent dies, regardless of how it died. The change is ~15 lines, stdlib only (ctypes), and Linux-only (a no-op elsewhere).
  • Adds test_pool_workers_die_with_parent, which spawns a subprocess with a pool, sends SIGKILL to the parent, and asserts that the workers are dead within 2 seconds.
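
For readers unfamiliar with the mechanism, here is a minimal sketch of a worker initializer that requests a parent-death signal through ctypes. It is an illustration only, not the code merged in this PR: the names `set_pdeathsig` and `make_pool` are hypothetical, and the constant value 1 for PR_SET_PDEATHSIG is taken from `<linux/prctl.h>`.

```python
import ctypes
import signal
import sys
from concurrent.futures import ProcessPoolExecutor

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>


def set_pdeathsig(sig=signal.SIGTERM):
    """Ask the kernel to deliver `sig` to this process when its parent dies.

    Returns True if prctl() succeeded, False on non-Linux platforms or on error.
    (Hypothetical helper name, for illustration only.)
    """
    if not sys.platform.startswith("linux"):
        return False  # no-op elsewhere
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.prctl(PR_SET_PDEATHSIG, int(sig), 0, 0, 0) == 0


def make_pool(max_workers=2):
    """Build a pool whose workers each run set_pdeathsig() once at startup."""
    return ProcessPoolExecutor(max_workers=max_workers, initializer=set_pdeathsig)
```

Note that the request only takes effect once the initializer has run in the worker, so a parent that dies before that point can still leave an orphan behind.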

closes #3203

…lation

When the parent process is killed without cleanup (OOM, SIGKILL, etc.),
ProcessPoolExecutor workers become orphans that sit forever on dead IPC
pipes. Setting PR_SET_PDEATHSIG(SIGTERM) in the worker initializer makes
the kernel signal workers on parent death, regardless of how the parent
died.

The test subprocess imported bbot, which sets mp.set_start_method("spawn"),
making workers slow to initialize (a full bbot import per worker). In CI
containers the 0.5s sleep wasn't enough for workers to call prctl before
the parent was killed. The test is now self-contained with no bbot imports,
uses a temp file so spawn-mode workers can import the initializer, and
gives workers 2s to start.
@liquidsec liquidsec changed the title Set PR_SET_PDEATHSIG on process pool workers to prevent zombie accumulation Prevent process pool zombies on crash Jun 17, 2026
@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

🚀 Performance Benchmark Report

⚠️ No current benchmark data available

This might be because:

  • Benchmarks failed to run
  • No benchmark tests found
  • Dependencies missing

@codecov

codecov Bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 84.09091% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 90%. Comparing base (82df903) to head (8e0431a).
⚠️ Report is 20 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/core/helpers/helper.py | 60% | 4 Missing ⚠️ |
| bbot/test/test_step_1/test_helpers.py | 92% | 3 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #3207    +/-   ##
======================================
+ Coverage     90%     90%    +1%     
======================================
  Files        453     453            
  Lines      46101   46278   +177     
======================================
+ Hits       41216   41396   +180     
+ Misses      4885    4882     -3     

Get worker PIDs via futures instead of the private pool._processes dict.
Use SIGKILL instead of SIGTERM for PR_SET_PDEATHSIG, since
ProcessPoolExecutor's except BaseException catches SIGTERM's SystemExit.
Loop PID collection to ensure unique worker PIDs. Check /proc/PID/stat
instead of os.kill(pid, 0) to distinguish zombies from running processes
(os.kill(pid, 0) also succeeds for a zombie that has not been reaped).
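
To illustrate the /proc/PID/stat check, here is a small sketch that reads the one-letter process state and treats zombies as not running. The helper names `process_state` and `is_running` are hypothetical and are not taken from the bbot test; the sketch assumes Linux with /proc mounted.

```python
import os


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone.

    (Hypothetical helper, for illustration only.)
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ...". comm may itself contain spaces or
    # parentheses, so locate the state relative to the LAST closing parenthesis.
    return stat[stat.rfind(")") + 2]


def is_running(pid):
    """True only if the process exists and is neither a zombie (Z) nor dead (X)."""
    return process_state(pid) not in (None, "Z", "X")
```

A zombie still has a /proc entry and still accepts os.kill(pid, 0), which is why the state letter is the more reliable signal here.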

Python 3.14 defaults to forkserver, which re-imports the script file.
Without an if __name__ == "__main__" guard, the forkserver child re-executes
pool creation and crashes.

With forkserver (Python 3.14 default), worker spawn is slow enough
that fast sequential tasks all route to the first worker. Submit
concurrent 1s tasks so the pool is forced to dispatch to both workers.
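
A rough sketch of the PID collection described in these commits: submit as many slow tasks as there are workers, read the PIDs back from the futures, and loop until enough unique PIDs have been seen. The names `report_pid` and `collect_worker_pids` are hypothetical, and the default delay and attempt count are assumptions rather than the values used in the test.

```python
import multiprocessing as mp
import os
import time
from concurrent.futures import ProcessPoolExecutor


def report_pid(delay):
    """Hold the worker busy for `delay` seconds, then return its PID."""
    time.sleep(delay)
    return os.getpid()


def collect_worker_pids(pool, n_workers, delay=1.0, attempts=5):
    """Collect unique worker PIDs through futures, without touching pool._processes.

    Each round submits n_workers tasks at once. Because every task sleeps,
    a single worker cannot drain them all before the others pick one up.
    """
    pids = set()
    for _ in range(attempts):
        futures = [pool.submit(report_pid, delay) for _ in range(n_workers)]
        pids.update(f.result() for f in futures)
        if len(pids) >= n_workers:
            break
    return pids
```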

Forkserver adds an intermediary process that stalls pool startup in CI.
Use mp.get_context("fork") since PR_SET_PDEATHSIG is start-method-agnostic.
Also drops the __main__ guard (not needed with the fork context).
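
For reference, pinning a pool to the fork start method looks roughly like the sketch below. `make_fork_pool` is a hypothetical helper, not a function from this PR; `mp_context` and `initializer` are standard ProcessPoolExecutor parameters.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor


def make_fork_pool(max_workers=2, initializer=None):
    """Create a pool that uses the "fork" start method regardless of the
    interpreter's default (spawn, forkserver, ...). POSIX-only."""
    ctx = mp.get_context("fork")
    return ProcessPoolExecutor(
        max_workers=max_workers,
        mp_context=ctx,
        initializer=initializer,
    )
```

Passing an explicit context affects only this pool, so the global start method is left untouched for the rest of the program.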
@liquidsec liquidsec requested a review from ausmaster June 18, 2026 15:27
@liquidsec liquidsec mentioned this pull request Jun 18, 2026
28 tasks
@liquidsec liquidsec added this to the BBOT 3.0 - blazed_elijah milestone Jun 19, 2026
@liquidsec liquidsec merged commit 3bc8e9c into dev Jun 23, 2026
17 checks passed
@liquidsec liquidsec deleted the pool-pdeathsig branch June 23, 2026 21:04
