feat: cross-platform force-kill primitive for stuck PHP threads
Introduces a self-contained primitive that wakes a PHP thread parked in
a blocking call (sleep, synchronous I/O, etc.) so the graceful drain
used by RestartWorkers / DrainWorkers / Shutdown completes promptly
instead of waiting for the syscall to return naturally.
Design: each PHP thread, at boot from its own TSRM context, hands a
force_kill_slot (pointers to its EG(vm_interrupt) and EG(timed_out)
atomic bools, plus pthread_t / Windows HANDLE) back to Go via
go_frankenphp_store_force_kill_slot. The slot lives on phpThread and is
protected by a per-thread RWMutex so the zero-and-release path at
thread exit cannot race an in-flight kill. From any goroutine, Go
passes the slot back to frankenphp_force_kill_thread, which stores
true into both bools (waking the VM at the next opcode boundary,
routing through zend_timeout -> "Maximum execution time exceeded") and
delivers a platform-specific wake-up:
- Linux/FreeBSD: pthread_kill(SIGRTMIN+3) with a no-op handler installed
  via pthread_once, SA_ONSTACK, no SA_RESTART. Signal delivery causes
  the in-flight blocking syscall to return EINTR.
- Windows: CancelSynchronousIo + QueueUserAPC covers alertable I/O and
  SleepEx. Non-alertable Sleep (including PHP's usleep) stays
  uninterruptible.
- macOS: atomic-bool-only path. Threads stuck in blocking syscalls wait
  for the syscall to complete naturally.
Reserved signal: SIGRTMIN+3. A PHP script that calls
pcntl_signal(SIGRTMIN+3, ...) clobbers the no-op handler; embedders
whose own Go code uses that signal must patch the constant. glibc NPTL
reserves SIGRTMIN..SIGRTMIN+2.
Drain integration: drainWorkerThreads waits drainGracePeriod (5s) for
each thread to reach Yielding, then arms force-kill on stragglers and
keeps waiting until they yield. phpThread.shutdown does the same.
There is no abandon path: if a thread is stuck in a syscall that
force-kill cannot interrupt (macOS, Windows non-alertable Sleep), the
drain blocks until the syscall returns naturally, exactly as it did
before this patch. Where force-kill can interrupt, the drain is
typically much faster: a 60s sleep is cut down to milliseconds.
Operators who want a harder bound rely on their orchestrator (systemd,
k8s, supervisord) to SIGKILL the process.
worker_test.go + testdata/worker-sleep.php exercise the full path:
the worker script marks a file before sleep(60), the test polls until
the worker is proven parked, then asserts RestartWorkers completes
within the grace period and that the post-sleep echo never runs (which
would mean the VM interrupt was never observed).