Commit 2ac6c41

feat: cross-platform force-kill primitive for stuck PHP threads (#2365)
First step of the split suggested in #2287: land the force-kill infrastructure as a standalone, reviewable primitive independent of background workers.

## Design

Each PHP thread, at boot from its own TSRM context, hands a `force_kill_slot` (pointers to its `EG(vm_interrupt)` and `EG(timed_out)` atomic bools, plus `pthread_t` / Windows `HANDLE`) back to Go via `go_frankenphp_store_force_kill_slot`. The slot lives on `phpThread` and is protected by a per-thread `RWMutex` so the zero-and-release path at thread exit cannot race an in-flight kill.

From any goroutine, Go passes the slot back to `frankenphp_force_kill_thread`, which stores `true` into both atomic bools (waking the VM at the next opcode boundary, routing through `zend_timeout` -> "Maximum execution time exceeded") and delivers a platform-specific wake-up:

- **Linux/FreeBSD**: `pthread_kill(SIGRTMIN+3)` with a no-op handler installed once via `pthread_once`, `SA_ONSTACK`, no `SA_RESTART`. Signal delivery returns any in-flight blocking syscall with `EINTR`.
- **Windows**: `CancelSynchronousIo` + `QueueUserAPC` covers alertable I/O and `SleepEx`. Non-alertable `Sleep` (including PHP's `usleep`) stays uninterruptible.
- **macOS**: atomic-bool path only; threads stuck in blocking syscalls wait for the syscall to complete naturally.

**Reserved signal**: `SIGRTMIN+3`. A PHP script that calls `pcntl_signal(SIGRTMIN+3, ...)` clobbers this. Embedders whose own Go code uses `SIGRTMIN+3` must patch it here. glibc NPTL reserves `SIGRTMIN..SIGRTMIN+2`, so the offset cannot go lower.

## Drain integration

`drainWorkerThreads` waits `drainGracePeriod` (30s) for each thread to reach `Yielding`, then arms force-kill on stragglers and **keeps waiting** until they yield. `phpThread.shutdown` does the same.

There is no abandon path: if a thread is stuck in a syscall force-kill cannot interrupt (macOS, Windows non-alertable Sleep), the drain blocks until the syscall returns naturally, matching pre-patch behaviour exactly, just typically much faster because force-kill cuts a `sleep(60)` down to milliseconds. Operators that want a harder bound rely on their orchestrator (systemd, k8s, supervisord) to SIGKILL the process.

`go_frankenphp_on_thread_shutdown` runs on both the healthy path and the unhealthy-during-Shutdown path so `state.Done` is set even when force-kill bails the thread. Without it, `phpThread.shutdown`'s `WaitFor(state.Done)` would never unblock.

## Testing

`TestRestartWorkersForceKillsStuckThread` drives the full path via a marker file so `RestartWorkers` only arms once the worker is proven parked in `sleep()`, then asserts bounded elapsed time and that the post-sleep echo never runs.
1 parent 9208c55 commit 2ac6c41

7 files changed

Lines changed: 354 additions & 15 deletions

frankenphp.c

Lines changed: 127 additions & 4 deletions
@@ -98,6 +98,111 @@ static bool is_forked_child = false;
 static void frankenphp_fork_child(void) { is_forked_child = true; }
 #endif
 
+/* Best-effort force-kill for stuck PHP threads.
+ *
+ * Each thread captures &EG(vm_interrupt) / &EG(timed_out) at boot and
+ * hands them to Go via go_frankenphp_store_force_kill_slot. To kill,
+ * Go passes the slot back to frankenphp_force_kill_thread, which stores
+ * true into both bools (the VM bails through zend_timeout() at the next
+ * opcode boundary) and then wakes any in-flight syscall:
+ * - Linux/FreeBSD: pthread_kill(SIGRTMIN+3) -> EINTR.
+ * - Windows: CancelSynchronousIo + QueueUserAPC for alertable I/O +
+ *   SleepEx. Non-alertable Sleep (including PHP's usleep) stays stuck.
+ * - macOS: atomic-bool only; busy loops bail, blocking syscalls don't.
+ *
+ * Reserved signal: SIGRTMIN+3. PHP's pcntl_signal(SIGRTMIN+3, ...)
+ * clobbers it. glibc NPTL reserves SIGRTMIN..SIGRTMIN+2; embedders with
+ * their own Go signal usage may need to patch this constant.
+ *
+ * The slot lives Go-side on phpThread; the C side has no global table.
+ * The signal handler is installed once via pthread_once. */
+#ifdef PHP_WIN32
+static void CALLBACK frankenphp_noop_apc(ULONG_PTR param) { (void)param; }
+#endif
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+/* No-op: delivery itself is what unblocks the syscall via EINTR. */
+static void frankenphp_kill_signal_handler(int sig) { (void)sig; }
+
+static pthread_once_t kill_signal_handler_installed = PTHREAD_ONCE_INIT;
+/* Set to true only after sigaction() succeeds. force_kill_thread skips
+ * pthread_kill when this is false, so a sigaction failure (invalid
+ * signal number, exhausted handler slots, etc.) can't deliver the
+ * signal with its default action (process termination). */
+static zend_atomic_bool kill_signal_handler_active;
+static void install_kill_signal_handler(void) {
+  /* No SA_RESTART so syscalls return EINTR rather than being restarted.
+   * SA_ONSTACK guards against an accidental process-level delivery to a
+   * Go-managed thread, where Go requires the alternate signal stack. */
+  struct sigaction sa;
+  memset(&sa, 0, sizeof(sa));
+  sa.sa_handler = frankenphp_kill_signal_handler;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_flags = SA_ONSTACK;
+  if (sigaction(FRANKENPHP_KILL_SIGNAL, &sa, NULL) == 0) {
+    zend_atomic_bool_store(&kill_signal_handler_active, true);
+  }
+}
+#endif
+
+/* Must run on the PHP thread itself: EG() resolves to its own TSRM
+ * context and pthread_self() captures the right tid. */
+static void frankenphp_register_thread_for_kill(uintptr_t idx) {
+  force_kill_slot slot;
+  memset(&slot, 0, sizeof(slot));
+  slot.vm_interrupt = &EG(vm_interrupt);
+  slot.timed_out = &EG(timed_out);
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  slot.tid = pthread_self();
+  pthread_once(&kill_signal_handler_installed, install_kill_signal_handler);
+#elif defined(PHP_WIN32)
+  if (!DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
+                       GetCurrentProcess(), &slot.thread_handle, 0, FALSE,
+                       DUPLICATE_SAME_ACCESS)) {
+    /* On failure, force_kill falls back to atomic-bool only. */
+    slot.thread_handle = NULL;
+  }
+#endif
+  go_frankenphp_store_force_kill_slot(idx, slot);
+}
+
+void frankenphp_force_kill_thread(force_kill_slot slot) {
+  if (slot.vm_interrupt == NULL) {
+    /* Boot aborted before the slot was published. */
+    return;
+  }
+
+  /* Atomic stores first: by the time the thread wakes (signal-driven or
+   * natural) the VM sees them and bails through zend_timeout(). */
+  zend_atomic_bool_store(slot.timed_out, true);
+  zend_atomic_bool_store(slot.vm_interrupt, true);
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* ESRCH (thread already exited) / EINVAL are both benign here.
+   * Skip if sigaction() failed at install time: delivering an unhandled
+   * SIGRTMIN+3 would terminate the process. */
+  if (zend_atomic_bool_load(&kill_signal_handler_active)) {
+    pthread_kill(slot.tid, FRANKENPHP_KILL_SIGNAL);
+  }
+#elif defined(PHP_WIN32)
+  if (slot.thread_handle != NULL) {
+    CancelSynchronousIo(slot.thread_handle);
+    QueueUserAPC((PAPCFUNC)frankenphp_noop_apc, slot.thread_handle, 0);
+  }
+#endif
+}
+
+/* CloseHandle on Windows; no-op on POSIX. */
+void frankenphp_release_thread_for_kill(force_kill_slot slot) {
+#ifdef PHP_WIN32
+  if (slot.thread_handle != NULL) {
+    CloseHandle(slot.thread_handle);
+  }
+#else
+  (void)slot;
+#endif
+}
+
 void frankenphp_update_local_thread_context(bool is_worker) {
   is_worker_thread = is_worker;

@@ -1118,6 +1223,16 @@ static void *php_thread(void *arg) {
   snprintf(thread_name, 16, "php-%" PRIxPTR, thread_index);
   set_thread_name(thread_name);
 
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* The spawning Go-managed M may block realtime signals, which the
+   * new pthread inherits. Unblock FRANKENPHP_KILL_SIGNAL here so
+   * force-kill deliveries are not silently dropped. */
+  sigset_t unblock;
+  sigemptyset(&unblock);
+  sigaddset(&unblock, FRANKENPHP_KILL_SIGNAL);
+  pthread_sigmask(SIG_UNBLOCK, &unblock, NULL);
+#endif
+
   /* Initial allocation of all global PHP memory for this thread */
 #ifdef ZTS
   (void)ts_resource(0);
@@ -1126,6 +1241,10 @@ static void *php_thread(void *arg) {
 #endif
 #endif
 
+  /* Publish this thread's force-kill slot to Go so the graceful-drain
+   * grace period can wake it from a busy PHP loop or blocking syscall. */
+  frankenphp_register_thread_for_kill(thread_index);
+
   bool thread_is_healthy = true;
   bool has_attempted_shutdown = false;

@@ -1203,6 +1322,11 @@ static void *php_thread(void *arg) {
   }
   zend_end_try();
 
+  /* Must precede ts_free_thread: that frees the TSRM storage backing
+   * the slot's &EG() pointers. Clearing first means any concurrent
+   * force-kill either ran before us or sees a zero slot. */
+  go_frankenphp_clear_force_kill_slot(thread_index);
+
   /* free all global PHP memory reserved for this thread */
 #ifdef ZTS
   ts_free_thread();
@@ -1211,12 +1335,9 @@ static void *php_thread(void *arg) {
   /* Thread is healthy, signal to Go that the thread has shut down */
   if (thread_is_healthy) {
     go_frankenphp_on_thread_shutdown(thread_index);
-
     return NULL;
   }
 
-  /* Thread is unhealthy, PHP globals might be in a bad state after a bailout,
-   * restart the entire thread */
   frankenphp_log_message("Restarting unhealthy thread", LOG_WARNING);
 
   if (!frankenphp_new_php_thread(thread_index)) {
@@ -1318,7 +1439,9 @@ static void *php_main(void *arg) {
   go_frankenphp_main_thread_is_ready();
 
-  /* channel closed, shutdown gracefully */
+  /* channel closed, shutdown gracefully. drainPHPThreads has already
+   * waited for every PHP thread to exit (state.Done), so SAPI/TSRM
+   * teardown here is safe. */
   frankenphp_sapi_module.shutdown(&frankenphp_sapi_module);
 
   sapi_shutdown();

frankenphp.h

Lines changed: 33 additions & 0 deletions
@@ -46,6 +46,28 @@ static inline HRESULT LongLongSub(LONGLONG llMinuend, LONGLONG llSubtrahend,
 #include <stdbool.h>
 #include <stdint.h>
 
+#ifndef PHP_WIN32
+#include <pthread.h>
+#include <signal.h>
+#endif
+
+/* Platform capabilities for the force-kill primitive; declared in the
+ * header so Go (via CGo) gets the correct struct layout too. */
+#if !defined(PHP_WIN32) && defined(SIGRTMIN)
+#define FRANKENPHP_HAS_KILL_SIGNAL 1
+#define FRANKENPHP_KILL_SIGNAL (SIGRTMIN + 3)
+#endif
+
+typedef struct {
+  zend_atomic_bool *vm_interrupt;
+  zend_atomic_bool *timed_out;
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  pthread_t tid;
+#elif defined(PHP_WIN32)
+  HANDLE thread_handle;
+#endif
+} force_kill_slot;
+
 #ifndef FRANKENPHP_VERSION
 #define FRANKENPHP_VERSION dev
 #endif
@@ -193,6 +215,17 @@ void frankenphp_init_thread_metrics(int max_threads);
 void frankenphp_destroy_thread_metrics(void);
 size_t frankenphp_get_thread_memory_usage(uintptr_t thread_index);
 
+/* Best-effort force-kill primitives. The slot is populated by each PHP
+ * thread at boot (an internal helper calls back into Go via
+ * go_frankenphp_store_force_kill_slot) and lives in the Go-side phpThread.
+ * force_kill_thread interrupts the Zend VM at the next opcode boundary;
+ * on POSIX it also delivers SIGRTMIN+3 to the target thread, on Windows
+ * it calls CancelSynchronousIo + QueueUserAPC. release_thread drops any
+ * OS-owned resource tied to the slot (currently the Windows thread
+ * handle). */
+void frankenphp_force_kill_thread(force_kill_slot slot);
+void frankenphp_release_thread_for_kill(force_kill_slot slot);
+
 void register_extensions(zend_module_entry **m, int len);
 
 #endif

phpmainthread.go

Lines changed: 7 additions & 0 deletions
@@ -54,6 +54,8 @@ func initPHPThreads(numThreads int, numMaxThreads int, phpIni map[string]string)
 		return nil, err
 	}
 
+	// Must follow start(): maxThreads is only final once
+	// setAutomaticMaxThreads runs on the main PHP thread (before Ready).
 	C.frankenphp_init_thread_metrics(C.int(mainThread.maxThreads))
 
 	// initialize all other threads
@@ -79,6 +81,11 @@ func drainPHPThreads() {
 	if mainThread == nil {
 		return // mainThread was never initialized
 	}
+	// Idempotent: post-drain state is Reserved; a re-entry (e.g. a
+	// failed-Init cleanup) must not double-close mainThread.done.
+	if mainThread.state.Is(state.Reserved) {
+		return
+	}
 	doneWG := sync.WaitGroup{}
 	doneWG.Add(len(phpThreads))
 	mainThread.state.Set(state.ShuttingDown)
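
The re-entry guard in this hunk exists because closing an already-closed Go channel panics (assuming `mainThread.done` is a channel, as the comment's "double-close" suggests). A minimal sketch of the same idea follows. It swaps in a boolean flag under a mutex for the commit's `state.Reserved` check, and `drainer` with its fields are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// drainer is a hypothetical stand-in for the main thread's shutdown state.
type drainer struct {
	mu      sync.Mutex
	drained bool
	done    chan struct{}
}

// drain closes done exactly once and reports whether this call did the
// work. A second call returns early instead of closing done again.
func (d *drainer) drain() bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.drained {
		return false // re-entry: nothing left to close
	}
	close(d.done)
	d.drained = true
	return true
}

func main() {
	d := &drainer{done: make(chan struct{})}
	fmt.Println(d.drain()) // first call performs the drain
	fmt.Println(d.drain()) // second call is a no-op, no panic
}
```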

phpthread.go

Lines changed: 51 additions & 1 deletion
@@ -8,6 +8,7 @@ import (
 	"runtime"
 	"sync"
 	"sync/atomic"
+	"time"
 	"unsafe"
 
 	"github.com/dunglas/frankenphp/internal/state"
@@ -25,6 +26,12 @@ type phpThread struct {
 	contextMu    sync.RWMutex
 	state        *state.ThreadState
 	requestCount atomic.Int64
+	// forceKill holds &EG() pointers captured on the PHP thread itself.
+	// forceKillMu pairs with go_frankenphp_clear_force_kill_slot's write
+	// lock so a concurrent kill never dereferences pointers freed by
+	// ts_free_thread.
+	forceKillMu sync.RWMutex
+	forceKill   C.force_kill_slot
 }
 
 // threadHandler defines how the callbacks from the C thread should be handled
@@ -99,7 +106,27 @@ func (thread *phpThread) shutdown() {
 	}
 
 	close(thread.drainChan)
-	thread.state.WaitFor(state.Done)
+
+	// Arm force-kill after the grace period to wake any thread stuck in
+	// a blocking syscall (sleep, blocking I/O). The wait remains
+	// unbounded - on platforms where force-kill cannot interrupt the
+	// syscall (macOS, Windows non-alertable Sleep) the thread will exit
+	// when the syscall completes naturally; the operator's orchestrator
+	// is responsible for any harder timeout.
+	done := make(chan struct{})
+	go func() {
+		thread.state.WaitFor(state.Done)
+		close(done)
+	}()
+	select {
+	case <-done:
+	case <-time.After(drainGracePeriod):
+		thread.forceKillMu.RLock()
+		C.frankenphp_force_kill_thread(thread.forceKill)
+		thread.forceKillMu.RUnlock()
+		<-done
+	}
+
 	thread.drainChan = make(chan struct{})
 
 	// threads go back to the reserved state from which they can be booted again
@@ -209,6 +236,29 @@ func go_frankenphp_after_script_execution(threadIndex C.uintptr_t, exitStatus C.
 	thread.Unpin()
 }
 
+//export go_frankenphp_store_force_kill_slot
+func go_frankenphp_store_force_kill_slot(threadIndex C.uintptr_t, slot C.force_kill_slot) {
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	// Release any prior slot's OS resource (Windows HANDLE) before
+	// overwriting; a phpThread can reboot and re-register.
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = slot
+	thread.forceKillMu.Unlock()
+}
+
+//export go_frankenphp_clear_force_kill_slot
+func go_frankenphp_clear_force_kill_slot(threadIndex C.uintptr_t) {
+	// Called from C before ts_free_thread on both exit paths. Zeroing
+	// the slot under the write lock guarantees any concurrent kill
+	// either completed before we got the lock or sees a zero slot.
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = C.force_kill_slot{}
+	thread.forceKillMu.Unlock()
+}
+
 //export go_frankenphp_on_thread_shutdown
 func go_frankenphp_on_thread_shutdown(threadIndex C.uintptr_t) {
 	thread := phpThreads[threadIndex]

testdata/worker-sleep.php

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<?php
+
+// Worker that sleeps inside the handler to simulate a stuck request blocking
+// drain. Used to test the force-kill grace period.
+//
+// Before sleeping we touch a marker file whose path is passed via the
+// SLEEP_MARKER header. The Go test polls for the file so it only arms
+// RestartWorkers once the worker is proven to be inside sleep(), removing
+// the fixed-time race of a bare time.Sleep on the caller side.
+$fn = static function () {
+    $marker = $_SERVER['HTTP_SLEEP_MARKER'] ?? '';
+    if ($marker !== '') {
+        touch($marker);
+    }
+    sleep(60);
+    echo 'should not reach';
+};
+
+do {
+    $ret = \frankenphp_handle_request($fn);
+} while ($ret);
