Commit fd6a4c0

feat: cross-platform force-kill primitive for stuck PHP threads
Introduces a self-contained primitive that wakes a PHP thread parked in a blocking call (sleep, synchronous I/O, etc.) so the graceful drain used by RestartWorkers / DrainWorkers / Shutdown completes promptly instead of waiting for the syscall to return naturally.

Design: each PHP thread, at boot from its own TSRM context, hands a force_kill_slot (pointers to its EG(vm_interrupt) and EG(timed_out) atomic bools, plus pthread_t / Windows HANDLE) back to Go via go_frankenphp_store_force_kill_slot. The slot lives on phpThread and is protected by a per-thread RWMutex so the zero-and-release path at thread exit cannot race an in-flight kill.

From any goroutine, Go passes the slot back to frankenphp_force_kill_thread, which stores true into both bools (waking the VM at the next opcode boundary, routing through zend_timeout -> "Maximum execution time exceeded") and delivers a platform-specific wake-up:

- Linux/FreeBSD: pthread_kill(SIGRTMIN+3) with a no-op handler installed via pthread_once, SA_ONSTACK, no SA_RESTART. Signal delivery causes the in-flight blocking syscall to return EINTR.
- Windows: CancelSynchronousIo + QueueUserAPC covers alertable I/O and SleepEx. Non-alertable Sleep (including PHP's usleep) stays uninterruptible.
- macOS: atomic-bool-only path. Threads stuck in blocking syscalls wait for the syscall to complete naturally.

Reserved signal: SIGRTMIN+3. PHP's pcntl_signal(SIGRTMIN+3, ...) clobbers it; embedders whose own Go code uses that signal must patch the constant. glibc NPTL reserves SIGRTMIN..SIGRTMIN+2.

Drain integration: drainWorkerThreads waits drainGracePeriod (5s) for each thread to reach Yielding, then arms force-kill on stragglers and keeps waiting until they yield. phpThread.shutdown does the same.

There is no abandon path: if a thread is stuck in a syscall that force-kill cannot interrupt (macOS, Windows non-alertable Sleep), the drain blocks until the syscall returns naturally, which matches pre-patch behaviour exactly. Where the wake-up does work, the drain is typically much faster, because force-kill cuts a 60s sleep down to milliseconds. Operators who want a harder bound rely on their orchestrator (systemd, k8s, supervisord) to SIGKILL the process.

worker_test.go + testdata/worker-sleep.php exercise the full path: the test marks a file before sleep(60), polls until the worker is proven parked, then asserts RestartWorkers completes within the grace period and that the post-sleep echo never runs (which would mean the VM interrupt was never observed).
1 parent a05e6dd commit fd6a4c0

7 files changed

Lines changed: 358 additions & 16 deletions

frankenphp.c

Lines changed: 135 additions & 4 deletions
@@ -92,6 +92,107 @@ static bool is_forked_child = false;
 static void frankenphp_fork_child(void) { is_forked_child = true; }
 #endif

+/* Best-effort force-kill for stuck PHP threads.
+ *
+ * Each thread captures &EG(vm_interrupt) / &EG(timed_out) at boot and
+ * hands them to Go via go_frankenphp_store_force_kill_slot. To kill,
+ * Go passes the slot back to frankenphp_force_kill_thread, which stores
+ * true into both bools (the VM bails through zend_timeout() at the next
+ * opcode boundary) and then wakes any in-flight syscall:
+ * - Linux/FreeBSD: pthread_kill(SIGRTMIN+3) -> EINTR.
+ * - Windows: CancelSynchronousIo + QueueUserAPC for alertable I/O +
+ *   SleepEx. Non-alertable Sleep (including PHP's usleep) stays stuck.
+ * - macOS: atomic-bool only; busy loops bail, blocking syscalls don't.
+ *
+ * Reserved signal: SIGRTMIN+3. PHP's pcntl_signal(SIGRTMIN+3, ...)
+ * clobbers it. glibc NPTL reserves SIGRTMIN..SIGRTMIN+2; embedders with
+ * their own Go signal usage may need to patch this constant.
+ *
+ * The slot lives Go-side on phpThread; the C side has no global table.
+ * The signal handler is installed once via pthread_once. */
+#ifdef PHP_WIN32
+static void CALLBACK frankenphp_noop_apc(ULONG_PTR param) { (void)param; }
+#endif
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+/* No-op: delivery itself is what unblocks the syscall via EINTR. */
+static void frankenphp_kill_signal_handler(int sig) { (void)sig; }
+
+static pthread_once_t kill_signal_handler_installed = PTHREAD_ONCE_INIT;
+static void install_kill_signal_handler(void) {
+  /* No SA_RESTART so syscalls return EINTR rather than being restarted.
+   * SA_ONSTACK guards against an accidental process-level delivery to a
+   * Go-managed thread, where Go requires the alternate signal stack. */
+  struct sigaction sa;
+  memset(&sa, 0, sizeof(sa));
+  sa.sa_handler = frankenphp_kill_signal_handler;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_flags = SA_ONSTACK;
+  sigaction(FRANKENPHP_KILL_SIGNAL, &sa, NULL);
+}
+#endif
+
+/* Set by frankenphp_set_shutdown_in_progress to gate the unhealthy-thread
+ * respawn loop off once Shutdown begins. */
+static zend_atomic_bool shutdown_in_progress;
+
+void frankenphp_set_shutdown_in_progress(bool v) {
+  zend_atomic_bool_store(&shutdown_in_progress, v);
+}
+
+/* Must run on the PHP thread itself: EG() resolves to its own TSRM
+ * context and pthread_self() captures the right tid. */
+void frankenphp_register_thread_for_kill(uintptr_t idx) {
+  force_kill_slot slot;
+  memset(&slot, 0, sizeof(slot));
+  slot.vm_interrupt = &EG(vm_interrupt);
+  slot.timed_out = &EG(timed_out);
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  slot.tid = pthread_self();
+  pthread_once(&kill_signal_handler_installed, install_kill_signal_handler);
+#elif defined(PHP_WIN32)
+  if (!DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
+                       GetCurrentProcess(), &slot.thread_handle, 0, FALSE,
+                       DUPLICATE_SAME_ACCESS)) {
+    /* On failure, force_kill falls back to atomic-bool only. */
+    slot.thread_handle = NULL;
+  }
+#endif
+  go_frankenphp_store_force_kill_slot(idx, slot);
+}
+
+void frankenphp_force_kill_thread(force_kill_slot slot) {
+  if (slot.vm_interrupt == NULL) {
+    /* Boot aborted before register_thread_for_kill. */
+    return;
+  }
+  /* Atomic stores first: by the time the thread wakes (signal-driven or
+   * natural) the VM sees them and bails through zend_timeout(). */
+  zend_atomic_bool_store(slot.timed_out, true);
+  zend_atomic_bool_store(slot.vm_interrupt, true);
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* ESRCH (thread already exited) / EINVAL are both benign here. */
+  pthread_kill(slot.tid, FRANKENPHP_KILL_SIGNAL);
+#elif defined(PHP_WIN32)
+  if (slot.thread_handle != NULL) {
+    CancelSynchronousIo(slot.thread_handle);
+    QueueUserAPC((PAPCFUNC)frankenphp_noop_apc, slot.thread_handle, 0);
+  }
+#endif
+}
+
+/* CloseHandle on Windows; no-op on POSIX. */
+void frankenphp_release_thread_for_kill(force_kill_slot slot) {
+#ifdef PHP_WIN32
+  if (slot.thread_handle != NULL) {
+    CloseHandle(slot.thread_handle);
+  }
+#else
+  (void)slot;
+#endif
+}
+
 void frankenphp_update_local_thread_context(bool is_worker) {
   is_worker_thread = is_worker;

@@ -1065,6 +1166,16 @@ static void *php_thread(void *arg) {
   snprintf(thread_name, 16, "php-%" PRIxPTR, thread_index);
   set_thread_name(thread_name);

+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* The spawning Go-managed M may block realtime signals, which the
+   * new pthread inherits. Unblock FRANKENPHP_KILL_SIGNAL here so
+   * force-kill deliveries are not silently dropped. */
+  sigset_t unblock;
+  sigemptyset(&unblock);
+  sigaddset(&unblock, FRANKENPHP_KILL_SIGNAL);
+  pthread_sigmask(SIG_UNBLOCK, &unblock, NULL);
+#endif
+
   /* Initial allocation of all global PHP memory for this thread */
 #ifdef ZTS
   (void)ts_resource(0);
@@ -1073,6 +1184,11 @@ static void *php_thread(void *arg) {
 #endif
 #endif

+  /* Register this thread's vm_interrupt/timed_out addresses so the Go side
+   * can force-kill it after the graceful-drain grace period if it gets stuck
+   * in a busy PHP loop. */
+  frankenphp_register_thread_for_kill(thread_index);
+
   bool thread_is_healthy = true;
   bool has_attempted_shutdown = false;

@@ -1150,6 +1266,11 @@ static void *php_thread(void *arg) {
   }
   zend_end_try();

+  /* Must precede ts_free_thread: that frees the TSRM storage backing
+   * the slot's &EG() pointers. Clearing first means any concurrent
+   * force-kill either ran before us or sees a zero slot. */
+  go_frankenphp_clear_force_kill_slot(thread_index);
+
   /* free all global PHP memory reserved for this thread */
 #ifdef ZTS
   ts_free_thread();
@@ -1158,12 +1279,20 @@ static void *php_thread(void *arg) {
   /* Thread is healthy, signal to Go that the thread has shut down */
   if (thread_is_healthy) {
     go_frankenphp_on_thread_shutdown(thread_index);
-
     return NULL;
   }

-  /* Thread is unhealthy, PHP globals might be in a bad state after a bailout,
-   * restart the entire thread */
+  /* Unhealthy: respawn unless Shutdown is in progress; respawning then
+   * would hand a fresh pthread a phpThreads slice already untracked.
+   * Notify Go either way so phpThread.shutdown's WaitFor(state.Done)
+   * unblocks - force-kill bails the thread through this path. */
+  if (zend_atomic_bool_load(&shutdown_in_progress)) {
+    frankenphp_log_message(
+        "Unhealthy thread unwinding after Shutdown; not restarting",
+        LOG_WARNING);
+    go_frankenphp_on_thread_shutdown(thread_index);
+    return NULL;
+  }
   frankenphp_log_message("Restarting unhealthy thread", LOG_WARNING);

   if (!frankenphp_new_php_thread(thread_index)) {
@@ -1265,7 +1394,9 @@ static void *php_main(void *arg) {

   go_frankenphp_main_thread_is_ready();

-  /* channel closed, shutdown gracefully */
+  /* channel closed, shutdown gracefully. drainPHPThreads has already
+   * waited for every PHP thread to exit (state.Done), so SAPI/TSRM
+   * teardown here is safe. */
   frankenphp_sapi_module.shutdown(&frankenphp_sapi_module);

   sapi_shutdown();
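The atomic-bool half of the primitive is cooperative: the kill request only stores two flags, and the target notices them the next time it reaches a check point. The following Go sketch models that idea outside of PHP. It is an analogy, not Zend engine code: `interruptFlags`, `requestKill` and `runSteps` are hypothetical names, and the assumption that the checker reads the interrupt flag before the timed-out flag is what motivates storing them in the opposite order.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// interruptFlags is a hypothetical stand-in for the two atomic bools the
// patch captures per thread (vm_interrupt and timed_out).
type interruptFlags struct {
	vmInterrupt atomic.Bool
	timedOut    atomic.Bool
}

// requestKill stores timedOut before vmInterrupt, the same order as the
// patch, so a checker that observes vmInterrupt also observes timedOut.
func (f *interruptFlags) requestKill() {
	f.timedOut.Store(true)
	f.vmInterrupt.Store(true)
}

// runSteps runs up to n steps and checks the flags at every step
// boundary (the analogue of an opcode boundary). It returns how many
// steps completed and whether the run was cut short by a kill request.
func runSteps(f *interruptFlags, n int, step func(i int)) (int, bool) {
	for i := 0; i < n; i++ {
		if f.vmInterrupt.Swap(false) && f.timedOut.Load() {
			return i, true
		}
		step(i)
	}
	return n, false
}

func main() {
	var f interruptFlags
	completed, killed := runSteps(&f, 10, func(i int) {
		if i == 2 {
			f.requestKill() // arrives while step 2 is running
		}
	})
	fmt.Println(completed, killed) // prints "3 true"
}
```

A step that never returns (the analogue of a blocking syscall) is never interrupted by the flags alone, which is why the patch pairs them with a platform-specific wake-up where one exists.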

frankenphp.h

Lines changed: 35 additions & 0 deletions
@@ -46,6 +46,28 @@ static inline HRESULT LongLongSub(LONGLONG llMinuend, LONGLONG llSubtrahend,
 #include <stdbool.h>
 #include <stdint.h>

+#ifndef PHP_WIN32
+#include <pthread.h>
+#include <signal.h>
+#endif
+
+/* Platform capabilities for the force-kill primitive; declared in the
+ * header so Go (via CGo) gets the correct struct layout too. */
+#if !defined(PHP_WIN32) && defined(SIGRTMIN)
+#define FRANKENPHP_HAS_KILL_SIGNAL 1
+#define FRANKENPHP_KILL_SIGNAL (SIGRTMIN + 3)
+#endif
+
+typedef struct {
+  zend_atomic_bool *vm_interrupt;
+  zend_atomic_bool *timed_out;
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  pthread_t tid;
+#elif defined(PHP_WIN32)
+  HANDLE thread_handle;
+#endif
+} force_kill_slot;
+
 #ifndef FRANKENPHP_VERSION
 #define FRANKENPHP_VERSION dev
 #endif
@@ -193,6 +215,19 @@ void frankenphp_init_thread_metrics(int max_threads);
 void frankenphp_destroy_thread_metrics(void);
 size_t frankenphp_get_thread_memory_usage(uintptr_t thread_index);

+/* Best-effort force-kill primitives. The slot is populated by each PHP
+ * thread at boot (frankenphp_register_thread_for_kill calls back into Go
+ * via go_frankenphp_store_force_kill_slot) and lives in the Go-side
+ * phpThread. force_kill_thread interrupts the Zend VM at the next opcode
+ * boundary; on POSIX it also delivers SIGRTMIN+3 to the target thread,
+ * on Windows it calls CancelSynchronousIo + QueueUserAPC. release_thread
+ * drops any OS-owned resource tied to the slot (currently the Windows
+ * thread handle). */
+void frankenphp_set_shutdown_in_progress(bool v);
+void frankenphp_register_thread_for_kill(uintptr_t thread_index);
+void frankenphp_force_kill_thread(force_kill_slot slot);
+void frankenphp_release_thread_for_kill(force_kill_slot slot);
+
 void register_extensions(zend_module_entry **m, int len);

 #endif
#endif

phpmainthread.go

Lines changed: 12 additions & 0 deletions
@@ -35,6 +35,9 @@ var (
 // a fixed number of inactive PHP threads
 // and reserves a fixed number of possible PHP threads
 func initPHPThreads(numThreads int, numMaxThreads int, phpIni map[string]string) (*phpMainThread, error) {
+	// Re-arm the unhealthy-restart respawn path for this Init cycle.
+	C.frankenphp_set_shutdown_in_progress(false)
+
 	mainThread = &phpMainThread{
 		state: state.NewThreadState(),
 		done: make(chan struct{}),
@@ -54,6 +57,8 @@ func initPHPThreads(numThreads int, numMaxThreads int, phpIni map[string]string)
 		return nil, err
 	}

+	// Must follow start(): maxThreads is only final once
+	// setAutomaticMaxThreads runs on the main PHP thread (before Ready).
 	C.frankenphp_init_thread_metrics(C.int(mainThread.maxThreads))

 	// initialize all other threads
@@ -79,6 +84,13 @@ func drainPHPThreads() {
 	if mainThread == nil {
 		return // mainThread was never initialized
 	}
+	// Idempotent: post-drain state is Reserved; a re-entry (e.g. a
+	// failed-Init cleanup) must not double-close mainThread.done.
+	if mainThread.state.Is(state.Reserved) {
+		return
+	}
+	// Stop the unhealthy-restart respawn path before any thread exits.
+	C.frankenphp_set_shutdown_in_progress(true)
 	doneWG := sync.WaitGroup{}
 	doneWG.Add(len(phpThreads))
 	mainThread.state.Set(state.ShuttingDown)
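The idempotency guard added to drainPHPThreads exists because closing an already-closed channel panics in Go. A minimal sketch of the same idea follows, assuming a hypothetical `drainer` type with a plain boolean in place of the `state.Reserved` check used by the patch.

```go
package main

import (
	"fmt"
	"sync"
)

// drainer is a hypothetical stand-in for the main-thread bookkeeping.
// reserved plays the role of the post-drain state.Reserved check.
type drainer struct {
	mu       sync.Mutex
	reserved bool
	done     chan struct{}
}

func newDrainer() *drainer {
	return &drainer{done: make(chan struct{})}
}

// drain closes done at most once. A second call (for example from a
// failed-init cleanup path) returns early instead of panicking on a
// double close. It reports whether this call performed the drain.
func (d *drainer) drain() bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.reserved {
		return false
	}
	close(d.done)
	d.reserved = true
	return true
}

func main() {
	d := newDrainer()
	fmt.Println(d.drain(), d.drain()) // prints "true false"
}
```

The sketch uses a mutex so the check and the close happen as one step; the real code relies on its own thread-state machinery instead.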

phpthread.go

Lines changed: 51 additions & 1 deletion
@@ -8,6 +8,7 @@ import (
 	"runtime"
 	"sync"
 	"sync/atomic"
+	"time"
 	"unsafe"

 	"github.com/dunglas/frankenphp/internal/state"
@@ -25,6 +26,12 @@ type phpThread struct {
 	contextMu sync.RWMutex
 	state *state.ThreadState
 	requestCount atomic.Int64
+	// forceKill holds &EG() pointers captured on the PHP thread itself.
+	// forceKillMu pairs with go_frankenphp_clear_force_kill_slot's write
+	// lock so a concurrent kill never dereferences pointers freed by
+	// ts_free_thread.
+	forceKillMu sync.RWMutex
+	forceKill C.force_kill_slot
 }

 // threadHandler defines how the callbacks from the C thread should be handled
@@ -93,7 +100,27 @@ func (thread *phpThread) shutdown() {
 	}

 	close(thread.drainChan)
-	thread.state.WaitFor(state.Done)
+
+	// Arm force-kill after the grace period to wake any thread stuck in
+	// a blocking syscall (sleep, blocking I/O). The wait remains
+	// unbounded - on platforms where force-kill cannot interrupt the
+	// syscall (macOS, Windows non-alertable Sleep) the thread will exit
+	// when the syscall completes naturally; the operator's orchestrator
+	// is responsible for any harder timeout.
+	done := make(chan struct{})
+	go func() {
+		thread.state.WaitFor(state.Done)
+		close(done)
+	}()
+	select {
+	case <-done:
+	case <-time.After(drainGracePeriod):
+		thread.forceKillMu.RLock()
+		C.frankenphp_force_kill_thread(thread.forceKill)
+		thread.forceKillMu.RUnlock()
+		<-done
+	}
+
 	thread.drainChan = make(chan struct{})

 	// threads go back to the reserved state from which they can be booted again
@@ -203,6 +230,29 @@ func go_frankenphp_after_script_execution(threadIndex C.uintptr_t, exitStatus C.
 	thread.Unpin()
 }

+//export go_frankenphp_store_force_kill_slot
+func go_frankenphp_store_force_kill_slot(threadIndex C.uintptr_t, slot C.force_kill_slot) {
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	// Release any prior slot's OS resource (Windows HANDLE) before
+	// overwriting; a phpThread can reboot and re-register.
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = slot
+	thread.forceKillMu.Unlock()
+}
+
+//export go_frankenphp_clear_force_kill_slot
+func go_frankenphp_clear_force_kill_slot(threadIndex C.uintptr_t) {
+	// Called from C before ts_free_thread on both exit paths. Zeroing
+	// the slot under the write lock guarantees any concurrent kill
+	// either completed before we got the lock or sees a zero slot.
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = C.force_kill_slot{}
+	thread.forceKillMu.Unlock()
+}
+
 //export go_frankenphp_on_thread_shutdown
 func go_frankenphp_on_thread_shutdown(threadIndex C.uintptr_t) {
 	thread := phpThreads[threadIndex]

testdata/worker-sleep.php

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<?php
+
+// Worker that sleeps inside the handler to simulate a stuck request blocking
+// drain. Used to test the force-kill grace period.
+//
+// Before sleeping we touch a marker file whose path is passed via the
+// SLEEP_MARKER header. The Go test polls for the file so it only arms
+// RestartWorkers once the worker is proven to be inside sleep(), removing
+// the fixed-time race of a bare time.Sleep on the caller side.
+$fn = static function () {
+    $marker = $_SERVER['HTTP_SLEEP_MARKER'] ?? '';
+    if ($marker !== '') {
+        touch($marker);
+    }
+    sleep(60);
+    echo 'should not reach';
+};
+
+do {
+    $ret = \frankenphp_handle_request($fn);
+} while ($ret);
