Commit 815d2ea

feat: cross-platform force-kill primitive for stuck PHP threads
Introduces a self-contained primitive that wakes a PHP thread parked in a blocking call (sleep, synchronous I/O, etc.) so the graceful drain used by RestartWorkers / DrainWorkers / Shutdown completes promptly instead of waiting for the syscall to return naturally.

Design: each PHP thread, at boot from its own TSRM context, hands a force_kill_slot (pointers to its EG(vm_interrupt) and EG(timed_out) atomic bools, plus pthread_t / Windows HANDLE) back to Go via go_frankenphp_store_force_kill_slot. The slot lives on phpThread and is protected by a per-thread RWMutex so the zero-and-release path at thread exit cannot race an in-flight kill.

From any goroutine, Go passes the slot back to frankenphp_force_kill_thread, which stores true into both bools (waking the VM at the next opcode boundary, routing through zend_timeout -> "Maximum execution time exceeded") and delivers a platform-specific wake-up:

- Linux/FreeBSD: pthread_kill(SIGRTMIN+3) with a no-op handler installed via pthread_once, SA_ONSTACK, no SA_RESTART. Signal delivery causes the in-flight blocking syscall to return EINTR.
- Windows: CancelSynchronousIo + QueueUserAPC covers alertable I/O and SleepEx. Non-alertable Sleep (including PHP's usleep) stays uninterruptible.
- macOS: atomic-bool-only path. Threads stuck in blocking syscalls wait for the syscall to complete naturally.

Reserved signal: SIGRTMIN+3. PHP's pcntl_signal(SIGRTMIN+3, ...) clobbers it; embedders whose own Go code uses that signal must patch the constant. glibc NPTL reserves SIGRTMIN..SIGRTMIN+2.

Drain integration: drainWorkerThreads waits drainGracePeriod (5s) for each thread to reach Yielding, then arms force-kill on stragglers and keeps waiting until they yield. phpThread.shutdown does the same.

There is no abandon path: if a thread is stuck in a syscall force-kill cannot interrupt (macOS, Windows non-alertable Sleep) the drain blocks until the syscall returns naturally - matching pre-patch behaviour exactly, just typically much faster because force-kill cuts a 60s sleep down to milliseconds. Operators that want a harder bound rely on their orchestrator (systemd, k8s, supervisord) to SIGKILL the process.

worker_test.go + testdata/worker-sleep.php exercise the full path: the test marks a file before sleep(60), polls until the worker is proven parked, then asserts RestartWorkers completes within the grace period and that the post-sleep echo never runs (which would mean the VM interrupt was never observed).
1 parent a05e6dd commit 815d2ea

7 files changed

Lines changed: 329 additions & 16 deletions

frankenphp.c

Lines changed: 113 additions & 4 deletions
@@ -92,6 +92,78 @@ static bool is_forked_child = false;
 static void frankenphp_fork_child(void) { is_forked_child = true; }
 #endif

+/* Best-effort force-kill for stuck PHP threads.
+ *
+ * Each thread captures &EG(vm_interrupt) / &EG(timed_out) at boot and
+ * hands them to Go via go_frankenphp_store_force_kill_slot. To kill,
+ * Go passes the slot back to frankenphp_force_kill_thread, which stores
+ * true into both bools (the VM bails through zend_timeout() at the next
+ * opcode boundary) and then wakes any in-flight syscall:
+ *   - Linux/FreeBSD: pthread_kill(SIGRTMIN+3) -> EINTR.
+ *   - Windows: CancelSynchronousIo + QueueUserAPC for alertable I/O +
+ *     SleepEx. Non-alertable Sleep (including PHP's usleep) stays stuck.
+ *   - macOS: atomic-bool only; busy loops bail, blocking syscalls don't.
+ *
+ * Reserved signal: SIGRTMIN+3. PHP's pcntl_signal(SIGRTMIN+3, ...)
+ * clobbers it. glibc NPTL reserves SIGRTMIN..SIGRTMIN+2; embedders with
+ * their own Go signal usage may need to patch this constant.
+ *
+ * The slot lives Go-side on phpThread; the C side has no global table.
+ * The signal handler is installed once via pthread_once. */
+#ifdef PHP_WIN32
+static void CALLBACK frankenphp_noop_apc(ULONG_PTR param) { (void)param; }
+#endif
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+/* No-op: delivery itself is what unblocks the syscall via EINTR. */
+static void frankenphp_kill_signal_handler(int sig) { (void)sig; }
+
+static pthread_once_t kill_signal_handler_installed = PTHREAD_ONCE_INIT;
+static void install_kill_signal_handler(void) {
+  /* No SA_RESTART so syscalls return EINTR rather than being restarted.
+   * SA_ONSTACK guards against an accidental process-level delivery to a
+   * Go-managed thread, where Go requires the alternate signal stack. */
+  struct sigaction sa;
+  memset(&sa, 0, sizeof(sa));
+  sa.sa_handler = frankenphp_kill_signal_handler;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_flags = SA_ONSTACK;
+  sigaction(FRANKENPHP_KILL_SIGNAL, &sa, NULL);
+}
+#endif
+
+void frankenphp_force_kill_thread(force_kill_slot slot) {
+  if (slot.vm_interrupt == NULL) {
+    /* Boot aborted before the slot was published. */
+    return;
+  }
+  /* Atomic stores first: by the time the thread wakes (signal-driven or
+   * natural) the VM sees them and bails through zend_timeout(). */
+  zend_atomic_bool_store(slot.timed_out, true);
+  zend_atomic_bool_store(slot.vm_interrupt, true);
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* ESRCH (thread already exited) / EINVAL are both benign here. */
+  pthread_kill(slot.tid, FRANKENPHP_KILL_SIGNAL);
+#elif defined(PHP_WIN32)
+  if (slot.thread_handle != NULL) {
+    CancelSynchronousIo(slot.thread_handle);
+    QueueUserAPC((PAPCFUNC)frankenphp_noop_apc, slot.thread_handle, 0);
+  }
+#endif
+}
+
+/* CloseHandle on Windows; no-op on POSIX. */
+void frankenphp_release_thread_for_kill(force_kill_slot slot) {
+#ifdef PHP_WIN32
+  if (slot.thread_handle != NULL) {
+    CloseHandle(slot.thread_handle);
+  }
+#else
+  (void)slot;
+#endif
+}
+
 void frankenphp_update_local_thread_context(bool is_worker) {
   is_worker_thread = is_worker;

@@ -1065,6 +1137,16 @@ static void *php_thread(void *arg) {
   snprintf(thread_name, 16, "php-%" PRIxPTR, thread_index);
   set_thread_name(thread_name);

+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* The spawning Go-managed M may block realtime signals, which the
+   * new pthread inherits. Unblock FRANKENPHP_KILL_SIGNAL here so
+   * force-kill deliveries are not silently dropped. */
+  sigset_t unblock;
+  sigemptyset(&unblock);
+  sigaddset(&unblock, FRANKENPHP_KILL_SIGNAL);
+  pthread_sigmask(SIG_UNBLOCK, &unblock, NULL);
+#endif
+
   /* Initial allocation of all global PHP memory for this thread */
 #ifdef ZTS
   (void)ts_resource(0);
@@ -1073,6 +1155,29 @@ static void *php_thread(void *arg) {
 #endif
 #endif

+  /* Publish this thread's force-kill slot to Go so the graceful-drain
+   * grace period can wake it from a busy PHP loop or blocking syscall.
+   * Must run on the PHP thread itself: EG() resolves to its own TSRM
+   * context and pthread_self() captures the right tid. */
+  {
+    force_kill_slot slot;
+    memset(&slot, 0, sizeof(slot));
+    slot.vm_interrupt = &EG(vm_interrupt);
+    slot.timed_out = &EG(timed_out);
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+    slot.tid = pthread_self();
+    pthread_once(&kill_signal_handler_installed, install_kill_signal_handler);
+#elif defined(PHP_WIN32)
+    if (!DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
+                         GetCurrentProcess(), &slot.thread_handle, 0, FALSE,
+                         DUPLICATE_SAME_ACCESS)) {
+      /* On failure, force_kill falls back to atomic-bool only. */
+      slot.thread_handle = NULL;
+    }
+#endif
+    go_frankenphp_store_force_kill_slot(thread_index, slot);
+  }
+
   bool thread_is_healthy = true;
   bool has_attempted_shutdown = false;

@@ -1150,6 +1255,11 @@ static void *php_thread(void *arg) {
   }
   zend_end_try();

+  /* Must precede ts_free_thread: that frees the TSRM storage backing
+   * the slot's &EG() pointers. Clearing first means any concurrent
+   * force-kill either ran before us or sees a zero slot. */
+  go_frankenphp_clear_force_kill_slot(thread_index);
+
   /* free all global PHP memory reserved for this thread */
 #ifdef ZTS
   ts_free_thread();
@@ -1158,12 +1268,9 @@ static void *php_thread(void *arg) {
   /* Thread is healthy, signal to Go that the thread has shut down */
   if (thread_is_healthy) {
     go_frankenphp_on_thread_shutdown(thread_index);
-
     return NULL;
   }

-  /* Thread is unhealthy, PHP globals might be in a bad state after a bailout,
-   * restart the entire thread */
   frankenphp_log_message("Restarting unhealthy thread", LOG_WARNING);

   if (!frankenphp_new_php_thread(thread_index)) {
@@ -1265,7 +1372,9 @@ static void *php_main(void *arg) {

   go_frankenphp_main_thread_is_ready();

-  /* channel closed, shutdown gracefully */
+  /* channel closed, shutdown gracefully. drainPHPThreads has already
+   * waited for every PHP thread to exit (state.Done), so SAPI/TSRM
+   * teardown here is safe. */
   frankenphp_sapi_module.shutdown(&frankenphp_sapi_module);

   sapi_shutdown();

frankenphp.h

Lines changed: 33 additions & 0 deletions
@@ -46,6 +46,28 @@ static inline HRESULT LongLongSub(LONGLONG llMinuend, LONGLONG llSubtrahend,
 #include <stdbool.h>
 #include <stdint.h>

+#ifndef PHP_WIN32
+#include <pthread.h>
+#include <signal.h>
+#endif
+
+/* Platform capabilities for the force-kill primitive; declared in the
+ * header so Go (via CGo) gets the correct struct layout too. */
+#if !defined(PHP_WIN32) && defined(SIGRTMIN)
+#define FRANKENPHP_HAS_KILL_SIGNAL 1
+#define FRANKENPHP_KILL_SIGNAL (SIGRTMIN + 3)
+#endif
+
+typedef struct {
+  zend_atomic_bool *vm_interrupt;
+  zend_atomic_bool *timed_out;
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  pthread_t tid;
+#elif defined(PHP_WIN32)
+  HANDLE thread_handle;
+#endif
+} force_kill_slot;
+
 #ifndef FRANKENPHP_VERSION
 #define FRANKENPHP_VERSION dev
 #endif
@@ -193,6 +215,17 @@ void frankenphp_init_thread_metrics(int max_threads);
 void frankenphp_destroy_thread_metrics(void);
 size_t frankenphp_get_thread_memory_usage(uintptr_t thread_index);

+/* Best-effort force-kill primitives. The slot is populated by each PHP
+ * thread at boot (an internal helper calls back into Go via
+ * go_frankenphp_store_force_kill_slot) and lives in the Go-side phpThread.
+ * force_kill_thread interrupts the Zend VM at the next opcode boundary;
+ * on POSIX it also delivers SIGRTMIN+3 to the target thread, on Windows
+ * it calls CancelSynchronousIo + QueueUserAPC. release_thread drops any
+ * OS-owned resource tied to the slot (currently the Windows thread
+ * handle). */
+void frankenphp_force_kill_thread(force_kill_slot slot);
+void frankenphp_release_thread_for_kill(force_kill_slot slot);
+
 void register_extensions(zend_module_entry **m, int len);

 #endif

phpmainthread.go

Lines changed: 7 additions & 0 deletions
@@ -54,6 +54,8 @@ func initPHPThreads(numThreads int, numMaxThreads int, phpIni map[string]string)
 		return nil, err
 	}

+	// Must follow start(): maxThreads is only final once
+	// setAutomaticMaxThreads runs on the main PHP thread (before Ready).
 	C.frankenphp_init_thread_metrics(C.int(mainThread.maxThreads))

 	// initialize all other threads
@@ -79,6 +81,11 @@ func drainPHPThreads() {
 	if mainThread == nil {
 		return // mainThread was never initialized
 	}
+	// Idempotent: post-drain state is Reserved; a re-entry (e.g. a
+	// failed-Init cleanup) must not double-close mainThread.done.
+	if mainThread.state.Is(state.Reserved) {
+		return
+	}
 	doneWG := sync.WaitGroup{}
 	doneWG.Add(len(phpThreads))
 	mainThread.state.Set(state.ShuttingDown)

phpthread.go

Lines changed: 51 additions & 1 deletion
@@ -8,6 +8,7 @@ import (
 	"runtime"
 	"sync"
 	"sync/atomic"
+	"time"
 	"unsafe"

 	"github.com/dunglas/frankenphp/internal/state"
@@ -25,6 +26,12 @@ type phpThread struct {
 	contextMu    sync.RWMutex
 	state        *state.ThreadState
 	requestCount atomic.Int64
+	// forceKill holds &EG() pointers captured on the PHP thread itself.
+	// forceKillMu pairs with go_frankenphp_clear_force_kill_slot's write
+	// lock so a concurrent kill never dereferences pointers freed by
+	// ts_free_thread.
+	forceKillMu sync.RWMutex
+	forceKill   C.force_kill_slot
 }

 // threadHandler defines how the callbacks from the C thread should be handled
@@ -93,7 +100,27 @@ func (thread *phpThread) shutdown() {
 	}

 	close(thread.drainChan)
-	thread.state.WaitFor(state.Done)
+
+	// Arm force-kill after the grace period to wake any thread stuck in
+	// a blocking syscall (sleep, blocking I/O). The wait remains
+	// unbounded - on platforms where force-kill cannot interrupt the
+	// syscall (macOS, Windows non-alertable Sleep) the thread will exit
+	// when the syscall completes naturally; the operator's orchestrator
+	// is responsible for any harder timeout.
+	done := make(chan struct{})
+	go func() {
+		thread.state.WaitFor(state.Done)
+		close(done)
+	}()
+	select {
+	case <-done:
+	case <-time.After(drainGracePeriod):
+		thread.forceKillMu.RLock()
+		C.frankenphp_force_kill_thread(thread.forceKill)
+		thread.forceKillMu.RUnlock()
+		<-done
+	}
+
 	thread.drainChan = make(chan struct{})

 	// threads go back to the reserved state from which they can be booted again
@@ -203,6 +230,29 @@ func go_frankenphp_after_script_execution(threadIndex C.uintptr_t, exitStatus C.
 	thread.Unpin()
 }

+//export go_frankenphp_store_force_kill_slot
+func go_frankenphp_store_force_kill_slot(threadIndex C.uintptr_t, slot C.force_kill_slot) {
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	// Release any prior slot's OS resource (Windows HANDLE) before
+	// overwriting; a phpThread can reboot and re-register.
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = slot
+	thread.forceKillMu.Unlock()
+}
+
+//export go_frankenphp_clear_force_kill_slot
+func go_frankenphp_clear_force_kill_slot(threadIndex C.uintptr_t) {
+	// Called from C before ts_free_thread on both exit paths. Zeroing
+	// the slot under the write lock guarantees any concurrent kill
+	// either completed before we got the lock or sees a zero slot.
+	thread := phpThreads[threadIndex]
+	thread.forceKillMu.Lock()
+	C.frankenphp_release_thread_for_kill(thread.forceKill)
+	thread.forceKill = C.force_kill_slot{}
+	thread.forceKillMu.Unlock()
+}
+
 //export go_frankenphp_on_thread_shutdown
 func go_frankenphp_on_thread_shutdown(threadIndex C.uintptr_t) {
 	thread := phpThreads[threadIndex]

testdata/worker-sleep.php

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<?php
+
+// Worker that sleeps inside the handler to simulate a stuck request blocking
+// drain. Used to test the force-kill grace period.
+//
+// Before sleeping we touch a marker file whose path is passed via the
+// SLEEP_MARKER header. The Go test polls for the file so it only arms
+// RestartWorkers once the worker is proven to be inside sleep(), removing
+// the fixed-time race of a bare time.Sleep on the caller side.
+$fn = static function () {
+    $marker = $_SERVER['HTTP_SLEEP_MARKER'] ?? '';
+    if ($marker !== '') {
+        touch($marker);
+    }
+    sleep(60);
+    echo 'should not reach';
+};
+
+do {
+    $ret = \frankenphp_handle_request($fn);
+} while ($ret);
