Commit 6e14d11

feat: cross-platform force-kill primitive for stuck PHP threads
Introduces a small, self-contained primitive that unblocks a PHP thread stuck in a blocking call (sleep, synchronous I/O, etc.) or a busy PHP loop, so the graceful drain used by RestartWorkers and DrainWorkers can make progress instead of waiting for the block to return on its own. The primitive is useful on its own and gives follow-up graceful-shutdown work a reviewed foundation to build on.

- frankenphp.h / frankenphp.c: add the force_kill_slot struct and frankenphp_register_thread_for_kill / frankenphp_force_kill_thread / frankenphp_release_thread_for_kill. At boot, each PHP thread captures, from its own TSRM context, pointers to its EG(vm_interrupt) and EG(timed_out) plus its OS thread identity (pthread_t where SIGRTMIN exists, a duplicated HANDLE on Windows) and hands the slot to Go, so the kill path can fire from any goroutine without entering the target thread's TSRM context. Force-kill sets both atomic bools, so the VM unwinds through zend_timeout() ("Maximum execution time exceeded") at the next opcode boundary. Linux/FreeBSD also send SIGRTMIN+3 to the thread (no-op handler installed without SA_RESTART) so blocking syscalls return EINTR; Windows uses CancelSynchronousIo + QueueUserAPC to interrupt synchronous I/O and alertable waits; on macOS and other platforms only the atomic-bool path applies, so busy loops are stopped but a thread blocked in a syscall exits only when the call returns naturally.
- phpthread.go: store the slot on phpThread through the exported go_frankenphp_store_force_kill_slot callback.
- phpmainthread.go: call frankenphp_release_thread_for_kill for every thread in drainPHPThreads once all threads are done (closes the duplicated thread handle on Windows).
- worker.go: add a 5-second graceful-drain grace period to drainWorkerThreads. Once it elapses, force-kill any thread still outside Yielding, log a warning, and keep waiting on ready.Wait(); where the blocking call can be interrupted, the kill lets the thread return from it, so the drain completes in bounded time instead of hanging.
- worker_test.go + testdata/worker-sleep.php: TestRestartWorkersForceKillsStuckThread drives the path end-to-end. A worker blocks inside sleep(60) below frankenphp_handle_request (so closing drainChan can't reach it); the test asserts RestartWorkers returns within 8s (grace + slack). The test skips on platforms without the underlying primitive (anything other than Linux, FreeBSD and Windows).
1 parent a05e6dd commit 6e14d11

7 files changed

Lines changed: 260 additions & 1 deletion

frankenphp.c

Lines changed: 114 additions & 0 deletions
@@ -92,6 +92,115 @@ static bool is_forked_child = false;
 static void frankenphp_fork_child(void) { is_forked_child = true; }
 #endif

+/* Best-effort force-kill for PHP threads after the graceful-drain grace
+ * period. Each thread captures pointers to its own executor_globals'
+ * vm_interrupt and timed_out atomic bools at boot and hands them back to
+ * Go via go_frankenphp_store_force_kill_slot. From any goroutine, the
+ * Go side passes that slot back to frankenphp_force_kill_thread, which
+ * stores true into both bools, waking the VM at the next opcode boundary
+ * and unwinding the thread through zend_timeout().
+ *
+ * On platforms with POSIX realtime signals (Linux, FreeBSD), force-kill
+ * also delivers SIGRTMIN+3 to the target thread so any in-flight blocking
+ * syscall (select, sleep, nanosleep, blocking I/O without SA_RESTART)
+ * returns EINTR and the VM gets a chance to observe the atomic bools on
+ * the next opcode. On Windows, CancelSynchronousIo + QueueUserAPC does
+ * the equivalent for alertable I/O and SleepEx. Non-alertable Sleep()
+ * (including PHP's usleep on Windows) stays uninterruptible - the VM
+ * must wait for it to return naturally before bailing.
+ *
+ * macOS has no realtime signals exposed to user-space, so the atomic
+ * bool path is the only mechanism there: threads busy-looping in PHP
+ * are killed promptly, threads stuck in blocking syscalls wait to
+ * return on their own.
+ *
+ * The slot lives in the Go-side phpThread struct - there is no C-side
+ * array or init/destroy dance. Signal handler installation happens once
+ * via pthread_once the first time a thread registers. */
+#ifdef PHP_WIN32
+static void CALLBACK frankenphp_noop_apc(ULONG_PTR param) { (void)param; }
+#endif
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+/* No-op handler: signal delivery is sufficient on its own because it
+ * forces the in-flight syscall to return EINTR. The VM then observes
+ * vm_interrupt on the next opcode and unwinds via zend_timeout(). */
+static void frankenphp_kill_signal_handler(int sig) { (void)sig; }
+
+static pthread_once_t kill_signal_handler_installed = PTHREAD_ONCE_INIT;
+static void install_kill_signal_handler(void) {
+  /* Install the no-op handler process-wide with SA_RESTART cleared so
+   * blocking syscalls return EINTR when the signal is delivered rather
+   * than being transparently restarted by libc. */
+  struct sigaction sa;
+  memset(&sa, 0, sizeof(sa));
+  sa.sa_handler = frankenphp_kill_signal_handler;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_flags = 0;
+  sigaction(FRANKENPHP_KILL_SIGNAL, &sa, NULL);
+}
+#endif
+
+/* Called by each PHP thread at boot, from its own TSRM context, so that
+ * the EG-backed addresses resolve to the thread's private executor_globals
+ * and the captured thread identity refers to itself. Hands the slot to
+ * the Go side via go_frankenphp_store_force_kill_slot; the slot's
+ * lifetime is the phpThread's. */
+void frankenphp_register_thread_for_kill(uintptr_t idx) {
+  force_kill_slot slot;
+  memset(&slot, 0, sizeof(slot));
+  slot.vm_interrupt = &EG(vm_interrupt);
+  slot.timed_out = &EG(timed_out);
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  slot.tid = pthread_self();
+  pthread_once(&kill_signal_handler_installed, install_kill_signal_handler);
+#elif defined(PHP_WIN32)
+  if (!DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
+                       GetCurrentProcess(), &slot.thread_handle, 0, FALSE,
+                       DUPLICATE_SAME_ACCESS)) {
+    /* DuplicateHandle can fail under resource pressure; leave the handle
+     * NULL so force_kill_thread falls back to the atomic-bool path only. */
+    slot.thread_handle = NULL;
+  }
+#endif
+  go_frankenphp_store_force_kill_slot(idx, slot);
+}
+
+void frankenphp_force_kill_thread(force_kill_slot slot) {
+  if (slot.vm_interrupt == NULL) {
+    /* Thread never reached register_thread_for_kill (aborted during boot). */
+    return;
+  }
+  /* Set the atomic bools first so that by the time the thread wakes up -
+   * whether from our signal/APC or naturally - the VM sees them and
+   * routes through zend_timeout() -> "Maximum execution time exceeded". */
+  zend_atomic_bool_store(slot.timed_out, true);
+  zend_atomic_bool_store(slot.vm_interrupt, true);
+
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  /* Return value intentionally ignored: ESRCH (thread already exited) and
+   * EINVAL are both benign - there is simply nothing to unblock. */
+  pthread_kill(slot.tid, FRANKENPHP_KILL_SIGNAL);
+#elif defined(PHP_WIN32)
+  if (slot.thread_handle != NULL) {
+    CancelSynchronousIo(slot.thread_handle);
+    QueueUserAPC((PAPCFUNC)frankenphp_noop_apc, slot.thread_handle, 0);
+  }
+#endif
+}
+
+/* Releases any OS resource tied to the slot (currently: CloseHandle on
+ * Windows). Called by the Go side when a phpThread is torn down. */
+void frankenphp_release_thread_for_kill(force_kill_slot slot) {
+#ifdef PHP_WIN32
+  if (slot.thread_handle != NULL) {
+    CloseHandle(slot.thread_handle);
+  }
+#else
+  (void)slot;
+#endif
+}
+
 void frankenphp_update_local_thread_context(bool is_worker) {
   is_worker_thread = is_worker;

@@ -1073,6 +1182,11 @@ static void *php_thread(void *arg) {
 #endif
 #endif

+  /* Register this thread's vm_interrupt/timed_out addresses and OS thread
+   * identity so the Go side can force-kill it after the graceful-drain grace
+   * period if it gets stuck in a busy PHP loop or a blocking call. */
+  frankenphp_register_thread_for_kill(thread_index);
+
   bool thread_is_healthy = true;
   bool has_attempted_shutdown = false;

frankenphp.h

Lines changed: 34 additions & 0 deletions
@@ -46,6 +46,28 @@ static inline HRESULT LongLongSub(LONGLONG llMinuend, LONGLONG llSubtrahend,
 #include <stdbool.h>
 #include <stdint.h>

+#ifndef PHP_WIN32
+#include <pthread.h>
+#include <signal.h>
+#endif
+
+/* Platform capabilities for the force-kill primitive; declared in the
+ * header so Go (via CGo) gets the correct struct layout too. */
+#if !defined(PHP_WIN32) && defined(SIGRTMIN)
+#define FRANKENPHP_HAS_KILL_SIGNAL 1
+#define FRANKENPHP_KILL_SIGNAL (SIGRTMIN + 3)
+#endif
+
+typedef struct {
+  zend_atomic_bool *vm_interrupt;
+  zend_atomic_bool *timed_out;
+#ifdef FRANKENPHP_HAS_KILL_SIGNAL
+  pthread_t tid;
+#elif defined(PHP_WIN32)
+  HANDLE thread_handle;
+#endif
+} force_kill_slot;
+
 #ifndef FRANKENPHP_VERSION
 #define FRANKENPHP_VERSION dev
 #endif
@@ -193,6 +215,18 @@ void frankenphp_init_thread_metrics(int max_threads);
 void frankenphp_destroy_thread_metrics(void);
 size_t frankenphp_get_thread_memory_usage(uintptr_t thread_index);

+/* Best-effort force-kill primitives. The slot is populated by each PHP
+ * thread at boot (frankenphp_register_thread_for_kill calls back into Go
+ * via go_frankenphp_store_force_kill_slot) and lives in the Go-side
+ * phpThread. force_kill_thread interrupts the Zend VM at the next opcode
+ * boundary; where SIGRTMIN exists (Linux, FreeBSD) it also delivers
+ * SIGRTMIN+3 to the target thread, and on Windows it calls
+ * CancelSynchronousIo + QueueUserAPC. release_thread_for_kill drops any
+ * OS-owned resource tied to the slot (currently the Windows thread handle). */
+void frankenphp_register_thread_for_kill(uintptr_t thread_index);
+void frankenphp_force_kill_thread(force_kill_slot slot);
+void frankenphp_release_thread_for_kill(force_kill_slot slot);
+
 void register_extensions(zend_module_entry **m, int len);

 #endif

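The `#if`/`#elif` ladder in frankenphp.h and frankenphp.c reduces to a small per-platform decision: every platform sets the VM flags, and only some can also break a thread out of a blocking call. The hypothetical `mechanismFor` helper restates that decision in Go, keyed on `runtime.GOOS` like the test's skip condition. It assumes, as the comments in this commit state, that Linux and FreeBSD define SIGRTMIN and macOS does not, and it models a failed DuplicateHandle on Windows with the `winHandleOK` parameter.

```go
package main

import (
	"fmt"
	"runtime"
)

// killMechanism is a hypothetical summary of what
// frankenphp_force_kill_thread does on one platform.
type killMechanism struct {
	SetsVMFlags       bool   // stores EG(timed_out) and EG(vm_interrupt)
	InterruptsSyscall bool   // can also break a thread out of a blocking call
	How               string // the OS-level wake-up, if any
}

// mechanismFor mirrors the FRANKENPHP_HAS_KILL_SIGNAL / PHP_WIN32 ladder.
// winHandleOK models whether DuplicateHandle succeeded at registration.
func mechanismFor(goos string, winHandleOK bool) killMechanism {
	switch goos {
	case "linux", "freebsd": // SIGRTMIN defined
		return killMechanism{SetsVMFlags: true, InterruptsSyscall: true,
			How: "pthread_kill(SIGRTMIN+3), syscall returns EINTR"}
	case "windows":
		if winHandleOK {
			return killMechanism{SetsVMFlags: true, InterruptsSyscall: true,
				How: "CancelSynchronousIo + QueueUserAPC (not plain Sleep)"}
		}
		return killMechanism{SetsVMFlags: true, How: "none (no thread handle)"}
	default: // e.g. darwin: no realtime signals exposed to user space
		return killMechanism{SetsVMFlags: true, How: "none"}
	}
}

func main() {
	m := mechanismFor(runtime.GOOS, true)
	fmt.Printf("%s: flags=%v, unblocks syscalls=%v, via %s\n",
		runtime.GOOS, m.SetsVMFlags, m.InterruptsSyscall, m.How)
}
```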
phpmainthread.go

Lines changed: 3 additions & 0 deletions
@@ -97,6 +97,9 @@ func drainPHPThreads() {
 	}

 	doneWG.Wait()
+	for _, thread := range phpThreads {
+		C.frankenphp_release_thread_for_kill(thread.forceKill)
+	}
 	mainThread.state.Set(state.Done)
 	mainThread.state.WaitFor(state.Reserved)
 	C.frankenphp_destroy_thread_metrics()

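drainPHPThreads releases the slots only after doneWG.Wait(), so, as far as this diff shows, no handle is closed while its thread could still be force-killed through it (on Windows that would mean calling CancelSynchronousIo on a closed handle). A minimal Go sketch of that ordering, with a hypothetical `handle` type standing in for the duplicated thread HANDLE and `Close` for CloseHandle:

```go
package main

import (
	"fmt"
	"sync"
)

// handle is a hypothetical stand-in for the Windows thread HANDLE kept
// in force_kill_slot; Close plays the role of CloseHandle.
type handle struct {
	mu     sync.Mutex
	closed bool
}

func (h *handle) Close() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.closed = true
}

func (h *handle) IsClosed() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.closed
}

type thread struct{ forceKill *handle }

// drain runs work on every thread, waits for all of them, and only then
// releases the per-thread handles (doneWG.Wait() before the release loop).
func drain(threads []*thread, work func(*thread)) {
	var doneWG sync.WaitGroup
	for _, th := range threads {
		doneWG.Add(1)
		go func(th *thread) {
			defer doneWG.Done()
			work(th)
		}(th)
	}
	doneWG.Wait()
	for _, th := range threads {
		th.forceKill.Close()
	}
}

func main() {
	threads := []*thread{{&handle{}}, {&handle{}}}
	var mu sync.Mutex
	sawOpen := 0
	drain(threads, func(th *thread) {
		// While a thread is still running, its handle must still be usable.
		if !th.forceKill.IsClosed() {
			mu.Lock()
			sawOpen++
			mu.Unlock()
		}
	})
	fmt.Println(sawOpen, threads[0].forceKill.IsClosed(), threads[1].forceKill.IsClosed())
}
```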
phpthread.go

Lines changed: 10 additions & 0 deletions
@@ -25,6 +25,11 @@ type phpThread struct {
 	contextMu sync.RWMutex
 	state *state.ThreadState
 	requestCount atomic.Int64
+	// forceKill is populated by go_frankenphp_store_force_kill_slot from
+	// the PHP thread's own TSRM context at boot and read by other goroutines
+	// via RestartWorkers/DrainWorkers. The write happens before the thread's
+	// transition to Ready, which provides the happens-before edge.
+	forceKill C.force_kill_slot
 }

 // threadHandler defines how the callbacks from the C thread should be handled
@@ -203,6 +208,11 @@ func go_frankenphp_after_script_execution(threadIndex C.uintptr_t, exitStatus C.
 	thread.Unpin()
 }

+//export go_frankenphp_store_force_kill_slot
+func go_frankenphp_store_force_kill_slot(threadIndex C.uintptr_t, slot C.force_kill_slot) {
+	phpThreads[threadIndex].forceKill = slot
+}
+
 //export go_frankenphp_on_thread_shutdown
 func go_frankenphp_on_thread_shutdown(threadIndex C.uintptr_t) {
 	thread := phpThreads[threadIndex]

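The comment on phpThread.forceKill relies on the Go memory model: the slot is written by the PHP thread before it transitions to Ready, and readers only touch it after observing that transition, so the plain struct field needs no lock of its own. A minimal sketch of the pattern, assuming a hypothetical `threadState` built on a mutex and sync.Cond in place of FrankenPHP's state.ThreadState (whose API is not shown in this diff); any synchronizing handoff (mutex, channel, atomic) gives the same happens-before edge.

```go
package main

import (
	"fmt"
	"sync"
)

type slot struct{ tid int }

// threadState is a hypothetical, minimal stand-in for state.ThreadState.
type threadState struct {
	mu    sync.Mutex
	cond  *sync.Cond
	ready bool
}

func newThreadState() *threadState {
	s := &threadState{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

func (s *threadState) SetReady() {
	s.mu.Lock()
	s.ready = true
	s.mu.Unlock()
	s.cond.Broadcast()
}

func (s *threadState) WaitForReady() {
	s.mu.Lock()
	for !s.ready {
		s.cond.Wait()
	}
	s.mu.Unlock()
}

type phpThread struct {
	state     *threadState
	forceKill slot // plain field, published via the Ready transition
}

// boot plays the PHP thread: store the slot, then publish Ready. The
// unlock in SetReady orders the write before any reader's subsequent lock.
func boot(th *phpThread, id int) {
	th.forceKill = slot{tid: id} // like go_frankenphp_store_force_kill_slot
	th.state.SetReady()
}

func main() {
	threads := make([]*phpThread, 4)
	for i := range threads {
		threads[i] = &phpThread{state: newThreadState()}
		go boot(threads[i], 100+i)
	}
	for _, th := range threads {
		th.state.WaitForReady() // reader: observe Ready before touching the slot
		fmt.Println(th.forceKill.tid)
	}
}
```

Running it under `go run -race` should report no data race, which is the property the field comment claims for the real code.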
testdata/worker-sleep.php

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+<?php
+
+// Worker that sleeps inside the handler to simulate a stuck request that
+// blocks the drain. Used to test the force-kill grace period.
+$fn = static function () {
+    sleep(60);
+    echo 'should not reach';
+};
+
+do {
+    $ret = \frankenphp_handle_request($fn);
+} while ($ret);

worker.go

Lines changed: 33 additions & 1 deletion
@@ -4,6 +4,7 @@ package frankenphp
 import "C"
 import (
 	"fmt"
+	"log/slog"
 	"os"
 	"path/filepath"
 	"runtime"
@@ -165,6 +166,13 @@ func newWorker(o workerOpt) (*worker, error) {
 	return w, nil
 }

+// drainGracePeriod is the time a worker thread has to stop gracefully after
+// receiving the drain signal before the force-kill primitive is fired at it.
+// Well-behaved scripts return promptly on drainChan close; stuck ones (e.g.
+// blocking C calls inside the VM) would otherwise hang drainWorkerThreads
+// until the call returns, which may be never.
+const drainGracePeriod = 5 * time.Second
+
 // EXPERIMENTAL: DrainWorkers finishes all worker scripts before a graceful shutdown
 func DrainWorkers() {
 	_ = drainWorkerThreads()
@@ -201,7 +209,31 @@ func drainWorkerThreads() []*phpThread {
 		worker.threadMutex.RUnlock()
 	}

-	ready.Wait()
+	// Wait for the graceful drain, then force-kill any thread that is still
+	// stuck. Every platform sets the Zend VM interrupt flags; Linux/FreeBSD
+	// also send SIGRTMIN+3 to break blocking syscalls, and Windows interrupts
+	// synchronous I/O and alertable waits. Elsewhere (e.g. macOS) a thread
+	// blocked in a syscall only exits once that call returns.
+	done := make(chan struct{})
+	go func() {
+		ready.Wait()
+		close(done)
+	}()
+
+	select {
+	case <-done:
+		// everyone yielded in time
+	case <-time.After(drainGracePeriod):
+		for _, thread := range drainedThreads {
+			if !thread.state.Is(state.Yielding) {
+				C.frankenphp_force_kill_thread(thread.forceKill)
+			}
+		}
+		if globalLogger.Enabled(globalCtx, slog.LevelWarn) {
+			globalLogger.LogAttrs(globalCtx, slog.LevelWarn, "worker threads did not yield within grace period, force-killing stuck threads")
+		}
+		<-done
+	}

 	return drainedThreads
 }

worker_test.go

Lines changed: 54 additions & 0 deletions
@@ -9,13 +9,17 @@ import (
 	"net/http"
 	"net/http/httptest"
 	"net/url"
+	"os"
+	"runtime"
 	"strconv"
 	"strings"
 	"sync"
 	"testing"
+	"time"

 	"github.com/dunglas/frankenphp"
 	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

 func TestWorker(t *testing.T) {
@@ -45,6 +49,56 @@ func TestWorker(t *testing.T) {
 	}, &testOptions{workerScript: "worker.php", nbWorkers: 1, nbParallelRequests: 1})
 }

+// TestRestartWorkersForceKillsStuckThread verifies that the drain path used
+// by RestartWorkers and DrainWorkers does not hang indefinitely when a
+// worker thread is stuck inside a blocking PHP call (sleep, synchronous
+// I/O, etc.). The force-kill primitive delivers a realtime signal to the
+// thread on Linux/FreeBSD (interrupts the syscall with EINTR) or calls
+// CancelSynchronousIo + QueueUserAPC on Windows. macOS has no realtime
+// signal exposed to user-space, so a thread stuck in sleep() cannot be
+// force-unblocked there; skip the test.
+func TestRestartWorkersForceKillsStuckThread(t *testing.T) {
+	if runtime.GOOS != "linux" && runtime.GOOS != "freebsd" && runtime.GOOS != "windows" {
+		t.Skipf("force-kill cannot interrupt blocking syscalls on %s", runtime.GOOS)
+	}
+
+	cwd, _ := os.Getwd()
+	testDataDir := cwd + "/testdata/"
+
+	require.NoError(t, frankenphp.Init(
+		frankenphp.WithWorkers("sleep-worker", testDataDir+"worker-sleep.php", 1),
+		frankenphp.WithNumThreads(2),
+	))
+	t.Cleanup(frankenphp.Shutdown)
+
+	// Fire a request the worker will handle and then block on (sleep 60s).
+	// When the drain runs, the worker script is inside the handler callback,
+	// below frankenphp_handle_request, so the drain signal on drainChan
+	// can't be observed until the blocking sleep returns.
+	go func() {
+		req := httptest.NewRequest("GET", "http://example.com/worker-sleep.php", nil)
+		fr, err := frankenphp.NewRequestWithContext(req, frankenphp.WithRequestDocumentRoot(testDataDir, false))
+		if err != nil {
+			return
+		}
+		_ = frankenphp.ServeHTTP(httptest.NewRecorder(), fr)
+	}()
+
+	// Give the request time to reach the handler and enter sleep().
+	time.Sleep(500 * time.Millisecond)
+
+	// RestartWorkers must complete within the grace period + a bit of slack.
+	// Without force-kill, it would wait for the 60s sleep to return.
+	start := time.Now()
+	frankenphp.RestartWorkers()
+	elapsed := time.Since(start)
+
+	// Grace period is 5s; allow margin for signal/APC delivery, the VM's
+	// next opcode check, and the drain's final ready.Wait() plus the restart loop.
+	const budget = 8 * time.Second
+	assert.Less(t, elapsed, budget, "drain must force-kill the stuck thread within the grace period")
+}
+
 func TestWorkerDie(t *testing.T) {
 	runTest(t, func(handler func(http.ResponseWriter, *http.Request), _ *httptest.Server, i int) {
 		req := httptest.NewRequest("GET", "http://example.com/die.php", nil)
