CDP server ignores a single SIGTERM while a connection is live (distinct from #2507)
Summary
lightpanda serve ignores a single SIGTERM whenever a CDP WebSocket
connection is still open. The process only terminates if the CDP socket is
closed (client EOF) before the signal, or if SIGTERM is sent three times.
A conventional one-shot graceful stop (kill -TERM <pid> then waitpid) hangs.
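This is separate from #2507 / #2509 (the telemetry curl-multi poll-fd bug):
it reproduces with telemetry disabled (LIGHTPANDA_DISABLE_TELEMETRY=true), i.e. with no curl
multi handle, so the #2509 preparePollFds fix does not touch it.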
Reproduction (Lightpanda + raw CDP only)
```python
#!/usr/bin/env python3
# usage: repro.py /path/to/lightpanda
import socket, base64, os, sys, subprocess, time, signal

BIN = sys.argv[1]
print("binary:", subprocess.run([BIN, "version"], capture_output=True, text=True).stdout.strip())

def ws_connect(port):
    s = socket.create_connection(("127.0.0.1", port), timeout=5)
    key = base64.b64encode(os.urandom(16)).decode()
    s.sendall(("GET / HTTP/1.1\r\nHost: x\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n"
               "Sec-WebSocket-Key: %s\r\nSec-WebSocket-Version: 13\r\n\r\n" % key).encode())
    buf = b""
    while b"\r\n\r\n" not in buf:  # read the upgrade response headers
        buf += s.recv(1)
    assert buf.startswith(b"HTTP/1.1 101"), buf[:40]
    return s

def run(label, close_before_sigterm, port):
    env = dict(os.environ)
    env["LIGHTPANDA_DISABLE_TELEMETRY"] = "true"
    p = subprocess.Popen([BIN, "serve", "--host", "127.0.0.1", "--port", str(port), "--log_level", "error"],
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, env=env, start_new_session=True)
    time.sleep(2)
    s = ws_connect(port)  # one CDP WebSocket; no commands sent
    if close_before_sigterm:
        s.close()
        time.sleep(0.3)
    p.send_signal(signal.SIGTERM)  # exactly ONE SIGTERM
    deadline = time.time() + 12
    while time.time() < deadline:
        if p.poll() is not None:
            print("[%s] exited on SIGTERM (ok)" % label)
            break
        time.sleep(0.25)
    else:
        print("[%s] *** SIGTERM IGNORED — still alive 12s after one SIGTERM ***" % label)
        os.killpg(os.getpgid(p.pid), signal.SIGKILL)
    try:
        s.close()
    except OSError:
        pass
    try:
        p.wait(timeout=3)
    except Exception:
        try:
            os.killpg(os.getpgid(p.pid), signal.SIGKILL)
        except Exception:
            pass

run("WS closed before SIGTERM", True, 39920)
run("WS still open at SIGTERM", False, 39921)
```
Output (identical on 6353 and on 6354 = main + #2509)
```
binary: 1.0.0-dev.6354+e1e49c8a
[WS closed before SIGTERM] exited on SIGTERM (ok)
[WS still open at SIGTERM] *** SIGTERM IGNORED — still alive 12s after one SIGTERM ***
```
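The repro's ws_connect only checks that the server answers HTTP/1.1 101. If a stricter handshake check is wanted, RFC 6455 defines the Sec-WebSocket-Accept value a compliant server must return: the base64 of the SHA-1 of the client key concatenated with a fixed GUID. A minimal sketch, assuming the response headers are in buf as in ws_connect; accept_for and check_upgrade are hypothetical helper names, not part of the original script.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_for(key: str) -> str:
    """Sec-WebSocket-Accept value a compliant server must send for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def check_upgrade(response: bytes, key: str) -> bool:
    """True if `response` (headers up to the blank line) is a 101 that
    carries the expected Sec-WebSocket-Accept header."""
    lines = response.decode("latin-1").split("\r\n")
    if not lines[0].startswith("HTTP/1.1 101"):
        return False
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sec-websocket-accept":
            return value.strip() == accept_for(key)
    return False  # 101 without the accept header is not a valid upgrade
```

In ws_connect this would replace the assert with assert check_upgrade(buf, key), buf[:40].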
Broken flow
```mermaid
sequenceDiagram
    participant C as CDP client
    participant S as Sighandler thread (sigwait)
    participant M as Server/Network loop
    participant W as Connection worker (handleConnection)
    C->>M: WS upgrade → spawnWorker (active_threads=1)
    Note over C: connection left OPEN
    C->>S: SIGTERM (1st)
    S->>M: attempt=1 → run graceful listeners → Server.deinit()
    M->>W: Server.shutdown(): conn.shutdown()
    Note over W: not unwound while the CDP socket is open
    loop forever
        M->>M: while active_threads>0 { sleep 10ms }
    end
    Note over S: a 2nd SIGTERM re-runs listeners, only a 3rd does process.exit(1)
    Note over C,M: client sent ONE SIGTERM ⇒ HANG
```
Expected flow
```mermaid
sequenceDiagram
    participant C as CDP client
    participant S as Sighandler thread (sigwait)
    participant M as Server/Network loop
    participant W as Connection worker
    C->>M: WS upgrade → spawnWorker (active_threads=1)
    C->>S: SIGTERM (1st)
    S->>M: run graceful listeners → Server.deinit()
    M->>W: Server.shutdown(): conn.shutdown() (interrupts the worker)
    W-->>M: worker returns (active_threads→0)
    M-->>C: process exits cleanly
```
Root cause (pointers)
src/Sighandler.zig, sighandle: the 1st and 2nd SIGTERM only run the graceful
listeners; only the 3rd (attempt > 1) calls std.process.exit(1).
src/Server.zig, deinit: after shutdown() it spins in while (active_threads.load() > 0) sleep(10ms), waiting for each handleConnection worker to return.
Server.shutdown() calls conn.shutdown() for each live CDP connection, but that
does not unwind a worker whose connection socket is still open, so active_threads never reaches 0 and the graceful path blocks indefinitely.
The single SIGTERM is therefore consumed by the graceful path that can't
complete, instead of terminating the process.
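The shape of the hang can be reproduced in miniature with Python threads. This is a hypothetical model, not Lightpanda's code: ToyServer, spawn_worker and drain are invented names, a socketpair stands in for the CDP connection, and the worker (like handleConnection as described above) only returns once its socket yields EOF. drain mirrors the while (active_threads > 0) sleep(10ms) loop but takes a deadline so the demo itself cannot hang.

```python
import socket
import threading
import time


class ToyServer:
    """Hypothetical model of a server whose deinit waits for all workers."""

    def __init__(self) -> None:
        self.active_threads = 0
        self.lock = threading.Lock()

    def spawn_worker(self, conn: socket.socket) -> None:
        with self.lock:
            self.active_threads += 1  # counted before the thread starts
        threading.Thread(target=self._handle, args=(conn,), daemon=True).start()

    def _handle(self, conn: socket.socket) -> None:
        try:
            while conn.recv(4096):  # blocks; returns b"" only on EOF
                pass
        finally:
            with self.lock:
                self.active_threads -= 1

    def drain(self, timeout: float) -> bool:
        """Poll active_threads every 10 ms; False if it never reaches 0."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self.lock:
                if self.active_threads == 0:
                    return True
            time.sleep(0.01)
        return False


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    srv = ToyServer()
    srv.spawn_worker(server_end)
    print(srv.drain(0.5))  # False: client still connected, worker never returns
    client_end.close()     # client EOF, like "WS closed before SIGTERM"
    print(srv.drain(2.0))  # True: the worker saw EOF and returned
```

The two calls correspond to the repro's two cases: with the client still connected nothing ever decrements active_threads, while a client-side close lets the drain loop finish.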
Expected vs actual
Expected: one SIGTERM terminates lightpanda serve whether or not a CDP
connection is open (as already happens when the socket is closed first).
Actual: with a live CDP connection, the first (and second) SIGTERM are
absorbed; only SIGKILL or a third SIGTERM stops it.
Suggested direction
On the first termination signal, force-close live CDP connections (and/or make conn.shutdown() interrupt the worker's blocking wait) so that Server.deinit's
drain loop can complete; in other words, the first SIGTERM should always lead to exit.
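A sketch of the force-close direction, as a hypothetical Python model rather than a patch: DrainableServer and force_close_all are invented names, and a socketpair stands in for the CDP connection. The server tracks live connections and, on the first termination signal, calls shutdown(SHUT_RDWR) on each; on Linux that wakes a thread blocked in recv() on that socket, which then sees EOF and returns, so the drain loop completes without the client closing anything. Where the equivalent call would belong inside Lightpanda (e.g. conn.shutdown()) is an assumption left to the maintainers.

```python
import socket
import threading
import time


class DrainableServer:
    """Hypothetical model: one worker per connection, plus a force-close
    that lets the drain loop finish while clients are still connected."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.conns = set()  # live connections
        self.active_threads = 0

    def spawn_worker(self, conn: socket.socket) -> None:
        with self.lock:
            self.conns.add(conn)
            self.active_threads += 1
        threading.Thread(target=self._handle, args=(conn,), daemon=True).start()

    def _handle(self, conn: socket.socket) -> None:
        try:
            while conn.recv(4096):  # returns b"" on EOF or after shutdown()
                pass
        except OSError:
            pass  # treat a reset or closed socket like EOF
        finally:
            with self.lock:
                self.conns.discard(conn)
                self.active_threads -= 1
            conn.close()

    def force_close_all(self) -> None:
        """First termination signal: shut down every live connection so
        blocked recv() calls return instead of waiting for the client."""
        with self.lock:
            live = list(self.conns)
        for conn in live:
            try:
                conn.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # already closed by its worker

    def drain(self, timeout: float) -> bool:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self.lock:
                if self.active_threads == 0:
                    return True
            time.sleep(0.01)
        return False
```

Calling force_close_all() before drain() is the ordering the suggestion implies: interrupt first, then wait for active_threads to reach 0.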
Why it matters
Any client that stops the browser with a single graceful SIGTERM (the
conventional path: Puppeteer/Playwright/chromedp/custom drivers) hangs after
driving a session. In capybara-lightpanda this surfaced as a multi-minute /
indefinite hang at suite teardown when process cleanup fell to a GC finalizer
that SIGTERMs without first closing the CDP socket.
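Until the server side changes, the repro points at a client-side mitigation: close the CDP socket first (the case that already exits on one SIGTERM), then send a single SIGTERM, and fall back to SIGKILL after a grace period. A minimal sketch, assuming the driver holds the raw socket and the subprocess.Popen handle; stop_browser and its 5-second default are hypothetical choices, not part of any driver's API.

```python
import signal
import socket
import subprocess
from typing import Optional


def stop_browser(proc: subprocess.Popen, cdp_sock: Optional[socket.socket],
                 grace: float = 5.0) -> str:
    """Close the CDP connection, send one SIGTERM, escalate to SIGKILL.

    Returns "terminated" if the single SIGTERM was enough, else "killed".
    """
    if cdp_sock is not None:
        cdp_sock.close()  # server sees client EOF before the signal
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        proc.wait()  # reap, so no zombie is left behind
        return "killed"
```

A real driver would normally close its WebSocket through its own library (sending a close frame) rather than dropping the raw socket; the point is only that the server sees EOF before the signal.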
Environment
1.0.0-dev.6353+f1b0adf9 and 1.0.0-dev.6354+e1e49c8a (the latter = main + #2509), both built with -Doptimize=ReleaseFast. Build …6323 also exhibits it.