Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) by staylor · Pull Request #2468 · lightpanda-io/browser

staylor · 2026-05-15T05:46:57Z

Closes #2462.

Summary

Five coordinated fixes for the Fetch.enable + --http-proxy deadlock. The original symptom (CONNECT/TLS proxy round-trips inflate concurrency, lightpanda's WS send fills the kernel buffer, lightpanda blocks on write while puppeteer's matching Fetch.continueRequest replies pile up unread) is the headline scenario, but along the way an empirical reproducer surfaced four more places that produce the same "lightpanda stops draining the CDP socket and/or crashes mid-cleanup" symptom. All five share the root pattern: a synchronous wait loop or shutdown path that re-enters JS without giving CDP a chance to make progress.

A/B verification

Repro: puppeteer-core + setRequestInterception(true) + req.continue() against a localhost forward proxy, lightpanda started with --http-proxy http://127.0.0.1:8080.

URL	before	after
`https://example.com`	OK 124 ms	OK 117 ms
`https://github.com`	HANG 20 s (12/82 continues processed)	OK 1410 ms (82/82)
`https://www.shopify.com`	HANG 20 s (1/4)	OK 1973 ms (66/66)
`https://www.allbirds.com`	HANG 20 s (12/53)	OK 3944 ms (372/372)
`https://www.allbirds.com/products/womens-wool-runners-dapple-grey`	HANG 20 s (12/53)	OK 6286 ms (459/459)

Lightpanda survives 18 back-to-back navigations across the matrix above (3 per URL) without crashing on the patched binary. Pre-patch, the cleanup path triggered SIGSEGV (ZigToCurlAllocator.free: incorrect alignment) inside BrowserContext.deinit whenever the page actually completed before puppeteer disconnected.

(https://www.nike.com still doesn't fire the load event in either version. After this fix all 68 Fetch.continueRequest messages process cleanly; the remaining stall is third-party widgets keeping the page in loading state, not a CDP/HTTP deadlock. Out of scope.)

Reproducer

The puppeteer-core script that drove all measurements:

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9223',
  defaultViewport: null,
});
const page = await browser.newPage();
await page.setRequestInterception(true);
let seen = 0, continued = 0;
page.on('request', (req) => {
  seen++;
  req.continue().then(() => { continued++; }).catch(() => {});
});
const t0 = Date.now();
try {
  await page.goto('https://www.allbirds.com', { waitUntil: 'load', timeout: 20000 });
  console.log('OK ' + (Date.now() - t0) + 'ms');
} catch (e) {
  console.log('HANG: ' + e.message);
}
await new Promise(r => setTimeout(r, 1000));
console.log('seen=' + seen + ' continued=' + continued);
await browser.disconnect();

Lightpanda startup:

lightpanda serve --host 127.0.0.1 --port 9223 \
  --http-proxy http://127.0.0.1:8080

Any HTTP CONNECT-capable forward proxy on 127.0.0.1:8080 reproduces (vanilla Node http.createServer with a connect handler is enough).

Changes

`src/network/WsConnection.zig` -- drain reads while waiting for write space

WsConnection.send previously switched the socket to blocking mode on EWOULDBLOCK. The header comment said this "should virtually never happen" -- under Fetch.enable + --http-proxy it is routine, and the blocking write deadlocks because the peer is also blocked trying to send to us.

New behavior: on EWOULDBLOCK, poll for POLLOUT and POLLIN, drain any incoming bytes into the reader buffer (without dispatching -- that would re-enter send and recurse). Buffered bytes are picked up by the next CDP.readSocket. Adds tryRead (variant of read that distinguishes "no new bytes" from "peer closed") and bufferedBytes accessor.

`src/browser/HttpClient.zig` -- never starve CDP reads

perform() returned .normal early whenever processMessages() had work, so a steady stream of HTTP completions never gave the CDP socket a chance to be polled. Now: always do at least a non-blocking curl_multi_poll of the CDP socket, and return .cdp_socket immediately when the WS reader has bytes that were rescued during a backpressured send (via the new has_buffered_input callback on CDPClient).

Also makes Transfer.deinit idempotent. The error-callback path can cascade into re-entrant deinits (e.g. Script.errorCallback -> manager.evaluate() -> JS execution -> Frame.deinit -> abortOwner -> Transfer.kill -> Transfer.deinit); the second deinit then double-frees the Transfer's arena (ArenaPool: double-free detected panic). The fix uses client.transfers.remove(self.id) as a one-shot ownership claim -- if it returns false, this transfer was already torn down by the cascade and we early-return.

Transfer.kill is promoted from fn to pub fn so BrowserContext.deinit can use it (see CDP.zig change below). RequestParams.deinit / Request.deinit now take * instead of *const so they can call into the now-mutating Headers.deinit.

`src/network/http.zig` -- make Headers.deinit idempotent

Headers.deinit did not null self.headers after curl_slist_free_all. Combined with the value-copy hazard RobotsLayer already documents (Headers wraps a raw *curl_slist, so a value copy shares the pointer), a second deinit on a copied Headers double-freed the curl slist. The symptom was an "incorrect alignment" panic inside ZigToCurlAllocator.free because the slist nodes were already returned to the pool. Nulling self.headers after the free makes the second call a no-op regardless of how the value-copy hazard is triggered.

`src/browser/ScriptManagerBase.zig` -- drain CDP in `waitForImport`

waitForImport spun on client.tick(200) and discarded the .cdp_socket return. A script import() whose request was paused at the InterceptionLayer hung forever: the matching Fetch.continueRequest reply never reached CDP.processMessage, the request never resumed, and this loop spun until the client's navigation timeout fired. Mirrors the syncRequest pattern that already does this correctly.

The entry pointer is now re-fetched on every loop iteration. The CDP-drain step above re-enters JS, which can call back into preloadImport for a transitively-imported module; preloadImport calls imported_modules.getOrPut, which may rehash the map and invalidate every existing entry pointer. A cached entry was a use-after-free on the next entry.value_ptr.state access.

`src/cdp/CDP.zig` -- wire backpressured-buffer drain + safer teardown

readSocket now uses tryRead (the new WsConnection API) and tolerates the no_new_data case so messages drained during a backpressured send still get processed.

BrowserContext.deinit aborts pending intercepts via transfer.kill instead of transfer.abort. kill fires shutdown_callback (or no-op for transfers without one); abort fires error_callback, which re-enters JS via XHR/script error handlers. The handlers crash because the inspector and V8 context this BrowserContext owns have already been torn down (inspector.stopSession() runs two lines above the abort loop). The original code already aborted "before closing the session/page since some of these might callback into the page/scriptmanager" -- kill's shutdown path is the right primitive for that.

Stack of failure modes I observed during debugging

For anyone hitting this region in the future, the symptoms shifted as I peeled layers:

Fetch.requestPaused events fire in lightpanda but Fetch.continueRequest replies never arrive. (Backpressure deadlock -- fixed by the WsConnection + perform changes.)
After unblocking the deadlock, the page navigates successfully, then puppeteer disconnects, then BrowserContext.deinit SIGSEGVs in curl_slist_free_all. (Headers double-free -- fixed by Headers.deinit idempotency.)
After fixing that, the same teardown path SIGSEGVs differently: ArenaPool: double-free detected in Transfer.deinit. (Cascade out of error_callback re-deinits the same transfer -- fixed by Transfer.deinit ownership claim.)
After fixing that, the teardown path SIGSEGVs again inside _v8_inspector__Session__wrapObject. (XHR/script error_callback re-enters JS through a torn-down inspector -- fixed by kill instead of abort in BrowserContext.deinit.)
After fixing that, the matrix runs clean.

krichprollsch · 2026-05-18T09:33:46Z

I rebased + added a little change in #2489

karlseguin · 2026-05-18T14:31:18Z

I've been talking about a possible solution for some of these issues.

Currently, every CDP connection maps to 1 Browser and runs in a thread. I think having the CDP socket polled and read from the main thread (in Network.zig) and communicate messages to the worker via an Erlang OTP-style mailbox would be more robust.

In normal operations, the mailbox would be drained in Runner.tick, i.e. a safe point, never inside of a JS callback. For Request Interception, which needs to block current execution, it would scan the mailbox and only process "safe" messages (a simple allow list of commands) + a timeout.

This adds some complexity at the main <-> worker boundary. It removes the natural back-pressure on incoming CDP messages (which I don't think is really a problem, but there are pretty simple solutions to that if it is). The main upside is that CDP commands that mutate the environment (e.g. Target.closeTarget) are only ever executed in Runner.tick. I'm not sure if this 100% solves the blocking write issue, but it should unblock the peer send which, from the way you desribe it was at least part of the problem. Unrelated to the PR, another benefit is that we can detect a disconnected client even if the worker is stuck in a heavy JS loop.

Would really appreciate any feedback you have about this.

staylor · 2026-05-18T14:56:40Z

let me resolve conflicts and throw Claude at it - this PR is the main blocker for Puppeteer and Playwright.

) WsConnection.send used to switch the socket to blocking mode on EWOULDBLOCK, with a comment claiming this 'should virtually never happen'. Under CDP Fetch.enable + --http-proxy it is routine: the CONNECT+TLS proxy round-trip means many subresources are in flight simultaneously, lightpanda emits a flood of Fetch.requestPaused events, the kernel send buffer fills, lightpanda blocks on write, and puppeteer's matching Fetch.continueRequest replies pile up in lightpanda's TCP recv buffer (which lightpanda can't drain because it is blocked on the write). Both peers wedge until the client times out. Other contributing problems all collapse paused-intercept sites where an outer loop polls without ever draining the CDP socket, OR where disconnect-time cleanup re-enters JS through paths the runtime can no longer satisfy: * HttpClient.perform skipped the CDP socket poll entirely whenever processMessages() returned non-empty, so a steady stream of HTTP completions could starve CDP reads. * ScriptManagerBase.waitForImport spun on `client.tick(200)` and discarded the .cdp_socket return, so a script `import()` whose request was paused at the InterceptionLayer hung forever. * BrowserContext.deinit aborted pending intercepts via `transfer.abort`, which fired XHR/script error_callback chains into a half-torn-down V8 context (the inspector had already been stopped two lines above). * Headers.deinit was non-idempotent, so a value-copied Headers (the hazard RobotsLayer documents) double-freed its curl_slist on the second deinit; the symptom was an "incorrect alignment" panic inside ZigToCurlAllocator.free. * Transfer.deinit was non-idempotent, so a cascade out of error_callback (e.g. Script.errorCallback -> manager.evaluate() -> JS execution -> Frame.deinit -> abortOwner -> Transfer.kill -> Transfer.deinit) reached `arena_pool.release` twice on the same arena. Coordinated changes: * src/network/WsConnection.zig: On EWOULDBLOCK, instead of switching to a blocking write, poll for both POLLOUT and POLLIN. While waiting for write space, drain any incoming bytes into the reader buffer (without dispatching - that would re-enter send and recurse). Adds tryRead/bufferedBytes accessors. * src/browser/HttpClient.zig: - Add has_buffered_input to CDPClient. In perform(), return .cdp_socket when buffered input exists, and always do at least a non-blocking poll on the CDP socket so HTTP completions can no longer starve CDP reads. - Make Transfer.deinit idempotent by claiming ownership through `client.transfers.remove(self.id)`. Second deinits (cascades out of error_callback) early-return. - Make `Transfer.kill` public (was `fn`) so BrowserContext.deinit can use it. - Tighten RequestParams.deinit / Request.deinit to take `*` instead of `*const` so they can call into `Headers.deinit` (now mutating). * src/network/http.zig: Headers.deinit now nulls out `self.headers` after `curl_slist_free_all`, so a second deinit is a no-op. Without this guard a value-copied Headers double-frees the curl_slist (the hazard RobotsLayer's call site already documents). * src/browser/ScriptManagerBase.zig: waitForImport now drains pending CDP messages on every iteration (matching the syncRequest pattern) and re-fetches the imported_modules entry per iteration. The cached entry was a use-after-free risk because the CDP-drain step above re-enters JS, and a transitively-imported module's preloadImport() -> getOrPut() can rehash the map and invalidate the prior entry pointer. A failed CDP blocking_read surfaces as error.Failed so the caller breaks the loop rather than retrying against a dead socket. * src/cdp/CDP.zig: - Wire hasBufferedInput. - Replace read() with tryRead() in readSocket and tolerate the no_new_data case so we still process messages drained during a backpressured send. - BrowserContext.deinit aborts pending intercepts via `transfer.kill` instead of `transfer.abort`. `kill` fires shutdown_callback (or no-op for transfers without one), avoiding error_callback's re-entry into JS through XHR/script error handlers - those crash because the V8 context and inspector this BC owns have either been torn down already or are about to be. Verified end-to-end against a puppeteer + http-proxy reproducer: | URL | before | after | |----------------------------|----------------------|------------------| | example.com | OK 124ms | OK 117ms | | github.com | HANG 20s (12/82) | OK 1410ms (82/82)| | shopify.com | HANG 20s (1/4) | OK 1973ms (66/66)| | allbirds.com | HANG 20s (12/53) | OK 3944ms (372) | | allbirds wool-runners PDP | HANG 20s (12/53) | OK 6286ms (459) | (nike.com still doesn't reach `load` event but all 68 continueRequests process cleanly - the remaining stall is third-party widgets keeping the page in `loading` state, not the CDP/HTTP deadlock this PR fixes.) Lightpanda survives 18 back-to-back navigation runs across the matrix above (3 per URL) without crashing. Fixes lightpanda-io#2462.

EXPLORATION ON TOP OF THE DEADLOCK FIX. Drop this commit if we decide to ship just the inline fixes. Background: PR review suggested decoupling CDP reads from the worker's runner loop entirely, via a dedicated reader thread that parses WS frames and pushes complete CDP messages onto a mailbox the worker drains at safe points (Runner.tick, syncRequest, waitForImport). That removes the two-way backpressure deadlock at its root and replaces the per-site point fixes from the squashed parent commit (WsConnection.send POLLIN drain, has_buffered_input, CDP.tryRead, ScriptManager extra cdp.blocking_read) with a uniform model. Shape of the prototype: * New src/cdp/Mailbox.zig - Thread-safe FIFO + wake pipe. - Producer = reader thread (push, close). - Consumer = worker thread (pop, drainWake, freeMessage, deinit). - HttpClient.perform polls Mailbox.wake_read instead of the raw WS socket fd, so the runner still gets a .cdp_socket signal as soon as a message lands. * src/cdp/CDP.zig - Add `mailbox: Mailbox` and `reader_thread: ?std.Thread`. - init() prepares the mailbox and points CDPClient.socket at the wake fd, but does NOT spawn the reader yet — that would race the WS handshake on socket reads. - New `startReader()` is called from Server.serveConnection after handshake succeeds. - deinit() shuts down the WS recv side (wakes the reader's blocked poll via POLLHUP), joins the reader, then tears down everything else. - readSocket() now drains the mailbox; readerLoop() is the reader's poll-then-read-then-processMessages loop. - drainMailbox is the new CDPClient.blocking_read implementation. blocking_read_start / blocking_read_end / has_buffered_input are gone — the mailbox makes them obsolete. * src/network/WsConnection.zig - Two threads now reach the send path concurrently (reader sends close/pong on protocol errors and ping replies; worker sends CDP events and responses), so send_arena and the socket-write loop have to be serialized. Added a send_mutex; split send into sendLocked (caller holds the mutex) so sendPong / sendJSON can hold it across their alloc-from-send_arena + send sequence. - Reverted EWOULDBLOCK handling to the simpler "flip to blocking, do the write, flip back" approach. Safe again because the recv side is owned by a different thread — the peer keeps draining while we block. - Removed the POLLIN-drain machinery (waitWritableDrainingReads, drainAvailable, tryRead, bufferedBytes, ReadResult) — all of that existed to manually break the read/write deadlock the mailbox now eliminates structurally. * src/browser/HttpClient.zig - Drop has_buffered_input from CDPClient. Drop the corresponding branch in perform(). - Keep the perform change that always polls the CDP fd (with a timeout=0 fast poll if we just processed completions) — still valid; prevents HTTP completion floods from starving CDP reads. - Drop blocking_read_start / blocking_read_end from CDPClient (dead code: only the middle of the trio, blocking_read, is ever used). * src/Server.zig - Call cdp.startReader() after the handshake and after registering the connection. Tests that init CDP without a real handshake never call startReader and remain unaffected. Verified against the PR's exact reproducer (puppeteer setRequestInterception + http-proxy through a local CONNECT proxy): | URL | before fixes | with mailbox | |-------------------|------------------|--------------------| | allbirds.com (x3) | HANG 20s (12/53) | OK 2.1-3.1s (all) | All 692 in-tree tests pass. Open questions to discuss before promoting this: * Should the inline-fix commit live alongside this in the same PR, or should we drop the inline send/perform changes once the mailbox lands? Several of the parent commit's fixes (Transfer.deinit / Headers.deinit / BC.deinit kill vs abort / ScriptManager re-fetch + entry guard) are independent of the read path and stay either way. * The "allow-list-during-sync-wait" idea from review is not implemented here. With the mailbox we drain everything at every safe point, including syncRequest's tick loop, so an interception can resume even while a synchronous XHR/import is in flight. If that scope needs tightening (e.g. only allow Fetch.continueRequest / failRequest / fulfillRequest / continueWithAuth during sync wait) it would go in Mailbox.pop / CDP.readSocket. * Outbound writes still go inline (synchronous) on the worker thread. We could push outgoing frames through a writer thread too, but the inbound decoupling alone already breaks the deadlock and keeps the diff small.

staylor · 2026-05-18T15:29:38Z

@karlseguin — I took a swing at your mailbox idea as a second commit on top so it's easy to compare (and easy to drop if you'd rather we don't go this direction): 5577679a. The branch is rebased on current main, no conflicts.

Shape

New src/cdp/Mailbox.zig: thread-safe FIFO + wake pipe. Producer is the reader thread (push / close); consumer is the worker (pop / drainWake / freeMessage). HttpClient.perform polls Mailbox.wake_read instead of the bare WS socket fd, so the runner still gets a .cdp_socket signal as soon as a message lands.
CDP.zig: dedicated reader thread spawned via a new startReader() after the WS handshake (called from Server.serveConnection); spawning earlier races the handshake on socket reads. deinit() ws.shutdown()s the recv side to wake the reader's blocked poll (POLLHUP), joins, then tears down. readSocket() now drains the mailbox.
WsConnection.send: serialized by a send_mutex because both threads can reach the send path (reader for protocol close + pong replies, worker for CDP events). Reverted EWOULDBLOCK handling to the simpler "flip to blocking, write, flip back" approach — safe again because the recv side is owned by a different thread.
Deleted the read/write deadlock-breakers from the squashed parent commit (waitWritableDrainingReads, drainAvailable, tryRead, bufferedBytes, has_buffered_input) — the mailbox eliminates the deadlock structurally so all of that machinery becomes unnecessary.

Net: ~165 deletions in WsConnection.zig traded for ~120 lines of Mailbox.zig, plus the reader-thread plumbing in CDP.

Verification against the PR repro

Local CONNECT proxy on 127.0.0.1:8080, lightpanda with --http-proxy http://127.0.0.1:8080, the exact script from the PR description against https://www.allbirds.com, three back-to-back runs:

run 1   OK 2422ms   seen=378 continued=378
run 2   OK 2131ms   seen=379 continued=379
run 3   OK 3114ms   seen=393 continued=393

(For reference: HANG 20s (12/53) pre-fix; OK 3944ms (372/372) with just the squashed inline commit.)

All 692 in-tree tests still pass.

What I didn't change

The other fixes in the squashed parent commit (Transfer.deinit idempotency, Headers.deinit idempotency, BrowserContext.deinit switching to transfer.kill, ScriptManagerBase.waitForImport re-fetch + entry guard) are independent of the read path — they fix re-entrant cleanup crashes, not the deadlock — so they stay either way.
I did not implement the "allow-list during sync wait" you described. With the mailbox we already drain everything at every safe point (including syncRequest's tick loop), so paused intercepts always get a chance to resume even mid-XHR/mid-import(). If you'd rather tighten that scope — e.g. only allow Fetch.continueRequest / failRequest / fulfillRequest / continueWithAuth to dispatch while a sync wait is in flight — it would slot into Mailbox.pop or CDP.readSocket cleanly.
Outbound writes are still inline on the worker thread. We could push them through a writer thread too for full-duplex decoupling, but the inbound decoupling alone already breaks the deadlock and keeps the diff small.

How you'd like to proceed?

A few options I see:

Land this as-is (two commits on the same PR): the squashed inline-fix as belt, the mailbox as suspenders.
Squash mailbox into the fix commit and drop the now-redundant inline send/perform machinery from the message.
Split: keep the cleanup-path fixes (Transfer.deinit / Headers.deinit / BC.deinit / ScriptManager guards) on this PR; open a separate PR for the mailbox architecture so it can land on its own review timeline.

Happy to take whichever direction works for you — and to iterate on the prototype (allow-list, outbound writer thread, anything else).

karlseguin · 2026-05-19T13:07:55Z

@staylor thanks for taking a stab at this. This is something I wanted to implement myself (with Claude's help, if I'm being honest). One notable difference with my approach is that I use the main thread (via Network.zig) to poll/read the CDP sockets, so it doesn't require spinning up any new threads.

The PR is #2495. It doesn't address all of the issues in this PR, but it should address some (along with other issues we've been seeing) and hopefully serves as a better foundation. Would love if you'd be able to take a look and possibly try it.

staylor · 2026-05-19T13:12:41Z

yeah I can took a look in a lil bit - I have no preference on the approach, I just want to unblock our own adoption. if your PR fixes the IO deadlock, we're probably fine. I don't think we actually need #2399 and #2476, but I'll confirm. #2476 sprang up intermediately while I was trying to track down the deadlock, which took multiple multi-hour sessions. Claude even gave up twice.

staylor · 2026-05-19T16:14:39Z

stress-testing these scenarios:

A: PR Main/Network reads CDP socket #2495 alone vs
B: PR Main/Network reads CDP socket #2495 + Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476
C: PR Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 alone vs
D: PR Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 + Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476

staylor · 2026-05-19T17:20:26Z

@karlseguin 🥇

 In this smoke test, PR #2495 alone is the most reliable Playwright path 
against allbirds.com under --http-proxy

whatever we can do to get yours merged, let's do it!

Observations

No IO deadlocks observed in any branch. Every failure was either a V8 crash (SIGABRT) or allbirds latency (DOMContentLoaded firing ~4 s past the 20 s budget then everything completing). I never saw the "hang with no error"
signature.
PR Main/Network reads CDP socket #2495 alone is the most reliable in our smoke test. Across two proxy runs the server stayed alive end‑to‑end (24/25 page loads succeeded; the one fail was an allbirds timing edge, not a server crash).
PR Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 (rebased onto main) consistently V8‑OOMs around iter 8–9 under proxy, whether or not Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 is applied. The crash is V8 hitting its ~32 MB heap cap in RegExp.exec → oomCallback (Env.zig:563) which then nul‑derefs.
Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 did not measurably improve reliability in this test against either base. The "drain deferred dispose" path isn't what's leaking heap here — the V8 isolate is being reused across BrowserContext cycles in both Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 and
Main/Network reads CDP socket #2495 and growing past its cap on the heavy Shopify storefront.
The segfault on B & D iter 8 is log.err formatting a null/dangling string during the V8 resolveModuleCallback path — likely downstream of the OOM rather than caused by Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 itself.

Caveats

Debug build, single page (allbirds.com/products/mens-wool-runners), one fixed wait strategy, on macOS. Different pages / release builds may shift the picture.
The noop proxy is just a CONNECT tunnel. A real proxy with latency/jitter could expose different races.
"Deadlock" wasn't reproduced — the test budget (20 s goto, 45–60 s per iter) may not be long enough to expose a slow deadlock vs. a crash.

Bottom line for your PR

In this smoke test, PR #2495 alone is the most reliable Playwright path against allbirds.com under --http-proxy. Your PR #2468 (rebased on current main) consistently crashes earlier under the same load, and #2476 doesn't fix it
— the failure mode is V8 heap exhaustion in the regex compiler, which neither PR addresses.

krichprollsch · 2026-05-20T08:24:40Z

Are we agree #2495 closes this PR and issue #2462?

karlseguin · 2026-05-21T03:48:37Z

Closing. Fixed (or at least significantly improved) by #2495

staylor force-pushed the fix/2462-fetch-proxy-deadlock branch from 37b3431 to f05f451 Compare May 15, 2026 14:56

krichprollsch mentioned this pull request May 18, 2026

Fix/2462 fetch proxy deadlock #2489

Closed

staylor force-pushed the fix/2462-fetch-proxy-deadlock branch from f05f451 to 865e876 Compare May 18, 2026 15:02

karlseguin closed this May 21, 2026

github-actions Bot locked and limited conversation to collaborators May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462)#2468

Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462)#2468
staylor wants to merge 2 commits into
lightpanda-io:mainfrom
staylor:fix/2462-fetch-proxy-deadlock

staylor commented May 15, 2026

Uh oh!

krichprollsch commented May 18, 2026

Uh oh!

karlseguin commented May 18, 2026

Uh oh!

staylor commented May 18, 2026

Uh oh!

staylor commented May 18, 2026

Uh oh!

karlseguin commented May 19, 2026

Uh oh!

staylor commented May 19, 2026

Uh oh!

staylor commented May 19, 2026 •

edited

Loading

Uh oh!

staylor commented May 19, 2026

Uh oh!

krichprollsch commented May 20, 2026

Uh oh!

karlseguin commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

staylor commented May 15, 2026

Summary

A/B verification

Reproducer

Changes

src/network/WsConnection.zig -- drain reads while waiting for write space

src/browser/HttpClient.zig -- never starve CDP reads

src/network/http.zig -- make Headers.deinit idempotent

src/browser/ScriptManagerBase.zig -- drain CDP in waitForImport

src/cdp/CDP.zig -- wire backpressured-buffer drain + safer teardown

Stack of failure modes I observed during debugging

Uh oh!

krichprollsch commented May 18, 2026

Uh oh!

karlseguin commented May 18, 2026

Uh oh!

staylor commented May 18, 2026

Uh oh!

staylor commented May 18, 2026

Shape

Verification against the PR repro

What I didn't change

How you'd like to proceed?

Uh oh!

karlseguin commented May 19, 2026

Uh oh!

staylor commented May 19, 2026

Uh oh!

staylor commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

staylor commented May 19, 2026

Observations

Caveats

Bottom line for your PR

Uh oh!

krichprollsch commented May 20, 2026

Uh oh!

karlseguin commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`src/network/WsConnection.zig` -- drain reads while waiting for write space

`src/browser/HttpClient.zig` -- never starve CDP reads

`src/network/http.zig` -- make Headers.deinit idempotent

`src/browser/ScriptManagerBase.zig` -- drain CDP in `waitForImport`

`src/cdp/CDP.zig` -- wire backpressured-buffer drain + safer teardown

staylor commented May 19, 2026 •

edited

Loading