Skip to content

Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462)#2468

Closed
staylor wants to merge 2 commits into
lightpanda-io:mainfrom
staylor:fix/2462-fetch-proxy-deadlock
Closed

Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462)#2468
staylor wants to merge 2 commits into
lightpanda-io:mainfrom
staylor:fix/2462-fetch-proxy-deadlock

Conversation

@staylor
Copy link
Copy Markdown
Contributor

@staylor staylor commented May 15, 2026

Closes #2462.

Summary

Five coordinated fixes for the Fetch.enable + --http-proxy deadlock. The original symptom (CONNECT/TLS proxy round-trips inflate concurrency, lightpanda's WS send fills the kernel buffer, lightpanda blocks on write while puppeteer's matching Fetch.continueRequest replies pile up unread) is the headline scenario, but along the way an empirical reproducer surfaced four more places that produce the same "lightpanda stops draining the CDP socket and/or crashes mid-cleanup" symptom. All five share the root pattern: a synchronous wait loop or shutdown path that re-enters JS without giving CDP a chance to make progress.

A/B verification

Repro: puppeteer-core + setRequestInterception(true) + req.continue() against a localhost forward proxy, lightpanda started with --http-proxy http://127.0.0.1:8080.

URL before after
https://example.com OK 124 ms OK 117 ms
https://github.com HANG 20 s (12/82 continues processed) OK 1410 ms (82/82)
https://www.shopify.com HANG 20 s (1/4) OK 1973 ms (66/66)
https://www.allbirds.com HANG 20 s (12/53) OK 3944 ms (372/372)
https://www.allbirds.com/products/womens-wool-runners-dapple-grey HANG 20 s (12/53) OK 6286 ms (459/459)

Lightpanda survives 18 back-to-back navigations across the matrix above (3 per URL) without crashing on the patched binary. Pre-patch, the cleanup path triggered SIGSEGV (ZigToCurlAllocator.free: incorrect alignment) inside BrowserContext.deinit whenever the page actually completed before puppeteer disconnected.

(https://www.nike.com still doesn't fire the load event in either version. After this fix all 68 Fetch.continueRequest messages process cleanly; the remaining stall is third-party widgets keeping the page in loading state, not a CDP/HTTP deadlock. Out of scope.)

Reproducer

The puppeteer-core script that drove all measurements:

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9223',
  defaultViewport: null,
});
const page = await browser.newPage();
await page.setRequestInterception(true);
let seen = 0, continued = 0;
page.on('request', (req) => {
  seen++;
  req.continue().then(() => { continued++; }).catch(() => {});
});
const t0 = Date.now();
try {
  await page.goto('https://www.allbirds.com', { waitUntil: 'load', timeout: 20000 });
  console.log('OK ' + (Date.now() - t0) + 'ms');
} catch (e) {
  console.log('HANG: ' + e.message);
}
await new Promise(r => setTimeout(r, 1000));
console.log('seen=' + seen + ' continued=' + continued);
await browser.disconnect();

Lightpanda startup:

lightpanda serve --host 127.0.0.1 --port 9223 \
  --http-proxy http://127.0.0.1:8080

Any HTTP CONNECT-capable forward proxy on 127.0.0.1:8080 reproduces (vanilla Node http.createServer with a connect handler is enough).

Changes

src/network/WsConnection.zig -- drain reads while waiting for write space

WsConnection.send previously switched the socket to blocking mode on EWOULDBLOCK. The header comment said this "should virtually never happen" -- under Fetch.enable + --http-proxy it is routine, and the blocking write deadlocks because the peer is also blocked trying to send to us.

New behavior: on EWOULDBLOCK, poll for POLLOUT and POLLIN, drain any incoming bytes into the reader buffer (without dispatching -- that would re-enter send and recurse). Buffered bytes are picked up by the next CDP.readSocket. Adds tryRead (variant of read that distinguishes "no new bytes" from "peer closed") and bufferedBytes accessor.

src/browser/HttpClient.zig -- never starve CDP reads

perform() returned .normal early whenever processMessages() had work, so a steady stream of HTTP completions never gave the CDP socket a chance to be polled. Now: always do at least a non-blocking curl_multi_poll of the CDP socket, and return .cdp_socket immediately when the WS reader has bytes that were rescued during a backpressured send (via the new has_buffered_input callback on CDPClient).

Also makes Transfer.deinit idempotent. The error-callback path can cascade into re-entrant deinits (e.g. Script.errorCallback -> manager.evaluate() -> JS execution -> Frame.deinit -> abortOwner -> Transfer.kill -> Transfer.deinit); the second deinit then double-frees the Transfer's arena (ArenaPool: double-free detected panic). The fix uses client.transfers.remove(self.id) as a one-shot ownership claim -- if it returns false, this transfer was already torn down by the cascade and we early-return.

Transfer.kill is promoted from fn to pub fn so BrowserContext.deinit can use it (see CDP.zig change below). RequestParams.deinit / Request.deinit now take * instead of *const so they can call into the now-mutating Headers.deinit.

src/network/http.zig -- make Headers.deinit idempotent

Headers.deinit did not null self.headers after curl_slist_free_all. Combined with the value-copy hazard RobotsLayer already documents (Headers wraps a raw *curl_slist, so a value copy shares the pointer), a second deinit on a copied Headers double-freed the curl slist. The symptom was an "incorrect alignment" panic inside ZigToCurlAllocator.free because the slist nodes were already returned to the pool. Nulling self.headers after the free makes the second call a no-op regardless of how the value-copy hazard is triggered.

src/browser/ScriptManagerBase.zig -- drain CDP in waitForImport

waitForImport spun on client.tick(200) and discarded the .cdp_socket return. A script import() whose request was paused at the InterceptionLayer hung forever: the matching Fetch.continueRequest reply never reached CDP.processMessage, the request never resumed, and this loop spun until the client's navigation timeout fired. Mirrors the syncRequest pattern that already does this correctly.

The entry pointer is now re-fetched on every loop iteration. The CDP-drain step above re-enters JS, which can call back into preloadImport for a transitively-imported module; preloadImport calls imported_modules.getOrPut, which may rehash the map and invalidate every existing entry pointer. A cached entry was a use-after-free on the next entry.value_ptr.state access.

src/cdp/CDP.zig -- wire backpressured-buffer drain + safer teardown

readSocket now uses tryRead (the new WsConnection API) and tolerates the no_new_data case so messages drained during a backpressured send still get processed.

BrowserContext.deinit aborts pending intercepts via transfer.kill instead of transfer.abort. kill fires shutdown_callback (or no-op for transfers without one); abort fires error_callback, which re-enters JS via XHR/script error handlers. The handlers crash because the inspector and V8 context this BrowserContext owns have already been torn down (inspector.stopSession() runs two lines above the abort loop). The original code already aborted "before closing the session/page since some of these might callback into the page/scriptmanager" -- kill's shutdown path is the right primitive for that.

Stack of failure modes I observed during debugging

For anyone hitting this region in the future, the symptoms shifted as I peeled layers:

  1. Fetch.requestPaused events fire in lightpanda but Fetch.continueRequest replies never arrive. (Backpressure deadlock -- fixed by the WsConnection + perform changes.)
  2. After unblocking the deadlock, the page navigates successfully, then puppeteer disconnects, then BrowserContext.deinit SIGSEGVs in curl_slist_free_all. (Headers double-free -- fixed by Headers.deinit idempotency.)
  3. After fixing that, the same teardown path SIGSEGVs differently: ArenaPool: double-free detected in Transfer.deinit. (Cascade out of error_callback re-deinits the same transfer -- fixed by Transfer.deinit ownership claim.)
  4. After fixing that, the teardown path SIGSEGVs again inside _v8_inspector__Session__wrapObject. (XHR/script error_callback re-enters JS through a torn-down inspector -- fixed by kill instead of abort in BrowserContext.deinit.)
  5. After fixing that, the matrix runs clean.

@staylor staylor force-pushed the fix/2462-fetch-proxy-deadlock branch from 37b3431 to f05f451 Compare May 15, 2026 14:56
@krichprollsch
Copy link
Copy Markdown
Member

I rebased + added a little change in #2489

@karlseguin
Copy link
Copy Markdown
Collaborator

I've been talking about a possible solution for some of these issues.

Currently, every CDP connection maps to 1 Browser and runs in a thread. I think having the CDP socket polled and read from the main thread (in Network.zig) and communicate messages to the worker via an Erlang OTP-style mailbox would be more robust.

In normal operations, the mailbox would be drained in Runner.tick, i.e. a safe point, never inside of a JS callback. For Request Interception, which needs to block current execution, it would scan the mailbox and only process "safe" messages (a simple allow list of commands) + a timeout.

This adds some complexity at the main <-> worker boundary. It removes the natural back-pressure on incoming CDP messages (which I don't think is really a problem, but there are pretty simple solutions to that if it is). The main upside is that CDP commands that mutate the environment (e.g. Target.closeTarget) are only ever executed in Runner.tick. I'm not sure if this 100% solves the blocking write issue, but it should unblock the peer send which, from the way you desribe it was at least part of the problem. Unrelated to the PR, another benefit is that we can detect a disconnected client even if the worker is stuck in a heavy JS loop.

Would really appreciate any feedback you have about this.

@staylor
Copy link
Copy Markdown
Contributor Author

staylor commented May 18, 2026

let me resolve conflicts and throw Claude at it - this PR is the main blocker for Puppeteer and Playwright.

)

WsConnection.send used to switch the socket to blocking mode on
EWOULDBLOCK, with a comment claiming this 'should virtually never
happen'. Under CDP Fetch.enable + --http-proxy it is routine: the
CONNECT+TLS proxy round-trip means many subresources are in flight
simultaneously, lightpanda emits a flood of Fetch.requestPaused events,
the kernel send buffer fills, lightpanda blocks on write, and
puppeteer's matching Fetch.continueRequest replies pile up in
lightpanda's TCP recv buffer (which lightpanda can't drain because it
is blocked on the write). Both peers wedge until the client times out.

Other contributing problems all collapse paused-intercept sites where
an outer loop polls without ever draining the CDP socket, OR where
disconnect-time cleanup re-enters JS through paths the runtime can no
longer satisfy:

* HttpClient.perform skipped the CDP socket poll entirely whenever
  processMessages() returned non-empty, so a steady stream of HTTP
  completions could starve CDP reads.
* ScriptManagerBase.waitForImport spun on `client.tick(200)` and
  discarded the .cdp_socket return, so a script `import()` whose
  request was paused at the InterceptionLayer hung forever.
* BrowserContext.deinit aborted pending intercepts via `transfer.abort`,
  which fired XHR/script error_callback chains into a half-torn-down
  V8 context (the inspector had already been stopped two lines above).
* Headers.deinit was non-idempotent, so a value-copied Headers (the
  hazard RobotsLayer documents) double-freed its curl_slist on the
  second deinit; the symptom was an "incorrect alignment" panic
  inside ZigToCurlAllocator.free.
* Transfer.deinit was non-idempotent, so a cascade out of
  error_callback (e.g. Script.errorCallback -> manager.evaluate() ->
  JS execution -> Frame.deinit -> abortOwner -> Transfer.kill ->
  Transfer.deinit) reached `arena_pool.release` twice on the same
  arena.

Coordinated changes:

* src/network/WsConnection.zig: On EWOULDBLOCK, instead of switching
  to a blocking write, poll for both POLLOUT and POLLIN. While waiting
  for write space, drain any incoming bytes into the reader buffer
  (without dispatching - that would re-enter send and recurse). Adds
  tryRead/bufferedBytes accessors.

* src/browser/HttpClient.zig:
  - Add has_buffered_input to CDPClient. In perform(), return
    .cdp_socket when buffered input exists, and always do at least a
    non-blocking poll on the CDP socket so HTTP completions can no
    longer starve CDP reads.
  - Make Transfer.deinit idempotent by claiming ownership through
    `client.transfers.remove(self.id)`. Second deinits (cascades out
    of error_callback) early-return.
  - Make `Transfer.kill` public (was `fn`) so BrowserContext.deinit
    can use it.
  - Tighten RequestParams.deinit / Request.deinit to take `*` instead
    of `*const` so they can call into `Headers.deinit` (now mutating).

* src/network/http.zig: Headers.deinit now nulls out `self.headers`
  after `curl_slist_free_all`, so a second deinit is a no-op. Without
  this guard a value-copied Headers double-frees the curl_slist (the
  hazard RobotsLayer's call site already documents).

* src/browser/ScriptManagerBase.zig: waitForImport now drains pending
  CDP messages on every iteration (matching the syncRequest pattern)
  and re-fetches the imported_modules entry per iteration. The cached
  entry was a use-after-free risk because the CDP-drain step above
  re-enters JS, and a transitively-imported module's preloadImport()
  -> getOrPut() can rehash the map and invalidate the prior entry
  pointer. A failed CDP blocking_read surfaces as error.Failed so the
  caller breaks the loop rather than retrying against a dead socket.

* src/cdp/CDP.zig:
  - Wire hasBufferedInput.
  - Replace read() with tryRead() in readSocket and tolerate the
    no_new_data case so we still process messages drained during a
    backpressured send.
  - BrowserContext.deinit aborts pending intercepts via
    `transfer.kill` instead of `transfer.abort`. `kill` fires
    shutdown_callback (or no-op for transfers without one), avoiding
    error_callback's re-entry into JS through XHR/script error
    handlers - those crash because the V8 context and inspector this
    BC owns have either been torn down already or are about to be.

Verified end-to-end against a puppeteer + http-proxy reproducer:

| URL                        | before               | after            |
|----------------------------|----------------------|------------------|
| example.com                | OK 124ms             | OK 117ms         |
| github.com                 | HANG 20s (12/82)     | OK 1410ms (82/82)|
| shopify.com                | HANG 20s (1/4)       | OK 1973ms (66/66)|
| allbirds.com               | HANG 20s (12/53)     | OK 3944ms (372)  |
| allbirds wool-runners PDP  | HANG 20s (12/53)     | OK 6286ms (459)  |

(nike.com still doesn't reach `load` event but all 68 continueRequests
process cleanly - the remaining stall is third-party widgets keeping
the page in `loading` state, not the CDP/HTTP deadlock this PR fixes.)

Lightpanda survives 18 back-to-back navigation runs across the matrix
above (3 per URL) without crashing.

Fixes lightpanda-io#2462.
@staylor staylor force-pushed the fix/2462-fetch-proxy-deadlock branch from f05f451 to 865e876 Compare May 18, 2026 15:02
EXPLORATION ON TOP OF THE DEADLOCK FIX. Drop this commit if we decide
to ship just the inline fixes.

Background: PR review suggested decoupling CDP reads from the worker's
runner loop entirely, via a dedicated reader thread that parses WS
frames and pushes complete CDP messages onto a mailbox the worker
drains at safe points (Runner.tick, syncRequest, waitForImport). That
removes the two-way backpressure deadlock at its root and replaces the
per-site point fixes from the squashed parent commit (WsConnection.send
POLLIN drain, has_buffered_input, CDP.tryRead, ScriptManager extra
cdp.blocking_read) with a uniform model.

Shape of the prototype:

* New src/cdp/Mailbox.zig
  - Thread-safe FIFO + wake pipe.
  - Producer = reader thread (push, close).
  - Consumer = worker thread (pop, drainWake, freeMessage, deinit).
  - HttpClient.perform polls Mailbox.wake_read instead of the raw WS
    socket fd, so the runner still gets a .cdp_socket signal as soon
    as a message lands.

* src/cdp/CDP.zig
  - Add `mailbox: Mailbox` and `reader_thread: ?std.Thread`.
  - init() prepares the mailbox and points CDPClient.socket at the
    wake fd, but does NOT spawn the reader yet — that would race the
    WS handshake on socket reads.
  - New `startReader()` is called from Server.serveConnection after
    handshake succeeds.
  - deinit() shuts down the WS recv side (wakes the reader's blocked
    poll via POLLHUP), joins the reader, then tears down everything
    else.
  - readSocket() now drains the mailbox; readerLoop() is the reader's
    poll-then-read-then-processMessages loop.
  - drainMailbox is the new CDPClient.blocking_read implementation.
    blocking_read_start / blocking_read_end / has_buffered_input are
    gone — the mailbox makes them obsolete.

* src/network/WsConnection.zig
  - Two threads now reach the send path concurrently (reader sends
    close/pong on protocol errors and ping replies; worker sends CDP
    events and responses), so send_arena and the socket-write loop
    have to be serialized. Added a send_mutex; split send into
    sendLocked (caller holds the mutex) so sendPong / sendJSON can
    hold it across their alloc-from-send_arena + send sequence.
  - Reverted EWOULDBLOCK handling to the simpler "flip to blocking,
    do the write, flip back" approach. Safe again because the recv
    side is owned by a different thread — the peer keeps draining
    while we block.
  - Removed the POLLIN-drain machinery (waitWritableDrainingReads,
    drainAvailable, tryRead, bufferedBytes, ReadResult) — all of that
    existed to manually break the read/write deadlock the mailbox now
    eliminates structurally.

* src/browser/HttpClient.zig
  - Drop has_buffered_input from CDPClient. Drop the corresponding
    branch in perform().
  - Keep the perform change that always polls the CDP fd (with a
    timeout=0 fast poll if we just processed completions) — still
    valid; prevents HTTP completion floods from starving CDP reads.
  - Drop blocking_read_start / blocking_read_end from CDPClient (dead
    code: only the middle of the trio, blocking_read, is ever used).

* src/Server.zig
  - Call cdp.startReader() after the handshake and after registering
    the connection. Tests that init CDP without a real handshake
    never call startReader and remain unaffected.

Verified against the PR's exact reproducer (puppeteer
setRequestInterception + http-proxy through a local CONNECT proxy):

| URL               | before fixes     | with mailbox       |
|-------------------|------------------|--------------------|
| allbirds.com (x3) | HANG 20s (12/53) | OK 2.1-3.1s (all)  |

All 692 in-tree tests pass.

Open questions to discuss before promoting this:

* Should the inline-fix commit live alongside this in the same PR, or
  should we drop the inline send/perform changes once the mailbox
  lands? Several of the parent commit's fixes (Transfer.deinit /
  Headers.deinit / BC.deinit kill vs abort / ScriptManager re-fetch +
  entry guard) are independent of the read path and stay either way.
* The "allow-list-during-sync-wait" idea from review is not
  implemented here. With the mailbox we drain everything at every
  safe point, including syncRequest's tick loop, so an interception
  can resume even while a synchronous XHR/import is in flight. If
  that scope needs tightening (e.g. only allow Fetch.continueRequest
  / failRequest / fulfillRequest / continueWithAuth during sync
  wait) it would go in Mailbox.pop / CDP.readSocket.
* Outbound writes still go inline (synchronous) on the worker thread.
  We could push outgoing frames through a writer thread too, but the
  inbound decoupling alone already breaks the deadlock and keeps the
  diff small.
@staylor
Copy link
Copy Markdown
Contributor Author

staylor commented May 18, 2026

@karlseguin — I took a swing at your mailbox idea as a second commit on top so it's easy to compare (and easy to drop if you'd rather we don't go this direction): 5577679a. The branch is rebased on current main, no conflicts.

Shape

  • New src/cdp/Mailbox.zig: thread-safe FIFO + wake pipe. Producer is the reader thread (push / close); consumer is the worker (pop / drainWake / freeMessage). HttpClient.perform polls Mailbox.wake_read instead of the bare WS socket fd, so the runner still gets a .cdp_socket signal as soon as a message lands.
  • CDP.zig: dedicated reader thread spawned via a new startReader() after the WS handshake (called from Server.serveConnection); spawning earlier races the handshake on socket reads. deinit() ws.shutdown()s the recv side to wake the reader's blocked poll (POLLHUP), joins, then tears down. readSocket() now drains the mailbox.
  • WsConnection.send: serialized by a send_mutex because both threads can reach the send path (reader for protocol close + pong replies, worker for CDP events). Reverted EWOULDBLOCK handling to the simpler "flip to blocking, write, flip back" approach — safe again because the recv side is owned by a different thread.
  • Deleted the read/write deadlock-breakers from the squashed parent commit (waitWritableDrainingReads, drainAvailable, tryRead, bufferedBytes, has_buffered_input) — the mailbox eliminates the deadlock structurally so all of that machinery becomes unnecessary.

Net: ~165 deletions in WsConnection.zig traded for ~120 lines of Mailbox.zig, plus the reader-thread plumbing in CDP.

Verification against the PR repro

Local CONNECT proxy on 127.0.0.1:8080, lightpanda with --http-proxy http://127.0.0.1:8080, the exact script from the PR description against https://www.allbirds.com, three back-to-back runs:

run 1   OK 2422ms   seen=378 continued=378
run 2   OK 2131ms   seen=379 continued=379
run 3   OK 3114ms   seen=393 continued=393

(For reference: HANG 20s (12/53) pre-fix; OK 3944ms (372/372) with just the squashed inline commit.)

All 692 in-tree tests still pass.

What I didn't change

  • The other fixes in the squashed parent commit (Transfer.deinit idempotency, Headers.deinit idempotency, BrowserContext.deinit switching to transfer.kill, ScriptManagerBase.waitForImport re-fetch + entry guard) are independent of the read path — they fix re-entrant cleanup crashes, not the deadlock — so they stay either way.
  • I did not implement the "allow-list during sync wait" you described. With the mailbox we already drain everything at every safe point (including syncRequest's tick loop), so paused intercepts always get a chance to resume even mid-XHR/mid-import(). If you'd rather tighten that scope — e.g. only allow Fetch.continueRequest / failRequest / fulfillRequest / continueWithAuth to dispatch while a sync wait is in flight — it would slot into Mailbox.pop or CDP.readSocket cleanly.
  • Outbound writes are still inline on the worker thread. We could push them through a writer thread too for full-duplex decoupling, but the inbound decoupling alone already breaks the deadlock and keeps the diff small.

How you'd like to proceed?

A few options I see:

  1. Land this as-is (two commits on the same PR): the squashed inline-fix as belt, the mailbox as suspenders.
  2. Squash mailbox into the fix commit and drop the now-redundant inline send/perform machinery from the message.
  3. Split: keep the cleanup-path fixes (Transfer.deinit / Headers.deinit / BC.deinit / ScriptManager guards) on this PR; open a separate PR for the mailbox architecture so it can land on its own review timeline.

Happy to take whichever direction works for you — and to iterate on the prototype (allow-list, outbound writer thread, anything else).

@karlseguin
Copy link
Copy Markdown
Collaborator

@staylor thanks for taking a stab at this. This is something I wanted to implement myself (with Claude's help, if I'm being honest). One notable difference with my approach is that I use the main thread (via Network.zig) to poll/read the CDP sockets, so it doesn't require spinning up any new threads.

The PR is #2495. It doesn't address all of the issues in this PR, but it should address some (along with other issues we've been seeing) and hopefully serves as a better foundation. Would love if you'd be able to take a look and possibly try it.

@staylor
Copy link
Copy Markdown
Contributor Author

staylor commented May 19, 2026

yeah I can took a look in a lil bit - I have no preference on the approach, I just want to unblock our own adoption. if your PR fixes the IO deadlock, we're probably fine. I don't think we actually need #2399 and #2476, but I'll confirm. #2476 sprang up intermediately while I was trying to track down the deadlock, which took multiple multi-hour sessions. Claude even gave up twice.

@staylor
Copy link
Copy Markdown
Contributor Author

staylor commented May 19, 2026

@karlseguin 🥇

 In this smoke test, PR #2495 alone is the most reliable Playwright path 
against allbirds.com under --http-proxy

whatever we can do to get yours merged, let's do it!

Observations

  1. No IO deadlocks observed in any branch. Every failure was either a V8 crash (SIGABRT) or allbirds latency (DOMContentLoaded firing ~4 s past the 20 s budget then everything completing). I never saw the "hang with no error"
    signature.
  2. PR Main/Network reads CDP socket #2495 alone is the most reliable in our smoke test. Across two proxy runs the server stayed alive end‑to‑end (24/25 page loads succeeded; the one fail was an allbirds timing edge, not a server crash).
  3. PR Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 (rebased onto main) consistently V8‑OOMs around iter 8–9 under proxy, whether or not Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 is applied. The crash is V8 hitting its ~32 MB heap cap in RegExp.exec → oomCallback (Env.zig:563) which then nul‑derefs.
  4. Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 did not measurably improve reliability in this test against either base. The "drain deferred dispose" path isn't what's leaking heap here — the V8 isolate is being reused across BrowserContext cycles in both Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468 and
    Main/Network reads CDP socket #2495 and growing past its cap on the heavy Shopify storefront.
  5. The segfault on B & D iter 8 is log.err formatting a null/dangling string during the V8 resolveModuleCallback path — likely downstream of the OOM rather than caused by Drain deferred disposeBrowserContext on next createBrowserContext (#2472 follow-up) #2476 itself.

Caveats

  • Debug build, single page (allbirds.com/products/mens-wool-runners), one fixed wait strategy, on macOS. Different pages / release builds may shift the picture.
  • The noop proxy is just a CONNECT tunnel. A real proxy with latency/jitter could expose different races.
  • "Deadlock" wasn't reproduced — the test budget (20 s goto, 45–60 s per iter) may not be long enough to expose a slow deadlock vs. a crash.

Bottom line for your PR

In this smoke test, PR #2495 alone is the most reliable Playwright path against allbirds.com under --http-proxy. Your PR #2468 (rebased on current main) consistently crashes earlier under the same load, and #2476 doesn't fix it
— the failure mode is V8 heap exhaustion in the regex compiler, which neither PR addresses.

@krichprollsch
Copy link
Copy Markdown
Member

Are we agree #2495 closes this PR and issue #2462?

@karlseguin
Copy link
Copy Markdown
Collaborator

Closing. Fixed (or at least significantly improved) by #2495

@karlseguin karlseguin closed this May 21, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators May 21, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fetch.enable + --http-proxy deadlocks navigation (subresource events stop firing)

3 participants