
Fetch.enable + --http-proxy deadlocks navigation (subresource events stop firing) #2462

@staylor


Summary

Calling Fetch.enable (CDP) on a Lightpanda instance started with --http-proxy deadlocks every subsequent navigation. Page.navigate returns successfully, but the document never finishes loading -- subresource Network.requestWillBeSent events stop firing, and the navigation hangs until the client-side timeout.

This is the CDP-layer root cause behind a behavior that puppeteer-core's page.setRequestInterception(true) exposes -- puppeteer enables the Fetch domain under the hood when interception is requested.

Reproduction

CDP-level (no puppeteer):

  1. Start lightpanda: lightpanda serve --http-proxy http://127.0.0.1:8080 --port 9222 --host 127.0.0.1
    (any HTTP CONNECT proxy on 127.0.0.1:8080 reproduces -- a local mitmproxy / squid is fine; it doesn't have to actually reach the upstream)
  2. Connect a CDP client, attach to a target with flatten: true.
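  3. Send Fetch.enable with patterns: [{ urlPattern: "*" }] (or even Fetch.enable {} with no patterns), and answer every Fetch.requestPaused event with Fetch.continueRequest { requestId } so no request is left paused on the client side.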
  4. Send Page.navigate { url: "https://example.com" }.
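For clients driving CDP directly, the steps above map roughly onto the raw messages below. This is a sketch under stated assumptions: attachCommand and sessionCommands are hypothetical helper names, targetId is assumed to come from an earlier Target.createTarget or Target.getTargets response, and the Page.enable / Network.enable calls are an addition (CDP only emits Page and Network events after those domains are enabled).

```javascript
// Hypothetical helpers that mirror the reproduction steps as raw CDP messages.

// Step 2: browser-level command, so no sessionId. targetId is assumed to come
// from an earlier Target.createTarget or Target.getTargets response.
function attachCommand(id, targetId) {
  return { id, method: "Target.attachToTarget", params: { targetId, flatten: true } };
}

// Steps 3-4: page-level commands sent on the flattened session; sessionId is
// read from the Target.attachToTarget response.
function sessionCommands(firstId, sessionId, url, patterns = [{ urlPattern: "*" }]) {
  const steps = [
    // Addition (assumption): CDP only emits Page.* and Network.* events once
    // these domains are enabled.
    ["Page.enable", {}],
    ["Network.enable", {}],
    // Step 3. Sending Fetch.enable {} without patterns pauses every request too.
    ["Fetch.enable", { patterns }],
    // Step 4.
    ["Page.navigate", { url }],
  ];
  return steps.map(([method, params], i) => ({ id: firstId + i, sessionId, method, params }));
}
```

Each object would be sent as ws.send(JSON.stringify(msg)) over the browser WebSocket (ws://127.0.0.1:9222 in this setup); the choice of WebSocket client is left open.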

Observed: Page.navigate returns a frame id; no Page.frameStoppedLoading, no Page.loadEventFired, no further Network.requestWillBeSent after the initial document request. The navigation hangs until the client-side timeout.

Without --http-proxy: Fetch.enable works correctly -- subresource interception fires, navigation completes normally.

Without Fetch.enable: --http-proxy works correctly -- pages load through the proxy in the expected time.

The deadlock requires both at once.

puppeteer-core surface

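// Same shape, easier to reproduce locally (ES module, so top-level await works)
import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222' });
const page = await browser.newPage();
await page.setRequestInterception(true);  // calls Fetch.enable internally
// With interception on, every request stalls until it is continued, aborted, or responded to.
page.on('request', (req) => req.continue());
await page.goto('https://example.com');   // hangs until timeout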

I measured this against https://www.allbirds.com/products/womens-wool-runners-dapple-grey through a residential-proxy chain: loading the page through Lightpanda (with --http-proxy set) without request interception completes in ~4.4 s; the same page with setRequestInterception(true) enabled hits the 30 s navigation timeout and fails.
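To make the ~4.4 s vs 30 s comparison repeatable, a navigation can be raced against an explicit deadline. The sketch below is a hypothetical helper (timeNavigation is not part of puppeteer); it reports the outcome and elapsed time instead of throwing, so the proxy-only and proxy-plus-interception runs can be logged side by side.

```javascript
// Hypothetical timing helper: races a navigation against a deadline and
// reports { ok, reason?, ms } instead of throwing.
async function timeNavigation(navigate, timeoutMs) {
  const start = Date.now();
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ ok: false, reason: "timeout" }), timeoutMs);
  });
  try {
    const result = await Promise.race([
      // Promise.resolve().then(...) also catches a synchronous throw in navigate().
      Promise.resolve()
        .then(navigate)
        .then(() => ({ ok: true }), (err) => ({ ok: false, reason: String(err) })),
      deadline,
    ]);
    return { ...result, ms: Date.now() - start };
  } finally {
    // Clear the timer so a fast navigation does not keep the process alive.
    clearTimeout(timer);
  }
}
```

With puppeteer this would be called as `await timeNavigation(() => page.goto(url, { timeout: 0 }), 30_000)`; `timeout: 0` disables puppeteer's own navigation timeout so only the helper's deadline applies.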

Why this matters

Fetch.enable is the standard way to filter outbound resource requests at the browser level (block image / font / media / tracking-domain hosts before they fetch). For automation use cases this is critical:

  • Scraping at scale: blocking unnecessary subresources cuts page load time 2-5x and saves substantial proxy bandwidth.
  • The --http-proxy flag is also critical -- routing through residential proxies / proxy classifiers is a common production setup.
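The kind of filter described above can be sketched as a pure function over Fetch.requestPaused events. The blocked resource types and tracking hosts are example values and decidePaused is a hypothetical name; the CDP commands it returns (Fetch.failRequest with errorReason "BlockedByClient", Fetch.continueRequest) are standard Fetch domain methods.

```javascript
// Example values only: resource types and tracking hosts to block.
const BLOCKED_TYPES = new Set(["Image", "Font", "Media"]);
const BLOCKED_HOSTS = ["doubleclick.net", "google-analytics.com"];

// Hypothetical filter: decides how to answer one Fetch.requestPaused event.
function decidePaused(event) {
  const { requestId, resourceType, request } = event;
  const host = new URL(request.url).hostname;
  // Match the host itself or any subdomain, but not look-alike hosts.
  const isTracking = BLOCKED_HOSTS.some((h) => host === h || host.endsWith("." + h));
  if (BLOCKED_TYPES.has(resourceType) || isTracking) {
    // Fail the request before it reaches the network (or the proxy).
    return { method: "Fetch.failRequest", params: { requestId, errorReason: "BlockedByClient" } };
  }
  // Everything else must still be resumed explicitly, or it stays paused.
  return { method: "Fetch.continueRequest", params: { requestId } };
}
```

An alternative design is to narrow interception at Fetch.enable time with patterns that set resourceType (for example `{ urlPattern: "*", resourceType: "Image" }`), so only matching requests pause at all; the function above keeps a single catch-all pattern and decides per request.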

When the two can't coexist, callers have to choose: skip the bandwidth filter (heavier pages, more third-party JS executes -- including widgets like Yotpo that inject hundreds of <style> elements per interaction), or skip the proxy (loses the routing model entirely). Neither is acceptable for production scrape paths.

Suspected mechanism

I haven't traced this in the Lightpanda source, but the symptom (subresource events stop firing, document never completes) suggests that something in the CDP Fetch.enable path does one of the following:

  1. Doesn't propagate the upstream proxy configuration to the intercepted-and-resumed request, so resumed requests stall at connect time, or
  2. Holds a lock / pending-completion handle that the proxy round-trip doesn't release, or
  3. Expects the Fetch.requestPaused / Fetch.continueRequest round-trip to complete on the direct request path, and breaks when the request goes through the proxy.
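One way to instrument these hypotheses from the client side is to record each Fetch.requestPaused event until it is answered, then check which requests are still outstanding when the navigation hangs. PausedRequestTracker is a hypothetical helper, not from any library. It correlates Fetch and Network events through the optional networkId field of Fetch.requestPaused; falling back to the Fetch requestId when networkId is absent is an assumption about how Lightpanda assigns ids.

```javascript
// Hypothetical instrumentation: splits a stall into "paused but never answered
// by the client" vs. "answered, but the browser never finished loading it".
class PausedRequestTracker {
  constructor() {
    this.unanswered = new Map(); // Fetch requestId -> { url, networkId }
    this.inFlight = new Map(); // network request id -> url
  }

  // Call for every Fetch.requestPaused event.
  onPaused({ requestId, networkId, request }) {
    this.unanswered.set(requestId, { url: request.url, networkId });
  }

  // Call after sending Fetch.continueRequest / failRequest / fulfillRequest.
  onAnswered(requestId) {
    const entry = this.unanswered.get(requestId);
    if (!entry) return;
    this.unanswered.delete(requestId);
    // networkId is optional in CDP; using the Fetch id as a fallback is an assumption.
    this.inFlight.set(entry.networkId ?? requestId, entry.url);
  }

  // Call for Network.loadingFinished and Network.loadingFailed.
  onNetworkDone(networkRequestId) {
    this.inFlight.delete(networkRequestId);
  }

  report() {
    return {
      unanswered: [...this.unanswered.values()].map((e) => e.url),
      answeredButStalled: [...this.inFlight.values()],
    };
  }
}
```

If `unanswered` is non-empty after the timeout, the client never resumed those requests; if the stuck URLs show up under `answeredButStalled` instead, the requests were continued and the stall is on the browser or proxy side, which would point toward hypotheses 1 or 2.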

Happy to instrument and bisect if a maintainer can point me at the right entry point in src/cdp/domains/fetch* (or wherever the Fetch domain lives).

Environment

  • Lightpanda: 1.0.0-nightly.6240+37391687
  • OS: macOS 14, arm64
  • Test driver: puppeteer-core via CDP at ws://127.0.0.1:9222

Related

  • Affects scraper setups that need both per-request filtering and proxy routing.
  • Documented as a known constraint in our codebase (we skip resource filtering on Lightpanda only, with a multi-paragraph comment pointing at this exact deadlock) -- happy to share the workaround code if it's useful as a reference for what surface client code uses.
