
V8 fatal AllowHeapAllocation::IsAllowed() during GC weak callback under CDP load #2407

@staylor

Summary

Under sustained CDP load (back-to-back puppeteer-core connections against an iframe-heavy page that runs Web Workers), debug builds of Lightpanda hit a V8 fatal during garbage collection:

# Fatal error in ../../../src/heap/heap-allocator-inl.h, line 79
# Debug check failed: AllowHeapAllocation::IsAllowed().

The process aborts with SIGTRAP. The failing check is a debug-only V8 invariant ("no heap allocation during weak callbacks"); release builds will not abort, but the underlying misuse of the V8 inspector API is the same.

Trigger

The fatal fires from inside V8's first-pass weak-callback phase during an incremental GC, when the V8 inspector's PromiseHandlerTracker discards a stale Runtime.evaluate promise and tries to send a failure response back over the inspector channel. Building that response allocates a JS string (v8::String::NewFromOneByte in the channel's sendResponse), and heap allocation is forbidden during weak callbacks:

v8::internal::HeapAllocator::AllocateRaw
  v8::String::NewFromOneByte
    v8_inspector__Channel__IMPL::sendResponse
      v8_crdtp::DomainDispatcher::sendResponse
        v8_inspector::EvaluateCallback::sendFailure
          v8_inspector::PromiseHandlerTracker::sendFailure
            v8_inspector::PromiseHandlerTracker::discard
              v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks  <-- inside GC
                v8::internal::Heap::PerformGarbageCollection
                  ...
                    v8::platform::DefaultPlatform::PumpMessageLoop
                      browser.js.Env.pumpMessageLoop
                        browser.js.Local.runMacrotasks
                          browser.webapi.WorkerGlobalScope.importScript
                            ...
                              browser.webapi.Worker.loadInitialScript
                                browser.webapi.Worker.httpDoneCallback
                                  ... HttpClient.processOneMessage ...

The Lightpanda-side entry is a Worker's httpDoneCallback -> loadInitialScript -> worker eval -> importScripts(...) -> runMacrotasks -> pumpMessageLoop. V8 decides to do an incremental GC inside the message loop, the GC's weak-callback phase invokes the inspector's PromiseHandlerTracker.discard, and the discard's sendFailure allocates -- fatal.

Reproduction

Build:

zig build  # debug build

Server:

./zig-out/bin/lightpanda serve --host 127.0.0.1 --port 9222 --log-level warn

Driver script (repro.mjs, requires puppeteer-core):

import puppeteer from 'puppeteer-core';
const URL = 'https://www.allbirds.com/products/mens-wool-runners';
const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222' });
const ctx = await browser.createBrowserContext();
const page = await ctx.newPage();
const resp = await page.goto(URL, { waitUntil: 'load', timeout: 60_000 });
console.log('status', resp?.status?.());
console.log('title', await page.title());
await browser.disconnect();

Run it back-to-back against a single server:

for i in $(seq 1 25); do node repro.mjs; done

The server reliably aborts within 5 to 15 iterations. Each successful iteration prints status 200 and the real <title>; the failing iteration brings the server down with the V8 fatal above.
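To record which iteration trips the fatal, the shell loop above can be replaced with a small Node driver that stops at the first failure. This is a minimal sketch: `runUntilFailure` and `runRepro` are hypothetical names, and it assumes a failed iteration makes `node repro.mjs` exit with a non-zero status (for example because the top-level await rejects when the server drops the connection).

```javascript
// Hypothetical helper: run `runOnce` up to `maxIterations` times and stop at
// the first non-zero exit status. `runOnce` is injectable so the loop can be
// exercised without a live server.
async function runUntilFailure(maxIterations, runOnce) {
  for (let i = 1; i <= maxIterations; i++) {
    const status = await runOnce(i);
    if (status !== 0) {
      return { failedAt: i, status };
    }
  }
  return { failedAt: null, status: 0 };
}

// Assumed runner: spawns `node repro.mjs` and reports its exit status.
// A null status (child killed by a signal) is treated as a failure.
async function runRepro() {
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync('node', ['repro.mjs'], { stdio: 'inherit' });
  return result.status ?? 1;
}

// Usage: runUntilFailure(25, runRepro).then((r) => console.log(r));
```

The returned `failedAt` is the 1-based iteration at which the driver first failed, or `null` if all iterations succeeded.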

Reproduces independently of #2398

Confirmed by checking out and rebuilding both of the following commits and running the same 25-iteration loop:

So the V8 fatal is a pre-existing latent bug in Lightpanda's interaction with the V8 inspector. #2398 does make it more visible, because the deferral logic keeps inspector callbacks registered for longer (they were previously torn down with the page mid-fetch via the now-fixed UAF), so they are more likely to still be alive when a GC runs.

It also reproduces against the same Allbirds URL on main without #2398, just less reliably -- the original UAF tends to trip first.

Suggested directions

A few possibilities, none investigated deeply:

  1. Defer the response from sendFailure out of GC. Have the inspector channel buffer the response and post it via the message loop / a microtask instead of writing it inline. The discard happens during GC weak callbacks, but the response doesn't have to. Probably the cleanest fix and most aligned with what V8 expects from embedders.

  2. Drain pending inspector promises before GC starts. During Env.pumpMessageLoop (or before each Heap::CollectGarbage), ask the inspector to flush any responses that would otherwise be triggered from within weak callbacks. This is more invasive into the V8 inspector layer.

  3. Prevent stale promises from sitting around long enough to be GC'd. The Runtime.evaluate whose promise is being discarded was almost certainly issued during a worker's main-script eval and never resolved because the worker tore down. Cleaning up pending evaluates more aggressively when a worker / page exits would side-step the discard path entirely. Less surgical and may not fully cover the case where an evaluate is genuinely outliving its session.
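Direction 1 can be modeled as a channel that only records responses and delivers them later from the message loop. The sketch below is a conceptual model in JavaScript, not the actual channel: `DeferredChannel`, `write`, and `flush` are hypothetical names. In the embedder the buffer would have to live in native memory rather than on the V8 heap, so that recording a response during a weak callback performs no heap allocation.

```javascript
// Conceptual model only; all names are hypothetical. The real channel is
// implemented on the embedder side (src/browser/js/Inspector.zig).
class DeferredChannel {
  constructor(write) {
    this.write = write; // delivers one serialized message to the CDP client
    this.pending = [];  // responses recorded while delivery is unsafe
  }

  // Called by the inspector, possibly from inside a GC weak callback.
  // It only records the already-serialized message; nothing is delivered here.
  sendResponse(callId, message) {
    this.pending.push({ callId, message });
  }

  // Called from the message loop once GC has finished.
  // Delivers everything recorded so far, in order, and returns the count.
  flush() {
    const batch = this.pending;
    this.pending = [];
    for (const { callId, message } of batch) {
      this.write(callId, message);
    }
    return batch.length;
  }
}
```

The point of the design is the split between recording and delivering: the discard may still run inside the weak callback, but the JS string for the response is only built when `flush` runs outside GC.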

The Lightpanda-side surface area is src/browser/js/Env.zig (pumpMessageLoop, runMacrotasks) and src/browser/js/Inspector.zig (channel implementation, where the embedder controls how sendResponse is delivered).
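Direction 3 from the list above could be modeled as a registry of in-flight evaluates that is failed explicitly when its owning worker or page exits, so that nothing is left for the GC-time discard path. This is a sketch under assumptions: `PendingEvaluates`, the owner ids, and the `fail` callbacks are hypothetical and do not correspond to existing Lightpanda or V8 inspector APIs.

```javascript
// Hypothetical registry of in-flight Runtime.evaluate calls, grouped by the
// worker or page that owns them.
class PendingEvaluates {
  constructor() {
    this.byOwner = new Map(); // ownerId -> Map(callId -> fail callback)
  }

  // Record a pending evaluate together with a callback that fails it.
  track(ownerId, callId, fail) {
    if (!this.byOwner.has(ownerId)) this.byOwner.set(ownerId, new Map());
    this.byOwner.get(ownerId).set(callId, fail);
  }

  // Forget an evaluate that resolved or rejected normally.
  settle(ownerId, callId) {
    const calls = this.byOwner.get(ownerId);
    if (calls) calls.delete(callId);
  }

  // On worker / page teardown: fail everything still pending for that owner.
  // Returns how many evaluates were failed.
  failAll(ownerId, reason) {
    const calls = this.byOwner.get(ownerId);
    if (!calls) return 0;
    this.byOwner.delete(ownerId);
    for (const fail of calls.values()) fail(reason);
    return calls.size;
  }
}
```

Calling `failAll` at teardown sends the failure responses at a point where allocation is allowed. As noted above, this does not cover an evaluate that legitimately outlives its session.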

Related
