Summary
Under sustained CDP load (back-to-back puppeteer-core connections against an iframe-heavy page that runs Web Workers), debug builds of Lightpanda hit a V8 fatal during garbage collection:
# Fatal error in ../../../src/heap/heap-allocator-inl.h, line 79
# Debug check failed: AllowHeapAllocation::IsAllowed().
The process aborts with SIGTRAP. This is a debug-only V8 invariant ("no heap allocation during weak callbacks"); release builds will not abort, but the underlying misuse of the V8 inspector API is the same.
Trigger
The fatal fires from inside V8's first-pass weak-callback phase during an incremental GC, when V8's PromiseHandlerTracker discards a stale Runtime.evaluate promise and tries to send a failure response back over the inspector channel. Building that response needs a JS string allocation, which is forbidden during weak callbacks:
v8::internal::HeapAllocator::AllocateRaw
v8::String::NewFromOneByte
v8_inspector__Channel__IMPL::sendResponse
v8_crdtp::DomainDispatcher::sendResponse
v8_inspector::EvaluateCallback::sendFailure
v8_inspector::PromiseHandlerTracker::sendFailure
v8_inspector::PromiseHandlerTracker::discard
v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks <-- inside GC
v8::internal::Heap::PerformGarbageCollection
...
v8::platform::DefaultPlatform::PumpMessageLoop
browser.js.Env.pumpMessageLoop
browser.js.Local.runMacrotasks
browser.webapi.WorkerGlobalScope.importScript
...
browser.webapi.Worker.loadInitialScript
browser.webapi.Worker.httpDoneCallback
... HttpClient.processOneMessage ...
The Lightpanda-side entry is a Worker's httpDoneCallback -> loadInitialScript -> worker eval -> importScripts(...) -> runMacrotasks -> pumpMessageLoop. V8 decides to do an incremental GC inside the message loop, the GC's weak-callback phase invokes the inspector's PromiseHandlerTracker.discard, and the discard's sendFailure allocates -- fatal.
Reproduction
Build:
Server:
./zig-out/bin/lightpanda serve --host 127.0.0.1 --port 9222 --log-level warn
Driver script (repro.mjs, requires puppeteer-core):
import puppeteer from 'puppeteer-core';
const URL = 'https://www.allbirds.com/products/mens-wool-runners';
const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222' });
const ctx = await browser.createBrowserContext();
const page = await ctx.newPage();
const resp = await page.goto(URL, { waitUntil: 'load', timeout: 60_000 });
console.log('status', resp?.status?.());
console.log('title', await page.title());
await browser.disconnect();
Run it back-to-back against a single server:
for i in $(seq 1 25); do node repro.mjs; done
The server reliably aborts within 5 to 15 iterations. Each successful iteration prints status 200 and the real <title>; the failing iteration aborts the server with the V8 fatal above.
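To record which iteration kills the server, the loop above can be wrapped in a small helper. This is a minimal sketch, not part of the original report: `firstFailingRun` and `makeDriverRunner` are hypothetical names, and it assumes that a crashed server makes `node repro.mjs` exit with a non-zero status.

```javascript
// Run `runOnce` up to `maxRuns` times and report the first run that fails.
// `runOnce(i)` is a hypothetical synchronous callback returning true on success.
function firstFailingRun(runOnce, maxRuns) {
  for (let i = 1; i <= maxRuns; i++) {
    if (!runOnce(i)) return i; // 1-based index of the first failing run
  }
  return null; // every run succeeded
}

// Build a `runOnce` from a spawnSync-compatible function. The function is
// injected so the sketch itself needs no imports.
// Assumption: the driver exits non-zero once the server has aborted.
function makeDriverRunner(spawnSync, script) {
  return () => spawnSync('node', [script], { stdio: 'inherit' }).status === 0;
}
```

Passing Node's `spawnSync` from `node:child_process` as the first argument, `firstFailingRun(makeDriverRunner(spawnSync, 'repro.mjs'), 25)` would return the iteration at which the driver first failed, or `null` if all 25 runs passed.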
Reproduces independently of #2398
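Confirmed by checking out and rebuilding both of the following commits and running the same 25-iteration loop:
- The is_evaluating fix from "Defer page teardown while worker scripts are evaluating" #2398 (1f761af2) -- aborted on run 11.
- The other commit (92607ad7) -- aborted on run 6.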
So the V8 fatal is a pre-existing latent bug in Lightpanda's interaction with the V8 inspector. #2398 does make it more visible, because the deferral logic keeps inspector callbacks registered for longer (they were previously torn down with the page mid-fetch via the now-fixed UAF), so they are more likely to still be alive when a GC runs.
It also reproduces against the same Allbirds URL on main without #2398, just less reliably -- the original UAF tends to trip first.
Suggested directions
A few possibilities, none investigated deeply:
- Defer the response from sendFailure out of GC. Have the inspector channel buffer the response and post it via the message loop / a microtask instead of writing it inline. The discard happens during GC weak callbacks, but the response doesn't have to. Probably the cleanest fix and most aligned with what V8 expects from embedders.
- Drain pending inspector promises before GC starts. During Env.pumpMessageLoop (or before each Heap::CollectGarbage), ask the inspector to flush any responses that would otherwise be triggered from within weak callbacks. This is more invasive into the V8 inspector layer.
- Prevent stale promises from sitting around long enough to be GC'd. The Runtime.evaluate whose promise is being discarded was almost certainly issued during a worker's main-script eval and never resolved because the worker tore down. Cleaning up pending evaluates more aggressively when a worker / page exits would side-step the discard path entirely. Less surgical and may not fully cover the case where an evaluate is genuinely outliving its session.
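The first direction can be illustrated with a toy model. This is only a sketch of the buffering idea in JavaScript, not Lightpanda or V8 code: `BufferedChannel`, `deliver`, and `flush` are hypothetical names, and in the real embedder the queue would have to live in native memory rather than on the V8 heap.

```javascript
// Toy model only: none of these names exist in Lightpanda or V8.
class BufferedChannel {
  constructor(deliver) {
    this.deliver = deliver; // hypothetical sink, e.g. a write to the CDP socket
    this.queue = [];
  }

  // Stands in for the embedder's sendResponse hook. It may be reached from a
  // GC weak callback, so it only records the already-serialized message.
  sendResponse(callId, message) {
    this.queue.push({ callId, message });
  }

  // Called from the message loop, outside of GC, where allocation is allowed.
  // Returns the number of responses that were delivered.
  flush() {
    const batch = this.queue;
    this.queue = [];
    for (const { callId, message } of batch) this.deliver(callId, message);
    return batch.length;
  }
}
```

The design point is that `sendResponse` never calls `deliver` itself, so whatever work delivery needs (such as building a JS string) happens only when `flush` runs after the GC has finished.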
The Lightpanda-side surface area is src/browser/js/Env.zig (pumpMessageLoop, runMacrotasks) and src/browser/js/Inspector.zig (channel implementation, where the embedder controls how sendResponse is delivered).
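For the third direction, eager cleanup amounts to tracking pending evaluates per owner and failing them at teardown, so that nothing is left for the GC to discard. The following is a minimal sketch under that assumption; `PendingEvaluates`, `ownerId`, and the `fail` callback are hypothetical and do not correspond to existing Lightpanda or V8 inspector APIs.

```javascript
// Toy registry of in-flight evaluate calls, grouped by the worker or page
// that owns them. All names here are illustrative.
class PendingEvaluates {
  constructor() {
    this.byOwner = new Map(); // ownerId -> Map(callId -> fail callback)
  }

  // Record an evaluate that has been issued but not yet answered.
  add(ownerId, callId, fail) {
    if (!this.byOwner.has(ownerId)) this.byOwner.set(ownerId, new Map());
    this.byOwner.get(ownerId).set(callId, fail);
  }

  // Forget an evaluate that completed normally. Returns true if it was tracked.
  settle(ownerId, callId) {
    const calls = this.byOwner.get(ownerId);
    return calls ? calls.delete(callId) : false;
  }

  // On worker / page exit, fail everything still pending for that owner.
  // Returns how many evaluates were failed.
  teardown(ownerId, reason) {
    const calls = this.byOwner.get(ownerId);
    if (!calls) return 0;
    this.byOwner.delete(ownerId);
    for (const fail of calls.values()) fail(reason);
    return calls.size;
  }
}
```

As the list above notes, this would not cover an evaluate that genuinely outlives its session, so it is more of a complement to deferring the response than a replacement for it.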
Related