Skip to content

Commit e1f63a8

Browse files
committed
fix(engine): make software-renderer screenshot guard opt-in (CI regression)
PR #822's software-renderer guard auto-flipped `captureMode` to "screenshot" whenever `resolveBrowserGpuMode` returned "software" — but the WebGL probe classifies GitHub Actions runners (SwiftShader) as "software" too, even though `HeadlessExperimental.beginFrame` works on them. As a result the guard pessimized mainstream CI, exceeding `ffmpegStreamingTimeout` (10 min default) on the largest fixtures and producing this regression on three shards of run 25841853063: - fast → sub-composition-video (390 frames, 1080×1920, video + sub-comp) - styles-e → style-13-prod (12 MB baseline, slow tag) - styles-g → overlay-montage-prod (42.36 s duration, 1080×1920) All three failed with `Streaming encode failed: FFmpeg exited with code 255 ... received signal 15`. Wall-clock for overlay-montage-prod jumped from 9 m 13 s (main, beginframe) to 10 m 38 s (PR-822, screenshot) — just past the streaming timeout. ffmpeg was alive and consuming frames the whole time; the producer side simply couldn't sustain the rate in screenshot mode for these fixtures. Fix: gate the auto-flip behind an opt-in env var `HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU`. When unset (default), the GPU probe is skipped entirely and the original (pre-PR-822) capture-mode selection runs — Linux + headless-shell → beginframe, with the existing `probeBeginFrameSupport` fallback handling actual BeginFrame unavailability at runtime. Operators on hosts where BeginFrame is technically present but stalls under shader load (the CPU-only Linux sandbox where hf#677 was reproduced) set the env var to short-circuit BeginFrame entirely. Mirrored the same gate in `frameCapture.ts`'s preMode resolution so the expensive probe Chrome is also skipped on the chromeArgs side when the guard is off. Tests: kept all eight software-renderer tests; flipped the block-scoped default to "guard ON" (set in `beforeEach`) so the pre-existing assertions exercise the guard path. Added a positive-control test that pins the new default behavior: `Linux + software + headless-shell + guard OFF → beginframe` (the regression case). — Vai
1 parent d20e28a commit e1f63a8

3 files changed

Lines changed: 111 additions & 40 deletions

File tree

packages/engine/src/services/browserManager.test.ts

Lines changed: 41 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -128,14 +128,18 @@ describe("resolveBrowserGpuMode", () => {
128128
});
129129
});
130130

131-
describe("acquireBrowser — software-renderer guard", () => {
132-
// Tests for the defense-in-depth rule that a "software"-resolved GPU mode
133-
// must force `captureMode = "screenshot"` regardless of platform/binary.
134-
// Without this guard, BeginFrame attempts on a software-rendered compositor
135-
// hang at the CDP protocolTimeout (the hf#677 spike repro). Tests below
136-
// mock puppeteer at the module-import level so no real Chrome is launched.
131+
describe("acquireBrowser — software-renderer guard (opt-in)", () => {
132+
// Tests for the opt-in rule that, when
133+
// `HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU=1` is set, a
134+
// "software"-resolved GPU mode must force `captureMode = "screenshot"`
135+
// regardless of platform/binary. Without the env var the guard is dormant —
136+
// mainstream CI hosts (SwiftShader on GitHub runners) report "software" but
137+
// BeginFrame works on them, and pessimizing those hosts caused regressions
138+
// on the largest fixtures (hf#822). Tests below mock puppeteer at the
139+
// module-import level so no real Chrome is launched.
137140

138141
const origPlatform = process.platform;
142+
const origGuardEnv = process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU;
139143

140144
// Captures the most recent puppeteer.launch call so tests can assert on the
141145
// args/executablePath actually passed to Chrome.
@@ -173,16 +177,25 @@ describe("acquireBrowser — software-renderer guard", () => {
173177
// `vi.resetModules()` already gives each test a fresh `browserManager.js`
174178
// module — so `_softwareGuardWarned` starts at false per test. No
175179
// additional reset needed.
180+
// Default: guard ON for the bulk of tests in this block (they exercise
181+
// the guard's behavior). Individual tests opt back to "guard OFF" by
182+
// deleting the env var.
183+
process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU = "1";
176184
});
177185

178186
afterEach(() => {
179187
Object.defineProperty(process, "platform", { value: origPlatform, configurable: true });
188+
if (origGuardEnv === undefined) {
189+
delete process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU;
190+
} else {
191+
process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU = origGuardEnv;
192+
}
180193
vi.restoreAllMocks();
181194
vi.doUnmock("puppeteer");
182195
vi.doUnmock("puppeteer-core");
183196
});
184197

185-
it("forces screenshot capture mode when browserGpuMode resolves to 'software' on Linux", async () => {
198+
it("forces screenshot capture mode when browserGpuMode resolves to 'software' on Linux (guard enabled)", async () => {
186199
installPuppeteerMock();
187200
const { acquireBrowser: acquire } = await import("./browserManager.js");
188201

@@ -210,6 +223,27 @@ describe("acquireBrowser — software-renderer guard", () => {
210223
expect(lastLaunchArgs).toContain("--no-sandbox");
211224
});
212225

226+
it("KEEPS beginframe capture mode on Linux+software when the guard is NOT enabled (default)", async () => {
227+
// Mirror of the "forces screenshot" test above with the guard OFF — this
228+
// is the production default that mainstream CI hits. Without this, GH
229+
// Actions runners (SwiftShader, reports "software") regress to screenshot
230+
// mode and the streaming encoder times out on shader-heavy fixtures
231+
// (hf#822 regression on overlay-montage-prod, sub-composition-video,
232+
// style-13-prod).
233+
delete process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU;
234+
installPuppeteerMock();
235+
const { acquireBrowser: acquire } = await import("./browserManager.js");
236+
237+
const chromeArgs = ["--no-sandbox", "--enable-begin-frame-control"];
238+
const result = await acquire(chromeArgs, {
239+
chromePath: "/fake/chrome-headless-shell",
240+
browserGpuMode: "software",
241+
});
242+
243+
expect(result.captureMode).toBe("beginframe");
244+
expect(lastLaunchArgs).toContain("--enable-begin-frame-control");
245+
});
246+
213247
it("keeps beginframe capture mode when browserGpuMode is 'hardware' on Linux with headless-shell", async () => {
214248
installPuppeteerMock();
215249
const { acquireBrowser: acquire } = await import("./browserManager.js");

packages/engine/src/services/browserManager.ts

Lines changed: 48 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -260,6 +260,33 @@ export function _resetSoftwareGuardWarnedForTests(): void {
260260
_softwareGuardWarned = false;
261261
}
262262

263+
/**
264+
* Opt-in: force `captureMode = "screenshot"` whenever the GPU resolves to
265+
* "software" (no hardware WebGL).
266+
*
267+
* Why opt-in rather than always-on: GitHub Actions runners and other common
268+
* SwiftShader hosts report "software" from the WebGL probe but
269+
* `HeadlessExperimental.beginFrame` works reliably on them — so a blanket
270+
* software → screenshot flip pessimizes mainstream CI by ~1.5 min on
271+
* shader/composition-heavy fixtures and exceeds `ffmpegStreamingTimeout` on
272+
* the largest renders.
273+
*
274+
* The runtime probe (`probeBeginFrameSupport` below) already catches actual
275+
* BeginFrame *unavailability*. This env var exists for hosts where BeginFrame
276+
* is technically present but the compositor stalls under shader load (the
277+
* CPU-only Linux sandbox where hf#677 was reproduced). Operators on such
278+
* hosts set `HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU=1` to short-circuit
279+
* BeginFrame entirely.
280+
*
281+
* Accepts the same truthy values as other engine env flags (`"1"`, `"true"`).
282+
*/
283+
export function isSoftwareScreenshotGuardEnabled(): boolean {
284+
const raw = process.env.HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU;
285+
if (!raw) return false;
286+
const v = raw.trim().toLowerCase();
287+
return v === "1" || v === "true" || v === "yes" || v === "on";
288+
}
289+
263290
export interface AcquireBrowserOptions {
264291
/**
265292
* If the caller already resolved `browserGpuMode` (e.g. `frameCapture.ts`
@@ -301,31 +328,32 @@ export async function acquireBrowser(
301328
const isLinux = process.platform === "linux";
302329
const forceScreenshot = config?.forceScreenshot ?? DEFAULT_CONFIG.forceScreenshot;
303330

304-
// Resolve browserGpuMode. On software-renderer hosts (no hardware GPU /
305-
// SwiftShader) HeadlessExperimental.beginFrame is unreliable — the
306-
// compositor stalls indefinitely on shader-heavy frames and the CDP call
307-
// times out at protocolTimeout (default 5 min). Force screenshot mode
308-
// whenever resolveBrowserGpuMode resolves to "software", independent of
309-
// platform/binary. This is a separate defense from the producer-side
310-
// `captureHdrStage` guard (which unconditionally forces screenshot for the
311-
// HDR layered-composite path); this guard covers the SDR-render-on-software-
312-
// host case that captureHdrStage doesn't touch.
331+
// The software-renderer screenshot guard is opt-in (see
332+
// `isSoftwareScreenshotGuardEnabled` above). Skip the GPU probe entirely
333+
// unless the guard is enabled — the probe launches a throwaway Chrome to
334+
// run a WebGL availability check, so paying for it on every render when
335+
// the result is unused is wasteful. Operators on stall-prone hosts set
336+
// `HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU=1` and accept the probe
337+
// cost as part of that contract.
313338
//
314339
// If the caller has already resolved the mode (frameCapture.ts does) it
315340
// hands us the value via options.resolvedBrowserGpuMode to avoid a second
316341
// resolution from raw config. The probe Promise is cached for the process
317-
// lifetime so the fallback path is still cheap.
318-
let resolvedGpuMode: "software" | "hardware";
319-
if (options.resolvedBrowserGpuMode) {
320-
resolvedGpuMode = options.resolvedBrowserGpuMode;
321-
} else {
322-
const browserGpuMode = config?.browserGpuMode ?? DEFAULT_CONFIG.browserGpuMode;
323-
resolvedGpuMode = await resolveBrowserGpuMode(browserGpuMode, {
324-
chromePath: headlessShell ?? undefined,
325-
browserTimeout: config?.browserTimeout,
326-
});
342+
// lifetime so even when both paths fire, only one Chrome launch happens.
343+
const guardEnabled = isSoftwareScreenshotGuardEnabled();
344+
let resolvedGpuMode: "software" | "hardware" | undefined;
345+
if (guardEnabled) {
346+
if (options.resolvedBrowserGpuMode) {
347+
resolvedGpuMode = options.resolvedBrowserGpuMode;
348+
} else {
349+
const browserGpuMode = config?.browserGpuMode ?? DEFAULT_CONFIG.browserGpuMode;
350+
resolvedGpuMode = await resolveBrowserGpuMode(browserGpuMode, {
351+
chromePath: headlessShell ?? undefined,
352+
browserTimeout: config?.browserTimeout,
353+
});
354+
}
327355
}
328-
const isSoftwareRenderer = resolvedGpuMode === "software";
356+
const isSoftwareRenderer = guardEnabled && resolvedGpuMode === "software";
329357

330358
let captureMode: CaptureMode;
331359
let executablePath: string | undefined;

packages/engine/src/services/frameCapture.ts

Lines changed: 22 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ import {
1818
releaseBrowser,
1919
forceReleaseBrowser,
2020
buildChromeArgs,
21+
isSoftwareScreenshotGuardEnabled,
2122
resolveBrowserGpuMode,
2223
resolveHeadlessShellPath,
2324
type CaptureMode,
@@ -195,27 +196,35 @@ export async function createCaptureSession(
195196
// need explicit clip+scale on `Page.captureScreenshot`, so fall back to
196197
// the screenshot path for any DPR > 1.
197198
const supersampling = (options.deviceScaleFactor ?? 1) > 1;
199+
// Mirror the software-renderer guard in `acquireBrowser`. The guard is
200+
// opt-in (env var `HYPERFRAMES_FORCE_SCREENSHOT_ON_SOFTWARE_GPU`); when it
201+
// is off we skip the GPU probe entirely to avoid the extra Chrome launch.
202+
// When on, the probe Promise is cached for the process lifetime, so
203+
// resolving here and threading the result to acquireBrowser collapses to
204+
// a single probe.
205+
const guardEnabled = isSoftwareScreenshotGuardEnabled();
198206
const requestedGpuMode = config?.browserGpuMode ?? DEFAULT_CONFIG.browserGpuMode;
199-
const resolvedGpuMode = await resolveBrowserGpuMode(requestedGpuMode, {
200-
chromePath: headlessShell ?? undefined,
201-
browserTimeout: config?.browserTimeout,
202-
});
203-
// Mirror the software-renderer guard in `acquireBrowser`: on a software
204-
// host the BeginFrame compositor stalls, so build the chromeArgs without
205-
// BeginFrame flags up-front. acquireBrowser will also strip them as a
206-
// belt-and-braces measure, but doing it here keeps the launch args clean.
207+
const resolvedGpuMode = guardEnabled
208+
? await resolveBrowserGpuMode(requestedGpuMode, {
209+
chromePath: headlessShell ?? undefined,
210+
browserTimeout: config?.browserTimeout,
211+
})
212+
: undefined;
213+
const isSoftwareRenderer = guardEnabled && resolvedGpuMode === "software";
207214
const preMode: CaptureMode =
208-
headlessShell && isLinux && !forceScreenshot && !supersampling && resolvedGpuMode !== "software"
215+
headlessShell && isLinux && !forceScreenshot && !supersampling && !isSoftwareRenderer
209216
? "beginframe"
210217
: "screenshot";
211218
const chromeArgs = buildChromeArgs(
212219
{ width: options.width, height: options.height, captureMode: preMode },
213-
{ ...config, browserGpuMode: resolvedGpuMode },
220+
resolvedGpuMode ? { ...config, browserGpuMode: resolvedGpuMode } : config,
214221
);
215222

216-
// Thread the already-resolved GPU mode into acquireBrowser so it doesn't
217-
// re-resolve from raw config. Promise-cached anyway, but removes the smell
218-
// of two parallel resolutions that future refactors could let diverge.
223+
// Thread the already-resolved GPU mode into acquireBrowser when available
224+
// so it doesn't re-resolve from raw config. (Both sides agree on whether
225+
// the guard is enabled because they read the same env var; if the env var
226+
// flips between these two calls — basically only possible in tests — the
227+
// guard semantics fall back to acquireBrowser's own resolution.)
219228
const { browser, captureMode } = await acquireBrowser(chromeArgs, config, {
220229
resolvedBrowserGpuMode: resolvedGpuMode,
221230
});

0 commit comments

Comments
 (0)