Skip to content

feat(engine,cli): drawElementImage fast-capture behind --experimental-fast-capture#1295

Open
vanceingalls wants to merge 8 commits into
mainfrom
drawelement-fast-capture
Open

feat(engine,cli): drawElementImage fast-capture behind --experimental-fast-capture#1295
vanceingalls wants to merge 8 commits into
mainfrom
drawelement-fast-capture

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds an experimental frame-capture mode that reads DOM paint records directly via Chrome's canvas.drawElementImage API instead of Page.captureScreenshot. On DOM/CSS/GSAP compositions it is significantly faster on both platforms. It is fully opt-in behind a new CLI flag — default behaviour is byte-unchanged.

hyperframes render --experimental-fast-capture --output out.mp4

Benchmark — drawElement-only comps (14/20; excl. style-7/8/10/15 video-gated + raf-ball compat-route + iframe-render-compat rAF-sync):

Platform p50 p75 p90
macOS GPU 1.49× 1.56× 1.68×
Docker / SwiftShader 1.29× 1.53× 1.60×

Total wall-clock: macOS 358s → 324s (1.11×), Docker 987s → 853s (1.16×). The 6 excluded comps (4 video-gated → baseline route, 1 compat-forced screenshot, 1 iframe rAF-sync) measure init-overhead savings only — their fast renders take the same capture mode as baseline. Including them dilutes the drawElement speedup signal. Total wall-clock includes all 20 comps.

How it works

  • New capture mode "drawelement" (alongside "beginframe" / "screenshot"). A <canvas layoutsubtree> is injected around the composition root after page-ready; each frame clears the canvas, calls drawElementImage(root, 0, 0), and reads it back.
  • Format-aware encode — emits JPEG for opaque output (mp4) and PNG for transparent output (mov/webm/png-sequence).
  • Page-side compositing — drawElement and page-side shader compositing are mutually incompatible; resolveConfig auto-disables enablePageSideCompositing when fast-capture is on (shader transitions fall back to the Node-side layered blend).

Fixes

Paint-event sync + No cached paint record crash (71b0eb77)

drawElementImage draws from a snapshot recorded at the paint event. Called outside one it returns the previous frame's snapshot (silent stale output), or throws InvalidStateError: No cached paint record when no paint has landed. Fix: force a paint-level invalidation via a 1×1 sentinel outside the captured root, await the canvas paint event, draw inside the handler. Under BeginFrame control (Linux headless-shell) the sync is skipped — the per-frame HeadlessExperimental.beginFrame already paints a fresh snapshot.

45 s init stall on CSS/rAF-only compositions (71b0eb77)

Compositions driven purely by CSS animations / rAF never register window.__timelines[id], burning the full 45 s poll timeout. Hosts now opt out with data-no-timeline on [data-composition-id]. Measured: css-spinner 49.7 s → 3.9 s.

Body/HTML background lost in fast capture (223c4752)

drawElementImage only paints the captured subtree — ancestor backgrounds on <body> and <html> (outside the composition root) were invisible. Fix: walk ancestors before each frame and fill the canvas with the first non-transparent background-color found. Six comps went from 23–31 dB → 53–71 dB.

WebGL / WebGPU / 2D canvas content frozen at frame 0 (ec037e2a)

GPU-accelerated canvases present via compositor texture swap; their paint records freeze at the first frame. Fix: getContext instrumentation (via evaluateOnNewDocument) forces preserveDrawingBuffer: true for WebGL/WebGPU and tracks all accelerated canvases. Per frame: hide tracked canvases from the DOM, composite their live content via drawImage, then call drawElementImage. three.js, WebGL, WebGPU, and custom shaders work on the fast path.

Transparent comps in opaque mp4 encoded black (94c95c05)

Compositions with no author background and transparent content in an opaque-format render (e.g. mp4) were captured against black instead of white. Fix: white-fill the canvas before drawElementImage when no ancestor background is found and the output format is JPEG. webm-transparency: 3.4 dB → 61–67 dB.

forceScreenshot compat hints overridden (bf8922cb)

Fast capture was ignoring forceScreenshot compat hints set by the engine (raw rAF animations, alpha output). Fix: initializeSession short-circuits fast capture when cfg.forceScreenshot is set. raf-ball Docker: fully black → ∞ PSNR.

Compile-time video gate + BeginFrame liveness probe (b2612bfc)

The runtime hasVideo gate fired after browser launch — on Linux that launch is BeginFrame mode, and the gate's screenshot fallback then called Page.captureScreenshot on a BeginFrame browser, which hangs by design. The compiler knows videoCount before launch, so the gate now runs at compile time and disables drawElement only (not forces screenshot), letting the render take the platform's normal baseline route. A new probeBeginFrameLiveness() probe backstops genuinely-stalling comps. style-7/8/10/15 went from unrenderable on Docker to baseline-speed.

Runtime opacity → autoAlpha rewrite (5d53f13c)

Stacked position:absolute containers with opacity:0 children cause drawElementImage to capture black frames (240/240 deterministic on both macOS GPU and SwiftShader). Workaround: hf-early-stub rewrites opacityautoAlpha in all GSAP tween vars when window.__HF_FAST_CAPTURE_AUTOALPHA__ is set. autoAlpha sets visibility:hidden at 0, removing elements from the paint tree entirely instead of stacking them as promoted compositor layers. chat comp: 29.4 dB → 49.6 dB, zero blackout frames.

Video gate routes to baseline path, not screenshot (74ee7b60)

The compile-time gate (above) was setting forceScreenshot, but the Linux non-lowmem baseline captures via BeginFrame (~40% faster than screenshot on SwiftShader). Gate now sets cfg.useDrawElement = false only, letting mode resolution pick the platform baseline route. style-8 Docker: 68s → 46.6s (parity with baseline).

CI workflow env vars (8538a970)

validate-fast-video env vars were empty on push events; defaults now applied when unset.

Fallback gates

Limitations that route specific renders back to a correct baseline path automatically — the only cost is losing the speedup for that render, never a broken frame.

  • Video (153602d, updated b2612bfc, 74ee7b60) — compositions containing <video> disable drawElement at compile time and take the platform baseline route (BeginFrame on Linux, screenshot on macOS). Root cause: a Chromium opacity-animation bug on stacked transparent layers is proxied by video presence. HF_FAST_CAPTURE_VIDEO=true bypasses both gates (R&D escape hatch).
  • Transparent + SwiftShader (1601c1fe) — SwiftShader drops promoted compositor sub-layers onto transparent canvas destinations. Opaque output on SwiftShader uses drawElement fine; alpha output falls back to screenshot.
  • Supersampling (deviceScaleFactor > 1)drawElementImage rasterises at CSS pixels with no DPR option. Supersampled renders fall back to screenshot.

Wiring

CLI flag → env → engine config:

--experimental-fast-capturePRODUCER_EXPERIMENTAL_FAST_CAPTUREEngineConfig.useDrawElementresolveConfig → capture session. Threaded through the Docker render path (buildDockerRunArgs) as well.

Testing

  • Unit (drawElementService.test.ts, config.test.ts): SwiftShader detection, capture-mode routing, env resolution, page-side auto-disable. Full engine suite green (697 pass).
  • Regression (fast-capture-gsap, 53a09b7d): producer-level guard — renders a GSAP composition in fast mode and PSNR-checks against the screenshot baseline.
  • CI gate (.github/workflows/fast-video-validation.yml): fast-video PSNR ≥ 25 dB — currently fails by design; passes if/when an upstream Chromium fix lands, serving as the regression gate.
  • 20-comp benchmark (macOS GPU + Docker/SwiftShader): all 20 CI compositions rendered under both paths, PSNR and wall-clock compared. Floor PSNR 49 dB (lossy comps); 12 comps at ∞. Zero comps slower than baseline beyond noise. Results: HeygenVerse benchmark page.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@vanceingalls vanceingalls force-pushed the drawelement-fast-capture branch from e8cec72 to d2ef1f2 Compare June 9, 2026 08:27

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid experimental feature. The fallback routing logic (SwiftShader detection, transparent + GPU vs Docker paths, supersampling guard, page-side compositing auto-disable) is well thought through, and the format-aware JPEG/PNG encode choice is the right call — I can see from the PR description that the PNG-vs-JPEG bug was caught by the e2e run, which is exactly why it matters. A few small gaps worth addressing before this graduates out of --experimental.

P2 — captureDrawElementFrame: split(",") is fragile

packages/engine/src/services/drawElementService.ts, the base64 extraction:

const base64 = dataUrl.split(",")[1];

Base64 can't contain commas, so this is practically safe for a toDataURL response — but the intent is to grab everything after the first comma, and split(",")[1] silently truncates if the payload ever has one. Prefer:

const commaIdx = dataUrl.indexOf(",");
if (commaIdx === -1) throw new Error("drawElement: toDataURL returned malformed data URL");
const base64 = dataUrl.slice(commaIdx + 1);

Worth fixing before graduation: it's one line and removes any ambiguity.

P3 — BeginFrame response discarded in drawelement branch

packages/engine/src/services/frameCapture.ts, captureFrameCore:

await client.send("HeadlessExperimental.beginFrame", { ... });
// response ignored

Discarding the response is intentional (we don't rely on BeginFrame for the screenshot here), but beginFrameHasDamageCount/beginFrameNoDamageCount won't increment for drawElement renders. Diagnostics and any tooling that inspects those counters post-render will silently under-report. At minimum log hasDamage so render telemetry stays accurate; ideally update the session counters so the existing diagnostic surface works.

P3 — injectDrawElementCanvas idempotency path uncovered

drawElementService.test.ts mocks page.evaluate but never exercises the early-return branch (document.getElementById("__hf_de_canvas") already exists). Add a mock test: call the function twice and assert page.evaluate was called only once.

P3 — Supersampling fallback untested

The initDrawElementOrTransparentBackground path for useDrawElement=true + deviceScaleFactor>1 (logs the warning and falls back to screenshot capture) has no unit coverage. Since the function is private, this would need a thin integration shim or an indirect test via initializeSession. Minor, but worth noting before promotion.


Cloud/Lambda follow-up is fine — the flag silently no-ops there and the description is clear about it.

→ Approve with comments. Fallback logic and encoding are correct; the gaps above are small and appropriate to address before the flag loses the --experimental prefix.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at d2ef1f2d. Big experimental capture-mode add (449/23, 10 files). Net: architecture is sound, opt-in plumbing is correct, the SwiftShader and page-side-compositing incompatibility cases are handled with two layers of defense (resolveConfig force-off + per-frame skip in prepareFrameForCapture), and the PSNR=∞ result vs captureScreenshot is strong evidence the visual output is byte-identical on the harnessed compositions. No blockers from my pass.

Where I have non-blocker concerns: coverage shape (CI's regression-shards run with the flag OFF, so drawelement is not exercised across the composition matrix), the Docker env-only path silently no-opping, and the <canvas layoutsubtree> HTML-in-canvas pattern (worth flagging for Miguel given his prior investigation, though this use case is render-output capture rather than interactive UI).

Concerns

  • Regression suite runs with fast-capture OFF. The PSNR=∞ result + e2e drawElement canvas injected log come from a css-spinner → mp4 run plus the SwiftShader T1/T2 harness. The CI's regression-shards matrix is great for verifying nothing breaks when the flag is off but doesn't exercise the new drawelement mode across the diverse compositions (sub-compositions, iframes, HDR, font-variation, raf-ball, etc.). The composition-root reparenting in injectDrawElementCanvas (parent.insertBefore(canvas, root); canvas.appendChild(root);) changes the root's layout-parent — compositions with position: fixed descendants, viewport-relative units referencing the prior ancestor, or that contain iframes (the iframe-render-compat style is the canonical case) could subtly diverge in a way the single-composition harness wouldn't catch. Is the plan to add a fast-capture variant of the regression matrix in a follow-up before promoting beyond EXPERIMENTAL, or is the PSNR=∞ on css-spinner considered representative? Either is defensible — just want to set expectations on coverage.

  • Docker render silently drops env-only opt-in. buildDockerRunArgs builds positional argv only — it does not forward host env vars to the container. The CLI plumbing at render.ts:610 uses args["experimental-fast-capture"] === true, which is false for an absent flag even when PRODUCER_EXPERIMENTAL_FAST_CAPTURE=true is set in the host shell. So an operator who sets only the env var and runs hyperframes render against the Docker path gets a normal screenshot render. The flag description ("Env: PRODUCER_EXPERIMENTAL_FAST_CAPTURE.") doesn't qualify that the env fallback is in-process only. This is consistent with the --low-memory-mode idiom the PR cites, so it may be intentional — but the symmetric --low-memory-mode description has the same gap. Two reasonable resolutions: (a) document that the env fallback is in-process only and Docker needs the CLI flag, or (b) read env in the Docker option-prep path and synthesize the flag when present. Either is fine; (a) is cheaper if the existing pattern is the intent.

  • <canvas layoutsubtree> is HTML-in-canvas. Surfacing for Miguel given his prior investigation on avoiding the pattern. This use case is render-output capture (no DOM events traverse the canvas, no a11y surface, no interactive UI nested inside), so the concerns from that thread probably don't apply directly — but worth a 👀 from him given the team bias against the pattern shape.

Nits

  • captureDrawElementFrame hard-codes the JPEG quality default at 80 (drawElementService.ts:106). The downstream caller is captureFrameCore which passes options.quality ?? 80. If options.quality is ever undefined from an alternate entry point, the default could diverge from pageScreenshotCapture's. Probably fine — the integration harness exercises the real call chain — but a one-line "tracks pageScreenshotCapture default" comment would pin the parity.

  • logInitPhase interpolates session.captureMode at the time of log (frameCapture.ts:997). Pre-helper phases log [initSession:beginframe] or [initSession:screenshot]; the new helper flips to drawelement, and post-helper phases log [initSession:drawelement]. The drawElement canvas injected log fires after the flip, which is the clear signal — but a future log-reader chasing a render issue may misread the prefix change mid-stream as a session restart. A one-line comment on logInitPhase calling out the mid-init flip would save 10 minutes of confused log-grepping later.

  • drawElementService.integration.test.ts is a describe.skip documenting what was validated locally. Discoverable but adds zero CI coverage and can drift silently. Two paths: (a) leave as a validation record (current shape, fine as a doc), (b) gate behind a RUN_BROWSER_INTEGRATION_TESTS=1 env and add a nightly workflow that flips it on. Your call — current shape is honest about what's validated, so this is preference territory.

  • render.ts:610 uses === true for the inline option pass; render.ts:408-412 uses != null for the env override. Asymmetry is correct (env preserves on silence, option forwards false on silence), and it tracks the --low-memory-mode idiom — but a one-liner on the option-pass line explaining why the two checks differ would help the next person threading a new flag.

Questions

  • Composition coverage of the harness. Was the validation harness exercised against sub-composition (sub-composition-video), iframe (iframe-render-compat), HDR (hdr-regression), or font-variation compositions, or only css-spinner? The composition-root reparenting is the place I'd want fidelity evidence on those styles before flipping the flag on for any production-ish surface.

  • Cloud / Lambda env state. The PR body says cloud/render.ts and lambda/render.ts "silently no-op" — but resolveConfig reads PRODUCER_EXPERIMENTAL_FAST_CAPTURE globally, so if those infra surfaces happen to have the env var set (via deploy config), fast capture does activate. Is the "silently no-op" claim about the CLI flag wiring specifically (env-via-deploy-config is a supported operator escape hatch until the follow-up PR lands), or should those surfaces actively gate the env off until then?

  • macOS path. On Mac, headlessShell is null → preMode resolves to "screenshot" regardless. The new helper then flips captureMode to "drawelement" if useDrawElement is on and no SwiftShader. Per-frame branch gates BeginFrame on beginFrameTimeTicks > 0, which stays 0 in the screenshot-launched path — so the compositor advances naturally and the comment at frameCapture.ts:1402-1404 matches the behavior. Has the validation harness exercised macOS GPU, or only the Docker SwiftShader cases + the GPU-host T3?

What I didn't verify

  • Windows render compat. Render on windows-latest + Tests on windows-latest are still in-flight at review time. drawelement doesn't gate on platform, so a Windows session with the flag set would land in the same screenshot-launched + drawelement-mode path as macOS. If Windows CI surfaces something, this'll need a look.

  • useDrawElement: true with forceScreenshot: true simultaneously. resolveConfig doesn't appear to reconcile these — forceScreenshot is honored at preMode resolution, useDrawElement is honored at init-time. A session could land in drawelement mode on a screenshot-launched browser. That's actually the design for macOS GPU, so it's not necessarily wrong — flagging as "haven't dug into the operator-confusion surface."

  • --enable-features=CanvasDrawElement interaction with multiple feature-flag groups. Verified the flag is added globally in buildChromeArgs:554 and is the only --enable-features= on the command line (the other flag is --disable-features=… which is a separate category). Chrome's behavior for multiple --enable-features= flags is version-dependent, but since there's only one, this is unambiguous.

  • The integration record's PSNR=∞ claims — taken at the integration test file's word.

Clean execution; leaving as a comment.

Review by Rames D Jusso

@vanceingalls vanceingalls force-pushed the drawelement-fast-capture branch from d2ef1f2 to 7599a75 Compare June 9, 2026 20:08
Comment on lines +23 to +49
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3

- name: Build test Docker image (cached)
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
context: .
file: Dockerfile.test
load: true
tags: hyperframes-producer:test
cache-from: type=gha,scope=regression-test-image
cache-to: type=gha,mode=max,scope=regression-test-image

- name: Validate fast-capture video (drawElement + BeginFrame)
run: |
docker run --rm \
--security-opt seccomp=unconfined \
--shm-size=4g \
-e PRODUCER_VALIDATE_COMP='${{ inputs.composition }}' \
-e PRODUCER_VALIDATE_MIN_PSNR='${{ inputs.min_psnr }}' \
--workdir /app/packages/producer \
--entrypoint bunx \
hyperframes-producer:test tsx scripts/validate-fast-video.ts
vanceingalls and others added 5 commits June 9, 2026 13:24
…-fast-capture

Add an experimental frame-capture mode that reads DOM paint records directly
via Chrome's canvas.drawElementImage API instead of Page.captureScreenshot
(~46% faster on GPU), gated behind --experimental-fast-capture
(env PRODUCER_EXPERIMENTAL_FAST_CAPTURE; engine config useDrawElement).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Renders a video composition baseline-vs-fast on a native amd64 Linux runner and
asserts the fast (drawElementImage) output matches via PSNR — validating the
BeginFrame paint path captures video, which couldn't be checked locally
(macOS has no BeginFrame; Docker-on-rosetta hung).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reverted before merge; workflow_dispatch is the intended trigger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pported

Validated on a native amd64 Linux runner that per-frame BeginFrame does NOT make
drawElementImage capture video (fast-vs-baseline ~12 dB, region black) — same as
macOS. So video falls back to screenshot on every platform, not just macOS.
Make the validation workflow manual-only (it fails by design until fast video
is implemented).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the drawelement-fast-capture branch from 0b5eb1f to 153602d Compare June 9, 2026 20:41
Adds an opaque GSAP composition rendered with the experimental fast-capture
path (drawElementImage) via a new renderConfig.experimentalFastCapture meta flag.
Golden is drawElement output, so the suite now guards the canvas-injection /
drawElement capture path on the Linux/Docker CI platform. Verified passing
(visual PSNR 33-76 dB, all checkpoints).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=1080, height=1080">
<script src="https://cdnjs.cloudflare.com/ajax/libs/gsap/3.12.2/gsap.min.js"></script>
vanceingalls and others added 2 commits June 9, 2026 15:26
…e opt-out

Fast capture on hosts without BeginFrame (macOS) previously drew from a stale
snapshot — drawElementImage reads the snapshot recorded at the last paint event,
so unsynchronized captures were one frame behind and intermittently crashed with
'InvalidStateError: No cached paint record'. Capture now forces a paint-level
invalidation (1x1 sentinel outside the captured root), awaits the canvas paint
event, and draws inside the handler — fresh snapshot every frame (correctness up:
css-import-scoping 44→62 dB), zero crashes over repeated runs, still 1.33x faster
than screenshot. Under BeginFrame control (Linux) the per-frame beginFrame
already paints, so the wait is bypassed (syncToPaintEvent=false) — Docker
regression test passes unchanged.

Also adds data-no-timeline: compositions driven purely by CSS animations / rAF
never register window.__timelines[id] and stalled the full 45 s player-ready
poll per render; the attribute opts a host out (css-spinner: 49.7s → 3.9s).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Probing established the real failure mode: drawElementImage drops
layer-promoted subtrees while their transforms animate (sub-composition
entrance/zoom scenes capture black until settled). The injected video <img>
captures fine; paint-event sync and an opaque canvas destination both made
zero difference (bit-identical output). Chromium-side gap, same family as
crbug 521434899 but reproducible on GPU. HF_FAST_CAPTURE_VIDEO=true bypasses
the gate for future probing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants