fix(ci): pin chrome-headless-shell + clamp PSNR checkpoint to a valid frame #926

Open
jrusso1020 wants to merge 2 commits into main from fix/ci-chrome-pin-and-psnr-harness

@jrusso1020
Collaborator

What

Two narrow fixes to keep the regression suite green and reproducible. Stale baselines from the sub-comp refactor are being regenerated separately in #925; this PR is just the structural fixes that #925 can't make on its own.

  1. Pin chrome-headless-shell in Dockerfile.test to 148.0.7778.167 instead of @stable.
  2. Clamp the last PSNR checkpoint to a frame the video stream actually contains so the harness stops crashing on many-cuts.

Why

Chrome pin

@stable is a moving tag. Every Chrome stable promotion shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever Dockerfile.test rebuilt against a freshly-promoted stable. Pinning to the version @stable currently resolves to (matching what main's regenerated baselines were captured under) makes Chrome bumps an explicit, batched-with-baseline-regen action. The comment on the RUN line spells out the bump procedure.

PSNR-parse crash on many-cuts

runTestSuite samples 100 checkpoints across the container duration, taken as min(rendered, snapshot). Container duration includes audio padding past the last video frame: many-cuts has a 5.654s container but only 5.6s of video, which at 30fps is 168 frames (indices 0 through 167). At i=99 the raw container duration mapped to time 5.59746s (0.99 × 5.654), which rounds to frame index 168 (round(5.59746 × 30)), one past the last frame the stream contains. ffmpeg's psnr filter emits no average: line for a non-existent frame, so the harness crashed with Unable to parse PSNR output at 5.59746s. The failure is pre-existing on plain origin/main (#918 admin-merged through this same failure on shard-2). Miguel's regen via --update doesn't catch it because --update only writes the snapshot; it doesn't validate.
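For readers unfamiliar with the failure mode, here is a minimal sketch of the parse step. The helper name `parseAveragePsnr` and the sample log lines are assumptions for illustration, not the harness's actual code. The only detail taken from ffmpeg is the `average:` field on the psnr summary line.

```typescript
// Hypothetical helper: extract the `average:` value from ffmpeg psnr output.
// Returns null when no summary line is present, which is the situation
// described above for a frame the stream does not contain.
function parseAveragePsnr(stderr: string): number | null {
  const match = /average:(inf|-?\d+(?:\.\d+)?)/.exec(stderr);
  if (match === null) return null;
  // ffmpeg reports identical frames as "inf".
  return match[1] === "inf" ? Infinity : Number(match[1]);
}

// Illustrative log lines, not captured from a real run.
const summary =
  "[Parsed_psnr_0 @ 0x0] PSNR y:38.10 u:42.50 v:42.70 average:39.20 min:39.20 max:39.20";
console.log(parseAveragePsnr(summary));               // 39.2
console.log(parseAveragePsnr("frame=0 size=N/A"));    // null
```

Returning null rather than throwing is a design choice of this sketch only; the harness described here throws instead.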

Subtracting one frame interval from the sampling duration guarantees the last checkpoint always lands on a real frame.
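The many-cuts arithmetic can be checked with a small sketch. It assumes that checkpoint i is sampled at (i / 100) × duration, which is inferred from the 5.59746s figure, and that the one-frame subtraction is applied to the 5.654s container duration. The function name `lastCheckpointFrame` is hypothetical.

```typescript
// Numbers for many-cuts, taken from the description above.
const fps = 30;
const containerDuration = 5.654; // seconds, includes audio padding
const frameCount = 168;          // valid frame indices are 0..167
const checkpoints = 100;

// Assumed sampling rule: checkpoint i sits at (i / checkpoints) * duration.
function lastCheckpointFrame(duration: number): number {
  const time = ((checkpoints - 1) / checkpoints) * duration;
  return Math.round(time * fps);
}

const before = lastCheckpointFrame(containerDuration);
const after = lastCheckpointFrame(Math.max(0, containerDuration - 1 / fps));

console.log(before, before < frameCount); // 168 false
console.log(after, after < frameCount);   // 167 true
```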

How

  • Dockerfile.test: chrome-headless-shell@stable → chrome-headless-shell@148.0.7778.167 (+ a comment documenting the bump procedure).
  • packages/producer/src/regression-harness.ts: introduce sampleDuration = max(0, videoDuration - 1/fps) and use it in place of videoDuration when computing the per-checkpoint time. Also reuses the already-resolved fps variable inside the loop (was being recomputed via fpsToNumber(...) on every call to psnrAtCheckpoint).
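The shape of the harness change might look roughly like the sketch below. It is not the code in regression-harness.ts: `checkpointTimes` is a hypothetical function, and the `fpsToNumber` shown here is a stand-in whose signature is assumed, since the real helper is not shown in this PR.

```typescript
// Hypothetical stand-in for the harness's fps helper; accepts a number
// or a "num/den" string such as "30000/1001".
function fpsToNumber(fps: number | string): number {
  if (typeof fps === "number") return fps;
  const [num, den = "1"] = fps.split("/");
  return Number(num) / Number(den);
}

// Build checkpoint times over a duration shortened by one frame interval.
function checkpointTimes(
  videoDuration: number,
  rawFps: number | string,
  count = 100,
): number[] {
  const fps = fpsToNumber(rawFps); // resolved once, reused for every checkpoint
  // max(0, ...) keeps clips shorter than one frame from going negative.
  const sampleDuration = Math.max(0, videoDuration - 1 / fps);
  return Array.from({ length: count }, (_, i) => (i / count) * sampleDuration);
}

const times = checkpointTimes(5.654, "30/1");
console.log(times.length); // 100
console.log(times[0]);     // 0
```

Resolving fps outside the per-checkpoint work mirrors the second half of the change: the value does not vary between checkpoints, so there is no reason to recompute it.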

Test plan

Local Docker reproduction:

bun run --cwd packages/producer docker:build:test
bun run --cwd packages/producer docker:test many-cuts                    # ✅ green
bun run --cwd packages/producer docker:test style-3-prod style-5-prod \
                                            sub-composition-video        # ✅ green

Coordination with #925

#925 (Miguel) regenerates the style-1-prod and style-12-prod baselines that drifted after #918's compiler refactor. That's content-level regen; this PR is structural. They're independent and either can land first — the other will then merge cleanly. Closes #919 (which had both changes plus baselines that would conflict with #925).

… frame

Two narrow fixes to keep the regression suite green and reproducible.
Stale baselines from the sub-composition refactor (PR #918) are being
regenerated separately in PR #925; this PR is just the structural
fixes that PR can't make on its own.

1. **Pin `chrome-headless-shell` in `Dockerfile.test`** to
   `148.0.7778.167` instead of `@stable`. `@stable` is a moving tag;
   every Chrome stable promotion shifts pixel output enough to fail
   PSNR on the golden baselines, so the regression suite silently
   broke whenever `Dockerfile.test` rebuilt against a freshly-promoted
   stable. Pinning to the version `@stable` currently resolves to
   (matching what main's regenerated baselines were captured under)
   makes Chrome bumps an explicit, batched-with-baseline-regen
   action. Comment on the `RUN` line spells out the bump procedure.

2. **Clamp the last PSNR checkpoint to a frame the video stream
   actually contains.** `runTestSuite` samples 100 checkpoints across
   `min(rendered, snapshot)` container duration. Container duration
   includes audio padding past the last video frame — many-cuts is
   5.654s container vs 5.6s of video at 30fps = 168 frames. At i=99
   the raw container duration mapped to time 5.59746s → frame index
   168 (round(5.59746 × 30)), one past the last frame the stream
   contains. ffmpeg's `psnr` filter emits no `average:` line for a
   non-existent frame, so the harness crashed with `Unable to parse
   PSNR output at 5.59746s` — pre-existing on plain `origin/main`,
   which PR #918 admin-merged through on shard-2. Miguel's regen via
   `--update` didn't catch it because `--update` only writes the
   snapshot; it doesn't validate. Subtracting one frame interval
   from the sampling duration guarantees the last checkpoint always
   lands on a real frame.

Verified locally inside `Dockerfile.test`:

  bun run --cwd packages/producer docker:build:test
  bun run --cwd packages/producer docker:test many-cuts   # ✅ green
  bun run --cwd packages/producer docker:test style-3-prod \
    style-5-prod sub-composition-video                    # ✅ green