
Commit 3569c48

feat(producer): add Rio-style residual-RMS check to regression harness (#882)

* feat(producer): add Rio-style residual-RMS check to regression harness

The existing audio comparison in the regression harness measures the Pearson correlation between the RMS envelopes of the rendered and snapshot streams. That catches shape-level drift but is insensitive to level shifts, phase offsets, and codec-quantization noise: two streams can correlate above 0.9 while still differing audibly.

Rio's approach (rio/tests/checksum.py:compare_audio_files_ffmpeg) is sample-level: subtract the snapshot from the rendered stream, run `astats`, and read the residual Overall RMS in dBFS. Identical streams cancel to silence (-inf, or below -90 dBFS for AAC-vs-AAC); anything >= -50 dBFS is considered drift.

This commit adds the same check as an optional secondary gate:

- utils/audioRegression.ts: new `computeAudioResidualRmsDb()` that spawns ffmpeg with the same filter graph Rio uses (aresample + pan + volume=-1 + amix + astats) and returns the parsed Overall RMS plus a pass/fail flag.
- utils/audioRegression.test.ts: 3 new tests covering identical streams (-inf result), drifted streams (440 Hz vs 880 Hz sine), and an input with no audio stream.
- regression-harness.ts: optional `maxAudioResidualRmsDb` field in meta.json. The default is undefined (skip the check), so legacy fixtures aren't retroactively gated; new fixtures opt in by setting a threshold (e.g. -50). The harness emits `residualRmsDb` in the audio_comparison_complete JSON event and in the pretty log line.

The existing correlation check stays in place; the new residual check is independent. They measure complementary properties (shape vs. sample cancellation), and both should hold for a faithful render.

* fix(producer): harden residual-RMS check (parser, duration guard, error surfacing)

Addresses review feedback on PR #882:

- Stateful astats parse: modern ffmpeg emits `Overall` on its own line followed by per-stat lines, so the single-line `Overall RMS level dB:` regex never fires on 6.x/7.x/8.x. Find the `Overall` header, then take the next `RMS level dB:` line. The single-line fallback is preserved for 4.x.
- Pre-probe both inputs' audio durations and fail up front if they differ by more than 5 ms; `amix=duration=shortest` was silently masking trailing audio differences.
- Surface ffmpeg/ffprobe spawn errors, signal kills, and non-zero exits with a stderr tail. Previously every failure mode collapsed into "NaN, fail" with no diagnostic.
- Extend `TestResult.audio` with `residualRmsDb` + `residualError`, and propagate both to `audio-failures.json`.
- Fix the `residualSuffix` formatter: NaN (a real failure) was being rendered as "-inf dBFS" (a perfect match). Branch on `Number.isNaN` separately from `Number.isFinite` and add an explicit error label.
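The core of the residual check can be modeled without ffmpeg. Below is a minimal sketch, assuming both streams are already decoded to mono float PCM in [-1, 1] at the same sample rate; the actual `computeAudioResidualRmsDb()` delegates decoding, resampling, subtraction and measurement to ffmpeg, so its numbers will differ (for example through resampling, mixing gain and 16-bit quantization). `residualRmsDb()`, `ResidualVerdict` and `sine()` are hypothetical names. The sketch applies the 5 ms duration guard up front, subtracts the snapshot from the rendered stream, and reports the residual RMS in dBFS (20·log10 of the RMS, with 1.0 as full scale).

```typescript
// Hypothetical, simplified model of the residual-RMS gate. Assumes mono float
// PCM in [-1, 1] at one shared sample rate; the harness uses ffmpeg instead.
type ResidualVerdict = { ok: boolean; overallDb: number; error?: string };

function residualRmsDb(
  rendered: Float32Array,
  snapshot: Float32Array,
  sampleRate: number,
  maxResidualDb = -50,
  maxDurationDeltaMs = 5,
): ResidualVerdict {
  // Duration guard: refuse to compare rather than silently truncating to the shorter stream.
  const deltaMs = (Math.abs(rendered.length - snapshot.length) / sampleRate) * 1000;
  if (deltaMs > maxDurationDeltaMs) {
    return { ok: false, overallDb: Number.NaN, error: `duration mismatch of ${deltaMs.toFixed(1)} ms` };
  }
  const n = Math.min(rendered.length, snapshot.length);
  if (n === 0) {
    return { ok: false, overallDb: Number.NaN, error: "empty input" };
  }
  // Subtract the snapshot from the rendered stream and accumulate residual energy.
  let sumSq = 0;
  for (let i = 0; i < n; i++) {
    const d = rendered[i] - snapshot[i];
    sumSq += d * d;
  }
  const rms = Math.sqrt(sumSq / n);
  // 20 * log10(0) is -Infinity in JavaScript, i.e. perfect cancellation.
  const overallDb = 20 * Math.log10(rms);
  // Pass when the residual is at or below the threshold.
  return { ok: overallDb <= maxResidualDb, overallDb };
}

// Test-signal helper; amplitude 1/8 matches the default of lavfi's `sine` source.
function sine(freq: number, sampleRate: number, seconds: number, amplitude = 0.125): Float32Array {
  const out = new Float32Array(Math.round(sampleRate * seconds));
  for (let i = 0; i < out.length; i++) {
    out[i] = amplitude * Math.sin((2 * Math.PI * freq * i) / sampleRate);
  }
  return out;
}

const sr = 48000;
const ref = sine(440, sr, 1);
console.log(residualRmsDb(ref, sine(440, sr, 1), sr)); // { ok: true, overallDb: -Infinity }
console.log(residualRmsDb(ref, sine(880, sr, 1), sr).overallDb.toFixed(2)); // -18.06
console.log(residualRmsDb(ref, sine(440, sr, 1.01), sr).ok); // false (10 ms longer)
```

With identical inputs the residual is exactly zero. With 440 Hz vs 880 Hz the tones are orthogonal over a whole number of cycles, so the residual's mean square is the sum of the two tones' (A²/2 each) and its RMS equals the amplitude, 0.125 or about -18.06 dBFS. This sketch passes at exactly the threshold, following the `maxAudioResidualRmsDb` doc comment ("at-or-below -50 dBFS") rather than the "anything >= -50 dBFS is considered drift" wording above.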
1 parent cad9160 commit 3569c48

3 files changed

Lines changed: 476 additions & 5 deletions


packages/producer/src/regression-harness.ts

Lines changed: 93 additions & 3 deletions
@@ -18,7 +18,11 @@ import { createRenderJob, executeRenderJob } from "./services/renderOrchestrator
 import { compileForRender } from "./services/htmlCompiler.js";
 import { validateCompilation } from "./services/compilationTester.js";
 import { extractMediaMetadata } from "./utils/ffprobe.js";
-import { buildRmsEnvelope, compareAudioEnvelopes } from "./utils/audioRegression.js";
+import {
+  buildRmsEnvelope,
+  compareAudioEnvelopes,
+  computeAudioResidualRmsDb,
+} from "./utils/audioRegression.js";
 import { parseFps, fpsToNumber } from "@hyperframes/core";
 import {
   checkDistributedSupport,
@@ -38,6 +42,15 @@ type TestMetadata = {
   maxFrameFailures: number;
   minAudioCorrelation: number;
   maxAudioLagWindows: number;
+  /**
+   * Optional residual-RMS check. Subtracts the baseline (snapshot) audio
+   * from the rendered audio and reads the residual Overall RMS via
+   * `astats`. A value of `-50` treats residuals at or below -50 dBFS as
+   * effectively silent, i.e. the streams are sample-level equivalent.
+   * Omit (undefined) to skip the check; fixtures authored before this
+   * field was introduced have an implicit `undefined`.
+   */
+  maxAudioResidualRmsDb?: number;
   renderConfig: {
     /**
      * Frame rate. Stored on disk as a JSON number (integer fps, e.g. `30`)
@@ -140,6 +153,15 @@ type TestResult = {
     passed: boolean;
     correlation: number;
     lagWindows: number;
+    /**
+     * Residual Overall RMS (dBFS) of `rendered - snapshot`. Present only
+     * when the fixture opts in via `meta.maxAudioResidualRmsDb`.
+     * `Number.NEGATIVE_INFINITY` ⇒ perfect cancellation. `NaN` ⇒ residual
+     * check could not run (missing ffmpeg, duration mismatch, ...); see
+     * `audio.residualError` for the reason.
+     */
+    residualRmsDb?: number;
+    residualError?: string;
   };
   renderedOutputPath?: string;
 };
@@ -153,6 +175,28 @@ function logPretty(message: string, emoji = "•") {
   console.error(`${emoji} ${message}`);
 }

+/**
+ * Format the residual-RMS suffix used in the audio-quality log line.
+ *
+ * Four states must surface distinctly:
+ *   • `null` → fixture didn't opt into residual RMS → "" (no suffix)
+ *   • `NaN` → check ran but produced no parseable reading → "error (...)"
+ *   • `-Infinity` → perfect cancellation (identical streams) → "-inf dBFS"
+ *   • finite number → measured residual → "<value> dBFS"
+ *
+ * Pre-fix this branched on `Number.isFinite()` only, collapsing NaN
+ * (a real-failure signal) into the `-inf` label (a perfect-match signal).
+ */
+function formatResidualSuffix(residualRmsDb: number | null, error: string | undefined): string {
+  if (residualRmsDb === null && !error) return "";
+  if (error) return `, residualRMS: error (${error})`;
+  if (residualRmsDb === null || Number.isNaN(residualRmsDb)) {
+    return ", residualRMS: error (no parseable reading)";
+  }
+  if (!Number.isFinite(residualRmsDb)) return ", residualRMS: -inf dBFS";
+  return `, residualRMS: ${residualRmsDb.toFixed(2)} dBFS`;
+}
+
 function parseArgs(argv: string[]): CliOptions {
   const testNames: string[] = [];
   const excludeTags: string[] = [];
@@ -229,6 +273,12 @@ function validateMetadata(meta: unknown): TestMetadata {
   if (typeof m.maxAudioLagWindows !== "number" || m.maxAudioLagWindows < 1) {
     throw new Error("meta.json: 'maxAudioLagWindows' must be >= 1");
   }
+  if (
+    m.maxAudioResidualRmsDb !== undefined &&
+    (typeof m.maxAudioResidualRmsDb !== "number" || !Number.isFinite(m.maxAudioResidualRmsDb))
+  ) {
+    throw new Error("meta.json: 'maxAudioResidualRmsDb' must be a finite number when present");
+  }
   if (!m.renderConfig || typeof m.renderConfig !== "object") {
     throw new Error("meta.json: 'renderConfig' must be an object");
   }
@@ -671,16 +721,29 @@ function saveFailureDetails(

   // Save audio failures
   if (result.audio && !result.audio.passed) {
+    const residualRmsDb = result.audio.residualRmsDb;
+    const residualError = result.audio.residualError;
+    const residualThreshold = suite.meta.maxAudioResidualRmsDb;
+    const residualExceeds =
+      residualThreshold !== undefined &&
+      typeof residualRmsDb === "number" &&
+      Number.isFinite(residualRmsDb) &&
+      residualRmsDb > residualThreshold;
     const audioReport = {
       summary: {
         correlation: result.audio.correlation,
         lagWindows: result.audio.lagWindows,
         threshold: suite.meta.minAudioCorrelation,
         maxLagWindows: suite.meta.maxAudioLagWindows,
+        ...(residualRmsDb !== undefined ? { residualRmsDb } : {}),
+        ...(residualThreshold !== undefined ? { residualThreshold } : {}),
+        ...(residualError ? { residualError } : {}),
       },
       analysis: {
         correlationBelowThreshold: result.audio.correlation < suite.meta.minAudioCorrelation,
         lagExceedsLimit: Math.abs(result.audio.lagWindows) > suite.meta.maxAudioLagWindows,
+        residualExceedsThreshold: residualExceeds,
+        residualCheckFailed: residualError !== undefined,
       },
     };

@@ -1051,6 +1114,8 @@ async function runTestSuite(
   let audioPassed = true;
   let audioCorrelation = 1;
   let audioLagWindows = 0;
+  let audioResidualRmsDb: number | null = null;
+  let audioResidualError: string | undefined;

   if (!isPngSequence) {
     logPretty("Comparing audio quality...", "🔊");
@@ -1068,13 +1133,35 @@ async function runTestSuite(
       audioCorrelation = audio.correlation;
       audioLagWindows = audio.lagWindows;
       audioPassed = audio.correlation >= suite.meta.minAudioCorrelation;
+
+      // Sample-level residual-RMS check (complementary to the
+      // envelope-correlation gate above). Only runs when the fixture
+      // opts in via `maxAudioResidualRmsDb`; the correlation gate
+      // stays in place either way for legacy fixtures. Correlation
+      // measures shape similarity at envelope granularity; residual
+      // RMS measures sample-level cancellation. The two surface
+      // different drift classes.
+      if (suite.meta.maxAudioResidualRmsDb !== undefined) {
+        const residual = computeAudioResidualRmsDb(
+          renderedOutputPath,
+          snapshotVideoPath,
+          suite.meta.maxAudioResidualRmsDb,
+        );
+        audioResidualRmsDb = residual.overallDb;
+        audioResidualError = residual.error;
+        if (!residual.ok) {
+          audioPassed = false;
+        }
+      }
     }
   }

   result.audio = {
     passed: audioPassed,
     correlation: audioCorrelation,
     lagWindows: audioLagWindows,
+    ...(audioResidualRmsDb !== null ? { residualRmsDb: audioResidualRmsDb } : {}),
+    ...(audioResidualError ? { residualError: audioResidualError } : {}),
   };

   console.log(
@@ -1084,17 +1171,20 @@ async function runTestSuite(
       passed: audioPassed,
       correlation: audioCorrelation,
       lagWindows: audioLagWindows,
+      residualRmsDb: audioResidualRmsDb,
+      residualError: audioResidualError,
     }),
   );

+  const residualSuffix = formatResidualSuffix(audioResidualRmsDb, audioResidualError);
   if (audioPassed) {
     logPretty(
-      `Audio quality: PASSED (correlation: ${audioCorrelation.toFixed(3)}, lag: ${audioLagWindows})`,
+      `Audio quality: PASSED (correlation: ${audioCorrelation.toFixed(3)}, lag: ${audioLagWindows}${residualSuffix})`,
       "✓",
     );
   } else {
     logPretty(
-      `Audio quality: FAILED (correlation: ${audioCorrelation.toFixed(3)}, threshold: ${suite.meta.minAudioCorrelation})`,
+      `Audio quality: FAILED (correlation: ${audioCorrelation.toFixed(3)}, threshold: ${suite.meta.minAudioCorrelation}${residualSuffix})`,
       "✗",
     );
   }
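One thing worth checking in the `audio_comparison_complete` event: `JSON.stringify` serializes both `NaN` and `-Infinity` as `null`, and `audioResidualRmsDb` is already `null` when the fixture did not opt in. A consumer of that JSON line therefore cannot tell perfect cancellation from a skipped check by `residualRmsDb` alone (`residualError` still flags the failure case). The helper below, `encodeResidualForJson()`, is a hypothetical sketch of one way to keep the states distinct; the `"-inf"` and `"error"` string conventions are assumptions, not something the harness does.

```typescript
// Hypothetical helper: map a residual reading onto JSON-safe values so the
// states survive JSON.stringify. Not part of the harness as committed.
type JsonResidual = number | "-inf" | "error" | null;

function encodeResidualForJson(residualRmsDb: number | null): JsonResidual {
  if (residualRmsDb === null) return null; // fixture did not opt in
  if (residualRmsDb === Number.NEGATIVE_INFINITY) return "-inf"; // perfect cancellation
  if (!Number.isFinite(residualRmsDb)) return "error"; // NaN (or +Infinity): no usable reading
  return residualRmsDb; // finite measurement in dBFS
}

// Plain serialization collapses the states:
console.log(JSON.stringify({ residualRmsDb: Number.NEGATIVE_INFINITY })); // {"residualRmsDb":null}
console.log(JSON.stringify({ residualRmsDb: Number.NaN })); // {"residualRmsDb":null}
// With the encoder applied first:
console.log(JSON.stringify({ residualRmsDb: encodeResidualForJson(Number.NEGATIVE_INFINITY) })); // {"residualRmsDb":"-inf"}
```

The pretty log line does not have this problem, since `formatResidualSuffix()` already branches on each state before formatting.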

packages/producer/src/utils/audioRegression.test.ts

Lines changed: 90 additions & 2 deletions
@@ -1,5 +1,13 @@
-import { describe, expect, it } from "vitest";
-import { buildRmsEnvelope, compareAudioEnvelopes } from "./audioRegression.js";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, rmSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterAll, beforeAll, describe, expect, it } from "vitest";
+import {
+  buildRmsEnvelope,
+  compareAudioEnvelopes,
+  computeAudioResidualRmsDb,
+} from "./audioRegression.js";

 describe("compareAudioEnvelopes", () => {
   it("treats silent-vs-silent audio as a perfect match", () => {
@@ -14,3 +22,83 @@ describe("compareAudioEnvelopes", () => {
     });
   });
 });
+
+// Skip the spawn-based tests entirely on hosts without ffmpeg. The
+// regression harness only runs in environments where ffmpeg is present
+// (`Dockerfile.test`, dev boxes with apt's ffmpeg), so an absent ffmpeg
+// is a developer-laptop fact, not a producer regression.
+const HAS_FFMPEG = spawnSync("ffmpeg", ["-version"], { encoding: "utf-8" }).status === 0;
+
+describe.skipIf(!HAS_FFMPEG)("computeAudioResidualRmsDb", () => {
+  let tmp: string;
+
+  beforeAll(() => {
+    tmp = mkdtempSync(join(tmpdir(), "hf-audio-residual-test-"));
+    // Three test WAVs: two identical 1-second 440 Hz sines, and an 880 Hz
+    // sine that's audibly different from the 440 Hz reference.
+    for (const [name, freq] of [
+      ["sine-440-a.wav", 440],
+      ["sine-440-b.wav", 440],
+      ["sine-880.wav", 880],
+    ] as const) {
+      const result = spawnSync(
+        "ffmpeg",
+        [
+          "-nostdin",
+          "-v",
+          "error",
+          "-f",
+          "lavfi",
+          "-i",
+          `sine=frequency=${freq}:duration=1:sample_rate=48000`,
+          "-ac",
+          "2",
+          "-c:a",
+          "pcm_s16le",
+          join(tmp, name),
+        ],
+        { encoding: "utf-8" },
+      );
+      if (result.status !== 0) {
+        throw new Error(`ffmpeg setup failed for ${name}: ${result.stderr}`);
+      }
+    }
+  });
+
+  afterAll(() => {
+    rmSync(tmp, { recursive: true, force: true });
+  });
+
+  it("returns -inf (or very low dBFS) for two identical streams", () => {
+    const result = computeAudioResidualRmsDb(
+      join(tmp, "sine-440-a.wav"),
+      join(tmp, "sine-440-b.wav"),
+    );
+    expect(result.ok).toBe(true);
+    // 440-vs-440 PCM cancels to silence; ffmpeg reports -inf which we
+    // normalize to NEGATIVE_INFINITY, OR a value well below -90 if the
+    // resampler introduces sub-bit-quantization noise.
+    expect(result.overallDb).toBeLessThan(-80);
+  });
+
+  it("fails when streams are audibly different (440 Hz vs 880 Hz)", () => {
+    const result = computeAudioResidualRmsDb(
+      join(tmp, "sine-440-a.wav"),
+      join(tmp, "sine-880.wav"),
+    );
+    expect(result.ok).toBe(false);
+    // The residual of two different sines doesn't cancel: it stays near
+    // the input level (lavfi's `sine` source defaults to 1/8 amplitude,
+    // roughly -21 dBFS RMS) instead of dropping toward silence.
+    expect(result.overallDb).toBeGreaterThan(-30);
+  });
+
+  it("reports ok=false when an input has no audio stream", () => {
+    // A bare empty file: ffmpeg can't probe it, so the function reports
+    // a parse failure (ok=false, NaN). Callers decide whether to treat
+    // that as a pass (no-audio fixture) or a fail (audio expected).
+    const result = computeAudioResidualRmsDb("/dev/null", join(tmp, "sine-440-a.wav"));
+    expect(result.ok).toBe(false);
+    expect(Number.isNaN(result.overallDb)).toBe(true);
+  });
+});
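The tests above treat `computeAudioResidualRmsDb()` as a black box, and its source is not part of this view. The stateful `astats` parse described in the commit message can be sketched as a pure function over ffmpeg's stderr, assuming the 6.x+ layout (an `Overall` header on its own line followed by per-stat lines such as `RMS level dB: ...`) with the single-line `Overall RMS level dB:` form as a fallback. `parseOverallRmsDb()` and `parseDbToken()` are hypothetical names, and the sample log is abbreviated and illustrative rather than captured verbatim.

```typescript
// Hypothetical sketch of the stateful astats parse described in the commit
// message. Returns -Infinity for a "-inf" reading and NaN when nothing
// parseable is found.
function parseDbToken(token: string): number {
  if (token === "-inf") return Number.NEGATIVE_INFINITY;
  const value = Number.parseFloat(token);
  return Number.isFinite(value) ? value : Number.NaN;
}

function parseOverallRmsDb(stderr: string): number {
  let inOverall = false;
  for (const line of stderr.split(/\r?\n/)) {
    // Drop ffmpeg's "[Parsed_astats_N @ 0x...]" log prefix, if present.
    const text = line.replace(/^\[[^\]]*\]\s*/, "").trim();
    if (text === "Overall") {
      // Modern layout: the header sits on its own line.
      inOverall = true;
      continue;
    }
    // Per-channel "RMS level dB:" lines come first and are ignored
    // until the Overall header has been seen.
    const token = inOverall ? /^RMS level dB:\s*(\S+)/.exec(text)?.[1] : undefined;
    if (token !== undefined) return parseDbToken(token);
  }
  // Fallback for older ffmpeg that prints "Overall RMS level dB:" on one line.
  const single = /Overall RMS level dB:\s*(\S+)/.exec(stderr)?.[1];
  return single !== undefined ? parseDbToken(single) : Number.NaN;
}

// Abbreviated, illustrative log in the modern layout.
const sample = [
  "[Parsed_astats_4 @ 0x0] Channel: 1",
  "[Parsed_astats_4 @ 0x0] RMS level dB: -30.000000",
  "[Parsed_astats_4 @ 0x0] Overall",
  "[Parsed_astats_4 @ 0x0] DC offset: 0.000000",
  "[Parsed_astats_4 @ 0x0] RMS level dB: -62.500000",
].join("\n");

console.log(parseOverallRmsDb(sample)); // -62.5
console.log(parseOverallRmsDb("[Parsed_astats_4 @ 0x0] Overall RMS level dB: -48.2")); // -48.2
```

Anchoring on the `Overall` header is what keeps the per-channel `RMS level dB:` line, which appears first, from being mistaken for the overall reading; `-inf` needs explicit handling because `Number.parseFloat("-inf")` is `NaN`.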
