Commit 95d2a94

fix(engine): sample-accurate volume automation so dense fades keep their audio (#1117)
Animated media volume (GSAP/JS fades) dropped the audio track entirely for dense fades. The 60 Hz timeline probe emits 100–300 keyframes for a multi-second fade, and these were folded into an FFmpeg `volume` expression that nests one `if(lt(t,...))` per keyframe. Past ~95 nested levels (build-dependent, and lower on some Linux ffmpeg builds) the expression overflows FFmpeg's evaluator. Filter-graph init then fails, the whole mix fails, and the muxer omits audio, so a `data-volume="0"` fade-in rendered with no audio at all. This is a follow-up to #1066, and it is why #1064's own scenario regressed once the fade was dense enough.

Volume automation is now applied as sample-accurate gain, layered so audio is never lost:

1. Primary: bake the envelope into the prepared PCM samples in-process (audioVolumeEnvelope.ts). The track WAV is always pcm_s16le/48k/stereo, so its samples are multiplied by the interpolated envelope, the result is atomically renamed into place, and the track is mixed at unity. There is no expression and no keyframe ceiling, the gain is exact at every sample, and the downstream ffmpeg amix/AAC encode is untouched, so golden baselines change only where a fade is applied. The RIFF parser scans chunks in any order and accepts only 16-bit PCM, falling back otherwise. The output is written to a randomly named sibling file and then renamed, so a crash cannot leave a truncated WAV and nothing is written to a predictable path.
2. Fallback: for the rare WAV that is not 16-bit PCM, an RDP-simplified ffmpeg `volume` expression (0.5% tolerance, capped at 32 segments). The 0.5% tolerance keeps the rendered envelope within ~0.2 dB of the source curve.
3. Backstop: if an automated mix still fails, retry once at base volume and surface the degradation rather than dropping the track.

This mirrors how OSS NLEs render automation (sample-level gain): MoviePy, Kdenlive/Shotcut (MLT), Remotion. Verified end to end: a 297-keyframe fade that previously rendered with no audio now bakes all 297 keyframes sample-accurately.
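Step 1 amounts to evaluating the envelope once per sample frame and scaling every channel of that frame by the result. A minimal sketch on an already-decoded, interleaved `Int16Array` follows. `bakeEnvelope` and `VolumeKeyframe` are hypothetical names rather than the exports of audioVolumeEnvelope.ts, and holding `baseVolume` before the first keyframe is an assumed reading of the "base hold" behaviour.

```typescript
// Hypothetical sketch, not the audioVolumeEnvelope.ts implementation.
interface VolumeKeyframe {
  time: number; // composition time in seconds
  volume: number; // linear gain (1 = unity)
}

/**
 * Scale interleaved 16-bit PCM in place by a piecewise-linear gain envelope.
 * Assumptions: keyframes are sorted by time; before the first keyframe the
 * track plays at `baseVolume`; after the last keyframe its value is held.
 */
function bakeEnvelope(
  samples: Int16Array,
  channels: number,
  sampleRate: number,
  keyframes: VolumeKeyframe[],
  trackStart: number, // composition time of the first sample frame
  baseVolume: number,
): void {
  const frames = Math.floor(samples.length / channels);
  let k = 0; // cursor into keyframes; only ever moves forward
  for (let frame = 0; frame < frames; frame += 1) {
    const t = trackStart + frame / sampleRate;
    let gain: number;
    if (keyframes.length === 0 || t < keyframes[0]!.time) {
      gain = baseVolume; // assumed base hold
    } else {
      // Advance to the last keyframe at or before t (skips duplicate times).
      while (k + 1 < keyframes.length && keyframes[k + 1]!.time <= t) k += 1;
      const a = keyframes[k]!;
      const b = keyframes[k + 1];
      gain =
        b === undefined
          ? a.volume // tail hold
          : a.volume + ((b.volume - a.volume) * (t - a.time)) / (b.time - a.time);
    }
    for (let c = 0; c < channels; c += 1) {
      const i = frame * channels + c;
      // Int16Array wraps on overflow, so clamp to the 16-bit range first.
      samples[i] = Math.max(-32768, Math.min(32767, Math.round(samples[i]! * gain)));
    }
  }
}
```

Because frames are visited in time order, the cursor never moves backwards, so the cost is linear in samples plus keyframes no matter how densely the probe sampled the fade.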
Adds unit tests for sample-accurate gain, track-start offset, base/tail holds, thousands of keyframes, order-independent chunk parsing, and format rejection, plus mixer regression tests for bounded nesting and the base-volume backstop.
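The order-independent chunk parsing and format rejection mentioned above could look roughly like the following. `parsePcm16Wav` and `Pcm16Layout` are hypothetical names; accepting only format tag 1 (so `WAVE_FORMAT_EXTENSIBLE`, tag 0xFFFE, is rejected and routed to the expression fallback) is an assumption of this sketch.

```typescript
interface Pcm16Layout {
  channels: number;
  sampleRate: number;
  dataOffset: number; // byte offset of the first sample
  dataLength: number; // byte length of the sample data
}

// Read a four-character chunk ID such as "RIFF" or "fmt ".
function fourCC(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

// Hypothetical: return the layout of a 16-bit integer PCM WAV, or null so the
// caller can fall back to the ffmpeg expression path.
function parsePcm16Wav(bytes: Uint8Array): Pcm16Layout | null {
  if (bytes.byteLength < 12) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (fourCC(view, 0) !== "RIFF" || fourCC(view, 8) !== "WAVE") return null;

  let format = -1;
  let channels = 0;
  let sampleRate = 0;
  let bitsPerSample = 0;
  let dataOffset = -1;
  let dataLength = 0;

  // Walk every chunk: "fmt " and "data" may appear in any order, and unknown
  // chunks (LIST, fact, ...) are skipped using their declared size.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = fourCC(view, offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt " && size >= 16 && body + 16 <= bytes.byteLength) {
      format = view.getUint16(body, true); // 1 = integer PCM
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === "data") {
      dataOffset = body;
      // Clamp oversized (e.g. streamed) sizes to the bytes actually present.
      dataLength = Math.min(size, bytes.byteLength - body);
    }
    offset = body + size + (size % 2); // chunk bodies are padded to an even length
  }

  if (format !== 1 || bitsPerSample !== 16 || channels === 0 || dataOffset < 0) return null;
  return { channels, sampleRate, dataOffset, dataLength };
}
```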
1 parent b1f9587 commit 95d2a94

4 files changed

Lines changed: 618 additions & 44 deletions


packages/engine/src/services/audioMixer.test.ts

Lines changed: 115 additions & 0 deletions
@@ -108,6 +108,121 @@ describe("processCompositionAudio", () => {
     expect(filter).toContain("adelay=2000|2000");
   });

+  it("bounds expression nesting for dense keyframe automation without dropping the envelope", async () => {
+    const baseDir = mkdtempSync(join(tmpdir(), "hf-audio-base-"));
+    const workDir = mkdtempSync(join(tmpdir(), "hf-audio-work-"));
+    tempDirs.push(baseDir, workDir);
+
+    writeFileSync(join(baseDir, "bgm.wav"), "stub");
+
+    // Mirrors the 60 Hz timeline probe: a 10s eased fade emits hundreds of
+    // keyframes. The nested-if volume expression must not grow one level per
+    // keyframe — past ~95 levels FFmpeg fails filter-graph init and the audio
+    // track is dropped entirely (GH #1066 follow-up).
+    const keyframes = Array.from({ length: 300 }, (_, i) => {
+      const time = (i / 299) * 10;
+      const volume =
+        time < 3 ? 0.8 * (time / 3) ** 2 : time < 7 ? 0.8 : 0.8 * (1 - (time - 7) / 3) ** 2;
+      return { time, volume };
+    });
+
+    const result = await processCompositionAudio(
+      [
+        {
+          id: "bgm",
+          src: "bgm.wav",
+          start: 0,
+          end: 10,
+          mediaStart: 0,
+          layer: 0,
+          volume: 0,
+          volumeKeyframes: keyframes,
+          type: "audio",
+        },
+      ],
+      baseDir,
+      workDir,
+      join(baseDir, "out.m4a"),
+      10,
+    );
+
+    expect(result.success).toBe(true);
+
+    const mixArgs = runFfmpegMock.mock.calls[1]?.[0];
+    const filterIndex = mixArgs.indexOf("-filter_complex");
+    const filter = mixArgs[filterIndex + 1];
+
+    // One nested `if(lt(...))` is emitted per segment; cap it well under the
+    // FFmpeg evaluator's nesting limit (MAX_VOLUME_SEGMENTS = 32).
+    const nestingDepth = (filter.match(/if\(lt\(t/g) ?? []).length;
+    expect(nestingDepth).toBeGreaterThan(1);
+    expect(nestingDepth).toBeLessThan(32);
+
+    // The simplified envelope still spans the clip: silent start, audible peak.
+    expect(filter).toContain(":eval=frame");
+    expect(filter).toMatch(/volume=if\(lt\(t\\,[0-9.]+\)\\,0\+/);
+  });
+
+  it("falls back to a static-volume mix instead of dropping audio when the automated mix fails", async () => {
+    const baseDir = mkdtempSync(join(tmpdir(), "hf-audio-base-"));
+    const workDir = mkdtempSync(join(tmpdir(), "hf-audio-work-"));
+    tempDirs.push(baseDir, workDir);
+
+    writeFileSync(join(baseDir, "bgm.wav"), "stub");
+
+    // Simulate an ffmpeg build that rejects the automation expression: the
+    // first mix attempt fails, the static-volume retry succeeds. (prepare =
+    // call 0, automated mix = call 1, fallback mix = call 2.)
+    runFfmpegMock
+      .mockImplementationOnce(async () => ({
+        success: true,
+        durationMs: 1,
+        stderr: "",
+        exitCode: 0,
+      }))
+      .mockImplementationOnce(async () => ({
+        success: false,
+        durationMs: 1,
+        stderr: "Error initializing filters",
+        exitCode: 234,
+      }));
+
+    const result = await processCompositionAudio(
+      [
+        {
+          id: "bgm",
+          src: "bgm.wav",
+          start: 0,
+          end: 5,
+          mediaStart: 0,
+          layer: 0,
+          volume: 0.8,
+          volumeKeyframes: [
+            { time: 0, volume: 0.8 },
+            { time: 5, volume: 0 },
+          ],
+          type: "audio",
+        },
+      ],
+      baseDir,
+      workDir,
+      join(baseDir, "out.m4a"),
+      5,
+    );
+
+    expect(result.success).toBe(true);
+    expect(result.tracksProcessed).toBe(1);
+    expect(runFfmpegMock).toHaveBeenCalledTimes(3);
+    // Degradation is surfaced, not silent — the track rendered at base volume.
+    expect(result.error).toMatch(/base volume/i);
+
+    // The fallback mix omits the automation expression (base volume only).
+    const fallbackArgs = runFfmpegMock.mock.calls[2]?.[0];
+    const fallbackFilter = fallbackArgs[fallbackArgs.indexOf("-filter_complex") + 1];
+    expect(fallbackFilter).not.toContain(":eval=frame");
+    expect(fallbackFilter).toContain("volume=0.8");
+  });
+
   it("prepares percent-encoded non-Latin audio srcs from decoded filesystem paths", async () => {
     const baseDir = mkdtempSync(join(tmpdir(), "hf-audio-base-"));
     const workDir = mkdtempSync(join(tmpdir(), "hf-audio-work-"));
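The nesting test counts `if(lt(t` occurrences because the fallback expression adds one nested conditional per envelope segment. The hypothetical builder below mirrors the shape the test's regex expects, a linear ramp guarded by `if(lt(t,t1),...)`; the ramp form `v0+(v1-v0)*(t-t0)/(t1-t0)` is an assumption, it is not the exact string `buildVolumeExpression` emits, and it omits the comma escaping done by `escapeExpressionCommas`.

```typescript
interface Keyframe {
  time: number; // seconds, strictly increasing
  volume: number; // linear gain
}

// Hypothetical: one if(lt(t,t1),ramp,rest) wrapper per segment, innermost
// being the final keyframe's volume (held after the last keyframe).
function nestedVolumeExpression(keyframes: Keyframe[]): string {
  if (keyframes.length === 0) return "1"; // assumption: no automation means unity gain
  let expression = String(keyframes[keyframes.length - 1]!.volume);
  for (let i = keyframes.length - 2; i >= 0; i -= 1) {
    const a = keyframes[i]!;
    const b = keyframes[i + 1]!;
    const ramp = `${a.volume}+(${b.volume - a.volume})*(t-${a.time})/${b.time - a.time}`;
    expression = `if(lt(t,${b.time}),${ramp},${expression})`;
  }
  return expression;
}

// Same counting rule as the test: one match per nested conditional.
function nestingDepth(expression: string): number {
  return (expression.match(/if\(lt\(t/g) ?? []).length;
}
```

A raw 300-keyframe probe output nests 299 levels, far past the ~95-level failure point, while 32 retained keyframes nest 31 levels, which is what `toBeLessThan(32)` checks.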

packages/engine/src/services/audioMixer.ts

Lines changed: 162 additions & 44 deletions
@@ -14,6 +14,7 @@ import { runFfmpeg } from "../utils/runFfmpeg.js";
 import { unwrapTemplate } from "../utils/htmlTemplate.js";
 import { resolveProjectRelativeSrc } from "./videoFrameExtractor.js";
 import type { AudioElement, AudioTrack, MixResult } from "./audioMixer.types.js";
+import { applyVolumeEnvelopeToWav } from "./audioVolumeEnvelope.js";

 export type { AudioElement, MixResult } from "./audioMixer.types.js";

@@ -30,10 +31,89 @@ function escapeExpressionCommas(expression: string): string {
   return expression.replace(/\\/g, "\\\\").replace(/,/g, "\\,");
 }

-function buildVolumeExpression(track: AudioTrack): string {
+/**
+ * Upper bound on volume-automation keyframes folded into the FFmpeg `volume`
+ * expression. The expression nests one `if(lt(...))` per keyframe, and
+ * FFmpeg's expression evaluator has a finite nesting depth: past ~95 levels
+ * (build-dependent — lower on some Linux ffmpeg builds) `volume=...:eval=frame`
+ * fails filter-graph init, which fails the whole mix and drops the audio track
+ * entirely. The 60 Hz timeline probe routinely emits 100–300 keyframes for a
+ * multi-second fade (GH #1066 follow-up: a 171-keyframe GSAP fade rendered with
+ * no audio). 32 segments keeps a wide safety margin and is far more resolution
+ * than a piecewise-linear volume envelope needs.
+ */
+const MAX_VOLUME_SEGMENTS = 32;
+
+/**
+ * Volume delta below which a keyframe is collinear enough to drop. Kept tight
+ * (0.5% linear) so the rendered piecewise-linear envelope tracks the GSAP curve
+ * the browser plays in preview to within ~0.2 dB across the audible range — well
+ * under the ~1 dB loudness JND, so render stays WYSIWYG with preview. A full
+ * ease-in/ease-out fade still reduces to ~25 segments, inside MAX_VOLUME_SEGMENTS.
+ */
+const VOLUME_SIMPLIFY_EPSILON = 0.005;
+
+/**
+ * Reduce a sorted keyframe list to a perceptually-equivalent piecewise-linear
+ * envelope with a bounded segment count.
+ *
+ * Ramer–Douglas–Peucker drops control points lying within
+ * `VOLUME_SIMPLIFY_EPSILON` of the line through their neighbours (a linear fade
+ * collapses to its two endpoints; an eased fade to a handful). A uniform
+ * downsample backstop then bounds pathological inputs (e.g. audio-rate volume
+ * oscillation) to `MAX_VOLUME_SEGMENTS`. Endpoints are always preserved so the
+ * envelope still spans the full clip.
+ */
+function simplifyVolumeKeyframes(
+  keyframes: { time: number; volume: number }[],
+): { time: number; volume: number }[] {
+  if (keyframes.length < 3) return keyframes;
+
+  const keep = new Array<boolean>(keyframes.length).fill(false);
+  keep[0] = true;
+  keep[keyframes.length - 1] = true;
+  const stack: [number, number][] = [[0, keyframes.length - 1]];
+  while (stack.length > 0) {
+    const [startIndex, endIndex] = stack.pop()!;
+    const start = keyframes[startIndex]!;
+    const end = keyframes[endIndex]!;
+    const span = end.time - start.time;
+    let maxDistance = VOLUME_SIMPLIFY_EPSILON;
+    let splitIndex = -1;
+    for (let i = startIndex + 1; i < endIndex; i += 1) {
+      const point = keyframes[i]!;
+      const interpolated =
+        span === 0
+          ? start.volume
+          : start.volume + ((end.volume - start.volume) * (point.time - start.time)) / span;
+      const distance = Math.abs(point.volume - interpolated);
+      if (distance > maxDistance) {
+        maxDistance = distance;
+        splitIndex = i;
+      }
+    }
+    if (splitIndex !== -1) {
+      keep[splitIndex] = true;
+      stack.push([startIndex, splitIndex], [splitIndex, endIndex]);
+    }
+  }
+
+  const simplified = keyframes.filter((_, i) => keep[i]);
+  if (simplified.length <= MAX_VOLUME_SEGMENTS) return simplified;
+
+  const step = (simplified.length - 1) / (MAX_VOLUME_SEGMENTS - 1);
+  const sampled: { time: number; volume: number }[] = [];
+  for (let i = 0; i < MAX_VOLUME_SEGMENTS; i += 1) {
+    const point = simplified[Math.round(i * step)]!;
+    if (sampled.length === 0 || point.time > sampled.at(-1)!.time) sampled.push(point);
+  }
+  return sampled;
+}
+
+function buildVolumeExpression(track: AudioTrack, ignoreKeyframes = false): string {
   const trimDuration = track.end - track.start;
   const staticVolume = clampVolume(track.volume);
-  const keyframes = (track.volumeKeyframes ?? [])
+  const keyframes = (ignoreKeyframes ? [] : (track.volumeKeyframes ?? []))
     .filter((keyframe) => Number.isFinite(keyframe.time) && Number.isFinite(keyframe.volume))
     .map((keyframe) => ({
       time: Math.max(0, Math.min(trimDuration, keyframe.time - track.start)),
@@ -57,14 +137,19 @@ function buildVolumeExpression(track: AudioTrack): string {
     }
   }

-  if (deduped.length === 1) {
-    return `volume=${formatFilterNumber(deduped[0]!.volume)}`;
+  // Collapse the densely-sampled probe output to a bounded piecewise-linear
+  // envelope. Without this, the nested-if expression below grows one level per
+  // keyframe and overflows FFmpeg's expression evaluator (see MAX_VOLUME_SEGMENTS).
+  const simplified = simplifyVolumeKeyframes(deduped);
+
+  if (simplified.length === 1) {
+    return `volume=${formatFilterNumber(simplified[0]!.volume)}`;
   }

-  let expression = formatFilterNumber(deduped.at(-1)!.volume);
-  for (let i = deduped.length - 2; i >= 0; i -= 1) {
-    const current = deduped[i]!;
-    const next = deduped[i + 1]!;
+  let expression = formatFilterNumber(simplified.at(-1)!.volume);
+  for (let i = simplified.length - 2; i >= 0; i -= 1) {
+    const current = simplified[i]!;
+    const next = simplified[i + 1]!;
     const currentTime = formatFilterNumber(current.time);
     const nextTime = formatFilterNumber(next.time);
     const currentVolume = formatFilterNumber(current.volume);
@@ -299,42 +384,58 @@ async function mixAudioTracks(
   const outputDir = dirname(outputPath);
   if (!existsSync(outputDir)) mkdirSync(outputDir, { recursive: true });

-  const inputs: string[] = [];
-  const filterParts: string[] = [];
-
-  tracks.forEach((track, i) => {
-    inputs.push("-i", track.srcPath);
-    const delayMs = Math.round(track.start * 1000);
-    const trimDuration = track.end - track.start;
-    const volumeFilter = buildVolumeExpression(track);
-    filterParts.push(
-      `[${i}:a]atrim=0:${trimDuration},${volumeFilter},adelay=${delayMs}|${delayMs},apad=whole_dur=${totalDuration}[a${i}]`,
-    );
-  });
-
-  const mixInputs = tracks.map((_, i) => `[a${i}]`).join("");
-  const weights = tracks.map(() => "1").join(" ");
-  const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest:dropout_transition=0:normalize=0:weights='${weights}'[mixed]`;
-  const postMixGainFilter = `[mixed]volume=${masterOutputGain}[out]`;
-  const fullFilter = [...filterParts, mixFilter, postMixGainFilter].join(";");
+  const buildArgs = (ignoreAutomation: boolean): string[] => {
+    const inputs: string[] = [];
+    const filterParts: string[] = [];
+    tracks.forEach((track, i) => {
+      inputs.push("-i", track.srcPath);
+      const delayMs = Math.round(track.start * 1000);
+      const trimDuration = track.end - track.start;
+      const volumeFilter = buildVolumeExpression(track, ignoreAutomation);
+      filterParts.push(
+        `[${i}:a]atrim=0:${trimDuration},${volumeFilter},adelay=${delayMs}|${delayMs},apad=whole_dur=${totalDuration}[a${i}]`,
+      );
+    });

-  const args = [
-    ...inputs,
-    "-filter_complex",
-    fullFilter,
-    "-map",
-    "[out]",
-    "-acodec",
-    "aac",
-    "-b:a",
-    "192k",
-    "-t",
-    String(totalDuration),
-    "-y",
-    outputPath,
-  ];
+    const mixInputs = tracks.map((_, i) => `[a${i}]`).join("");
+    const weights = tracks.map(() => "1").join(" ");
+    const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest:dropout_transition=0:normalize=0:weights='${weights}'[mixed]`;
+    const postMixGainFilter = `[mixed]volume=${masterOutputGain}[out]`;
+    const fullFilter = [...filterParts, mixFilter, postMixGainFilter].join(";");
+
+    return [
+      ...inputs,
+      "-filter_complex",
+      fullFilter,
+      "-map",
+      "[out]",
+      "-acodec",
+      "aac",
+      "-b:a",
+      "192k",
+      "-t",
+      String(totalDuration),
+      "-y",
+      outputPath,
+    ];
+  };

-  const result = await runFfmpeg(args, { signal, timeout: ffmpegProcessTimeout });
+  let result = await runFfmpeg(buildArgs(false), { signal, timeout: ffmpegProcessTimeout });
+
+  // Defense in depth: volume automation is folded into an FFmpeg `volume`
+  // expression whose evaluator limits are build-dependent (see
+  // MAX_VOLUME_SEGMENTS). If that ever fails the mix, retry once without the
+  // automation so the track renders at its base volume rather than being
+  // dropped from the output entirely — a missing fade beats missing audio.
+  let degradedAutomation = false;
+  const hasAutomation = tracks.some((track) => (track.volumeKeyframes?.length ?? 0) > 0);
+  if (!result.success && !signal?.aborted && hasAutomation) {
+    const retry = await runFfmpeg(buildArgs(true), { signal, timeout: ffmpegProcessTimeout });
+    if (retry.success) {
+      result = retry;
+      degradedAutomation = true;
+    }
+  }

   if (signal?.aborted) {
     return {
@@ -360,6 +461,9 @@ async function mixAudioTracks(
     outputPath,
     durationMs: result.durationMs,
     tracksProcessed: tracks.length,
+    error: degradedAutomation
+      ? "Volume automation exceeded this ffmpeg build's expression limits; rendered at base volume"
+      : undefined,
   };
 }

@@ -452,15 +556,29 @@ export async function processCompositionAudio(
         audioSrcPath = trimmedPath;
       }

+      // Primary volume-automation path: bake the envelope into the PCM samples
+      // (sample-accurate, no keyframe ceiling). If the WAV isn't the expected
+      // 16-bit PCM, fall back to the ffmpeg expression path by leaving the
+      // keyframes on the track for buildVolumeExpression to handle.
+      let bakedEnvelope = false;
+      if (element.volumeKeyframes && element.volumeKeyframes.length > 0) {
+        bakedEnvelope = applyVolumeEnvelopeToWav(
+          audioSrcPath,
+          element.volumeKeyframes,
+          element.start,
+          element.volume ?? 1.0,
+        );
+      }
       tracks.push({
         id: element.id,
         srcPath: audioSrcPath,
         start: element.start,
         end: element.end,
         mediaStart: element.mediaStart,
         duration: element.end - element.start,
-        volume: element.volume ?? 1.0,
-        volumeKeyframes: element.volumeKeyframes,
+        // Gain is already in the samples when baked, so mix at unity.
+        volume: bakedEnvelope ? 1.0 : (element.volume ?? 1.0),
+        volumeKeyframes: bakedEnvelope ? undefined : element.volumeKeyframes,
       });
     } catch (err: unknown) {
       errors.push(`Error: ${element.id}${err instanceof Error ? err.message : String(err)}`);
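The commit message says the baked WAV is written to a randomly named sibling and renamed into place, which matters here because `applyVolumeEnvelopeToWav` replaces the prepared WAV at `audioSrcPath`. A minimal Node sketch of that write-then-rename pattern follows; `writeFileAtomic` is a hypothetical helper, not code from audioVolumeEnvelope.ts.

```typescript
import { randomBytes } from "node:crypto";
import { renameSync, unlinkSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

// Hypothetical helper: write `data` to a randomly named sibling, then rename
// it over `targetPath` so readers see either the old file or the complete new one.
function writeFileAtomic(targetPath: string, data: Uint8Array): void {
  const tmpPath = join(
    dirname(targetPath),
    `.${basename(targetPath)}.${randomBytes(8).toString("hex")}.tmp`,
  );
  try {
    // "wx" fails if the path already exists, so nothing at that name is overwritten.
    writeFileSync(tmpPath, data, { flag: "wx" });
    renameSync(tmpPath, targetPath);
  } catch (err) {
    try {
      unlinkSync(tmpPath);
    } catch {
      // The temp file may never have been created.
    }
    throw err;
  }
}
```

The temporary file sits in the same directory because a rename is only atomic within one filesystem. The sketch does not fsync, so it guards against a crashed process leaving a truncated WAV but not necessarily against power loss.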
