feat(core,engine): data-duck — compile-time audio ducking under voice tracks#1337
feat(core,engine): data-duck — compile-time audio ducking under voice tracks#1337mvanhorn wants to merge 2 commits into
Conversation
|
Thanks for the work here — the compile-time ducking approach is well thought out, and the rendered evidence makes the output quality obvious. We're closing this because it extends the authoring format with three new HTML data attributes ( The feature itself isn't the problem — auto-ducking music under voiceover is a real gap. The right surface is a programmatic/JS API (e.g. a If you'd like to re-submit along those lines, happy to review it. |
What
Declarative audio ducking: a music track marked
data-duckautomatically lowers itself whenever anydata-role="voice"track is audible, with smooth fade ramps.data-duckaccepts dB ("-12dB") or a linear multiplier;data-duck-fadesets the ramp (default 0.3s). Rendered evidence (same composition with and withoutdata-duck):The difference row is the ducking, isolated: digital silence outside the voice windows (the two mixes are bit-identical there) and exactly the removed music energy inside them.
Why
Voiceover-over-music is the default shape of narrated video, and getting the balance right currently means hand-authoring volume keyframes against every voice clip's start and end, then re-timing them whenever narration shifts (#1176 was a user doing exactly this with GSAP volume tweens). Consumer editors auto-duck; neither HyperFrames nor Remotion offers it declaratively.
The timeline knows every overlap window at compile time, so ducking doesn't need sidechain analysis. It compiles to volume keyframes and rides the sample-accurate volume automation the engine already has (#1117) — deterministic, and identical in preview and render.
How
packages/core/src/compiler/timingCompiler.ts:compileAudioDuckingreadsdata-duck/data-duck-fade/data-role, computes voice-overlap intervals from clip timing (merging gaps shorter than 2x the fade so the music doesn't pump between close voice clips), and writes the resulting keyframes to an internaldata-hf-duck-keyframesattribute. Recompilation-safe: the attribute is regenerated, never accumulated.packages/engine/src/services/audioElementParser.ts(extracted from audioMixer): parses the duck keyframes into each track'svolumeKeyframes, multiplying with any authoreddata-volume-keyframesenvelope rather than replacing it.packages/core/src/runtime/mediaVolumeEnvelope.ts: preview applies the same multiplied envelope, so what you hear in preview is what renders.probeStage.ts: runtime-probed automation (GSAP volume tweens) multiplies with duck keyframes instead of overwriting them.data-duck/data-duck-fade/data-rolerows in the html-schema reference plus a concepts section.Out of scope by design: runtime-triggered audio (clips whose timing isn't declarative) — documented as a limitation.
Test plan
The waveform evidence above is real render output from this branch. Sample-subtracting the two renders: bit-identical outside voice windows (diff RMS −270dB), and inside them the measured delta is −37.7dB against a theoretical −37.6dB for a −12dB duck of this music bed — the envelope lands within 0.1dB of spec.
sayvoiceover over a music bed)