feat(core,engine): data-duck — compile-time audio ducking under voice tracks by mvanhorn · Pull Request #1337 · heygen-com/hyperframes

mvanhorn · 2026-06-11T03:43:30Z

What

Declarative audio ducking: a music track marked data-duck automatically lowers itself whenever any data-role="voice" track is audible, with smooth fade ramps.

<audio id="music" src="music.mp3" data-duck="-12dB" data-start="0" data-duration="30"></audio>
<audio id="vo" src="voiceover.mp3" data-role="voice" data-start="2" data-duration="6"></audio>

data-duck accepts dB ("-12dB") or a linear multiplier; data-duck-fade sets the ramp (default 0.3s). Rendered evidence (same composition with and without data-duck):

The difference row is the ducking, isolated: digital silence outside the voice windows (the two mixes are bit-identical there) and exactly the removed music energy inside them.
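For reference, the two value forms accepted by data-duck can be normalized to a single linear gain. The sketch below is illustrative only: `parseDuckGain` is a hypothetical name, not the parser shipped in this branch, and the rejection rules (negative multipliers, non-numeric strings) are assumptions. It uses the standard amplitude conversion, gain = 10^(dB/20).

```typescript
// Hypothetical helper (not the PR's actual parser): converts a data-duck
// value such as "-12dB" or "0.25" into a linear gain multiplier.
function parseDuckGain(raw: string): number | null {
  const value = raw.trim();
  const dbMatch = /^([+-]?\d+(?:\.\d+)?)\s*dB$/i.exec(value);
  if (dbMatch) {
    // Amplitude gain: 10^(dB / 20), so -12 dB is roughly 0.251.
    return Math.pow(10, parseFloat(dbMatch[1]) / 20);
  }
  const linear = Number(value);
  // Assumption: empty, non-numeric, and negative values are rejected.
  if (value === "" || !Number.isFinite(linear) || linear < 0) return null;
  return linear;
}
```

Under these assumptions "-12dB" and a linear multiplier of about 0.251 describe the same duck depth.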

Why

Voiceover-over-music is the default shape of narrated video, and getting the balance right currently means hand-authoring volume keyframes against every voice clip's start and end, then re-timing them whenever narration shifts (#1176 was a user doing exactly this with GSAP volume tweens). Consumer editors auto-duck; neither HyperFrames nor Remotion offers it declaratively.

The timeline knows every overlap window at compile time, so ducking doesn't need sidechain analysis. It compiles to volume keyframes and rides the sample-accurate volume automation the engine already has (#1117) — deterministic, and identical in preview and render.
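A minimal sketch of how overlap windows could become keyframes, assuming clips are described only by start and duration. The names `Clip`, `Keyframe`, `mergeVoiceWindows`, and `duckKeyframes` are hypothetical and do not mirror compileAudioDucking. The ramp placement (fade-down finishing at the voice start, fade-up beginning at the voice end) is also an assumption; the PR text does not specify it.

```typescript
interface Clip { start: number; duration: number; }
interface Keyframe { time: number; gain: number; }

// Merge voice clips into windows, bridging gaps shorter than 2x the fade
// so the music does not ramp up and straight back down between them.
function mergeVoiceWindows(voices: Clip[], fade: number): Array<[number, number]> {
  const sorted = voices
    .map((v): [number, number] => [v.start, v.start + v.duration])
    .sort((a, b) => a[0] - b[0]);
  const merged: Array<[number, number]> = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start - last[1] < 2 * fade) {
      last[1] = Math.max(last[1], end); // extend the previous window
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

// Emit four keyframes per window (ramp down, hold, hold, ramp up),
// clamped to the music clip's own bounds. Assumed ramp placement: the
// fade happens just outside the voice window.
function duckKeyframes(music: Clip, voices: Clip[], duckGain: number, fade = 0.3): Keyframe[] {
  const musicEnd = music.start + music.duration;
  const frames: Keyframe[] = [];
  for (const [start, end] of mergeVoiceWindows(voices, fade)) {
    if (end <= music.start || start >= musicEnd) continue; // no overlap
    frames.push(
      { time: Math.max(music.start, start - fade), gain: 1 },
      { time: Math.max(music.start, start), gain: duckGain },
      { time: Math.min(musicEnd, end), gain: duckGain },
      { time: Math.min(musicEnd, end + fade), gain: 1 },
    );
  }
  return frames;
}
```

For the example composition (music 0 to 30 s, voice 2 to 8 s, default fade), this sketch yields keyframes near 1.7 s, 2 s, 8 s, and 8.3 s.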

How

packages/core/src/compiler/timingCompiler.ts: compileAudioDucking reads data-duck / data-duck-fade / data-role, computes voice-overlap intervals from clip timing (merging gaps shorter than 2x the fade so the music doesn't pump between close voice clips), and writes the resulting keyframes to an internal data-hf-duck-keyframes attribute. Recompilation-safe: the attribute is regenerated, never accumulated.
packages/engine/src/services/audioElementParser.ts (extracted from audioMixer): parses the duck keyframes into each track's volumeKeyframes, multiplying with any authored data-volume-keyframes envelope rather than replacing it.
packages/core/src/runtime/mediaVolumeEnvelope.ts: preview applies the same multiplied envelope, so what you hear in preview is what renders.
probeStage.ts: runtime-probed automation (GSAP volume tweens) multiplies with duck keyframes instead of overwriting them.
Docs: data-duck / data-duck-fade / data-role rows in the html-schema reference plus a concepts section.
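The "multiply rather than replace" behavior described above can be illustrated roughly as follows. This is a sketch under assumptions, not the code in audioElementParser.ts or mediaVolumeEnvelope.ts: `gainAt` and `multiplyEnvelopes` are hypothetical names, envelopes are assumed to be piecewise linear, and the product is only sampled at the union of both envelopes' keyframe times (exact at those times, approximate between them where both envelopes are ramping).

```typescript
interface Keyframe { time: number; gain: number; }

// Linearly interpolate an envelope at time t, holding the edge values
// before the first and after the last keyframe.
function gainAt(env: Keyframe[], t: number): number {
  if (env.length === 0) return 1; // assumption: no envelope means unity gain
  if (t <= env[0].time) return env[0].gain;
  for (let i = 1; i < env.length; i++) {
    const a = env[i - 1];
    const b = env[i];
    if (t <= b.time) {
      const span = b.time - a.time;
      return span === 0 ? b.gain : a.gain + ((t - a.time) / span) * (b.gain - a.gain);
    }
  }
  return env[env.length - 1].gain;
}

// Combine an authored envelope with duck keyframes by multiplying their
// gains at every keyframe time that appears in either envelope.
function multiplyEnvelopes(authored: Keyframe[], duck: Keyframe[]): Keyframe[] {
  const times = Array.from(new Set([...authored, ...duck].map((k) => k.time)))
    .sort((x, y) => x - y);
  return times.map((time) => ({
    time,
    gain: gainAt(authored, time) * gainAt(duck, time),
  }));
}
```

With an authored envelope held at 0.5 and a duck to 0.25, the combined gain inside the voice window is 0.125, so the authored level is scaled rather than overwritten.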

Out of scope by design: runtime-triggered audio (clips whose timing isn't declarative) — documented as a limitation.

Test plan

The waveform evidence above is real render output from this branch. Sample-subtracting the two renders: bit-identical outside voice windows (diff RMS −270dB), and inside them the measured delta is −37.7dB against a theoretical −37.6dB for a −12dB duck of this music bed — the envelope lands within 0.1dB of spec.
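The subtraction measurement can be reproduced in outline with something like the sketch below. It assumes both renders are already decoded to equal-length mono Float32Array buffers with full scale at 1.0; `rmsDb` and `differenceRmsDb` are hypothetical names, and decoding the rendered files is out of scope here. As a sanity check on the numbers, a −12dB duck removes 1 − 10^(−12/20) ≈ 0.749 of the music's amplitude, so the difference signal should sit about 2.5dB below the unducked music's own RMS inside the voice windows.

```typescript
// RMS level in dBFS of a sample buffer (full scale = 1.0).
// Digital silence returns -Infinity.
function rmsDb(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  // 10*log10(mean square) is the same as 20*log10(rms).
  return 10 * Math.log10(sumSquares / samples.length);
}

// RMS level of the sample-by-sample difference between two renders.
function differenceRmsDb(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("buffers must have equal length");
  const diff = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) diff[i] = a[i] - b[i];
  return rmsDb(diff);
}
```

Running this separately over the samples inside and outside the voice windows gives the two figures being compared: a very low level where the renders match and the removed music energy where the duck is active.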

Unit tests added/updated (dB and linear gain parsing, overlap merging across fade gaps, ramp clamping at clip bounds, keyframe multiplication with authored envelopes, compiler integration, mixer envelope application)
Manual testing performed (real duck/no-duck renders above, macOS say voiceover over a music bed)
Documentation updated (html-schema reference, data-attributes concepts, docs nav)

miguel-heygen · 2026-06-12T19:21:22Z

Thanks for the work here — the compile-time ducking approach is well thought out, and the rendered evidence makes the output quality obvious.

We're closing this because it extends the authoring format with three new HTML data attributes (data-duck, data-duck-fade, data-role). The project has a hard rule against adding new attributes to the HTML schema: the authoring surface is intentionally stable and expanding it has downstream costs for every parser, validator, and tool that reads HyperFrames compositions.

The feature itself isn't the problem — auto-ducking music under voiceover is a real gap. The right surface is a programmatic/JS API (e.g. a duck(musicTrack, voiceTracks, options) helper that writes volume keyframes directly) rather than new declarative HTML attributes. That approach delivers the same deterministic compile-time keyframes without touching the authoring schema.

If you'd like to re-submit along those lines, happy to review it.

mvanhorn added 2 commits June 10, 2026 20:28

feat(core,engine): data-duck — compile-time audio ducking under voice…

2cd43a3

… tracks

chore: retrigger CI (shard-7 Docker Hub pull timeout canceled the mat…

375e298

…rix)

miguel-heygen closed this Jun 12, 2026

mvanhorn wants to merge 2 commits into heygen-com:main from mvanhorn:fix/hyperframes-audio-autoduck
