fix(hls): prime mp4 moov with a parsed audio packet on backward-seek muxer restart (#92) #94

Merged
superuser404notfound merged 2 commits into superuser404notfound:main from thatcube:fix/vod-audio-eac3-wedge
Jul 1, 2026

@thatcube thatcube commented Jul 1, 2026

Contributor

Summary

Follow-up to #92. Fixes an E-AC-3/AC-3 (and TrueHD) moov-before-audio-parsed wedge that can strand the fragmented-mp4 muxer in a permanent "forever-loading" state after a mid-file backward seek that lands out of the segment cache.

Root cause

MP4SegmentMuxer runs one long-lived mp4 AVFormatContext with +empty_moov+default_base_moof+frag_custom+delay_moov. Under +delay_moov, moov is written lazily on the first av_write_frame(ctx, nil).

FFmpeg's mp4 muxer builds the AC-3/E-AC-3/TrueHD sample-entry box (dac3/dec3/dmlp) from a parsed audio packet; it cannot emit moov for these codecs from codecpar alone (see movenc.c mov_write_eac3_tag / handle_eac3). So if the first moov flush fires video-only, the moov write returns -22 (AVERROR(EINVAL)):

Cannot write moov atom before EAC3 packets parsed.

The cut then fails, the muxer wedges, and the segment is retried forever -> AVPlayer 503 -> forever-loading spinner.

On a backward seek out of cache, the producer rebuilds a fresh muxer at the restart segment. If that muxer's first moov flush -- either the #64 RAM-cap interim flushPendingFragment() or the first segment cut -- fires before any audio packet is written, it hits this. AAC never triggers it (its sample entry needs no parsed packet).
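
The failure condition above can be sketched as a toy model. This is not FFmpeg code: `AudioCodec`, `averrorEINVAL`, and `firstMoovFlushResult` are hypothetical names introduced only for illustration, and the model assumes the sole input that matters is whether any audio packet has been written before the first moov flush.

```swift
/// Toy stand-in for the audio codec of the stream (not FFmpeg's AVCodecID).
enum AudioCodec {
    case aac, ac3, eac3, trueHD

    /// True for the codecs whose sample entry (dac3/dec3/dmlp) is built
    /// from a parsed audio packet, per the PR description.
    var needsParsedPacketForMoov: Bool {
        switch self {
        case .ac3, .eac3, .trueHD: return true
        case .aac: return false
        }
    }
}

/// The value the PR reports: -22, which is AVERROR(EINVAL).
let averrorEINVAL: Int32 = -22

/// Models the outcome of the first delayed-moov flush.
/// Returns 0 on success, or -22 when moov is attempted before a packet
/// of a parse-dependent audio codec has been written.
func firstMoovFlushResult(codec: AudioCodec, audioPacketsWritten: Int) -> Int32 {
    if codec.needsParsedPacketForMoov && audioPacketsWritten == 0 {
        return averrorEINVAL
    }
    return 0
}

print(firstMoovFlushResult(codec: .eac3, audioPacketsWritten: 0)) // -22
print(firstMoovFlushResult(codec: .eac3, audioPacketsWritten: 1)) // 0
print(firstMoovFlushResult(codec: .aac, audioPacketsWritten: 0))  // 0
```

In this model AAC succeeds regardless of packet order, which matches the statement that AAC never triggers the wedge.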

What changed

  • Latch audioPacketWritten once the first audio packet is written.
  • Latch audioNeedsParsedPacketForMoov at init from the audio codec_id (AC-3/E-AC-3/TrueHD) — the only codecs whose sample entry needs a parsed packet.
  • Gate the #64 RAM-cap interim flushPendingFragment() on those latches, so AAC (and every other codec) keeps its full RAM-cap bound and the stock code path.
  • In the video-leads-audio case, for those codecs only, proactively flush on the first audio packet so moov is emitted with a parsed audio packet present.
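
A minimal sketch of how the two latches and the gate could fit together, assuming a small value type rather than the real `MP4SegmentMuxer`. The latch names `audioPacketWritten` and `audioNeedsParsedPacketForMoov` come from the list above; `MoovLatches`, `interimFlushAllowed`, `recordWrite(isAudio:)`, and the private `videoPacketWritten` flag are hypothetical and may not match the merged code.

```swift
/// Hypothetical holder for the latch state described in the PR.
struct MoovLatches {
    /// Latched at init from the audio codec (true for AC-3 / E-AC-3 / TrueHD).
    let audioNeedsParsedPacketForMoov: Bool
    /// Latched once the first audio packet has been written.
    private(set) var audioPacketWritten = false
    /// Assumed helper: remembers that video was written before any audio.
    private var videoPacketWritten = false

    init(audioNeedsParsedPacketForMoov: Bool) {
        self.audioNeedsParsedPacketForMoov = audioNeedsParsedPacketForMoov
    }

    /// Gate for the RAM-cap interim flush: always open for codecs that do
    /// not need a parsed packet, otherwise open only after the first audio packet.
    var interimFlushAllowed: Bool {
        return !audioNeedsParsedPacketForMoov || audioPacketWritten
    }

    /// Records a written packet. Returns true when the caller should flush
    /// proactively: first audio packet, video already written, and a codec
    /// that needs a parsed packet for moov.
    mutating func recordWrite(isAudio: Bool) -> Bool {
        guard isAudio else {
            videoPacketWritten = true
            return false
        }
        let isFirstAudio = !audioPacketWritten
        audioPacketWritten = true
        return isFirstAudio && videoPacketWritten && audioNeedsParsedPacketForMoov
    }
}

var eac3Latches = MoovLatches(audioNeedsParsedPacketForMoov: true)
_ = eac3Latches.recordWrite(isAudio: false)       // video leads
print(eac3Latches.interimFlushAllowed)            // false
print(eac3Latches.recordWrite(isAudio: true))     // true
print(eac3Latches.interimFlushAllowed)            // true
```

Because both behaviors key off `audioNeedsParsedPacketForMoov`, a stream with AAC audio never sees a closed gate or a proactive flush in this sketch.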

The first segment cut is intentionally left unguarded: if audio never arrives, moov must still be attempted (fail-as-before) rather than deferred forever -- guarding the cut too would convert the wedge into a guaranteed permanent stall. Audio routing/placement is untouched (no dropout regression).
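
The asymmetry between the two flush sites can be written as a single decision function. This is an illustrative sketch only: `FlushTrigger` and `shouldAttemptFlush` are hypothetical names, and the real muxer may structure the check differently.

```swift
/// The two places where the first moov flush can fire, per the PR text.
enum FlushTrigger {
    case ramCapInterim   // the RAM-cap interim flushPendingFragment()
    case segmentCut      // the first segment cut
}

/// Decides whether a flush should be attempted right now.
func shouldAttemptFlush(trigger: FlushTrigger,
                        audioNeedsParsedPacketForMoov: Bool,
                        audioPacketWritten: Bool) -> Bool {
    switch trigger {
    case .ramCapInterim:
        // Guarded: deferring an interim flush is safe, a later one can run.
        return !audioNeedsParsedPacketForMoov || audioPacketWritten
    case .segmentCut:
        // Unguarded on purpose: if audio never arrives, the cut must still be
        // attempted (and fail as before) rather than be deferred forever.
        return true
    }
}

print(shouldAttemptFlush(trigger: .ramCapInterim,
                         audioNeedsParsedPacketForMoov: true,
                         audioPacketWritten: false)) // false
print(shouldAttemptFlush(trigger: .segmentCut,
                         audioNeedsParsedPacketForMoov: true,
                         audioPacketWritten: false)) // true
```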

Reproducibility (honest caveat)

This is hard to reproduce on demand and I could not capture a live before/after. The original triggering file's identity was lost (temp log GC'd). On the unpatched build, the closest-duration-match E-AC-3/HEVC file played through 93 producer restarts / 50 out-of-cache backward seeks with zero wedges, because in the normal seek path audio reaches the interleaver 4-30 ms after each restart and beats the first cut. The wedge needs the narrow case where a full segment of video (coarse interleave) or an audio-absent region precedes the first audio packet at the restart point.
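
To make the narrow triggering case concrete, here is a rough model of packet write order at a restart point. It assumes, purely for illustration, that the first cut fires once accumulated video duration reaches a target segment length; the 6000 ms target, the 40 ms and 32 ms packet durations, and the names `Packet` and `firstCutPrecedesAudio` are all hypothetical and not taken from the engine.

```swift
/// One packet in write order at the restart point (illustrative only).
struct Packet {
    let isAudio: Bool
    let durationMs: Int
}

/// Returns true if a full segment of video is written before any audio
/// packet, i.e. the first cut would attempt moov with no parsed audio.
func firstCutPrecedesAudio(_ packets: [Packet], segmentMs: Int) -> Bool {
    var videoMs = 0
    for packet in packets {
        if packet.isAudio { return false }       // audio beat the first cut
        videoMs += packet.durationMs
        if videoMs >= segmentMs { return true }  // cut fires video-only
    }
    return false                                 // stream ended before a cut
}

let videoFrame = Packet(isAudio: false, durationMs: 40)
let audioFrame = Packet(isAudio: true, durationMs: 32)

// Normal seek path: audio arrives right after the first video packet.
let healthy = [videoFrame, audioFrame] + Array(repeating: videoFrame, count: 200)
// Coarse interleave: a full segment of video precedes the first audio packet.
let coarse = Array(repeating: videoFrame, count: 150) + [audioFrame]

print(firstCutPrecedesAudio(healthy, segmentMs: 6000)) // false
print(firstCutPrecedesAudio(coarse, segmentMs: 6000))  // true
```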

So I am offering this as a low-risk guard against a real, mechanism-confirmed failure class -- it can only reduce the wedge surface and does not alter the healthy path. Happy to adjust or hold if you would prefer an issue-first discussion.

Test plan

  • Device / OS: Apple TV 4K (3rd generation, AppleTV14,1), tvOS 27.0 (build 24J5305f)
  • Source media: HEVC video + E-AC-3 (Dolby Digital Plus) audio, delivered through the host app's in-process loopback HLS (AetherEngine on tvOS)
  • Result: On the patched build, repeated out-of-cache backward seeks across ~50 producer restarts play through with no muxer wedge / forever-loading spinner and no audio dropout; AAC playback is unchanged (stock path). Host swift build and the tvOS device build both pass. See the Reproducibility caveat above — the wedge could not be reproduced on demand even on the unpatched build, so this is a mechanism-confirmed guard rather than a captured before/after.

Verification detail:

  • Mechanism independently verified against FFmpeg movenc.c (the dec3/dac3 refusal path) and against the vendored Libavformat.xcframework binary strings.
  • Confirmed a fresh MP4SegmentMuxer is allocated per restart (the latches reset correctly), the change is single-threaded on the producer's serial pump queue, and audio routing/placement is untouched.

Checklist

  • CHANGELOG.md updated
  • Commit messages follow Conventional Commits (fix(hls):, docs(changelog):)
  • The fix lives in the engine, not in a host-side workaround
  • No public API changes (the two new muxer latches are internal/private)

…muxer restart (superuser404notfound#92)

Under +delay_moov the mp4 muxer writes moov lazily on the first
av_write_frame(ctx, nil). For AC-3/E-AC-3/TrueHD the audio sample entry
(dac3/dec3/dmlp) can only be built from a PARSED audio packet, so a first
moov flush that fires video-only errors -22 "Cannot write moov atom
before EAC3 packets parsed", the segment cut fails, the muxer wedges, and
the segment is retried forever (AVPlayer 503 -> forever-loading spinner).

On a mid-file backward seek that lands out of the segment cache the
producer tears down and rebuilds a fresh muxer at the restart segment; if
that muxer's first moov flush (a superuser404notfound#64 RAM-cap interim flush, or the first
segment cut) fires before any audio packet is written, it hits this.
AAC never triggers it (its sample entry needs no parsed packet).

Fix:
- Latch audioPacketWritten once the first audio packet is written.
- Latch audioNeedsParsedPacketForMoov at init from the audio codec_id
  (AC-3/E-AC-3/TrueHD) — the only codecs whose sample entry needs a
  parsed packet.
- Scope BOTH new behaviors to that latch: the superuser404notfound#64 RAM-cap interim
  flushPendingFragment() guard AND the video-leads-audio proactive
  flush. For those codecs, proactively flush on the first audio packet
  so moov is emitted with a parsed audio packet present.

Because both arms are codec-scoped, AAC (and every other codec) keeps the
exact stock code path: no early proactive flush and the full RAM-cap
bound on flushPendingFragment. The first segment cut is intentionally
left unguarded: if audio never arrives, moov must still be attempted
(fail-as-before) rather than being deferred forever, which would convert
the wedge into a permanent stall. Audio routing/placement is untouched,
so no audio-dropout regression.
@thatcube thatcube force-pushed the fix/vod-audio-eac3-wedge branch from 5eb31fb to 218073f Compare July 1, 2026 03:31
@superuser404notfound superuser404notfound merged commit 8e4ed87 into superuser404notfound:main Jul 1, 2026
@superuser404notfound

Owner

Merged, thank you. This is exactly the kind of engine-side, mechanism-confirmed fix worth taking: codec-scoped so AAC keeps the stock path, and it only narrows the wedge surface without perturbing the healthy path. The trace against movenc's dec3/dac3 refusal plus the vendored Libavformat strings was thorough, and the honest note about the missing before/after repro is appreciated, not a blocker.

One bit of context: the PR referenced #92, but #92 is the transient DV-P8 visual glitch, a separate failure mode; this moov wedge stands on its own, so nothing to change on your end. Thanks again for the careful work.
