
Commit 55bb619

fix(audio): stamp SW-decode buffers from a gapless sample clock (#89)
The software-path AudioDecoder stamped each CMSampleBuffer's presentationTimeStamp with the frame's container-quantized PTS. Container timebases are coarse (1 ms in MKV), so when a frame's duration is not an integer number of ticks, consecutive buffers no longer abut. A 1536-sample AC-3 frame is 34.83 ms at 44.1 kHz (vs exactly 32 ms at 48 kHz), so the integer-ms PTS steps alternate between 35 and 34 ms (35, 35, 34, 35, ...), leaving a sub-millisecond gap or overlap at every boundary, and AVSampleBufferAudioRenderer reconciles a discontinuity at each frame (~29 Hz, a continuous crackle). 48 kHz AC-3 played crackle-free on the same path; any non-integer-ms frame duration (48 kHz AAC/FLAC too) was affected.

Derive each buffer's PTS from a running sample count anchored to the first frame, so consecutive buffers abut to the sample. A real source discontinuity (> 100 ms off the predicted clock: seek/edit) re-anchors so genuine gaps are not papered over; flush() drops the anchor; the clock advances only on a successfully emitted buffer, so a dropped buffer injects no phantom samples.

Extracted as AudioClockAnchor, a pure mutating struct mirroring OutputTimestampSanitizer, with 7 unit tests reproducing the 44.1 kHz AC-3 crackle and covering jitter absorption, seek re-anchor, flush, the clean 48 kHz case, and dropped-buffer clock integrity. Full suite green (XCTest 208, Swift Testing 265).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XZTEfmztPE8hAdjHdBr9BH
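The per-boundary arithmetic described in the message can be checked with a short sketch. It assumes the container rounds each frame start to the nearest millisecond (the same assumption the `containerPTS` test helper in this commit makes) and that every buffer is stamped with its own rounded PTS; `boundaryGapsMs` is a hypothetical helper, not part of the engine.

```swift
import Foundation

/// Hypothetical helper: gap (+) or overlap (-), in ms, at each buffer boundary when every
/// buffer is stamped with its own container PTS rounded to the nearest millisecond.
func boundaryGapsMs(samplesPerFrame: Int, sampleRate: Int, frames: Int) -> [Double] {
    // True frame length in ms (34.83 ms for 1536 samples at 44.1 kHz).
    let frameMs = Double(samplesPerFrame) * 1000 / Double(sampleRate)
    // Container PTS of each frame start, quantized to 1 ms ticks.
    let pts = (0..<frames).map { n in
        (Double(n * samplesPerFrame) * 1000 / Double(sampleRate)).rounded()
    }
    // Next buffer's stamped start minus the previous buffer's stamped end.
    return (1..<frames).map { n in pts[n] - (pts[n - 1] + frameMs) }
}

let ac3At44k = boundaryGapsMs(samplesPerFrame: 1536, sampleRate: 44_100, frames: 30)
let ac3At48k = boundaryGapsMs(samplesPerFrame: 1536, sampleRate: 48_000, frames: 30)
let aacAt48k = boundaryGapsMs(samplesPerFrame: 1024, sampleRate: 48_000, frames: 30)
print(ac3At44k.prefix(5).map { String(format: "%+.2f", $0) })
print(ac3At48k.allSatisfy { $0 == 0 }, aacAt48k.allSatisfy { $0 == 0 })
```

For 44.1 kHz AC-3 the first five boundaries come out at +0.17, +0.17, -0.83, +0.17 and +0.17 ms, one click each; 48 kHz AC-3 (exactly 32 ms) is gap-free, while 1024-sample AAC at 48 kHz (21.33 ms) is not.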
1 parent e1cf6c2 commit 55bb619

4 files changed

Lines changed: 220 additions & 1 deletion


Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+import Foundation
+import CoreMedia
+
+/// Gapless presentation-clock for the software AudioDecoder output. Container per-packet PTS are
+/// quantized to the container timebase (1 ms for MKV). When the decoded frame duration is not an
+/// integer number of those ticks -- e.g. a 1536-sample AC-3 frame at 44.1 kHz is 34.83 ms, not an
+/// integer ms -- stamping every CMSampleBuffer with its own quantized PTS leaves a +/-0.5 ms
+/// gap/overlap between consecutive buffers, and AVSampleBufferAudioRenderer reconciles a
+/// discontinuity at each boundary (~29 audible clicks/sec, a continuous crackle). 48 kHz AC-3
+/// (1536 samples = exactly 32 ms) is integer-ms so it never showed.
+///
+/// Fix: anchor to the first frame's PTS, then advance by emitted sample count so consecutive buffers
+/// abut to the sample. `reset()` on flush; a real source PTS discontinuity (seek/edit, > 100 ms off
+/// the predicted clock) re-anchors so genuine gaps aren't papered over. Mirrors
+/// `OutputTimestampSanitizer`'s focused, unit-testable mutating-struct shape (issue #89).
+struct AudioClockAnchor {
+    /// A real seek/edit moves the source clock by far more than container rounding jitter.
+    static let discontinuityThresholdSeconds = 0.10
+
+    private var anchorPTS: CMTime = .invalid
+    private var emittedSamplesSinceAnchor: Int64 = 0
+
+    /// Clear the anchor (flush/seek). The next `resolve` re-anchors to its `startPTS`.
+    mutating func reset() {
+        anchorPTS = .invalid
+        emittedSamplesSinceAnchor = 0
+    }
+
+    /// Decide the PTS to stamp on a buffer whose container PTS is `startPTS`, without mutating state.
+    /// `reanchor` distinguishes a fresh anchor (first buffer or post-discontinuity) from a continued
+    /// gapless run; pass it back to `commit` once the buffer is actually emitted.
+    func resolve(startPTS: CMTime, sampleRate: Int32) -> (pts: CMTime, reanchor: Bool) {
+        guard anchorPTS.isValid, startPTS.isValid else {
+            return (startPTS, true) // first buffer after open/flush, or no source PTS to anchor to
+        }
+        let predicted = CMTimeAdd(
+            anchorPTS,
+            CMTime(value: emittedSamplesSinceAnchor, timescale: sampleRate)
+        )
+        if abs(CMTimeGetSeconds(CMTimeSubtract(startPTS, predicted))) > Self.discontinuityThresholdSeconds {
+            return (startPTS, true) // real discontinuity (seek/edit): honour the source clock
+        }
+        return (predicted, false) // gapless continuation: abut to the sample, ignore container rounding
+    }
+
+    /// Advance the clock after a buffer was successfully emitted at `pts`. Only called on success so a
+    /// dropped buffer does not inject phantom samples into the running count.
+    mutating func commit(pts: CMTime, reanchor: Bool, sampleCount: Int) {
+        if reanchor {
+            anchorPTS = pts
+            emittedSamplesSinceAnchor = Int64(sampleCount)
+        } else {
+            emittedSamplesSinceAnchor += Int64(sampleCount)
+        }
+    }
+}

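To reason about the resolve/commit split away from Apple platforms, the `AudioClockAnchor` logic above can be modeled without CoreMedia. This is a hypothetical sketch, not the engine's type: `ClockModel` keeps times as `Double` seconds where `AudioClockAnchor` uses exact rational `CMTime`, and it drops the `startPTS.isValid` check because a `Double` has no invalid state.

```swift
import Foundation

/// Hypothetical CoreMedia-free model of AudioClockAnchor (Double seconds instead of CMTime).
struct ClockModel {
    static let discontinuityThreshold = 0.10   // seconds, same value as the real threshold

    private var anchor: Double? = nil          // nil plays the role of CMTime.invalid
    private var emittedSamples: Int64 = 0

    mutating func reset() {
        anchor = nil
        emittedSamples = 0
    }

    /// Pure: decides the stamp without touching state.
    func resolve(start: Double, sampleRate: Int) -> (pts: Double, reanchor: Bool) {
        guard let anchor = anchor else { return (start, true) }
        let predicted = anchor + Double(emittedSamples) / Double(sampleRate)
        if abs(start - predicted) > Self.discontinuityThreshold {
            return (start, true)               // seek/edit: honour the source clock
        }
        return (predicted, false)              // absorb container rounding
    }

    /// Called only once the buffer was actually emitted.
    mutating func commit(pts: Double, reanchor: Bool, sampleCount: Int) {
        if reanchor {
            anchor = pts
            emittedSamples = Int64(sampleCount)
        } else {
            emittedSamples += Int64(sampleCount)
        }
    }
}

// 44.1 kHz AC-3: four frames at 1 ms-rounded container PTS, then a seek to 60 s.
let rate = 44_100, spf = 1536
var model = ClockModel()
var stamped: [Double] = []
for start in [0.000, 0.035, 0.070, 0.104, 60.000, 60.035] {
    let r = model.resolve(start: start, sampleRate: rate)
    model.commit(pts: r.pts, reanchor: r.reanchor, sampleCount: spf)
    stamped.append(r.pts)
}
print(stamped.map { String(format: "%.6f", $0) })
```

The four pre-seek stamps advance by exactly 1536/44100 s (about 34.83 ms) instead of the rounded 35/35/34 ms steps, and the 60 s jump exceeds the threshold, so it re-anchors to the source clock and the following frame continues gaplessly from there.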
Sources/AetherEngine/Audio/AudioDecoder.swift

Lines changed: 15 additions & 1 deletion
@@ -33,6 +33,11 @@ final class AudioDecoder: @unchecked Sendable {
     private var pendingStartPTS: CMTime = .invalid
     private var pendingSampleCount: Int = 0
 
+    /// Gapless presentation clock (issue #89): stamps each buffer from a running sample count so
+    /// consecutive buffers abut to the sample, instead of from each buffer's container-quantized PTS
+    /// (which clicks at every frame for non-integer-ms frame durations like 1536-sample AC-3 @ 44.1 kHz).
+    private var clock = AudioClockAnchor()
+
     #if DEBUG
     private var _loggedZeroConvert = false
     #endif
@@ -170,6 +175,8 @@ final class AudioDecoder: @unchecked Sendable {
         avcodec_flush_buffers(ctx)
         // Drop the coalesced samples; after a seek they'd be at the wrong PTS anyway.
         resetPending()
+        // Re-anchor the gapless clock to the post-seek PTS on the next emitted buffer.
+        clock.reset()
         #if DEBUG
         _loggedZeroConvert = false
         #endif
@@ -356,12 +363,18 @@ final class AudioDecoder: @unchecked Sendable {
             return nil
         }
 
+        // Gapless PTS (issue #89): derive this buffer's start from the running sample count so
+        // consecutive buffers abut exactly, instead of from the container-quantized PTS (which leaves
+        // +/-0.5 ms gaps and per-frame clicks for non-integer-ms frame durations). Committed only on
+        // success below, so a dropped buffer never advances the clock.
+        let (outPTS, reanchor) = clock.resolve(startPTS: startPTS, sampleRate: sampleRate)
+
         // Single timing entry: CoreMedia treats `duration` as per-SAMPLE, so LPCM must be 1/sampleRate. Stamping
         // the buffer total made GetDuration report totalSamples^2/sampleRate (~22s for 1024 samples), wedging
         // AudioPlaybackHost's buffer-ahead gate after one packet.
         var timing = CMSampleTimingInfo(
             duration: CMTime(value: 1, timescale: sampleRate),
-            presentationTimeStamp: startPTS,
+            presentationTimeStamp: outPTS,
             decodeTimeStamp: .invalid
         )

@@ -382,6 +395,7 @@ final class AudioDecoder: @unchecked Sendable {
         )
         resetPending()
         guard status == noErr, let sample = sampleBuffer else { return nil }
+        clock.commit(pts: outPTS, reanchor: reanchor, sampleCount: totalSamples)
         return sample
     }

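The `CMSampleTimingInfo` comment carried as context in the `AudioDecoder` diff describes an easy-to-miss unit mismatch, so here is its arithmetic sketched in plain Swift, with integer sample ticks standing in for `CMTime`. It assumes, as that comment states, that a single timing entry's `duration` applies to every sample in the buffer, so the buffer's total duration is `sampleCount × duration`; `bufferDurationSeconds` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: total buffer duration when one timing entry's `duration`
/// (expressed in 1/sampleRate ticks) is applied to every sample in the buffer.
func bufferDurationSeconds(sampleCount: Int, perEntryDurationTicks: Int, sampleRate: Int) -> Double {
    Double(sampleCount * perEntryDurationTicks) / Double(sampleRate)
}

let sampleRate = 48_000
let samples = 1024
// Correct: per-sample duration 1/sampleRate, so the buffer spans 1024 / 48000 s.
let correct = bufferDurationSeconds(sampleCount: samples, perEntryDurationTicks: 1, sampleRate: sampleRate)
// Bug: the buffer total (1024 ticks) used as the per-sample duration, giving 1024^2 / 48000 s.
let wedged = bufferDurationSeconds(sampleCount: samples, perEntryDurationTicks: samples, sampleRate: sampleRate)
print(String(format: "correct %.4f s, wedged %.2f s", correct, wedged))
```

At 48 kHz the correct stamping reports about 21 ms per 1024-sample buffer, while the buggy one reports about 21.85 s, the "~22s for 1024 samples" that wedged the buffer-ahead gate.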
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
+import XCTest
+import CoreMedia
+@testable import AetherEngine
+
+/// Issue #89: the software AudioDecoder stamped each CMSampleBuffer with its container-quantized PTS.
+/// For frame durations that are not an integer number of container ticks (1536-sample AC-3 @ 44.1 kHz
+/// = 34.83 ms in a 1 ms MKV timebase) consecutive buffers no longer abut, and
+/// AVSampleBufferAudioRenderer clicks at every frame (~29 Hz, a continuous crackle). AudioClockAnchor
+/// stamps from a running sample count so buffers abut to the sample, re-anchoring only on a real
+/// (> 100 ms) source discontinuity.
+final class AudioClockAnchorTests: XCTestCase {
+
+    /// Container-quantized PTS the demuxer hands us: MKV carries a 1 ms timebase, so the per-packet
+    /// PTS is the frame's true time rounded to the nearest millisecond.
+    private func containerPTS(frame n: Int, samplesPerFrame: Int, sampleRate: Int32) -> CMTime {
+        let seconds = Double(n * samplesPerFrame) / Double(sampleRate)
+        let ms = Int64((seconds * 1000).rounded())
+        return CMTimeMake(value: ms, timescale: 1000)
+    }
+
+    /// Drive the anchor exactly as AudioDecoder.emitPending does: resolve, then commit on success.
+    @discardableResult
+    private func runStream(_ anchor: inout AudioClockAnchor,
+                           ptsList: [CMTime],
+                           samplesPerFrame: Int,
+                           sampleRate: Int32) -> [CMTime] {
+        var out: [CMTime] = []
+        for pts in ptsList {
+            let r = anchor.resolve(startPTS: pts, sampleRate: sampleRate)
+            anchor.commit(pts: r.pts, reanchor: r.reanchor, sampleCount: samplesPerFrame)
+            out.append(r.pts)
+        }
+        return out
+    }
+
+    // MARK: - The crackle bug
+
+    func testConsecutiveBuffersAbutToTheSample_441kHzAC3() {
+        let rate: Int32 = 44100
+        let spf = 1536 // AC-3 frame
+        var anchor = AudioClockAnchor()
+        let ptsList = (0..<10).map { containerPTS(frame: $0, samplesPerFrame: spf, sampleRate: rate) }
+
+        let out = runStream(&anchor, ptsList: ptsList, samplesPerFrame: spf, sampleRate: rate)
+
+        let expectedStep = Double(spf) / Double(rate) // 34.8299... ms, the true frame length
+        for n in 1..<out.count {
+            let delta = CMTimeGetSeconds(CMTimeSubtract(out[n], out[n - 1]))
+            XCTAssertEqual(delta, expectedStep, accuracy: 1e-6,
+                           "buffer \(n) must abut to the sample; got \(delta * 1000) ms, expected \(expectedStep * 1000) ms")
+        }
+    }
+
+    func testFirstBufferAnchorsToItsStartPTS() {
+        var anchor = AudioClockAnchor()
+        let start = CMTimeMake(value: 5000, timescale: 1000)
+        let r = anchor.resolve(startPTS: start, sampleRate: 44100)
+        XCTAssertTrue(r.reanchor)
+        XCTAssertEqual(r.pts, start)
+    }
+
+    // MARK: - Jitter is absorbed, real discontinuities re-anchor
+
+    func testSmallContainerJitterIsAbsorbed() {
+        let rate: Int32 = 44100
+        let spf = 1536
+        var anchor = AudioClockAnchor()
+
+        let r0 = anchor.resolve(startPTS: .zero, sampleRate: rate)
+        anchor.commit(pts: r0.pts, reanchor: r0.reanchor, sampleCount: spf)
+
+        // True next boundary is 34.83 ms; the container rounds it to 35 ms. The 0.17 ms jitter is what
+        // produced the per-frame click, so it must be absorbed (predicted used, not the rounded PTS).
+        let jittery = CMTimeMake(value: 35, timescale: 1000)
+        let r1 = anchor.resolve(startPTS: jittery, sampleRate: rate)
+        XCTAssertFalse(r1.reanchor, "sub-threshold container rounding must not re-anchor")
+        XCTAssertEqual(CMTimeGetSeconds(r1.pts), Double(spf) / Double(rate), accuracy: 1e-6,
+                       "buffer must be stamped at the sample-accurate predicted time, not the rounded container PTS")
+    }
+
+    func testRealDiscontinuityReanchors() {
+        let rate: Int32 = 44100
+        let spf = 1536
+        var anchor = AudioClockAnchor()
+        let ptsList = (0..<5).map { containerPTS(frame: $0, samplesPerFrame: spf, sampleRate: rate) }
+        runStream(&anchor, ptsList: ptsList, samplesPerFrame: spf, sampleRate: rate)
+
+        let seek = CMTimeMake(value: 60_000, timescale: 1000) // a 60 s jump dwarfs container jitter
+        let r = anchor.resolve(startPTS: seek, sampleRate: rate)
+        XCTAssertTrue(r.reanchor)
+        XCTAssertEqual(r.pts, seek, "a real source discontinuity must be honoured, not papered over")
+    }
+
+    func testResetReanchorsNextBuffer() {
+        let rate: Int32 = 44100
+        let spf = 1536
+        var anchor = AudioClockAnchor()
+        let ptsList = (0..<5).map { containerPTS(frame: $0, samplesPerFrame: spf, sampleRate: rate) }
+        runStream(&anchor, ptsList: ptsList, samplesPerFrame: spf, sampleRate: rate)
+
+        anchor.reset()
+
+        // 174 ms sits right on the predicted clock (5 * 1536 / 44100 = 174.1 ms); without reset it would
+        // be absorbed as a continuation. After a flush it must re-anchor to its own PTS instead.
+        let afterFlush = CMTimeMake(value: 174, timescale: 1000)
+        let r = anchor.resolve(startPTS: afterFlush, sampleRate: rate)
+        XCTAssertTrue(r.reanchor, "flush must drop the anchor so the post-seek buffer re-anchors")
+        XCTAssertEqual(r.pts, afterFlush)
+    }
+
+    // MARK: - The clean case stays clean, and dropped buffers don't drift the clock
+
+    func test48kHzAC3StaysGapless() {
+        let rate: Int32 = 48000
+        let spf = 1536 // exactly 32 ms, always was gapless
+        var anchor = AudioClockAnchor()
+        let ptsList = (0..<10).map { containerPTS(frame: $0, samplesPerFrame: spf, sampleRate: rate) }
+
+        let out = runStream(&anchor, ptsList: ptsList, samplesPerFrame: spf, sampleRate: rate)
+
+        let expectedStep = Double(spf) / Double(rate)
+        for n in 1..<out.count {
+            let delta = CMTimeGetSeconds(CMTimeSubtract(out[n], out[n - 1]))
+            XCTAssertEqual(delta, expectedStep, accuracy: 1e-6)
+        }
+    }
+
+    func testDroppedBufferDoesNotAdvanceClock() {
+        let rate: Int32 = 44100
+        let spf = 1536
+        var anchor = AudioClockAnchor()
+
+        // Frame 0 emits and commits: clock now holds 1536 samples.
+        let r0 = anchor.resolve(startPTS: .zero, sampleRate: rate)
+        anchor.commit(pts: r0.pts, reanchor: r0.reanchor, sampleCount: spf)
+
+        // Frame 1 resolves but its CMSampleBuffer creation fails, so it is never committed.
+        _ = anchor.resolve(startPTS: containerPTS(frame: 1, samplesPerFrame: spf, sampleRate: rate), sampleRate: rate)
+
+        // Frame 2 must predict off the still-1536 sample count (one frame in), proving the dropped
+        // buffer injected no phantom samples.
+        let r2 = anchor.resolve(startPTS: containerPTS(frame: 2, samplesPerFrame: spf, sampleRate: rate), sampleRate: rate)
+        XCTAssertFalse(r2.reanchor)
+        XCTAssertEqual(CMTimeGetSeconds(r2.pts), Double(spf) / Double(rate), accuracy: 1e-6,
+                       "an uncommitted (dropped) buffer must not advance the sample clock")
+    }
+}

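The 100 ms `discontinuityThresholdSeconds` only works if container rounding can never accumulate into a false re-anchor. A quick sketch supports that: with a 1 ms timebase, frame n's container PTS is its true time plus a rounding error of at most 0.5 ms, while the predicted clock is the anchor's rounded PTS plus the exact sample-count offset, so their difference is the difference of two rounding errors and stays within 1 ms however long the stream runs. The sketch assumes round-to-nearest quantization and exactly `samplesPerFrame` decoded samples per packet; `maxDeviationMs` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: worst |container PTS - predicted clock| in ms over `frames` packets.
/// `firstSampleOffset` shifts the stream start so the anchor itself carries rounding error.
func maxDeviationMs(samplesPerFrame: Int, sampleRate: Int, frames: Int, firstSampleOffset: Int = 0) -> Double {
    // Exact start time of frame n in ms, before container quantization.
    func trueMs(_ n: Int) -> Double {
        Double(firstSampleOffset + n * samplesPerFrame) * 1000 / Double(sampleRate)
    }
    let anchorMs = trueMs(0).rounded()  // first frame's container PTS
    var worst = 0.0
    for n in 0..<frames {
        let containerMs = trueMs(n).rounded()
        // Anchor plus the exact duration of the n frames emitted since the anchor.
        let predictedMs = anchorMs + Double(n * samplesPerFrame) * 1000 / Double(sampleRate)
        worst = max(worst, abs(containerMs - predictedMs))
    }
    return worst
}

let oneHourOfFrames = 3600 * 44_100 / 1536   // about 103k AC-3 frames at 44.1 kHz
let aligned = maxDeviationMs(samplesPerFrame: 1536, sampleRate: 44_100, frames: oneHourOfFrames)
let offset = maxDeviationMs(samplesPerFrame: 1536, sampleRate: 44_100, frames: oneHourOfFrames,
                            firstSampleOffset: 700)
print(aligned, offset)
```

Both values stay below 1 ms over an hour of 44.1 kHz AC-3, two orders of magnitude under the 100 ms threshold, so under these assumptions container rounding alone never trips a re-anchor.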
docs/architecture.md

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ Source URL ──► Demuxer ──┬─► SoftwareVideoDecoder (dav1d) ──
 AVR / speakers
 ```
 
+`AudioDecoder` stamps each `CMSampleBuffer` from a running sample count anchored to the first frame (`AudioClockAnchor`), not from the container-quantized per-packet PTS. Container timebases are coarse (1 ms in MKV), so when a frame's duration is not an integer number of ticks (a 1536-sample AC-3 frame is 34.83 ms at 44.1 kHz but exactly 32 ms at 48 kHz) the quantized PTS leave a sub-millisecond gap or overlap at every buffer boundary, and `AVSampleBufferAudioRenderer` reconciles a discontinuity at each one (~29 clicks/sec, a continuous crackle). Anchoring to the sample clock makes consecutive buffers abut exactly; a real source discontinuity (> 100 ms off the predicted clock, i.e. a seek or edit) re-anchors so genuine gaps are not papered over, and `flush()` drops the anchor. The clock advances only on a successfully emitted buffer, so a dropped buffer injects no phantom samples.
+
 AV1+DV (Profile 10.0 / 10.1 / 10.4) routes through the native path on hardware-AV1 hosts via the `dav1` / `av01` track type plus the source's `dvvC` box. AV1+Atmos is genuinely rare in the wild (mastering still runs in HEVC overwhelmingly), so the SW pipeline's lack of Atmos passthrough is a theoretical limitation rather than a real one. The dispatch happens once at load time; hosts see a unified `@Published` state surface either way.
 
 **Background audio (iOS).** When the app backgrounds while playing, the engine keeps audio going rather than tearing the pipeline down. The decision is a pure, unit-tested policy, `backgroundAction(isAudioBackend:hasSoftwareHost:keepVideoAlive:state:)`, driven from the `UIApplication` lifecycle observers; `keepVideoAlive` comes from `shouldKeepVideoAlive(enabled:pipActive:state:)` and is gated to iOS (tvOS always tears down, wedge-safe: a frozen decode session crossing a multi-hour suspension wedged `mediaserverd`). On the native path "keep audio alive" is just declining to tear down: `AVPlayer` under the `.playback` session keeps decoding. The software path has no `AVPlayer`, and its combined demux loop normally paces the whole loop (audio and video) on the video renderer's `isReadyForMoreMediaData`; once `AVSampleBufferDisplayLayer` stops draining in the background that gate never reopens and audio would starve. So the host enters `backgroundAudioOnly`: the loop drops video packets and paces on the audio renderer (`AudioOutput.isReadyForMoreMediaData`) instead, keeping `AVSampleBufferAudioRenderer` fed and the synchronizer advancing. On foreground return the flag clears, the video decoder and renderer flush, and video resyncs at the next keyframe with audio uninterrupted. Scope is the combined VOD loop (and live-without-DVR, which shares it); the DVR feeder loop is unchanged. Exercise it headless with `aetherctl bgaudio` (see [cli.md](cli.md)).
