|
| 1 | +--- |
| 2 | +status: draft |
| 3 | +date: 2026-05-20 |
| 4 | +definition: coarse |
| 5 | +--- |
| 6 | + |
| 7 | +# Audio-only composition |
| 8 | + |
| 9 | +Engine support for HLS sources that contain only audio renditions |
| 10 | +(no video tracks declared in the manifest). The Case-1 Media-src |
| 11 | +feature per [clusters.md § Feature classification axes — Composition |
| 12 | +cases per mode](./clusters.md#composition-vs-policy-vs-middle-pattern): |
| 13 | +"Default composition handles a manifest that is genuinely audio-only |
| 14 | +(no other tracks present). Required to claim general permutation |
| 15 | +support." Sibling of [video-only-composition](./video-only-composition.md), |
| 16 | +which covers the inverse axis. |
| 17 | + |
| 18 | +A **Media-src feature** per |
| 19 | +[clusters.md § Feature classification axes](./clusters.md#media-src-vs-player-vs-borderline): |
| 20 | +without it, audio-only HLS sources don't play correctly. (Today the |
| 21 | +engine *tolerates* audio-only sources per the `engine.test.ts` |
| 22 | +"handles audio-only stream" test, but isn't *optimized* for them — |
| 23 | +work goes here.) |
| 24 | + |
| 25 | +## Status |
| 26 | + |
| 27 | +- **Composition:** partially supported. The HLS engine |
| 28 | + (`createSimpleHlsEngine`) tolerates audio-only sources today; the |
| 29 | + test suite includes `engine.test.ts` → "handles audio-only stream |
| 30 | + (no video tracks)" exercising basic playback. Tolerance derives |
| 31 | + from `setupVideoBufferActors` and `loadVideoSegments` no-op-ing |
| 32 | + when `presentation.videoTracks` is empty (rather than asserting). |
| 33 | +- **Definition depth:** coarse — scope sketched at the engine- |
| 34 | + variant level. Implementation specifics (engine-variant composition |
| 35 | + shape, audio-only-optimized buffer targets, etc.) tracked as open |
| 36 | + questions. |
| 37 | +- **Source material:** Notion epic #4a (Basic Audio-only, Cluster |
| 38 | + C/E, Media-src composition case 1, S/S sizing). |
| 39 | + |
| 40 | +## Phases of complexity |
| 41 | + |
| 42 | +Three brief phases covering the Case-1 scope: recognition, engine |
| 43 | +variant, and edge cases. |
| 44 | + |
| 45 | +| Phase | What | Notes | |
| 46 | +|---|---|---| |
| 47 | +| Audio-only manifest recognition | Parser surfaces `presentation.videoTracks` as empty when the multivariant playlist contains only `EXT-X-MEDIA:TYPE=AUDIO` renditions (no video). Engine state observably reflects "no video tracks." | Already works today: the parser produces empty `videoTracks` for these sources, and the engine test exercises it. Foundation for the rest of this feature's scope | |
| 48 | +| Audio-only engine variant | Explicit engine composition variant where video-side behaviors (`setupVideoBufferActors`, `loadVideoSegments`, `switchVideoQuality`, `selectVideoTrack`) are subtractively-composed-out rather than running as no-ops. Saves the no-op overhead and makes the "no video" state explicit in the composition. The current implicit tolerance becomes explicit | Composition-variant work. Per the failure-mode catalog: live vs VoD is a composition-time distinction, and the same principle extends here — audio-only vs A+V is composition-time. Two shapes: (a) audio-only variant composes a subset of behaviors (cleanest); (b) keep uniform composition with video-side no-ops (current state). Lean: (a) — explicit composition matches the SPF discipline | |
| 49 | +| Audio-only-optimized buffer / playback | Audio-only sources may benefit from different default tuning: shorter forward-buffer targets (audio has a lower bitrate, so less ahead-buffering is needed), no display-related work (no `requestAnimationFrame`, no PiP, no thermal pressure from video decode), simpler `endOfStream` (single SourceBuffer to coordinate) | Tier 2-ish: optimization beyond minimum viability. Defer until usage signals from podcast / audio-only customers actually exist | |
| 50 | + |
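Shape (a) from the engine-variant row can be sketched as a filter over a behavior list. This is a minimal illustration only: the `Behavior` type, the `side` tag, and the `loadAudioSegments` entry are assumptions made for the example, not the SPF engine's actual types. The video-side names are the ones listed in the table above.

```typescript
// Hypothetical shape for illustration; the real SPF behavior type may differ.
type Behavior = { name: string; side: "video" | "audio" | "shared" };

// Video-side names come from this doc. `loadAudioSegments` is a placeholder.
const fullComposition: readonly Behavior[] = [
  { name: "setupVideoBufferActors", side: "video" },
  { name: "loadVideoSegments", side: "video" },
  { name: "switchVideoQuality", side: "video" },
  { name: "selectVideoTrack", side: "video" },
  { name: "selectAudioTrack", side: "audio" },
  { name: "loadAudioSegments", side: "audio" }, // placeholder name
  { name: "endOfStream", side: "shared" },
];

// Shape (a): the audio-only variant composes a subset, so video-side
// behaviors are absent from the list instead of running as no-ops.
function composeAudioOnly(behaviors: readonly Behavior[]): Behavior[] {
  return behaviors.filter((b) => b.side !== "video");
}

const audioOnlyComposition = composeAudioOnly(fullComposition);
console.log(audioOnlyComposition.map((b) => b.name).join(", "));
```

In practice the variant would more likely be a separately declared list than a runtime filter; the filter is used here only to make the subtraction visible.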
| 51 | +## What's in scope vs out of scope |
| 52 | + |
| 53 | +**In scope:** |
| 54 | +- All three phases above for HLS audio-only-manifest sources |
| 55 | +- Engine-variant composition shape (subtractive composition of |
| 56 | + video-side behaviors) |
| 57 | +- Audio-only-specific buffer / playback optimizations |
| 58 | +- Confirming + extending the existing audio-only test coverage in |
| 59 | + `engine.test.ts` |
| 60 | + |
| 61 | +**Out of scope (separate concerns — the "use case composition" |
| 62 | +doc-type):** |
| 63 | +- **Audio-only mode override** *(Player feature, "use case |
| 64 | + composition" type — not yet formalized)* — subtract-down |
| 65 | + composition that produces audio-only delivery *even from mixed- |
| 66 | + manifest sources* (sources with both audio and video). This is |
| 67 | + the Case-2 Player feature per Notion's "Composition cases per |
| 68 | + mode" framing. Different concern: this feature handles audio- |
| 69 | + only-as-source-shape; the override case is audio-only-as- |
| 70 | + delivery-choice. Falls under the yet-to-be-formalized "use case |
| 71 | + composition" doc-type (parallel concepts include background-video |
| 72 | + playback, audio-podcast mode, etc.). |
| 73 | +- **Dynamic audio-only switching** *(Case 3, deprioritized per |
| 74 | + Notion)* — same engine, config/state-driven dynamic switching |
| 75 | + between Case 1 and Case 2. Notion epic #4c: "May not build." |
| 76 | + |
| 77 | +**Out of scope (different architectural layer):** |
| 78 | +- Adapter-level audio-only UI (cover art rendering, audio-podcast |
| 79 | + player chrome, etc.). Adapter / consumer territory. |
| 80 | +- Audio-only customer-facing modes ("Listen on the go" toggles). |
| 81 | + Adapter-level. |
| 82 | + |
| 83 | +## Likely cross-cutting impact |
| 84 | + |
| 85 | +Things this feature probably forces decisions on, not just additions: |
| 86 | + |
| 87 | +- **Engine composition shape for variants.** Composition-variant |
| 88 | + pattern from the failure-mode catalog: live vs VoD is the |
| 89 | + precedent. Audio-only-vs-A+V should follow the same shape — |
| 90 | + variant-specific composition rather than runtime no-ops. The |
| 91 | + current implicit-tolerance state is a transitional shape; this |
| 92 | + feature makes it explicit. |
| 93 | +- **Behavior composition subtraction.** Today's `createSimpleHlsEngine` |
| 94 | + composes a fixed list. Subtracting video-side behaviors for the |
| 95 | + audio-only variant means a different composition list — closer to |
| 96 | + `createAudioOnlyHlsEngine` (or similar) at the engine-factory |
| 97 | + level. Cross-cluster with [engine-adapter-integration](./engine-adapter-integration.md) |
| 98 | + on how engine variants are selected. |
| 99 | +- **Variant-decision signal source.** Same question as |
| 100 | + [live-stream-support](./live-stream-support.md)'s variant-decision |
| 101 | + open question: adapter-upfront opt-in vs detect-from-parser |
| 102 | + (engine sees an empty `presentation.videoTracks` and routes to |
| 103 | + audio-only composition). Detect-and-route is more adaptive; |
| 104 | + adapter-upfront is simpler. Cross-feature with how live + DVR + |
| 105 | + LL-HLS variants get composed. |
| 106 | +- **Audio-only `endOfStream` gate.** Today's `endOfStream` gate in |
| 107 | + [mse-mms-pipeline](./mse-mms-pipeline.md) coordinates across |
| 108 | + video and audio buffers (`isLastSegmentAppended` per type + |
| 109 | + `mediaSource.readyState`). For audio-only, the gate naturally |
| 110 | + simplifies (single buffer to coordinate); per the catalog's |
| 111 | + composition-variant entry, the existing `endOfStream` behavior |
| 112 | + should compose unchanged — it reads `mediaSource.sourceBuffers` |
| 113 | + uniformly rather than per-type. Verify this empirically when the |
| 114 | + feature lands. |
| 115 | +- **Audio-only ABR semantics.** Audio renditions can be multi- |
| 116 | + bitrate (multiple AAC bitrate variants). When [audio-abr](./audio-abr.md) |
| 117 | + lands, audio-only composition + audio-abr is a natural pairing. |
| 118 | + The audio-only variant composes audio-abr in place of (or in |
| 119 | + addition to) `selectAudioTrack`. |
| 120 | +- **DOM exposure semantics.** A `<video>` element with no video |
| 121 | + track still renders a visible (empty) box. The adapter may |
| 122 | + want to display cover art or a podcast-style UI; the engine doesn't |
| 123 | + intervene in DOM appearance. |
| 124 | + |
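The claim that the `endOfStream` gate composes unchanged for audio-only can be illustrated with a gate that iterates whatever buffers exist instead of naming video and audio explicitly. The sketch below uses a simplified stand-in for `MediaSource` so it runs outside a browser. The `BufferLike` and `MediaSourceLike` types are assumptions, and storing `isLastSegmentAppended` on the buffer object is a simplification of however the pipeline actually tracks it.

```typescript
// Simplified stand-ins for MSE objects (assumptions, not DOM types).
interface BufferLike {
  type: "video" | "audio";
  isLastSegmentAppended: boolean;
  updating: boolean; // mirrors SourceBuffer.updating
}

interface MediaSourceLike {
  readyState: "closed" | "open" | "ended";
  sourceBuffers: readonly BufferLike[];
}

// Gate that treats all buffers uniformly: the same code covers A+V
// (two buffers) and audio-only (one buffer).
function canEndStream(ms: MediaSourceLike): boolean {
  return (
    ms.readyState === "open" &&
    ms.sourceBuffers.length > 0 &&
    ms.sourceBuffers.every((b) => b.isLastSegmentAppended && !b.updating)
  );
}

const audioOnlySource: MediaSourceLike = {
  readyState: "open",
  sourceBuffers: [{ type: "audio", isLastSegmentAppended: true, updating: false }],
};
console.log(canEndStream(audioOnlySource));
```

Because nothing in the gate branches on buffer type, an audio-only source passes through the same check with a single entry. Whether the real gate is written this way is exactly what the empirical verification above should confirm.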
| 125 | +## Open questions |
| 126 | + |
| 127 | +- **Engine variant factory shape.** `createAudioOnlyHlsEngine` as a |
| 128 | + separate factory vs `createSimpleHlsEngine` with an audio-only |
| 129 | + config flag vs detect-and-route from the existing factory. The |
| 130 | + pattern question is shared across all composition variants |
| 131 | + (live, DVR, LL-HLS, DRM-required, audio-only, video-only). |
| 132 | +- **Variant-decision signal source.** Adapter-upfront vs detect- |
| 133 | + from-parser. Same cross-feature question as live-stream-support. |
| 134 | +- **Audio-only buffer-target tuning.** Is the default |
| 135 | + `forwardBuffer.bufferDuration` (30s today) appropriate for |
| 136 | + audio-only, or should it be tuned down? Needs empirical data. |
| 137 | +- **Coordination with audio-abr (when it lands).** Composition of |
| 138 | + audio-only variant + audio-abr. |
| 139 | +- **Live + audio-only intersection.** Live audio-only streams |
| 140 | + (radio-stream-like) are a real shape. How should the live engine |
| 141 | + and the audio-only variant compose? |
| 142 | + |
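The two variant-decision options (adapter-upfront vs detect-from-parser) can coexist behind one selector, sketched below under stated assumptions. The `Presentation` shape, the `audioTracks` field, the `"auto"` option, and `chooseVariant` itself are hypothetical names for illustration; only `presentation.videoTracks` appears in the current engine per this doc. Note that emptiness has to be checked via `length`, since comparing an array to `[]` with `===` is always false in JavaScript.

```typescript
// Hypothetical types; only `videoTracks` is named in this doc.
type Presentation = {
  videoTracks: readonly unknown[];
  audioTracks: readonly unknown[]; // assumed field
};

type Variant = "audio-only" | "audio-video";
type VariantOption = Variant | "auto";

// Adapter-upfront: an explicit request wins.
// Detect-from-parser: "auto" inspects the parsed presentation.
function chooseVariant(
  presentation: Presentation,
  requested: VariantOption = "auto",
): Variant {
  if (requested !== "auto") return requested;
  const noVideo = presentation.videoTracks.length === 0;
  const hasAudio = presentation.audioTracks.length > 0;
  return noVideo && hasAudio ? "audio-only" : "audio-video";
}

const podcast: Presentation = { videoTracks: [], audioTracks: [{}] };
console.log(chooseVariant(podcast));
```

A selector like this would sit wherever the factory-shape question above lands: inside a single factory for detect-and-route, or in the adapter if separate factories are exposed.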
| 143 | +## Related features |
| 144 | + |
| 145 | +- **[video-only-composition](./video-only-composition.md)** — |
| 146 | + parallel sibling on the inverse axis (video-only manifests). |
| 147 | +- **[audio-playback](./audio-playback.md)** — baseline audio |
| 148 | + handling; this feature is the engine-variant optimization for |
| 149 | + audio-only sources. The "Audio-only composition optimizations" |
| 150 | + bullet in audio-playback's What's not implemented points here. |
| 151 | +- **[engine-adapter-integration](./engine-adapter-integration.md)** — |
| 152 | + variant-decision lives here (which engine factory the adapter |
| 153 | + invokes). |
| 154 | +- **[mse-mms-pipeline](./mse-mms-pipeline.md)** — `endOfStream` |
| 155 | + gate is variant-agnostic per the conventions; audio-only should |
| 156 | + compose it unchanged. |
| 157 | +- **[buffer-management](./buffer-management.md)** — buffer-target |
| 158 | + tuning for audio-only. |
| 159 | +- **`[audio-abr]`** *(documented; pending implementation)* — |
| 160 | + natural pairing for audio-only + multi-bitrate audio. |
| 161 | +- **`[live-stream-support]`** *(not implemented)* — live + audio- |
| 162 | + only (radio streams) is a composition intersection. |
| 163 | + |
| 164 | +## See also |
| 165 | + |
| 166 | +- [clusters.md § Feature classification axes](./clusters.md#feature-classification-axes) |
| 167 | + — "Composition vs Policy vs middle pattern"; this feature is |
| 168 | + Notion's Case-1 (Media-src composition case) |
| 169 | +- [audio-playback.md](./audio-playback.md) — baseline audio |
| 170 | + handling |
| 171 | +- [SPF Epics Working Doc](https://www.notion.so/35f97a7f89d08123a13fecab1ca1cac4) |
| 172 | + — source material; epic #4a (Basic Audio-only); epic #4b (Audio- |
| 173 | + only Mode Override) is the Case-2 sister concern under the |
| 174 | + yet-to-be-formalized "use case composition" doc-type |
| 175 | +- [conventions/behaviors.md](../conventions/behaviors.md) — |
| 176 | + composition-variant discipline (live vs VoD pattern extends to |
| 177 | + audio-only vs A+V) |
| 178 | +- [`engine.test.ts` → "handles audio-only stream"](https://github.com/videojs/v10/blob/main/packages/spf/src/playback/engines/hls/tests/engine.test.ts) |
| 179 | + — existing basic-coverage test for this feature's status quo |