// Forecast the chunk plan without generating audio
let plan = tts.chunks(for: longArticle)
```

-`generateWithManifest` populates ``TTSSegmentTiming/byteRangeInConcatenatedAudio`` for `pcm`
-output today. Other formats leave it `nil` until the framework can compute byte ranges defensibly.
`generateAll` is implemented on top of the same path and returns `result.audio`.

`stream` segments always carry ``TTSSegmentTiming/uncomputed`` timing. Per-segment audio is the
raw chunk bytes, and final container offsets are only meaningful after concatenation.

+#### Supported Timing
+
+| Format | Byte range | Duration |
+|---|---:|---:|
+| `pcm` | yes | yes, when the provider supplies sample rate, channels, and bits per sample |
+| `mp3` | yes (accounts for ID3v2, Xing/Info, and ID3v1 stripping) | not supported |
+| `wav`, `flac`, `opus`, `aac` | not supported | not supported |
+
+Unsupported values are reported as `nil` rather than guessed. Duration for `pcm` is computed as
+`bytes / (sampleRate * channels * (bitsPerSample / 8))` when the provider's
+``TTSProvider/resolvedEncoding(for:options:)`` returns a fully populated ``TTSAudioEncoding``.
+Built-in providers override the hook only with values they have published documentation for.
+``OpenAITTSProvider`` populates `pcm` `sampleRate` and `bitsPerSample` from OpenAI's
+`/v1/audio/speech` documentation but leaves `channels` `nil` until the speech endpoint's channel
+count is documented separately, so OpenAI `pcm` `durationSeconds` remains `nil`. Custom providers
+using the protocol's default implementation report `nil` PCM fields and therefore `nil` duration.
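
The `pcm` duration rule quoted above can be sketched in isolation. This is a minimal illustration, not the framework's implementation: `PCMLayout` and `pcmDurationSeconds` are hypothetical names standing in for the PCM fields of ``TTSAudioEncoding``, and the sample values are illustrative rather than taken from any provider's documentation.

```swift
/// Hypothetical stand-in for the PCM fields of the framework's `TTSAudioEncoding`.
struct PCMLayout {
    var sampleRate: Int?
    var channels: Int?
    var bitsPerSample: Int?
}

/// Applies bytes / (sampleRate * channels * (bitsPerSample / 8)).
/// Returns nil when any field is missing or invalid, rather than guessing.
func pcmDurationSeconds(byteCount: Int, layout: PCMLayout) -> Double? {
    guard let sampleRate = layout.sampleRate,
          let channels = layout.channels,
          let bitsPerSample = layout.bitsPerSample,
          sampleRate > 0, channels > 0,
          bitsPerSample > 0, bitsPerSample % 8 == 0
    else { return nil }
    let bytesPerSecond = sampleRate * channels * (bitsPerSample / 8)
    return Double(byteCount) / Double(bytesPerSecond)
}

// Illustrative values only: 24 kHz, mono, 16-bit, so 48,000 bytes per second.
let known = PCMLayout(sampleRate: 24_000, channels: 1, bitsPerSample: 16)
let duration = pcmDurationSeconds(byteCount: 96_000, layout: known)  // 2.0

// A missing channel count makes the duration unknowable, so the result is nil.
let unknownChannels = PCMLayout(sampleRate: 24_000, channels: nil, bitsPerSample: 16)
let missing = pcmDurationSeconds(byteCount: 96_000, layout: unknownChannels)  // nil
```

The second call mirrors the OpenAI case described above: with `channels` unset, the result stays `nil` even though the other two fields are known.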
+
### TTSOptions

``TTSOptions`` controls per-request parameters:
@@ -188,12 +206,29 @@ For MP3 output, the concatenator strips ID3v2 headers, Xing/Info frames, and ID3

### Custom Providers

-Conform to ``TTSProvider`` to use any speech synthesis backend. ``TTSClient`` delivers a ``TTSChunkContext`` carrying the chunk plan and requested encoding alongside each call. Providers should treat `context.encoding` as the authoritative source for the format to produce, and can additionally use it for logging or request correlation:
+Conform to ``TTSProvider`` to use any speech synthesis backend. ``TTSClient`` delivers a
+``TTSChunkContext`` carrying the chunk plan and requested encoding alongside each call. Providers
+should treat `context.encoding` as the authoritative source for the format to produce, and can
+additionally use it for logging or request correlation.
+
+Override ``TTSProvider/resolvedEncoding(for:options:)`` to surface documented `pcm` sample rate,
+channel count, and bit depth so the framework can compute ``TTSSegmentTiming/durationSeconds`` for
+`pcm` segments. The default implementation returns ``TTSAudioEncoding`` with `nil` PCM fields, so
+providers without published encoding values can omit it.
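
The default-versus-override behaviour described above might look roughly like the following. Every type here (`SketchFormat`, `SketchOptions`, `SketchEncoding`, `SketchProvider`) is a hypothetical stand-in written for this sketch; the real ``TTSProvider`` and ``TTSAudioEncoding`` declarations may use different parameter types and field names, and the PCM values are placeholders.

```swift
// Hypothetical stand-ins; the framework's real types and signatures may differ.
enum SketchFormat { case pcm, mp3 }
struct SketchOptions {}

struct SketchEncoding {
    var format: SketchFormat
    var sampleRate: Int? = nil
    var channels: Int? = nil
    var bitsPerSample: Int? = nil
}

protocol SketchProvider {
    func resolvedEncoding(for format: SketchFormat, options: SketchOptions) -> SketchEncoding
}

extension SketchProvider {
    // Default: report no PCM fields, so duration cannot be computed.
    func resolvedEncoding(for format: SketchFormat, options: SketchOptions) -> SketchEncoding {
        SketchEncoding(format: format)
    }
}

// A provider with no published encoding values simply omits the override.
struct UndocumentedProvider: SketchProvider {}

// A provider whose backend documents its PCM layout surfaces those values.
struct DocumentedProvider: SketchProvider {
    func resolvedEncoding(for format: SketchFormat, options: SketchOptions) -> SketchEncoding {
        guard format == .pcm else { return SketchEncoding(format: format) }
        // Placeholder values; a real provider would copy these from its backend's documentation.
        return SketchEncoding(format: .pcm, sampleRate: 16_000, channels: 1, bitsPerSample: 16)
    }
}

let fallback = UndocumentedProvider().resolvedEncoding(for: .pcm, options: SketchOptions())
let documented = DocumentedProvider().resolvedEncoding(for: .pcm, options: SketchOptions())
```

Here `fallback` carries `nil` for all three PCM fields, while `documented` is fully populated for `pcm` and unchanged for other formats.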