Sources/AgentRunKit/Documentation.docc/Articles/MultimodalAndAudio.md
47 additions & 4 deletions
Original file line number
Diff line number
Diff line change
@@ -118,6 +118,7 @@ These methods cover different use cases:
118
118
| `generate(text:voice:options:)` | `Data` | Single request, no chunking |
119
119
| `stream(text:voice:options:)` | `AsyncThrowingStream<TTSSegment, Error>` | Chunked, yields ordered ``TTSSegment`` values as they complete |
120
120
| `generateAll(text:voice:options:)` | `Data` | Chunked, concatenates all segments into one `Data` |
121
+
| `generateWithManifest(text:voice:options:)` | ``TTSConcatenationResult`` | Like `generateAll` but also returns a per-segment manifest of chunk, encoding, and timing |
121
122
| `chunks(for:)` | `[TTSChunk]` | The chunk plan this client will use, without invoking the provider |
122
123
123
124
```swift
@@ -134,10 +135,25 @@ for try await segment in tts.stream(text: longArticle) {
134
135
// Full concatenated output
135
136
let fullAudio = try await tts.generateAll(text: longArticle, options: TTSOptions(speed: 1.25))
136
137
138
+
// Concatenated audio plus a per-segment manifest
139
+
let result = try await tts.generateWithManifest(text: longArticle, options: TTSOptions(responseFormat: .pcm))
140
+
for entry in result.manifest {
141
+
    if let range = entry.timing.byteRangeInConcatenatedAudio {
142
+
        print("chunk \(entry.chunk.index): bytes \(range) of result.audio")
143
+
    }
144
+
}
145
+
137
146
// Forecast the chunk plan without generating audio
138
147
let plan = tts.chunks(for: longArticle)
139
148
```
140
149
150
+
`generateWithManifest` populates ``TTSSegmentTiming/byteRangeInConcatenatedAudio`` for `pcm`
151
+
output today. Other formats leave it `nil` until the framework can compute byte ranges defensibly.
152
+
`generateAll` is implemented on top of the same path and returns `result.audio`.
153
+
154
+
`stream` segments always carry ``TTSSegmentTiming/uncomputed`` timing. Per-segment audio is the
155
+
raw chunk bytes, and final container offsets are only meaningful after concatenation.
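
As a rough illustration of how a caller might consume a populated byte range, the sketch below slices one segment out of a concatenated buffer. `SketchEntry` and `slice(_:for:)` are hypothetical stand-ins, and the `Range<Int>` type is an assumption; the actual type of `byteRangeInConcatenatedAudio` is not shown here.

```swift
import Foundation

// Hypothetical stand-in for one manifest entry; not AgentRunKit's actual type.
struct SketchEntry {
    let index: Int
    let byteRange: Range<Int>?   // assumed shape; nil when no range was computed
}

/// Returns the bytes for one entry, or nil when the range is missing or out of bounds.
func slice(_ audio: Data, for entry: SketchEntry) -> Data? {
    guard let range = entry.byteRange,
          range.lowerBound >= 0,
          range.upperBound <= audio.count else { return nil }
    // Offset by startIndex so the function also works when `audio` is itself a slice.
    let start = audio.startIndex + range.lowerBound
    return audio.subdata(in: start..<(start + range.count))
}

let audio = Data([0, 1, 2, 3, 4, 5, 6, 7])
let entries = [
    SketchEntry(index: 0, byteRange: 0..<4),
    SketchEntry(index: 1, byteRange: 4..<8),
    SketchEntry(index: 2, byteRange: nil),   // e.g. a format without computed ranges
]
for entry in entries {
    if let bytes = slice(audio, for: entry) {
        print("chunk \(entry.index): \(bytes.count) bytes")
    } else {
        print("chunk \(entry.index): no byte range")
    }
}
```

Treating a `nil` range as "not available" rather than as an error mirrors the behavior described above for non-`pcm` formats.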
156
+
141
157
### TTSOptions
142
158
143
159
``TTSOptions`` controls per-request parameters:
@@ -147,13 +163,26 @@ let plan = tts.chunks(for: longArticle)
147
163
148
164
### How Chunking Works
149
165
150
-
The chunker splits input text on sentence boundaries using `NLTokenizer`. Sentences are packed into chunks up to the provider's `maxChunkCharacters` limit. Oversized sentences fall back to word-level, then character-level splitting. ``TTSClient`` dispatches up to `maxConcurrent` chunk requests in parallel using a task group. Results are buffered and yielded in original order.
166
+
The chunker splits input text on sentence boundaries using `NLTokenizer`. Sentences are packed up to
167
+
the provider's `maxChunkCharacters` limit. Oversized sentences fall back to word-level, then
168
+
character-level splitting.
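
The packing step described above can be sketched roughly as follows. This is a simplified illustration, not the framework's chunker: it assumes sentences have already been split (the real chunker uses `NLTokenizer`), joins pieces with a single space, and the `pack(_:limit:)` name is hypothetical.

```swift
/// Greedily packs pre-split sentences into chunks of at most `limit` characters.
/// Oversized sentences fall back to words, and oversized words to character runs.
func pack(_ sentences: [String], limit: Int) -> [String] {
    precondition(limit > 0, "limit must be positive")
    var chunks: [String] = []
    var current = ""

    // Emits the chunk being built, if any.
    func flush() {
        if !current.isEmpty {
            chunks.append(current)
            current = ""
        }
    }

    // Appends a piece that is known to fit within `limit` on its own.
    func add(_ piece: String) {
        let needed = current.isEmpty ? piece.count : current.count + 1 + piece.count
        if needed > limit { flush() }
        current = current.isEmpty ? piece : current + " " + piece
    }

    for sentence in sentences {
        if sentence.count <= limit {
            add(sentence)
            continue
        }
        // Word-level fallback for a sentence that exceeds the limit.
        for word in sentence.split(separator: " ") {
            if word.count <= limit {
                add(String(word))
            } else {
                // Character-level fallback for a single word that exceeds the limit.
                flush()
                var rest = word
                while !rest.isEmpty {
                    chunks.append(String(rest.prefix(limit)))
                    rest = rest.dropFirst(limit)
                }
            }
        }
    }
    flush()
    return chunks
}

let packed = pack(["Hello world.", "This is a test.", "Short."], limit: 30)
print(packed)
```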
169
+
170
+
``TTSClient`` dispatches up to `maxConcurrent` chunk requests in parallel. Results are buffered and
171
+
yielded in original order.
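
One common way to get bounded parallelism with ordered results is a task group that tags each result with its input index. The sketch below shows that pattern under simplifying assumptions: `orderedMap` is a hypothetical helper, and it collects all results before returning instead of streaming them as ``TTSClient`` does.

```swift
/// Runs `work` over `inputs` with at most `maxConcurrent` tasks in flight
/// and returns the outputs in input order.
func orderedMap<Input: Sendable, Output: Sendable>(
    _ inputs: [Input],
    maxConcurrent: Int,
    work: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")
    return try await withThrowingTaskGroup(of: (Int, Output).self, returning: [Output].self) { group in
        var results = [Output?](repeating: nil, count: inputs.count)
        var next = 0

        // Seed the group up to the concurrency limit.
        while next < min(maxConcurrent, inputs.count) {
            let seedIndex = next
            let input = inputs[seedIndex]
            group.addTask { (seedIndex, try await work(input)) }
            next += 1
        }

        // Each completion is stored at its own index and frees a slot for the next input.
        while let (index, output) = try await group.next() {
            results[index] = output
            if next < inputs.count {
                let nextIndex = next
                let input = inputs[nextIndex]
                group.addTask { (nextIndex, try await work(input)) }
                next += 1
            }
        }
        return results.compactMap { $0 }
    }
}

// Later inputs sleep for less time, so completion order differs from input order.
let doubled = try await orderedMap([1, 2, 3, 4, 5], maxConcurrent: 2) { (n: Int) -> Int in
    try await Task.sleep(nanoseconds: UInt64(6 - n) * 10_000_000)
    return n * 2
}
print(doubled)   // [2, 4, 6, 8, 10]
```

A streaming variant would keep the same index-tagged buffer and yield the next expected index as soon as it arrives.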
151
172
152
-
Each ``TTSSegment`` aggregates a ``TTSChunk`` (the unit of input text), a ``TTSAudioEncoding`` (the encoding ``TTSClient`` requested from the provider), a ``TTSSegmentTiming`` (audio-time metadata; both fields are `nil` until the framework computes them), and the audio bytes. The chunk, encoding, and timing are the canonical access path; flat computed properties on ``TTSSegment`` (`index`, `total`, `text`, `sourceRange`) forward to the chunk for log statements that need only those fields. For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the discontiguous span of the words it contains, preserving left-to-right monotonicity for caller-side highlighting and forced alignment.
173
+
Each ``TTSSegment`` carries a ``TTSChunk``, a ``TTSAudioEncoding``, a ``TTSSegmentTiming``, and the
174
+
audio bytes. The chunk, encoding, and timing fields are the canonical access path; flat properties
175
+
on ``TTSSegment`` forward to the chunk for compact logging.
153
176
154
-
``TTSClient/chunks(for:)`` returns the same ``TTSChunk`` values the stream will emit, without calling the provider. Use it to forecast chunk identity before generation or to drive offline planning.
177
+
For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the
178
+
span of the words it contains. That keeps ranges monotonic for caller-side highlighting and forced
179
+
alignment.
155
180
156
-
``TTSConcatenationResult`` and ``TTSManifestEntry`` describe the shape of a manifest-aware concatenation that pairs the audio bytes with a per-segment manifest of chunk, encoding, and timing.
181
+
``TTSClient/chunks(for:)`` returns the same ``TTSChunk`` values the stream will emit, without calling
182
+
the provider. Use it to forecast chunk identity before generation or to drive offline planning.
183
+
184
+
``TTSConcatenationResult`` and ``TTSManifestEntry`` pair concatenated audio bytes with a per-segment
185
+
manifest of chunk, encoding, and timing.
157
186
158
187
For MP3 output, the concatenator strips ID3v2 headers, Xing/Info frames, and ID3v1 tails from interior segments for clean concatenation.
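
To make the tag stripping concrete, here is a minimal sketch of detecting ID3v2 headers and ID3v1 tails from their standard layouts. It is not the framework's concatenator: the function names are hypothetical, and Xing/Info frame handling is left out.

```swift
import Foundation

/// Length in bytes of a leading ID3v2 tag, or 0 if none is present.
func id3v2TagLength(in data: Data) -> Int {
    let header = [UInt8](data.prefix(10))
    guard header.count == 10,
          header[0] == 0x49, header[1] == 0x44, header[2] == 0x33   // "ID3"
    else { return 0 }
    let sizeBytes = header[6...9]
    // The size is a synchsafe integer: four bytes with 7 significant bits each.
    guard sizeBytes.allSatisfy({ $0 & 0x80 == 0 }) else { return 0 }
    let size = sizeBytes.reduce(0) { ($0 << 7) | Int($1) }
    let hasFooter = header[5] & 0x10 != 0   // ID3v2.4 footer flag
    return 10 + size + (hasFooter ? 10 : 0)
}

/// True when the last 128 bytes start with "TAG", the ID3v1 marker.
func hasID3v1Tail(_ data: Data) -> Bool {
    data.count >= 128 && [UInt8](data.suffix(128).prefix(3)) == [0x54, 0x41, 0x47]
}

/// Removes a leading ID3v2 tag and a trailing ID3v1 tag, if present.
func stripID3(_ data: Data) -> Data {
    var body = data.dropFirst(min(id3v2TagLength(in: data), data.count))
    if hasID3v1Tail(body) { body = body.dropLast(128) }
    return Data(body)
}

// Build a tiny fake segment: ID3v2 header + 5 tag bytes + 4 payload bytes + ID3v1 tail.
var segmentBytes: [UInt8] = [0x49, 0x44, 0x33, 4, 0, 0, 0, 0, 0, 5]
segmentBytes.append(contentsOf: [UInt8](repeating: 0, count: 5))
segmentBytes.append(contentsOf: [0xFF, 0xFB, 1, 2])
segmentBytes.append(contentsOf: [0x54, 0x41, 0x47])
segmentBytes.append(contentsOf: [UInt8](repeating: 0, count: 125))
let stripped = stripID3(Data(segmentBytes))
print("\(stripped.count) payload bytes remain")
```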
159
188
@@ -185,6 +214,19 @@ let provider = MyTTSProvider(config: TTSProviderConfig(
185
214
let tts = TTSClient(provider: provider)
186
215
```
187
216
217
+
For HTTP-backed providers, ``HTTPDataRetry`` exposes the same retry primitive
218
+
``OpenAITTSProvider`` uses: exponential backoff with jitter and `Retry-After`-aware handling of
219
+
429 responses. Pass a ``RetryPolicy`` and receive `(Data, HTTPURLResponse)` on success or a
220
+
``TransportError`` on failure; cancellation propagates through `CancellationError`.
221
+
222
+
```swift
223
+
let (data, response) = try await HTTPDataRetry.perform(
224
+
    urlRequest: request,
225
+
    session: .shared,
226
+
    retryPolicy: .default
227
+
)
228
+
```
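
For readers unfamiliar with the technique, the following sketch shows one way to compute a retry delay using exponential backoff with full jitter and a numeric `Retry-After` header. It does not reflect the internals of ``HTTPDataRetry``: `SketchRetryPolicy`, `retryDelay`, the default values, and the choice to cap `Retry-After` at `maxDelay` are all assumptions for illustration, and the HTTP-date form of `Retry-After` is not handled.

```swift
import Foundation

/// Hypothetical policy shape for illustration; not AgentRunKit's `RetryPolicy`.
struct SketchRetryPolicy {
    var baseDelay: TimeInterval = 0.5
    var maxDelay: TimeInterval = 30
}

/// Delay before retry number `attempt` (0-based).
/// `jitter` is injectable so the randomness can be replaced in tests.
func retryDelay(
    attempt: Int,
    policy: SketchRetryPolicy,
    retryAfterHeader: String? = nil,
    jitter: (ClosedRange<Double>) -> Double = { Double.random(in: $0) }
) -> TimeInterval {
    // Prefer a server-provided delay when it parses as non-negative seconds.
    if let header = retryAfterHeader,
       let seconds = TimeInterval(header.trimmingCharacters(in: .whitespaces)),
       seconds >= 0 {
        return min(seconds, policy.maxDelay)
    }
    // Exponential growth, capped at maxDelay, then a uniform pick in 0...ceiling.
    let ceiling = min(policy.maxDelay, policy.baseDelay * pow(2, Double(attempt)))
    return jitter(0...ceiling)
}

let policy = SketchRetryPolicy()
for attempt in 0..<4 {
    let delay = retryDelay(attempt: attempt, policy: policy)
    print("attempt \(attempt): wait \(delay) s")
}
```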
229
+
188
230
## See Also
189
231
190
232
- <doc:AgentAndChat>
@@ -204,3 +246,4 @@ let tts = TTSClient(provider: provider)