The chunker splits input text on sentence boundaries using `NLTokenizer`. Sentences are packed into chunks up to the provider's `maxChunkCharacters` limit. Oversized sentences fall back to word-level, then character-level splitting. ``TTSClient`` dispatches up to `maxConcurrent` chunk requests in parallel using a task group. Results are buffered and yielded in original order.
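The packing and fallback behaviour described above can be sketched in plain Swift. This is a minimal illustration under stated assumptions, not the library's chunker: `packSentences(_:limit:)` is a hypothetical helper, it takes sentences that have already been split (the `NLTokenizer` step is omitted), it measures length in `Character` counts, and it does not track `sourceRange`.

```swift
/// Hypothetical helper: greedily packs pre-split sentences into chunks of at
/// most `limit` characters, joining pieces with a single space.
func packSentences(_ sentences: [String], limit: Int) -> [String] {
    precondition(limit > 0, "limit must be positive")
    var chunks: [String] = []
    var current = ""

    // Adds a piece to the current chunk, or starts a new chunk on overflow.
    func append(_ piece: String) {
        if current.isEmpty {
            current = piece
        } else if current.count + 1 + piece.count <= limit {
            current += " " + piece
        } else {
            chunks.append(current)
            current = piece
        }
    }

    for sentence in sentences {
        if sentence.count <= limit {
            append(sentence)
            continue
        }
        // Oversized sentence: fall back to words, then to character slices.
        for word in sentence.split(separator: " ") {
            if word.count <= limit {
                append(String(word))
            } else {
                var rest = word
                while !rest.isEmpty {
                    append(String(rest.prefix(limit)))
                    rest = rest.dropFirst(limit)
                }
            }
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```

With a limit of 9, the single oversized sentence `"one two three four"` is force-split on words into `["one two", "three", "four"]`. Whitespace between words collapses to single spaces, which mirrors the normalization described for force-split chunks.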

-Each ``TTSSegment`` carries the chunker's output `text` and a `sourceRange` of UTF-8 byte offsets into the original input. For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the discontiguous span of the words it contains, preserving left-to-right monotonicity for caller-side highlighting and forced alignment.
+Each ``TTSSegment`` aggregates a ``TTSChunk`` (the unit of input text), a ``TTSAudioEncoding`` (the encoding ``TTSClient`` requested from the provider), a ``TTSSegmentTiming`` (audio-time metadata; both fields are `nil` until the framework computes them), and the audio bytes. The chunk, encoding, and timing are the canonical access path; flat computed properties on ``TTSSegment`` (`index`, `total`, `text`, `sourceRange`) forward to the chunk for log statements that need only those fields. For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the discontiguous span of the words it contains, preserving left-to-right monotonicity for caller-side highlighting and forced alignment.
+
+``TTSClient/chunks(for:)`` returns the same ``TTSChunk`` values the stream will emit, without calling the provider. Use it to forecast chunk identity before generation or to drive offline planning.
+
+``TTSConcatenationResult`` and ``TTSManifestEntry`` describe the shape of a manifest-aware concatenation that pairs the audio bytes with a per-segment manifest of chunk, encoding, and timing.

For MP3 output, the concatenator strips ID3v2 headers, Xing/Info frames, and ID3v1 tails from interior segments for clean concatenation.
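As a rough illustration of the tag stripping mentioned above, the sketch below removes a leading ID3v2 tag and a trailing ID3v1 tag from one segment's bytes. The function names `id3v2Length(_:)` and `strippingID3Tags(_:)` are hypothetical and are not the library's API. The sketch does not detect Xing/Info frames, and it leaves the choice of which segments to strip to the caller.

```swift
/// Hypothetical helper: length of a leading ID3v2 tag, or 0 if none is present.
func id3v2Length(_ bytes: [UInt8]) -> Int {
    guard bytes.count >= 10,
          bytes[0] == 0x49, bytes[1] == 0x44, bytes[2] == 0x33  // "ID3"
    else { return 0 }
    // The tag size is a synchsafe integer: four bytes, 7 significant bits each.
    let sizeBytes = bytes[6...9]
    guard sizeBytes.allSatisfy({ ($0 & 0x80) == 0 }) else { return 0 }
    let size = sizeBytes.reduce(0) { ($0 << 7) | Int($1) }
    // Flag bit 0x10 signals an optional 10-byte footer after the tag body.
    let footer = (bytes[5] & 0x10) != 0 ? 10 : 0
    return min(10 + size + footer, bytes.count)
}

/// Hypothetical helper: drops a leading ID3v2 tag and a trailing ID3v1 tag.
func strippingID3Tags(_ segment: [UInt8]) -> [UInt8] {
    var body = segment.dropFirst(id3v2Length(segment))
    // An ID3v1 tag is the final 128 bytes, beginning with "TAG".
    if body.count >= 128, body.suffix(128).starts(with: [0x54, 0x41, 0x47]) {
        body = body.dropLast(128)
    }
    return Array(body)
}
```

A caller would apply `strippingID3Tags(_:)` to each segment it wants cleaned before appending the bytes. Which segments count as interior is a policy decision this sketch does not make.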

### Custom Providers

-Conform to ``TTSProvider`` to use any speech synthesis backend:
+Conform to ``TTSProvider`` to use any speech synthesis backend. ``TTSClient`` delivers a ``TTSChunkContext`` carrying the chunk plan and requested encoding alongside each call. Providers should treat `context.encoding` as the authoritative source for the format to produce, and can additionally use it for logging or request correlation:

```swift
 struct MyTTSProvider: TTSProvider {
     let config: TTSProviderConfig

-    func generate(text: String, voice: String, options: TTSOptions) async throws -> Data {
+    func generate(
+        text: String,
+        voice: String,
+        options: TTSOptions,
+        context: TTSChunkContext
+    ) async throws -> Data {
+        let chunkID = "\(context.chunk.index + 1)/\(context.chunk.total)"
+        log("synthesizing \(chunkID) as \(context.encoding.mimeType)")
         // Call your speech API and return audio bytes
     }
 }
```
@@ -181,4 +197,10 @@ let tts = TTSClient(provider: provider)