Tom-Ryder
diff --git a/Sources/AgentRunKit/Documentation.docc/AgentRunKit.md b/Sources/AgentRunKit/Documentation.docc/AgentRunKit.md
Lines changed: 6 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Articles/MultimodalAndAudio.md b/Sources/AgentRunKit/Documentation.docc/Articles/MultimodalAndAudio.md
Lines changed: 27 additions & 5 deletions
diff --git a/Sources/AgentRunKit/TTS/OpenAITTSProvider.swift b/Sources/AgentRunKit/TTS/OpenAITTSProvider.swift
Lines changed: 19 additions & 4 deletions
diff --git a/Sources/AgentRunKit/TTS/SentenceChunker.swift b/Sources/AgentRunKit/TTS/SentenceChunker.swift
Lines changed: 1 addition & 1 deletion
diff --git a/Sources/AgentRunKit/TTS/TTSAudioEncoding.swift b/Sources/AgentRunKit/TTS/TTSAudioEncoding.swift
Lines changed: 43 additions & 0 deletions
diff --git a/Sources/AgentRunKit/TTS/TTSAudioFormat.swift b/Sources/AgentRunKit/TTS/TTSAudioFormat.swift
Lines changed: 27 additions & 0 deletions
diff --git a/Sources/AgentRunKit/TTS/TTSChunk.swift b/Sources/AgentRunKit/TTS/TTSChunk.swift
Lines changed: 17 additions & 0 deletions
diff --git a/Sources/AgentRunKit/TTS/TTSChunkContext.swift b/Sources/AgentRunKit/TTS/TTSChunkContext.swift
Lines changed: 12 additions & 0 deletions
diff --git a/Sources/AgentRunKit/TTS/TTSClient.swift b/Sources/AgentRunKit/TTS/TTSClient.swift
Lines changed: 45 additions & 16 deletions
@@ -164,8 +164,14 @@ For a complete walkthrough, see <doc:GettingStarted>.
 - ``TTSProvider``
 - ``TTSProviderConfig``
 - ``TTSAudioFormat``
+- ``TTSAudioEncoding``
 - ``OpenAITTSProvider``
 - ``TTSSegment``
+- ``TTSSegmentTiming``
+- ``TTSChunk``
+- ``TTSChunkContext``
+- ``TTSManifestEntry``
+- ``TTSConcatenationResult``
 - ``TTSOptions``
 
 ### MCP Integration
 
@@ -111,13 +111,14 @@ let tts = TTSClient(provider: provider, maxConcurrent: 4)
 
 ### Generating Audio
 
-Three methods cover different use cases:
+These methods cover different use cases:
 
 | Method | Returns | Behavior |
 |---|---|---|
 | `generate(text:voice:options:)` | `Data` | Single request, no chunking |
 | `stream(text:voice:options:)` | `AsyncThrowingStream<TTSSegment, Error>` | Chunked, yields ordered ``TTSSegment`` values as they complete |
 | `generateAll(text:voice:options:)` | `Data` | Chunked, concatenates all segments into one `Data` |
+| `chunks(for:)` | `[TTSChunk]` | The chunk plan this client will use, without invoking the provider |
 
 ```swift
 // Single generation
@@ -126,11 +127,15 @@ let audio = try await tts.generate(text: "Hello, world.", voice: "nova")
 // Streaming segments
 for try await segment in tts.stream(text: longArticle) {
     player.play(segment.audio)
-    print("chunk \(segment.index + 1)/\(segment.total) bytes \(segment.sourceRange): \(segment.text)")
+    let chunk = segment.chunk
+    print("chunk \(chunk.index + 1)/\(chunk.total) bytes \(chunk.sourceRange): \(chunk.text)")
 }
 
 // Full concatenated output
 let fullAudio = try await tts.generateAll(text: longArticle, options: TTSOptions(speed: 1.25))
+
+// Forecast the chunk plan without generating audio
+let plan = tts.chunks(for: longArticle)
 ```
 
 ### TTSOptions
@@ -144,19 +149,30 @@ let fullAudio = try await tts.generateAll(text: longArticle, options: TTSOptions
 
 The chunker splits input text on sentence boundaries using `NLTokenizer`. Sentences are packed into chunks up to the provider's `maxChunkCharacters` limit. Oversized sentences fall back to word-level, then character-level splitting. ``TTSClient`` dispatches up to `maxConcurrent` chunk requests in parallel using a task group. Results are buffered and yielded in original order.
 
-Each ``TTSSegment`` carries the chunker's output `text` and a `sourceRange` of UTF-8 byte offsets into the original input. For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the discontiguous span of the words it contains, preserving left-to-right monotonicity for caller-side highlighting and forced alignment.
+Each ``TTSSegment`` aggregates a ``TTSChunk`` (the unit of input text), a ``TTSAudioEncoding`` (the encoding ``TTSClient`` requested from the provider), a ``TTSSegmentTiming`` (audio-time metadata; both fields are `nil` until the framework computes them), and the audio bytes. The chunk, encoding, and timing are the canonical access path; flat computed properties on ``TTSSegment`` (`index`, `total`, `text`, `sourceRange`) forward to the chunk for log statements that need only those fields. For force-split chunks, `text` normalizes whitespace to single spaces while `sourceRange` covers the discontiguous span of the words it contains, preserving left-to-right monotonicity for caller-side highlighting and forced alignment.
+
+``TTSClient/chunks(for:)`` returns the same ``TTSChunk`` values the stream will emit, without calling the provider. Use it to forecast chunk identity before generation or to drive offline planning.
+
+``TTSConcatenationResult`` and ``TTSManifestEntry`` describe the shape of a manifest-aware concatenation that pairs the audio bytes with a per-segment manifest of chunk, encoding, and timing.
 
 For MP3 output, the concatenator strips ID3v2 headers, Xing/Info frames, and ID3v1 tails from interior segments for clean concatenation.
 
 ### Custom Providers
 
-Conform to ``TTSProvider`` to use any speech synthesis backend:
+Conform to ``TTSProvider`` to use any speech synthesis backend. ``TTSClient`` passes a ``TTSChunkContext`` carrying the chunk and the requested encoding with each call. Providers should treat `context.encoding` as the authoritative source for the format to produce, and can also use the context for logging or request correlation:
 
 ```swift
 struct MyTTSProvider: TTSProvider {
     let config: TTSProviderConfig
 
-    func generate(text: String, voice: String, options: TTSOptions) async throws -> Data {
+    func generate(
+        text: String,
+        voice: String,
+        options: TTSOptions,
+        context: TTSChunkContext
+    ) async throws -> Data {
+        let chunkID = "\(context.chunk.index + 1)/\(context.chunk.total)"
+        log("synthesizing \(chunkID) as \(context.encoding.mimeType)")
         // Call your speech API and return audio bytes
     }
 }
@@ -181,4 +197,10 @@ let tts = TTSClient(provider: provider)
 - ``TTSProvider``
 - ``OpenAITTSProvider``
 - ``TTSSegment``
+- ``TTSSegmentTiming``
+- ``TTSChunk``
+- ``TTSChunkContext``
+- ``TTSAudioEncoding``
+- ``TTSManifestEntry``
+- ``TTSConcatenationResult``
 - ``TTSOptions``
@@ -31,7 +31,12 @@ public struct OpenAITTSProvider: TTSProvider, Sendable {
         )
     }
 
-    public func generate(text: String, voice: String, options: TTSOptions) async throws -> Data {
+    public func generate(
+        text: String,
+        voice: String,
+        options: TTSOptions,
+        context: TTSChunkContext
+    ) async throws -> Data {
         if let speed = options.speed {
             guard (0.25 ... 4.0).contains(speed) else {
                 throw TTSError.invalidConfiguration(
@@ -40,7 +45,12 @@ public struct OpenAITTSProvider: TTSProvider, Sendable {
             }
         }
 
-        let urlRequest = try buildURLRequest(text: text, voice: voice, options: options)
+        let urlRequest = try buildURLRequest(
+            text: text,
+            voice: voice,
+            options: options,
+            encoding: context.encoding
+        )
 
         do {
             let (data, _) = try await HTTPRetry.performData(
@@ -56,7 +66,12 @@ public struct OpenAITTSProvider: TTSProvider, Sendable {
         }
     }
 
-    func buildURLRequest(text: String, voice: String, options: TTSOptions) throws -> URLRequest {
+    func buildURLRequest(
+        text: String,
+        voice: String,
+        options: TTSOptions,
+        encoding: TTSAudioEncoding
+    ) throws -> URLRequest {
         let url = baseURL.appendingPathComponent("audio/speech")
         var urlRequest = URLRequest(url: url)
         urlRequest.httpMethod = "POST"
@@ -67,7 +82,7 @@ public struct OpenAITTSProvider: TTSProvider, Sendable {
             model: model,
             input: text,
             voice: voice,
-            responseFormat: (options.responseFormat ?? config.defaultFormat).rawValue,
+            responseFormat: encoding.format.rawValue,
             speed: options.speed
         )
         urlRequest.httpBody = try JSONEncoder().encode(body)
 
@@ -147,7 +147,7 @@ enum SentenceChunker {
         return (lowerOffset + trimShift) ..< (upperOffset + trimShift)
     }
 
-    private static func trimByteOffset(in original: String) -> Int {
+    static func trimByteOffset(in original: String) -> Int {
         guard let firstNonWS = original.unicodeScalars.firstIndex(where: {
             !CharacterSet.whitespacesAndNewlines.contains($0)
         }) else { return 0 }
 
@@ -0,0 +1,43 @@
+import Foundation
+
+/// The audio encoding ``TTSClient`` requests from a provider for a segment.
+public struct TTSAudioEncoding: Sendable, Equatable, Hashable, Codable {
+    public let format: TTSAudioFormat
+    public let mimeType: String
+    public let fileExtension: String
+    public let sampleRate: Int?
+    public let channels: Int?
+    public let bitsPerSample: Int?
+
+    public init(
+        format: TTSAudioFormat,
+        mimeType: String,
+        fileExtension: String,
+        sampleRate: Int? = nil,
+        channels: Int? = nil,
+        bitsPerSample: Int? = nil
+    ) {
+        self.format = format
+        self.mimeType = mimeType
+        self.fileExtension = fileExtension
+        self.sampleRate = sampleRate
+        self.channels = channels
+        self.bitsPerSample = bitsPerSample
+    }
+
+    public init(
+        _ format: TTSAudioFormat,
+        sampleRate: Int? = nil,
+        channels: Int? = nil,
+        bitsPerSample: Int? = nil
+    ) {
+        self.init(
+            format: format,
+            mimeType: format.mimeType,
+            fileExtension: format.fileExtension,
+            sampleRate: sampleRate,
+            channels: channels,
+            bitsPerSample: bitsPerSample
+        )
+    }
+}
@@ -0,0 +1,27 @@
+import Foundation
+
+/// An audio container or codec the orchestrator can request from a ``TTSProvider``.
+public enum TTSAudioFormat: String, Sendable, Codable, CaseIterable {
+    case mp3, opus, aac, flac, wav, pcm
+
+    public var mimeType: String {
+        switch self {
+        case .mp3:
+            "audio/mpeg"
+        case .opus:
+            "audio/opus"
+        case .aac:
+            "audio/aac"
+        case .flac:
+            "audio/flac"
+        case .wav:
+            "audio/wav"
+        case .pcm:
+            "audio/L16"
+        }
+    }
+
+    public var fileExtension: String {
+        rawValue
+    }
+}
@@ -0,0 +1,17 @@
+import Foundation
+
+/// A unit of input text that ``TTSClient`` synthesizes as one provider call.
+public struct TTSChunk: Sendable, Equatable, Hashable, Codable {
+    public let index: Int
+    public let total: Int
+    public let text: String
+    /// UTF-8 byte offsets into the original input string passed to the ``TTSClient`` call.
+    public let sourceRange: Range<Int>
+
+    public init(index: Int, total: Int, text: String, sourceRange: Range<Int>) {
+        self.index = index
+        self.total = total
+        self.text = text
+        self.sourceRange = sourceRange
+    }
+}
@@ -0,0 +1,12 @@
+import Foundation
+
+/// The chunk and requested encoding ``TTSClient`` delivers to a provider for one synthesis call.
+public struct TTSChunkContext: Sendable, Equatable, Codable {
+    public let chunk: TTSChunk
+    public let encoding: TTSAudioEncoding
+
+    public init(chunk: TTSChunk, encoding: TTSAudioEncoding) {
+        self.chunk = chunk
+        self.encoding = encoding
+    }
+}
@@ -22,11 +22,30 @@ public struct TTSClient<P: TTSProvider>: Sendable {
         guard !trimmed.isEmpty else {
             throw TTSError.emptyText
         }
+        let encoding = TTSAudioEncoding(options.responseFormat ?? provider.config.defaultFormat)
+        let leadingShift = SentenceChunker.trimByteOffset(in: text)
+        let chunk = TTSChunk(
+            index: 0,
+            total: 1,
+            text: trimmed,
+            sourceRange: leadingShift ..< (leadingShift + trimmed.utf8.count)
+        )
+        let context = TTSChunkContext(chunk: chunk, encoding: encoding)
         return try await provider.generate(
             text: trimmed,
             voice: voice ?? provider.config.defaultVoice,
-            options: options
+            options: options,
+            context: context
+        )
+    }
+
+    /// The chunk plan this client will use for a given input, without invoking the provider.
+    public func chunks(for text: String) -> [TTSChunk] {
+        let internalChunks = SentenceChunker.chunk(
+            text: text,
+            maxCharacters: provider.config.maxChunkCharacters
         )
+        return Self.makePublicChunks(internalChunks)
     }
 
     public func stream(
@@ -35,25 +54,28 @@ public struct TTSClient<P: TTSProvider>: Sendable {
         options: TTSOptions = TTSOptions()
     ) -> AsyncThrowingStream<TTSSegment, Error> {
         let resolvedVoice = voice ?? provider.config.defaultVoice
-        let chunks = SentenceChunker.chunk(
+        let internalChunks = SentenceChunker.chunk(
             text: text,
             maxCharacters: provider.config.maxChunkCharacters
         )
 
-        guard !chunks.isEmpty else {
+        guard !internalChunks.isEmpty else {
             return AsyncThrowingStream { $0.finish(throwing: TTSError.emptyText) }
         }
 
+        let publicChunks = Self.makePublicChunks(internalChunks)
+        let encoding = TTSAudioEncoding(options.responseFormat ?? provider.config.defaultFormat)
         let provider = provider
         let maxConcurrent = maxConcurrent
 
         return AsyncThrowingStream { continuation in
             let task = Task {
                 do {
                     try await Self.executeChunks(
-                        chunks,
+                        publicChunks,
                         voice: resolvedVoice,
                         options: options,
+                        encoding: encoding,
                         provider: provider,
                         maxConcurrent: maxConcurrent,
                         continuation: continuation
@@ -89,10 +111,18 @@ public struct TTSClient<P: TTSProvider>: Sendable {
         return result
     }
 
+    private static func makePublicChunks(_ internalChunks: [SentenceChunker.Chunk]) -> [TTSChunk] {
+        let total = internalChunks.count
+        return internalChunks.enumerated().map { index, chunk in
+            TTSChunk(index: index, total: total, text: chunk.text, sourceRange: chunk.sourceRange)
+        }
+    }
+
     private static func executeChunks(
-        _ chunks: [SentenceChunker.Chunk],
+        _ chunks: [TTSChunk],
         voice: String,
         options: TTSOptions,
+        encoding: TTSAudioEncoding,
         provider: P,
         maxConcurrent: Int,
         continuation: AsyncThrowingStream<TTSSegment, Error>.Continuation
@@ -107,28 +137,29 @@ public struct TTSClient<P: TTSProvider>: Sendable {
 
             while nextToYield < totalChunks {
                 while activeTasks < maxConcurrent, nextToSend < totalChunks {
-                    let chunkIndex = nextToSend
-                    let chunk = chunks[chunkIndex]
+                    let chunk = chunks[nextToSend]
+                    let context = TTSChunkContext(chunk: chunk, encoding: encoding)
                     group.addTask {
                         do {
                             let data = try await provider.generate(
                                 text: chunk.text,
                                 voice: voice,
-                                options: options
+                                options: options,
+                                context: context
                             )
-                            return (chunkIndex, data)
+                            return (chunk.index, data)
                         } catch is CancellationError {
                             throw CancellationError()
                         } catch let error as TransportError {
                             throw TTSError.chunkFailed(
-                                index: chunkIndex,
+                                index: chunk.index,
                                 total: totalChunks,
                                 sourceRange: chunk.sourceRange,
                                 error
                             )
                         } catch {
                             throw TTSError.chunkFailed(
-                                index: chunkIndex,
+                                index: chunk.index,
                                 total: totalChunks,
                                 sourceRange: chunk.sourceRange,
                                 TransportError.other(String(describing: error))
@@ -144,12 +175,10 @@ public struct TTSClient<P: TTSProvider>: Sendable {
                 buffer[index] = data
 
                 while let audio = buffer.removeValue(forKey: nextToYield) {
-                    let chunk = chunks[nextToYield]
                     continuation.yield(TTSSegment(
-                        index: nextToYield,
-                        total: totalChunks,
-                        text: chunk.text,
-                        sourceRange: chunk.sourceRange,
+                        chunk: chunks[nextToYield],
+                        encoding: encoding,
+                        timing: .uncomputed,
                         audio: audio
                     ))
                     nextToYield += 1
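
The `executeChunks` hunks above keep a bounded number of provider calls in flight and buffer out-of-order completions until they can be yielded in chunk order. A minimal, self-contained sketch of that pattern follows. It is an illustration under stated assumptions, not the code from `TTSClient.swift`: `synthesize(_:)` and `runOrdered(count:maxConcurrent:)` are hypothetical names, and the string results stand in for audio `Data`.

```swift
import Foundation

// Hypothetical stand-in for a provider call; not part of AgentRunKit.
// Later indices sleep for less time, so completions tend to arrive out of order.
func synthesize(_ index: Int) async throws -> String {
    let millis = UInt64(max(1, 10 - index))
    try await Task.sleep(nanoseconds: millis * 1_000_000)
    return "audio-\(index)"
}

/// Runs `count` jobs with at most `maxConcurrent` in flight and returns the
/// results in index order, buffering any that complete early.
func runOrdered(count: Int, maxConcurrent: Int) async throws -> [String] {
    try await withThrowingTaskGroup(of: (Int, String).self) { group in
        var ordered: [String] = []
        var buffer: [Int: String] = [:]
        var nextToSend = 0
        var nextToYield = 0
        var active = 0

        while nextToYield < count {
            // Top up the group until the concurrency limit is reached.
            while active < maxConcurrent, nextToSend < count {
                let index = nextToSend
                group.addTask {
                    let value = try await synthesize(index)
                    return (index, value)
                }
                nextToSend += 1
                active += 1
            }

            // Wait for any in-flight job, then park its result under its index.
            guard let (index, value) = try await group.next() else { break }
            active -= 1
            buffer[index] = value

            // Drain every buffered result that is contiguous with what was emitted.
            while let ready = buffer.removeValue(forKey: nextToYield) {
                ordered.append(ready)
                nextToYield += 1
            }
        }
        return ordered
    }
}

let results = try await runOrdered(count: 5, maxConcurrent: 2)
print(results)
```

Keying the buffer by index means the emit step does not depend on completion order, which is why the real loop can yield `chunks[nextToYield]` directly once its audio arrives.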
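
The single-shot `generate` path in the `TTSClient.swift` diff builds a `sourceRange` from the leading-whitespace byte shift plus `trimmed.utf8.count`. The sketch below shows how such UTF-8 byte offsets map back onto the original string, for example for caller-side highlighting. The helper names `leadingWhitespaceByteCount(_:)` and `slice(_:utf8Range:)` are hypothetical and not part of AgentRunKit; the first only approximates what `SentenceChunker.trimByteOffset(in:)` appears to compute, since its full body is not shown in the diff.

```swift
import Foundation

/// Hypothetical helper: number of UTF-8 bytes occupied by leading whitespace.
func leadingWhitespaceByteCount(_ text: String) -> Int {
    var count = 0
    for scalar in text.unicodeScalars {
        guard CharacterSet.whitespacesAndNewlines.contains(scalar) else { break }
        count += String(scalar).utf8.count
    }
    return count
}

/// Hypothetical helper: the substring of `original` covered by a range of
/// UTF-8 byte offsets, or nil when the range falls outside the string.
func slice(_ original: String, utf8Range: Range<Int>) -> String? {
    let utf8 = original.utf8
    guard utf8Range.lowerBound >= 0, utf8Range.upperBound <= utf8.count else {
        return nil
    }
    let start = utf8.index(utf8.startIndex, offsetBy: utf8Range.lowerBound)
    let end = utf8.index(utf8.startIndex, offsetBy: utf8Range.upperBound)
    return String(decoding: utf8[start ..< end], as: UTF8.self)
}

// "\u{E9}" (e with acute accent) takes two bytes in UTF-8, so byte offsets
// and character offsets diverge after it.
let input = "  caf\u{E9} ok\n"
let trimmed = input.trimmingCharacters(in: .whitespacesAndNewlines)
let shift = leadingWhitespaceByteCount(input)
let range = shift ..< (shift + trimmed.utf8.count)

print(range, slice(input, utf8Range: range) ?? "out of bounds")
```

For a single untrimmed-to-trimmed chunk like this one, slicing the input by `sourceRange` reproduces the chunk text. Per the documentation in this PR, that does not hold for force-split chunks, whose `text` normalizes whitespace while `sourceRange` spans the original bytes.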