Commit b69d577

kzu and Copilot committed
Add Text to Speech support via ITextToSpeechClient
Implements ITextToSpeechClient for the xAI/Grok TTS API with support for both unary and streaming synthesis:

- GrokTextToSpeechClient: unary via POST /v1/tts and streaming via WebSocket wss://.../v1/tts
- GrokTextToSpeechOptions: extends TextToSpeechOptions with Grok-specific parameters (SampleRate, BitRate, OptimizeStreamingLatency, TextNormalization)
- AsITextToSpeechClient extension on GrokClient for easy setup
- Full unit test coverage for request mapping, codec/media-type resolution, error handling, and streaming event processing
- readme: documents unary synthesis, streaming with progressive file writes, and Grok-specific options including available voices and audio formats

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7ac1465 commit b69d577

8 files changed

Lines changed: 880 additions & 32 deletions

AGENTS.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# xAI SDK implementation notes
+
+- `GrokClient` is primarily backed by generated gRPC protocol clients, but text to speech uses xAI's documented REST/WebSocket voice endpoints because there are no generated TTS protocol types in `src\xAI.Protocol`.
+- `AsITextToSpeechClient` returns an `ITextToSpeechClient` implementation that uses `POST /v1/tts` for unary audio and `wss://.../v1/tts` for streaming audio.
+- TTS defaults follow xAI docs: voice `eve`, language `en` when omitted by `TextToSpeechOptions`, and MP3 output when no codec is specified.
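
The defaults in the last AGENTS.md note (voice `eve`, language `en`, MP3 output) amount to a simple fallback rule. The following is a minimal sketch of that rule only, and it assumes nothing about how `GrokTextToSpeechClient` implements it. `TtsDefaults` and `TtsRequest` are hypothetical names that do not appear in this commit, and the `"mp3"` codec string is taken from the readme's format list.

```csharp
#nullable enable
using System;

// Hypothetical request shape for illustration; not an SDK type.
public sealed record TtsRequest(string Text, string Voice, string Language, string Codec);

// Hypothetical helper that applies the documented defaults when a value is omitted.
public static class TtsDefaults
{
    public const string Voice = "eve";
    public const string Language = "en";
    public const string Codec = "mp3";

    public static TtsRequest Resolve(
        string text,
        string? voiceId = null,
        string? language = null,
        string? audioFormat = null) =>
        new(text,
            string.IsNullOrWhiteSpace(voiceId) ? Voice : voiceId,
            string.IsNullOrWhiteSpace(language) ? Language : language,
            string.IsNullOrWhiteSpace(audioFormat) ? Codec : audioFormat);
}
```

Under this sketch, `TtsDefaults.Resolve("Hello")` yields voice `eve`, language `en`, and codec `mp3`, while any value the caller supplies is passed through unchanged.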

readme.md

Lines changed: 72 additions & 0 deletions
@@ -45,6 +45,12 @@ var chat = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
 
 var images = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
     .AsIImageGenerator("grok-imagine-image");
+
+var speech = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
+    .AsITextToSpeechClient();
+
+var audio = await speech.GetAudioAsync("Hello! Welcome to xAI text to speech.",
+    new TextToSpeechOptions { VoiceId = "eve", Language = "en" });
 ```
 
 ## File Attachments
@@ -393,6 +399,72 @@ var editedImage = (UriContent)result.Contents.First();
 Console.WriteLine($"Edited image URL: {editedImage.Uri}");
 ```
 
+## Text to Speech
+
+Grok supports text to speech via the `ITextToSpeechClient` abstraction from Microsoft.Extensions.AI.
+Use `AsITextToSpeechClient` to get a TTS client:
+
+```csharp
+var speech = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
+    .AsITextToSpeechClient();
+```
+
+### Unary (single response)
+
+Call `GetAudioAsync` to synthesize speech in a single request. The result contains a `DataContent`
+with the audio bytes and media type:
+
+```csharp
+var response = await speech.GetAudioAsync("Hello! Welcome to xAI text to speech.",
+    new TextToSpeechOptions { VoiceId = "eve", Language = "en" });
+
+var audio = (DataContent)response.Contents.First();
+// audio.MediaType == "audio/mpeg" (MP3 by default)
+await File.WriteAllBytesAsync("output.mp3", audio.Data.ToArray());
+```
+
+Available voices include `ara`, `eve`, `leo`, `rex`, and `sal`. Defaults to `eve` and English when
+`VoiceId`/`Language` are not specified.
+
+### Streaming
+
+Call `GetStreamingAudioAsync` to receive audio chunks as they are generated, enabling low-latency
+playback or progressive file writes:
+
+```csharp
+await using var fileStream = File.Create("output.mp3");
+
+await foreach (var update in speech.GetStreamingAudioAsync("Hello from streaming TTS!",
+    new TextToSpeechOptions { VoiceId = "eve", AudioFormat = "mp3" }))
+{
+    if (update.Kind == TextToSpeechResponseUpdateKind.AudioUpdating)
+    {
+        foreach (var content in update.Contents.OfType<DataContent>())
+            await fileStream.WriteAsync(content.Data);
+    }
+}
+```
+
+### Grok-Specific Options
+
+Use `GrokTextToSpeechOptions` to control audio quality and streaming behavior beyond the base
+`TextToSpeechOptions`:
+
+```csharp
+var options = new GrokTextToSpeechOptions
+{
+    VoiceId = "rex",
+    Language = "en",
+    AudioFormat = "mp3",           // mp3 | wav | pcm | mulaw | alaw
+    SampleRate = 24000,            // Hz
+    BitRate = 128000,              // bits per second (MP3 only)
+    OptimizeStreamingLatency = 1,  // 0–4; higher trades quality for lower latency
+    TextNormalization = true,      // expand abbreviations and numbers before synthesis
+};
+
+var response = await speech.GetAudioAsync("Streaming at 24 kHz, 128 kbps.", options);
+```
+
 <!-- #xai -->
 
 # xAI.Protocol
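
The streaming example in the readme hunk above writes each audio chunk to a file as soon as it arrives. The same pattern can be tried without network access by swapping the TTS call for any `IAsyncEnumerable` of byte chunks. This is a sketch under that assumption: `ProgressiveWriter` and `FakeChunks` are hypothetical names, and the fake source stands in for the `DataContent` payloads a real streaming call would yield.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ProgressiveWriter
{
    // Writes each chunk to the destination as it arrives and returns the total byte count.
    public static async Task<long> CopyChunksAsync(
        IAsyncEnumerable<ReadOnlyMemory<byte>> chunks, Stream destination)
    {
        long total = 0;
        await foreach (var chunk in chunks)
        {
            await destination.WriteAsync(chunk);
            total += chunk.Length;
        }
        await destination.FlushAsync();
        return total;
    }

    // Hypothetical stand-in for streamed audio: two small chunks with an async hop between them.
    public static async IAsyncEnumerable<ReadOnlyMemory<byte>> FakeChunks()
    {
        yield return new byte[] { 1, 2, 3 };
        await Task.Yield();
        yield return new byte[] { 4, 5 };
    }
}
```

Because the writer never buffers the whole response, memory use stays bounded by the chunk size, which is the property that makes progressive writes attractive for long utterances.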

src/xAI.Tests/ChatClientTests.cs

Lines changed: 5 additions & 3 deletions
@@ -21,7 +21,9 @@ public async Task OpenAIInvokesTools()
             { "user", "What day is today?" },
         };
 
-        var chat = new OpenAIClient(Configuration["OPENAI_API_KEY"]!).GetChatClient("gpt-5.4").AsIChatClient()
+        var chat = new OpenAIClient(Configuration["OPENAI_API_KEY"]!)
+            .GetChatClient("gpt-5.4")
+            .AsIChatClient()
             .AsBuilder()
             .UseFunctionInvocation(configure: client => client.MaximumIterationsPerRequest = 3)
             .UseLogging(output.AsLoggerFactory())
@@ -96,10 +98,10 @@ public async Task GrokInvokesTools()
     [SecretsFact("XAI_API_KEY")]
     public async Task GrokReasoningModelOutputsBothContentAndEncryptedReasoning()
     {
-        var grok = new GrokClient(Configuration["XAI_API_KEY"]!).AsIChatClient("grok-4-1-fast");
+        var grok = new GrokClient(Configuration["XAI_API_KEY"]!).AsIChatClient("grok-4-1-fast-reasoning");
 
         var response = await grok.GetResponseAsync(
-            "What is 3 + 4? Respond with just the number.",
+            "What is 3 + 4? Respond with just the number, think about it really well.",
             new GrokChatOptions
             {
                 UseEncryptedContent = true
