
Commit 2e4c942

tombeckenham, claude, autofix-ci[bot], and AlemTuzlak authored
feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
* feat(ai-grok): add audio and speech adapters for xAI

  Add `grokSpeech` (TTS via /v1/tts), `grokTranscription` (STT via /v1/stt), and `grokRealtime` + `grokRealtimeToken` (Voice Agent via /v1/realtime), because xAI's standalone audio APIs shipped publicly while the adapter previously exposed only text/image/summarize. The TTS/STT endpoints are not OpenAI-compatible, so these adapters use direct fetch rather than the OpenAI SDK; the realtime API mirrors OpenAI's shape with URL/provider swaps. E2E coverage is wired via mock.mount('/v1/tts'...) on aimock.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Merge from upstream

* feat(ai-grok): wire shared debug logger into audio and realtime adapters

  Adopt the @tanstack/ai/adapter-internals logger across grokSpeech, grokTranscription, grokRealtimeToken, and grokRealtime so users can toggle debug output the same way they do on other adapters: `debug: true` for full tracing, `debug: false` to silence, or a DebugConfig for per-category control and a custom Logger. Replaces the remaining console.error / console.warn calls in the realtime adapter with logger.error so nothing is lost when debugging is off.

* ci: apply automated fixes

* fix(ai-grok): correct super() arg order in audio adapters

  The transcription and TTS adapters were calling super(config, model), but BaseTranscriptionAdapter/BaseTTSAdapter expect (model, config), causing TS2345 build errors.

* fix(ai-grok): pass logger to audio adapter tests

  After the logger was wired into the audio adapters, the unit tests need to provide one when calling transcribe/generateSpeech directly (activities normally inject it via resolveDebugOption).

* test(ai-grok): route audio adapter tests through core functions

  Per project convention, tests should not invoke adapter methods directly; they call generateSpeech()/generateTranscription() with the adapter instance, so the core function injects the logger, emits events, and exercises the real public surface.

* ci: apply automated fixes

* fix(ai-grok): address cr-loop round 1 findings

  ai-grok realtime adapter:
  - clean up pc/localStream/audioContext/dataChannel on connect() failure
  - dataChannelReady rejects on error/close/ICE-failed/timeout
  - RTCErrorEvent extracted properly instead of [object Event]
  - onmessage parse errors emit to consumers
  - input_audio_transcription no longer overrides the caller on every update
  - response.done preserves idle mode after stopAudioCapture
  - setupOutputAudioAnalysis disposes the prior audioElement, surfaces autoplay blocks
  - audioContext.resume failures emit an error instead of being silently swallowed
  - currentMessageId reset on response.created (tool-only turns)
  - pc.onconnectionstatechange / oniceconnectionstatechange emit status_change
  - sendImage uses object image_url for OpenAI-realtime compatibility
  - unknown server events logged via the default branch

  ai-grok TTS/STT:
  - getContentType returns audio/L16 for pcm (valid IANA MIME)
  - toAudioFile requires explicit audio_format for bare base64
  - transcription option renamed format -> inverse_text_normalization

  ai-grok realtime token:
  - expires_at unit-safety guard (seconds vs ms)

  ai-grok types:
  - single source of truth for GrokRealtimeModel (model-meta)

  ai-grok tests:
  - cover aac/flac in the pickCodec test
  - normalize header assertions via Headers()
  - add realtime-token unit-safety tests

  examples/ts-react-chat:
  - resolveModel fails loud via InvalidModelOverrideError (no silent fallback)
  - audio/speech/transcribe routes return 400 with a structured body

  testing/e2e:
  - media-providers uses the valid grok-2-image-1212 model
  - test-matrix imports from feature-support (dedupe)

* fix(ai-grok): address cr-loop round 2 confirmation findings

  ai-grok realtime adapter:
  - shared teardownConnection() helper runs on SDP and post-SDP failure paths, disposing input/output analysers, audio element, mic, data channel, pc, and audio context
  - pre-open dataChannelReady rejection on failed/closed/disconnected pc states
  - pc.onconnectionstatechange is the sole source of status_change (ice handler only rejects)
  - sendImage detects the data: prefix (no more double-wrap)

  ai-grok audio utils:
  - malformed data: URI MIME parse throws instead of silently defaulting to audio/mpeg
  - empty/missing base64 payload throws
  - explicit audioFormat argument wins over URI-embedded MIME

  ai-grok TTS:
  - audio/L16 content type includes the required rate= parameter from modelOptions.sample_rate

  ai-grok tests:
  - realtime-token afterEach restores the original XAI_API_KEY
  - new coverage for malformed data URIs, audioFormat precedence, and rate= in audio/L16

  examples/ts-react-chat:
  - new UnknownProviderError typed class, 400 mapping in audio/speech/transcribe routes
  - server-fns ServerFnError wraps typed adapter errors with stable code/details

* fix(ai-grok): address cr-loop round 3 confirmation findings

  examples/ts-react-chat:
  - generateSpeechFn/transcribeFn/generateSpeechStreamFn/transcribeStreamFn now wrap adapter construction with rethrowAudioAdapterError for consistent typed-error responses
  - realtime image display guards against data:/http(s): double-wrap

  ai-grok realtime adapter:
  - teardownConnection drains pendingEvents; sendEvent logs and skips after teardown

  ai-grok TTS:
  - sample_rate always forwarded in output_format so the body and contentType rate agree

* fix(ai-grok): address cr-loop round 4 confirmation findings

  ai-grok realtime adapter:
  - teardownConnection on getUserMedia failure (mic/pc/dataChannel leaked on mic denial)
  - response.function_call_arguments.done drops the event if call_id is absent (no item_id fallback)
  - isTornDown set at the top of teardown to guard handlers firing during close() awaits
  - setupInputAudioAnalysis/setupOutputAudioAnalysis skip when torn down
  - onconnectionstatechange no longer double-emits status_change during disconnect()

  ai-grok audio utils:
  - toAudioFile Blob/File branch prefers explicit audioFormat over Blob.type

  ai-grok TTS:
  - sample_rate forwarded only when the caller provides one or the codec is pcm (don't override server defaults for container codecs)

  Tests updated to cover the new audioFormat precedence paths and adjusted sample_rate assertions.

* fix(ai-grok): address cr-loop round 5 confirmation findings

  ai-grok realtime adapter:
  - pc.connectionState=failed triggers automatic teardownConnection (mic/pc/audioContext no longer leak on spontaneous failure)
  - flushPendingEvents wraps send in try/catch; emits an error on failure instead of hanging the caller
  - handleServerEvent case 'error' validates the shape of event.error; preserves code/type/param; safe against null/missing fields
  - autoplay and audioContext.resume failures log without emitting fatal error events (routine browser-policy outcomes)
  - dataChannel.onerror/onclose gated behind isTornDown to suppress post-disconnect error events

  examples/ts-react-chat:
  - realtime.tsx handleImageUpload validates the FileReader result, file.type, and base64 extraction; surfaces errors visibly

* fix(ai-grok): extensionFor maps mulaw/alaw MIME types to sensible filenames

  utils/audio.ts produced 'audio.basic' and 'audio.x-alaw-basic' for mulaw/alaw via the default-branch MIME split. Servers using the filename as a format hint now see 'audio.mulaw' / 'audio.alaw', matching the reverse toMimeType mapping.

* ci: apply automated fixes

* refactor(ai-grok): extract form/body builders, adopt ModelMeta convention, fix xAI realtime event names

  Refactors from user review:

  adapters:
  - tts.ts: extract the buildTTSRequestBody helper (codec/sample_rate/voice default resolution + body assembly). Export getContentType for consumer use.
  - transcription.ts: extract the buildTranscriptionFormData helper (wire-field mapping, including xAI's 'format' boolean toggle for inverse text normalization).

  model-meta.ts: audio and realtime models now use the same `as const satisfies ModelMeta` convention as chat/image models (GROK_TTS, GROK_STT, GROK_VOICE_FAST_1, GROK_VOICE_THINK_FAST_1) with input/output modalities and tool_calling / reasoning capabilities.

  realtime adapter:
  - Replace drive-by 'as' casts on untyped server events with runtime-checked readers (readString, readObject, readObjectArray); malformed frames return undefined instead of throwing a TypeError.
  - Accept both legacy OpenAI-realtime event names and current xAI voice-agent names per docs.x.ai: response.output_audio.* / response.output_audio_transcript.* / response.text.* (plus the existing response.audio.* / response.audio_transcript.* / response.output_text.* aliases for compatibility).
  - A RealtimeServerError type replaces repeated 'as Error & { code?: string }' casts.

  realtime token:
  - Wrap the request body as { session: { model } } per the xAI /v1/realtime/client_secrets schema (was a bare { model } before).

* test(ai-grok): cover realtime token body { session: { model } } shape

* ci: apply automated fixes

* refactor(ai-grok): drop @tanstack/ai-client peer dep by inlining the realtime contract

  The RealtimeAdapter / RealtimeConnection interfaces are duplicated locally in src/realtime/realtime-contract.ts. The adapter imports them from there instead of @tanstack/ai-client, so consumers of @tanstack/ai-grok no longer have to install @tanstack/ai-client unless they also want to construct a RealtimeClient from it (structural typing covers that use case). @tanstack/ai-client stays as a devDependency to run a type-level drift check (tests/realtime-contract.drift.test-d.ts) that asserts our inlined contract is bidirectionally assignable to the canonical one. If ai-client ever changes the interface, that file will fail to compile and we update both in lockstep. publint --strict: clean.

* ci: apply automated fixes

* fix(ai-grok): address CodeRabbit PR review

  - tts.ts / transcription.ts: spread `defaultHeaders` BEFORE Authorization / Content-Type so a caller-supplied header can't silently clobber the bearer token or the content type.
  - utils/audio.ts: new `arrayBufferToBase64` helper with a Buffer fast path on Node and a chunked btoa fallback everywhere else (browser, Workers, Bun). Replaces the Node-only `Buffer.from(arrayBuffer).toString('base64')` in tts.ts.
  - transcription.ts: new `GrokTranscriptionWord` interface extends the core `TranscriptionWord` with optional `confidence` and `speaker`. The adapter now preserves both fields when xAI returns them, so callers that narrow via `as Array<GrokTranscriptionWord>` get the diarization output they asked for. Test expectations updated.
  - tts.ts: mulaw/alaw `contentType` now includes a `;rate=…` parameter (as `audio/PCMU` / `audio/PCMA` per RFC 3551) when the caller requests a non-default sample rate, instead of the 8 kHz-implying `audio/basic` / `audio/x-alaw-basic`.
  - realtime/adapter.ts: `conversation.item.truncated` flips mode back to `listening` so the visualiser can't get stuck on `speaking` after an interrupt. `sendEvent` wraps `dataChannel.send` in try/catch, consistent with `flushPendingEvents`. The shared `emptyFrequencyData` / `emptyTimeDomainData` buffers are gone; `getAudioVisualization` returns a fresh `Uint8Array` per call so consumers can't mutate a module-level instance.
  - realtime/token.ts: adds a 15s `AbortController` timeout on the client_secrets request so a dead endpoint can't hang the caller forever. Validates the `client_secret.value` / `expires_at` shape at runtime before dereferencing, so a malformed response throws a descriptive error.
  - realtime/realtime-contract.ts: JSDoc filename ref updated.
  - examples/ts-react-chat audio/speech/transcribe routes: unify the 400 unknown_provider payload under the `provider` key (was `providerId`) to match the invalid_model_override branch and the request body.

* ci: apply automated fixes

---

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
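The round 1 "expires_at unit-safety guard (seconds vs ms)" item can be sketched as follows. This is an illustrative standalone function, not the adapter's actual code; the name `normalizeExpiresAtMs` and the threshold heuristic are assumptions.

```typescript
// Hypothetical sketch of a seconds-vs-milliseconds guard for token expiry.
// Heuristic: epoch values below 1e12 cannot be millisecond timestamps for
// any date after ~2001, so such values are treated as seconds and scaled.
function normalizeExpiresAtMs(expiresAt: number): number {
  const MS_THRESHOLD = 1e12
  return expiresAt < MS_THRESHOLD ? expiresAt * 1000 : expiresAt
}
```

A guard like this keeps a server that reports `expires_at` in seconds from producing an ephemeral token that appears already expired when compared against `Date.now()`.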
1 parent af9eb7b commit 2e4c942

31 files changed

Lines changed: 3650 additions & 180 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+'@tanstack/ai-grok': minor
+---
+
+feat(ai-grok): add audio and speech adapters for xAI
+
+Add three new tree-shakeable adapters that wrap xAI's audio APIs:
+
+- `grokSpeech` / `createGrokSpeech` — text-to-speech via `POST /v1/tts`. Supports the 5 xAI voices (`eve`, `ara`, `rex`, `sal`, `leo`), MP3/WAV/PCM/μ-law/A-law codecs, and the `language`, `sample_rate`, `bit_rate`, `optimize_streaming_latency`, `text_normalization` provider options.
+- `grokTranscription` / `createGrokTranscription` — speech-to-text via `POST /v1/stt`. Passes through `language`, `diarize`, `multichannel`, `channels`, `audio_format`, and `sample_rate`; maps xAI's word-level timestamps to `TranscriptionResult.words`.
+- `grokRealtime` / `grokRealtimeToken` — Voice Agent (realtime) adapter for `wss://api.x.ai/v1/realtime` with ephemeral tokens via `/v1/realtime/client_secrets`. Supports the `grok-voice-fast-1.0` and `grok-voice-think-fast-1.0` models.
+
+New model identifier exports: `GROK_TTS_MODELS`, `GROK_TRANSCRIPTION_MODELS`, `GROK_REALTIME_MODELS` and their corresponding types.

examples/ts-react-chat/src/lib/audio-providers.ts

Lines changed: 21 additions & 2 deletions
@@ -6,7 +6,7 @@
  * and audio generation flows.
  */
 
-export type SpeechProviderId = 'openai' | 'gemini' | 'fal'
+export type SpeechProviderId = 'openai' | 'gemini' | 'fal' | 'grok'
 
 export interface SpeechProviderConfig {
   id: SpeechProviderId
@@ -55,9 +55,22 @@ export const SPEECH_PROVIDERS: ReadonlyArray<SpeechProviderConfig> = [
     ],
     placeholder: 'Enter text to synthesize with Fal Kokoro…',
   },
+  {
+    id: 'grok',
+    label: 'Grok TTS',
+    model: 'grok-tts',
+    voices: [
+      { id: 'eve', label: 'Eve' },
+      { id: 'ara', label: 'Ara' },
+      { id: 'rex', label: 'Rex' },
+      { id: 'sal', label: 'Sal' },
+      { id: 'leo', label: 'Leo' },
+    ],
+    placeholder: 'Enter text for Grok speech…',
+  },
 ]
 
-export type TranscriptionProviderId = 'openai' | 'fal'
+export type TranscriptionProviderId = 'openai' | 'fal' | 'grok'
 
 export interface TranscriptionProviderConfig {
   id: TranscriptionProviderId
@@ -80,6 +93,12 @@ export const TRANSCRIPTION_PROVIDERS: ReadonlyArray<TranscriptionProviderConfig>
     model: 'fal-ai/whisper',
     description: 'Fal-hosted Whisper with word-level timestamps.',
   },
+  {
+    id: 'grok',
+    label: 'Grok STT',
+    model: 'grok-stt',
+    description: 'xAI speech-to-text with word-level timestamps.',
+  },
 ]
 
 export type AudioProviderId = 'gemini-lyria' | 'fal-audio' | 'fal-sfx'
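Widening `SpeechProviderId` with `'grok'` is what forces every exhaustive `switch` over the union (such as `buildSpeechAdapter` in the next file) to grow a matching case at compile time. A minimal sketch of that exhaustiveness pattern, with an illustrative label mapping rather than the example app's real strings:

```typescript
type SpeechProviderId = 'openai' | 'gemini' | 'fal' | 'grok'

// Because the union is closed and every case returns, adding a new id to
// SpeechProviderId without a matching case makes the function's inferred
// return type include undefined, which fails the declared `: string`.
function speechProviderLabel(id: SpeechProviderId): string {
  switch (id) {
    case 'openai':
      return 'OpenAI TTS'
    case 'gemini':
      return 'Gemini TTS'
    case 'fal':
      return 'Fal Kokoro'
    case 'grok':
      return 'Grok TTS'
  }
}
```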

examples/ts-react-chat/src/lib/server-audio-adapters.ts

Lines changed: 67 additions & 7 deletions
@@ -8,6 +8,7 @@
 import { openaiSpeech, openaiTranscription } from '@tanstack/ai-openai'
 import { geminiAudio, geminiSpeech } from '@tanstack/ai-gemini'
 import { falAudio, falSpeech, falTranscription } from '@tanstack/ai-fal'
+import { grokSpeech, grokTranscription } from '@tanstack/ai-grok'
 import type {
   AnyAudioAdapter,
   AnyTranscriptionAdapter,
@@ -27,7 +28,12 @@ function findConfig<T extends { id: string }>(
   id: string,
 ): T {
   const match = list.find((entry) => entry.id === id)
-  if (!match) throw new Error(`Unknown provider: ${id}`)
+  if (!match) {
+    throw new UnknownProviderError(
+      id,
+      list.map((entry) => entry.id),
+    )
+  }
   return match
 }
 
@@ -40,6 +46,8 @@ export function buildSpeechAdapter(provider: SpeechProviderId): AnyTTSAdapter {
       return geminiSpeech(config.model as 'gemini-2.5-flash-preview-tts')
     case 'fal':
       return falSpeech(config.model)
+    case 'grok':
+      return grokSpeech(config.model as 'grok-tts')
   }
 }
 
@@ -52,6 +60,8 @@ export function buildTranscriptionAdapter(
       return openaiTranscription(config.model as 'whisper-1')
     case 'fal':
       return falTranscription(config.model)
+    case 'grok':
+      return grokTranscription(config.model as 'grok-stt')
   }
 }
 
@@ -72,15 +82,65 @@ export function buildAudioAdapter(
   }
 }
 
+/**
+ * Thrown when a caller supplies a `modelOverride` that is not present in the
+ * provider's allowed model list. HTTP routes map this to a 400 response so the
+ * user sees a clear rejection instead of silently getting output from the
+ * default model.
+ */
+export class InvalidModelOverrideError extends Error {
+  readonly code = 'invalid_model_override' as const
+  readonly providerId: string
+  readonly requestedModel: string
+  readonly allowedModels: ReadonlyArray<string>
+
+  constructor(
+    providerId: string,
+    requestedModel: string,
+    allowedModels: ReadonlyArray<string>,
+  ) {
+    super(
+      `Invalid model override "${requestedModel}" for provider "${providerId}". Allowed models: ${
+        allowedModels.length > 0 ? allowedModels.join(', ') : '(none)'
+      }`,
+    )
+    this.name = 'InvalidModelOverrideError'
+    this.providerId = providerId
+    this.requestedModel = requestedModel
+    this.allowedModels = allowedModels
+  }
+}
+
+/**
+ * Thrown when `findConfig` is called with a provider id that isn't in the
+ * allowed list. In practice the route-level Zod enum schema already rejects
+ * unknown providers before we ever reach this builder, so this is
+ * defense-in-depth for callers that bypass Zod validation (e.g. server-fns
+ * whose input schemas could drift from the provider registries).
+ */
+export class UnknownProviderError extends Error {
+  readonly code = 'unknown_provider' as const
+  readonly providerId: string
+  readonly allowedProviders: ReadonlyArray<string>
+
+  constructor(providerId: string, allowedProviders: ReadonlyArray<string>) {
+    super(
+      `Unknown provider "${providerId}". Allowed providers: ${
+        allowedProviders.length > 0 ? allowedProviders.join(', ') : '(none)'
+      }`,
+    )
+    this.name = 'UnknownProviderError'
+    this.providerId = providerId
+    this.allowedProviders = allowedProviders
+  }
+}
+
 function resolveModel(
   config: (typeof AUDIO_PROVIDERS)[number],
   modelOverride: string | undefined,
 ): string {
   if (!modelOverride) return config.model
-  const allowed = config.models?.some((m) => m.id === modelOverride)
-  if (allowed) return modelOverride
-  console.warn(
-    `[audio] rejected model override "${modelOverride}" for provider "${config.id}"; falling back to "${config.model}"`,
-  )
-  return config.model
+  const allowedModels = config.models?.map((m) => m.id) ?? []
+  if (allowedModels.includes(modelOverride)) return modelOverride
+  throw new InvalidModelOverrideError(config.id, modelOverride, allowedModels)
 }
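The fail-loud override check can be exercised in isolation. The sketch below re-creates `resolveModel` and a trimmed `InvalidModelOverrideError` as a standalone unit; the simplified config shape and the sample model ids (`lyria-1`, `lyria-2`) are illustrative, not the example app's real registry entries.

```typescript
// Standalone re-creation of the fail-loud model-override check; the config
// shape is simplified from the example app's AUDIO_PROVIDERS entries.
class InvalidModelOverrideError extends Error {
  readonly code = 'invalid_model_override' as const
  constructor(
    readonly providerId: string,
    readonly requestedModel: string,
    readonly allowedModels: ReadonlyArray<string>,
  ) {
    super(
      `Invalid model override "${requestedModel}" for provider "${providerId}"`,
    )
    this.name = 'InvalidModelOverrideError'
  }
}

interface ProviderConfig {
  id: string
  model: string
  models?: Array<{ id: string }>
}

function resolveModel(config: ProviderConfig, modelOverride?: string): string {
  if (!modelOverride) return config.model // no override: use the default
  const allowedModels = config.models?.map((m) => m.id) ?? []
  if (allowedModels.includes(modelOverride)) return modelOverride
  // Unknown override: throw instead of silently falling back.
  throw new InvalidModelOverrideError(config.id, modelOverride, allowedModels)
}
```

The contrast with the removed code is the failure mode: the old version logged a warning and served the default model, so a typo in `modelOverride` produced output from the wrong model; now the route can map the typed error to a 400.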

examples/ts-react-chat/src/lib/server-fns.ts

Lines changed: 108 additions & 7 deletions
@@ -12,14 +12,67 @@ import {
 } from '@tanstack/ai'
 import { openaiImage, openaiSummarize, openaiVideo } from '@tanstack/ai-openai'
 import {
+  InvalidModelOverrideError,
+  UnknownProviderError,
   buildAudioAdapter,
   buildSpeechAdapter,
   buildTranscriptionAdapter,
 } from './server-audio-adapters'
 
-const SPEECH_PROVIDER_SCHEMA = z.enum(['openai', 'gemini', 'fal']).optional()
+/**
+ * Server-fn error with a stable `code` property clients can switch on.
+ *
+ * TanStack Start's `createServerFn` surfaces thrown errors as a generic 500
+ * without a structured payload. We can't influence the status code from here,
+ * so we attach a `code` field the client can read to distinguish well-known
+ * failure modes (invalid_model_override, unknown_provider) from truly
+ * unexpected errors.
+ */
+class ServerFnError extends Error {
+  readonly code: string
+  readonly details?: Record<string, unknown>
 
-const TRANSCRIPTION_PROVIDER_SCHEMA = z.enum(['openai', 'fal']).optional()
+  constructor(
+    code: string,
+    message: string,
+    details?: Record<string, unknown>,
+  ) {
+    super(message)
+    this.name = 'ServerFnError'
+    this.code = code
+    this.details = details
+  }
+}
+
+/**
+ * Translate the typed audio-adapter errors into a `ServerFnError` with a stable
+ * `code`. Any other error is re-thrown untouched so the framework's default
+ * 500 path handles it.
+ */
+function rethrowAudioAdapterError(err: unknown): never {
+  if (err instanceof InvalidModelOverrideError) {
+    throw new ServerFnError('invalid_model_override', err.message, {
+      providerId: err.providerId,
+      requestedModel: err.requestedModel,
+      allowedModels: err.allowedModels,
+    })
+  }
+  if (err instanceof UnknownProviderError) {
+    throw new ServerFnError('unknown_provider', err.message, {
+      providerId: err.providerId,
+      allowedProviders: err.allowedProviders,
+    })
+  }
+  throw err
+}
+
+const SPEECH_PROVIDER_SCHEMA = z
+  .enum(['openai', 'gemini', 'fal', 'grok'])
+  .optional()
+
+const TRANSCRIPTION_PROVIDER_SCHEMA = z
+  .enum(['openai', 'fal', 'grok'])
+  .optional()
 
 const AUDIO_PROVIDER_SCHEMA = z
   .enum(['gemini-lyria', 'fal-audio', 'fal-sfx'])
@@ -56,8 +109,17 @@ export const generateSpeechFn = createServerFn({ method: 'POST' })
     }),
   )
   .handler(async ({ data }) => {
+    // `buildSpeechAdapter` can throw `UnknownProviderError` (defense-in-depth;
+    // Zod should catch this first). Translate into a `ServerFnError` so
+    // clients can distinguish it from a generic failure via the stable `code`.
+    let adapter
+    try {
+      adapter = buildSpeechAdapter(data.provider ?? 'openai')
+    } catch (err) {
+      rethrowAudioAdapterError(err)
+    }
     return generateSpeech({
-      adapter: buildSpeechAdapter(data.provider ?? 'openai'),
+      adapter,
       text: data.text,
       voice: data.voice,
       format: data.format,
@@ -73,8 +135,18 @@ export const transcribeFn = createServerFn({ method: 'POST' })
     }),
   )
   .handler(async ({ data }) => {
+    // `buildTranscriptionAdapter` can throw `UnknownProviderError`
+    // (defense-in-depth; Zod should catch this first). Translate into a
+    // `ServerFnError` so clients can distinguish it from a generic failure
+    // via the stable `code`.
+    let adapter
+    try {
+      adapter = buildTranscriptionAdapter(data.provider ?? 'openai')
+    } catch (err) {
+      rethrowAudioAdapterError(err)
+    }
     return generateTranscription({
-      adapter: buildTranscriptionAdapter(data.provider ?? 'openai'),
+      adapter,
       audio: data.audio,
       language: data.language,
     })
@@ -90,8 +162,18 @@ export const generateAudioFn = createServerFn({ method: 'POST' })
     }),
   )
   .handler(async ({ data }) => {
+    // `buildAudioAdapter` can throw `InvalidModelOverrideError` (unknown
+    // model id) or `UnknownProviderError` (defense-in-depth; Zod should
+    // catch this first). Translate both into a `ServerFnError` so clients
+    // can distinguish them from a generic failure via the stable `code`.
+    let adapter
+    try {
+      adapter = buildAudioAdapter(data.provider ?? 'gemini-lyria', data.model)
+    } catch (err) {
+      rethrowAudioAdapterError(err)
+    }
     return generateAudio({
-      adapter: buildAudioAdapter(data.provider ?? 'gemini-lyria', data.model),
+      adapter,
       prompt: data.prompt,
       duration: data.duration,
     })
@@ -195,9 +277,18 @@ export const generateSpeechStreamFn = createServerFn({ method: 'POST' })
     }),
   )
   .handler(({ data }) => {
+    // `buildSpeechAdapter` can throw `UnknownProviderError` (defense-in-depth;
+    // Zod should catch this first). Translate into a `ServerFnError` so
+    // clients can distinguish it from a generic failure via the stable `code`.
+    let adapter
+    try {
+      adapter = buildSpeechAdapter(data.provider ?? 'openai')
+    } catch (err) {
+      rethrowAudioAdapterError(err)
+    }
     return toServerSentEventsResponse(
       generateSpeech({
-        adapter: buildSpeechAdapter(data.provider ?? 'openai'),
+        adapter,
         text: data.text,
         voice: data.voice,
         format: data.format,
@@ -215,9 +306,19 @@ export const transcribeStreamFn = createServerFn({ method: 'POST' })
     }),
   )
   .handler(({ data }) => {
+    // `buildTranscriptionAdapter` can throw `UnknownProviderError`
+    // (defense-in-depth; Zod should catch this first). Translate into a
+    // `ServerFnError` so clients can distinguish it from a generic failure
+    // via the stable `code`.
+    let adapter
+    try {
+      adapter = buildTranscriptionAdapter(data.provider ?? 'openai')
+    } catch (err) {
+      rethrowAudioAdapterError(err)
+    }
     return toServerSentEventsResponse(
       generateTranscription({
-        adapter: buildTranscriptionAdapter(data.provider ?? 'openai'),
+        adapter,
         audio: data.audio,
         language: data.language,
         stream: true,
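On the client side, the stable `code` attached by `ServerFnError` might be consumed as in the sketch below. This is a hypothetical helper, not code from the PR; how much of a thrown server-fn error survives TanStack Start's serialization depends on the framework version, so the structural check is an assumption.

```typescript
// Hypothetical client-side classifier: narrow an unknown caught error to one
// of the stable codes the server fns attach, falling back to 'unexpected'.
type AudioErrorCode = 'invalid_model_override' | 'unknown_provider'

function classifyAudioError(err: unknown): AudioErrorCode | 'unexpected' {
  const code =
    typeof err === 'object' && err !== null && 'code' in err
      ? (err as { code?: unknown }).code
      : undefined
  if (code === 'invalid_model_override' || code === 'unknown_provider') {
    return code
  }
  return 'unexpected'
}
```

A UI could switch on the returned value to show "pick a valid model" versus a generic failure toast, which is exactly the distinction the `code` field exists to make.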

0 commit comments
