Commit 2e4c942
feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
* feat(ai-grok): add audio and speech adapters for xAI
Add `grokSpeech` (TTS via /v1/tts), `grokTranscription` (STT via /v1/stt),
and `grokRealtime` + `grokRealtimeToken` (Voice Agent via /v1/realtime)
because xAI's standalone audio APIs were shipped publicly and the
adapter previously exposed only text/image/summarize. The TTS/STT
endpoints are not OpenAI-compatible so these adapters use direct fetch
rather than the OpenAI SDK; the realtime API mirrors OpenAI's shape with
URL/provider swaps. E2E coverage is wired via mock.mount('/v1/tts'...)
on aimock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Merge from upstream
* feat(ai-grok): wire shared debug logger into audio and realtime adapters
Adopt the @tanstack/ai/adapter-internals logger across grokSpeech,
grokTranscription, grokRealtimeToken, and grokRealtime so users can toggle
debug output the same way they do on other adapters — `debug: true` for full
tracing, `debug: false` to silence, or a DebugConfig for per-category control
and a custom Logger. Replaces the remaining console.error / console.warn
calls in the realtime adapter with logger.errors so nothing is lost when
debugging is off.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: apply automated fixes
* fix(ai-grok): correct super() arg order in audio adapters
The transcription and TTS adapters were calling super(config, model),
but BaseTranscriptionAdapter/BaseTTSAdapter expect (model, config),
causing TS2345 build errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ai-grok): pass logger to audio adapter tests
After the logger was wired into the audio adapters, the unit tests
need to provide one when calling transcribe/generateSpeech directly
(activities normally inject it via resolveDebugOption).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(ai-grok): route audio adapter tests through core functions
Per project convention, tests should not invoke adapter methods
directly — they call generateSpeech()/generateTranscription() with
the adapter instance, so the core function injects logger, emits
events, and exercises the real public surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: apply automated fixes
* fix(ai-grok): address cr-loop round 1 findings
ai-grok realtime adapter:
- cleanup pc/localStream/audioContext/dataChannel on connect() failure
- dataChannelReady rejects on error/close/ICE-failed/timeout
- RTCErrorEvent extracted properly instead of [object Event]
- onmessage parse errors emit to consumers
- input_audio_transcription no longer overrides caller on every update
- response.done preserves idle mode after stopAudioCapture
- setupOutputAudioAnalysis disposes prior audioElement, surfaces autoplay blocks
- audioContext.resume failures emit error instead of silent swallow
- currentMessageId reset on response.created (tool-only turns)
- pc.onconnectionstatechange / oniceconnectionstatechange emit status_change
- sendImage uses object image_url for OpenAI-realtime compatibility
- unknown server events logged via default branch
ai-grok TTS/STT:
- getContentType returns audio/L16 for pcm (valid IANA MIME)
- toAudioFile requires explicit audio_format for bare base64
- transcription option renamed format -> inverse_text_normalization
ai-grok realtime token:
- expires_at unit-safety guard (seconds vs ms)
ai-grok types:
- single source of truth for GrokRealtimeModel (model-meta)
ai-grok tests:
- cover aac/flac in pickCodec test
- normalize header assertions via Headers()
- add realtime-token unit-safety tests
examples/ts-react-chat:
- resolveModel fails loud via InvalidModelOverrideError (no silent fallback)
- audio/speech/transcribe routes return 400 with structured body
testing/e2e:
- media-providers uses valid grok-2-image-1212 model
- test-matrix imports from feature-support (dedupe)
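The `expires_at` unit-safety guard listed under the realtime token above can be illustrated with a short sketch. The helper name `normalizeExpiresAtMs` and the `1e12` threshold are assumptions made for this example, not the adapter's actual code; the idea is only that present-day epoch seconds (about 1.7e9) and epoch milliseconds (about 1.7e12) are far enough apart to tell them apart by magnitude.

```typescript
// Hypothetical helper: name and threshold are assumptions for illustration.
// Values at or above this are treated as already being in milliseconds.
const MS_THRESHOLD = 1e12;

function normalizeExpiresAtMs(expiresAt: number): number {
  if (!Number.isFinite(expiresAt) || expiresAt <= 0) {
    throw new Error(`Invalid expires_at: ${String(expiresAt)}`);
  }
  // Epoch seconds are scaled up; epoch milliseconds pass through unchanged.
  return expiresAt >= MS_THRESHOLD ? expiresAt : expiresAt * 1000;
}
```

Either unit then yields the same millisecond timestamp, so a later comparison against `Date.now()` does not depend on which unit the server returned.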
* fix(ai-grok): address cr-loop round 2 confirmation findings
ai-grok realtime adapter:
- shared teardownConnection() helper runs on SDP and post-SDP failure paths, disposing input/output analysers, audio element, mic, data channel, pc, and audio context
- pre-open dataChannelReady rejection on failed/closed/disconnected pc states
- pc.onconnectionstatechange is sole source of status_change (ice handler only rejects)
- sendImage detects data: prefix (no more double-wrap)
ai-grok audio utils:
- malformed data: URI MIME parse throws instead of silently defaulting to audio/mpeg
- empty/missing base64 payload throws
- explicit audioFormat argument wins over URI-embedded MIME
ai-grok TTS:
- audio/L16 content-type includes required rate= parameter from modelOptions.sample_rate
ai-grok tests:
- realtime-token afterEach restores original XAI_API_KEY
- new coverage for malformed data URIs, audioFormat precedence, and rate= in audio/L16
examples/ts-react-chat:
- new UnknownProviderError typed class, 400 mapping in audio/speech/transcribe routes
- server-fns ServerFnError wraps typed adapter errors with stable code/details
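The audio utils changes in this round (malformed MIME throws, empty payload throws, explicit `audioFormat` wins) could look roughly like the sketch below. `parseAudioDataUri` and `ParsedAudio` are hypothetical names, and treating `audioFormat` as a MIME string is an assumption; the real helper in utils/audio.ts may be shaped differently.

```typescript
// Hypothetical types and names, for illustration only.
interface ParsedAudio {
  mimeType: string;
  base64: string;
}

function parseAudioDataUri(uri: string, audioFormat?: string): ParsedAudio {
  if (!uri.startsWith("data:")) {
    throw new Error("Not a data: URI");
  }
  const comma = uri.indexOf(",");
  if (comma === -1) {
    throw new Error("Malformed data: URI (no comma separator)");
  }
  const header = uri.slice("data:".length, comma); // e.g. "audio/wav;base64"
  const base64 = uri.slice(comma + 1);
  if (base64.length === 0) {
    throw new Error("Empty base64 payload in data: URI");
  }
  const [mime = "", ...params] = header.split(";");
  if (!params.includes("base64")) {
    throw new Error("Only base64-encoded data: URIs are supported");
  }
  // An explicit format from the caller takes precedence over the embedded MIME.
  if (audioFormat !== undefined) {
    return { mimeType: audioFormat, base64 };
  }
  // No silent default: a missing or malformed MIME type is an error.
  if (!/^[\w.+-]+\/[\w.+-]+$/.test(mime)) {
    throw new Error(`Malformed MIME type in data: URI: "${mime}"`);
  }
  return { mimeType: mime, base64 };
}
```

In this sketch the embedded MIME is only validated when it is actually needed; whether the adapter also rejects a malformed MIME when `audioFormat` is supplied is not stated in the commit message.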
* fix(ai-grok): address cr-loop round 3 confirmation findings
examples/ts-react-chat:
- generateSpeechFn/transcribeFn/generateSpeechStreamFn/transcribeStreamFn now wrap adapter construction with rethrowAudioAdapterError for consistent typed-error responses
- realtime image display guards against data:/http(s): double-wrap
ai-grok realtime adapter:
- teardownConnection drains pendingEvents; sendEvent logs and skips after teardown
ai-grok TTS:
- sample_rate always forwarded in output_format so body and contentType rate agree
* fix(ai-grok): address cr-loop round 4 confirmation findings
ai-grok realtime adapter:
- teardownConnection on getUserMedia failure (mic/pc/dataChannel leak on mic denial)
- response.function_call_arguments.done drops event if call_id absent (no item_id fallback)
- isTornDown set at top of teardown to guard handlers firing during close() awaits
- setupInputAudioAnalysis/setupOutputAudioAnalysis skip when torn down
- onconnectionstatechange no longer double-emits status_change during disconnect()
ai-grok audio utils:
- toAudioFile Blob/File branch prefers explicit audioFormat over Blob.type
ai-grok TTS:
- sample_rate forwarded only when caller provides one or codec is pcm (don't override server defaults for container codecs)
Tests updated to cover new audioFormat precedence paths and adjusted sample_rate assertions.
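A minimal sketch of the `sample_rate` forwarding rule from this round, under stated assumptions: the `Codec` union, the `buildOutputFormat` and `pcmContentType` names, and the 24000 Hz default are all hypothetical and chosen only for illustration. The `output_format`, `sample_rate`, and `audio/L16;rate=` names come from the commit message itself.

```typescript
// Hypothetical codec list and default rate; not taken from xAI documentation.
type Codec = "mp3" | "wav" | "flac" | "aac" | "pcm" | "mulaw" | "alaw";

interface OutputFormat {
  codec: Codec;
  sample_rate?: number;
}

const ASSUMED_PCM_SAMPLE_RATE = 24_000;

function buildOutputFormat(codec: Codec, sampleRate?: number): OutputFormat {
  // A caller-supplied rate is always forwarded.
  if (sampleRate !== undefined) {
    return { codec, sample_rate: sampleRate };
  }
  // Raw PCM has no container to carry the rate, so it is always stated explicitly.
  if (codec === "pcm") {
    return { codec, sample_rate: ASSUMED_PCM_SAMPLE_RATE };
  }
  // Container codecs: leave sample_rate out so server defaults apply.
  return { codec };
}

function pcmContentType(format: OutputFormat): string {
  // The rate in the content type mirrors the rate sent in the request body.
  return `audio/L16;rate=${format.sample_rate ?? ASSUMED_PCM_SAMPLE_RATE}`;
}
```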
* fix(ai-grok): address cr-loop round 5 confirmation findings
ai-grok realtime adapter:
- pc.connectionState=failed triggers automatic teardownConnection (mic/pc/audioContext no longer leak on spontaneous failure)
- flushPendingEvents wraps send in try/catch; emits error on failure instead of hanging caller
- handleServerEvent case 'error' validates shape of event.error; preserves code/type/param; safe against null/missing fields
- autoplay and audioContext.resume failures log without emitting fatal error events (routine browser-policy outcomes)
- dataChannel.onerror/onclose gated behind isTornDown to suppress post-disconnect error events
examples/ts-react-chat:
- realtime.tsx handleImageUpload validates FileReader result, file.type, and base64 extraction; surfaces errors visibly
* fix(ai-grok): extensionFor maps mulaw/alaw MIME types to sensible filenames
utils/audio.ts produced 'audio.basic' and 'audio.x-alaw-basic' for mulaw/alaw
via the default-branch MIME split. Servers using filename as a format hint
now see 'audio.mulaw' / 'audio.alaw', matching the reverse toMimeType mapping.
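The mapping described above can be sketched as follows. `extensionForSketch` is a hypothetical stand-in for the real `extensionFor`, and the `"bin"` fallback is an assumption; only the mulaw/alaw special cases and the default MIME split are taken from the commit message.

```typescript
// Hypothetical stand-in for extensionFor in utils/audio.ts.
function extensionForSketch(mimeType: string): string {
  // Drop parameters such as ";rate=8000" and normalize case.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  switch (base) {
    case "audio/basic":
      return "mulaw";
    case "audio/x-alaw-basic":
      return "alaw";
    default: {
      // Default branch: use the MIME subtype as the extension.
      const subtype = base.split("/")[1];
      return subtype !== undefined && subtype.length > 0 ? subtype : "bin";
    }
  }
}

// Builds the filename hint that is sent alongside the audio payload.
function filenameFor(mimeType: string): string {
  return `audio.${extensionForSketch(mimeType)}`;
}
```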
* ci: apply automated fixes
* refactor(ai-grok): extract form/body builders, adopt ModelMeta convention, fix xAI realtime event names
Refactors from user review:
adapters:
- tts.ts: extract buildTTSRequestBody helper (codec/sample_rate/voice default
resolution + body assembly). Export getContentType for consumer use.
- transcription.ts: extract buildTranscriptionFormData helper (wire-field
mapping including xAI's named 'format' boolean toggle for inverse text
normalization).
model-meta.ts: audio and realtime models now use the same
`as const satisfies ModelMeta` convention as chat/image models
(GROK_TTS, GROK_STT, GROK_VOICE_FAST_1, GROK_VOICE_THINK_FAST_1) with
input/output modalities and tool_calling / reasoning capabilities.
realtime adapter:
- Replace drive-by 'as' casts on untyped server events with runtime-checked
readers (readString, readObject, readObjectArray); malformed frames return
undefined instead of throwing a TypeError.
- Accept both legacy OpenAI-realtime event names and current xAI voice-agent
names per docs.x.ai: response.output_audio.* / response.output_audio_transcript.* /
response.text.* (plus existing response.audio.* / response.audio_transcript.* /
response.output_text.* aliases for compatibility).
- RealtimeServerError type replaces repeated 'as Error & { code?: string }' casts.
realtime token:
- Wrap request body with { session: { model } } per xAI /v1/realtime/client_secrets
schema (was bare { model } before).
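As an illustration of the runtime-checked readers mentioned under the realtime adapter, here is a minimal sketch. The names `readString`, `readObject`, and `readObjectArray` come from the commit message, but the signatures and the choice to drop non-object array entries are assumptions made for this example.

```typescript
type UnknownRecord = Record<string, unknown>;

function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the string at `key`, or undefined if the frame or field is malformed.
function readString(source: unknown, key: string): string | undefined {
  if (!isRecord(source)) return undefined;
  const value = source[key];
  return typeof value === "string" ? value : undefined;
}

// Returns the plain object at `key`, or undefined otherwise.
function readObject(source: unknown, key: string): UnknownRecord | undefined {
  if (!isRecord(source)) return undefined;
  const value = source[key];
  return isRecord(value) ? value : undefined;
}

// Returns only the object entries of the array at `key` (assumed behavior).
function readObjectArray(source: unknown, key: string): Array<UnknownRecord> | undefined {
  if (!isRecord(source)) return undefined;
  const value = source[key];
  return Array.isArray(value) ? value.filter(isRecord) : undefined;
}
```

Because every reader returns `undefined` on a shape mismatch, a handler can bail out of a malformed frame with a simple check instead of relying on an `as` cast that would throw a TypeError later.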
* test(ai-grok): cover realtime token body { session: { model } } shape
* ci: apply automated fixes
* refactor(ai-grok): drop @tanstack/ai-client peer dep by inlining realtime contract
The RealtimeAdapter / RealtimeConnection interfaces are duplicated locally
in src/realtime/realtime-contract.ts. The adapter imports them from there
instead of @tanstack/ai-client, so consumers of @tanstack/ai-grok no longer
have to install @tanstack/ai-client unless they also want to construct a
RealtimeClient from it (structural typing covers that use case).
@tanstack/ai-client stays as a devDependency to run a type-level drift check
(tests/realtime-contract.drift.test-d.ts) that asserts our inlined contract
is bidirectionally assignable to the canonical one. If ai-client ever changes
the interface, that file will fail to compile and we update both in lockstep.
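A bidirectional assignability check of this kind can be sketched with plain conditional types. The two interfaces below are hypothetical stand-ins for the canonical and inlined contracts, and the real drift test may use a type-testing helper instead; this only shows the compile-time mechanism.

```typescript
// Hypothetical stand-ins; the real contracts have different members.
interface CanonicalConnection {
  disconnect(): Promise<void>;
  sendText(text: string): void;
}

interface InlinedConnection {
  disconnect(): Promise<void>;
  sendText(text: string): void;
}

// Resolves to the literal type `true` only when From is assignable to To.
type IsAssignable<From, To> = [From] extends [To] ? true : false;

// If either direction stops being assignable, the annotation becomes `false`
// and assigning `true` no longer type-checks, so the file fails to compile.
const inlinedFitsCanonical: IsAssignable<InlinedConnection, CanonicalConnection> = true;
const canonicalFitsInlined: IsAssignable<CanonicalConnection, InlinedConnection> = true;
```

Removing or retyping a member on either side breaks one of the two assignments. Purely optional additions can still pass a check like this, so it guards against incompatible drift, not against every difference.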
publint --strict: clean.
* ci: apply automated fixes
* fix(ai-grok): address CodeRabbit PR review
- tts.ts / transcription.ts: spread `defaultHeaders` BEFORE Authorization /
Content-Type so a caller-supplied header can't silently clobber the bearer
token or the request content type.
- utils/audio.ts: new `arrayBufferToBase64` helper — Buffer fast path on
Node, chunked btoa fallback everywhere else (browser, Workers, Bun). Replaces
the Node-only `Buffer.from(arrayBuffer).toString('base64')` in tts.ts.
- transcription.ts: new `GrokTranscriptionWord` interface extends the core
`TranscriptionWord` with optional `confidence` and `speaker`. The adapter
now preserves both fields when xAI returns them, so callers that narrow via
`as Array<GrokTranscriptionWord>` get the diarization output they asked
for. Test expectations updated.
- tts.ts: mulaw/alaw `contentType` now includes a `;rate=…` parameter (as
`audio/PCMU` / `audio/PCMA` per RFC 3551) when the caller requests a
non-default sample rate, instead of the 8 kHz-implying `audio/basic` /
`audio/x-alaw-basic`.
- realtime/adapter.ts: `conversation.item.truncated` flips mode back to
`listening` so the visualiser can't get stuck on `speaking` after an
interrupt. `sendEvent` wraps `dataChannel.send` in try/catch consistent
with `flushPendingEvents`. The shared `emptyFrequencyData` /
`emptyTimeDomainData` buffers are gone — `getAudioVisualization`
returns a fresh `Uint8Array` per call so consumers can't mutate a
module-level instance.
- realtime/token.ts: adds a 15s `AbortController` timeout on the
client_secrets request so a dead endpoint can't hang the caller forever.
Validates `client_secret.value` / `expires_at` shape at runtime before
dereferencing so a malformed response throws a descriptive error.
- realtime/realtime-contract.ts: JSDoc filename ref updated.
- examples/ts-react-chat audio/speech/transcribe routes: unify the 400
unknown_provider payload under the `provider` key (was `providerId`)
to match the invalid_model_override branch and the request body.
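The `arrayBufferToBase64` helper from the review list above could be written along these lines. This is a sketch of the described approach (Buffer fast path, chunked `btoa` fallback), not the code in utils/audio.ts; the function names and the 0x8000 chunk size are assumptions.

```typescript
// Minimal structural type so the sketch compiles without Node typings.
type BufferLike = {
  from(data: ArrayBufferLike): { toString(encoding: "base64"): string };
};

// Fallback path: build a binary string in chunks so the spread call never
// receives too many arguments for large payloads.
function chunkedBtoa(bytes: Uint8Array): string {
  const CHUNK_SIZE = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK_SIZE));
  }
  return btoa(binary);
}

function arrayBufferToBase64Sketch(buffer: ArrayBufferLike): string {
  // Fast path: use Buffer when the runtime provides it (Node).
  const nodeBuffer = (globalThis as { Buffer?: BufferLike }).Buffer;
  if (nodeBuffer !== undefined) {
    return nodeBuffer.from(buffer).toString("base64");
  }
  // Everywhere else: chunked btoa.
  return chunkedBtoa(new Uint8Array(buffer));
}
```

Both paths produce standard padded base64, so the caller sees the same string regardless of runtime.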
* ci: apply automated fixes
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
1 parent af9eb7b commit 2e4c942
31 files changed
Lines changed: 3650 additions & 180 deletions
File tree
- .changeset
- examples/ts-react-chat/src
- lib
- routes
- packages/typescript/ai-grok
- src
- adapters
- audio
- realtime
- utils
- tests
- testing/e2e
- src/lib
- tests