Commit 9ce99ea

tombeckenham and claude authored
feat(ai-grok): video generation adapter for the grok-imagine video models (#742)
* feat(ai-grok): video generation adapter for grok-imagine video models

Add a grokVideo adapter for xAI's Imagine video models via the experimental generateVideo() jobs/polling architecture (createVideoJob posts to /v1/videos/generations, polling reads /v1/videos/{request_id}), with hosted video URL plus usage (unitsBilled seconds + exact cost in USD).

Two models:
- grok-imagine-video (v1.0): text-to-video and image-to-video, $0.05/s.
- grok-imagine-video-1.5: image-to-video only, $0.08/s. xAI's API rejects text-to-video on 1.5, so the adapter fails fast with a clear error telling the caller to add a starting-frame image or use grok-imagine-video.

Image-to-video starting frames are supplied as an `image` prompt part (resolveMediaPrompt convention; public URL or base64 data source).

Adds native xAI Direct grok image (grok-imagine-image / -quality) and video entries to the ts-react-media example, plus docs, changeset, and the media skill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-grok): add duration range options to grok video adapter

Replace the throwing `validateVideoDuration` with the standard duration-options mechanism. Both grok-imagine video models declare a continuous 1–15 integer-second range via a `GROK_VIDEO_DURATIONS` table, and the adapter overrides `availableDurations()` / `snapDuration()` (backed by the shared `snapToDurationOption` helper) so consumers can discover and pre-snap durations.

`createVideoJob` now snaps the requested duration into range (clamp + round) instead of rejecting it, and the snapped value is spread after `...modelOptions` so it is authoritative.

Adds the per-model `GrokVideoModelDurationByName` generic, narrows the `createVideoJob` signature to carry the size/duration type params, exports the new helpers/type, and documents the range in the media docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a7bcd34 commit 9ce99ea

17 files changed

Lines changed: 1722 additions & 24 deletions

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,5 @@
1+
---
2+
'@tanstack/ai-grok': minor
3+
---
4+
5+
Add a `grokVideo` adapter for xAI's Imagine video models. `grok-imagine-video` (v1.0) supports text-to-video and image-to-video; `grok-imagine-video-1.5` is image-to-video only: a text-only prompt is rejected by the API, so the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`. Image-to-video starting frames are supplied as an `image` prompt part (public URL or base64 data source), with the text part describing the motion. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), and durations are 1–15 integer seconds.
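
The size template mentioned in the changeset can be illustrated with a small parser. This is only a sketch under stated assumptions: `parseSizeTemplate`, `isOneOf`, and `ParsedSize` are hypothetical names, not exports of `@tanstack/ai-grok`, and the adapter's real mapping may differ. The allowed values are the ones listed in the Grok adapter docs in this commit.

```typescript
// Hypothetical sketch: split a size template such as '16:9_720p' into the
// `aspect_ratio` / `resolution` fields named in the changeset.
// Not the adapter's actual implementation.
const ASPECT_RATIOS = ['1:1', '16:9', '9:16', '4:3', '3:4', '3:2', '2:3'] as const
const RESOLUTIONS = ['480p', '720p', '1080p'] as const

type AspectRatio = (typeof ASPECT_RATIOS)[number]
type Resolution = (typeof RESOLUTIONS)[number]

interface ParsedSize {
  aspect_ratio: AspectRatio
  resolution?: Resolution
}

// Type guard: narrows a plain string to one of the allowed literal values.
function isOneOf<T extends string>(options: readonly T[], value: string): value is T {
  return options.some((option) => option === value)
}

function parseSizeTemplate(size: string): ParsedSize {
  const parts = size.split('_')
  const ratio = parts[0] ?? ''
  const resolution = parts[1]

  if (parts.length > 2 || !isOneOf(ASPECT_RATIOS, ratio)) {
    throw new Error(`Unsupported size template: ${size}`)
  }
  // The resolution suffix is optional, so a bare aspect ratio is accepted.
  if (resolution === undefined) {
    return { aspect_ratio: ratio }
  }
  if (!isOneOf(RESOLUTIONS, resolution)) {
    throw new Error(`Unsupported resolution in size template: ${size}`)
  }
  return { aspect_ratio: ratio, resolution }
}
```

Rejecting unknown values up front is a design choice of this sketch; whether the adapter validates locally or defers to the API is not stated in the commit.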

docs/adapters/grok.md

Lines changed: 70 additions & 2 deletions
@@ -2,17 +2,20 @@
22
title: Grok (xAI)
33
id: grok-adapter
44
order: 5
5-
description: "Use xAI Grok Responses models with TanStack AI — Grok 4.3 and Grok Build 0.1 via @tanstack/ai-grok."
5+
description: "Use xAI Grok models with TanStack AI — Grok 4.3, Grok Build 0.1, Grok Imagine image generation, and Grok Imagine video generation via @tanstack/ai-grok."
66
keywords:
77
- tanstack ai
88
- grok
99
- xai
1010
- grok 4.3
1111
- grok build
12+
- image generation
13+
- video generation
14+
- grok imagine
1215
- adapter
1316
---
1417

15-
The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`.
18+
The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`, plus Grok Imagine image generation and Grok Imagine video generation.
1619

1720
## Installation
1821

@@ -229,6 +232,67 @@ reachable; use a `data` source for private images. `grok-2-image-1212` is
229232
text-to-image only — image prompt parts are a compile-time type error and
230233
throw at runtime.
231234

235+
## Video Generation (Experimental)
236+
237+
Generate short video clips (1–15 seconds, with audio) with the Grok Imagine video models via xAI's asynchronous jobs/polling API.
238+
239+
Available models:
240+
241+
- `grok-imagine-video` (v1.0): text-to-video and image-to-video, $0.05 per second of video.
242+
- `grok-imagine-video-1.5`: **image-to-video only**, $0.08 per second of video. A text-only prompt is rejected by the API; the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`.
243+
244+
Text-to-video with the base `grok-imagine-video` model:
245+
246+
```typescript
247+
import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
248+
import { grokVideo } from "@tanstack/ai-grok";
249+
250+
const adapter = grokVideo("grok-imagine-video");
251+
252+
// 1. Create the job
253+
const { jobId } = await generateVideo({
254+
adapter,
255+
prompt: "A red panda balancing on a bamboo stalk in the rain",
256+
size: "16:9_720p", // "aspectRatio" or "aspectRatio_resolution"
257+
duration: 5, // integer seconds, 1–15
258+
});
259+
260+
// 2. Poll until complete, then read the video URL
261+
let status = await getVideoJobStatus({ adapter, jobId });
262+
while (status.status !== "completed" && status.status !== "failed") {
263+
await new Promise((r) => setTimeout(r, 5000));
264+
status = await getVideoJobStatus({ adapter, jobId });
265+
}
266+
267+
console.log(status.url); // hosted .mp4 URL
268+
```
269+
270+
For image-to-video (required for `grok-imagine-video-1.5`, optional for `grok-imagine-video`), include an `image` prompt part as the starting frame and describe the desired motion in the text part. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:
271+
272+
```typescript
273+
const { jobId } = await generateVideo({
274+
adapter: grokVideo("grok-imagine-video-1.5"),
275+
prompt: [
276+
{
277+
type: "text",
278+
content: "Make the waterfall crash down and slowly pan out the camera",
279+
},
280+
{
281+
type: "image",
282+
source: { type: "url", value: "https://example.com/waterfall-still.png" },
283+
},
284+
],
285+
size: "16:9_720p",
286+
duration: 10,
287+
});
288+
```
289+
290+
Like the Grok Imagine image models, sizing is aspect-ratio based: the `size` option takes an `aspectRatio_resolution` template. Supported aspect ratios are `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, and `2:3`; supported resolutions are `480p`, `720p`, and `1080p` (e.g. `"9:16_1080p"`). The resolution suffix is optional.
291+
292+
When the job completes, the adapter reports usage on the result: `usage.unitsBilled` carries the billed seconds of video and `usage.cost` the exact cost in USD, both as returned by the xAI API.
293+
294+
See [Video Generation](../media/video-generation) for the full jobs/polling flow, streaming mode, and the `useGenerateVideo` hook.
295+
232296
## Text-to-Speech
233297

234298
Generate speech with Grok TTS:
@@ -325,6 +389,10 @@ Creates a Grok summarization adapter with an explicit API key.
325389

326390
Creates a Grok image generation adapter.
327391

392+
### `grokVideo(model, config?)` / `createGrokVideo(model, apiKey, config?)`
393+
394+
Creates a Grok video generation adapter (experimental) for the Grok Imagine video models (`'grok-imagine-video'`, `'grok-imagine-video-1.5'`).
395+
328396
### `grokSpeech(model, config?)` / `createGrokSpeech(model, apiKey, config?)`
329397

330398
Creates a Grok text-to-speech adapter.
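
The per-second prices quoted in the Grok docs above allow a rough pre-submission estimate. The sketch below is an assumption-laden illustration: `estimateCostUsd` and `PRICE_CENTS_PER_SECOND` are hypothetical names, not part of `@tanstack/ai-grok`, and the authoritative figure is whatever `usage.cost` reports once the job completes.

```typescript
// Hypothetical sketch: estimate the USD cost of a clip from the list prices
// in the docs ($0.05/s and $0.08/s). Prices are kept in whole cents so the
// multiplication stays in integers before the final division.
const PRICE_CENTS_PER_SECOND = {
  'grok-imagine-video': 5,
  'grok-imagine-video-1.5': 8,
} as const

type GrokVideoModel = keyof typeof PRICE_CENTS_PER_SECOND

function estimateCostUsd(model: GrokVideoModel, durationSeconds: number): number {
  // The docs describe durations as integer seconds in the 1-15 range.
  if (
    !Number.isInteger(durationSeconds) ||
    durationSeconds < 1 ||
    durationSeconds > 15
  ) {
    throw new RangeError(`duration must be an integer from 1 to 15, got ${durationSeconds}`)
  }
  return (PRICE_CENTS_PER_SECOND[model] * durationSeconds) / 100
}
```

For example, `estimateCostUsd('grok-imagine-video', 5)` evaluates to `0.25`. Treat the result as an estimate only, since billed seconds come from the API.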

docs/config.json

Lines changed: 3 additions & 2 deletions
@@ -262,7 +262,7 @@
262262
"label": "Video Generation",
263263
"to": "media/video-generation",
264264
"addedAt": "2026-04-15",
265-
"updatedAt": "2026-06-08"
265+
"updatedAt": "2026-06-24"
266266
},
267267
{
268268
"label": "Generation Hooks",
@@ -434,7 +434,8 @@
434434
{
435435
"label": "Grok (xAI)",
436436
"to": "adapters/grok",
437-
"addedAt": "2026-04-15"
437+
"addedAt": "2026-04-15",
438+
"updatedAt": "2026-06-24"
438439
},
439440
{
440441
"label": "Groq",

docs/media/video-generation.md

Lines changed: 58 additions & 1 deletion
@@ -2,13 +2,15 @@
22
title: Video Generation
33
id: video-generation
44
order: 6
5-
description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
5+
description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
66
keywords:
77
- tanstack ai
88
- video generation
99
- sora
1010
- veo
1111
- gemini
12+
- grok imagine
13+
- fal
1214
- generateVideo
1315
- jobs api
1416
- experimental
@@ -39,6 +41,8 @@ TanStack AI provides experimental support for video generation through dedicated
3941
Currently supported:
4042
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
4143
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
44+
- **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
45+
- **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models
4246

4347
## Basic Usage
4448

@@ -567,6 +571,59 @@ Adapters that haven't declared a per-model duration map keep the plain
567571
> Files API and requires your API key to download (send it as an
568572
> `x-goog-api-key` header or `key` query parameter).
569573
574+
### Grok (xAI Imagine) Model Options
575+
576+
Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long.
577+
578+
Text-to-video with the base model:
579+
580+
```typescript
581+
import { generateVideo } from '@tanstack/ai'
582+
import { grokVideo } from '@tanstack/ai-grok'
583+
584+
const { jobId } = await generateVideo({
585+
adapter: grokVideo('grok-imagine-video'),
586+
prompt: 'A beautiful sunset over the ocean',
587+
size: '16:9_720p', // aspect ratio: '1:1' | '16:9' | '9:16' | '4:3' | '3:4' | '3:2' | '2:3'
588+
// resolution (optional suffix): '480p' | '720p' | '1080p'
589+
duration: 5, // integer seconds, 1-15
590+
modelOptions: {
591+
aspect_ratio: '16:9', // Alternative way to specify the aspect ratio
592+
resolution: '720p', // Alternative way to specify the resolution
593+
duration: 5, // Alternative way to specify the duration
594+
},
595+
})
596+
```
597+
598+
Image-to-video (required for `grok-imagine-video-1.5`) — include an `image` prompt part as the starting frame. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:
599+
600+
```typescript
601+
const { jobId } = await generateVideo({
602+
adapter: grokVideo('grok-imagine-video-1.5'),
603+
prompt: [
604+
{ type: 'text', content: 'Slowly pan out as the waves roll in' },
605+
{
606+
type: 'image',
607+
source: { type: 'url', value: 'https://example.com/still.png' },
608+
},
609+
],
610+
size: '16:9_720p',
611+
duration: 5,
612+
})
613+
```
614+
615+
Both models accept any whole second in the **1–15** range. A raw `duration` is coerced into that range rather than rejected — values are clamped to `[1, 15]` and rounded to the nearest second. Inspect or pre-snap the range the same way as Veo:
616+
617+
```typescript
618+
const adapter = grokVideo('grok-imagine-video')
619+
620+
adapter.availableDurations() // { kind: 'range', min: 1, max: 15, step: 1, unit: 'seconds' }
621+
adapter.snapDuration(2.5) // 3 — clamped/rounded into range
622+
adapter.snapDuration(99) // 15
623+
```
624+
625+
Generated clips include an audio track. When the job completes, the adapter reports `usage.unitsBilled` (billed seconds of video) and `usage.cost` (exact USD cost as returned by the API) on the result.
626+
570627
## Response Types
571628

572629
> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`.
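
The clamp-and-round coercion described for Grok durations in this file can be sketched as follows. This is a minimal illustration, not the shared `snapToDurationOption` helper: `snapToRange`, `DurationRange`, and `GROK_RANGE_SKETCH` are hypothetical names, and the real helper's handling of ties or non-finite input is not specified in the commit.

```typescript
// Hypothetical sketch of "clamp + round" snapping into a stepped range.
interface DurationRange {
  min: number
  max: number
  step: number
}

// Mirrors the range shown by availableDurations() in the docs: 1-15, step 1.
const GROK_RANGE_SKETCH: DurationRange = { min: 1, max: 15, step: 1 }

function snapToRange(value: number, range: DurationRange = GROK_RANGE_SKETCH): number {
  if (!Number.isFinite(value)) {
    // Assumption of this sketch: reject NaN and Infinity instead of guessing.
    throw new RangeError(`duration must be a finite number, got ${value}`)
  }
  // 1. Clamp into [min, max].
  const clamped = Math.min(range.max, Math.max(range.min, value))
  // 2. Round to the nearest step, measured from the lower bound.
  const steps = Math.round((clamped - range.min) / range.step)
  return range.min + steps * range.step
}
```

With the default range, `snapToRange(2.5)` gives `3` and `snapToRange(99)` gives `15`, matching the `snapDuration` examples in the docs. Clamping before rounding keeps the result inside the range even for values just past a bound.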

examples/ts-react-media/.env.example

Lines changed: 4 additions & 0 deletions
@@ -5,3 +5,7 @@ FAL_KEY=
55

66
# Get a Google API key at https://aistudio.google.com/apikey
77
GOOGLE_API_KEY=
8+
9+
# Get an xAI API key at https://console.x.ai — used by the "xAI Direct"
10+
# Grok Imagine video models (the other Grok Imagine entries go through fal).
11+
XAI_API_KEY=

examples/ts-react-media/package.json

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
1414
"@tanstack/ai": "workspace:*",
1515
"@tanstack/ai-fal": "workspace:*",
1616
"@tanstack/ai-gemini": "workspace:*",
17+
"@tanstack/ai-grok": "workspace:*",
1718
"@tanstack/react-router": "^1.158.4",
1819
"@tanstack/react-start": "^1.159.0",
1920
"@tanstack/router-plugin": "^1.158.4",

examples/ts-react-media/src/components/ImageGenerator.tsx

Lines changed: 8 additions & 0 deletions
@@ -27,6 +27,7 @@ function getImageSrc(image: { url?: string; b64Json?: string }): string {
2727

2828
const falModels = IMAGE_MODELS.filter((m) => m.provider === 'fal')
2929
const geminiModels = IMAGE_MODELS.filter((m) => m.provider === 'gemini')
30+
const xaiModels = IMAGE_MODELS.filter((m) => m.provider === 'xai')
3031

3132
export default function ImageGenerator({
3233
onImageGenerated,
@@ -161,6 +162,13 @@ export default function ImageGenerator({
161162
</option>
162163
))}
163164
</optgroup>
165+
<optgroup label="xAI (direct)">
166+
{xaiModels.map((model) => (
167+
<option key={model.id} value={model.id}>
168+
{model.name}
169+
</option>
170+
))}
171+
</optgroup>
164172
</select>
165173
{currentModel && selectedModel !== 'all' && (
166174
<p className="mt-1 text-xs text-gray-500">

examples/ts-react-media/src/components/VideoGenerator.tsx

Lines changed: 35 additions & 11 deletions
@@ -21,7 +21,7 @@ type JobState =
2121
model: string
2222
progress?: number | undefined
2323
}
24-
| { status: 'completed'; url: string; unitsBilled?: number }
24+
| { status: 'completed'; url: string; unitsBilled?: number; cost?: number }
2525
| { status: 'error'; message: string }
2626

2727
interface VideoGeneratorProps {
@@ -42,6 +42,8 @@ export default function VideoGenerator({
4242
const pollingRefs = useRef<Map<string, NodeJS.Timeout>>(new Map())
4343

4444
const filteredModels = VIDEO_MODELS.filter((m) => m.mode === mode)
45+
const falModels = filteredModels.filter((m) => m.provider === 'fal')
46+
const xaiModels = filteredModels.filter((m) => m.provider === 'xai')
4547

4648
useEffect(() => {
4749
if (initialImageUrl) {
@@ -97,6 +99,7 @@ export default function VideoGenerator({
9799
status: 'completed',
98100
url: url,
99101
unitsBilled: urlResult.usage?.unitsBilled,
102+
cost: urlResult.usage?.cost,
100103
},
101104
}))
102105
} else if (status.status === 'processing') {
@@ -164,8 +167,11 @@ export default function VideoGenerator({
164167
},
165168
}))
166169

170+
// Poll keyed by the UI model id, not result.model: the direct-xAI
171+
// entries share one adapter model ('grok-imagine-video-1.5'),
172+
// so result.model wouldn't identify the card (or the adapter) uniquely.
167173
const interval = setInterval(() => {
168-
pollStatus(result.jobId, result.model)
174+
pollStatus(result.jobId, modelId)
169175
}, 4000)
170176
pollingRefs.current.set(modelId, interval)
171177
} catch (err) {
@@ -249,11 +255,20 @@ export default function VideoGenerator({
249255
className="w-full px-4 py-3 bg-gray-800 border border-gray-700 rounded-lg text-white focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent disabled:opacity-50"
250256
>
251257
<option value="all">All Models</option>
252-
{filteredModels.map((model) => (
253-
<option key={model.id} value={model.id}>
254-
{model.name}
255-
</option>
256-
))}
258+
<optgroup label="fal.ai">
259+
{falModels.map((model) => (
260+
<option key={model.id} value={model.id}>
261+
{model.name}
262+
</option>
263+
))}
264+
</optgroup>
265+
<optgroup label="xAI (direct)">
266+
{xaiModels.map((model) => (
267+
<option key={model.id} value={model.id}>
268+
{model.name}
269+
</option>
270+
))}
271+
</optgroup>
257272
</select>
258273
</div>
259274

@@ -406,12 +421,21 @@ export default function VideoGenerator({
406421
className="w-full h-auto"
407422
/>
408423
</div>
409-
{state.unitsBilled != null && (
424+
{state.cost != null ? (
410425
<p className="text-xs text-gray-500">
411-
Billed {state.unitsBilled} fal unit
412-
{state.unitsBilled === 1 ? '' : 's'} — multiply by the
413-
endpoint unit price for USD cost
426+
Billed ${state.cost.toFixed(3)}
427+
{state.unitsBilled != null
428+
? ` for ${state.unitsBilled} second${state.unitsBilled === 1 ? '' : 's'} of video`
429+
: ''}
414430
</p>
431+
) : (
432+
state.unitsBilled != null && (
433+
<p className="text-xs text-gray-500">
434+
Billed {state.unitsBilled} fal unit
435+
{state.unitsBilled === 1 ? '' : 's'} — multiply by the
436+
endpoint unit price for USD cost
437+
</p>
438+
)
415439
)}
416440
</>
417441
)}
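
The billing branch in the `VideoGenerator.tsx` diff above prefers an exact `cost` and falls back to raw units. The same decision can be expressed as a pure function, which makes it easier to unit test. This is a sketch only: `formatBilling` and `VideoUsage` are hypothetical names not present in the example app, and the fallback text is shortened relative to the component.

```typescript
// Hypothetical sketch: the cost-first, units-fallback label logic from the
// component, extracted into a pure function.
interface VideoUsage {
  unitsBilled?: number
  cost?: number
}

function formatBilling(usage: VideoUsage): string | null {
  if (usage.cost != null) {
    // Exact USD cost is available (e.g. from the direct xAI adapter).
    const seconds =
      usage.unitsBilled != null
        ? ` for ${usage.unitsBilled} second${usage.unitsBilled === 1 ? '' : 's'} of video`
        : ''
    return `Billed $${usage.cost.toFixed(3)}${seconds}`
  }
  if (usage.unitsBilled != null) {
    // Only a unit count is known, so no USD figure can be shown.
    return `Billed ${usage.unitsBilled} fal unit${usage.unitsBilled === 1 ? '' : 's'}`
  }
  return null
}
```

For instance, `formatBilling({ cost: 0.25, unitsBilled: 5 })` yields `'Billed $0.250 for 5 seconds of video'`, while an empty usage object yields `null` so the caller can skip rendering.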
