Commit f93ba7d

tombeckenham and claude committed
feat(ai-grok): video generation adapter for the grok-imagine video models
Adds a grokVideo adapter to @tanstack/ai-grok for xAI's Imagine video models (grok-imagine-video at $0.05/s, grok-imagine-video-1.5-preview at $0.08/s) using the experimental generateVideo() jobs/polling architecture: POST /v1/videos/generations to create, GET /v1/videos/{request_id} to poll, hosted mp4 URL plus usage (billed seconds + exact USD cost) on completion.

Sizing follows the grok-imagine aspect-ratio template ('16:9_720p' → aspect_ratio/resolution); durations are 1-15 integer seconds; image-to-video starting frames go through modelOptions.image. The Imagine video endpoints are plain JSON (not in the OpenAI SDK), so the adapter issues direct requests with an injectable fetch seam.

Closes #705.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
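
The injectable fetch seam mentioned in the commit message can be pictured roughly as follows. This is a minimal sketch, not the adapter's actual code: `createVideoClient`, `FetchLike`, the Bearer authorization header, and the `https://api.x.ai` default base URL are assumptions made for illustration. Only the two endpoint paths come from the commit message.

```typescript
// Hypothetical minimal fetch signature; the real adapter's types may differ.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface VideoClientConfig {
  apiKey: string;
  baseUrl?: string; // assumed default: https://api.x.ai
  fetch: FetchLike; // the seam: production passes a real fetch, tests pass a stub
}

function createVideoClient(config: VideoClientConfig) {
  const baseUrl = config.baseUrl ?? "https://api.x.ai";
  const headers = {
    Authorization: `Bearer ${config.apiKey}`, // assumed auth scheme
    "Content-Type": "application/json",
  };

  async function request(method: string, path: string, body?: unknown): Promise<unknown> {
    const init = {
      method,
      headers,
      ...(body === undefined ? {} : { body: JSON.stringify(body) }),
    };
    const res = await config.fetch(`${baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`Request to ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  return {
    // POST /v1/videos/generations creates the job (body shape left to the caller).
    createJob: (body: Record<string, unknown>) =>
      request("POST", "/v1/videos/generations", body),
    // GET /v1/videos/{request_id} polls the job.
    getJob: (requestId: string) =>
      request("GET", `/v1/videos/${encodeURIComponent(requestId)}`),
  };
}
```

Passing `fetch` in explicitly lets a test substitute a stub that records requests and returns canned JSON, so the request logic can be exercised without network access.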
1 parent eddfbbd commit f93ba7d

10 files changed

Lines changed: 1334 additions & 9 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-grok': minor
+---
+
+Add a `grokVideo` adapter for the grok-imagine video models (`grok-imagine-video`, `grok-imagine-video-1.5-preview`) via xAI's Imagine API. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), durations are 1–15 integer seconds, and image-to-video starting frames are supplied as an `image` prompt part (public URL or base64 data source), consistent with the multimodal prompt convention used by the other image/video adapters.

docs/adapters/grok.md

Lines changed: 69 additions & 2 deletions
@@ -2,18 +2,20 @@
 title: Grok (xAI)
 id: grok-adapter
 order: 5
-description: "Use xAI Grok models with TanStack AI — Grok 4.1, Grok 4, Grok 3, and Grok 2 Image generation via @tanstack/ai-grok."
+description: "Use xAI Grok models with TanStack AI — Grok 4.1, Grok 4, Grok 3, Grok 2 Image generation, and Grok Imagine video generation via @tanstack/ai-grok."
 keywords:
   - tanstack ai
   - grok
   - xai
   - grok 4
   - grok 4.1
   - image generation
+  - video generation
+  - grok imagine
   - adapter
 ---

-The Grok adapter provides access to xAI's Grok models, including Grok 4.1, Grok 4, Grok 3, and image generation with Grok 2 Image.
+The Grok adapter provides access to xAI's Grok models, including Grok 4.1, Grok 4, Grok 3, image generation with Grok 2 Image, and video generation with the Grok Imagine video models.

 ## Installation

@@ -205,6 +207,67 @@ reachable; use a `data` source for private images. `grok-2-image-1212` is
 text-to-image only — image prompt parts are a compile-time type error and
 throw at runtime.

+## Video Generation (Experimental)
+
+Generate short video clips (1–15 seconds, with audio) with the Grok Imagine video models via xAI's asynchronous jobs/polling API:
+
+```typescript
+import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
+import { grokVideo } from "@tanstack/ai-grok";
+
+const adapter = grokVideo("grok-imagine-video");
+
+// 1. Create the job
+const { jobId } = await generateVideo({
+  adapter,
+  prompt: "A red panda balancing on a bamboo stalk in the rain",
+  size: "16:9_720p", // "aspectRatio" or "aspectRatio_resolution"
+  duration: 5, // integer seconds, 1–15
+});
+
+// 2. Poll until complete, then read the video URL
+let status = await getVideoJobStatus({ adapter, jobId });
+while (status.status !== "completed" && status.status !== "failed") {
+  await new Promise((r) => setTimeout(r, 5000));
+  status = await getVideoJobStatus({ adapter, jobId });
+}
+
+console.log(status.url); // hosted .mp4 URL
+```
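
The polling loop in the example above runs indefinitely if a job never reaches a terminal state. Below is a minimal sketch of a bounded variant. It assumes only that the status object has a `status` field whose terminal values are `"completed"` and `"failed"`, as in that example. `pollUntilDone` and its options are hypothetical names, not part of `@tanstack/ai`.

```typescript
// Hypothetical helper: polls `check` until the job is terminal or the deadline passes.
async function pollUntilDone<T extends { status: string }>(
  check: () => Promise<T>,
  opts: { intervalMs: number; timeoutMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const current = await check();
    if (current.status === "completed" || current.status === "failed") {
      return current;
    }
    if (Date.now() >= deadline) {
      throw new Error(
        `Video job still "${current.status}" after ${opts.timeoutMs} ms`,
      );
    }
    // Wait before the next poll so the API is not hammered.
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

// Possible usage with the adapter from the example above:
// const status = await pollUntilDone(
//   () => getVideoJobStatus({ adapter, jobId }),
//   { intervalMs: 5000, timeoutMs: 10 * 60 * 1000 },
// );
```

Taking the status check as a callback keeps the helper independent of any particular adapter and makes it easy to exercise with a stub.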
+
+Available models:
+
+- `grok-imagine-video` — text-to-video and image-to-video, $0.05 per second of video
+- `grok-imagine-video-1.5-preview` — preview of the next model generation, $0.08 per second
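
Because pricing is per second of video, a rough cost estimate can be computed before a job is submitted. The sketch below uses the two rates listed above. `estimateCostUsd` is a hypothetical helper, and it assumes the billed seconds equal the requested duration, which the source does not guarantee. The authoritative figure is the `usage.cost` value reported on the completed job.

```typescript
// Rates from the model list above, kept in whole cents to avoid float drift.
const CENTS_PER_SECOND = {
  "grok-imagine-video": 5,
  "grok-imagine-video-1.5-preview": 8,
} as const;

type GrokVideoModel = keyof typeof CENTS_PER_SECOND;

// Hypothetical helper: estimated USD cost for a clip of the given length.
function estimateCostUsd(model: GrokVideoModel, durationSeconds: number): number {
  if (
    !Number.isInteger(durationSeconds) ||
    durationSeconds < 1 ||
    durationSeconds > 15
  ) {
    throw new RangeError("duration must be an integer between 1 and 15 seconds");
  }
  return (CENTS_PER_SECOND[model] * durationSeconds) / 100;
}

console.log(estimateCostUsd("grok-imagine-video", 5)); // 0.25
console.log(estimateCostUsd("grok-imagine-video-1.5-preview", 10)); // 0.8
```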
+
+Like the Grok Imagine image models, sizing is aspect-ratio based: the `size` option takes an `aspectRatio_resolution` template. Supported aspect ratios are `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, and `2:3`; supported resolutions are `480p`, `720p`, and `1080p` (e.g. `"9:16_1080p"`). The resolution suffix is optional.
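
To make the template concrete, here is a minimal sketch of how a `size` string could be split into the `aspect_ratio` and `resolution` fields. `parseSizeTemplate` is a hypothetical helper written for illustration and is not taken from the adapter source. The allowed values are the ones listed in the paragraph above.

```typescript
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"] as const;
const RESOLUTIONS = ["480p", "720p", "1080p"] as const;

type AspectRatio = (typeof ASPECT_RATIOS)[number];
type Resolution = (typeof RESOLUTIONS)[number];

interface ParsedSize {
  aspect_ratio: AspectRatio;
  resolution?: Resolution; // absent when the template has no suffix
}

// Hypothetical helper: "16:9_720p" -> { aspect_ratio: "16:9", resolution: "720p" }
function parseSizeTemplate(size: string): ParsedSize {
  const parts = size.split("_");
  if (parts.length > 2) {
    throw new Error(`Malformed size template: ${size}`);
  }
  const aspectRatio = ASPECT_RATIOS.find((r) => r === parts[0]);
  if (aspectRatio === undefined) {
    throw new Error(`Unsupported aspect ratio in size template: ${size}`);
  }
  if (parts.length === 1) {
    return { aspect_ratio: aspectRatio };
  }
  const resolution = RESOLUTIONS.find((r) => r === parts[1]);
  if (resolution === undefined) {
    throw new Error(`Unsupported resolution in size template: ${size}`);
  }
  return { aspect_ratio: aspectRatio, resolution };
}
```

Validating against the fixed lists up front turns a typo such as `"16:9_720"` into an immediate local error rather than a failed remote job.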
+
+For image-to-video, include an `image` prompt part as the starting frame and
+describe the desired motion in the text part. URL sources are fetched by xAI's
+servers (so they must be publicly reachable); use a `data` source for a base64
+starting frame:
+
+```typescript
+const { jobId } = await generateVideo({
+  adapter: grokVideo("grok-imagine-video"),
+  prompt: [
+    {
+      type: "text",
+      content: "Make the waterfall crash down and slowly pan out the camera",
+    },
+    {
+      type: "image",
+      source: { type: "url", value: "https://example.com/waterfall-still.png" },
+    },
+  ],
+  duration: 10,
+});
+```
+
+When the job completes, the adapter reports usage on the result: `usage.unitsBilled` carries the billed seconds of video and `usage.cost` the exact cost in USD, both as returned by the xAI API.
+
+See [Video Generation](../media/video-generation) for the full jobs/polling flow, streaming mode, and the `useGenerateVideo` hook.
+
 ## Text-to-Speech

 Generate speech with Grok TTS:
@@ -308,6 +371,10 @@ Creates a Grok summarization adapter with an explicit API key.

 Creates a Grok image generation adapter.

+### `grokVideo(model, config?)` / `createGrokVideo(model, apiKey, config?)`
+
+Creates a Grok video generation adapter (experimental) for the Grok Imagine video models (`'grok-imagine-video'`, `'grok-imagine-video-1.5-preview'`).
+
 ### `grokSpeech(model, config?)` / `createGrokSpeech(model, apiKey, config?)`

 Creates a Grok text-to-speech adapter.

docs/config.json

Lines changed: 3 additions & 2 deletions
@@ -262,7 +262,7 @@
   "label": "Video Generation",
   "to": "media/video-generation",
   "addedAt": "2026-04-15",
-  "updatedAt": "2026-06-08"
+  "updatedAt": "2026-06-23"
 },
 {
   "label": "Generation Hooks",
@@ -434,7 +434,8 @@
 {
   "label": "Grok (xAI)",
   "to": "adapters/grok",
-  "addedAt": "2026-04-15"
+  "addedAt": "2026-04-15",
+  "updatedAt": "2026-06-23"
 },
 {
   "label": "Groq",

docs/media/video-generation.md

Lines changed: 45 additions & 1 deletion
@@ -2,13 +2,15 @@
 title: Video Generation
 id: video-generation
 order: 6
-description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
+description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
 keywords:
   - tanstack ai
   - video generation
   - sora
   - veo
   - gemini
+  - grok imagine
+  - fal
   - generateVideo
   - jobs api
   - experimental
@@ -39,6 +41,8 @@ TanStack AI provides experimental support for video generation through dedicated
 Currently supported:
 - **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
 - **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
+- **Grok (xAI)**: grok-imagine-video and grok-imagine-video-1.5-preview models
+- **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models

 ## Basic Usage

@@ -552,6 +556,46 @@ Adapters that haven't declared a per-model duration map keep the plain
 > Files API and requires your API key to download (send it as an
 > `x-goog-api-key` header or `key` query parameter).

+### Grok (xAI Imagine) Model Options
+
+Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). The Grok Imagine models are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long:
+
+```typescript
+import { generateVideo } from '@tanstack/ai'
+import { grokVideo } from '@tanstack/ai-grok'
+
+const { jobId } = await generateVideo({
+  adapter: grokVideo('grok-imagine-video'),
+  prompt: 'A beautiful sunset over the ocean',
+  size: '16:9_720p', // aspect ratio: '1:1' | '16:9' | '9:16' | '4:3' | '3:4' | '3:2' | '2:3'
+  // resolution (optional suffix): '480p' | '720p' | '1080p'
+  duration: 5, // integer seconds, 1-15
+  modelOptions: {
+    aspect_ratio: '16:9', // Alternative way to specify the aspect ratio
+    resolution: '720p', // Alternative way to specify the resolution
+    duration: 5, // Alternative way to specify the duration
+  },
+})
+```
+
+For image-to-video, include an `image` prompt part as the starting frame and describe the desired motion in the text part. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:
+
+```typescript
+const { jobId } = await generateVideo({
+  adapter: grokVideo('grok-imagine-video'),
+  prompt: [
+    { type: 'text', content: 'Slowly pan out as the waves roll in' },
+    {
+      type: 'image',
+      source: { type: 'url', value: 'https://example.com/still.png' },
+    },
+  ],
+  duration: 5,
+})
+```
+
+Generated clips include an audio track. When the job completes, the adapter reports `usage.unitsBilled` (billed seconds of video) and `usage.cost` (exact USD cost as returned by the API) on the result.
+
 ## Response Types

 > **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`.
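
As an illustration of consuming the merged object described in the note above, the sketch below turns it into a one-line summary. The field names come from that note and from the usage description (`unitsBilled`, `cost`). The field types, the `MergedVideoStatus` interface name, and the `summarizeStatus` helper are assumptions for illustration and are not taken from `@tanstack/ai`.

```typescript
// Assumed shape of the merged status object; the real types may be stricter.
interface MergedVideoStatus {
  status: string;
  progress?: number;
  url?: string;
  error?: unknown;
  usage?: { unitsBilled?: number; cost?: number };
}

// Hypothetical helper: a human-readable summary of a job status.
function summarizeStatus(job: MergedVideoStatus): string {
  if (job.status === "completed") {
    const billed = job.usage?.unitsBilled;
    const cost = job.usage?.cost;
    const usage =
      billed !== undefined && cost !== undefined
        ? ` (${billed}s billed, $${cost})`
        : "";
    return `done: ${job.url ?? "no URL reported"}${usage}`;
  }
  if (job.status === "failed") {
    return `failed: ${String(job.error ?? "unknown error")}`;
  }
  // Any other status is treated as still in progress.
  return job.progress !== undefined
    ? `${job.status} (progress ${job.progress})`
    : job.status;
}
```

Every optional field is checked before use, since the note only guarantees that `status` is present.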
