5 changes: 5 additions & 0 deletions .changeset/grok-imagine-video-adapter.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-grok': minor
---

Add a `grokVideo` adapter for xAI's Imagine video models. `grok-imagine-video` (v1.0) supports text-to-video and image-to-video; `grok-imagine-video-1.5` is image-to-video only: a text-only prompt is rejected by the API, so the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`. Image-to-video starting frames are supplied as an `image` prompt part (public URL or base64 data source), with the text part describing the motion. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), and durations are 1–15 integer seconds.
72 changes: 70 additions & 2 deletions docs/adapters/grok.md
@@ -2,17 +2,20 @@
title: Grok (xAI)
id: grok-adapter
order: 5
description: "Use xAI Grok Responses models with TanStack AI: Grok 4.3 and Grok Build 0.1 via @tanstack/ai-grok."
description: "Use xAI Grok models with TanStack AI: Grok 4.3, Grok Build 0.1, Grok Imagine image generation, and Grok Imagine video generation via @tanstack/ai-grok."
keywords:
- tanstack ai
- grok
- xai
- grok 4.3
- grok build
- image generation
- video generation
- grok imagine
- adapter
---

The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`.
The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`, plus Grok Imagine image generation and Grok Imagine video generation.

## Installation

@@ -203,6 +206,67 @@ reachable; use a `data` source for private images. `grok-2-image-1212` is
text-to-image only; image prompt parts are a compile-time type error and
throw at runtime.

## Video Generation (Experimental)

Generate short video clips (1–15 seconds, with audio) with the Grok Imagine video models via xAI's asynchronous jobs/polling API.

Available models:

- `grok-imagine-video` (v1.0): text-to-video and image-to-video, $0.05 per second of video.
- `grok-imagine-video-1.5`: **image-to-video only**, $0.08 per second of video. A text-only prompt is rejected by the API; the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`.
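
The fail-fast rule for `grok-imagine-video-1.5` can be pictured as a small guard that runs before any request is sent. This is only a sketch: `PromptPart`, `hasStartingFrame`, and `assertPromptSupported` are hypothetical names, the part shapes are simplified from the examples in this guide, and the adapter's actual check and error message may differ.

```typescript
// Hypothetical, simplified prompt shapes; not the adapter's actual types.
type PromptPart =
  | { type: "text"; content: string }
  | { type: "image"; source: { type: "url" | "data"; value: string } };

type Prompt = string | PromptPart[];

type GrokVideoModel = "grok-imagine-video" | "grok-imagine-video-1.5";

// True when the prompt carries at least one image part (the starting frame).
function hasStartingFrame(prompt: Prompt): boolean {
  return typeof prompt !== "string" && prompt.some((part) => part.type === "image");
}

// Throws before any network call if the model cannot handle the prompt.
function assertPromptSupported(model: GrokVideoModel, prompt: Prompt): void {
  if (model === "grok-imagine-video-1.5" && !hasStartingFrame(prompt)) {
    throw new Error(
      "grok-imagine-video-1.5 is image-to-video only: add a starting-frame " +
        "image part or use grok-imagine-video.",
    );
  }
}
```

Checking locally avoids creating a job that the API would reject anyway.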

Text-to-video with the base `grok-imagine-video` model:

```typescript
import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
import { grokVideo } from "@tanstack/ai-grok";

const adapter = grokVideo("grok-imagine-video");

// 1. Create the job
const { jobId } = await generateVideo({
  adapter,
  prompt: "A red panda balancing on a bamboo stalk in the rain",
  size: "16:9_720p", // "aspectRatio" or "aspectRatio_resolution"
  duration: 5, // integer seconds, 1–15
});

// 2. Poll until complete, then read the video URL
let status = await getVideoJobStatus({ adapter, jobId });
while (status.status !== "completed" && status.status !== "failed") {
  await new Promise((r) => setTimeout(r, 5000));
  status = await getVideoJobStatus({ adapter, jobId });
}

console.log(status.url); // hosted .mp4 URL
```
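
The polling loop above runs until the job reaches a terminal state and then reads `status.url` even when the job failed. A slightly more defensive variant, sketched below, bounds the number of checks and surfaces failures as errors. `pollUntilDone`, `JobStatus`, and `PollOptions` are hypothetical names introduced for illustration, and the `error` field is assumed to be a string.

```typescript
// Minimal status shape, mirroring the fields used in the example above.
interface JobStatus {
  status: string;
  url?: string;
  error?: string; // assumed to be a plain message
}

interface PollOptions {
  intervalMs?: number; // delay between status checks
  maxAttempts?: number; // give up after this many checks
  sleep?: (ms: number) => Promise<void>; // injectable, e.g. for tests
}

// Polls `check` until the job completes, fails, or the attempt budget runs out.
async function pollUntilDone(
  check: () => Promise<JobStatus>,
  options: PollOptions = {},
): Promise<string> {
  const intervalMs = options.intervalMs ?? 5000;
  const maxAttempts = options.maxAttempts ?? 120;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await check();
    if (status.status === "completed") {
      if (!status.url) throw new Error("Job completed without a video URL");
      return status.url;
    }
    if (status.status === "failed") {
      throw new Error(status.error ?? "Video job failed");
    }
    // Wait before the next check, but not after the final one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Video job still pending after ${maxAttempts} checks`);
}
```

With the adapter from the example, this would be called as `await pollUntilDone(() => getVideoJobStatus({ adapter, jobId }))`, assuming the returned status object is compatible with `JobStatus`.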

For image-to-video (required for `grok-imagine-video-1.5`, optional for `grok-imagine-video`), include an `image` prompt part as the starting frame and describe the desired motion in the text part. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:

```typescript
const { jobId } = await generateVideo({
  adapter: grokVideo("grok-imagine-video-1.5"),
  prompt: [
    {
      type: "text",
      content: "Make the waterfall crash down and slowly pan out the camera",
    },
    {
      type: "image",
      source: { type: "url", value: "https://example.com/waterfall-still.png" },
    },
  ],
  size: "16:9_720p",
  duration: 10,
});
```

Like the Grok Imagine image models, sizing is aspect-ratio based: the `size` option takes an `aspectRatio_resolution` template. Supported aspect ratios are `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, and `2:3`; supported resolutions are `480p`, `720p`, and `1080p` (e.g. `"9:16_1080p"`). The resolution suffix is optional.
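
To make the template concrete, the sketch below splits a `size` string into the `aspect_ratio` and `resolution` fields and validates both against the values listed above. `parseVideoSize` and `isOneOf` are hypothetical helpers written for this illustration; the adapter's own parsing may be implemented differently.

```typescript
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"] as const;
const RESOLUTIONS = ["480p", "720p", "1080p"] as const;

type AspectRatio = (typeof ASPECT_RATIOS)[number];
type Resolution = (typeof RESOLUTIONS)[number];

interface ParsedSize {
  aspect_ratio: AspectRatio;
  resolution?: Resolution; // omitted when the size has no suffix
}

// Type guard: narrows `value` to one of the allowed literals.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return allowed.some((item) => item === value);
}

// Splits "aspectRatio" or "aspectRatio_resolution" and validates each part.
function parseVideoSize(size: string): ParsedSize {
  const parts = size.split("_");
  const ratio = parts[0] ?? "";
  const resolution = parts[1];

  if (parts.length > 2) {
    throw new Error(`Unsupported size "${size}"`);
  }
  if (!isOneOf(ASPECT_RATIOS, ratio)) {
    throw new Error(`Unsupported aspect ratio in "${size}"`);
  }
  if (resolution === undefined) {
    return { aspect_ratio: ratio };
  }
  if (!isOneOf(RESOLUTIONS, resolution)) {
    throw new Error(`Unsupported resolution in "${size}"`);
  }
  return { aspect_ratio: ratio, resolution };
}
```

For example, `parseVideoSize("9:16_1080p")` yields `{ aspect_ratio: "9:16", resolution: "1080p" }`, while `parseVideoSize("9:16")` yields only the aspect ratio.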

When the job completes, the adapter reports usage on the result: `usage.unitsBilled` carries the billed seconds of video and `usage.cost` the exact cost in USD, both as returned by the xAI API.
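
Before submitting a job, a rough estimate can be derived from the per-second prices listed earlier. The sketch below is an assumption-laden illustration: `estimateCostUsd` and the price table are hypothetical, the prices are a snapshot taken from this guide, and `usage.cost` on the completed result remains the authoritative figure.

```typescript
// Per-second prices quoted in this guide (USD); treat them as a snapshot.
const PRICE_PER_SECOND_USD = {
  "grok-imagine-video": 0.05,
  "grok-imagine-video-1.5": 0.08,
} as const;

type PricedModel = keyof typeof PRICE_PER_SECOND_USD;

// Estimates the cost of a clip; not a substitute for `usage.cost`.
function estimateCostUsd(model: PricedModel, seconds: number): number {
  // Work in integer thousandths of a dollar to limit floating-point drift.
  const milliUsdPerSecond = Math.round(PRICE_PER_SECOND_USD[model] * 1000);
  return (milliUsdPerSecond * seconds) / 1000;
}
```

Under these prices, a 5-second `grok-imagine-video` clip comes to about $0.25 and a 10-second `grok-imagine-video-1.5` clip to about $0.80.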

See [Video Generation](../media/video-generation) for the full jobs/polling flow, streaming mode, and the `useGenerateVideo` hook.

## Text-to-Speech

Generate speech with Grok TTS:
@@ -298,6 +362,10 @@ Creates a Grok summarization adapter with an explicit API key.

Creates a Grok image generation adapter.

### `grokVideo(model, config?)` / `createGrokVideo(model, apiKey, config?)`

Creates a Grok video generation adapter (experimental) for the Grok Imagine video models (`'grok-imagine-video'`, `'grok-imagine-video-1.5'`).

### `grokSpeech(model, config?)` / `createGrokSpeech(model, apiKey, config?)`

Creates a Grok text-to-speech adapter.
5 changes: 3 additions & 2 deletions docs/config.json
@@ -262,7 +262,7 @@
"label": "Video Generation",
"to": "media/video-generation",
"addedAt": "2026-04-15",
"updatedAt": "2026-06-08"
"updatedAt": "2026-06-24"
},
{
"label": "Generation Hooks",
@@ -434,7 +434,8 @@
{
"label": "Grok (xAI)",
"to": "adapters/grok",
"addedAt": "2026-04-15"
"addedAt": "2026-04-15",
"updatedAt": "2026-06-24"
},
{
"label": "Groq",
59 changes: 58 additions & 1 deletion docs/media/video-generation.md
@@ -2,13 +2,15 @@
title: Video Generation
id: video-generation
order: 6
description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- veo
- gemini
- grok imagine
- fal
- generateVideo
- jobs api
- experimental
@@ -39,6 +41,8 @@ TanStack AI provides experimental support for video generation through dedicated
Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
- **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
- **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models

## Basic Usage

@@ -552,6 +556,59 @@ Adapters that haven't declared a per-model duration map keep the plain
> Files API and requires your API key to download (send it as an
> `x-goog-api-key` header or `key` query parameter).

### Grok (xAI Imagine) Model Options

Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized: the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long.

Text-to-video with the base model:

```typescript
import { generateVideo } from '@tanstack/ai'
import { grokVideo } from '@tanstack/ai-grok'

const { jobId } = await generateVideo({
  adapter: grokVideo('grok-imagine-video'),
  prompt: 'A beautiful sunset over the ocean',
  size: '16:9_720p', // aspect ratio: '1:1' | '16:9' | '9:16' | '4:3' | '3:4' | '3:2' | '2:3'
  // resolution (optional suffix): '480p' | '720p' | '1080p'
  duration: 5, // integer seconds, 1-15
  modelOptions: {
    aspect_ratio: '16:9', // Alternative way to specify the aspect ratio
    resolution: '720p', // Alternative way to specify the resolution
    duration: 5, // Alternative way to specify the duration
  },
})
```

Image-to-video (required for `grok-imagine-video-1.5`): include an `image` prompt part as the starting frame. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:

```typescript
const { jobId } = await generateVideo({
  adapter: grokVideo('grok-imagine-video-1.5'),
  prompt: [
    { type: 'text', content: 'Slowly pan out as the waves roll in' },
    {
      type: 'image',
      source: { type: 'url', value: 'https://example.com/still.png' },
    },
  ],
  size: '16:9_720p',
  duration: 5,
})
```

Both models accept any whole second in the **1–15** range. A raw `duration` is coerced into that range rather than rejected: values are clamped to `[1, 15]` and rounded to the nearest second. Inspect or pre-snap the range the same way as Veo:

```typescript
const adapter = grokVideo('grok-imagine-video')

adapter.availableDurations() // { kind: 'range', min: 1, max: 15, step: 1, unit: 'seconds' }
adapter.snapDuration(2.5) // 3 (clamped/rounded into range)
adapter.snapDuration(99) // 15
```
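
The clamp-and-round behaviour can be reproduced in a few lines. The sketch below is a standalone illustration, not the adapter's implementation: `snapToRange` and `DurationRange` are hypothetical names, and rounding ties upward (via `Math.round`) is an assumption chosen to match the `2.5` to `3` example above.

```typescript
// Hypothetical range shape, modeled on the availableDurations() output above.
interface DurationRange {
  min: number
  max: number
  step: number
}

// Clamp into [min, max], then round to the nearest step (ties round up).
function snapToRange(seconds: number, range: DurationRange): number {
  const clamped = Math.min(range.max, Math.max(range.min, seconds))
  const steps = Math.round((clamped - range.min) / range.step)
  return range.min + steps * range.step
}

const grokRange: DurationRange = { min: 1, max: 15, step: 1 }

console.log(snapToRange(2.5, grokRange)) // 3
console.log(snapToRange(99, grokRange)) // 15
console.log(snapToRange(0, grokRange)) // 1
```

Snapping on the client before calling `generateVideo()` keeps the UI value and the billed duration in agreement.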

Generated clips include an audio track. When the job completes, the adapter reports `usage.unitsBilled` (billed seconds of video) and `usage.cost` (exact USD cost as returned by the API) on the result.
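
One way to surface these fields in a UI is a small formatter that prefers the exact cost and falls back to the unit count. This is a sketch under assumptions: `VideoUsage` is a simplified stand-in for the result's `usage` object, and `formatBilling` is a hypothetical helper.

```typescript
// Simplified stand-in for the `usage` object on a completed result.
interface VideoUsage {
  unitsBilled?: number
  cost?: number
}

// Returns a human-readable billing line, or null when nothing was reported.
function formatBilling(usage: VideoUsage | undefined): string | null {
  if (!usage) return null
  const { cost, unitsBilled } = usage
  if (cost != null) {
    const duration =
      unitsBilled != null
        ? ` for ${unitsBilled} second${unitsBilled === 1 ? '' : 's'} of video`
        : ''
    return `Billed $${cost.toFixed(3)}${duration}`
  }
  if (unitsBilled != null) {
    return `Billed ${unitsBilled} unit${unitsBilled === 1 ? '' : 's'}`
  }
  return null
}
```

For a 5-second clip costing $0.25, this produces `Billed $0.250 for 5 seconds of video`.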

## Response Types

> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` β€” it does not return `jobId` or `expiresAt`.
4 changes: 4 additions & 0 deletions examples/ts-react-media/.env.example
@@ -5,3 +5,7 @@ FAL_KEY=

# Get a Google API key at https://aistudio.google.com/apikey
GOOGLE_API_KEY=

# Get an xAI API key at https://console.x.ai - used by the "xAI Direct"
# Grok Imagine video models (the other Grok Imagine entries go through fal).
XAI_API_KEY=
1 change: 1 addition & 0 deletions examples/ts-react-media/package.json
@@ -14,6 +14,7 @@
"@tanstack/ai": "workspace:*",
"@tanstack/ai-fal": "workspace:*",
"@tanstack/ai-gemini": "workspace:*",
"@tanstack/ai-grok": "workspace:*",
"@tanstack/react-router": "^1.158.4",
"@tanstack/react-start": "^1.159.0",
"@tanstack/router-plugin": "^1.158.4",
8 changes: 8 additions & 0 deletions examples/ts-react-media/src/components/ImageGenerator.tsx
@@ -27,6 +27,7 @@ function getImageSrc(image: { url?: string; b64Json?: string }): string {

const falModels = IMAGE_MODELS.filter((m) => m.provider === 'fal')
const geminiModels = IMAGE_MODELS.filter((m) => m.provider === 'gemini')
const xaiModels = IMAGE_MODELS.filter((m) => m.provider === 'xai')


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Update the reference-image support copy to include xAI direct models.

The new xAI direct model group makes the helper text inaccurate (it currently implies only Gemini supports reference images), which can confuse users.

Suggested fix
-              Supported by Gemini multimodal models only
-              (gemini-3.1-flash-image-preview, gemini-3-pro-image-preview)
+              Supported by Gemini and xAI direct multimodal models
+              (gemini-3.1-flash-image-preview, gemini-3-pro-image-preview,
+              grok-imagine-image, grok-imagine-image-quality)

Also applies to: 165-171

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/ts-react-media/src/components/ImageGenerator.tsx` at line 30, the
helper text about reference-image support (around lines 165-171 of the
ImageGenerator component) mentions only Gemini, which is no longer accurate.
Update that text to state that both Gemini and xAI direct models accept
reference images, so users can see every model that supports the feature.


export default function ImageGenerator({
onImageGenerated,
@@ -161,6 +162,13 @@ export default function ImageGenerator({
</option>
))}
</optgroup>
<optgroup label="xAI (direct)">
{xaiModels.map((model) => (
<option key={model.id} value={model.id}>
{model.name}
</option>
))}
</optgroup>
</select>
{currentModel && selectedModel !== 'all' && (
<p className="mt-1 text-xs text-gray-500">
46 changes: 35 additions & 11 deletions examples/ts-react-media/src/components/VideoGenerator.tsx
@@ -21,7 +21,7 @@ type JobState =
model: string
progress?: number | undefined
}
| { status: 'completed'; url: string; unitsBilled?: number }
| { status: 'completed'; url: string; unitsBilled?: number; cost?: number }
| { status: 'error'; message: string }

interface VideoGeneratorProps {
@@ -42,6 +42,8 @@ export default function VideoGenerator({
const pollingRefs = useRef<Map<string, NodeJS.Timeout>>(new Map())

const filteredModels = VIDEO_MODELS.filter((m) => m.mode === mode)
const falModels = filteredModels.filter((m) => m.provider === 'fal')
const xaiModels = filteredModels.filter((m) => m.provider === 'xai')

useEffect(() => {
if (initialImageUrl) {
@@ -97,6 +99,7 @@ export default function VideoGenerator({
status: 'completed',
url: url,
unitsBilled: urlResult.usage?.unitsBilled,
cost: urlResult.usage?.cost,
},
}))
} else if (status.status === 'processing') {
@@ -164,8 +167,11 @@ export default function VideoGenerator({
},
}))

// Poll keyed by the UI model id, not result.model: the direct-xAI
// entries share one adapter model ('grok-imagine-video-1.5'),
// so result.model wouldn't identify the card (or the adapter) uniquely.
const interval = setInterval(() => {
pollStatus(result.jobId, result.model)
pollStatus(result.jobId, modelId)
}, 4000)
pollingRefs.current.set(modelId, interval)
} catch (err) {
@@ -249,11 +255,20 @@ export default function VideoGenerator({
className="w-full px-4 py-3 bg-gray-800 border border-gray-700 rounded-lg text-white focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent disabled:opacity-50"
>
<option value="all">All Models</option>
{filteredModels.map((model) => (
<option key={model.id} value={model.id}>
{model.name}
</option>
))}
<optgroup label="fal.ai">
{falModels.map((model) => (
<option key={model.id} value={model.id}>
{model.name}
</option>
))}
</optgroup>
<optgroup label="xAI (direct)">
{xaiModels.map((model) => (
<option key={model.id} value={model.id}>
{model.name}
</option>
))}
</optgroup>
</select>
</div>

@@ -406,12 +421,21 @@ export default function VideoGenerator({
className="w-full h-auto"
/>
</div>
{state.unitsBilled != null && (
{state.cost != null ? (
<p className="text-xs text-gray-500">
Billed {state.unitsBilled} fal unit
{state.unitsBilled === 1 ? '' : 's'} - multiply by the
endpoint unit price for USD cost
Billed ${state.cost.toFixed(3)}
{state.unitsBilled != null
? ` for ${state.unitsBilled} second${state.unitsBilled === 1 ? '' : 's'} of video`
: ''}
</p>
) : (
state.unitsBilled != null && (
<p className="text-xs text-gray-500">
Billed {state.unitsBilled} fal unit
{state.unitsBilled === 1 ? '' : 's'} - multiply by the
endpoint unit price for USD cost
</p>
)
)}
</>
)}