fix(ai-grok): grok-imagine-video-1.5 is image-to-video only
Confirmed against the live xAI API: grok-imagine-video-1.5 rejects
text-to-video ("Text-to-video is not supported for this model") and only
generates from a starting frame. createVideoJob now requires exactly one
image prompt part and throws a clear error otherwise; model-meta, provider
options, docs, changeset, and the media skill describe it as image-to-video
only. The ts-react-media example drops the (non-working) text-to-video entry
and keeps the image-to-video one.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
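The guard described in this commit message (exactly one `image` prompt part, otherwise a clear error) can be sketched as follows. This is not the adapter's actual code. The `ImagePart` shape and the `requireSingleStartingFrame` name are assumptions made for illustration. Only the `{ type: 'text', content }` text-part form and the plain-string prompt come from the docs in this commit.

```typescript
// Text part mirrors the `{ type: 'text', content }` form used in the docs.
type TextPart = { type: 'text'; content: string }
// Hypothetical image-part shape: URL or base64 data source (field names assumed).
type ImagePart = {
  type: 'image'
  source: { type: 'url'; value: string } | { type: 'data'; value: string; mimeType: string }
}
type PromptPart = TextPart | ImagePart

// Hypothetical helper: reject anything that is not exactly one starting frame.
function requireSingleStartingFrame(
  prompt: string | PromptPart[],
): { image: ImagePart; text: string } {
  // A plain string prompt is text-to-video, which grok-imagine-video-1.5 rejects.
  const parts: PromptPart[] =
    typeof prompt === 'string' ? [{ type: 'text', content: prompt }] : prompt
  const images = parts.filter((p): p is ImagePart => p.type === 'image')
  if (images.length !== 1) {
    throw new Error(
      `grok-imagine-video-1.5 is image-to-video only: expected exactly one image prompt part, got ${images.length}`,
    )
  }
  // The text parts describe the desired motion.
  const text = parts
    .filter((p): p is TextPart => p.type === 'text')
    .map((p) => p.content)
    .join('\n')
  return { image: images[0], text }
}
```

Failing before any HTTP request is sent gives callers a clearer message than xAI's "Text-to-video is not supported for this model" response.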
.changeset/grok-imagine-video-adapter.md
Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
'@tanstack/ai-grok': minor
---

-Add a `grokVideo` adapter for the `grok-imagine-video-1.5` model via xAI's Imagine API. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), durations are 1–15 integer seconds, and image-to-video starting frames are supplied as an `image` prompt part (public URL or base64 data source), consistent with the multimodal prompt convention used by the other image/video adapters.
+Add a `grokVideo` adapter for the `grok-imagine-video-1.5` model via xAI's Imagine API. The model is image-to-video only: every request needs exactly one `image` prompt part as the starting frame (public URL or base64 data source), with the text part describing the motion. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), and durations are 1–15 integer seconds.
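Below is a minimal sketch of the create-then-poll flow described in this changeset. It assumes hypothetical `createJob` and `getStatus` callbacks that wrap `POST /v1/videos/generations` and `GET /v1/videos/{request_id}`. The `JobStatus` shape (`state`, `url`, `unitsBilled`, `cost`) is invented for illustration. It is neither xAI's response format nor the `@tanstack/ai` API.

```typescript
// Hypothetical status shape for illustration; xAI's real response fields may differ.
type JobStatus =
  | { state: 'pending' }
  | { state: 'completed'; url: string; unitsBilled: number; cost: number }
  | { state: 'failed'; error: string }

type CompletedJob = Extract<JobStatus, { state: 'completed' }>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// `createJob` stands in for POST /v1/videos/generations (resolving to a request id)
// and `getStatus` for GET /v1/videos/{request_id}; both are hypothetical callbacks.
async function runVideoJob(
  createJob: () => Promise<string>,
  getStatus: (requestId: string) => Promise<JobStatus>,
  { intervalMs = 2000, maxAttempts = 150 } = {},
): Promise<CompletedJob> {
  const requestId = await createJob()
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus(requestId)
    if (status.state === 'completed') return status
    if (status.state === 'failed') {
      throw new Error(`Video job ${requestId} failed: ${status.error}`)
    }
    // Still pending: wait before the next status read.
    await sleep(intervalMs)
  }
  throw new Error(`Video job ${requestId} still pending after ${maxAttempts} polls`)
}
```

Passing the two HTTP calls in as callbacks lets you test the loop without network access. Authentication and retry policy are left out of this sketch.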
-- `grok-imagine-video-1.5` — text-to-video and image-to-video, $0.08 per second of video. Per xAI's docs a starting image is optional for text-to-video and required for image-to-video.
+- `grok-imagine-video-1.5` — image-to-video, $0.08 per second of video.
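At the listed $0.08 per second, a clip's cost is simply duration × rate. The sketch below shows a rough pre-flight estimate; the function name is hypothetical. It works in integer cents because `0.08 * 3` evaluates to `0.24000000000000002` in floating point. The result is only an estimate, since the docs say the completed result reports the exact cost as `usage.cost`.

```typescript
// List price from the docs: $0.08 per second, expressed as 8 cents to keep integer math.
const CENTS_PER_SECOND = 8

// Hypothetical helper: estimate the USD cost of a clip before submitting it.
function estimateGrokVideoCostUsd(durationSeconds: number): number {
  // Durations are documented as 1-15 integer seconds.
  if (!Number.isInteger(durationSeconds) || durationSeconds < 1 || durationSeconds > 15) {
    throw new RangeError('duration must be an integer between 1 and 15 seconds')
  }
  // Multiply in cents, then convert once, so 3 s gives 0.24 rather than 0.24000000000000002.
  return (durationSeconds * CENTS_PER_SECOND) / 100
}
```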
Like the Grok Imagine image models, sizing is aspect-ratio based: the `size` option takes an `aspectRatio_resolution` template. Supported aspect ratios are `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, and `2:3`; supported resolutions are `480p`, `720p`, and `1080p` (e.g. `"9:16_1080p"`). The resolution suffix is optional.
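The sketch below shows one way to split the `aspectRatio_resolution` template into the `aspect_ratio` and `resolution` request fields named in the changeset. `parseSizeTemplate` is a hypothetical helper, not the adapter's implementation. The allowed values are the ones listed in the paragraph above.

```typescript
// Supported values as listed in the docs.
const ASPECT_RATIOS = ['1:1', '16:9', '9:16', '4:3', '3:4', '3:2', '2:3'] as const
const RESOLUTIONS = ['480p', '720p', '1080p'] as const
type AspectRatio = (typeof ASPECT_RATIOS)[number]
type Resolution = (typeof RESOLUTIONS)[number]

// Hypothetical helper: '9:16_1080p' -> { aspect_ratio: '9:16', resolution: '1080p' }.
function parseSizeTemplate(size: string): { aspect_ratio: AspectRatio; resolution?: Resolution } {
  const [ratio, resolution, ...rest] = size.split('_')
  if (rest.length > 0 || !(ASPECT_RATIOS as readonly string[]).includes(ratio)) {
    throw new Error(`Unsupported size "${size}"`)
  }
  // The resolution suffix is optional, e.g. '16:9'.
  if (resolution === undefined) return { aspect_ratio: ratio as AspectRatio }
  if (!(RESOLUTIONS as readonly string[]).includes(resolution)) {
    throw new Error(`Unsupported resolution "${resolution}" in size "${size}"`)
  }
  return { aspect_ratio: ratio as AspectRatio, resolution: resolution as Resolution }
}
```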
-For image-to-video, include an `image` prompt part as the starting frame and
-describe the desired motion in the text part. URL sources are fetched by xAI's
-servers (so they must be publicly reachable); use a `data` source for a base64
-starting frame:
-
-```typescript
-const { jobId } = await generateVideo({
-  adapter: grokVideo("grok-imagine-video-1.5"),
-  prompt: [
-    {
-      type: "text",
-      content: "Make the waterfall crash down and slowly pan out the camera",
When the job completes, the adapter reports usage on the result: `usage.unitsBilled` carries the billed seconds of video and `usage.cost` the exact cost in USD, both as returned by the xAI API.
See [Video Generation](../media/video-generation) for the full jobs/polling flow, streaming mode, and the `useGenerateVideo` hook.
docs/media/video-generation.md
Lines changed: 8 additions & 18 deletions
@@ -558,15 +558,21 @@ Adapters that haven't declared a per-model duration map keep the plain
### Grok (xAI Imagine) Model Options

-Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). The Grok Imagine models are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long:
+Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). `grok-imagine-video-1.5` is **image-to-video only**: every request must include an `image` prompt part as the starting frame, with the text part describing the desired motion. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame. The model is aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long:
```typescript
import { generateVideo } from '@tanstack/ai'
import { grokVideo } from '@tanstack/ai-grok'

const { jobId } = await generateVideo({
  adapter: grokVideo('grok-imagine-video-1.5'),
-  prompt: 'A beautiful sunset over the ocean',
+  prompt: [
+    { type: 'text', content: 'Slowly pan out as the waves roll in' },
For image-to-video, include an `image` prompt part as the starting frame and describe the desired motion in the text part. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame:
-
-```typescript
-const { jobId } = await generateVideo({
-  adapter: grokVideo('grok-imagine-video-1.5'),
-  prompt: [
-    { type: 'text', content: 'Slowly pan out as the waves roll in' },
Generated clips include an audio track. When the job completes, the adapter reports `usage.unitsBilled` (billed seconds of video) and `usage.cost` (exact USD cost as returned by the API) on the result.