diff --git a/.changeset/grok-imagine-video-adapter.md b/.changeset/grok-imagine-video-adapter.md new file mode 100644 index 000000000..717a299f4 --- /dev/null +++ b/.changeset/grok-imagine-video-adapter.md @@ -0,0 +1,5 @@ +--- +'@tanstack/ai-grok': minor +--- + +Add a `grokVideo` adapter for xAI's Imagine video models. `grok-imagine-video` (v1.0) supports text-to-video and image-to-video; `grok-imagine-video-1.5` is image-to-video only — a text-only prompt is rejected by the API, so the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`. Image-to-video starting frames are supplied as an `image` prompt part (public URL or base64 data source), with the text part describing the motion. Follows the experimental `generateVideo()` jobs/polling architecture: `createVideoJob` posts to `/v1/videos/generations`, status polling reads `/v1/videos/{request_id}`, and the completed result carries the hosted video URL plus usage (`unitsBilled` seconds and exact `cost` in USD). Sizing uses the aspect-ratio template consistent with the grok-imagine image models (`size: '16:9_720p'` → `aspect_ratio` / `resolution`), and durations are 1–15 integer seconds. diff --git a/docs/adapters/grok.md b/docs/adapters/grok.md index 5b4e043be..860bf200e 100644 --- a/docs/adapters/grok.md +++ b/docs/adapters/grok.md @@ -2,17 +2,20 @@ title: Grok (xAI) id: grok-adapter order: 5 -description: "Use xAI Grok Responses models with TanStack AI — Grok 4.3 and Grok Build 0.1 via @tanstack/ai-grok." +description: "Use xAI Grok models with TanStack AI — Grok 4.3, Grok Build 0.1, Grok Imagine image generation, and Grok Imagine video generation via @tanstack/ai-grok." keywords: - tanstack ai - grok - xai - grok 4.3 - grok build + - image generation + - video generation + - grok imagine - adapter --- -The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`. 
+The Grok text and summarization adapters provide access to xAI's Responses API for `grok-4.3` and `grok-build-0.1`, plus Grok Imagine image generation and Grok Imagine video generation. ## Installation @@ -203,6 +206,67 @@ reachable; use a `data` source for private images. `grok-2-image-1212` is text-to-image only — image prompt parts are a compile-time type error and throw at runtime. +## Video Generation (Experimental) + +Generate short video clips (1–15 seconds, with audio) with the Grok Imagine video models via xAI's asynchronous jobs/polling API. + +Available models: + +- `grok-imagine-video` (v1.0) — text-to-video and image-to-video, $0.05 per second of video. +- `grok-imagine-video-1.5` — **image-to-video only**, $0.08 per second of video. A text-only prompt is rejected by the API; the adapter fails fast with a clear error telling you to add a starting-frame image or use `grok-imagine-video`. + +Text-to-video with the base `grok-imagine-video` model: + +```typescript +import { generateVideo, getVideoJobStatus } from "@tanstack/ai"; +import { grokVideo } from "@tanstack/ai-grok"; + +const adapter = grokVideo("grok-imagine-video"); + +// 1. Create the job +const { jobId } = await generateVideo({ + adapter, + prompt: "A red panda balancing on a bamboo stalk in the rain", + size: "16:9_720p", // "aspectRatio" or "aspectRatio_resolution" + duration: 5, // integer seconds, 1–15 +}); + +// 2. Poll until complete, then read the video URL +let status = await getVideoJobStatus({ adapter, jobId }); +while (status.status !== "completed" && status.status !== "failed") { + await new Promise((r) => setTimeout(r, 5000)); + status = await getVideoJobStatus({ adapter, jobId }); +} + +console.log(status.url); // hosted .mp4 URL +``` + +For image-to-video (required for `grok-imagine-video-1.5`, optional for `grok-imagine-video`), include an `image` prompt part as the starting frame and describe the desired motion in the text part. 
URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame: + +```typescript +const { jobId } = await generateVideo({ + adapter: grokVideo("grok-imagine-video-1.5"), + prompt: [ + { + type: "text", + content: "Make the waterfall crash down and slowly pan out the camera", + }, + { + type: "image", + source: { type: "url", value: "https://example.com/waterfall-still.png" }, + }, + ], + size: "16:9_720p", + duration: 10, +}); +``` + +Like the Grok Imagine image models, sizing is aspect-ratio based: the `size` option takes an `aspectRatio_resolution` template. Supported aspect ratios are `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, and `2:3`; supported resolutions are `480p`, `720p`, and `1080p` (e.g. `"9:16_1080p"`). The resolution suffix is optional. + +When the job completes, the adapter reports usage on the result: `usage.unitsBilled` carries the billed seconds of video and `usage.cost` the exact cost in USD, both as returned by the xAI API. + +See [Video Generation](../media/video-generation) for the full jobs/polling flow, streaming mode, and the `useGenerateVideo` hook. + ## Text-to-Speech Generate speech with Grok TTS: @@ -298,6 +362,10 @@ Creates a Grok summarization adapter with an explicit API key. Creates a Grok image generation adapter. +### `grokVideo(model, config?)` / `createGrokVideo(model, apiKey, config?)` + +Creates a Grok video generation adapter (experimental) for the Grok Imagine video models (`'grok-imagine-video'`, `'grok-imagine-video-1.5'`). + ### `grokSpeech(model, config?)` / `createGrokSpeech(model, apiKey, config?)` Creates a Grok text-to-speech adapter. 
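Reviewer note: the size template and per-second pricing described in the `docs/adapters/grok.md` changes above can be sketched as two small pure functions. This is a minimal illustration, not the adapter's implementation: `parseSizeTemplate` and `estimateCostUsd` are hypothetical names that `@tanstack/ai-grok` does not export, and the estimate assumes the billed seconds equal the requested duration after clamping to 1 to 15 whole seconds. The exact figure is whatever the API returns in `usage.cost`.

```typescript
// Hypothetical helpers for illustration only; not part of @tanstack/ai-grok.
const ASPECT_RATIOS = ['1:1', '16:9', '9:16', '4:3', '3:4', '3:2', '2:3'] as const
const RESOLUTIONS = ['480p', '720p', '1080p'] as const

type AspectRatio = (typeof ASPECT_RATIOS)[number]
type Resolution = (typeof RESOLUTIONS)[number]

interface ParsedSize {
  aspectRatio: AspectRatio
  resolution?: Resolution
}

// Type guard: true when `value` is one of the allowed literal options.
function isOneOf<T extends string>(
  options: readonly T[],
  value: string | undefined,
): value is T {
  return value !== undefined && (options as readonly string[]).includes(value)
}

// Splits "aspectRatio" or "aspectRatio_resolution" into its two parts.
function parseSizeTemplate(size: string): ParsedSize {
  const [ratio, resolution, ...rest] = size.split('_')
  if (rest.length > 0 || !isOneOf(ASPECT_RATIOS, ratio)) {
    throw new Error(`Unsupported size: ${size}`)
  }
  // The resolution suffix is optional.
  if (resolution === undefined) return { aspectRatio: ratio }
  if (!isOneOf(RESOLUTIONS, resolution)) {
    throw new Error(`Unsupported resolution in size: ${size}`)
  }
  return { aspectRatio: ratio, resolution }
}

// Per-second prices from the docs, kept in cents to avoid float drift.
const PRICE_CENTS_PER_SECOND = {
  'grok-imagine-video': 5,
  'grok-imagine-video-1.5': 8,
} as const

// Rough pre-flight estimate (assumption: billed seconds == clamped duration).
function estimateCostUsd(
  model: keyof typeof PRICE_CENTS_PER_SECOND,
  seconds: number,
): number {
  const clamped = Math.min(15, Math.max(1, Math.round(seconds)))
  return (clamped * PRICE_CENTS_PER_SECOND[model]) / 100
}
```

A caller could use `estimateCostUsd` to show an approximate price before submitting a job, then replace it with `usage.cost` once the job completes.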
diff --git a/docs/config.json b/docs/config.json index 966b75108..72782240a 100644 --- a/docs/config.json +++ b/docs/config.json @@ -262,7 +262,7 @@ "label": "Video Generation", "to": "media/video-generation", "addedAt": "2026-04-15", - "updatedAt": "2026-06-08" + "updatedAt": "2026-06-24" }, { "label": "Generation Hooks", @@ -434,7 +434,8 @@ { "label": "Grok (xAI)", "to": "adapters/grok", - "addedAt": "2026-04-15" + "addedAt": "2026-04-15", + "updatedAt": "2026-06-24" }, { "label": "Groq", diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md index eebbdf530..940de2915 100644 --- a/docs/media/video-generation.md +++ b/docs/media/video-generation.md @@ -2,13 +2,15 @@ title: Video Generation id: video-generation order: 6 -description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API." +description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API." keywords: - tanstack ai - video generation - sora - veo - gemini + - grok imagine + - fal - generateVideo - jobs api - experimental @@ -39,6 +41,8 @@ TanStack AI provides experimental support for video generation through dedicated Currently supported: - **OpenAI**: Sora-2 and Sora-2-Pro models (when available) - **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API) +- **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models +- **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models ## Basic Usage @@ -552,6 +556,59 @@ Adapters that haven't declared a per-model duration map keep the plain > Files API and requires your API key to download (send it as an > `x-goog-api-key` header or `key` query parameter). 
+### Grok (xAI Imagine) Model Options + +Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long. + +Text-to-video with the base model: + +```typescript +import { generateVideo } from '@tanstack/ai' +import { grokVideo } from '@tanstack/ai-grok' + +const { jobId } = await generateVideo({ + adapter: grokVideo('grok-imagine-video'), + prompt: 'A beautiful sunset over the ocean', + size: '16:9_720p', // aspect ratio: '1:1' | '16:9' | '9:16' | '4:3' | '3:4' | '3:2' | '2:3' + // resolution (optional suffix): '480p' | '720p' | '1080p' + duration: 5, // integer seconds, 1-15 + modelOptions: { + aspect_ratio: '16:9', // Alternative way to specify the aspect ratio + resolution: '720p', // Alternative way to specify the resolution + duration: 5, // Alternative way to specify the duration + }, +}) +``` + +Image-to-video (required for `grok-imagine-video-1.5`) — include an `image` prompt part as the starting frame. URL sources are fetched by xAI's servers (so they must be publicly reachable); use a `data` source for a base64 starting frame: + +```typescript +const { jobId } = await generateVideo({ + adapter: grokVideo('grok-imagine-video-1.5'), + prompt: [ + { type: 'text', content: 'Slowly pan out as the waves roll in' }, + { + type: 'image', + source: { type: 'url', value: 'https://example.com/still.png' }, + }, + ], + size: '16:9_720p', + duration: 5, +}) +``` + +Both models accept any whole second in the **1–15** range. 
A raw `duration` is coerced into that range rather than rejected — values are clamped to `[1, 15]` and rounded to the nearest second. Inspect or pre-snap the range the same way as Veo: + +```typescript +const adapter = grokVideo('grok-imagine-video') + +adapter.availableDurations() // { kind: 'range', min: 1, max: 15, step: 1, unit: 'seconds' } +adapter.snapDuration(2.5) // 3 — clamped/rounded into range +adapter.snapDuration(99) // 15 +``` + +Generated clips include an audio track. When the job completes, the adapter reports `usage.unitsBilled` (billed seconds of video) and `usage.cost` (exact USD cost as returned by the API) on the result. + ## Response Types > **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`. diff --git a/examples/ts-react-media/.env.example b/examples/ts-react-media/.env.example index b7c897653..fdf123604 100644 --- a/examples/ts-react-media/.env.example +++ b/examples/ts-react-media/.env.example @@ -5,3 +5,7 @@ FAL_KEY= # Get a Google API key at https://aistudio.google.com/apikey GOOGLE_API_KEY= + +# Get an xAI API key at https://console.x.ai — used by the "xAI Direct" +# Grok Imagine video models (the other Grok Imagine entries go through fal). 
+XAI_API_KEY= diff --git a/examples/ts-react-media/package.json b/examples/ts-react-media/package.json index 80bc30ce8..9adf242d1 100644 --- a/examples/ts-react-media/package.json +++ b/examples/ts-react-media/package.json @@ -14,6 +14,7 @@ "@tanstack/ai": "workspace:*", "@tanstack/ai-fal": "workspace:*", "@tanstack/ai-gemini": "workspace:*", + "@tanstack/ai-grok": "workspace:*", "@tanstack/react-router": "^1.158.4", "@tanstack/react-start": "^1.159.0", "@tanstack/router-plugin": "^1.158.4", diff --git a/examples/ts-react-media/src/components/ImageGenerator.tsx b/examples/ts-react-media/src/components/ImageGenerator.tsx index ca72e3823..9b4d5fd29 100644 --- a/examples/ts-react-media/src/components/ImageGenerator.tsx +++ b/examples/ts-react-media/src/components/ImageGenerator.tsx @@ -27,6 +27,7 @@ function getImageSrc(image: { url?: string; b64Json?: string }): string { const falModels = IMAGE_MODELS.filter((m) => m.provider === 'fal') const geminiModels = IMAGE_MODELS.filter((m) => m.provider === 'gemini') +const xaiModels = IMAGE_MODELS.filter((m) => m.provider === 'xai') export default function ImageGenerator({ onImageGenerated, @@ -161,6 +162,13 @@ export default function ImageGenerator({ ))} + + {xaiModels.map((model) => ( + + ))} + {currentModel && selectedModel !== 'all' && (

diff --git a/examples/ts-react-media/src/components/VideoGenerator.tsx b/examples/ts-react-media/src/components/VideoGenerator.tsx index 5661df9ac..f31a8078e 100644 --- a/examples/ts-react-media/src/components/VideoGenerator.tsx +++ b/examples/ts-react-media/src/components/VideoGenerator.tsx @@ -21,7 +21,7 @@ type JobState = model: string progress?: number | undefined } - | { status: 'completed'; url: string; unitsBilled?: number } + | { status: 'completed'; url: string; unitsBilled?: number; cost?: number } | { status: 'error'; message: string } interface VideoGeneratorProps { @@ -42,6 +42,8 @@ export default function VideoGenerator({ const pollingRefs = useRef>(new Map()) const filteredModels = VIDEO_MODELS.filter((m) => m.mode === mode) + const falModels = filteredModels.filter((m) => m.provider === 'fal') + const xaiModels = filteredModels.filter((m) => m.provider === 'xai') useEffect(() => { if (initialImageUrl) { @@ -97,6 +99,7 @@ export default function VideoGenerator({ status: 'completed', url: url, unitsBilled: urlResult.usage?.unitsBilled, + cost: urlResult.usage?.cost, }, })) } else if (status.status === 'processing') { @@ -164,8 +167,11 @@ export default function VideoGenerator({ }, })) + // Poll keyed by the UI model id, not result.model: the direct-xAI + // entries share one adapter model ('grok-imagine-video-1.5'), + // so result.model wouldn't identify the card (or the adapter) uniquely. 
const interval = setInterval(() => { - pollStatus(result.jobId, result.model) + pollStatus(result.jobId, modelId) }, 4000) pollingRefs.current.set(modelId, interval) } catch (err) { @@ -249,11 +255,20 @@ export default function VideoGenerator({ className="w-full px-4 py-3 bg-gray-800 border border-gray-700 rounded-lg text-white focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent disabled:opacity-50" > - {filteredModels.map((model) => ( - - ))} + + {falModels.map((model) => ( + + ))} + + + {xaiModels.map((model) => ( + + ))} + @@ -406,12 +421,21 @@ export default function VideoGenerator({ className="w-full h-auto" /> - {state.unitsBilled != null && ( + {state.cost != null ? (

- Billed {state.unitsBilled} fal unit - {state.unitsBilled === 1 ? '' : 's'} — multiply by the - endpoint unit price for USD cost + Billed ${state.cost.toFixed(3)} + {state.unitsBilled != null + ? ` for ${state.unitsBilled} second${state.unitsBilled === 1 ? '' : 's'} of video` + : ''}

+ ) : ( + state.unitsBilled != null && ( +

+ Billed {state.unitsBilled} fal unit + {state.unitsBilled === 1 ? '' : 's'} — multiply by the + endpoint unit price for USD cost +

+ ) )} )} diff --git a/examples/ts-react-media/src/lib/models.ts b/examples/ts-react-media/src/lib/models.ts index cfa36dfc5..5947febe5 100644 --- a/examples/ts-react-media/src/lib/models.ts +++ b/examples/ts-react-media/src/lib/models.ts @@ -15,6 +15,22 @@ export const IMAGE_MODELS = [ sizeType: 'aspect_ratio' as const, provider: 'fal' as const, }, + { + id: 'grok-imagine-image', + name: 'Grok Imagine (xAI Direct)', + description: 'xAI Imagine API via the native grokImage adapter', + defaultSize: '16:9' as const, + sizeType: 'aspect_ratio' as const, + provider: 'xai' as const, + }, + { + id: 'grok-imagine-image-quality', + name: 'Grok Imagine Quality (xAI Direct)', + description: 'Higher-quality xAI Imagine images via the native adapter', + defaultSize: '16:9' as const, + sizeType: 'aspect_ratio' as const, + provider: 'xai' as const, + }, { id: 'fal-ai/flux-2/klein/9b', name: 'FLUX.2 Klein 9B', @@ -79,48 +95,72 @@ export const VIDEO_MODELS = [ name: 'Kling 3 Pro (Text-to-Video)', description: 'High-quality text-to-video generation', mode: 'text-to-video' as const, + provider: 'fal' as const, }, { id: 'fal-ai/kling-video/v3/pro/image-to-video', name: 'Kling 3 Pro (Image-to-Video)', description: 'Animate images with Kling', mode: 'image-to-video' as const, + provider: 'fal' as const, }, { id: 'fal-ai/veo3.1', name: 'Veo 3.1 (Text-to-Video)', description: 'Google Veo text-to-video', mode: 'text-to-video' as const, + provider: 'fal' as const, }, { id: 'fal-ai/veo3.1/image-to-video', name: 'Veo 3.1 (Image-to-Video)', description: 'Google Veo image-to-video', mode: 'image-to-video' as const, + provider: 'fal' as const, }, { id: 'xai/grok-imagine-video/text-to-video', name: 'Grok Imagine Video (Text-to-Video)', description: 'xAI video generation from text', mode: 'text-to-video' as const, + provider: 'fal' as const, }, { id: 'xai/grok-imagine-video/image-to-video', name: 'Grok Imagine Video (Image-to-Video)', description: 'xAI animate images to video', mode: 
'image-to-video' as const, + provider: 'fal' as const, + }, + { + id: 'grok-imagine-video', + name: 'Grok Imagine Video 1.0 (Text-to-Video)', + description: + 'xAI Imagine API via the native grokVideo adapter (v1.0 supports text-to-video)', + mode: 'text-to-video' as const, + provider: 'xai' as const, + }, + { + id: 'grok-imagine-video-1.5/image-to-video', + name: 'Grok Imagine Video 1.5 (Image-to-Video)', + description: + 'Animate a starting frame via the native grokVideo adapter (1.5 is image-to-video only)', + mode: 'image-to-video' as const, + provider: 'xai' as const, }, { id: 'fal-ai/ltx-2.3/text-to-video/fast', name: 'LTX-2.3 Fast (Text-to-Video)', description: 'Fast text-to-video generation', mode: 'text-to-video' as const, + provider: 'fal' as const, }, { id: 'fal-ai/ltx-2.3/image-to-video/fast', name: 'LTX-2.3 Fast (Image-to-Video)', description: 'Fast image-to-video animation', mode: 'image-to-video' as const, + provider: 'fal' as const, }, ] as const diff --git a/examples/ts-react-media/src/lib/server-functions.ts b/examples/ts-react-media/src/lib/server-functions.ts index d4b010ad2..1b3b52639 100644 --- a/examples/ts-react-media/src/lib/server-functions.ts +++ b/examples/ts-react-media/src/lib/server-functions.ts @@ -1,9 +1,9 @@ import { createServerFn } from '@tanstack/react-start' import { falImage, falVideo } from '@tanstack/ai-fal' import { geminiImage } from '@tanstack/ai-gemini' +import { grokImage, grokVideo } from '@tanstack/ai-grok' import { generateImage, generateVideo, getVideoJobStatus } from '@tanstack/ai' -import type { FalModel } from '@tanstack/ai-fal' import type { ImagePart, MediaInputMetadata, @@ -67,6 +67,21 @@ function asImageToVideoPrompt( return narrowed } +/** + * Resolves the video adapter for a UI model id. The native grok-imagine + * entries hit xAI's Imagine API directly via the `grokVideo` adapter + * (XAI_API_KEY); everything else is a fal-hosted model. 
+ */ +function videoAdapterForModel(model: string) { + if (model === 'grok-imagine-video') { + return grokVideo('grok-imagine-video') + } + if (model === 'grok-imagine-video-1.5/image-to-video') { + return grokVideo('grok-imagine-video-1.5') + } + return falVideo(model) +} + export const generateImageFn = createServerFn({ method: 'POST' }) .inputValidator((data: { prompt: MediaPrompt; model: string }) => { if (!hasPromptContent(data.prompt)) throw new Error('Prompt is required') @@ -104,6 +119,26 @@ export const generateImageFn = createServerFn({ method: 'POST' }) modelOptions: { aspect_ratio: '16:9' }, }) } + case 'grok-imagine-image': { + // Direct xAI Imagine API (XAI_API_KEY) via the native grokImage + // adapter — no fal in between. The grok-imagine models accept image + // prompt parts for image-conditioned generation, so we narrow with + // asImagePrompt. Sizing uses the aspect-ratio template. + return generateImage({ + adapter: grokImage('grok-imagine-image'), + prompt: asImagePrompt(data.prompt), + numberOfImages: 1, + size: '16:9', + }) + } + case 'grok-imagine-image-quality': { + return generateImage({ + adapter: grokImage('grok-imagine-image-quality'), + prompt: asImagePrompt(data.prompt), + numberOfImages: 1, + size: '16:9', + }) + } case 'fal-ai/flux-2/klein/9b': { // NOTE: Newer models are untyped (at the moment) return generateImage({ @@ -214,6 +249,18 @@ export const createVideoJobFn = createServerFn({ method: 'POST' }) }, }) } + case 'grok-imagine-video': { + // Direct xAI Imagine API (XAI_API_KEY) — no fal in between. The base + // grok-imagine-video (v1.0) supports text-to-video; durations are + // 1-15 integer seconds. Completed jobs report usage.unitsBilled + // (billed seconds) and usage.cost (exact USD). 
+ return generateVideo({ + adapter: grokVideo('grok-imagine-video'), + prompt: asTextPrompt(data.prompt), + size: '16:9_720p', + duration: 5, + }) + } case 'fal-ai/ltx-2.3/text-to-video/fast': { return generateVideo({ adapter: falVideo('fal-ai/ltx-2.3/text-to-video/fast'), @@ -252,6 +299,17 @@ export const createVideoJobFn = createServerFn({ method: 'POST' }) }, }) } + case 'grok-imagine-video-1.5/image-to-video': { + // Direct xAI Imagine API. The starting frame is supplied as an image + // prompt part (asImageToVideoPrompt requires one); the grokVideo + // adapter forwards it to the Imagine endpoint as the start frame. + return generateVideo({ + adapter: grokVideo('grok-imagine-video-1.5'), + prompt: asImageToVideoPrompt(data.prompt), + size: '16:9_720p', + duration: 5, + }) + } case 'fal-ai/ltx-2.3/image-to-video/fast': { return generateVideo({ adapter: falVideo('fal-ai/ltx-2.3/image-to-video/fast'), @@ -265,9 +323,9 @@ export const createVideoJobFn = createServerFn({ method: 'POST' }) }) export const getVideoStatusFn = createServerFn({ method: 'GET' }) - .inputValidator((data: { jobId: string; model: FalModel }) => data) + .inputValidator((data: { jobId: string; model: string }) => data) .handler(async ({ data }) => { - const adapter = falVideo(data.model) + const adapter = videoAdapterForModel(data.model) return await getVideoJobStatus({ adapter, jobId: data.jobId, @@ -277,7 +335,7 @@ export const getVideoStatusFn = createServerFn({ method: 'GET' }) export const getVideoUrlFn = createServerFn({ method: 'GET' }) .inputValidator((data: { jobId: string; model: string }) => data) .handler(async ({ data }) => { - const adapter = falVideo(data.model) + const adapter = videoAdapterForModel(data.model) return await getVideoJobStatus({ adapter, jobId: data.jobId, diff --git a/packages/ai-grok/src/adapters/video.ts b/packages/ai-grok/src/adapters/video.ts new file mode 100644 index 000000000..a59c45230 --- /dev/null +++ b/packages/ai-grok/src/adapters/video.ts @@ -0,0 
+1,462 @@ +import { resolveMediaPrompt } from '@tanstack/ai' +import { BaseVideoAdapter, snapToDurationOption } from '@tanstack/ai/adapters' +import { toRunErrorPayload } from '@tanstack/ai/adapter-internals' +import { getGrokApiKeyFromEnv, withGrokDefaults } from '../utils/client' +import { + getGrokVideoDurationOptions, + isImageToVideoOnlyModel, + parseGrokVideoSize, + validateVideoSize, +} from '../video/video-provider-options' +import type { DurationOptions } from '@tanstack/ai/adapters' +import type { + ImagePart, + MediaInputMetadata, + TokenUsage, + VideoGenerationOptions, + VideoJobResult, + VideoStatusResult, + VideoUrlResult, +} from '@tanstack/ai' +import type { GrokVideoModel } from '../model-meta' +import type { + GrokVideoModelDurationByName, + GrokVideoModelInputModalitiesByName, + GrokVideoModelProviderOptionsByName, + GrokVideoModelSizeByName, + GrokVideoProviderOptions, +} from '../video/video-provider-options' +import type { GrokClientConfig } from '../utils' + +/** + * Configuration for Grok video adapter. + * + * @experimental Video generation is an experimental feature and may change. + */ +export interface GrokVideoConfig extends GrokClientConfig {} + +/** + * xAI bills video generation in "USD ticks": 10^10 ticks per US dollar + * (e.g. one grok-imagine-video-1.5 second costs $0.08 = 800_000_000 ticks). + */ +const USD_TICKS_PER_DOLLAR = 10_000_000_000 + +/** Response of POST /v1/videos/generations. */ +interface GrokVideoCreateResponse { + request_id?: string +} + +/** Response of GET /v1/videos/{request_id}. */ +interface GrokVideoStatusResponse { + status?: string + progress?: number + model?: string + video?: { + url?: string + duration?: number + } + usage?: { + cost_in_usd_ticks?: number + } + error?: string +} + +/** + * Convert a TanStack ImagePart to the URL string accepted by xAI's Imagine + * video endpoint: public URLs pass through (fetched by xAI's servers), data + * sources become base64 data URIs. 
+ */ +function imagePartToUrl(part: ImagePart): string { + if (part.source.type === 'url') return part.source.value + return `data:${part.source.mimeType};base64,${part.source.value}` +} + +function buildGrokVideoUsage( + response: GrokVideoStatusResponse, +): TokenUsage | undefined { + const seconds = response.video?.duration + const ticks = response.usage?.cost_in_usd_ticks + if (seconds === undefined && ticks === undefined) return undefined + return { + promptTokens: 0, + completionTokens: 0, + totalTokens: 0, + ...(seconds !== undefined && { unitsBilled: seconds }), + ...(ticks !== undefined && { cost: ticks / USD_TICKS_PER_DOLLAR }), + } +} + +/** + * Grok Video Generation Adapter (xAI Imagine API) + * + * Tree-shakeable adapter for the grok-imagine video models using the + * async jobs/polling architecture: create a generation request, poll it, + * then read the completed video URL. + * + * `grok-imagine-video` (v1.0) supports text-to-video and image-to-video. + * `grok-imagine-video-1.5` is image-to-video only — every request needs an + * image prompt part as the starting frame, and the adapter rejects a + * text-only prompt with a clear error rather than a raw API 400. + * + * The Imagine video endpoints are not part of the OpenAI SDK surface (and + * xAI rejects the SDK's multipart paths), so requests are plain JSON calls + * issued with the configured `fetch` (or the global one). + * + * @experimental Video generation is an experimental feature and may change. + * + * Features: + * - Async job-based video generation (1–15 second clips with audio) + * - Aspect-ratio sizing via the "aspectRatio_resolution" size template + * (e.g. 
'16:9_720p'), consistent with the grok-imagine image models + * - Image-to-video via an `image` prompt part (starting frame URL or data URI) + * - Usage reporting: billed seconds (`unitsBilled`) and exact cost + */ +export class GrokVideoAdapter< + TModel extends GrokVideoModel, +> extends BaseVideoAdapter< + TModel, + GrokVideoProviderOptions, + GrokVideoModelProviderOptionsByName, + GrokVideoModelSizeByName, + GrokVideoModelInputModalitiesByName, + GrokVideoModelDurationByName +> { + readonly name = 'grok' as const + + private readonly clientConfig: GrokVideoConfig + + constructor(config: GrokVideoConfig, model: TModel) { + super({}, model) + this.clientConfig = withGrokDefaults(config) + } + + private get fetch(): ( + input: string, + init?: RequestInit, + ) => Promise<Response> { + return this.clientConfig.fetch ?? fetch + } + + private async request( + path: string, + init?: Omit<RequestInit, 'headers'>, + ): Promise<Response> { + return await this.fetch(`${this.clientConfig.baseURL}${path}`, { + ...init, + headers: { + 'Content-Type': 'application/json', + Authorization: `Bearer ${this.clientConfig.apiKey}`, + }, + }) + } + + /** + * Reads the error message out of an Imagine API error body + * (`{"code": "...", "error": "..."}`), falling back to the raw text. 
+ */ + private async errorMessage(response: Response): Promise<string> { + const body = await response.text() + try { + const parsed: unknown = JSON.parse(body) + if ( + typeof parsed === 'object' && + parsed !== null && + 'error' in parsed && + typeof parsed.error === 'string' + ) { + return parsed.error + } + } catch { + // not JSON — fall through to the raw body + } + return body + } + + async createVideoJob( + options: VideoGenerationOptions< + GrokVideoProviderOptions, + GrokVideoModelSizeByName[TModel], + GrokVideoModelDurationByName[TModel] + >, + ): Promise<VideoJobResult> { + const { model, size, modelOptions, logger } = options + + validateVideoSize(model, size) + + // Coerce the requested duration into the model's valid range (1–15s, + // integer) instead of rejecting it — `snapDuration` clamps and rounds. + // modelOptions wins over the generic `duration`, mirroring the size + // precedence below. + const rawDuration = modelOptions?.duration ?? options.duration + const duration = + rawDuration !== undefined ? this.snapDuration(rawDuration) : undefined + + // The interleaved prompt decomposes into verbatim text plus typed media + // buckets. The Imagine video endpoint takes a text prompt and an optional + // starting frame; reject the modalities it can't consume. + const resolved = resolveMediaPrompt(options.prompt) + if (resolved.videos.length > 0) { + throw new Error( + `${this.name}.createVideoJob does not support video prompt parts (model: ${model}).`, + ) + } + if (resolved.audios.length > 0) { + throw new Error( + `${this.name}.createVideoJob does not support audio prompt parts (model: ${model}).`, + ) + } + // grok-imagine-video-1.5 is image-to-video only — text-to-video is + // rejected by the API, so fail fast with a clear, actionable message + // pointing at the model that does support text-to-video. 
+ if (resolved.images.length === 0 && isImageToVideoOnlyModel(model)) { + throw new Error( + `${this.name}: ${model} does not support text-to-video — it is image-to-video only. ` + + `Include an image prompt part as the starting frame, or use 'grok-imagine-video' for text-to-video.`, + ) + } + if (resolved.images.length > 1) { + throw new Error( + `${this.name}: ${model} accepts at most one starting-frame image; received ${resolved.images.length}.`, + ) + } + + // Image-to-video: the single image prompt part becomes the starting frame + // and the prompt text describes the desired motion. URL sources are + // fetched by xAI's servers; data sources are sent as base64 data URIs. + const [startFrame] = resolved.images + + // The generic `size` option carries an "aspectRatio_resolution" template + // (e.g. '16:9_720p') and maps to the Imagine API's `aspect_ratio` / + // `resolution` parameters; explicit modelOptions win over the template. + const parsedSize = size !== undefined ? parseGrokVideoSize(size) : undefined + const request = { + model, + prompt: resolved.text, + ...(startFrame && { image: { url: imagePartToUrl(startFrame) } }), + ...(parsedSize && { + aspect_ratio: parsedSize.aspectRatio, + ...(parsedSize.resolution !== undefined && { + resolution: parsedSize.resolution, + }), + }), + ...modelOptions, + // Spread after modelOptions so the snapped duration is authoritative + // (modelOptions.duration is folded into `duration` via snapDuration above). + ...(duration !== undefined && { duration }), + } + + try { + logger.request( + `activity=video.create provider=${this.name} model=${model} size=${size ?? 'default'} duration=${duration ?? 
'default'}`, + { provider: this.name, model }, + ) + + const response = await this.request('/videos/generations', { + method: 'POST', + body: JSON.stringify(request), + }) + if (!response.ok) { + throw new Error( + `grok: video generation request failed (${response.status} ${response.statusText}): ${await this.errorMessage(response)}`, + ) + } + + const result = (await response.json()) as GrokVideoCreateResponse + if (!result.request_id) { + throw new Error( + 'grok: video generation response contained no request_id', + ) + } + return { jobId: result.request_id, model } + } catch (error: unknown) { + logger.errors(`${this.name}.createVideoJob fatal`, { + error: toRunErrorPayload(error, `${this.name}.createVideoJob failed`), + source: `${this.name}.createVideoJob`, + }) + throw error + } + } + + private async retrieveJob(jobId: string): Promise<GrokVideoStatusResponse> { + const response = await this.request(`/videos/${jobId}`) + if (!response.ok) { + const error = new Error( + `grok: video status request failed (${response.status} ${response.statusText}): ${await this.errorMessage(response)}`, + ) + ;(error as { status?: number }).status = response.status + throw error + } + return (await response.json()) as GrokVideoStatusResponse + } + + async getVideoStatus(jobId: string): Promise<VideoStatusResult> { + let response: GrokVideoStatusResponse + try { + response = await this.retrieveJob(jobId) + } catch (error) { + if ((error as { status?: number }).status === 404) { + return { jobId, status: 'failed', error: 'Job not found' } + } + throw error + } + + return { + jobId, + status: this.mapStatus(response.status), + ...(response.progress !== undefined && { progress: response.progress }), + ...(response.error !== undefined && { error: response.error }), + } + } + + async getVideoUrl(jobId: string): Promise<VideoUrlResult> { + let response: GrokVideoStatusResponse + try { + response = await this.retrieveJob(jobId) + } catch (error) { + if ((error as { status?: number }).status === 404) { + throw new Error(`Video job not found: 
${jobId}`) + } + throw error + } + + const status = this.mapStatus(response.status) + if (status === 'failed') { + throw new Error( + `Video generation failed${response.error ? `: ${response.error}` : ''}. Job ID: ${jobId}`, + ) + } + const url = response.video?.url + if (!url) { + throw new Error( + `Video is not ready for download. Check status first. Job ID: ${jobId}`, + ) + } + + const usage = buildGrokVideoUsage(response) + return { + jobId, + url, + ...(usage && { usage }), + } + } + + /** + * Maps Imagine API job statuses onto the generic video status set. The + * API reports 'pending' while queued/generating (with a numeric + * `progress`), then a terminal 'done' / 'failed' / 'expired'. + */ + protected mapStatus( + apiStatus: string | undefined, + ): 'pending' | 'processing' | 'completed' | 'failed' { + switch (apiStatus) { + case 'pending': + case 'queued': + return 'pending' + case 'done': + case 'completed': + case 'succeeded': + return 'completed' + case 'failed': + case 'expired': + case 'error': + case 'cancelled': + return 'failed' + case undefined: + default: + return 'processing' + } + } + + /** + * Both grok-imagine video models accept a continuous 1–15 integer-second + * range. Consumers can use this to render UI without provider knowledge. + */ + override availableDurations(): DurationOptions< + GrokVideoModelDurationByName[TModel] + > { + return getGrokVideoDurationOptions(this.model) + } + + /** + * Coerce a raw seconds value to the closest valid duration (clamped to + * [1, 15] and rounded to whole seconds). + */ + override snapDuration( + seconds: number, + ): GrokVideoModelDurationByName[TModel] | undefined { + return snapToDurationOption(seconds, this.availableDurations()) + } +} + +/** + * Creates a Grok video adapter with an explicit API key. + * Type resolution happens here at the call site. + * + * @experimental Video generation is an experimental feature and may change. 
+ * + * @param model - The model name (e.g., 'grok-imagine-video') + * @param apiKey - Your xAI API key + * @param config - Optional additional configuration + * @returns Configured Grok video adapter instance with resolved types + * + * @example + * ```typescript + * // grok-imagine-video (v1.0) supports text-to-video. + * const adapter = createGrokVideo('grok-imagine-video', 'xai-...'); + * + * const { jobId } = await generateVideo({ + * adapter, + * prompt: 'A beautiful sunset over the ocean', + * size: '16:9_720p', + * duration: 5 + * }); + * ``` + */ +export function createGrokVideo<TModel extends GrokVideoModel>( + model: TModel, + apiKey: string, + config?: Omit<GrokVideoConfig, 'apiKey'>, +): GrokVideoAdapter<TModel> { + return new GrokVideoAdapter({ apiKey, ...config }, model) +} + +/** + * Creates a Grok video adapter with automatic API key detection from environment variables. + * Type resolution happens here at the call site. + * + * Looks for `XAI_API_KEY` in: + * - `process.env` (Node.js) + * - `window.env` (Browser with injected env) + * + * @experimental Video generation is an experimental feature and may change. + * + * @param model - The model name (e.g., 'grok-imagine-video-1.5') + * @param config - Optional configuration (excluding apiKey which is auto-detected) + * @returns Configured Grok video adapter instance with resolved types + * @throws Error if XAI_API_KEY is not found in environment + * + * @example + * ```typescript + * // Automatically uses XAI_API_KEY from environment + * const adapter = grokVideo('grok-imagine-video-1.5'); + * + * // Image-to-video only: the prompt must carry a starting-frame image part. 
+ * const { jobId } = await generateVideo({ + * adapter, + * prompt: [ + * { type: 'text', content: 'Make the cat start playing the piano' }, + * { type: 'image', source: { type: 'url', value: 'https://example.com/cat.png' } }, + * ], + * }); + * + * // Poll for status + * const status = await getVideoJobStatus({ adapter, jobId }); + * ``` + */ +export function grokVideo<TModel extends GrokVideoModel>( + model: TModel, + config?: Omit<GrokVideoConfig, 'apiKey'>, +): GrokVideoAdapter<TModel> { + const apiKey = getGrokApiKeyFromEnv() + return createGrokVideo(model, apiKey, config) +} diff --git a/packages/ai-grok/src/index.ts b/packages/ai-grok/src/index.ts index 142ab3346..e342645ca 100644 --- a/packages/ai-grok/src/index.ts +++ b/packages/ai-grok/src/index.ts @@ -31,6 +31,27 @@ export type { GrokImageModelProviderOptionsByName, } from './image/image-provider-options' +// Video adapter - for video generation (xAI Imagine API) +export { + GrokVideoAdapter, + createGrokVideo, + grokVideo, + type GrokVideoConfig, +} from './adapters/video' +export { + GROK_VIDEO_DURATIONS, + getGrokVideoDurationOptions, +} from './video/video-provider-options' +export type { + GrokVideoProviderOptions, + GrokVideoModelProviderOptionsByName, + GrokVideoModelSizeByName, + GrokVideoModelDurationByName, + GrokVideoAspectRatio, + GrokVideoResolution, + GrokVideoSize, +} from './video/video-provider-options' + // Speech (TTS) adapter - for text-to-speech export { GrokSpeechAdapter, @@ -68,6 +89,7 @@ export type { ResolveInputModalities, GrokChatModel, GrokImageModel, + GrokVideoModel, GrokTTSModel, GrokTranscriptionModel, GrokRealtimeModel, @@ -75,6 +97,7 @@ export type { export { GROK_CHAT_MODELS, GROK_IMAGE_MODELS, + GROK_VIDEO_MODELS, GROK_TTS_MODELS, GROK_TRANSCRIPTION_MODELS, GROK_REALTIME_MODELS, diff --git a/packages/ai-grok/src/model-meta.ts b/packages/ai-grok/src/model-meta.ts index 6f9caa6a2..91c0d6105 100644 --- a/packages/ai-grok/src/model-meta.ts +++ 
b/packages/ai-grok/src/model-meta.ts @@ -91,6 +91,47 @@ const GROK_IMAGINE_IMAGE_QUALITY = { 
}, } as const satisfies ModelMeta +// Imagine API video models. Pricing is per second of generated video +// (output only); generated videos carry an audio track. +// +// grok-imagine-video (v1.0) supports both text-to-video (a starting image is +// optional) and image-to-video. grok-imagine-video-1.5 is image-to-video +// only: a starting-frame image is required (the text prompt describes the +// desired motion) — its text-to-video is rejected by the API. +const GROK_IMAGINE_VIDEO = { + name: 'grok-imagine-video', + supports: { + input: ['text', 'image'], + output: ['video', 'audio'], + }, + pricing: { + input: { + normal: 0, + }, + output: { + // per second of video + normal: 0.05, + }, + }, +} as const satisfies ModelMeta + +const GROK_IMAGINE_VIDEO_1_5 = { + name: 'grok-imagine-video-1.5', + supports: { + input: ['text', 'image'], + output: ['video', 'audio'], + }, + pricing: { + input: { + normal: 0, + }, + output: { + // per second of video + normal: 0.08, + }, + }, +} as const satisfies ModelMeta + const GROK_4_3 = { name: 'grok-4.3', context_window: 1_000_000, @@ -145,6 +186,16 @@ export const GROK_IMAGE_MODELS = [ GROK_IMAGINE_IMAGE_QUALITY.name, ] as const +/** + * Grok Video Generation Models (xAI Imagine API) + * + * @experimental Video generation is an experimental feature and may change. + */ +export const GROK_VIDEO_MODELS = [ + GROK_IMAGINE_VIDEO.name, + GROK_IMAGINE_VIDEO_1_5.name, +] as const + // xAI's `/v1/tts` endpoint is endpoint-addressed and does not take a `model` // parameter. This synthetic identifier satisfies the SDK's `TTSOptions.model` // contract and provides a stable value for logging and fixture matching. 
@@ -198,6 +249,7 @@ export const GROK_REALTIME_MODELS = [ export type GrokChatModel = (typeof GROK_CHAT_MODELS)[number] export type GrokImageModel = (typeof GROK_IMAGE_MODELS)[number] +export type GrokVideoModel = (typeof GROK_VIDEO_MODELS)[number] export type GrokTTSModel = (typeof GROK_TTS_MODELS)[number] export type GrokTranscriptionModel = (typeof GROK_TRANSCRIPTION_MODELS)[number] export type GrokRealtimeModel = (typeof GROK_REALTIME_MODELS)[number] diff --git a/packages/ai-grok/src/video/video-provider-options.ts b/packages/ai-grok/src/video/video-provider-options.ts new file mode 100644 index 000000000..b84c03f8b --- /dev/null +++ b/packages/ai-grok/src/video/video-provider-options.ts @@ -0,0 +1,241 @@ +/** + * Grok Video Generation Provider Options (xAI Imagine API) + * + * Based on https://docs.x.ai/docs/guides/video-generations + * + * @experimental Video generation is an experimental feature and may change. + */ + +import type { DurationOptions } from '@tanstack/ai/adapters' +import type { GrokVideoModel } from '../model-meta' + +/** + * Aspect ratios accepted by the grok-imagine video models. + * + * Note: this is a narrower set than the grok-imagine image models — the + * video endpoint rejects the phone-screen ratios ('9:19.5', '9:20', …) and + * 'auto'. + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoAspectRatio = + | '1:1' + | '16:9' + | '9:16' + | '4:3' + | '3:4' + | '3:2' + | '2:3' + +/** + * Resolution tiers for the grok-imagine video models. + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoResolution = '480p' | '720p' | '1080p' + +/** + * Size strings for grok-imagine video models. 
The Imagine API is + * aspect-ratio based rather than pixel-size based; like the grok-imagine + * image models, the generic `size` option uses an + * `aspectRatio_resolution` template ("16:9_720p"); the resolution suffix + * is optional ("16:9" uses the API default). + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoSize = + | GrokVideoAspectRatio + | `${GrokVideoAspectRatio}_${GrokVideoResolution}` + +const GROK_VIDEO_ASPECT_RATIOS: ReadonlyArray<string> = [ + '1:1', + '16:9', + '9:16', + '4:3', + '3:4', + '3:2', + '2:3', +] + +const GROK_VIDEO_RESOLUTIONS: ReadonlyArray<string> = ['480p', '720p', '1080p'] + +/** + * Video duration limits enforced by the Imagine API (seconds). + */ +export const GROK_VIDEO_MIN_DURATION = 1 +export const GROK_VIDEO_MAX_DURATION = 15 + +/** + * Parses a grok video size string into its components. + * Format: "aspectRatio" or "aspectRatio_resolution", + * e.g. "16:9_720p" → { aspectRatio: "16:9", resolution: "720p" }. + * Returns undefined when the string doesn't match the template. + */ +export function parseGrokVideoSize( + size: string, +): { aspectRatio: string; resolution?: string } | undefined { + const match = size.match(/^([\d.]+:[\d.]+)(?:_(.+))?$/) + const [, aspectRatio, resolution] = match ?? [] + if (aspectRatio === undefined) return undefined + return { aspectRatio, ...(resolution !== undefined && { resolution }) } +} + +/** + * Validate the `size` template for a given grok video model. + * + * @experimental Video generation is an experimental feature and may change. + */ +export function validateVideoSize( + model: string, + size?: string, +): asserts size is GrokVideoSize | undefined { + if (size === undefined) return + const parsed = parseGrokVideoSize(size) + if (!parsed || !GROK_VIDEO_ASPECT_RATIOS.includes(parsed.aspectRatio)) { + throw new Error( + `Size "${size}" is not supported by model "${model}". 
Expected ` + + `"aspectRatio" or "aspectRatio_resolution" (e.g. 
"16:9_720p") with ` + + `aspect ratio one of: ${GROK_VIDEO_ASPECT_RATIOS.join(', ')}`, + ) + } + if ( + parsed.resolution !== undefined && + !GROK_VIDEO_RESOLUTIONS.includes(parsed.resolution) + ) { + throw new Error( + `Resolution "${parsed.resolution}" is not supported by model "${model}". ` + + `Supported resolutions: ${GROK_VIDEO_RESOLUTIONS.join(', ')}`, + ) + } +} + +/** + * Per-model duration type. The Imagine API accepts any integer second in the + * 1–15 range, so this is a continuous range expressed as `number` (a literal + * union can't represent it). `snapDuration()` coerces a raw seconds value into + * the valid range at runtime. + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoModelDurationByName = { + 'grok-imagine-video': number + 'grok-imagine-video-1.5': number +} + +/** + * Runtime duration table backing `availableDurations()` / `snapDuration()`. + * Both grok-imagine video models accept the same continuous 1–15 integer-second + * range. + * + * @experimental Video generation is an experimental feature and may change. + */ +export const GROK_VIDEO_DURATIONS: { + readonly [TModel in GrokVideoModel]: DurationOptions< + GrokVideoModelDurationByName[TModel] + > +} = { + 'grok-imagine-video': { + kind: 'range', + min: GROK_VIDEO_MIN_DURATION, + max: GROK_VIDEO_MAX_DURATION, + step: 1, + unit: 'seconds', + }, + 'grok-imagine-video-1.5': { + kind: 'range', + min: GROK_VIDEO_MIN_DURATION, + max: GROK_VIDEO_MAX_DURATION, + step: 1, + unit: 'seconds', + }, +} + +/** + * Look up the duration options for a grok video model. + * + * @experimental Video generation is an experimental feature and may change. + */ +export function getGrokVideoDurationOptions( + model: TModel, +): DurationOptions { + return GROK_VIDEO_DURATIONS[model] +} + +/** + * Provider-specific options for grok video generation. 
These map directly + * onto the Imagine API request body and take precedence over the generic + * `size` / `duration` options when both are provided. + * + * @experimental Video generation is an experimental feature and may change. + */ +export interface GrokVideoProviderOptions { + /** + * Output aspect ratio. + */ + aspect_ratio?: GrokVideoAspectRatio + + /** + * Output resolution tier. + */ + resolution?: GrokVideoResolution + + /** + * Video duration in integer seconds (1–15). + */ + duration?: number +} + +/** + * Type-only map from model name to its specific provider options. + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoModelProviderOptionsByName = { + 'grok-imagine-video': GrokVideoProviderOptions + 'grok-imagine-video-1.5': GrokVideoProviderOptions +} + +/** + * Type-only map from model name to its supported `size` strings. + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoModelSizeByName = { + 'grok-imagine-video': GrokVideoSize + 'grok-imagine-video-1.5': GrokVideoSize +} + +/** + * Type-only map from model name to the non-text prompt modalities it accepts. + * Both models accept an `image` prompt part as the starting frame: + * `grok-imagine-video` (v1.0) does text-to-video and image-to-video, while + * `grok-imagine-video-1.5` is image-to-video only (the image is required). + * + * @experimental Video generation is an experimental feature and may change. + */ +export type GrokVideoModelInputModalitiesByName = { + 'grok-imagine-video': readonly ['image'] + 'grok-imagine-video-1.5': readonly ['image'] +} + +/** + * Models that only support image-to-video — a starting-frame image is + * required and text-to-video is rejected by the Imagine API. Used by the + * adapter to fail fast with a clear message instead of surfacing the raw + * "Text-to-video is not supported for this model" 400. 
+ * + * @experimental Video generation is an experimental feature and may change. + */ +const GROK_VIDEO_IMAGE_TO_VIDEO_ONLY: ReadonlySet<string> = new Set([ + 'grok-imagine-video-1.5', +]) + +/** + * True when the model only supports image-to-video (a starting frame is + * required). + * + * @experimental Video generation is an experimental feature and may change. + */ +export function isImageToVideoOnlyModel(model: string): boolean { + return GROK_VIDEO_IMAGE_TO_VIDEO_ONLY.has(model) +} diff --git a/packages/ai-grok/tests/video-adapter.test.ts b/packages/ai-grok/tests/video-adapter.test.ts new file mode 100644 index 000000000..a6239adbc --- /dev/null +++ b/packages/ai-grok/tests/video-adapter.test.ts @@ -0,0 +1,644 @@ +import { describe, expect, it, vi } from 'vitest' +import { resolveDebugOption } from '@tanstack/ai/adapter-internals' +import { + GrokVideoAdapter, + createGrokVideo, + grokVideo, +} from '../src/adapters/video' +import { + getGrokVideoDurationOptions, + parseGrokVideoSize, + validateVideoSize, +} from '../src/video/video-provider-options' + +const testLogger = resolveDebugOption(false) + +function jsonResponse(body: unknown, status = 200): Response { + return new Response(JSON.stringify(body), { + status, + headers: { 'Content-Type': 'application/json' }, + }) +} + +/** + * A `vi.fn` fetch stub with the real fetch parameter list, so call + * assertions (`mock.calls[0]`) are typed as `[input, init?]`. + */ +function mockFetch(handler: () => Response) { + return vi.fn(async (_input: string | URL | Request, _init?: RequestInit) => + handler(), + ) +} + +/** + * Builds an adapter whose HTTP layer is the provided mock, injected via + * the adapter config's `fetch` seam, so no globals are touched. 
+ */ +function adapterWithFetch( + fetchMock: ( + input: string | URL | Request, + init?: RequestInit, + ) => Promise<Response>, +) { + return createGrokVideo('grok-imagine-video-1.5', 'test-api-key', { + fetch: fetchMock, + }) +} + +/** + * grok-imagine-video-1.5 is image-to-video only, so every request needs a + * starting-frame image part. This builds a text + image prompt for the + * request-shape / status / error tests. + */ +function i2vPrompt(text = 'p') { + return [ + { type: 'text' as const, content: text }, + { + type: 'image' as const, + source: { type: 'url' as const, value: 'https://example.com/start.png' }, + }, + ] +} + +describe('Grok Video Adapter', () => { + describe('factories', () => { + it('creates an adapter with the provided API key', () => { + const adapter = createGrokVideo('grok-imagine-video-1.5', 'test-api-key') + expect(adapter).toBeInstanceOf(GrokVideoAdapter) + expect(adapter.kind).toBe('video') + expect(adapter.name).toBe('grok') + expect(adapter.model).toBe('grok-imagine-video-1.5') + }) + + it('grokVideo reads XAI_API_KEY from the environment', () => { + vi.stubEnv('XAI_API_KEY', 'env-key') + try { + const adapter = grokVideo('grok-imagine-video-1.5') + expect(adapter).toBeInstanceOf(GrokVideoAdapter) + } finally { + vi.unstubAllEnvs() + } + }) + }) + + describe('createVideoJob', () => { + it('posts a JSON request to the Imagine generations endpoint', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'req-123' })) + const adapter = adapterWithFetch(fetchMock) + + const result = await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt('A red ball bouncing once'), + size: '16:9_720p', + duration: 5, + logger: testLogger, + }) + + expect(result).toEqual({ + jobId: 'req-123', + model: 'grok-imagine-video-1.5', + }) + expect(fetchMock).toHaveBeenCalledTimes(1) + const [url, init] = fetchMock.mock.calls[0]! 
+ expect(url).toBe('https://api.x.ai/v1/videos/generations') + expect(init?.method).toBe('POST') + expect(init?.headers).toMatchObject({ + 'Content-Type': 'application/json', + Authorization: 'Bearer test-api-key', + }) + expect(JSON.parse(String(init?.body))).toEqual({ + model: 'grok-imagine-video-1.5', + prompt: 'A red ball bouncing once', + image: { url: 'https://example.com/start.png' }, + aspect_ratio: '16:9', + resolution: '720p', + duration: 5, + }) + }) + + it('maps a bare aspect-ratio size without a resolution', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + size: '9:16', + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.aspect_ratio).toBe('9:16') + expect(body).not.toHaveProperty('resolution') + expect(body).not.toHaveProperty('duration') + }) + + it('passes modelOptions through', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt('make the waterfall crash down'), + modelOptions: { + resolution: '1080p', + duration: 10, + }, + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.prompt).toBe('make the waterfall crash down') + expect(body.resolution).toBe('1080p') + expect(body.duration).toBe(10) + }) + + it('maps an image prompt part to the starting frame (image-to-video)', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: [ + { type: 'text', content: 'make the waterfall crash down' }, + { + type: 'image', + source: { type: 'url', value: 
'https://example.com/still.png' }, + }, + ], + duration: 10, + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + // Prompt text is sent verbatim; the image becomes the starting frame. + expect(body.prompt).toBe('make the waterfall crash down') + expect(body.image).toEqual({ url: 'https://example.com/still.png' }) + expect(body.duration).toBe(10) + }) + + it('sends a base64 data source as a data URI starting frame', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: [ + { type: 'text', content: 'pan out slowly' }, + { + type: 'image', + source: { type: 'data', mimeType: 'image/png', value: 'AAAA' }, + }, + ], + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.image).toEqual({ url: 'data:image/png;base64,AAAA' }) + }) + + it('rejects more than one image prompt part before calling the API', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: [ + { type: 'text', content: 'p' }, + { + type: 'image', + source: { type: 'url', value: 'https://example.com/a.png' }, + }, + { + type: 'image', + source: { type: 'url', value: 'https://example.com/b.png' }, + }, + ], + logger: testLogger, + }), + ).rejects.toThrow(/at most one starting-frame image/) + expect(fetchMock).not.toHaveBeenCalled() + }) + + it('rejects video and audio prompt parts before calling the API', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: [ + { type: 'text', content: 'p' }, + { + type: 'video', + source: { 
type: 'url', value: 'https://example.com/clip.mp4' }, + }, + ], + logger: testLogger, + }), + ).rejects.toThrow(/does not support video prompt parts/) + expect(fetchMock).not.toHaveBeenCalled() + }) + + it('rejects a text-only prompt on 1.5 — image-to-video only', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: 'a red ball bouncing once', + logger: testLogger, + }), + ).rejects.toThrow(/does not support text-to-video/) + expect(fetchMock).not.toHaveBeenCalled() + }) + + it('allows a text-only prompt on grok-imagine-video (text-to-video)', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'tv-1' })) + const adapter = createGrokVideo('grok-imagine-video', 'test-api-key', { + fetch: fetchMock, + }) + + const result = await adapter.createVideoJob({ + model: 'grok-imagine-video', + prompt: 'A beautiful sunset over the ocean', + size: '16:9_720p', + duration: 5, + logger: testLogger, + }) + + expect(result).toEqual({ jobId: 'tv-1', model: 'grok-imagine-video' }) + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.prompt).toBe('A beautiful sunset over the ocean') + expect(body).not.toHaveProperty('image') + }) + + it('maps a starting frame on grok-imagine-video (image-to-video)', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'iv-1' })) + const adapter = createGrokVideo('grok-imagine-video', 'test-api-key', { + fetch: fetchMock, + }) + + await adapter.createVideoJob({ + model: 'grok-imagine-video', + prompt: i2vPrompt('animate this'), + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.image).toEqual({ url: 'https://example.com/start.png' }) + expect(body.prompt).toBe('animate this') + }) + + it('lets modelOptions win over the generic size template', async 
() => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + size: '16:9_480p', + modelOptions: { resolution: '1080p' }, + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.aspect_ratio).toBe('16:9') + expect(body.resolution).toBe('1080p') + }) + + it('rejects unsupported sizes before calling the API', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: 'p', + // @ts-expect-error invalid size is also rejected at compile time + size: '7:5', + logger: testLogger, + }), + ).rejects.toThrow(/Size "7:5" is not supported/) + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: 'p', + // @ts-expect-error invalid resolution is also rejected at compile time + size: '16:9_9k', + logger: testLogger, + }), + ).rejects.toThrow(/Resolution "9k" is not supported/) + expect(fetchMock).not.toHaveBeenCalled() + }) + + it('snaps out-of-range and non-integer durations into the valid range', async () => { + // [requested, snapped]: clamp to [1, 15], round to whole seconds. 
+ const cases: Array<[number, number]> = [ + [0, 1], + [16, 15], + [2.5, 3], + [7, 7], + ] + for (const [requested, snapped] of cases) { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + duration: requested, + logger: testLogger, + }) + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.duration).toBe(snapped) + } + }) + + it('snaps a duration supplied via modelOptions', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = adapterWithFetch(fetchMock) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + modelOptions: { duration: 99 }, + logger: testLogger, + }) + + const body = JSON.parse(String(fetchMock.mock.calls[0]![1]?.body)) + expect(body.duration).toBe(15) + }) + + it('surfaces API error messages from the xAI error body', async () => { + const fetchMock = mockFetch(() => + jsonResponse( + { + code: 'invalid-argument', + error: 'Duration must be between 1 and 15 seconds', + }, + 400, + ), + ) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + logger: testLogger, + }), + ).rejects.toThrow( + /video generation request failed \(400.*Duration must be between 1 and 15 seconds/, + ) + }) + + it('throws when the response carries no request_id', async () => { + const fetchMock = mockFetch(() => jsonResponse({})) + const adapter = adapterWithFetch(fetchMock) + + await expect( + adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + logger: testLogger, + }), + ).rejects.toThrow(/no request_id/) + }) + + it('honours a custom baseURL', async () => { + const fetchMock = mockFetch(() => jsonResponse({ request_id: 'r' })) + const adapter = 
createGrokVideo('grok-imagine-video-1.5', 'k', { + baseURL: 'https://proxy.example.com/v1', + fetch: fetchMock, + }) + + await adapter.createVideoJob({ + model: 'grok-imagine-video-1.5', + prompt: i2vPrompt(), + logger: testLogger, + }) + + expect(fetchMock.mock.calls[0]![0]).toBe( + 'https://proxy.example.com/v1/videos/generations', + ) + }) + }) + + describe('getVideoStatus', () => { + it('maps a pending job with progress', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ status: 'pending', progress: 18 }), + ) + const adapter = adapterWithFetch(fetchMock) + + const status = await adapter.getVideoStatus('req-123') + + expect(fetchMock.mock.calls[0]![0]).toBe( + 'https://api.x.ai/v1/videos/req-123', + ) + expect(status).toEqual({ + jobId: 'req-123', + status: 'pending', + progress: 18, + }) + }) + + it('maps a done job to completed', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ + status: 'done', + progress: 100, + video: { url: 'https://vidgen.x.ai/video.mp4', duration: 5 }, + }), + ) + const adapter = adapterWithFetch(fetchMock) + + expect(await adapter.getVideoStatus('req-123')).toEqual({ + jobId: 'req-123', + status: 'completed', + progress: 100, + }) + }) + + it.each(['failed', 'expired'])('maps %s to failed', async (apiStatus) => { + const fetchMock = mockFetch(() => + jsonResponse({ status: apiStatus, error: 'moderation' }), + ) + const adapter = adapterWithFetch(fetchMock) + + expect(await adapter.getVideoStatus('req-123')).toEqual({ + jobId: 'req-123', + status: 'failed', + error: 'moderation', + }) + }) + + it('maps an unknown in-flight status to processing', async () => { + const fetchMock = mockFetch(() => jsonResponse({ status: 'generating' })) + const adapter = adapterWithFetch(fetchMock) + + expect((await adapter.getVideoStatus('req-123')).status).toBe( + 'processing', + ) + }) + + it('reports a 404 as a failed job rather than throwing', async () => { + const fetchMock = mockFetch(() => + jsonResponse( + { 
code: 'not-found', error: 'Failed to read static file.' }, + 404, + ), + ) + const adapter = adapterWithFetch(fetchMock) + + expect(await adapter.getVideoStatus('missing')).toEqual({ + jobId: 'missing', + status: 'failed', + error: 'Job not found', + }) + }) + + it('throws on non-404 API errors', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ error: 'server exploded' }, 500), + ) + const adapter = adapterWithFetch(fetchMock) + + await expect(adapter.getVideoStatus('req-123')).rejects.toThrow( + /video status request failed \(500/, + ) + }) + }) + + describe('getVideoUrl', () => { + it('returns the video URL with billed seconds and exact cost', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ + status: 'done', + progress: 100, + model: 'grok-imagine-video-1.5', + video: { + url: 'https://vidgen.x.ai/video.mp4', + duration: 5, + }, + usage: { cost_in_usd_ticks: 2_500_000_000 }, + }), + ) + const adapter = adapterWithFetch(fetchMock) + + expect(await adapter.getVideoUrl('req-123')).toEqual({ + jobId: 'req-123', + url: 'https://vidgen.x.ai/video.mp4', + usage: { + promptTokens: 0, + completionTokens: 0, + totalTokens: 0, + unitsBilled: 5, + cost: 0.25, + }, + }) + }) + + it('omits usage when the response carries none', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ + status: 'done', + video: { url: 'https://vidgen.x.ai/video.mp4' }, + }), + ) + const adapter = adapterWithFetch(fetchMock) + + expect(await adapter.getVideoUrl('req-123')).toEqual({ + jobId: 'req-123', + url: 'https://vidgen.x.ai/video.mp4', + }) + }) + + it('throws when the job is not finished yet', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ status: 'pending', progress: 40 }), + ) + const adapter = adapterWithFetch(fetchMock) + + await expect(adapter.getVideoUrl('req-123')).rejects.toThrow( + /not ready for download/, + ) + }) + + it('throws with the provider error when the job failed', async () => { + const fetchMock = 
mockFetch(() => + jsonResponse({ status: 'failed', error: 'moderation' }), + ) + const adapter = adapterWithFetch(fetchMock) + + await expect(adapter.getVideoUrl('req-123')).rejects.toThrow( + /Video generation failed: moderation/, + ) + }) + + it('throws a not-found error for unknown jobs', async () => { + const fetchMock = mockFetch(() => + jsonResponse({ code: 'not-found', error: 'nope' }, 404), + ) + const adapter = adapterWithFetch(fetchMock) + + await expect(adapter.getVideoUrl('missing')).rejects.toThrow( + /Video job not found: missing/, + ) + }) + }) + + describe('video provider option helpers', () => { + it('parses size templates', () => { + expect(parseGrokVideoSize('16:9_720p')).toEqual({ + aspectRatio: '16:9', + resolution: '720p', + }) + expect(parseGrokVideoSize('3:4')).toEqual({ aspectRatio: '3:4' }) + expect(parseGrokVideoSize('not-a-size')).toBeUndefined() + }) + + it('validates sizes', () => { + expect(() => validateVideoSize('m', '16:9')).not.toThrow() + expect(() => validateVideoSize('m', '2:3_1080p')).not.toThrow() + expect(() => validateVideoSize('m', undefined)).not.toThrow() + expect(() => validateVideoSize('m', '9:19.5')).toThrow(/not supported/) + expect(() => validateVideoSize('m', 'auto')).toThrow(/not supported/) + expect(() => validateVideoSize('m', '16:9_2k')).toThrow(/Resolution/) + }) + + it('exposes the 1–15s duration range via getGrokVideoDurationOptions', () => { + expect(getGrokVideoDurationOptions('grok-imagine-video')).toEqual({ + kind: 'range', + min: 1, + max: 15, + step: 1, + unit: 'seconds', + }) + expect(getGrokVideoDurationOptions('grok-imagine-video-1.5')).toEqual({ + kind: 'range', + min: 1, + max: 15, + step: 1, + unit: 'seconds', + }) + }) + + it('availableDurations / snapDuration coerce raw seconds into range', () => { + const adapter = createGrokVideo('grok-imagine-video', 'test-api-key') + expect(adapter.availableDurations()).toEqual({ + kind: 'range', + min: 1, + max: 15, + step: 1, + unit: 'seconds', + }) + 
expect(adapter.snapDuration(0)).toBe(1) + expect(adapter.snapDuration(16)).toBe(15) + expect(adapter.snapDuration(2.5)).toBe(3) + expect(adapter.snapDuration(7)).toBe(7) + }) + }) +}) diff --git a/packages/ai/skills/ai-core/media-generation/SKILL.md b/packages/ai/skills/ai-core/media-generation/SKILL.md index cae40b000..966e6253e 100644 --- a/packages/ai/skills/ai-core/media-generation/SKILL.md +++ b/packages/ai/skills/ai-core/media-generation/SKILL.md @@ -3,10 +3,10 @@ name: ai-core/media-generation description: > Image, audio, video, speech (TTS), and transcription generation using activity-specific adapters: generateImage() with openaiImage/geminiImage, - generateAudio() with geminiAudio/falAudio, generateVideo() with - openaiVideo/geminiVideo (async polling, per-model typed durations), - generateSpeech() with openaiSpeech, generateTranscription() with - openaiTranscription. React hooks: useGenerateImage, useGenerateAudio, + generateAudio() with geminiAudio/falAudio, generateVideo() with async + polling (openaiVideo/geminiVideo/grokVideo/falVideo, per-model typed + durations), generateSpeech() with openaiSpeech, generateTranscription() + with openaiTranscription. React hooks: useGenerateImage, useGenerateAudio, useGenerateSpeech, useTranscription, useGenerateVideo. TanStack Start server function integration with toServerSentEventsResponse. type: sub-skill @@ -454,6 +454,13 @@ const { jobId } = await generateVideo({ // (x-goog-api-key header or ?key= query parameter). 
``` +Other video adapters: `openaiVideo('sora-2')` (pixel sizes like `'1280x720'`, +durations of 4, 8, or 12s, a single `input_reference` image prompt part), `grokVideo(...)` +(`grok-imagine-video` supports text-to-video and image-to-video; `grok-imagine-video-1.5` is +image-to-video only, so it needs an `image` prompt part as the starting frame and a text-only prompt throws; +aspect-ratio size template like `'16:9_720p'`, integer durations of 1–15s, reports +`usage.unitsBilled` in seconds and the exact `usage.cost` in USD), and `falVideo(...)` (hosted models, see cost tracking below). + Client hook with job tracking: ```tsx diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index daec47674..c1fdd141c 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -555,6 +555,9 @@ importers: '@tanstack/ai-gemini': specifier: workspace:* version: link:../../packages/ai-gemini + '@tanstack/ai-grok': + specifier: workspace:* + version: link:../../packages/ai-grok '@tanstack/react-router': specifier: ^1.158.4 version: 1.159.5(react-dom@19.2.3(react@19.2.3))(react@19.2.3)