TanStack
diff --git a/.changeset/image-and-video-inputs.md b/.changeset/image-and-video-inputs.md
Lines changed: 4 additions & 1 deletion
diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md
Lines changed: 1 addition & 1 deletion
diff --git a/examples/ts-react-media/src/components/ImageGenerator.tsx b/examples/ts-react-media/src/components/ImageGenerator.tsx
Lines changed: 82 additions & 4 deletions
diff --git a/examples/ts-react-media/src/components/VideoGenerator.tsx b/examples/ts-react-media/src/components/VideoGenerator.tsx
Lines changed: 16 additions & 11 deletions
diff --git a/examples/ts-react-media/src/lib/media.ts b/examples/ts-react-media/src/lib/media.ts
Lines changed: 78 additions & 0 deletions
@@ -5,6 +5,7 @@
 '@tanstack/ai-fal': minor
 '@tanstack/ai-grok': minor
 '@tanstack/ai-openrouter': minor
+'@tanstack/ai-client': minor
 '@tanstack/ai-event-client': patch
 ---
 
@@ -17,11 +18,13 @@ Provider behavior in this release:
 - **OpenAI image** — Prompts with image parts route `gpt-image-2` / `gpt-image-1` / `gpt-image-1-mini` to `images.edit()` (up to 16 source images plus optional mask); `dall-e-2` routes to `images.edit()` with one source image; `dall-e-3` rejects image parts at compile time and at runtime.
 - **OpenAI video** — Sora-2 / Sora-2-Pro accept a single image part as `input_reference`; passing more than one throws.
 - **Gemini image** — Native models (`gemini-*-flash-image`, "nano-banana") map prompt parts 1:1 onto multimodal `contents`, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
-- **fal.ai** — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → `image_urls`, Kling i2v start frame → `image_url`, Veo first-last-frame → `first_frame_url` / `last_frame_url`). Defaults for endpoints not in the map: single → `image_url`, multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`; video `role: 'start_frame'` / `'end_frame'` → `start_image_url` / `end_image_url`. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with `pnpm generate:fal-image-fields` (a unit test fails when it goes stale).
+- **fal.ai** — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → `image_urls`, Kling i2v start frame → `image_url`, Veo first-last-frame → `first_frame_url` / `last_frame_url`). Defaults for endpoints not in the map: single → `image_url`, multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`; video `role: 'start_frame'` / `'end_frame'` → `start_image_url` / `end_image_url`. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with `pnpm generate:fal-image-fields` (a unit test fails when it goes stale). In `FalImageProviderOptions` / `FalVideoProviderOptions`, media-conditioning fields the mappers can populate (`image_url`, `start_image_url`, `video_url`, `audio_url`, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via `modelOptions`.
 - **Grok** — New `grok-imagine-image` / `grok-imagine-image-quality` models. Prompts with image parts route to xAI's JSON `/v1/images/edits` endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). `role: 'mask'` / `'control'` throw. Their `size` uses an `aspectRatio_resolution` template (`'16:9_2k'`, suffix optional) mirroring Gemini's native image models. `grok-2-image-1212` remains text-to-image only.
 - **OpenRouter** — Prompt parts map 1:1 onto multimodal `text` / `image_url` chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); `data` sources become data URIs.
 - **Anthropic** — Unchanged (no image generation API).
 
 A new `resolveMediaPrompt()` utility (exported from `@tanstack/ai`) is, for adapter authors, the single point that converts the canonical interleaved prompt shape into flattened text plus per-modality part buckets.
 
+On the client side, `ImageGenerateInput.prompt` and `VideoGenerateInput.prompt` (`@tanstack/ai-client`, and the `useGenerateImage` / `useGenerateVideo` hooks built on them) are widened from `string` to the same `MediaPrompt` shape, so prompt parts can be sent from the browser through your server route to `generateImage()` / `generateVideo()`.
+
 Closes #618.
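For adapter authors, the two prompt-handling strategies described above (flattening into text plus per-modality buckets, as `resolveMediaPrompt()` does, versus mapping parts 1:1 as the Gemini native and OpenRouter adapters do) can be contrasted in a short sketch. It is not the library's implementation: the part shapes are simplified from those used elsewhere in this PR, `'video'` / `'audio'` are assumed modality names, and `flattenPrompt`, `toChatContent`, and the blank-line text separator are hypothetical. The chat content shape (`{ type: 'image_url', image_url: { url } }`) is the OpenAI-style format that OpenRouter accepts.

```typescript
// Simplified, hypothetical part shapes modeled on the ones used in this PR.
type PartSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }
type TextPart = { type: 'text'; content: string }
// 'video' and 'audio' are assumed modality names, for illustration only.
type MediaPart = { type: 'image' | 'video' | 'audio'; source: PartSource }
type PromptInput = string | Array<TextPart | MediaPart>

// Strategy 1 (hypothetical): flatten into text plus per-modality buckets.
interface FlattenedPrompt {
  text: string
  image: Array<MediaPart>
  video: Array<MediaPart>
  audio: Array<MediaPart>
}

function flattenPrompt(prompt: PromptInput): FlattenedPrompt {
  const out: FlattenedPrompt = { text: '', image: [], video: [], audio: [] }
  if (typeof prompt === 'string') return { ...out, text: prompt }
  const texts: Array<string> = []
  for (const part of prompt) {
    if (part.type === 'text') texts.push(part.content)
    else out[part.type].push(part) // order is kept within each bucket
  }
  // Joining text parts with a blank line is an assumption of this sketch.
  return { ...out, text: texts.join('\n\n') }
}

// Strategy 2 (hypothetical): map parts 1:1 onto OpenAI-style chat content
// parts, preserving the interleaved order.
type ChatContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

function toChatContent(prompt: PromptInput): Array<ChatContentPart> {
  if (typeof prompt === 'string') return [{ type: 'text', text: prompt }]
  const content: Array<ChatContentPart> = []
  for (const part of prompt) {
    if (part.type === 'text') {
      content.push({ type: 'text', text: part.content })
    } else if (part.type === 'image') {
      const { source } = part
      // URL sources pass through verbatim; data sources become data URIs.
      const url =
        source.type === 'url'
          ? source.value
          : `data:${source.mimeType};base64,${source.value}`
      content.push({ type: 'image_url', image_url: { url } })
    }
    // Non-image media parts are skipped in this sketch.
  }
  return content
}
```

Only the 1:1 mapping preserves how text and images were interleaved; flattening keeps the order within each bucket but not across them.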
@@ -444,7 +444,7 @@ await generateVideo({
 | Provider     | Image-to-Video Behavior                                                                                  |
 | ------------ | -------------------------------------------------------------------------------------------------------- |
 | **OpenAI**   | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only — throws if more than one. |
-| **fal.ai**   | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions`. |
+| **fal.ai**   | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions` — the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
 | **Gemini**   | Veo adapter not yet implemented — image prompt parts will be supported when Veo lands.                    |
 
 Adapters whose underlying API can't accept image inputs throw a clear
 
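The fal.ai lookup order in the table (generated per-endpoint map first, role-based defaults second) can be sketched as below. The endpoint ids and map entries are placeholders rather than real fal endpoints, and `resolveFalField` is a hypothetical name; the real map is generated from the fal SDK's endpoint types by `pnpm generate:fal-image-fields`. Only the default field names are taken from this PR's changeset and docs.

```typescript
type FalRole =
  | 'mask'
  | 'control'
  | 'reference'
  | 'character'
  | 'start_frame'
  | 'end_frame'

// Hypothetical excerpt of a generated per-endpoint map. The endpoint ids are
// placeholders; 'default' covers inputs that carry no role.
const ENDPOINT_FIELDS: Record<
  string,
  Partial<Record<FalRole | 'default', string>>
> = {
  'example/kling-image-to-video': { default: 'image_url', start_frame: 'image_url' },
  'example/first-last-frame': { start_frame: 'first_frame_url', end_frame: 'last_frame_url' },
  'example/nano-banana-edit': { default: 'image_urls' },
}

// Role-based defaults for endpoints missing from the map (per the changeset).
const ROLE_DEFAULTS: Record<FalRole, string> = {
  mask: 'mask_url',
  control: 'control_image_url',
  reference: 'reference_image_urls',
  character: 'reference_image_urls',
  start_frame: 'start_image_url',
  end_frame: 'end_image_url',
}

function resolveFalField(
  endpoint: string,
  role: FalRole | undefined,
  imageCount: number,
): string {
  // 1. A per-endpoint entry wins.
  const mapped = ENDPOINT_FIELDS[endpoint]?.[role ?? 'default']
  if (mapped) return mapped
  // 2. Otherwise fall back to the role default.
  if (role) return ROLE_DEFAULTS[role]
  // 3. No role: a single image goes to image_url, several to image_urls.
  return imageCount > 1 ? 'image_urls' : 'image_url'
}
```

Because the map is consulted first, a stale map degrades quietly to the defaults instead of failing, which is presumably why the changeset pairs the generator with a unit test that fails when the map goes stale.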
@@ -1,10 +1,13 @@
-import { useState } from 'react'
-import { ImageIcon, Loader2, Shuffle } from 'lucide-react'
+import { useRef, useState } from 'react'
+import { ImageIcon, Loader2, Plus, Shuffle, X } from 'lucide-react'
 import type { ImageGenerationResult } from '@tanstack/ai'
+import type { MediaPrompt } from '@tanstack/ai/client'
 
 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
+import { readImageFile, toImagePart } from '@/lib/media'
+import type { AttachedImage } from '@/lib/media'
 
 interface ImageGeneratorProps {
   onImageGenerated?: (imageUrl: string) => void
@@ -32,11 +35,37 @@ export default function ImageGenerator({
   const [selectedModel, setSelectedModel] = useState<string>('all')
   const [isLoading, setIsLoading] = useState(false)
   const [results, setResults] = useState<Record<string, ModelResult>>({})
+  const [images, setImages] = useState<Array<AttachedImage>>([])
+  const fileInputRef = useRef<HTMLInputElement>(null)
 
   const currentModel = IMAGE_MODELS.find((m) => m.id === selectedModel)
 
+  // When images are attached, send an ordered parts array (text first, then one
+  // image part per attachment). Otherwise send the plain string. Only image-capable
+  // models accept image inputs — unsupported models surface a server error.
+  const buildPrompt = (): MediaPrompt => {
+    if (images.length === 0) return prompt
+    return [
+      { type: 'text', content: prompt },
+      ...images.map((image) => toImagePart(image)),
+    ]
+  }
+
+  const handleImageSelect = async (e: React.ChangeEvent<HTMLInputElement>) => {
+    const files = Array.from(e.target.files ?? [])
+    if (fileInputRef.current) fileInputRef.current.value = ''
+    if (files.length === 0) return
+    const attached = await Promise.all(files.map((file) => readImageFile(file)))
+    setImages((prev) => [...prev, ...attached])
+  }
+
+  const removeImage = (id: string) => {
+    setImages((prev) => prev.filter((image) => image.id !== id))
+  }
+
   const handleGenerate = async () => {
     if (!prompt.trim()) return
+    const builtPrompt = buildPrompt()
 
     setIsLoading(true)
     setResults({})
@@ -53,7 +82,7 @@ export default function ImageGenerator({
       const promises = IMAGE_MODELS.map(async (model) => {
         try {
           const response = await generateImageFn({
-            data: { prompt, model: model.id },
+            data: { prompt: builtPrompt, model: model.id },
           })
           setResults((prev) => ({
             ...prev,
@@ -83,7 +112,7 @@ export default function ImageGenerator({
 
       try {
         const response = await generateImageFn({
-          data: { prompt, model: selectedModel },
+          data: { prompt: builtPrompt, model: selectedModel },
         })
         setResults({ [selectedModel]: { status: 'success', result: response } })
         const image = response.images[0]
@@ -162,6 +191,55 @@ export default function ImageGenerator({
           />
         </div>
 
+        <div>
+          <div className="flex items-center justify-between mb-2">
+            <label className="text-sm font-medium text-gray-300">
+              Reference Images
+            </label>
+            <span className="text-xs text-gray-500">
+              Only image-capable models (e.g. Gemini nano-banana) accept these
+            </span>
+          </div>
+          <div className="flex flex-wrap gap-2">
+            {images.map((image) => (
+              <div
+                key={image.id}
+                className="relative w-20 h-20 rounded-lg overflow-hidden border border-gray-700"
+              >
+                <img
+                  src={image.dataUrl}
+                  alt={image.name}
+                  className="w-full h-full object-cover"
+                />
+                <button
+                  onClick={() => removeImage(image.id)}
+                  disabled={isLoading}
+                  className="absolute top-1 right-1 p-0.5 bg-gray-900/80 hover:bg-gray-800 rounded-full text-white disabled:opacity-50"
+                  aria-label={`Remove ${image.name}`}
+                >
+                  <X className="w-3.5 h-3.5" />
+                </button>
+              </div>
+            ))}
+            <button
+              onClick={() => fileInputRef.current?.click()}
+              disabled={isLoading}
+              className="w-20 h-20 flex flex-col items-center justify-center gap-1 border-2 border-dashed border-gray-600 hover:border-gray-500 rounded-lg text-gray-400 hover:text-gray-300 transition-colors disabled:opacity-50"
+            >
+              <Plus className="w-5 h-5" />
+              <span className="text-xs">Add</span>
+            </button>
+          </div>
+          <input
+            ref={fileInputRef}
+            type="file"
+            accept="image/*"
+            multiple
+            onChange={handleImageSelect}
+            className="hidden"
+          />
+        </div>
+
         <button
           onClick={handleGenerate}
           disabled={isLoading || !prompt.trim()}
 
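Since unsupported models only fail once the request reaches the server, a client could reject obviously invalid attachment counts up front. A minimal sketch, assuming a hand-maintained table: the limits come from this PR's changeset (up to 16 source images for the `gpt-image-*` models, one for `dall-e-2`, up to three for the Grok Imagine models, none for `dall-e-3` and `grok-2-image-1212`), while `MAX_SOURCE_IMAGES` and `checkAttachments` are hypothetical and not part of the example app, whose `IMAGE_MODELS` ids may not match these provider model names.

```typescript
// Source-image limits as stated in this PR's changeset; 0 means text-only.
// Models missing from the table are left to the server to validate.
const MAX_SOURCE_IMAGES: Partial<Record<string, number>> = {
  'gpt-image-2': 16,
  'gpt-image-1': 16,
  'gpt-image-1-mini': 16,
  'dall-e-2': 1,
  'dall-e-3': 0,
  'grok-imagine-image': 3,
  'grok-imagine-image-quality': 3,
  'grok-2-image-1212': 0,
}

// Hypothetical pre-check: returns an error message, or null when the
// count is allowed or the model is unknown.
function checkAttachments(modelId: string, imageCount: number): string | null {
  const max = MAX_SOURCE_IMAGES[modelId]
  if (max === undefined || imageCount <= max) return null
  if (max === 0) return `${modelId} does not accept image inputs`
  return `${modelId} accepts at most ${max} source image(s), got ${imageCount}`
}
```

Unknown models return `null`, so the server-side adapter remains the source of truth; the table would need updating whenever an adapter's limits change.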
@@ -9,6 +9,7 @@ import {
 } from '@/lib/server-functions'
 import { VIDEO_MODELS } from '@/lib/models'
 import { getRandomVideoPrompt } from '@/lib/prompts'
+import { imageUrlToPart, readImageFile } from '@/lib/media'
 
 type JobState =
   | { status: 'idle' }
@@ -61,15 +62,12 @@ export default function VideoGenerator({
     }
   }, [])
 
-  const handleImageSelect = (e: React.ChangeEvent<HTMLInputElement>) => {
+  const handleImageSelect = async (e: React.ChangeEvent<HTMLInputElement>) => {
     const file = e.target.files?.[0]
+    if (fileInputRef.current) fileInputRef.current.value = ''
     if (!file) return
-
-    const reader = new FileReader()
-    reader.onload = (event) => {
-      setImagePreview(event.target?.result as string)
-    }
-    reader.readAsDataURL(file)
+    const attached = await readImageFile(file)
+    setImagePreview(attached.dataUrl)
   }
 
   const clearImage = () => {
@@ -136,13 +134,20 @@ export default function VideoGenerator({
     }))
 
     try {
-      const imageUrl =
-        mode === 'image-to-video' ? (imagePreview ?? undefined) : undefined
+      // Image-to-video sends the start frame as a prompt part — the fal
+      // adapter routes `role: 'start_frame'` to the endpoint's start-image
+      // field (e.g. `image_url` on Kling i2v).
+      const builtPrompt =
+        mode === 'image-to-video' && imagePreview
+          ? [
+              { type: 'text' as const, content: prompt },
+              imageUrlToPart(imagePreview, { role: 'start_frame' }),
+            ]
+          : prompt
       const result = await createVideoJobFn({
         data: {
-          prompt,
+          prompt: builtPrompt,
           model: modelId,
-          ...(imageUrl !== undefined && { imageUrl }),
         },
       })
 
 
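The image-to-video branch above attaches only a start frame. For first-last-frame endpoints the same pattern extends to an end frame tagged `role: 'end_frame'`, which the fal adapter resolves to the endpoint's end-frame field (`last_frame_url` on first-last-frame endpoints, `end_image_url` by default). A minimal sketch with simplified local types; `buildVideoPrompt` is a hypothetical helper, and Sora-2 accepts only one image part, so a two-frame prompt would make that adapter throw.

```typescript
type FrameSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

type VideoPromptPart =
  | { type: 'text'; content: string }
  | {
      type: 'image'
      source: FrameSource
      metadata: { role: 'start_frame' | 'end_frame' }
    }

// Hypothetical helper: text first, then optional start and end frames.
// Falls back to the plain string when no frames are given, as the example does.
function buildVideoPrompt(
  text: string,
  frames: { start?: FrameSource; end?: FrameSource } = {},
): string | Array<VideoPromptPart> {
  const parts: Array<VideoPromptPart> = [{ type: 'text', content: text }]
  if (frames.start) {
    parts.push({ type: 'image', source: frames.start, metadata: { role: 'start_frame' } })
  }
  if (frames.end) {
    parts.push({ type: 'image', source: frames.end, metadata: { role: 'end_frame' } })
  }
  return parts.length === 1 ? text : parts
}
```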
@@ -0,0 +1,78 @@
+import type { MediaInputMetadata, MediaPromptPart } from '@tanstack/ai/client'
+
+/**
+ * An image the user attached as conditioning input. `dataUrl` is the full
+ * `data:<mime>;base64,...` string used directly for the thumbnail preview;
+ * `base64` is the same payload with the prefix stripped for the prompt part.
+ */
+export interface AttachedImage {
+  id: string
+  name: string
+  mimeType: string
+  /** Full data URL, used for the <img> preview. */
+  dataUrl: string
+  /** Base64 payload without the `data:` prefix, used for the prompt part. */
+  base64: string
+}
+
+/** Reads a File into an AttachedImage (data URL preview + raw base64 payload). */
+export function readImageFile(file: File): Promise<AttachedImage> {
+  return new Promise((resolve, reject) => {
+    const reader = new FileReader()
+    reader.onerror = () =>
+      reject(reader.error ?? new Error('Failed to read file'))
+    reader.onload = () => {
+      const dataUrl = reader.result
+      if (typeof dataUrl !== 'string') {
+        reject(new Error('Unexpected file read result'))
+        return
+      }
+      const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1)
+      resolve({
+        id: crypto.randomUUID(),
+        name: file.name,
+        mimeType: file.type,
+        dataUrl,
+        base64,
+      })
+    }
+    reader.readAsDataURL(file)
+  })
+}
+
+/** Builds an image prompt part from an attached image, with optional role hint. */
+export function toImagePart(
+  image: AttachedImage,
+  metadata?: MediaInputMetadata,
+): MediaPromptPart {
+  return {
+    type: 'image',
+    source: { type: 'data', value: image.base64, mimeType: image.mimeType },
+    ...(metadata ? { metadata } : {}),
+  }
+}
+
+/**
+ * Builds an image prompt part from a URL string — either a remote URL
+ * (passed through as a `url` source) or a `data:` URL (decomposed into a
+ * `data` source so adapters that upload files get the raw payload).
+ */
+export function imageUrlToPart(
+  url: string,
+  metadata?: MediaInputMetadata,
+): MediaPromptPart {
+  const meta = metadata ? { metadata } : {}
+  if (!url.startsWith('data:')) {
+    return { type: 'image', source: { type: 'url', value: url }, ...meta }
+  }
+  const comma = url.indexOf(',')
+  const [mimeType, ...params] = (comma === -1 ? '' : url.slice(5, comma)).split(';')
+  if (!mimeType || !params.includes('base64')) {
+    throw new Error('Expected a base64 data: URL with a mime type')
+  }
+  return {
+    type: 'image',
+    source: { type: 'data', value: url.slice(comma + 1), mimeType },
+    ...meta,
+  }
+}
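`ImageGenerator` builds its parts with `toImagePart()` from the stored base64 payload, while `VideoGenerator` re-derives them from the data URL with `imageUrlToPart()`. For the same file, both paths should yield the same `data` source. The sketch below checks that with self-contained copies of the two decompositions; `sourceFromBase64` and `sourceFromDataUrl` are hypothetical names, and rejecting non-base64 data URLs is an added assumption (`FileReader.readAsDataURL()` always emits base64).

```typescript
interface DataPartSource {
  type: 'data'
  value: string
  mimeType: string
}

// Mirrors toImagePart(): the payload was already stripped of its prefix.
function sourceFromBase64(base64: string, mimeType: string): DataPartSource {
  return { type: 'data', value: base64, mimeType }
}

// Mirrors the data-URL branch of imageUrlToPart(): split the
// `data:<mime>[;param...],<payload>` header from the payload.
function sourceFromDataUrl(url: string): DataPartSource {
  const comma = url.indexOf(',')
  if (!url.startsWith('data:') || comma === -1) {
    throw new Error('Not a data: URL')
  }
  const [mimeType, ...params] = url.slice(5, comma).split(';')
  // Assumption: only base64 payloads are accepted, matching FileReader output.
  if (!mimeType || !params.includes('base64')) {
    throw new Error('Expected a base64 data: URL with a mime type')
  }
  return { type: 'data', value: url.slice(comma + 1), mimeType }
}
```

Keeping `base64` alongside `dataUrl` in `AttachedImage`, as `readImageFile()` does, avoids re-parsing; the data-URL path is only needed where, as in `VideoGenerator`, the preview string is all that is kept.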