
Commit 349eb0d

tombeckenham and claude committed
feat: client-side multimodal prompts, e2e coverage, media example, fal field demotion
- ai-client: widen ImageGenerateInput.prompt / VideoGenerateInput.prompt from string to MediaPrompt so useGenerateImage/useGenerateVideo can carry image parts from the browser; re-export the MediaPrompt types from @tanstack/ai/client
- ai-fal: demote media-conditioning fields (FalImageFieldName set plus video_url/video_urls/reference_video_urls/audio_url) from required to optional in FalImageProviderOptions / FalVideoProviderOptions — i2v endpoints declare e.g. image_url as required, but with a multimodal prompt the start frame arrives as a prompt part; modelOptions stays available as the explicit escape hatch
- e2e: real coverage for image-to-image (OpenAI /v1/images/edits) and image-to-video (Sora multipart /v1/videos with input_reference) — the installed aimock 1.29 mocks both multipart endpoints, so the previous "aimock can't mock this" empty provider sets were stale. New specs run all three transports and assert via aimock's request journal that the expected wire endpoint was hit. ImageGenUI/VideoGenUI gain a file input, feature routing/fixtures/onVideo registration added, README matrix updated
- examples/ts-react-media: ImageGenerator gains a multi-image reference picker (Gemini native models); VideoGenerator sends the start frame as a prompt part with role 'start_frame' instead of modelOptions URLs; server functions narrow the wire prompt per model and throw on unsupported part kinds instead of dropping them

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 36d7683 commit 349eb0d

26 files changed

Lines changed: 731 additions & 93 deletions

.changeset/image-and-video-inputs.md

Lines changed: 4 additions & 1 deletion
@@ -5,6 +5,7 @@
 '@tanstack/ai-fal': minor
 '@tanstack/ai-grok': minor
 '@tanstack/ai-openrouter': minor
+'@tanstack/ai-client': minor
 '@tanstack/ai-event-client': patch
 ---
 
@@ -17,11 +18,13 @@ Provider behavior in this release:
 - **OpenAI image** — Prompts with image parts route `gpt-image-2` / `gpt-image-1` / `gpt-image-1-mini` to `images.edit()` (up to 16 source images plus optional mask); `dall-e-2` routes to `images.edit()` with one source image; `dall-e-3` rejects image parts at compile time and at runtime.
 - **OpenAI video** — Sora-2 / Sora-2-Pro accept a single image part as `input_reference`; passing more than one throws.
 - **Gemini image** — Native models (`gemini-*-flash-image`, "nano-banana") map prompt parts 1:1 onto multimodal `contents`, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
-- **fal.ai** — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → `image_urls`, Kling i2v start frame → `image_url`, Veo first-last-frame → `first_frame_url` / `last_frame_url`). Defaults for endpoints not in the map: single → `image_url`, multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`; video `role: 'start_frame'` / `'end_frame'` → `start_image_url` / `end_image_url`. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with `pnpm generate:fal-image-fields` (a unit test fails when it goes stale).
+- **fal.ai** — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → `image_urls`, Kling i2v start frame → `image_url`, Veo first-last-frame → `first_frame_url` / `last_frame_url`). Defaults for endpoints not in the map: single → `image_url`, multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`; video `role: 'start_frame'` / `'end_frame'` → `start_image_url` / `end_image_url`. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with `pnpm generate:fal-image-fields` (a unit test fails when it goes stale). In `FalImageProviderOptions` / `FalVideoProviderOptions`, media-conditioning fields the mappers can populate (`image_url`, `start_image_url`, `video_url`, `audio_url`, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via `modelOptions`.
 - **Grok** — New `grok-imagine-image` / `grok-imagine-image-quality` models. Prompts with image parts route to xAI's JSON `/v1/images/edits` endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). `role: 'mask'` / `'control'` throw. Their `size` uses an `aspectRatio_resolution` template (`'16:9_2k'`, suffix optional) mirroring Gemini's native image models. `grok-2-image-1212` remains text-to-image only.
 - **OpenRouter** — Prompt parts map 1:1 onto multimodal `text` / `image_url` chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); `data` sources become data URIs.
 - **Anthropic** — Unchanged (no image generation API).
 
 A new `resolveMediaPrompt()` utility (exported from `@tanstack/ai`) is the single downrev point from the canonical interleaved prompt shape to flattened text + per-modality part buckets, for adapter authors.
 
+On the client side, `ImageGenerateInput.prompt` and `VideoGenerateInput.prompt` (`@tanstack/ai-client`, and the `useGenerateImage` / `useGenerateVideo` hooks built on them) are widened from `string` to the same `MediaPrompt` shape, so prompt parts can be sent from the browser through your server route to `generateImage()` / `generateVideo()`.
+
 Closes #618.
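
The fal.ai fallback rules in the changeset text above (for endpoints that are not in the generated map) can be read as a small role-to-field lookup. The following is a minimal sketch of that reading, not the adapter's actual code: `ImageInput`, `FalFields`, and `defaultFalFields` are hypothetical names, each image is assumed to be already reduced to a URL string, and treating role-less images as the "single / multiple" case is an assumption.

```typescript
// Hypothetical stand-in types; the real ones live in @tanstack/ai and may differ.
type ImageRole =
  | 'mask'
  | 'control'
  | 'reference'
  | 'character'
  | 'start_frame'
  | 'end_frame'

interface ImageInput {
  url: string
  role?: ImageRole
}

type FalFields = Record<string, string | Array<string>>

// Fallback field names for endpoints that are NOT in the generated map,
// following the defaults listed in the changeset text.
function defaultFalFields(images: Array<ImageInput>): FalFields {
  const fields: FalFields = {}
  const unroled: Array<string> = []
  const references: Array<string> = []

  for (const image of images) {
    switch (image.role) {
      case 'mask':
        fields.mask_url = image.url
        break
      case 'control':
        fields.control_image_url = image.url
        break
      case 'reference':
      case 'character':
        references.push(image.url)
        break
      case 'start_frame':
        fields.start_image_url = image.url
        break
      case 'end_frame':
        fields.end_image_url = image.url
        break
      default:
        // Assumption: images without a role are the plain source images.
        unroled.push(image.url)
    }
  }

  if (references.length > 0) fields.reference_image_urls = references

  // single -> image_url, multiple -> image_urls
  const [first, ...rest] = unroled
  if (first !== undefined) {
    if (rest.length === 0) fields.image_url = first
    else fields.image_urls = unroled
  }
  return fields
}
```

Per the changeset, endpoints that are in the generated map override these names (for example Kling i2v uses `image_url` for the start frame), so a lookup like this would only apply when no per-endpoint entry exists.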

docs/media/video-generation.md

Lines changed: 1 addition & 1 deletion
@@ -444,7 +444,7 @@ await generateVideo({
 | Provider | Image-to-Video Behavior |
 | ------------ | -------------------------------------------------------------------------------------------------------- |
 | **OpenAI** | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only — throws if more than one. |
-| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions`. |
+| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions` — the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
 | **Gemini** | Veo adapter not yet implemented — image prompt parts will be supported when Veo lands. |
 
 Adapters whose underlying API can't accept image inputs throw a clear

examples/ts-react-media/src/components/ImageGenerator.tsx

Lines changed: 82 additions & 4 deletions
@@ -1,10 +1,13 @@
-import { useState } from 'react'
-import { ImageIcon, Loader2, Shuffle } from 'lucide-react'
+import { useRef, useState } from 'react'
+import { ImageIcon, Loader2, Plus, Shuffle, X } from 'lucide-react'
 import type { ImageGenerationResult } from '@tanstack/ai'
+import type { MediaPrompt } from '@tanstack/ai/client'
 
 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
+import { readImageFile, toImagePart } from '@/lib/media'
+import type { AttachedImage } from '@/lib/media'
 
 interface ImageGeneratorProps {
   onImageGenerated?: (imageUrl: string) => void
@@ -32,11 +35,37 @@ export default function ImageGenerator({
   const [selectedModel, setSelectedModel] = useState<string>('all')
   const [isLoading, setIsLoading] = useState(false)
   const [results, setResults] = useState<Record<string, ModelResult>>({})
+  const [images, setImages] = useState<Array<AttachedImage>>([])
+  const fileInputRef = useRef<HTMLInputElement>(null)
 
   const currentModel = IMAGE_MODELS.find((m) => m.id === selectedModel)
 
+  // When images are attached, send an ordered parts array (text first, then one
+  // image part per attachment). Otherwise send the plain string. Only image-capable
+  // models accept image inputs — unsupported models surface a server error.
+  const buildPrompt = (): MediaPrompt => {
+    if (images.length === 0) return prompt
+    return [
+      { type: 'text', content: prompt },
+      ...images.map((image) => toImagePart(image)),
+    ]
+  }
+
+  const handleImageSelect = async (e: React.ChangeEvent<HTMLInputElement>) => {
+    const files = Array.from(e.target.files ?? [])
+    if (fileInputRef.current) fileInputRef.current.value = ''
+    if (files.length === 0) return
+    const attached = await Promise.all(files.map((file) => readImageFile(file)))
+    setImages((prev) => [...prev, ...attached])
+  }
+
+  const removeImage = (id: string) => {
+    setImages((prev) => prev.filter((image) => image.id !== id))
+  }
+
   const handleGenerate = async () => {
     if (!prompt.trim()) return
+    const builtPrompt = buildPrompt()
 
     setIsLoading(true)
     setResults({})
@@ -53,7 +82,7 @@ export default function ImageGenerator({
       const promises = IMAGE_MODELS.map(async (model) => {
         try {
           const response = await generateImageFn({
-            data: { prompt, model: model.id },
+            data: { prompt: builtPrompt, model: model.id },
           })
           setResults((prev) => ({
             ...prev,
@@ -83,7 +112,7 @@ export default function ImageGenerator({
 
     try {
       const response = await generateImageFn({
-        data: { prompt, model: selectedModel },
+        data: { prompt: builtPrompt, model: selectedModel },
       })
       setResults({ [selectedModel]: { status: 'success', result: response } })
       const image = response.images[0]
@@ -162,6 +191,55 @@ export default function ImageGenerator({
           />
         </div>
 
+        <div>
+          <div className="flex items-center justify-between mb-2">
+            <label className="text-sm font-medium text-gray-300">
+              Reference Images
+            </label>
+            <span className="text-xs text-gray-500">
+              Supported by Gemini native (NanoBanana) models only
+            </span>
+          </div>
+          <div className="flex flex-wrap gap-2">
+            {images.map((image) => (
+              <div
+                key={image.id}
+                className="relative w-20 h-20 rounded-lg overflow-hidden border border-gray-700"
+              >
+                <img
+                  src={image.dataUrl}
+                  alt={image.name}
+                  className="w-full h-full object-cover"
+                />
+                <button
+                  onClick={() => removeImage(image.id)}
+                  disabled={isLoading}
+                  className="absolute top-1 right-1 p-0.5 bg-gray-900/80 hover:bg-gray-800 rounded-full text-white disabled:opacity-50"
+                  aria-label={`Remove ${image.name}`}
+                >
+                  <X className="w-3.5 h-3.5" />
+                </button>
+              </div>
+            ))}
+            <button
+              onClick={() => fileInputRef.current?.click()}
+              disabled={isLoading}
+              className="w-20 h-20 flex flex-col items-center justify-center gap-1 border-2 border-dashed border-gray-600 hover:border-gray-500 rounded-lg text-gray-400 hover:text-gray-300 transition-colors disabled:opacity-50"
+            >
+              <Plus className="w-5 h-5" />
+              <span className="text-xs">Add</span>
+            </button>
+          </div>
+          <input
+            ref={fileInputRef}
+            type="file"
+            accept="image/*"
+            multiple
+            onChange={handleImageSelect}
+            className="hidden"
+          />
+        </div>
+
         <button
           onClick={handleGenerate}
           disabled={isLoading || !prompt.trim()}

examples/ts-react-media/src/components/VideoGenerator.tsx

Lines changed: 16 additions & 11 deletions
@@ -9,6 +9,7 @@ import {
 } from '@/lib/server-functions'
 import { VIDEO_MODELS } from '@/lib/models'
 import { getRandomVideoPrompt } from '@/lib/prompts'
+import { imageUrlToPart, readImageFile } from '@/lib/media'
 
 type JobState =
   | { status: 'idle' }
@@ -61,15 +62,12 @@ export default function VideoGenerator({
     }
   }, [])
 
-  const handleImageSelect = (e: React.ChangeEvent<HTMLInputElement>) => {
+  const handleImageSelect = async (e: React.ChangeEvent<HTMLInputElement>) => {
     const file = e.target.files?.[0]
+    if (fileInputRef.current) fileInputRef.current.value = ''
     if (!file) return
-
-    const reader = new FileReader()
-    reader.onload = (event) => {
-      setImagePreview(event.target?.result as string)
-    }
-    reader.readAsDataURL(file)
+    const attached = await readImageFile(file)
+    setImagePreview(attached.dataUrl)
   }
 
   const clearImage = () => {
@@ -136,13 +134,20 @@ export default function VideoGenerator({
     }))
 
     try {
-      const imageUrl =
-        mode === 'image-to-video' ? (imagePreview ?? undefined) : undefined
+      // Image-to-video sends the start frame as a prompt part — the fal
+      // adapter routes `role: 'start_frame'` to the endpoint's start-image
+      // field (e.g. `image_url` on Kling i2v).
+      const builtPrompt =
+        mode === 'image-to-video' && imagePreview
+          ? [
+              { type: 'text' as const, content: prompt },
+              imageUrlToPart(imagePreview, { role: 'start_frame' }),
+            ]
+          : prompt
       const result = await createVideoJobFn({
         data: {
-          prompt,
+          prompt: builtPrompt,
           model: modelId,
-          ...(imageUrl !== undefined && { imageUrl }),
         },
       })

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+import type { MediaInputMetadata, MediaPromptPart } from '@tanstack/ai/client'
+
+/**
+ * An image the user attached as conditioning input. `dataUrl` is the full
+ * `data:<mime>;base64,...` string used directly for the thumbnail preview;
+ * `base64` is the same payload with the prefix stripped for the prompt part.
+ */
+export interface AttachedImage {
+  id: string
+  name: string
+  mimeType: string
+  /** Full data URL, used for the <img> preview. */
+  dataUrl: string
+  /** Base64 payload without the `data:` prefix, used for the prompt part. */
+  base64: string
+}
+
+/** Reads a File into an AttachedImage (data URL preview + raw base64 payload). */
+export function readImageFile(file: File): Promise<AttachedImage> {
+  return new Promise((resolve, reject) => {
+    const reader = new FileReader()
+    reader.onerror = () =>
+      reject(reader.error ?? new Error('Failed to read file'))
+    reader.onload = () => {
+      const dataUrl = reader.result
+      if (typeof dataUrl !== 'string') {
+        reject(new Error('Unexpected file read result'))
+        return
+      }
+      const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1)
+      resolve({
+        id: crypto.randomUUID(),
+        name: file.name,
+        mimeType: file.type,
+        dataUrl,
+        base64,
+      })
+    }
+    reader.readAsDataURL(file)
+  })
+}
+
+/** Builds an image prompt part from an attached image, with optional role hint. */
+export function toImagePart(
+  image: AttachedImage,
+  metadata?: MediaInputMetadata,
+): MediaPromptPart {
+  return {
+    type: 'image',
+    source: { type: 'data', value: image.base64, mimeType: image.mimeType },
+    ...(metadata ? { metadata } : {}),
+  }
+}
+
+/**
+ * Builds an image prompt part from a URL string — either a remote URL
+ * (passed through as a `url` source) or a `data:` URL (decomposed into a
+ * `data` source so adapters that upload files get the raw payload).
+ */
+export function imageUrlToPart(
+  url: string,
+  metadata?: MediaInputMetadata,
+): MediaPromptPart {
+  const meta = metadata ? { metadata } : {}
+  if (!url.startsWith('data:')) {
+    return { type: 'image', source: { type: 'url', value: url }, ...meta }
+  }
+  const comma = url.indexOf(',')
+  const mimeType = url.slice(5, comma).split(';')[0]
+  if (comma === -1 || !mimeType) {
+    throw new Error('data: URL is missing a mime type')
+  }
+  return {
+    type: 'image',
+    source: { type: 'data', value: url.slice(comma + 1), mimeType },
+    ...meta,
+  }
+}
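
The helpers in this file split a `data:` URL into a `data` source (mime type plus raw base64). The changeset notes that some adapters go the other way, for example OpenRouter passes URL sources through verbatim and turns `data` sources into data URIs. A minimal sketch of that reverse step is below; `ImageSource` and `sourceToUrl` are hypothetical names that mirror the source shapes used above, not exports of `@tanstack/ai`.

```typescript
// Assumed source shapes, mirroring the objects built by toImagePart and
// imageUrlToPart in this file.
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

// Hypothetical inverse of the data-URL decomposition: URL sources pass
// through untouched, data sources are reassembled into a base64 data URI.
function sourceToUrl(source: ImageSource): string {
  if (source.type === 'url') return source.value
  return `data:${source.mimeType};base64,${source.value}`
}

// Round trip with a tiny placeholder payload (not a real image).
const roundTripped = sourceToUrl({
  type: 'data',
  value: 'AAAA',
  mimeType: 'image/png',
})
console.log(roundTripped) // data:image/png;base64,AAAA
```

This assumes the payload is base64-encoded, which holds for the output of `FileReader.readAsDataURL` used by `readImageFile`.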
