Commit 4270795 (1 parent cf3e548), authored and committed by luojun96Rader: aigateway support text-image-to-video apis

28 files changed: 5496 additions and 258 deletions

.specs/text-image-to-video/api.md (337 additions)
# AIGateway Video API

## Overview

AIGateway exposes a single normalized, OpenAI-compatible asynchronous video API for both text-to-video and image-to-video generation.

Public endpoints:

```text
POST /v1/videos
GET /v1/videos/{video_id}
GET /v1/videos/{video_id}/content
```

The public API does not expose provider-specific paths, IDs, request fields, or download URLs. AIGateway normalizes those differences internally.

For an end-to-end usage walkthrough, see [user-guide.md](./user-guide.md).
## Auth

Video APIs require the same AIGateway API key authentication used by the other OpenAI-compatible inference APIs:

```http
Authorization: Bearer <api_key_or_access_token>
```
## Resource Model

The client-facing `video_id` is owned by AIGateway, not by the downstream provider.

Example:

```text
video_8b1d8d4f7c5b4d58b4d7f3a7f8c9d001
```

This lets AIGateway:

- authorize follow-up reads by owner
- avoid cross-provider ID collisions
- route retrieve and content requests back to the backend used at create time
## Create Video

```http
POST /v1/videos
```

Supports:

- text-to-video
- image-to-video via `input_reference`

### JSON Request

Required fields:

- `model`
- `prompt`

Optional fields:

- `size`
- `seconds`
- `input_reference`
- additional OpenAI-compatible fields supported by the downstream OpenAI-compatible backend

`size` follows the public OpenAI-compatible contract and should use `{width}x{height}` values such as `1280x720` or `1920x1080`.
Provider-native adapters may normalize that value internally. For example, MiniMax uses a native `resolution` field and AIGateway maps supported values like `1280x720 -> 720P` and `1920x1080 -> 1080P`.
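Parsing the public `{width}x{height}` contract can be sketched as follows (the helper name and error text are illustrative, not part of the API):

```python
def parse_size(size: str) -> tuple[int, int]:
    """Parse an OpenAI-compatible '{width}x{height}' size string."""
    width_s, sep, height_s = size.partition("x")
    if sep != "x" or not width_s.isdigit() or not height_s.isdigit():
        raise ValueError(f"invalid size {size!r}, expected '{{width}}x{{height}}'")
    return int(width_s), int(height_s)
```

Provider-native values such as `720P` do not match this shape and would be handled by the adapter-specific normalization described later.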
JSON request shape:

```json
{
  "model": "video-model",
  "prompt": "A paper airplane flying through a neon city",
  "size": "1280x720",
  "seconds": 5,
  "input_reference": {
    "image_url": "https://example.com/frame.png"
  }
}
```

`input_reference` supports:

```json
{
  "file_id": "file_xxx"
}
```

or

```json
{
  "image_url": "https://example.com/frame.png"
}
```

Provider-native adapters may support only part of the public `input_reference` surface. For example, the internal LightX2V adapter supports multipart uploaded images and JSON `image_url`, but rejects JSON `file_id`.

Invalid requests include:

- missing `model`
- missing `prompt`
- empty `input_reference`
- `input_reference` with neither `file_id` nor `image_url`
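The validity rules above can be sketched as a request-level check (field names follow the public DTO; the function itself is illustrative):

```python
def validate_create_request(body: dict) -> list[str]:
    """Return validation errors for a JSON create request, empty if valid."""
    errors = []
    if not body.get("model"):
        errors.append("model cannot be empty")
    if not body.get("prompt"):
        errors.append("prompt cannot be empty")
    ref = body.get("input_reference")
    # An input_reference, when present, must carry file_id or image_url.
    if ref is not None and not (ref.get("file_id") or ref.get("image_url")):
        errors.append("input_reference must set file_id or image_url")
    return errors
```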
### Multipart Request

`POST /v1/videos` also accepts multipart form data.

Relevant fields:

- `model`
- `prompt`
- `size`
- `seconds`
- `input_reference` file

Multipart `size` uses the same OpenAI-compatible `{width}x{height}` contract as JSON requests.

Multipart text-only requests remain valid. Provider-native adapters may normalize them into backend-specific JSON create requests when no uploaded `input_reference` file is present.

Allowed uploaded `input_reference` content types:

- `image/jpeg`
- `image/png`
- `image/webp`

Multipart example:

```bash
curl -X POST "$AIGATEWAY_BASE_URL/v1/videos" \
  -H "Authorization: Bearer $AIGATEWAY_API_KEY" \
  -F "model=image-video-model" \
  -F "prompt=Animate this still frame into a cinematic shot" \
  -F "seconds=5" \
  -F "size=1280x720" \
  -F "input_reference=@frame.png;type=image/png"
```
### Successful Response

```json
{
  "id": "video_8b1d8d4f7c5b4d58b4d7f3a7f8c9d001",
  "object": "video",
  "status": "queued"
}
```

Status values returned by AIGateway are normalized to OpenAI-compatible values such as:

- `queued`
- `in_progress`
- `completed`
- `failed`
- `cancelled`
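A sketch of normalizing provider-native statuses into this set (the provider-side status names and the fallback choice below are hypothetical examples, not any real provider's vocabulary):

```python
# Hypothetical provider-native statuses mapped to the normalized set.
_STATUS_MAP = {
    "pending": "queued",
    "running": "in_progress",
    "success": "completed",
    "error": "failed",
    "canceled": "cancelled",
}

def normalize_status(provider_status: str) -> str:
    """Map a provider-native status to an OpenAI-compatible one.

    Unknown statuses default to in_progress here; that fallback is an
    assumption of this sketch, not documented gateway behavior.
    """
    return _STATUS_MAP.get(provider_status.lower(), "in_progress")
```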
### Error Response

Create errors use the normal OpenAI-compatible error envelope:

```json
{
  "error": {
    "code": "invalid_request_error",
    "message": "Model and prompt cannot be empty",
    "type": "invalid_request_error"
  }
}
```

Common create error codes:

- `invalid_request_error`
- `unsupported_model`
- `content_policy_violation`
- `moderation_error`
- `internal_error`

Example provider-specific validation error:

```json
{
  "error": {
    "code": "invalid_request_error",
    "message": "selected model does not support input_reference.file_id",
    "type": "invalid_request_error"
  }
}
```
## Get Video

```http
GET /v1/videos/{video_id}
```

Returns the normalized video object for a gateway-owned `video_id`.

Example:

```json
{
  "id": "video_8b1d8d4f7c5b4d58b4d7f3a7f8c9d001",
  "object": "video",
  "status": "completed",
  "created_at": 1713945600
}
```

If the video failed, the object may include an embedded resource-level error:

```json
{
  "id": "video_8b1d8d4f7c5b4d58b4d7f3a7f8c9d001",
  "object": "video",
  "status": "failed",
  "error": {
    "code": "generation_failed",
    "message": "provider generation failed"
  }
}
```

HTTP-level lookup errors still use the normal top-level `error` envelope.
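Because the API is asynchronous, clients typically poll `GET /v1/videos/{video_id}` until a terminal status. A client-side sketch, where `fetch_video` stands in for the actual HTTP call and is an assumption of this example:

```python
import time

def wait_for_video(fetch_video, video_id: str,
                   interval_s: float = 5.0, timeout_s: float = 600.0) -> dict:
    """Poll until the video reaches a terminal status or the timeout expires.

    fetch_video(video_id) is assumed to return the normalized video object
    from GET /v1/videos/{video_id}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        video = fetch_video(video_id)
        if video["status"] in ("completed", "failed", "cancelled"):
            return video
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} still {video['status']}")
        time.sleep(interval_s)
```

The terminal check covers `failed` as well, so a caller can inspect the embedded resource-level `error` instead of polling forever.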
## Download Video Content

```http
GET /v1/videos/{video_id}/content
```

This endpoint streams the generated video bytes back through AIGateway.

Typical content types:

- `video/mp4`
- `application/octet-stream`

Example:

```bash
curl -L "$AIGATEWAY_BASE_URL/v1/videos/$VIDEO_ID/content" \
  -H "Authorization: Bearer $AIGATEWAY_API_KEY" \
  -o output.mp4
```
### Variant Passthrough

AIGateway preserves the `variant` query parameter for compatible backends.

Examples:

```text
GET /v1/videos/{video_id}/content?variant=video
GET /v1/videos/{video_id}/content?variant=thumbnail
GET /v1/videos/{video_id}/content?variant=spritesheet
```

Whether a backend supports a specific variant depends on the downstream provider. Unsupported variants are surfaced as downstream provider errors.
## Ownership and Authorization

Video resources are private to the authenticated owner that created them.

For follow-up operations:

- `GET /v1/videos/{video_id}`
- `GET /v1/videos/{video_id}/content`

AIGateway verifies that the current user owns the `video_id`. Cross-user access returns `not_found`.
## Provider Normalization Notes

The external API is stable even when provider APIs differ internally.

Examples of internal normalization:

- OpenAI-compatible backends accept the public `size` field directly in `{width}x{height}` format.
- MiniMax does not use the same public shape. AIGateway maps supported OpenAI-compatible sizes to MiniMax `resolution` values:
  - `1280x720` and `720x1280` -> `720P`
  - `1920x1080` and `1080x1920` -> `1080P`
  - native MiniMax values `720P`, `768P`, and `1080P` are also accepted for compatibility
  - unsupported sizes such as `1024x1792` are rejected as `invalid_request_error`
- Internal LightX2V uses provider-native create and status routes. AIGateway maps the public OpenAI-compatible `size` into LightX2V `width` and `height`, streams `/content` directly from `/v1/files/download/outputs/videos/{task_id}.mp4`, and supports image-guided requests only through multipart upload or JSON `image_url`.
- Internal LightX2V is selected only for internal CSGHub-deployed video models, identified by `CSGHubModelID != ""` plus `RuntimeFramework == "lightx2v"`.
- Internal LightX2V treats multipart-with-image as an image-guided create to `/v1/tasks/video/form`, but multipart text-only requests fall back to the normal JSON create path `/v1/tasks/video`.
- Current internal LightX2V-backed OpenCSG Wan model targets are `Wan-AI/Wan2.2-T2V-A14B` for text-to-video and `Wan-AI/Wan2.2-I2V-A14B` for image-to-video.
- AIGateway rewrites provider video IDs to gateway-owned `video_id` values.
- Provider-specific polling APIs are hidden behind `GET /v1/videos/{video_id}`.
- Provider-specific file download or download-URL flows are hidden behind `GET /v1/videos/{video_id}/content`.
- Provider-native status values are normalized to OpenAI-compatible values.
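The MiniMax `size -> resolution` normalization above can be sketched as follows (the mapping values follow the spec; the helper name and error text are illustrative):

```python
# Public OpenAI-compatible sizes mapped to MiniMax-native resolutions.
_MINIMAX_RESOLUTIONS = {
    "1280x720": "720P",
    "720x1280": "720P",
    "1920x1080": "1080P",
    "1080x1920": "1080P",
}
# Native MiniMax values accepted as-is for compatibility.
_NATIVE = {"720P", "768P", "1080P"}

def to_minimax_resolution(size: str) -> str:
    """Normalize a public size value into a MiniMax resolution value."""
    if size in _NATIVE:
        return size
    resolution = _MINIMAX_RESOLUTIONS.get(size)
    if resolution is None:
        # Rejected sizes surface as invalid_request_error to the client.
        raise ValueError(f"unsupported size {size!r}")
    return resolution
```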
## Current Public DTO Shape

Create request:

```json
{
  "model": "string",
  "prompt": "string",
  "size": "string",
  "seconds": 5,
  "input_reference": {
    "file_id": "string",
    "image_url": "string"
  }
}
```

Video object:

```json
{
  "id": "string",
  "object": "video",
  "created_at": 0,
  "completed_at": 0,
  "expires_at": 0,
  "status": "queued",
  "model": "string",
  "prompt": "string",
  "size": "string",
  "seconds": 0,
  "progress": 0.5,
  "error": {
    "code": "string",
    "message": "string"
  },
  "remixed_from_video_id": "string"
}
```
