amortegui84/ai-video-api-reference

AI Video API Reference

Expert reference for async video and image generation APIs. Covers authentication, submit/poll/retrieve patterns, asset management, error handling, and ComfyUI integration.

Built from real-world production experience implementing these providers in comfyui-seedance.


Providers covered

| Provider | Use case |
|---|---|
| AnyFast | Seedance 2.0 video generation: text-to-video, image-to-video, multimodal refs. Full asset management (upload → Active poll → asset:// URI). |
| fal.ai | Queue-based video generation. Auth header uses Key, not Bearer. |
| Replicate | Prediction API: submit + poll + output array. |
| NVIDIA NIM | LLM inference (OpenAI-compatible). Stable Video Diffusion. Cosmos world models. Free tier available. |
| OpenAI gpt-image-1 | Image inpainting via mask. Multipart/form-data. Base64 output. Org verification required. |

What's in the skill file

SKILL_VIDEO_API_EXPERT.md is a structured reference document organized into 12 sections:

  1. General async pattern — Submit → Poll → Retrieve. BFS walker for schema-drifting JSON.
  2. AnyFast — Seedance 2.0 generation endpoint, asset lifecycle (CreateAssetGroup → CreateAsset → ListAssets poll → asset://), anti-error patterns.
  3. fal.ai — Queue API, Seedance app IDs, auth gotcha (Key not Bearer).
  4. Replicate — Prediction API, cold start, output array.
  5. NVIDIA NIM — OpenAI-compatible LLM endpoint, Stable Video Diffusion, Cosmos models, DeepSeek reasoning activation, free tier.
  6. OpenAI gpt-image-1 — Inpainting API, mask convention (ComfyUI vs GPT), supported sizes, pricing, org verification.
  7. Robust implementation patterns — Retry with backoff, ID extraction, polling with deadline, asset lifecycle diagram.
  8. ComfyUI node integration — Custom types, optional inputs, OUTPUT_NODE, IS_CHANGED, tensor↔PIL↔base64, VIDEO input.
  9. Debugging checklist — Submit errors, asset-not-found, poll never ends, error message extraction.
  10. Provider comparison table — Auth, format, response mode, status values, output paths, rate limits.
  11. Master gotchas list — 20 non-obvious issues confirmed in production.
  12. Project structure — comfyui-seedance function map and data flow.
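
The BFS walker mentioned in section 1 can be pictured with a minimal sketch like the one below. It is not the code from SKILL_VIDEO_API_EXPERT.md: the function name `find_first` and the field names `task_id` and `id` are hypothetical, chosen only to show how a breadth-first search can locate a value when a provider moves it around in the response.

```python
from collections import deque
from typing import Any, Iterable, Optional


def find_first(payload: Any, keys: Iterable[str]) -> Optional[Any]:
    """Breadth-first search for the first non-empty value stored under any of `keys`.

    Shallow matches win over deeply nested ones, which suits responses where
    the same field name may also appear inside nested objects.
    """
    wanted = set(keys)
    queue = deque([payload])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in wanted and value not in (None, "", [], {}):
                    return value
            # No match at this level: visit the children next.
            queue.extend(node.values())
        elif isinstance(node, list):
            queue.extend(node)
    return None


# Hypothetical response shapes; real providers may differ.
flat = {"id": "task-1", "data": {"id": "inner"}}
nested = {"data": {"result": [{"task_id": "task-2"}]}}
print(find_first(flat, ["task_id", "id"]))
print(find_first(nested, ["task_id", "id"]))
```

Because the search is breadth-first, a top-level ID is preferred over an ID buried in a nested object, which is usually the safer choice when extracting a task identifier.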

How to use

With Claude Code

Add the file as context at the start of a session:

claude --context SKILL_VIDEO_API_EXPERT.md

Or reference it in your CLAUDE.md:

Read SKILL_VIDEO_API_EXPERT.md for reference on video/image generation API patterns.

As a reference doc

Open SKILL_VIDEO_API_EXPERT.md directly. Each section is self-contained. Use the debugging checklist and gotchas list when troubleshooting API integrations.


Key patterns at a glance

Auth headers by provider

# AnyFast, Replicate, NVIDIA NIM, OpenAI
"Authorization": f"Bearer {api_key}"

# fal.ai — uses Key, not Bearer
"Authorization": f"Key {api_key}"

# NVIDIA NIM: the key itself starts with nvapi-, so do not add the prefix a second time
"Authorization": f"Bearer {api_key}"  # api_key == "nvapi-..."

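The header rules above can be wrapped in a small helper so the fal.ai exception lives in one place. This is a sketch under assumptions: the function name `auth_header` and the lowercase provider labels are illustrative and not part of any provider SDK.

```python
# Scheme per provider, taken from the header list above.
# The provider labels used as keys are hypothetical identifiers.
AUTH_SCHEMES = {
    "anyfast": "Bearer",
    "fal.ai": "Key",
    "replicate": "Bearer",
    "nvidia-nim": "Bearer",
    "openai": "Bearer",
}


def auth_header(provider: str, api_key: str) -> dict:
    """Return the Authorization header for a known provider."""
    try:
        scheme = AUTH_SCHEMES[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return {"Authorization": f"{scheme} {api_key}"}


print(auth_header("fal.ai", "my-key"))
print(auth_header("replicate", "my-key"))
```

Raising on an unknown provider avoids silently sending a Bearer header to a service that expects something else.
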
AnyFast asset lifecycle (required for first_frame)

POST /volc/asset/CreateAssetGroup  →  group_id
POST /volc/asset/CreateAsset       →  asset_id
POST /volc/asset/ListAssets        →  poll until Status == "Active"   (timeout: 300s)
Use in generation: "asset://asset-id"  (lowercase)
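
The ListAssets polling step can be sketched as a deadline loop. The HTTP call is deliberately left out: `get_status` is a hypothetical callable that you would implement to call ListAssets and return the asset's Status string. The 300-second timeout and the "Active" value come from the lifecycle above; the 2-second interval and the "Processing" status in the demo are assumptions.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `get_status` until it returns "Active" or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "Active":
            return
        if clock() >= deadline:
            raise TimeoutError(f"asset still {status!r} after {timeout}s")
        sleep(interval)


def asset_uri(asset_id: str) -> str:
    """Build the reference used in the generation request (lowercase scheme)."""
    return f"asset://{asset_id}"


# Demo with a fake status source instead of a real ListAssets call.
fake_statuses = iter(["Processing", "Processing", "Active"])
wait_until_active(lambda: next(fake_statuses), sleep=lambda s: None)
print(asset_uri("asset-id"))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in production the defaults apply.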

Poll status values

| Provider | Pending | Running | Done | Failed |
|---|---|---|---|---|
| AnyFast | QUEUING | PROCESSING | SUCCESS | FAILED |
| fal.ai | IN_QUEUE | IN_PROGRESS | COMPLETED | |
| Replicate | starting | processing | succeeded | failed |
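
One way to use these values is to normalize them into a shared vocabulary so a single polling loop can serve every provider. The sketch below is illustrative only: the names `STATUS_MAP`, `normalize_status`, and `is_terminal` are hypothetical, and any status string not listed above (including a fal.ai failure value, which the table does not give) maps to "unknown".

```python
# Raw status values from the table above, mapped to a shared vocabulary.
STATUS_MAP = {
    "anyfast": {
        "QUEUING": "pending",
        "PROCESSING": "running",
        "SUCCESS": "done",
        "FAILED": "failed",
    },
    "fal.ai": {
        "IN_QUEUE": "pending",
        "IN_PROGRESS": "running",
        "COMPLETED": "done",
    },
    "replicate": {
        "starting": "pending",
        "processing": "running",
        "succeeded": "done",
        "failed": "failed",
    },
}


def normalize_status(provider: str, raw: str) -> str:
    """Translate a provider-specific status into pending/running/done/failed."""
    return STATUS_MAP[provider].get(raw, "unknown")


def is_terminal(state: str) -> bool:
    """True when polling should stop."""
    return state in ("done", "failed")


print(normalize_status("replicate", "succeeded"))
print(is_terminal(normalize_status("anyfast", "PROCESSING")))
```

Treating unlisted values as "unknown" rather than as a failure lets the caller decide whether to keep polling until its own deadline.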

gpt-image-1 mask convention

ComfyUI MASK: white (1.0) = area to edit
GPT mask PNG: transparent (alpha=0) = area to edit

→ Where ComfyUI mask > 0.5 → set alpha = 0 in RGBA PNG
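
The inversion can be shown on plain nested lists. This is a dependency-free sketch of the rule above, not the project's implementation: a real ComfyUI node would more likely work on a tensor and write the result through an imaging library. The function name `comfy_mask_to_alpha` is hypothetical.

```python
from typing import List, Sequence


def comfy_mask_to_alpha(
    mask: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Convert a ComfyUI-style mask (1.0 = edit) to 8-bit alpha values.

    Pixels above the threshold become fully transparent (0), which marks
    them as the area to edit; all other pixels stay opaque (255).
    """
    return [[0 if value > threshold else 255 for value in row] for row in mask]


mask = [
    [1.0, 0.0],
    [0.6, 0.5],
]
print(comfy_mask_to_alpha(mask))  # [[0, 255], [0, 255]]
```

The resulting values would then be used as the alpha channel of the RGBA PNG sent as the mask.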

Top 5 gotchas

  1. AnyFast GroupType in ListAssets — groups have no type. GroupType: "AIGC" always returns Items: []. Remove it.
  2. fal.ai auth is Key, not Bearer — every other provider uses Bearer.
  3. AnyFast asset:// is lowercase. Asset:// may be rejected by the generation endpoint.
  4. AnyFast AssetType in multipart — omitting it defaults to "Image" silently, even for video/audio.
  5. gpt-image-1 only accepts 3 sizes: 1024x1024, 1536x1024, 1024x1536. No other dimensions are accepted.

Full list of 20 gotchas in SKILL_VIDEO_API_EXPERT.md §11.
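
Two of these gotchas lend themselves to cheap pre-flight checks before a request is sent. The helpers below are a sketch under assumptions: the function names are hypothetical, and `GroupId` in the demo payload is a placeholder field name rather than a confirmed AnyFast parameter.

```python
# The three sizes listed in gotcha 5.
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}


def check_size(width: int, height: int) -> str:
    """Return the size string for gpt-image-1, or raise if it is unsupported."""
    size = f"{width}x{height}"
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {size}; allowed: {sorted(ALLOWED_SIZES)}")
    return size


def clean_list_assets_payload(payload: dict) -> dict:
    """Drop GroupType from a ListAssets payload (gotcha 1)."""
    return {key: value for key, value in payload.items() if key != "GroupType"}


print(check_size(1536, 1024))
# "GroupId" is a hypothetical field used only for illustration.
print(clean_list_assets_payload({"GroupId": "g-1", "GroupType": "AIGC"}))
```

Failing locally with a clear message is usually easier to debug than interpreting a provider-side rejection or an empty result.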


Related projects


License

Apache 2.0
