diff --git a/CHANGELOG.md b/CHANGELOG.md index b2fa603..de19fe1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,24 @@ All notable changes to OpenReels will be documented in this file. +## [0.17.0] - 2026-04-10 + +### Added +- **Dedicated video motion prompter**: new `prompts/video-prompter.md` with structured cinematography language. Video prompts now follow a professional anatomy: shot type, subject, action, camera movement, lighting, and style. Temporal progression, single-action constraints, and physics/realism guidance produce more natural motion. +- **Negative prompts for video generation**: the video resolver now passes anti-artifact guidance to both video providers. Kling forwards it to the API as `negative_prompt`; the Veo provider logs a warning and drops it, because `veo-3.1-lite-generate-preview` rejects the parameter (see Fixed). Default negatives (blur, flickering, morphing, unnatural physics) are combined with each archetype's existing `antiArtifactGuidance` for style-aware quality control. +- **Video prompt observability**: `motionPrompt` and `negativePrompt` fields are now logged in `VideoResolution` metadata (visible in `log.json`), making video quality issues debuggable without re-running the pipeline. +- **Kling 2.6 Pro upgrade**: Fal/Kling video provider upgraded from v2.1 standard to v2.6 Pro. Better motion physics and realism at the same $0.07/s price. +- **Kling cfg_scale parameter**: prompt adherence control (0.5 default) now passed to the Kling API. + +### Fixed +- **Veo API compatibility**: removed `personGeneration`, `enhancePrompt`, `generateAudio`, and `negativePrompt` config params that `veo-3.1-lite-generate-preview` rejects. These exist in the SDK types but are only supported on the full Veo model (Vertex AI). This fixes `Forbidden`/`INVALID_ARGUMENT` errors that silently caused all Veo video generation to fail. +- **Factory videoModel routing**: `config.videoModel` is now passed to GeminiVideo regardless of primary/secondary position. FalVideo always uses its own default model, preventing Veo model strings from reaching the Fal endpoint. 
+ +### For contributors +- `VideoProvider.generate()` opts now include optional `negativePrompt` field. +- `VideoResolution` interface has new `motionPrompt` and `negativePrompt` fields. +- `image-prompter.ts` loads `video-prompter.md` when `mode="video"` instead of appending to `image-prompter.md`. + ## [0.16.0] - 2026-04-09 ### Added diff --git a/CLAUDE.md b/CLAUDE.md index 6a93556..c8681c1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -20,7 +20,7 @@ src/ image/ # gemini.ts, openai.ts stock/ # pexels.ts, pixabay.ts, adaptive-resolver.ts, query-reformer.ts, stock-verifier.ts music/ # lyria.ts (Lyria 3 Pro), bundled-adapter.ts, bundled.ts - video/ # gemini.ts (Veo), fal.ts (Kling), video-resolver.ts + video/ # gemini.ts (Veo 3.1 Lite), fal.ts (Kling 2.6 Pro), video-resolver.ts config/ archetypes/ # 14 archetype JSON configs archetype-registry.ts @@ -47,7 +47,7 @@ fixtures/ # sample DirectorScore JSONs ```bash pnpm install # install dependencies pnpm start "topic" # run full pipeline (CLI) -pnpm test # run vitest suite (349 tests) +pnpm test # run vitest suite (395 tests) ``` ### Web UI (Docker Compose) diff --git a/README.md b/README.md index f510b9e..5390072 100644 --- a/README.md +++ b/README.md @@ -68,7 +68,7 @@ Give it a topic. 
It handles everything: | **Research** | Web search grounds the script in real facts, not hallucinations | | **Script** | Writes a punchy short-form script with scene breakdowns, visual direction, and emotional arc | | **Voiceover** | Generates TTS audio with word-level timestamps for karaoke-style captions | -| **Visuals** | AI images (Gemini, DALL-E), AI video clips (Google Veo, fal.ai Kling), and vision-verified stock footage that rejects bad matches and retries automatically | +| **Visuals** | AI images (Gemini, DALL-E), AI video clips (Google Veo 3.1, fal.ai Kling 2.6 Pro) with dedicated cinematography prompts and negative prompt guidance, plus vision-verified stock footage that rejects bad matches and retries automatically | | **Music** | AI-generated background score via Google Lyria 3 Pro, scene-synced to match the video's emotional arc. Or pick from 25 bundled royalty-free tracks | | **Captions** | Spring-animated 3-state captions with 7 distinct styles, per-archetype theming, and word-level karaoke highlighting | | **Assembly** | Composites everything into a vertical MP4 via Remotion with crossfade, slide, wipe, and flip transitions | @@ -86,7 +86,7 @@ Mix and match providers or go all-in on one ecosystem: | **Search** | Native (provider built-in), Tavily, or parametric knowledge | | **TTS** | ElevenLabs, Inworld, OpenAI TTS, Gemini TTS, Kokoro (free, local) | | **Images** | Gemini Imagen, OpenAI DALL-E | -| **Video** | Google Veo, fal.ai Kling (with cross-provider fallback) | +| **Video** | Google Veo 3.1, fal.ai Kling 2.6 Pro (with cross-provider fallback, negative prompts, structured cinematography prompts) | | **Music** | Google Lyria 3 Pro (AI-generated, $0.08/track), Bundled library (free) | | **Stock** | Pexels, Pixabay (both searched, vision-verified, with AI fallback) | @@ -244,7 +244,7 @@ The rewrite moves from Python to TypeScript for native [Remotion](https://www.re ## Status -v0.13.1 shipped. 
See [CHANGELOG.md](CHANGELOG.md) for full version history and [TODOS.md](TODOS.md) for known issues and roadmap. +v0.17.0 shipped. See [CHANGELOG.md](CHANGELOG.md) for full version history and [TODOS.md](TODOS.md) for known issues and roadmap. ## Star History diff --git a/TODOS.md b/TODOS.md index ae33c36..3f0697a 100644 --- a/TODOS.md +++ b/TODOS.md @@ -95,6 +95,14 @@ - [ ] **Gemini generateVideos initial call timeout** — The 180s `TIMEOUT_MS` in gemini.ts only guards the polling loop. The initial `this.client.models.generateVideos()` call that submits the job has no application-level timeout, relying on OS socket timeout (~120s). Wrap with `Promise.race` or `AbortSignal.timeout()`. Same class of issue as the fal.ai subscribe timeout (P1 in Pipeline Robustness). **Priority:** P2 +- [ ] **Expand Motion enum with cinematic camera directions** — Current Motion enum is `zoom_in | zoom_out | pan_right | pan_left | static`. For `ai_video` scenes, the creative director could specify richer camera directions (`dolly_in`, `tracking`, `crane_up`, `orbit`) that feed into the video-prompter. Touches schema (breaking change), creative-director prompt, Remotion score-to-props mapper. Phase 2 after video quality enhancement ships and we evaluate improvement. + **Priority:** P2 + **Depends on:** Video generation quality enhancement + Deferred from plan: Video Generation Quality Enhancement (CEO review: accepted as follow-up) + +- [ ] **Video generation safety re-prompting** — Unlike image generation (which retries with a sanitized prompt on safety rejection), video generation falls back to static image immediately. The motion prompt could be sanitized the same way. Also, `isSafetyRejection()` in orchestrator.ts:204 doesn't check for `"forbidden"` which is what Veo returns for RAI-filtered content. 
+ **Priority:** P2 + ## TTS - [ ] **Voice catalog command** — Add `--list-voices ` CLI flag that prints available voices for any TTS provider (ElevenLabs, Inworld, Kokoro, Gemini TTS, OpenAI TTS) and exits. Users currently have no way to discover voice options without reading provider docs. Kokoro has ~50 voices, Gemini TTS has multiple, ElevenLabs has hundreds. Build as a unified interface across all providers rather than provider-specific flags. diff --git a/VERSION b/VERSION index d183d4a..07feb82 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.16.0 \ No newline at end of file +0.17.0 \ No newline at end of file diff --git a/prompts/video-prompter.md b/prompts/video-prompter.md new file mode 100644 index 0000000..49b6a3a --- /dev/null +++ b/prompts/video-prompter.md @@ -0,0 +1,53 @@ +You are a motion prompt engineer for AI video generation. You receive a scene's visual description and narration from a DirectorScore, and produce an optimized video generation prompt that will animate a source image into a realistic, cinematic video clip. + +## Your Job + +Transform the scene's visual description into a detailed, video-generator-friendly motion prompt. The prompt will be used with image-to-video models (Veo, Kling) that animate a still image into a 4-10 second video clip. Focus on what MOVES, how it moves, and how the camera follows. + +## Prompt Anatomy + +Structure every prompt in this order. Earlier elements carry more weight with video models. + +1. **Shot type and framing** — medium shot, close-up, wide establishing shot, POV, etc. +2. **Subject with specific attributes** — who or what is in frame, with distinctive visual details. +3. **Action and movement** — what the subject does over the clip duration. Be specific: "takes three steps forward" not "walks." +4. **Environment and setting** — where this takes place, with 2-3 grounding details. +5. **Camera movement** — one clear camera move per clip. 
Use professional terms: dolly, crane, pan, tilt, tracking, orbit, rack focus. +6. **Lighting and mood** — light source direction, quality (soft/hard), color temperature. Specify: "warm golden hour key light from upper left" not "nice lighting." +7. **Style and texture** — match the style bible. Include material properties for realism: brushed steel, woven linen, cracked leather, wet cobblestone. + +## Motion Rules + +Follow these strictly: + +1. **One action, one camera move.** Each clip gets exactly one clear subject action and one camera movement. Combining multiple actions or simultaneous camera moves (zoom + rotate, pan + dolly) causes warping and artifacts. + +2. **Temporal progression.** Describe what happens across the clip's duration. "In the opening frames, the figure stands motionless. Over the next few seconds, they slowly turn toward camera." Give the model a timeline. + +3. **Physics and material realism.** Describe how things move physically. "Water droplets arc from the fountain edge, catching warm light as they fall" not "water splashes." Name material properties: fabric drapes and creases, metal reflects, glass refracts, smoke disperses. The more physical detail, the better the simulation. + +4. **Camera vocabulary.** Use terms video models understand: + - Dolly in/out — camera moves toward or away from subject + - Pan left/right — camera rotates horizontally + - Tilt up/down — camera rotates vertically + - Tracking shot — camera follows subject at matching speed + - Crane/jib — camera rises or descends vertically + - Orbit — camera circles around subject + - Rack focus — focus shifts between foreground and background + - Static/locked-off — camera holds position (use for intense subject motion) + +5. **Lens terminology.** Include when relevant: "24mm wide-angle" for environmental scope, "85mm telephoto compression" for portraits, "shallow depth of field at f/2.8" for subject isolation, "anamorphic lens flare" for cinematic style. + +6. 
**Match motion to mood.** Smooth, slow camera moves for calm and contemplative. Handheld energy for tension. Static locked-off shots for dramatic reveals. The camera is a storytelling tool. + +7. **Keep it concise.** Aim for 300-500 characters. Video models perform best with dense, specific prompts, not lengthy descriptions. Every word should earn its place. + +8. **No text in video.** Do not describe any text overlays, captions, titles, or watermarks. These are added in post-production. + +9. **Name real subjects.** If the topic involves well-known public figures, landmarks, or locations, name them explicitly. Video models handle known subjects better with actual names. + +10. **Depict dark themes through atmosphere.** AI video providers reject explicit violence, gore, or graphic content. For dark topics, convey mood through environment, lighting, shadow, and implication. A dimly lit corridor with flickering light conveys danger without showing it. + +## Output + +Return the optimized video generation prompt in the `optimized_prompt` field. The prompt should be a single dense paragraph, not a list. No JSON or markdown formatting inside the prompt itself. diff --git a/src/agents/image-prompter.ts b/src/agents/image-prompter.ts index 3f25605..08d767a 100644 --- a/src/agents/image-prompter.ts +++ b/src/agents/image-prompter.ts @@ -4,7 +4,8 @@ import { z } from "zod"; import type { ArchetypeConfig } from "../schema/archetype.js"; import type { LLMProvider, LLMUsage } from "../schema/providers.js"; -const SYSTEM_PROMPT_PATH = path.join(process.cwd(), "prompts", "image-prompter.md"); +const IMAGE_PROMPT_PATH = path.join(process.cwd(), "prompts", "image-prompter.md"); +const VIDEO_PROMPT_PATH = path.join(process.cwd(), "prompts", "video-prompter.md"); const ImagePromptResult = z.object({ optimized_prompt: z.string(), @@ -38,13 +39,9 @@ export async function optimizeImagePrompt( : "You are a visual prompt engineer for AI image generation. 
Transform scene descriptions into detailed, image-generator-friendly prompts. Return the optimized prompt in the optimized_prompt field."; try { - systemPrompt = fs.readFileSync(SYSTEM_PROMPT_PATH, "utf-8"); - if (mode === "video") { - systemPrompt += - "\n\nIMPORTANT: This prompt is for AI VIDEO generation, not still images. Focus on motion, camera movement, and temporal dynamics. Describe what changes over the 5-second clip."; - } + systemPrompt = fs.readFileSync(mode === "video" ? VIDEO_PROMPT_PATH : IMAGE_PROMPT_PATH, "utf-8"); } catch { - // Use default + // Use default inline prompt above } // Inject style bible from archetype's creative fields diff --git a/src/cli/cost-estimator.ts b/src/cli/cost-estimator.ts index 0215927..92a753e 100644 --- a/src/cli/cost-estimator.ts +++ b/src/cli/cost-estimator.ts @@ -83,7 +83,7 @@ const PRICING = { openaiPerImage: 0.167, // Video generation pricing (per second of generated video) veoLitePerSecond: 0.05, // Veo 3.1 Lite ($0.30 for 6s clip) - falKlingPerSecond: 0.07, // Kling v2.1 via fal.ai ($0.35 for 5s clip) + falKlingPerSecond: 0.07, // Kling v2.6 Pro via fal.ai ($0.35 for 5s clip) // Music generation pricing lyriaPerTrack: 0.08, // Lyria 3 Pro: $0.08 per song (ai.google.dev/gemini-api/docs/music-generation) }; diff --git a/src/pipeline/orchestrator.ts b/src/pipeline/orchestrator.ts index 231707e..410c451 100644 --- a/src/pipeline/orchestrator.ts +++ b/src/pipeline/orchestrator.ts @@ -132,6 +132,7 @@ interface VisualAssetResult { durationSeconds: number | null; stockResolution?: StockResolution; videoResolution?: VideoResolution; + prompterUsage?: LLMUsage | null; } /** Generate an AI image with optional rejection context from failed stock searches */ @@ -291,6 +292,7 @@ async function resolveVisualAsset( usage: videoResult.usage, durationSeconds: videoResult.durationSeconds, videoResolution: videoResult.videoResolution, + prompterUsage: videoResult.prompterUsage ?? 
null, }; } @@ -611,6 +613,7 @@ function buildPipelineWorkflow( visualsResult.sceneSourceDurations = sceneResults.map((r) => r.durationSeconds); for (const r of sceneResults) { if (r.usage) llmUsages.push(r.usage); + if (r.prompterUsage) llmUsages.push(r.prompterUsage); } // Track music prompter LLM usage diff --git a/src/providers/factory.test.ts b/src/providers/factory.test.ts index 186cd34..9cfbaa7 100644 --- a/src/providers/factory.test.ts +++ b/src/providers/factory.test.ts @@ -1,5 +1,7 @@ import { beforeEach, describe, expect, it, vi } from "vitest"; import { createProviders } from "./factory.js"; +import { FalVideo } from "./video/fal.js"; +import { GeminiVideo } from "./video/gemini.js"; import { GeminiImage } from "./image/gemini.js"; import { OpenAIImage } from "./image/openai.js"; import { AnthropicLLM } from "./llm/anthropic.js"; @@ -81,6 +83,12 @@ vi.mock("./llm/openai-compatible.js", () => ({ vi.mock("./search/tavily.js", () => ({ createTavilySearchTools: vi.fn((apiKey?: string) => ({ tavily_search: { apiKey } })), })); +vi.mock("./video/gemini.js", () => ({ + GeminiVideo: vi.fn().mockImplementation(() => ({ supportedDurations: [4, 6, 8], generate: vi.fn() })), +})); +vi.mock("./video/fal.js", () => ({ + FalVideo: vi.fn().mockImplementation(() => ({ supportedDurations: [5, 10], generate: vi.fn() })), +})); describe("createProviders", () => { beforeEach(() => { @@ -408,4 +416,59 @@ describe("createProviders", () => { const args = vi.mocked(AnthropicLLM).mock.calls[0]!; expect(args[0]).toBe("claude-opus-4-6"); }); + + it("passes videoModel to GeminiVideo but not FalVideo", () => { + const origGoogle = process.env["GOOGLE_API_KEY"]; + const origFal = process.env["FAL_API_KEY"]; + process.env["GOOGLE_API_KEY"] = "test-goog"; + process.env["FAL_API_KEY"] = "test-fal"; + + createProviders({ + llm: "anthropic", + tts: "elevenlabs", + image: "gemini", + videoModel: "veo-3.1-generate-preview", + }); + + // GeminiVideo should receive the model override + const 
geminiArgs = vi.mocked(GeminiVideo).mock.calls[0]!; + expect(geminiArgs[0]).toBe("veo-3.1-generate-preview"); + + // FalVideo should receive undefined (uses its own default) + const falArgs = vi.mocked(FalVideo).mock.calls[0]!; + expect(falArgs[0]).toBeUndefined(); + + process.env["GOOGLE_API_KEY"] = origGoogle ?? ""; + process.env["FAL_API_KEY"] = origFal ?? ""; + if (!origGoogle) delete process.env["GOOGLE_API_KEY"]; + if (!origFal) delete process.env["FAL_API_KEY"]; + }); + + it("passes videoModel to GeminiVideo even when Fal is primary", () => { + const origGoogle = process.env["GOOGLE_API_KEY"]; + const origFal = process.env["FAL_API_KEY"]; + process.env["GOOGLE_API_KEY"] = "test-goog"; + process.env["FAL_API_KEY"] = "test-fal"; + + createProviders({ + llm: "anthropic", + tts: "elevenlabs", + image: "gemini", + video: "fal", + videoModel: "veo-3.1-generate-preview", + }); + + // GeminiVideo (secondary) should still receive the model override + const geminiArgs = vi.mocked(GeminiVideo).mock.calls[0]!; + expect(geminiArgs[0]).toBe("veo-3.1-generate-preview"); + + // FalVideo (primary) should receive undefined + const falArgs = vi.mocked(FalVideo).mock.calls[0]!; + expect(falArgs[0]).toBeUndefined(); + + process.env["GOOGLE_API_KEY"] = origGoogle ?? ""; + process.env["FAL_API_KEY"] = origFal ?? ""; + if (!origGoogle) delete process.env["GOOGLE_API_KEY"]; + if (!origFal) delete process.env["FAL_API_KEY"]; + }); }); diff --git a/src/providers/factory.ts b/src/providers/factory.ts index 1fd4d6f..16f0461 100644 --- a/src/providers/factory.ts +++ b/src/providers/factory.ts @@ -188,8 +188,8 @@ export function createProviders(config: ProviderConfig): Providers { const videoPrimary = config.video ?? (googleKey ? "gemini" : falKey ? 
"fal" : undefined); if (videoPrimary === "fal") { - if (falKey) videoProviders.push(new FalVideo(config.videoModel, falKey)); - if (googleKey) videoProviders.push(new GeminiVideo(undefined, googleKey)); + if (falKey) videoProviders.push(new FalVideo(undefined, falKey)); + if (googleKey) videoProviders.push(new GeminiVideo(config.videoModel, googleKey)); } else if (videoPrimary === "gemini" || videoPrimary === undefined) { if (googleKey) videoProviders.push(new GeminiVideo(config.videoModel, googleKey)); if (falKey) videoProviders.push(new FalVideo(undefined, falKey)); diff --git a/src/providers/video/fal.test.ts b/src/providers/video/fal.test.ts index 5b7e61f..5cb1ce4 100644 --- a/src/providers/video/fal.test.ts +++ b/src/providers/video/fal.test.ts @@ -94,4 +94,55 @@ describe("FalVideo", () => { }), ).rejects.toThrow("Failed to download fal.ai video: 500"); }); + + it("passes cfg_scale and negative_prompt in input", async () => { + const provider = new FalVideo(undefined, "test-key"); + + mockUpload.mockResolvedValueOnce("https://fal.storage/image.png"); + mockSubscribe.mockResolvedValueOnce({ + data: { video: { url: "https://fal.storage/video.mp4" } }, + }); + mockFetch.mockResolvedValueOnce({ + ok: true, + arrayBuffer: () => Promise.resolve(new ArrayBuffer(100)), + }); + + const result = await provider.generate({ + sourceImage: Buffer.from("fake-image"), + prompt: "A rocket launching", + negativePrompt: "blur, flickering", + }); + + const subscribeCall = mockSubscribe.mock.lastCall!; + const input = subscribeCall[1].input; + expect(input.cfg_scale).toBe(0.5); + expect(input.negative_prompt).toBe("blur, flickering"); + + const fs = await import("node:fs"); + if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath); + }); + + it("uses Kling v2.6 Pro model by default", async () => { + const provider = new FalVideo(undefined, "test-key"); + + mockUpload.mockResolvedValueOnce("https://fal.storage/image.png"); + mockSubscribe.mockResolvedValueOnce({ + data: 
{ video: { url: "https://fal.storage/video.mp4" } }, + }); + mockFetch.mockResolvedValueOnce({ + ok: true, + arrayBuffer: () => Promise.resolve(new ArrayBuffer(100)), + }); + + const result = await provider.generate({ + sourceImage: Buffer.from("fake-image"), + prompt: "test", + }); + + const subscribeCall = mockSubscribe.mock.lastCall!; + expect(subscribeCall[0]).toBe("fal-ai/kling-video/v2.6/pro/image-to-video"); + + const fs = await import("node:fs"); + if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath); + }); }); diff --git a/src/providers/video/fal.ts b/src/providers/video/fal.ts index 4a85664..f99bb73 100644 --- a/src/providers/video/fal.ts +++ b/src/providers/video/fal.ts @@ -5,7 +5,7 @@ import * as path from "node:path"; import { createFalClient, type FalClient } from "@fal-ai/client"; import type { VideoProvider, VideoResult } from "../../schema/providers.js"; -const DEFAULT_MODEL = "fal-ai/kling-video/v2.1/standard/image-to-video"; +const DEFAULT_MODEL = "fal-ai/kling-video/v2.6/pro/image-to-video"; export class FalVideo implements VideoProvider { private client: FalClient; @@ -25,6 +25,7 @@ export class FalVideo implements VideoProvider { prompt: string; durationSeconds?: number; aspectRatio?: string; + negativePrompt?: string; }): Promise<VideoResult> { const durationSeconds = opts.durationSeconds ?? 5; const aspectRatio = opts.aspectRatio ?? "9:16"; @@ -40,6 +41,8 @@ export class FalVideo implements VideoProvider { image_url: imageUrl, duration: durationSeconds, aspect_ratio: aspectRatio, + cfg_scale: 0.5, + ...(opts.negativePrompt ? 
{ negative_prompt: opts.negativePrompt } : {}), }, pollInterval: 5_000, }); diff --git a/src/providers/video/gemini.test.ts b/src/providers/video/gemini.test.ts index 5a43f69..ea56dfd 100644 --- a/src/providers/video/gemini.test.ts +++ b/src/providers/video/gemini.test.ts @@ -67,6 +67,35 @@ describe("GeminiVideo", () => { if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath); }); + it("does not pass unsupported params to veo-3.1-lite config", async () => { + const provider = new GeminiVideo("veo-3.1-lite-generate-preview", "test-key"); + + generateVideos.mockResolvedValueOnce({ + done: true, + response: { + generatedVideos: [{ video: { uri: "gs://bucket/video.mp4" } }], + }, + }); + download.mockImplementationOnce(async ({ downloadPath }: { downloadPath: string }) => { + fs.writeFileSync(downloadPath, "fake-mp4-data"); + }); + + const result = await provider.generate({ + sourceImage: Buffer.from("fake-image"), + prompt: "A rocket launching", + negativePrompt: "blur, flickering", + }); + + const callArgs = generateVideos.mock.lastCall![0]; + // veo-3.1-lite does NOT support these params + expect(callArgs.config.enhancePrompt).toBeUndefined(); + expect(callArgs.config.generateAudio).toBeUndefined(); + expect(callArgs.config.negativePrompt).toBeUndefined(); + expect(callArgs.config.personGeneration).toBeUndefined(); + + if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath); + }); + it("throws on empty response", async () => { const provider = new GeminiVideo("veo-3.1-lite-generate-preview", "test-key"); diff --git a/src/providers/video/gemini.ts b/src/providers/video/gemini.ts index cc6a47e..9f85718 100644 --- a/src/providers/video/gemini.ts +++ b/src/providers/video/gemini.ts @@ -25,10 +25,15 @@ export class GeminiVideo implements VideoProvider { prompt: string; durationSeconds?: number; aspectRatio?: string; + negativePrompt?: string; }): Promise<VideoResult> { const durationSeconds = opts.durationSeconds ?? 6; const aspectRatio = opts.aspectRatio ?? 
"9:16"; + if (opts.negativePrompt) { + console.warn(`[video] negativePrompt ignored: ${this.model} does not support it`); + } + // Pass the source image as inline base64 let operation = await this.client.models.generateVideos({ model: this.model, @@ -37,11 +42,13 @@ export class GeminiVideo implements VideoProvider { imageBytes: opts.sourceImage.toString("base64"), mimeType: "image/png", }, + // Note: veo-3.1-lite-generate-preview does NOT support personGeneration, + // enhancePrompt, generateAudio, or negativePrompt. These are only + // available on the full veo-3.1-generate-preview model (Vertex AI). config: { numberOfVideos: 1, durationSeconds, aspectRatio, - personGeneration: "allow_adult", }, }); diff --git a/src/providers/video/video-resolver.test.ts b/src/providers/video/video-resolver.test.ts index cc4640b..40153c9 100644 --- a/src/providers/video/video-resolver.test.ts +++ b/src/providers/video/video-resolver.test.ts @@ -153,4 +153,96 @@ describe("resolveAIVideo", () => { ); expect(readyCall).toBeDefined(); }); + + it("passes negativePrompt combining defaults and archetype antiArtifactGuidance", async () => { + const primary = makeProvider(); + await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: mockArchetype, + callbacks: mockCallbacks, + }); + + const generateCall = (primary.generate as ReturnType).mock.calls[0]![0]; + expect(generateCall.negativePrompt).toContain("blur"); + expect(generateCall.negativePrompt).toContain("flickering"); + expect(generateCall.negativePrompt).toContain("no artifacts"); + }); + + it("uses defaults only when antiArtifactGuidance is empty", async () => { + const primary = makeProvider(); + const emptyArchetype = { ...mockArchetype, antiArtifactGuidance: "" } as unknown as ArchetypeConfig; + + await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: 
emptyArchetype, + callbacks: mockCallbacks, + }); + + const generateCall = (primary.generate as ReturnType).mock.calls[0]![0]; + expect(generateCall.negativePrompt).toContain("blur"); + expect(generateCall.negativePrompt).not.toContain(", ,"); + expect(generateCall.negativePrompt).not.toMatch(/, $/); + expect(generateCall.negativePrompt).toBe(generateCall.negativePrompt.trim()); + }); + + it("includes motionPrompt and negativePrompt in VideoResolution metadata", async () => { + const primary = makeProvider(); + const result = await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: mockArchetype, + callbacks: mockCallbacks, + }); + + expect(result.videoResolution.motionPrompt).toBeDefined(); + expect(result.videoResolution.motionPrompt).toContain("rocket"); + expect(result.videoResolution.negativePrompt).toBeDefined(); + expect(result.videoResolution.negativePrompt).toContain("blur"); + }); + + it("surfaces motion prompter LLM usage for cost tracking", async () => { + const primary = makeProvider(); + const result = await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: mockArchetype, + callbacks: mockCallbacks, + }); + + expect(result.prompterUsage).toEqual({ inputTokens: 100, outputTokens: 50 }); + }); + + it("returns null prompterUsage when motion prompt LLM fails", async () => { + const { optimizeImagePrompt } = await import("../../agents/image-prompter.js"); + (optimizeImagePrompt as ReturnType).mockRejectedValueOnce(new Error("LLM failed")); + + const primary = makeProvider(); + const result = await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: mockArchetype, + callbacks: mockCallbacks, + }); + + expect(result.prompterUsage).toBeNull(); + }); + + it("uses raw visual_prompt as motion prompt when LLM 
optimization fails", async () => { + const { optimizeImagePrompt } = await import("../../agents/image-prompter.js"); + (optimizeImagePrompt as ReturnType).mockRejectedValueOnce(new Error("LLM failed")); + + const primary = makeProvider(); + const result = await resolveAIVideo(mockScene, mockImageResult, 0, path.join(tmpDir, "assets"), { + videoProviders: [primary], + llm: mockLlm, + archetype: mockArchetype, + callbacks: mockCallbacks, + }); + + const generateCall = (primary.generate as ReturnType).mock.calls[0]![0]; + expect(generateCall.prompt).toBe("A rocket launching"); + expect(result.videoResolution.motionPrompt).toBe("A rocket launching"); + }); }); diff --git a/src/providers/video/video-resolver.ts b/src/providers/video/video-resolver.ts index bca1487..3b30f2b 100644 --- a/src/providers/video/video-resolver.ts +++ b/src/providers/video/video-resolver.ts @@ -18,11 +18,16 @@ export interface VideoResolution { error?: string; imageGenTimeMs: number; videoGenTimeMs: number | null; + motionPrompt?: string; + negativePrompt?: string; } // Module-level concurrency limiter for video gen API calls const videoGenLimit = pLimit(3); +const DEFAULT_VIDEO_NEGATIVES = + "blur, low resolution, flickering, compression artifacts, frame drops, jitter, stutter, warping, morphing, unnatural physics, deformed hands, extra fingers, morphing faces, sliding motion"; + /** * Pick the smallest supported duration that is >= the target. * If target exceeds all supported durations, pick the max (trim, never loop). 
@@ -53,11 +58,13 @@ export async function resolveAIVideo( usage: LLMUsage | null; durationSeconds: number | null; videoResolution: VideoResolution; + prompterUsage?: LLMUsage | null; }> { const imageGenTimeMs = 0; // Already tracked by caller // Generate motion-aware prompt via LLM let motionPrompt = scene.visual_prompt; + let prompterUsage: LLMUsage | null = null; try { const optimized = await optimizeImagePrompt( opts.llm, @@ -69,12 +76,19 @@ export async function resolveAIVideo( { mode: "video" }, ); motionPrompt = optimized.prompt; + prompterUsage = optimized.usage; } catch (err) { console.warn(`[video] Scene ${sceneIndex} motion prompt gen failed, using visual_prompt: ${err}`); } opts.callbacks.onProgress?.("visuals", { type: "video_image_ready", scene: sceneIndex }); + // Construct negative prompt: defaults + archetype anti-artifact guidance + const archetypeGuidance = opts.archetype.antiArtifactGuidance?.trim(); + const negativePrompt = archetypeGuidance + ? `${DEFAULT_VIDEO_NEGATIVES}, ${archetypeGuidance}` + : DEFAULT_VIDEO_NEGATIVES; + // Try each video provider in order for (let i = 0; i < opts.videoProviders.length; i++) { const provider = opts.videoProviders[i]!; @@ -90,6 +104,7 @@ export async function resolveAIVideo( prompt: motionPrompt, durationSeconds: genDuration, aspectRatio: "9:16", + negativePrompt, }), ); const videoGenTimeMs = Date.now() - videoStart; @@ -120,7 +135,10 @@ export async function resolveAIVideo( durationSeconds: videoResult.durationSeconds, imageGenTimeMs, videoGenTimeMs, + motionPrompt, + negativePrompt, }, + prompterUsage, }; } catch (err) { const errorMsg = err instanceof Error ? 
err.message : String(err); @@ -146,6 +164,7 @@ export async function resolveAIVideo( imageGenTimeMs, videoGenTimeMs: null, }, + prompterUsage, }; } // Otherwise try next provider @@ -165,5 +184,6 @@ export async function resolveAIVideo( imageGenTimeMs, videoGenTimeMs: null, }, + prompterUsage, }; } diff --git a/src/schema/providers.ts b/src/schema/providers.ts index ac3c92a..6c0a75a 100644 --- a/src/schema/providers.ts +++ b/src/schema/providers.ts @@ -76,6 +76,7 @@ export interface VideoProvider { prompt: string; durationSeconds?: number; aspectRatio?: string; + negativePrompt?: string; }): Promise<VideoResult>; }
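
Review note: the `video-resolver.ts` context in this diff carries the comment "Pick the smallest supported duration that is >= the target. If target exceeds all supported durations, pick the max", but the function it documents is outside the hunks. The sketch below is one possible reading of that rule, not the project's actual code. The name `pickDuration` and its signature are assumptions made for illustration; the duration lists used in the usage note are the ones the mocked providers in `factory.test.ts` report.

```typescript
// Hypothetical helper: name and signature are assumptions, not taken from the diff.
// Implements the rule stated in the video-resolver.ts comment.
function pickDuration(target: number, supported: readonly number[]): number {
  if (supported.length === 0) {
    throw new Error("provider reports no supported durations");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...supported].sort((a, b) => a - b);
  // Smallest supported duration that still covers the target.
  const covering = sorted.find((d) => d >= target);
  // If nothing covers the target, fall back to the longest supported duration.
  return covering ?? sorted[sorted.length - 1]!;
}
```

With the mocked Veo durations `[4, 6, 8]`, a 5-second scene would map to a 6-second clip under this reading; with the mocked Kling durations `[5, 10]`, a 12-second scene would map to 10 seconds.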