Merged
18 changes: 18 additions & 0 deletions CHANGELOG.md
Expand Up @@ -2,6 +2,24 @@

All notable changes to OpenReels will be documented in this file.

## [0.17.0] - 2026-04-10

### Added
- **Dedicated video motion prompter**: new `prompts/video-prompter.md` with structured cinematography language. Video prompts now follow a professional anatomy: shot type, subject, action, environment, camera movement, lighting, and style. Temporal progression, single-action constraints, and physics/realism guidance produce more natural motion.
- **Negative prompts for video generation**: both Veo and Kling providers now receive anti-artifact guidance. Default negatives (blur, flickering, morphing, unnatural physics) are combined with each archetype's existing `antiArtifactGuidance` for style-aware quality control.
- **Video prompt observability**: `motionPrompt` and `negativePrompt` fields are now logged in `VideoResolution` metadata (visible in `log.json`), making video quality issues debuggable without re-running the pipeline.
- **Kling 2.6 Pro upgrade**: Fal/Kling video provider upgraded from v2.1 standard to v2.6 Pro. Better motion physics and realism at the same $0.07/s price.
- **Kling cfg_scale parameter**: prompt adherence control (0.5 default) now passed to the Kling API.
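
The negative-prompt entry above describes merging a default list with each archetype's `antiArtifactGuidance`. A minimal sketch of that merge follows. The helper name `buildNegativePrompt`, the constant `DEFAULT_VIDEO_NEGATIVES`, and the assumption that `antiArtifactGuidance` is a comma-separated string are illustrative and may not match the actual implementation.

```typescript
// Hypothetical defaults; the terms mirror the changelog wording.
const DEFAULT_VIDEO_NEGATIVES = ["blur", "flickering", "morphing", "unnatural physics"];

// Merge the defaults with an archetype's antiArtifactGuidance (assumed here
// to be a comma-separated string), dropping empty and duplicate terms.
function buildNegativePrompt(antiArtifactGuidance?: string): string {
  const extra = (antiArtifactGuidance ?? "")
    .split(",")
    .map((term) => term.trim())
    .filter((term) => term.length > 0);

  const seen = new Set<string>();
  const merged: string[] = [];
  for (const term of [...DEFAULT_VIDEO_NEGATIVES, ...extra]) {
    const key = term.toLowerCase(); // case-insensitive de-duplication
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(term);
    }
  }
  return merged.join(", ");
}
```

Keeping the defaults first means the generic artifact terms are always present, with archetype-specific terms appended after them.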

### Fixed
- **Veo API compatibility**: removed `personGeneration`, `enhancePrompt`, `generateAudio`, and `negativePrompt` config params that `veo-3.1-lite-generate-preview` rejects. These exist in the SDK types but are only supported on the full Veo model (Vertex AI). This fixes `Forbidden`/`INVALID_ARGUMENT` errors that silently caused all Veo video generation to fail.
- **Factory videoModel routing**: `config.videoModel` is now passed to GeminiVideo regardless of primary/secondary position. FalVideo always uses its own default model, preventing Veo model strings from reaching the Fal endpoint.

### For contributors
- `VideoProvider.generate()` opts now include an optional `negativePrompt` field.
- `VideoResolution` interface has new `motionPrompt` and `negativePrompt` fields.
- `image-prompter.ts` loads `video-prompter.md` when `mode="video"` instead of appending to `image-prompter.md`.
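
For orientation, the contributor notes above could correspond to shapes like the following. This is a sketch under assumptions: only `sourceImage`, `prompt`, `negativePrompt`, and `motionPrompt` appear in the diff, and the interface names `VideoGenerateOptsSketch` and `VideoResolutionSketch`, the `string | null` type, and the `toResolutionFields` helper are hypothetical.

```typescript
// Assumed option shape for VideoProvider.generate(); field names come from
// the tests in this PR, the interface name is illustrative.
interface VideoGenerateOptsSketch {
  sourceImage: Uint8Array; // the tests pass a Buffer, which is a Uint8Array
  prompt: string;
  negativePrompt?: string; // optional, so existing callers are unaffected
}

// Assumed subset of VideoResolution carrying the two new logged fields.
interface VideoResolutionSketch {
  motionPrompt: string;
  negativePrompt: string | null;
}

// Hypothetical helper: copy the prompts used for generation into metadata
// so they can be inspected later (for example in log.json).
function toResolutionFields(opts: VideoGenerateOptsSketch): VideoResolutionSketch {
  return {
    motionPrompt: opts.prompt,
    negativePrompt: opts.negativePrompt ?? null,
  };
}
```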

## [0.16.0] - 2026-04-09

### Added
4 changes: 2 additions & 2 deletions CLAUDE.md
Expand Up @@ -20,7 +20,7 @@ src/
image/ # gemini.ts, openai.ts
stock/ # pexels.ts, pixabay.ts, adaptive-resolver.ts, query-reformer.ts, stock-verifier.ts
music/ # lyria.ts (Lyria 3 Pro), bundled-adapter.ts, bundled.ts
video/ # gemini.ts (Veo), fal.ts (Kling), video-resolver.ts
video/ # gemini.ts (Veo 3.1 Lite), fal.ts (Kling 2.6 Pro), video-resolver.ts
config/
archetypes/ # 14 archetype JSON configs
archetype-registry.ts
Expand All @@ -47,7 +47,7 @@ fixtures/ # sample DirectorScore JSONs
```bash
pnpm install # install dependencies
pnpm start "topic" # run full pipeline (CLI)
pnpm test # run vitest suite (349 tests)
pnpm test # run vitest suite (395 tests)
```

### Web UI (Docker Compose)
6 changes: 3 additions & 3 deletions README.md
Expand Up @@ -68,7 +68,7 @@ Give it a topic. It handles everything:
| **Research** | Web search grounds the script in real facts, not hallucinations |
| **Script** | Writes a punchy short-form script with scene breakdowns, visual direction, and emotional arc |
| **Voiceover** | Generates TTS audio with word-level timestamps for karaoke-style captions |
| **Visuals** | AI images (Gemini, DALL-E), AI video clips (Google Veo, fal.ai Kling), and vision-verified stock footage that rejects bad matches and retries automatically |
| **Visuals** | AI images (Gemini, DALL-E), AI video clips (Google Veo 3.1, fal.ai Kling 2.6 Pro) with dedicated cinematography prompts and negative prompt guidance, plus vision-verified stock footage that rejects bad matches and retries automatically |
| **Music** | AI-generated background score via Google Lyria 3 Pro, scene-synced to match the video's emotional arc. Or pick from 25 bundled royalty-free tracks |
| **Captions** | Spring-animated 3-state captions with 7 distinct styles, per-archetype theming, and word-level karaoke highlighting |
| **Assembly** | Composites everything into a vertical MP4 via Remotion with crossfade, slide, wipe, and flip transitions |
Expand All @@ -86,7 +86,7 @@ Mix and match providers or go all-in on one ecosystem:
| **Search** | Native (provider built-in), Tavily, or parametric knowledge |
| **TTS** | ElevenLabs, Inworld, OpenAI TTS, Gemini TTS, Kokoro (free, local) |
| **Images** | Gemini Imagen, OpenAI DALL-E |
| **Video** | Google Veo, fal.ai Kling (with cross-provider fallback) |
| **Video** | Google Veo 3.1, fal.ai Kling 2.6 Pro (with cross-provider fallback, negative prompts, structured cinematography prompts) |
| **Music** | Google Lyria 3 Pro (AI-generated, $0.08/track), Bundled library (free) |
| **Stock** | Pexels, Pixabay (both searched, vision-verified, with AI fallback) |

Expand Down Expand Up @@ -244,7 +244,7 @@ The rewrite moves from Python to TypeScript for native [Remotion](https://www.re

## Status

v0.13.1 shipped. See [CHANGELOG.md](CHANGELOG.md) for full version history and [TODOS.md](TODOS.md) for known issues and roadmap.
v0.17.0 shipped. See [CHANGELOG.md](CHANGELOG.md) for full version history and [TODOS.md](TODOS.md) for known issues and roadmap.

## Star History

8 changes: 8 additions & 0 deletions TODOS.md
Expand Up @@ -95,6 +95,14 @@
- [ ] **Gemini generateVideos initial call timeout** — The 180s `TIMEOUT_MS` in gemini.ts only guards the polling loop. The initial `this.client.models.generateVideos()` call that submits the job has no application-level timeout, relying on OS socket timeout (~120s). Wrap with `Promise.race` or `AbortSignal.timeout()`. Same class of issue as the fal.ai subscribe timeout (P1 in Pipeline Robustness).
**Priority:** P2

- [ ] **Expand Motion enum with cinematic camera directions** — Current Motion enum is `zoom_in | zoom_out | pan_right | pan_left | static`. For `ai_video` scenes, the creative director could specify richer camera directions (`dolly_in`, `tracking`, `crane_up`, `orbit`) that feed into the video-prompter. Touches schema (breaking change), creative-director prompt, Remotion score-to-props mapper. Phase 2 after video quality enhancement ships and we evaluate improvement.
**Priority:** P2
**Depends on:** Video generation quality enhancement
Deferred from plan: Video Generation Quality Enhancement (CEO review: accepted as follow-up)

- [ ] **Video generation safety re-prompting** — Unlike image generation (which retries with a sanitized prompt on safety rejection), video generation falls back to static image immediately. The motion prompt could be sanitized the same way. Also, `isSafetyRejection()` in orchestrator.ts:204 doesn't check for `"forbidden"` which is what Veo returns for RAI-filtered content.
**Priority:** P2
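
The `generateVideos` timeout item above suggests wrapping the initial submit call with `Promise.race`. A minimal sketch of such a wrapper is shown below. The name `withTimeout` and its signature are assumptions for illustration, not existing project code.

```typescript
// Rejects with a descriptive error if `work` does not settle within `ms`.
// Note: this only stops waiting; it does not cancel the underlying request.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Usage might look like `await withTimeout(submitJob(), 30_000, "generateVideos submit")`, where `submitJob` stands in for the SDK call. Because the race does not abort the HTTP request, passing an abort signal to the SDK (if it accepts one) would be the stricter option.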

## TTS

- [ ] **Voice catalog command** — Add `--list-voices <provider>` CLI flag that prints available voices for any TTS provider (ElevenLabs, Inworld, Kokoro, Gemini TTS, OpenAI TTS) and exits. Users currently have no way to discover voice options without reading provider docs. Kokoro has ~50 voices, Gemini TTS has multiple, ElevenLabs has hundreds. Build as a unified interface across all providers rather than provider-specific flags.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.16.0
0.17.0
53 changes: 53 additions & 0 deletions prompts/video-prompter.md
@@ -0,0 +1,53 @@
You are a motion prompt engineer for AI video generation. You receive a scene's visual description and narration from a DirectorScore, and produce an optimized video generation prompt that will animate a source image into a realistic, cinematic video clip.

## Your Job

Transform the scene's visual description into a detailed, video-generator-friendly motion prompt. The prompt will be used with image-to-video models (Veo, Kling) that animate a still image into a 4-10 second video clip. Focus on what MOVES, how it moves, and how the camera follows.

## Prompt Anatomy

Structure every prompt in this order. Earlier elements carry more weight with video models.

1. **Shot type and framing** — medium shot, close-up, wide establishing shot, POV, etc.
2. **Subject with specific attributes** — who or what is in frame, with distinctive visual details.
3. **Action and movement** — what the subject does over the clip duration. Be specific: "takes three steps forward" not "walks."
4. **Environment and setting** — where this takes place, with 2-3 grounding details.
5. **Camera movement** — one clear camera move per clip. Use professional terms: dolly, crane, pan, tilt, tracking, orbit, rack focus.
6. **Lighting and mood** — light source direction, quality (soft/hard), color temperature. Specify: "warm golden hour key light from upper left" not "nice lighting."
7. **Style and texture** — match the style bible. Include material properties for realism: brushed steel, woven linen, cracked leather, wet cobblestone.

## Motion Rules

Follow these strictly:

1. **One action, one camera move.** Each clip gets exactly one clear subject action and one camera movement. Combining multiple actions or simultaneous camera moves (zoom + rotate, pan + dolly) causes warping and artifacts.

2. **Temporal progression.** Describe what happens across the clip's duration. "In the opening frames, the figure stands motionless. Over the next few seconds, they slowly turn toward camera." Give the model a timeline.

3. **Physics and material realism.** Describe how things move physically. "Water droplets arc from the fountain edge, catching warm light as they fall" not "water splashes." Name material properties: fabric drapes and creases, metal reflects, glass refracts, smoke disperses. The more physical detail, the better the simulation.

4. **Camera vocabulary.** Use terms video models understand:
- Dolly in/out — camera moves toward or away from subject
- Pan left/right — camera rotates horizontally
- Tilt up/down — camera rotates vertically
- Tracking shot — camera follows subject at matching speed
- Crane/jib — camera rises or descends vertically
- Orbit — camera circles around subject
- Rack focus — focus shifts between foreground and background
- Static/locked-off — camera holds position (use for intense subject motion)

5. **Lens terminology.** Include when relevant: "24mm wide-angle" for environmental scope, "85mm telephoto compression" for portraits, "shallow depth of field at f/2.8" for subject isolation, "anamorphic lens flare" for cinematic style.

6. **Match motion to mood.** Smooth, slow camera moves for calm and contemplative. Handheld energy for tension. Static locked-off shots for dramatic reveals. The camera is a storytelling tool.

7. **Keep it concise.** Aim for 300-500 characters. Video models perform best with dense, specific prompts, not lengthy descriptions. Every word should earn its place.

8. **No text in video.** Do not describe any text overlays, captions, titles, or watermarks. These are added in post-production.

9. **Name real subjects.** If the topic involves well-known public figures, landmarks, or locations, name them explicitly. Video models handle known subjects better with actual names.

10. **Depict dark themes through atmosphere.** AI video providers reject explicit violence, gore, or graphic content. For dark topics, convey mood through environment, lighting, shadow, and implication. A dimly lit corridor with flickering light conveys danger without showing it.

## Output

Return the optimized video generation prompt in the `optimized_prompt` field. The prompt should be a single dense paragraph, not a list. No JSON or markdown formatting inside the prompt itself.
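
The anatomy order and the 300-500 character target described in this prompt file can also be checked mechanically. The sketch below is illustrative only: `MotionPromptParts`, `assembleMotionPrompt`, and `isWithinLengthTarget` are hypothetical names and are not part of this PR.

```typescript
// The seven anatomy elements, in the order the prompt file prescribes.
interface MotionPromptParts {
  shot: string;
  subject: string;
  action: string;
  environment: string;
  camera: string;
  lighting: string;
  style: string;
}

// Join the non-empty parts into one dense paragraph, earliest element first.
function assembleMotionPrompt(p: MotionPromptParts): string {
  const body = [p.shot, p.subject, p.action, p.environment, p.camera, p.lighting, p.style]
    .map((s) => s.trim().replace(/[.\s]+$/, "")) // strip trailing periods and spaces
    .filter((s) => s.length > 0)
    .join(". ");
  return body.length > 0 ? `${body}.` : "";
}

// Check the "Keep it concise" rule: 300-500 characters by default.
function isWithinLengthTarget(prompt: string, min = 300, max = 500): boolean {
  return prompt.length >= min && prompt.length <= max;
}
```

A check like this could flag prompts that drift outside the length target before they are sent to a video provider.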
11 changes: 4 additions & 7 deletions src/agents/image-prompter.ts
Expand Up @@ -4,7 +4,8 @@ import { z } from "zod";
import type { ArchetypeConfig } from "../schema/archetype.js";
import type { LLMProvider, LLMUsage } from "../schema/providers.js";

const SYSTEM_PROMPT_PATH = path.join(process.cwd(), "prompts", "image-prompter.md");
const IMAGE_PROMPT_PATH = path.join(process.cwd(), "prompts", "image-prompter.md");
const VIDEO_PROMPT_PATH = path.join(process.cwd(), "prompts", "video-prompter.md");

const ImagePromptResult = z.object({
optimized_prompt: z.string(),
Expand Down Expand Up @@ -38,13 +39,9 @@ export async function optimizeImagePrompt(
: "You are a visual prompt engineer for AI image generation. Transform scene descriptions into detailed, image-generator-friendly prompts. Return the optimized prompt in the optimized_prompt field.";

try {
systemPrompt = fs.readFileSync(SYSTEM_PROMPT_PATH, "utf-8");
if (mode === "video") {
systemPrompt +=
"\n\nIMPORTANT: This prompt is for AI VIDEO generation, not still images. Focus on motion, camera movement, and temporal dynamics. Describe what changes over the 5-second clip.";
}
systemPrompt = fs.readFileSync(mode === "video" ? VIDEO_PROMPT_PATH : IMAGE_PROMPT_PATH, "utf-8");
} catch {
// Use default
// Use default inline prompt above
}

// Inject style bible from archetype's creative fields
2 changes: 1 addition & 1 deletion src/cli/cost-estimator.ts
Expand Up @@ -83,7 +83,7 @@ const PRICING = {
openaiPerImage: 0.167,
// Video generation pricing (per second of generated video)
veoLitePerSecond: 0.05, // Veo 3.1 Lite ($0.30 for 6s clip)
falKlingPerSecond: 0.07, // Kling v2.1 via fal.ai ($0.35 for 5s clip)
falKlingPerSecond: 0.07, // Kling v2.6 Pro via fal.ai ($0.35 for 5s clip)
// Music generation pricing
lyriaPerTrack: 0.08, // Lyria 3 Pro: $0.08 per song (ai.google.dev/gemini-api/docs/music-generation)
};
3 changes: 3 additions & 0 deletions src/pipeline/orchestrator.ts
Expand Up @@ -132,6 +132,7 @@ interface VisualAssetResult {
durationSeconds: number | null;
stockResolution?: StockResolution;
videoResolution?: VideoResolution;
prompterUsage?: LLMUsage | null;
}

/** Generate an AI image with optional rejection context from failed stock searches */
Expand Down Expand Up @@ -291,6 +292,7 @@ async function resolveVisualAsset(
usage: videoResult.usage,
durationSeconds: videoResult.durationSeconds,
videoResolution: videoResult.videoResolution,
prompterUsage: videoResult.prompterUsage ?? null,
};
}

Expand Down Expand Up @@ -611,6 +613,7 @@ function buildPipelineWorkflow(
visualsResult.sceneSourceDurations = sceneResults.map((r) => r.durationSeconds);
for (const r of sceneResults) {
if (r.usage) llmUsages.push(r.usage);
if (r.prompterUsage) llmUsages.push(r.prompterUsage);
}

// Track music prompter LLM usage
63 changes: 63 additions & 0 deletions src/providers/factory.test.ts
@@ -1,5 +1,7 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import { createProviders } from "./factory.js";
import { FalVideo } from "./video/fal.js";
import { GeminiVideo } from "./video/gemini.js";
import { GeminiImage } from "./image/gemini.js";
import { OpenAIImage } from "./image/openai.js";
import { AnthropicLLM } from "./llm/anthropic.js";
Expand Down Expand Up @@ -81,6 +83,12 @@ vi.mock("./llm/openai-compatible.js", () => ({
vi.mock("./search/tavily.js", () => ({
createTavilySearchTools: vi.fn((apiKey?: string) => ({ tavily_search: { apiKey } })),
}));
vi.mock("./video/gemini.js", () => ({
GeminiVideo: vi.fn().mockImplementation(() => ({ supportedDurations: [4, 6, 8], generate: vi.fn() })),
}));
vi.mock("./video/fal.js", () => ({
FalVideo: vi.fn().mockImplementation(() => ({ supportedDurations: [5, 10], generate: vi.fn() })),
}));

describe("createProviders", () => {
beforeEach(() => {
Expand Down Expand Up @@ -408,4 +416,59 @@ describe("createProviders", () => {
const args = vi.mocked(AnthropicLLM).mock.calls[0]!;
expect(args[0]).toBe("claude-opus-4-6");
});

it("passes videoModel to GeminiVideo but not FalVideo", () => {
const origGoogle = process.env["GOOGLE_API_KEY"];
const origFal = process.env["FAL_API_KEY"];
process.env["GOOGLE_API_KEY"] = "test-goog";
process.env["FAL_API_KEY"] = "test-fal";

createProviders({
llm: "anthropic",
tts: "elevenlabs",
image: "gemini",
videoModel: "veo-3.1-generate-preview",
});

// GeminiVideo should receive the model override
const geminiArgs = vi.mocked(GeminiVideo).mock.calls[0]!;
expect(geminiArgs[0]).toBe("veo-3.1-generate-preview");

// FalVideo should receive undefined (uses its own default)
const falArgs = vi.mocked(FalVideo).mock.calls[0]!;
expect(falArgs[0]).toBeUndefined();

process.env["GOOGLE_API_KEY"] = origGoogle ?? "";
process.env["FAL_API_KEY"] = origFal ?? "";
if (!origGoogle) delete process.env["GOOGLE_API_KEY"];
if (!origFal) delete process.env["FAL_API_KEY"];
});

it("passes videoModel to GeminiVideo even when Fal is primary", () => {
const origGoogle = process.env["GOOGLE_API_KEY"];
const origFal = process.env["FAL_API_KEY"];
process.env["GOOGLE_API_KEY"] = "test-goog";
process.env["FAL_API_KEY"] = "test-fal";

createProviders({
llm: "anthropic",
tts: "elevenlabs",
image: "gemini",
video: "fal",
videoModel: "veo-3.1-generate-preview",
});

// GeminiVideo (secondary) should still receive the model override
const geminiArgs = vi.mocked(GeminiVideo).mock.calls[0]!;
expect(geminiArgs[0]).toBe("veo-3.1-generate-preview");

// FalVideo (primary) should receive undefined
const falArgs = vi.mocked(FalVideo).mock.calls[0]!;
expect(falArgs[0]).toBeUndefined();

process.env["GOOGLE_API_KEY"] = origGoogle ?? "";
process.env["FAL_API_KEY"] = origFal ?? "";
if (!origGoogle) delete process.env["GOOGLE_API_KEY"];
if (!origFal) delete process.env["FAL_API_KEY"];
});
});
4 changes: 2 additions & 2 deletions src/providers/factory.ts
Expand Up @@ -188,8 +188,8 @@ export function createProviders(config: ProviderConfig): Providers {
const videoPrimary = config.video ?? (googleKey ? "gemini" : falKey ? "fal" : undefined);

if (videoPrimary === "fal") {
if (falKey) videoProviders.push(new FalVideo(config.videoModel, falKey));
if (googleKey) videoProviders.push(new GeminiVideo(undefined, googleKey));
if (falKey) videoProviders.push(new FalVideo(undefined, falKey));
if (googleKey) videoProviders.push(new GeminiVideo(config.videoModel, googleKey));
} else if (videoPrimary === "gemini" || videoPrimary === undefined) {
if (googleKey) videoProviders.push(new GeminiVideo(config.videoModel, googleKey));
if (falKey) videoProviders.push(new FalVideo(undefined, falKey));
51 changes: 51 additions & 0 deletions src/providers/video/fal.test.ts
Expand Up @@ -94,4 +94,55 @@ describe("FalVideo", () => {
}),
).rejects.toThrow("Failed to download fal.ai video: 500");
});

it("passes cfg_scale and negative_prompt in input", async () => {
const provider = new FalVideo(undefined, "test-key");

mockUpload.mockResolvedValueOnce("https://fal.storage/image.png");
mockSubscribe.mockResolvedValueOnce({
data: { video: { url: "https://fal.storage/video.mp4" } },
});
mockFetch.mockResolvedValueOnce({
ok: true,
arrayBuffer: () => Promise.resolve(new ArrayBuffer(100)),
});

const result = await provider.generate({
sourceImage: Buffer.from("fake-image"),
prompt: "A rocket launching",
negativePrompt: "blur, flickering",
});

const subscribeCall = mockSubscribe.mock.lastCall!;
const input = subscribeCall[1].input;
expect(input.cfg_scale).toBe(0.5);
expect(input.negative_prompt).toBe("blur, flickering");

const fs = await import("node:fs");
if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath);
});

it("uses Kling v2.6 Pro model by default", async () => {
const provider = new FalVideo(undefined, "test-key");

mockUpload.mockResolvedValueOnce("https://fal.storage/image.png");
mockSubscribe.mockResolvedValueOnce({
data: { video: { url: "https://fal.storage/video.mp4" } },
});
mockFetch.mockResolvedValueOnce({
ok: true,
arrayBuffer: () => Promise.resolve(new ArrayBuffer(100)),
});

const result = await provider.generate({
sourceImage: Buffer.from("fake-image"),
prompt: "test",
});

const subscribeCall = mockSubscribe.mock.lastCall!;
expect(subscribeCall[0]).toBe("fal-ai/kling-video/v2.6/pro/image-to-video");

const fs = await import("node:fs");
if (fs.existsSync(result.filePath)) fs.unlinkSync(result.filePath);
});
});