🎥 LTX-Video 2.0 — Image-to-Video Workflow

The LTX-Video 2.0 I2V workflow generates a video clip from a starting image, guided by a text prompt. It uses a two-pass sampling pipeline (initial generation → spatial upscale → refinement) with optional audio synthesis via the LTX Audio VAE.

Workflow file: comfyui_workflows/video_ltx2_i2v.json

Workflow Overview

flowchart LR
    A[Start Image] --> B[Resize & Preprocess]
    H[Audio File] --> I[Trim Audio]
    I --> J[Audio VAE Encode]
    B --> C[LTX-Video 2.0<br/>Pass 1: Generate]
    J --> C
    C --> D[Spatial Upscale 2x]
    D --> E[Pass 2: Refine<br/>Distilled LoRA]
    E --> F[VAE Decode<br/>Video + Audio]
    F --> G[Save Video]

Models Required

Component	Filename	Node
Checkpoint	`ltx-2-19b-dev-fp8.safetensors`	`92:1`
Text Encoder	`gemma_3_12B_it_fp4_mixed.safetensors`	`92:60`
Audio VAE	`ltx-2-19b-dev-fp8.safetensors`	`92:48`
Distilled LoRA	`ltx-2-19b-distilled-lora-384.safetensors`	`92:68`
Spatial Upscaler	`ltx-2-spatial-upscaler-x2-1.0.safetensors`	`92:76`

API Parameter Reference

To change a parameter, load the workflow JSON, edit the relevant node's inputs, and POST the result to the ComfyUI /prompt endpoint:

const workflow = JSON.parse(workflowJson);
workflow["98"].inputs.image = "my_start_image.png";  // example
await queuePrompt(workflow);
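
The queuePrompt helper itself is not shown in this README. The following is a minimal sketch of what such a helper might look like, assuming ComfyUI's default address (http://127.0.0.1:8188) and a runtime with a global fetch (a browser or Node 18+). The names buildPromptRequest and queuePromptSketch and the clientId parameter are hypothetical, chosen for illustration; the project's real service may differ.

```javascript
// Hypothetical helper: builds the fetch() options for ComfyUI's /prompt endpoint.
// ComfyUI expects the workflow graph under the "prompt" key of the JSON body.
function buildPromptRequest(workflow, clientId) {
  const body = { prompt: workflow };
  if (clientId) body.client_id = clientId; // optional: associates the job with a client session
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}

// Sketch of a queuePrompt-style function (assumption, not the project's code).
async function queuePromptSketch(workflow, baseUrl = 'http://127.0.0.1:8188') {
  const response = await fetch(`${baseUrl}/prompt`, buildPromptRequest(workflow));
  if (!response.ok) {
    throw new Error(`ComfyUI rejected the prompt: HTTP ${response.status}`);
  }
  return response.json(); // on success the response includes prompt_id
}
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without a running ComfyUI instance.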

🖼️ Start Image

The source image that the video will animate from.

Property	Node ID	Key	Default
Image filename	`98`	`inputs.image`	`"Image_fx(21).jpg"`

"98": {
  "inputs": {
    "image": "my_image.png"
  },
  "class_type": "LoadImage"
}

Note

The image must exist in ComfyUI's input/ directory. Upload it via the ComfyUI /upload/image endpoint first, or place it there manually.
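
As a rough illustration of the upload step, the sketch below sends image bytes to /upload/image as multipart form data. It assumes a runtime with global fetch, FormData, and Blob (a browser or Node 18+) and ComfyUI's default address. The function names buildUploadForm and uploadStartImage are hypothetical.

```javascript
// Hypothetical helper: wraps raw image bytes in the multipart form that
// ComfyUI's /upload/image endpoint reads (the file field is named "image").
function buildUploadForm(bytes, filename, mimeType = 'image/png') {
  const form = new FormData();
  form.append('image', new Blob([bytes], { type: mimeType }), filename);
  form.append('overwrite', 'true'); // replace an existing file with the same name
  return form;
}

// Sketch only: uploads the image and returns the filename reported by the server.
async function uploadStartImage(bytes, filename, baseUrl = 'http://127.0.0.1:8188') {
  const response = await fetch(`${baseUrl}/upload/image`, {
    method: 'POST',
    body: buildUploadForm(bytes, filename), // fetch adds the multipart boundary header
  });
  if (!response.ok) throw new Error(`Upload failed: HTTP ${response.status}`);
  const { name } = await response.json();
  return name;
}
```

The returned name is the value to assign to workflow["98"].inputs.image.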

The image is automatically resized to 1280×720 by node 102:

Property	Node ID	Key	Default
Width	`102`	`inputs.resize_type.width`	`1280`
Height	`102`	`inputs.resize_type.height`	`720`

✏️ Prompt Text

Controls what the generated video depicts. There is a positive prompt (what to generate) and a negative prompt (what to avoid).

Positive Prompt

Property	Node ID	Key
Prompt text	`92:3`	`inputs.text`

"92:3": {
  "inputs": {
    "text": "A sweeping cinematic shot of a mountain landscape at sunset..."
  },
  "class_type": "CLIPTextEncode"
}

Negative Prompt

Property	Node ID	Key	Default
Negative text	`92:4`	`inputs.text`	`"blurry, low quality, still frame, frames, watermark, overlay, titles, has blurbox, has subtitles"`

"92:4": {
  "inputs": {
    "text": "blurry, low quality, watermark"
  },
  "class_type": "CLIPTextEncode"
}

🎞️ FPS (Frames Per Second)

The frame rate appears in three nodes and should be kept consistent across all of them:

Property	Node ID	Key	Default
Conditioning FPS	`92:22`	`inputs.frame_rate`	`25`
Latent Audio FPS	`92:51`	`inputs.frame_rate`	`25`
Output Video FPS	`92:97`	`inputs.fps`	`25`

// Set all three to match
workflow["92:22"].inputs.frame_rate = 30;
workflow["92:51"].inputs.frame_rate = 30;
workflow["92:97"].inputs.fps = 30;

Important

All three FPS values must match. Mismatched values will cause audio/video desync or generation errors.

🔢 Frame Count & Duration Seconds

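Clip duration is determined by the frame count and the FPS (frames per second), and the frame count must land on one of the frame boundaries that LTX-Video accepts. The frame count itself is set on node 92:62 (inputs.value), as used in the Full API Example.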

Duration formula: frames / fps = seconds

Frames	20 FPS (Default)	24 FPS	25 FPS	30 FPS
9	0.45s	0.38s	0.36s	0.30s
17	0.85s	0.71s	0.68s	0.57s
25	1.25s	1.04s	1.00s	0.83s
33	1.65s	1.38s	1.32s	1.10s
81	4.05s	3.38s	3.24s	2.70s
121	6.05s	5.04s	4.84s	4.03s

Important

Why do durations have strange decimal values?
LTX-Video 2.0 requires frame counts of the form (n × 8) + 1 (9, 17, 25, 33, ..., 81, ..., 121, ...).
So that every clip contains a whole number of valid frames, with none skipped or duplicated on playback, the application automatically snaps each selection's duration to the nearest valid frame boundary for your project's FPS.
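
The application's own snapping code is not shown here. The sketch below is one plausible way to implement the (n × 8) + 1 rule and the frames / fps formula; the function names are hypothetical, and the minimum of 9 frames (n ≥ 1) is an assumption taken from the table above.

```javascript
// True when frames has the form (n * 8) + 1 with n >= 1 (9, 17, 25, ...).
function isValidFrameCount(frames) {
  return Number.isInteger(frames) && frames >= 9 && (frames - 1) % 8 === 0;
}

// Snap a requested duration (seconds) to the nearest valid frame count at the given FPS.
function snapToValidFrames(seconds, fps) {
  const n = Math.max(1, Math.round((seconds * fps - 1) / 8));
  return n * 8 + 1;
}

// Actual playback duration of a clip, in seconds.
function durationSeconds(frames, fps) {
  return frames / fps;
}

// A 4-second request at 30 FPS is 120 frames, which snaps up to 121.
const snapped = snapToValidFrames(4, 30);
console.log(snapped, durationSeconds(snapped, 30).toFixed(2)); // prints: 121 4.03
```

The resulting 4.03s agrees with the 121-frame row of the table at 30 FPS.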

Note

Higher frame counts require significantly more VRAM. 81 and 121 frames are common choices for longer, high-quality clips.

🔊 Audio Source

This workflow uses a LoadAudio node to load an external audio file, trims it to match the video's duration, and encodes it with the LTXV Audio VAE Encode node. The generated video follows the pacing and characteristics of the provided audio.

Property	Node ID	Key	Default
Audio filename	`92:113`	`inputs.audio`	`"Bob Marly-Get Up, Stand Up_Vocals.mp3"`
Audio VAE Encode	`92:117`	`inputs.audio`	Linked to TrimAudioDuration (`92:115`)
Audio Start Time (s)	`92:115`	`inputs.start_index`	`20`
Audio Duration (s)	`92:115`	`inputs.duration`	`3`

Tip

The audio file must be in ComfyUI's input directory. Set its filename via inputs.audio on node 92:113, and adjust the start time and duration on the Trim Audio Duration node (92:115). For exact synchronization, the audio duration should equal the calculated video duration (frames / fps).
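
To keep the trim in step with the video, the duration can be derived from the frame count rather than typed by hand. This is a small sketch using the node ID and keys from the table above; the function name syncAudioTrim is hypothetical, and it assumes the trim node accepts fractional seconds.

```javascript
// Set the TrimAudioDuration node (92:115) so the audio clip spans exactly
// the video's duration (frames / fps). Mutates the workflow in place.
function syncAudioTrim(workflow, frames, fps, startSeconds = 0) {
  const trim = workflow['92:115'].inputs;
  trim.start_index = startSeconds; // where in the source audio to begin
  trim.duration = frames / fps;    // seconds of audio to keep
  return trim.duration;
}
```

For example, syncAudioTrim(workflow, 121, 25, 20) keeps 4.84 seconds of audio starting at the 20-second mark.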

Audio VAE Model

The Audio VAE model is loaded by node 92:48:

Property	Node ID	Key	Default
Checkpoint name	`92:48`	`inputs.ckpt_name`	`"ltx-2-19b-dev-fp8.safetensors"`

📂 Output File Location

The saved video output path and format.

Property	Node ID	Key	Default
Filename prefix	`75`	`inputs.filename_prefix`	`"video/LTX_2.0_i2v"`
Format	`75`	`inputs.format`	`"auto"`
Codec	`75`	`inputs.codec`	`"auto"`

"75": {
  "inputs": {
    "filename_prefix": "video/my_project/clip_001",
    "format": "auto",
    "codec": "auto"
  },
  "class_type": "SaveVideo"
}

Note

The filename_prefix is relative to ComfyUI's output/ directory. A counter suffix is appended automatically (e.g., clip_001_00001.mp4). Subdirectories are created automatically.

Full API Example

import { queuePrompt } from './services/comfyService';
import workflowJson from '../comfyui_workflows/video_ltx2_i2v.json';

// Deep clone to avoid mutating the template
const workflow = JSON.parse(JSON.stringify(workflowJson));

// 1. Set start image
workflow["98"].inputs.image = "my_photo.jpg";

// 2. Set prompt
workflow["92:3"].inputs.text = "A dramatic zoom into a glowing crystal...";
workflow["92:4"].inputs.text = "blurry, watermark, low quality";

// 3. Set FPS (all three must match)
const fps = 25;
workflow["92:22"].inputs.frame_rate = fps;
workflow["92:51"].inputs.frame_rate = fps;
workflow["92:97"].inputs.fps = fps;

// 4. Set number of frames (must be (n*8)+1)
workflow["92:62"].inputs.value = 121; // ~4.84s at 25fps

// 5. Set output location
workflow["75"].inputs.filename_prefix = "video/my_project/scene_01";

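// 6. Set audio file and trim it to the video duration (frames / fps)
workflow["92:113"].inputs.audio = "my_audio.mp3";
workflow["92:115"].inputs.duration = 121 / fps; // 4.84s, matches the frame count from step 4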

// 7. Queue the prompt
const result = await queuePrompt(workflow);
console.log('Queued:', result?.prompt_id);

Advanced Sampling Parameters

These parameters control generation quality and are typically left at defaults:

Parameter	Node ID	Key	Default	Description
Steps (Pass 1)	`92:9`	`inputs.steps`	`20`	Denoising steps for initial generation
CFG (Pass 1)	`92:47`	`inputs.cfg`	`4`	Classifier-Free Guidance scale
CFG (Pass 2)	`92:82`	`inputs.cfg`	`1`	Guidance scale for the refinement pass (a value of 1 applies no extra guidance)
Seed (Pass 1)	`92:11`	`inputs.noise_seed`	`10`	Random seed for reproducibility
Seed (Pass 2)	`92:67`	`inputs.noise_seed`	`0`	Random seed for refinement pass
Image Compression	`92:99`	`inputs.img_compression`	`33`	Amount of compression applied to the start image before it is used for conditioning
Image Resize	`102`	`inputs.resize_type.width/height`	`1280 × 720`	Input image dimensions
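
When several of these values change together, a lookup table avoids scattering node IDs through calling code. The sketch below uses the node IDs and keys from the table above. The labels on the left (stepsPass1, cfgPass1, and so on) and the function name applySamplingOverrides are illustrative inventions, not part of the workflow.

```javascript
// Node IDs and input keys copied from the table above; labels are illustrative.
const SAMPLING_PARAMS = {
  stepsPass1: ['92:9', 'steps'],
  cfgPass1: ['92:47', 'cfg'],
  cfgPass2: ['92:82', 'cfg'],
  seedPass1: ['92:11', 'noise_seed'],
  seedPass2: ['92:67', 'noise_seed'],
  imgCompression: ['92:99', 'img_compression'],
};

// Apply overrides in place; unknown labels throw instead of being silently ignored.
function applySamplingOverrides(workflow, overrides) {
  for (const [name, value] of Object.entries(overrides)) {
    const target = SAMPLING_PARAMS[name];
    if (!target) throw new Error(`Unknown sampling parameter: ${name}`);
    const [nodeId, key] = target;
    workflow[nodeId].inputs[key] = value;
  }
  return workflow;
}
```

For example, applySamplingOverrides(workflow, { seedPass1: 42, seedPass2: 43 }) changes both seeds while leaving every other default untouched.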

VRAM Management

ComfyUI executes queued workflows one at a time and loads and unloads models from VRAM as needed between jobs.

To clear VRAM and system memory explicitly, for example between large batches or before running another heavy application, POST to ComfyUI's /free endpoint:

// Example: POST to /free to unload models
await fetch('http://127.0.0.1:8188/free', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ unload_models: true, free_memory: true })
});

(This functionality is available via the "🧹 Clear VRAM" button on the LTX Test Page.)

Troubleshooting

"ComfyUI: Disconnected" — Ensure ComfyUI is running at http://127.0.0.1:8188.
Missing model errors — Verify that every model file listed under Models Required is in the correct ComfyUI model directory (models/checkpoints/, models/loras/, models/upscale_models/, models/text_encoders/). The checkpoint and the Audio VAE use the same file.
VRAM errors — Reduce frame count or image resolution. The fp8 checkpoint helps with VRAM usage.
Audio/video desync — Ensure all three FPS values match (92:22, 92:51, 92:97).
Invalid frame count — Use values of the form (n × 8) + 1 (e.g., 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121).
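
The last two items can be caught before a job is queued. The following is a minimal pre-flight sketch, assuming the FPS node IDs listed above and the frame count node 92:62 used in the Full API Example. The function name checkWorkflow is hypothetical.

```javascript
// Returns a list of human-readable problems; an empty list means both checks passed.
function checkWorkflow(workflow) {
  const problems = [];

  // All three frame-rate inputs must hold the same value.
  const rates = [
    workflow['92:22'].inputs.frame_rate,
    workflow['92:51'].inputs.frame_rate,
    workflow['92:97'].inputs.fps,
  ];
  if (new Set(rates).size !== 1) {
    problems.push(`FPS mismatch across 92:22, 92:51, 92:97: ${rates.join(', ')}`);
  }

  // The frame count must have the form (n * 8) + 1, starting at 9.
  const frames = workflow['92:62'].inputs.value;
  if (!Number.isInteger(frames) || frames < 9 || (frames - 1) % 8 !== 0) {
    problems.push(`Invalid frame count ${frames}: expected (n × 8) + 1`);
  }

  return problems;
}
```

Calling checkWorkflow(workflow) just before queuePrompt and aborting on a non-empty result surfaces these mistakes without spending generation time.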
