Three V3 ComfyUI nodes that compose a disciplined cinematic prompt (five camera modes × canonical lens recipes × diegetic-audio guardrails) and drive LTX 2.3 22B distilled video generation. Plus the controlled sweep methodology, the 3 LTX 2.3 prompt-sensitivity findings they uncovered, and ready-to-run example workflows for t2v, i2v, v2v, and hi-res 1080p production.
▶ Watch the full A/B reel with audio (1:47, 45 MB), or browse all release assets.
Q2_K GGUF (left) vs FP8 dev + distill LoRA (right): same Cinema Worldbuilder prompts, two LTX 2.3 model variants. 2 scenes × 5 camera modes = 10 paired clips in the full reel; the looping highlight above shows the M3 Action + M4 Performance cells.
🎨 FP8 + distill LoRA: sharper, higher contrast, richer color grade. Reads like a modern digital cinema camera. Strong for advertising / music video / fashion / hero shots.
🎞️ Q2_K GGUF: less saturated, softer highlight rolloff, slightly milky. Old-film, naturalistic feel. Surprisingly nice for documentary / drama / vérité styles.
Same model, different quant: pick the chain that matches your project, not just the "best quality" label.
The pack ships three ComfyUI nodes under the Cinema Worldbuilder category:
| Node | What it does |
|---|---|
| `CinemaWorldbuilder_CameraBlock` | Picks a film mode (M1 Narrative / M2 Studio / M3 Action / M4 Performance / M5 Atmospheric), a lens (35–85 mm), and a runtime (0.5–4.0 s), then emits a canonical camera-vocabulary block. Outputs (camera_block, frame_count, fps, runtime_actual); the frame count is LTX-valid (8k+1) and respects a 12 GB VRAM cap. |
| `CinemaWorldbuilder_AudioLine` | Builds the diegetic audio line. Music is rejected at validation: any music/score/lyrics token fails the graph (prevents bad prompts from wasting renders). Optional spoken_dialogue clause. |
| `CinemaWorldbuilder_PromptComposer` | Assembles the final single-paragraph prompt from style_and_mood + dynamic_description + static_description + camera_block + audio_line and feeds CLIPTextEncode. |
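
The 8k+1 frame-count constraint mentioned for CameraBlock can be illustrated with a short sketch. This is not the pack's code: the function name `ltx_frame_count`, the round-to-nearest rule, the 24 fps default, and the `max_frames=97` stand-in for the 12 GB VRAM cap are all assumptions made for illustration.

```python
def ltx_frame_count(runtime_s: float, fps: int = 24, max_frames: int = 97) -> tuple[int, float]:
    """Snap a requested runtime to the nearest frame count of the form 8k + 1.

    Returns (frame_count, runtime_actual). max_frames is a hypothetical
    placeholder for a VRAM-derived cap, not a value taken from the pack.
    """
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    target = runtime_s * fps
    # Nearest k such that 8k + 1 is close to the target, with at least one block.
    k = max(1, round((target - 1) / 8))
    frames = 8 * k + 1
    # Largest 8k + 1 value that still fits under the cap.
    cap = 8 * ((max_frames - 1) // 8) + 1
    frames = min(frames, cap)
    return frames, frames / fps


if __name__ == "__main__":
    for seconds in (0.5, 2.0, 4.0):
        print(seconds, ltx_frame_count(seconds))
```

Because the count is snapped, the actual runtime usually differs slightly from the requested one, which is presumably why the node reports runtime_actual alongside frame_count.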
Typical wiring: CameraBlock + AudioLine → PromptComposer → CLIPTextEncode (positive) → LTX 2.3 sampler.
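
The wiring above can be mimicked outside ComfyUI with two plain functions. This is a hedged sketch, not the pack's implementation: `build_audio_line`, `compose_prompt`, the whole-word matching rule, and the "Audio: ..." phrasing are hypothetical. Only the rejected tokens and the five field names come from the node descriptions.

```python
import re

# Tokens the node table says are rejected; the matching rule is an assumption.
BANNED_AUDIO_TOKENS = ("music", "score", "lyrics")


def build_audio_line(ambient: str, spoken_dialogue: str = "") -> str:
    """Return a diegetic audio line, or raise if a music token appears."""
    combined = f"{ambient} {spoken_dialogue}".lower()
    for token in BANNED_AUDIO_TOKENS:
        # Whole-word match, so "scoreboard" passes but "score" does not.
        if re.search(rf"\b{token}\b", combined):
            raise ValueError(f"non-diegetic audio token rejected: {token!r}")
    line = f"Audio: {ambient.strip().rstrip('.')}."
    if spoken_dialogue.strip():
        line += f' Spoken dialogue: "{spoken_dialogue.strip()}"'
    return line


def compose_prompt(style_and_mood: str, dynamic_description: str,
                   static_description: str, camera_block: str,
                   audio_line: str) -> str:
    """Join the five fields, in the documented order, into one paragraph."""
    parts = [style_and_mood, dynamic_description, static_description,
             camera_block, audio_line]
    # Collapse internal whitespace and newlines so the result is a single line.
    cleaned = [" ".join(part.split()) for part in parts]
    return " ".join(part for part in cleaned if part)


if __name__ == "__main__":
    audio = build_audio_line("rain on a tin roof, distant traffic")
    print(compose_prompt("Moody.", "A man walks.", "A wet alley at night.",
                         "55 mm push-in.", audio))
```

Failing early on a banned token mirrors the stated design goal: a bad prompt should stop the graph before it costs a render.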
Four findings from a controlled 26-pair A/B sweep on LTX 2.3 22B distilled 1.1, evidence in FINDINGS_FOR_LTX.md:
- Equipment vocabulary is decorative: compressing the camera/lens block by ≈48% (180 → 94 words) produces visually indistinguishable output. CLIP image-embedding cosine similarity: 0.967 ± 0.071 paired vs 0.699 ± 0.076 control (≈ 3.5σ effect size). Arri / Master Primes / ND filter indices are not interpreted by the model.
- `static_description` is load-bearing: the camera-vocabulary phrasing applies a grade (palette, motion vocabulary, stage lighting); it does not carry scene content. Strip the scene noun and even rich camera prompts collapse to abstract lens-mush.
- `LTXVScheduler` defaults leave the distilled-1.1 schedule undenoised: output looks soft and grainy. The clean fix is a `ManualSigmas` schedule from the Lightricks 1.1 reference; details in the writeup.
- Different quantizations produce different "looks" (craft note): the dev-fp8 + distill LoRA chain renders modern-digital-cinema sharp/contrasty/saturated; the Q2_K GGUF base renders old-film soft/naturalistic. Not a strict upgrade, so pick by project, not by spec.
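
The headline numbers in the first finding can be checked with a few lines of arithmetic. The writeup is the authority on how the effect size was computed; the sketch below only shows that dividing the mean gap by the control spread reproduces the quoted ≈ 3.5σ, and that a pooled standard deviation lands close by. Both readings are assumptions.

```python
# Figures quoted in the findings list above.
paired_mean, paired_std = 0.967, 0.071
control_mean, control_std = 0.699, 0.076

gap = paired_mean - control_mean

# Reading 1 (assumption): gap measured in units of the control spread.
effect_vs_control = gap / control_std

# Reading 2 (assumption): gap measured against a pooled standard deviation.
pooled_std = ((paired_std ** 2 + control_std ** 2) / 2) ** 0.5
effect_pooled = gap / pooled_std

# Word-count compression of the camera/lens block, 180 -> 94 words.
compression_pct = 100 * (180 - 94) / 180

print(f"gap = {gap:.3f}")
print(f"effect vs control std = {effect_vs_control:.2f} sigma")
print(f"effect vs pooled std  = {effect_pooled:.2f} sigma")
print(f"compression = {compression_pct:.1f}%")
```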

Drop the pack into ComfyUI's custom_nodes folder:

    cd ComfyUI/custom_nodes
    git clone https://github.com/Pro2004-a11/comfyui-cinema-worldbuilder.git

Restart ComfyUI. On load you should see comfyui-cinema-worldbuilder in the console with no IMPORT FAILED. If a workflow says "Installation Required" for the Cinema nodes, hard-refresh the browser tab (Ctrl+F5); the frontend caches its node list at page load.
No third-party Python dependencies. Tests: python -m pytest tests/ -v (20 unit tests, all green).
In example_workflows/ β each pipeline in two formats: *.json (API, for
headless submission) and *_ui.json (graph format, open in the ComfyUI canvas).
| Workflow | Pipeline | Look | Notes |
|---|---|---|---|
| `cinema_ltx23_t2v_hires_fp8.json` | Hi-res t2v + i2v | 🎨 modern digital cinema (sharper, contrasty, richer grade) | Built on the Comfy-Org canonical template: two-stage 540p draft → spatial upscale → 1088p refine. Cinema nodes drive the positive prompt. Uses ltx-2.3-22b-dev-fp8.safetensors + distill LoRA. Output: 1280×704 → 1920×1088 @ 24 fps with audio, ≈ 5 s clip, ≈ 290 s wall time on RTX 4070 Ti. |
| `cinema_ltx23_t2v.json` + `_ui` | Single-stage t2v | 🎞️ old-film naturalistic (softer, less saturated) | Q2_K GGUF base. Faster (≈ 75 s wall), 768×512. Good for sweeps, iteration, and projects wanting a less-graded look. |
| `cinema_ltx23_i2v.json` + `_ui` | First-frame image-to-video | (matches base model) | LoadImage anchor via LTXVAddGuide(frame_idx=0, strength=1.0). CFGGuider's pos/neg pull from AddGuide outputs (the i2v anchor point), not raw LTXVConditioning. |
| `cinema_ltx23_v2v.json` + `_ui` | Video-to-video refine | (matches base model) | Adapted from a known-good warp-refine graph. Useful for upgrading rough drafts. |
Load any *_ui.json in the ComfyUI canvas, edit the Cinema node widgets, hit Queue.
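
For the API-format files, headless submission can look roughly like the sketch below. It assumes a local ComfyUI server on the default address `http://127.0.0.1:8188` and uses ComfyUI's `POST /prompt` route; the helper names `build_prompt_request` and `submit` are made up for this example and are not part of the pack.

```python
import json
import urllib.request


def build_prompt_request(workflow: dict,
                         server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(workflow_path: str, server: str = "http://127.0.0.1:8188") -> dict:
    """Load an API-format workflow file and queue it on a running server."""
    with open(workflow_path, encoding="utf-8") as fh:
        workflow = json.load(fh)
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running ComfyUI instance with the models installed.
    print(submit("example_workflows/cinema_ltx23_t2v.json"))
```

To change Cinema node widgets before submitting, edit the matching node's `inputs` in the loaded dictionary; the node ids differ per workflow file, so look them up in the JSON rather than hard-coding them.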
The recommended hi-res workflow needs:
| File | Folder | Source |
|---|---|---|
| `ltx-2.3-22b-dev-fp8.safetensors` | `models/checkpoints/` | the LTX 2.3 dev base in fp8 (matches the Comfy-Org canonical template) |
| `ltx-2.3-22b-distilled-lora-384-1.1.safetensors` | `models/loras/ltxv/ltx2/` | Lightricks/LTX-2.3 on HuggingFace |
| `ltx-2.3-spatial-upscaler-x2-1.1.safetensors` | `models/latent_upscale_models/` | Lightricks/LTX-2.3 |
| `gemma-3-12b-it-IQ4_XS.gguf` + `ltx-2.3_text_projection_bf16.safetensors` | `models/text_encoders/` | Kijai/LTX2.3_comfy |
| `ltx23_video_vae.safetensors` + `ltx23_audio_vae.safetensors` | `models/vae/` | Kijai/LTX2.3_comfy |
The non-hires cinema_ltx23_t2v.json workflow uses the GGUF chain instead (UnetLoaderGGUF + Q2_K): lighter requirements, included for completeness.
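
Before queueing the hi-res workflow, it can save a failed run to confirm the files from the table are in place. A small sketch, assuming the folder layout in the table is relative to your ComfyUI `models/` directory; `REQUIRED_MODELS` and `missing_models` are illustrative names, not something the pack ships.

```python
from pathlib import Path

# Folder (relative to ComfyUI/models) -> files, copied from the table above.
REQUIRED_MODELS = {
    "checkpoints": ["ltx-2.3-22b-dev-fp8.safetensors"],
    "loras/ltxv/ltx2": ["ltx-2.3-22b-distilled-lora-384-1.1.safetensors"],
    "latent_upscale_models": ["ltx-2.3-spatial-upscaler-x2-1.1.safetensors"],
    "text_encoders": [
        "gemma-3-12b-it-IQ4_XS.gguf",
        "ltx-2.3_text_projection_bf16.safetensors",
    ],
    "vae": ["ltx23_video_vae.safetensors", "ltx23_audio_vae.safetensors"],
}


def missing_models(models_dir: str) -> list[str]:
    """Return the relative paths of required files that are not on disk."""
    root = Path(models_dir)
    return [
        f"{folder}/{name}"
        for folder, names in REQUIRED_MODELS.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_models("ComfyUI/models")
    print("all files present" if not missing else "\n".join(missing))
```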
Each mode emits a canonical camera-vocabulary block: palette, motion vocabulary, stage lighting cue, focal-length suggestion. The grammar is intentionally short (see Finding 1): motion / lighting / lens / palette / DoF.
| Mode | Register | Default lens | Best for |
|---|---|---|---|
| M1 Narrative | Cinematic push-in, moody, character-focused | 55 mm | A lone subject in a place; story-driven shots |
| M2 Studio | Editorial fashion film, glossy, high-key, photoreal skin | 75 mm | Portraits, product, fashion |
| M3 Action | Gritty documentary realism, fast handheld, motion blur | 40 mm | Combat, sports, kinetic scenes |
| M4 Performance | Stage-grade with audience implied, lighting wash, energy | 55 mm | Dancers, athletes, music-video subjects |
| M5 Atmospheric | Environment plate, slow drift, still and quiet, no subject | 35 mm | Establishing shots, empty interiors, mood-only |
⚠️ `static_description` is load-bearing (Finding 2). Camera grammar is a grade layer, not a content carrier. The mode shifts the look; the prose carries the scene. Empty or vague scene descriptions collapse silently.

    # Validate the 28-job matrix without rendering
    python sweep/cinema_sweep_v2.py --check
    # Full sweep (≈ 30 min on RTX 4070 Ti)
    python sweep/cinema_sweep_v2.py
    # Build the 26-pair side-by-side A/B reel
    python sweep/ab_compare.py
    # CLIP-similarity on the 26 pairs (≈ 1 min on GPU)
    python sweep/clip_similarity.py

Outputs land in sweep/results/, sweep/results_v2/, and sweep/results_ab/. The per-clip CLIP-similarity table is written to sweep/results_v2/clip_similarity.json.
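
The metric behind the CLIP-similarity step is ordinary cosine similarity between image embeddings, summarized as mean ± standard deviation over the pairs. The sketch below shows only that arithmetic in pure Python; it does not reproduce sweep/clip_similarity.py, it assumes the embeddings have already been extracted by some CLIP image encoder, and the choice of population standard deviation is a guess.

```python
import math
import statistics


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def summarize(similarities: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of per-pair similarities."""
    return statistics.mean(similarities), statistics.pstdev(similarities)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real CLIP embeddings.
    pairs = [([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
             ([0.0, 1.0, 0.0], [0.0, 0.8, 0.2])]
    sims = [cosine_similarity(a, b) for a, b in pairs]
    print(summarize(sims))
```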

    comfyui-cinema-worldbuilder/
    ├── FINDINGS_FOR_LTX.md    # the writeup: empirical study of LTX 2.3
    ├── cinema_grammar.py      # mode tables, camera blocks, prompt composer (pure functions, no ComfyUI import)
    ├── nodes.py               # the 3 io.ComfyNode adapters
    ├── tests/                 # 20 pytest unit tests
    ├── example_workflows/     # 4 pipelines × API + UI formats
    ├── sweep/                 # methodology: matrix builder, CLIP-sim script, A/B compositor
    └── docs/                  # design spec + implementation plan

Corrections, replications on other LTX 2.3 configurations, and PRs to extend the camera-grammar vocabulary or audio-discipline rules are welcome. Open an issue before substantial changes so we can chat about scope.
If this pack or the writeup informs your work:

    @misc{refaeli2026cinemaworldbuilder,
      author = {Refaeli, Yosi},
      title  = {Cinema Worldbuilder: ComfyUI nodes and prompt-sensitivity study for LTX 2.3},
      year   = {2026},
      url    = {https://github.com/Pro2004-a11/comfyui-cinema-worldbuilder}
    }

Feedback welcome on LinkedIn or via GitHub issues.
Made with discipline by a senior technical artist. 🎬
