
Pro2004-a11/comfyui-cinema-worldbuilder


🎬 Cinema Worldbuilder

ComfyUI custom-node pack + empirical prompt-sensitivity study for LTX 2.3 video generation

License: MIT

Three ComfyUI nodes (built on the V3 node schema) that compose a disciplined cinematic prompt (five camera modes × canonical lens recipes × diegetic-audio guardrails) and drive LTX 2.3 22B distilled video generation. The pack also ships the controlled-sweep methodology, the three LTX 2.3 prompt-sensitivity findings it uncovered (plus a craft note on quantization), and ready-to-run example workflows for t2v, i2v, v2v, and hi-res 1080p production.


🎥 Demo

Cinema Worldbuilder: Q2_K GGUF vs FP8 dev + distill LoRA A/B reel

▶ Watch the full A/B reel with audio (1:47, 45 MB), or browse all release assets.

Q2_K GGUF (left) vs FP8 dev + distill LoRA (right): same Cinema Worldbuilder prompts, two LTX 2.3 model variants. The full reel has 2 scenes × 5 camera modes = 10 paired clips; the looping highlight shows the M3 Action and M4 Performance cells.

🎨 FP8 + distill LoRA: sharper, higher contrast, richer color grade. Reads like a modern digital cinema camera. Strong for advertising, music video, fashion, and hero shots.

🎞️ Q2_K GGUF: less saturated, softer highlight roll-off, slightly milky. An old-film, naturalistic feel. Surprisingly good for documentary, drama, and vérité styles.

Same model family, different quantization and loading chain: pick the one that matches your project, not just the "best quality" label.


✨ What it is

The pack ships three ComfyUI nodes under the Cinema Worldbuilder category:

| Node | What it does |
|---|---|
| 🎥 CinemaWorldbuilder_CameraBlock | Picks a film mode (M1 Narrative / M2 Studio / M3 Action / M4 Performance / M5 Atmospheric), a lens (35–85 mm), and a runtime (0.5–4.0 s), and emits a canonical camera-vocabulary block. Outputs (camera_block, frame_count, fps, runtime_actual); the frame count is LTX-valid (8k+1) and respects a 12 GB VRAM cap. |
| 🔊 CinemaWorldbuilder_AudioLine | Builds the diegetic audio line. Music is rejected at validation: any music/score/lyrics token fails the graph, so bad prompts never waste a render. Optional spoken_dialogue clause. |
| 📝 CinemaWorldbuilder_PromptComposer | Assembles the final single-paragraph prompt from style_and_mood + dynamic_description + static_description + camera_block + audio_line and feeds CLIPTextEncode. |

Typical wiring: CameraBlock + AudioLine → PromptComposer → CLIPTextEncode (positive) → LTX 2.3 sampler.
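The same wiring can be expressed as a fragment of ComfyUI's API (prompt) format, where each node is keyed by an id and each link is a [source_node_id, output_index] pair. This is a sketch only: the Cinema node input names, the node ids, and the prompt text are hypothetical placeholders, so copy the real input names from example_workflows/*.json.

```python
import json


def cinema_prompt_fragment(clip_ref: list) -> dict:
    """Build the Cinema portion of an API-format graph.

    clip_ref is a link such as ["4", 0] to whatever node loads the text encoder.
    All Cinema input names and widget values below are hypothetical.
    """
    return {
        "10": {"class_type": "CinemaWorldbuilder_CameraBlock",
               "inputs": {"mode": "M1 Narrative", "lens_mm": 55, "runtime_s": 2.0}},
        "11": {"class_type": "CinemaWorldbuilder_AudioLine",
               "inputs": {"ambience": "rain on a tin roof"}},
        "12": {"class_type": "CinemaWorldbuilder_PromptComposer",
               "inputs": {
                   "style_and_mood": "moody, low-key",
                   "dynamic_description": "a courier shakes water from her coat",
                   "static_description": "a narrow tiled stairwell at night",
                   "camera_block": ["10", 0],  # CameraBlock output 0 is camera_block
                   "audio_line": ["11", 0],
               }},
        # Positive prompt: the composed paragraph becomes CLIPTextEncode's text.
        "13": {"class_type": "CLIPTextEncode",
               "inputs": {"text": ["12", 0], "clip": clip_ref}},
    }


if __name__ == "__main__":
    print(json.dumps(cinema_prompt_fragment(["4", 0]), indent=2))
```

In a full graph, CameraBlock's frame_count (output 1) is the natural source for the latent's length input, so the prompt and the clip duration stay in sync.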

🔬 What we learned (the headline)

Three prompt-sensitivity findings and one craft note, from a controlled 26-pair A/B sweep on LTX 2.3 22B distilled 1.1 (evidence in FINDINGS_FOR_LTX.md):

  1. Equipment vocabulary is decorative. Compressing the camera/lens block by 48% (180 → 94 words) produces visually indistinguishable output: CLIP image-embedding cosine similarity is 0.967 ± 0.071 for paired clips vs 0.699 ± 0.076 for the control (≈ 3.5σ effect size). Camera-body, lens-series, and filter terms (Arri, Master Primes, ND filter indices) had no measurable effect on the output.
  2. static_description is load-bearing. The camera-vocabulary phrasing applies a grade (palette, motion vocabulary, stage lighting); it does not carry scene content. Strip the scene noun and even rich camera prompts collapse into abstract lens-mush.
  3. LTXVScheduler defaults leave distilled-1.1 output under-denoised, so it looks soft and grainy. The clean fix is a ManualSigmas schedule taken from the Lightricks 1.1 reference; details are in the writeup.
  4. Different quantizations produce different looks (craft note). The dev-fp8 + distill LoRA chain renders sharp, contrasty, and saturated, like a modern digital cinema camera; the Q2_K GGUF base renders soft and naturalistic, like old film. Neither is a strict upgrade: pick by project, not by spec.
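Finding 1's effect size can be checked directly from the reported means and standard deviations. A short sketch, assuming "σ" means the mean gap divided by the control group's standard deviation (Glass's Δ); the writeup may define it differently, so a pooled-SD Cohen's d (assuming equal group sizes) is computed alongside for comparison:

```python
import math

paired_mean, paired_sd = 0.967, 0.071
control_mean, control_sd = 0.699, 0.076

gap = paired_mean - control_mean  # about 0.268

# Glass's delta: scale the gap by the control group's spread only.
glass_delta = gap / control_sd

# Cohen's d with an equal-weight pooled SD (assumes both groups are the same size).
pooled_sd = math.sqrt((paired_sd ** 2 + control_sd ** 2) / 2)
cohens_d = gap / pooled_sd

print(f"Glass's delta = {glass_delta:.2f}, Cohen's d = {cohens_d:.2f}")
```

This prints Glass's delta = 3.53, Cohen's d = 3.64: the control-SD version matches the ≈ 3.5σ quoted in Finding 1, and the pooled version is slightly larger.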

📦 Install

```bash
# Drop into ComfyUI's custom_nodes folder
cd ComfyUI/custom_nodes
git clone https://github.com/Pro2004-a11/comfyui-cinema-worldbuilder.git
```

Restart ComfyUI. On load you should see comfyui-cinema-worldbuilder in the console with no IMPORT FAILED message. If a workflow reports "Installation Required" for the Cinema nodes, hard-refresh the browser tab (Ctrl+F5); the frontend caches its node list at page load.

No third-party Python dependencies. Tests: python -m pytest tests/ -v (20 unit tests, all green).

πŸŽ›οΈ Example workflows

example_workflows/ holds each pipeline in two formats: *.json (API format, for headless submission) and *_ui.json (graph format, for opening in the ComfyUI canvas).

| Workflow | Pipeline | Look | Notes |
|---|---|---|---|
| cinema_ltx23_t2v_hires_fp8.json | Hi-res t2v + i2v | 🎨 modern digital cinema (sharper, contrasty, richer grade) | Built on the Comfy-Org canonical template: two-stage 540p draft → spatial upscale → 1088p refine. Cinema nodes drive the positive prompt. Uses ltx-2.3-22b-dev-fp8.safetensors + distill LoRA. Output: 1280×704 to 1920×1088 @ 24 fps with audio, ≈ 5 s clip, ≈ 290 s wall time on an RTX 4070 Ti. |
| cinema_ltx23_t2v.json + _ui | Single-stage t2v | 🎞️ old-film naturalistic (softer, less saturated) | Q2_K GGUF base. Faster (≈ 75 s wall time), 768×512. Good for sweeps, iteration, and projects that want a less-graded look. |
| cinema_ltx23_i2v.json + _ui | First-frame image-to-video | (matches base model) | LoadImage anchor via LTXVAddGuide(frame_idx=0, strength=1.0). CFGGuider's positive/negative inputs come from the AddGuide outputs (the i2v anchor point), not from raw LTXVConditioning. |
| cinema_ltx23_v2v.json + _ui | Video-to-video refine | (matches base model) | Adapted from a known-good warp-refine graph. Useful for upgrading rough drafts. |

Load any *_ui.json in the ComfyUI canvas, edit the Cinema node widgets, and hit Queue.
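The API-format *.json files can also be queued headlessly through ComfyUI's HTTP server. A minimal standard-library sketch, assuming a local server on the default port 8188; the helper names are hypothetical, and any node id and input name you pass to set_input must be looked up in the actual workflow file:

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

COMFY_URL = "http://127.0.0.1:8188/prompt"  # assumption: default local ComfyUI server


def load_workflow(path: str) -> dict:
    """Read an API-format workflow (node id -> {class_type, inputs})."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def set_input(workflow: dict, node_id: str, name: str, value) -> dict:
    """Override one widget value in place; fails loudly on a wrong input name."""
    inputs = workflow[node_id]["inputs"]
    if name not in inputs:
        raise KeyError(f"node {node_id} has no input {name!r}")
    inputs[name] = value
    return workflow


def build_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap the graph in the body the /prompt endpoint expects."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> str:
    """POST the graph to ComfyUI and return the prompt_id it assigns."""
    req = urllib.request.Request(url, data=build_payload(workflow),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]
```

Typical use: wf = load_workflow("example_workflows/cinema_ltx23_t2v.json"), optionally set_input(...) on a Cinema node, then queue_prompt(wf). The server returns a prompt_id immediately; the render itself runs in ComfyUI's queue.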

Required models

The recommended hi-res workflow needs:

| File | Folder | Source |
|---|---|---|
| ltx-2.3-22b-dev-fp8.safetensors | models/checkpoints/ | the LTX 2.3 dev base in fp8 (matches the Comfy-Org canonical template) |
| ltx-2.3-22b-distilled-lora-384-1.1.safetensors | models/loras/ltxv/ltx2/ | Lightricks/LTX-2.3 on Hugging Face |
| ltx-2.3-spatial-upscaler-x2-1.1.safetensors | models/latent_upscale_models/ | Lightricks/LTX-2.3 |
| gemma-3-12b-it-IQ4_XS.gguf + ltx-2.3_text_projection_bf16.safetensors | models/text_encoders/ | Kijai/LTX2.3_comfy |
| ltx23_video_vae.safetensors + ltx23_audio_vae.safetensors | models/vae/ | Kijai/LTX2.3_comfy |
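Before opening the hi-res workflow, a quick pre-flight check can catch misplaced model files. A small sketch, assuming the default models/ layout under the ComfyUI root (setups that relocate models with extra_model_paths.yaml need their own paths); the helper name is hypothetical, and the file/folder pairs are copied from the table above:

```python
from pathlib import Path
from typing import List

# (filename, folder) pairs from the required-models table, relative to the ComfyUI root.
REQUIRED_MODELS = [
    ("ltx-2.3-22b-dev-fp8.safetensors", "models/checkpoints"),
    ("ltx-2.3-22b-distilled-lora-384-1.1.safetensors", "models/loras/ltxv/ltx2"),
    ("ltx-2.3-spatial-upscaler-x2-1.1.safetensors", "models/latent_upscale_models"),
    ("gemma-3-12b-it-IQ4_XS.gguf", "models/text_encoders"),
    ("ltx-2.3_text_projection_bf16.safetensors", "models/text_encoders"),
    ("ltx23_video_vae.safetensors", "models/vae"),
    ("ltx23_audio_vae.safetensors", "models/vae"),
]


def missing_models(comfy_root: str) -> List[str]:
    """Return the relative paths of required files that are not present."""
    root = Path(comfy_root)
    return [f"{folder}/{name}" for name, folder in REQUIRED_MODELS
            if not (root / folder / name).is_file()]
```

An empty list from missing_models("ComfyUI") means every file is where the workflow expects it; anything else lists exactly what to download or move.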

The single-stage cinema_ltx23_t2v.json workflow uses the GGUF chain instead (UnetLoaderGGUF + Q2_K); its requirements are lighter, and it is included for completeness.

🎬 The five camera modes

Each mode emits a canonical camera-vocabulary block: palette, motion vocabulary, stage-lighting cue, and focal-length suggestion. The grammar is deliberately short (see Finding 1), covering motion, lighting, lens, palette, and depth of field (DoF).

| Mode | Register | Default lens | Best for |
|---|---|---|---|
| M1 Narrative | Cinematic push-in, moody, character-focused | 55 mm | A lone subject in a place; story-driven shots |
| M2 Studio | Editorial fashion film, glossy, high-key, photoreal skin | 75 mm | Portraits, product, fashion |
| M3 Action | Gritty documentary realism, fast handheld, motion blur | 40 mm | Combat, sports, kinetic scenes |
| M4 Performance | Stage-grade with audience implied, lighting wash, energy | 55 mm | Dancers, athletes, music-video subjects |
| M5 Atmospheric | Environment plate, slow drift, still and quiet, no subject | 35 mm | Establishing shots, empty interiors, mood-only |
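The mode table maps naturally onto a small lookup plus a formatter. A sketch under stated assumptions: the register strings and default lenses are copied from the table, while the output phrasing, the lens-range check, and the names CameraMode and camera_block are hypothetical rather than the exact text cinema_grammar.py emits:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CameraMode:
    name: str
    register: str
    default_lens_mm: int


MODES = {
    "M1": CameraMode("Narrative", "cinematic push-in, moody, character-focused", 55),
    "M2": CameraMode("Studio", "editorial fashion film, glossy, high-key, photoreal skin", 75),
    "M3": CameraMode("Action", "gritty documentary realism, fast handheld, motion blur", 40),
    "M4": CameraMode("Performance", "stage-grade with audience implied, lighting wash, energy", 55),
    "M5": CameraMode("Atmospheric", "environment plate, slow drift, still and quiet, no subject", 35),
}

LENS_RANGE_MM = (35, 85)  # the CameraBlock node's lens range


def camera_block(mode_key: str, lens_mm: Optional[int] = None) -> str:
    """Emit a short camera-vocabulary block: register first, then the lens."""
    mode = MODES[mode_key]
    lens = mode.default_lens_mm if lens_mm is None else lens_mm
    lo, hi = LENS_RANGE_MM
    if not lo <= lens <= hi:
        raise ValueError(f"lens {lens} mm outside {lo}-{hi} mm")
    return f"{mode.register.capitalize()}, shot on a {lens} mm lens."
```

For example, camera_block("M2") yields "Editorial fashion film, glossy, high-key, photoreal skin, shot on a 75 mm lens." Keeping the block this terse follows Finding 1: camera-body or filter names would lengthen the prompt without changing the output.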

⚠️ static_description is load-bearing (Finding 2). Camera grammar is a grade layer, not a content carrier. The mode shifts the look; the prose carries the scene. Empty or vague scene descriptions collapse silently.
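Because an empty scene description fails silently rather than loudly, a composer can refuse to build the prompt instead of wasting a render. A minimal sketch, assuming the five fields are joined in the PromptComposer order with single spaces; the function name and the three-word minimum are hypothetical heuristics, not the pack's PromptComposer logic:

```python
import re

# Field order from the PromptComposer description.
FIELD_ORDER = ("style_and_mood", "dynamic_description", "static_description",
               "camera_block", "audio_line")

MIN_SCENE_WORDS = 3  # hypothetical threshold for a "too vague" scene


def compose_prompt(**fields: str) -> str:
    """Join the five fields into one paragraph, guarding the scene description."""
    unknown = set(fields) - set(FIELD_ORDER)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    scene = fields.get("static_description", "").strip()
    if len(scene.split()) < MIN_SCENE_WORDS:
        raise ValueError("static_description is empty or too vague to carry the scene")
    parts = (fields.get(name, "").strip() for name in FIELD_ORDER)
    # Skip empty fields, then collapse newlines and runs of spaces into single spaces.
    return re.sub(r"\s+", " ", " ".join(p for p in parts if p))
```

A word count is a crude proxy for "vague", but it catches the empty and one-noun cases that Finding 2 shows collapsing into abstract output.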

🧪 Reproduce the sweep

```bash
# Validate the 28-job matrix without rendering
python sweep/cinema_sweep_v2.py --check

# Full sweep (≈ 30 min on an RTX 4070 Ti)
python sweep/cinema_sweep_v2.py

# Build the 26-pair side-by-side A/B reel
python sweep/ab_compare.py

# CLIP similarity on the 26 pairs (≈ 1 min on GPU)
python sweep/clip_similarity.py
```

Outputs land in sweep/results/, sweep/results_v2/, and sweep/results_ab/. The per-clip CLIP-similarity table is written to sweep/results_v2/clip_similarity.json.
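The paired-vs-control comparison behind Finding 1 reduces to cosine similarity over embedding pairs. A standard-library sketch of the statistic, assuming one embedding per clip and a control built by pairing each clip with a different sweep cell (a rotated pairing); the real control design, embedding model, and JSON layout are defined by sweep/clip_similarity.py and the writeup, not by this sketch:

```python
import math
import statistics
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def paired_vs_control(full: list, compressed: list) -> dict:
    """Mean and SD of cosine similarity for matched pairs vs mismatched (control) pairs.

    full[i] and compressed[i] embed the same cell rendered with the long and the
    compressed camera block. The control rotates the list by one so each clip is
    compared with a different cell.
    """
    if len(full) != len(compressed) or len(full) < 2:
        raise ValueError("need two equal-length lists with at least 2 clips")
    paired = [cosine(a, b) for a, b in zip(full, compressed)]
    rotated = compressed[1:] + compressed[:1]
    control = [cosine(a, b) for a, b in zip(full, rotated)]
    return {
        "paired": (statistics.mean(paired), statistics.stdev(paired)),
        "control": (statistics.mean(control), statistics.stdev(control)),
    }
```

If the compressed block really is decorative, the paired mean sits near 1 and well above the control mean, which is the pattern reported in Finding 1.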

πŸ—‚οΈ Repo layout

```text
comfyui-cinema-worldbuilder/
├── 📄 FINDINGS_FOR_LTX.md      # the writeup: empirical study of LTX 2.3
├── 🧩 cinema_grammar.py         # mode tables, camera blocks, prompt composer (pure functions, no ComfyUI import)
├── 🧩 nodes.py                  # the 3 io.ComfyNode adapters
├── 🧪 tests/                    # 20 pytest unit tests
├── 🎛️ example_workflows/        # 4 pipelines × API + UI formats
├── 🔬 sweep/                    # methodology: matrix builder, CLIP-sim script, A/B compositor
└── 📚 docs/                     # design spec + implementation plan
```

🤝 Contributing

Corrections, replications on other LTX 2.3 configurations, and PRs to extend the camera-grammar vocabulary or audio-discipline rules are welcome. Open an issue before substantial changes so we can chat about scope.

📜 Citation

If this pack or the writeup informs your work:

```bibtex
@misc{refaeli2026cinemaworldbuilder,
  author = {Refaeli, Yosi},
  title  = {Cinema Worldbuilder: ComfyUI nodes and prompt-sensitivity study for LTX 2.3},
  year   = {2026},
  url    = {https://github.com/Pro2004-a11/comfyui-cinema-worldbuilder}
}
```

📬 Contact

Feedback welcome on LinkedIn or via GitHub issues.


Made with discipline by a senior technical artist. 🎬
