Commit b02e3ff

localai-bot and mudler authored
feat(stablediffusion-ggml): LTX-2 support + LTX-2.3 GGUF gallery entries (#9980)
stable-diffusion.cpp gained LTX-2 video generation, which requires an audio VAE and an embeddings_connectors safetensors in addition to the usual diffusion model, VAE, and LLM text encoder. The pinned commit exposes audio_vae_path and embeddings_connectors_path on sd_ctx_params_t; wire both through the option parser so gallery entries can point at the LTX-specific assets.

Ship six LTX-2.3 GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0 each) backed by a new ltx-ggml.yaml template that defaults to euler / cfg_scale 6.0 / vae_decode_only:false / diffusion_flash_attn / offload_params_to_cpu, matching the upstream LTX-2 CLI recipe. Each entry pulls the model GGUF plus the QAT gemma-3-12b-it text encoder, video VAE, audio VAE, and embeddings connectors needed for T2V / I2V / FLF2V.

Assisted-by: Claude:claude-opus-4-7 [Claude-Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
1 parent a891eed commit b02e3ff

3 files changed

Lines changed: 265 additions & 0 deletions

backend/go/stablediffusion-ggml/cpp/gosd.cpp

Lines changed: 10 additions & 0 deletions
@@ -376,6 +376,8 @@ int load_model(const char *model, char *model_path, char* options[], int threads
     const char *clip_g_path = "";
     const char *t5xxl_path = "";
     const char *vae_path = "";
+    const char *audio_vae_path = "";
+    const char *embeddings_connectors_path = "";
     const char *scheduler_str = "";
     const char *sampler = "";
     const char *clip_vision_path = "";
@@ -431,6 +433,12 @@ int load_model(const char *model, char *model_path, char* options[], int threads
         if (!strcmp(optname, "vae_path")) {
             vae_path = strdup(optval);
         }
+        if (!strcmp(optname, "audio_vae_path")) {
+            audio_vae_path = strdup(optval);
+        }
+        if (!strcmp(optname, "embeddings_connectors_path")) {
+            embeddings_connectors_path = strdup(optval);
+        }
         if (!strcmp(optname, "scheduler")) {
             scheduler_str = optval;
         }
@@ -563,6 +571,8 @@ int load_model(const char *model, char *model_path, char* options[], int threads
     ctx_params.diffusion_model_path = diffusion_model_path;
     ctx_params.high_noise_diffusion_model_path = high_noise_diffusion_model_path;
     ctx_params.vae_path = vae_path;
+    ctx_params.audio_vae_path = audio_vae_path;
+    ctx_params.embeddings_connectors_path = embeddings_connectors_path;
     ctx_params.taesd_path = taesd_path;
     ctx_params.control_net_path = control_net_path;
     if (lora_dir && strlen(lora_dir) > 0) {
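
The hunks above route two new "name:value" options into the context parameters. As a rough, standalone illustration of that routing (not the backend's actual code), the sketch below splits each option at the first colon and keeps only the two keys this commit adds. The `LtxAssetPaths` struct and `parse_ltx_asset_options` function are hypothetical names introduced for the example, and it uses `std::string` instead of the `strdup`-based C strings in gosd.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the two LTX-2 asset paths. The real backend keeps
// them in local variables and copies them into sd_ctx_params_t.
struct LtxAssetPaths {
    std::string audio_vae_path;
    std::string embeddings_connectors_path;
};

// Split each "name:value" option at the first ':' and keep the two keys
// added by this commit. Other keys and options without a value are ignored.
LtxAssetPaths parse_ltx_asset_options(const std::vector<std::string>& options) {
    LtxAssetPaths paths;
    for (const std::string& opt : options) {
        const std::size_t colon = opt.find(':');
        if (colon == std::string::npos) {
            continue;  // bare option such as "diffusion_model"
        }
        const std::string name = opt.substr(0, colon);
        const std::string value = opt.substr(colon + 1);
        if (name == "audio_vae_path") {
            paths.audio_vae_path = value;
        } else if (name == "embeddings_connectors_path") {
            paths.embeddings_connectors_path = value;
        }
    }
    return paths;
}
```

Both fields stay empty when an option list does not mention them, which mirrors the empty-string defaults in the first hunk.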

gallery/index.yaml

Lines changed: 240 additions & 0 deletions
@@ -30805,6 +30805,246 @@
       - torch_dtype:bf16
     parameters:
       model: Lightricks/LTX-2.3
+- &ltx-2-3-dev-ggml
+  name: ltx-2.3-22b-dev-ggml
+  url: github:mudler/LocalAI/gallery/ltx-ggml.yaml@master
+  urls:
+    - https://huggingface.co/Lightricks/LTX-2.3
+    - https://huggingface.co/unsloth/LTX-2.3-GGUF
+    - https://huggingface.co/unsloth/gemma-3-12b-it-qat-GGUF
+  description: |
+    LTX-2.3 22B dev - DiT-based audio-video foundation model from Lightricks,
+    GGUF-quantized for the stable-diffusion.cpp backend. Generates synchronized
+    video and audio from a text prompt (T2V), a reference image (I2V), or
+    first/last frame pairs (FLF2V). Uses gemma-3-12b-it as the text encoder
+    and ships dedicated video and audio VAEs plus an embeddings_connectors
+    safetensors that bridges the LLM hidden states to the diffusion model.
+
+    This entry uses the dynamic (UD) Q4_K_M quantization of the 22B model
+    (~16 GB) paired with the UD-Q4_K_XL QAT Gemma encoder (~7.4 GB).
+    Recommended generation: width=1280, height=720, video_frames=33,
+    fps=24, sampler=euler, cfg_scale=6.0.
+  license: ltx-2-community-license-agreement
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/1652783139615-628375426db5127097cf5442.png
+  tags:
+    - ltx
+    - ltx-2
+    - text-to-video
+    - image-to-video
+    - first-last-frame-to-video
+    - audio-video
+    - video-generation
+    - diffusion
+    - gguf
+    - quantized
+    - 22b
+    - cpu
+    - gpu
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-dev-UD-Q4_K_M.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-dev_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-dev_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-dev_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-dev-UD-Q4_K_M.gguf
+      sha256: a6983fcf16cda13ec6dc22711dae47fa7cf160204d5a3b42b0c09d1f13fc853b
+      uri: huggingface://unsloth/LTX-2.3-GGUF/ltx-2.3-22b-dev-UD-Q4_K_M.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-dev_video_vae.safetensors
+      sha256: 8732bb70cf4343541815f45c9f90f5ff0519d679bd63483afc27bf79a08d3f4e
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_video_vae.safetensors
+    - filename: ltx-2.3-22b-dev_audio_vae.safetensors
+      sha256: d7711812d9387ce940c2cd5d65a4f5a1e57bf6087cf618d89b56dd3c722c4dea
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_audio_vae.safetensors
+    - filename: ltx-2.3-22b-dev_embeddings_connectors.safetensors
+      sha256: a5c5148788d8d9d5d1e650e4cbf3502a46a2f7f975ce70c59082732c8905a8ae
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors
+- !!merge <<: *ltx-2-3-dev-ggml
+  name: ltx-2.3-22b-dev-ggml-q4_k_m
+  description: |
+    LTX-2.3 22B dev - non-dynamic Q4_K_M quantization (~14.3 GB). Same
+    pipeline as ltx-2.3-22b-dev-ggml but with the plain Q4_K_M weights
+    instead of the dynamic UD-Q4_K_M variant. Slightly smaller and slightly
+    lower quality.
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-dev-Q4_K_M.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-dev_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-dev_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-dev_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-dev-Q4_K_M.gguf
+      sha256: e053e3d7827f3a69ecd00e55395d3a8f8616ab10d3a394e8d2b65ae204d490e0
+      uri: huggingface://unsloth/LTX-2.3-GGUF/ltx-2.3-22b-dev-Q4_K_M.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-dev_video_vae.safetensors
+      sha256: 8732bb70cf4343541815f45c9f90f5ff0519d679bd63483afc27bf79a08d3f4e
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_video_vae.safetensors
+    - filename: ltx-2.3-22b-dev_audio_vae.safetensors
+      sha256: d7711812d9387ce940c2cd5d65a4f5a1e57bf6087cf618d89b56dd3c722c4dea
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_audio_vae.safetensors
+    - filename: ltx-2.3-22b-dev_embeddings_connectors.safetensors
+      sha256: a5c5148788d8d9d5d1e650e4cbf3502a46a2f7f975ce70c59082732c8905a8ae
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors
+- !!merge <<: *ltx-2-3-dev-ggml
+  name: ltx-2.3-22b-dev-ggml-q8_0
+  description: |
+    LTX-2.3 22B dev - Q8_0 quantization (~22.8 GB). Highest-quality
+    quantized dev variant on the cpp backend; needs roughly twice the
+    VRAM/RAM of the Q4 entries but produces noticeably cleaner audio
+    and motion. Paired with the QAT Gemma-3 12B encoder.
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-dev-Q8_0.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-dev_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-dev_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-dev_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-dev-Q8_0.gguf
+      sha256: c4e78967e6c6824864e81e8a9ac182dcd5d06cccfea937347484f4258ab6145c
+      uri: huggingface://unsloth/LTX-2.3-GGUF/ltx-2.3-22b-dev-Q8_0.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-dev_video_vae.safetensors
+      sha256: 8732bb70cf4343541815f45c9f90f5ff0519d679bd63483afc27bf79a08d3f4e
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_video_vae.safetensors
+    - filename: ltx-2.3-22b-dev_audio_vae.safetensors
+      sha256: d7711812d9387ce940c2cd5d65a4f5a1e57bf6087cf618d89b56dd3c722c4dea
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-dev_audio_vae.safetensors
+    - filename: ltx-2.3-22b-dev_embeddings_connectors.safetensors
+      sha256: a5c5148788d8d9d5d1e650e4cbf3502a46a2f7f975ce70c59082732c8905a8ae
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors
+- &ltx-2-3-distilled-ggml
+  name: ltx-2.3-22b-distilled-ggml
+  url: github:mudler/LocalAI/gallery/ltx-ggml.yaml@master
+  urls:
+    - https://huggingface.co/Lightricks/LTX-2.3
+    - https://huggingface.co/unsloth/LTX-2.3-GGUF
+    - https://huggingface.co/unsloth/gemma-3-12b-it-qat-GGUF
+  description: |
+    LTX-2.3 22B distilled - faster student of the dev model, GGUF-quantized
+    for the stable-diffusion.cpp backend. Trades a small amount of quality
+    for substantially fewer sampling steps, making it the right pick for
+    iterative previews and CPU-offloaded inference. Same input modalities
+    as the dev entry (T2V / I2V / FLF2V) and the same gemma-3-12b-it text
+    encoder.
+
+    This entry uses the dynamic (UD) Q4_K_M quantization of the 22B
+    distilled model (~16.3 GB). Recommended generation: width=1280,
+    height=720, video_frames=33, fps=24, sampler=euler, cfg_scale=6.0.
+  license: ltx-2-community-license-agreement
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/1652783139615-628375426db5127097cf5442.png
+  tags:
+    - ltx
+    - ltx-2
+    - distilled
+    - text-to-video
+    - image-to-video
+    - first-last-frame-to-video
+    - audio-video
+    - video-generation
+    - diffusion
+    - gguf
+    - quantized
+    - 22b
+    - cpu
+    - gpu
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-distilled-UD-Q4_K_M.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-distilled_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-distilled_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-distilled-UD-Q4_K_M.gguf
+      sha256: 451ef931569f084c69743d1917096b149eb489517ec0e1de76eaadeb4dbbc9bf
+      uri: huggingface://unsloth/LTX-2.3-GGUF/distilled/ltx-2.3-22b-distilled-UD-Q4_K_M.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-distilled_video_vae.safetensors
+      sha256: e68d6d8f8a42942ac9b862cc315beb3bc30805a8876c7ad63ba5bf7a2b8e168a
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_video_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_audio_vae.safetensors
+      sha256: 3cd6a6eb8cb28f5ecc12f1f3126952b2a3d2b0b42ad3270e63cefafafe0d9b57
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_audio_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+      sha256: c61cbb396e2a8175d8b2da51f0fdac885a4ccd22c9f64dafa5aa2c455dc8a507
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+- !!merge <<: *ltx-2-3-distilled-ggml
+  name: ltx-2.3-22b-distilled-ggml-q4_k_m
+  description: |
+    LTX-2.3 22B distilled - non-dynamic Q4_K_M quantization (~14.3 GB).
+    Same pipeline as ltx-2.3-22b-distilled-ggml but with the plain Q4_K_M
+    weights instead of the dynamic UD-Q4_K_M variant.
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-distilled-Q4_K_M.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-distilled_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-distilled_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-distilled-Q4_K_M.gguf
+      sha256: 4e4459bee04199bf93187ba385729f6b7d8e874d754b72d26e751fe2066f4358
+      uri: huggingface://unsloth/LTX-2.3-GGUF/distilled/ltx-2.3-22b-distilled-Q4_K_M.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-distilled_video_vae.safetensors
+      sha256: e68d6d8f8a42942ac9b862cc315beb3bc30805a8876c7ad63ba5bf7a2b8e168a
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_video_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_audio_vae.safetensors
+      sha256: 3cd6a6eb8cb28f5ecc12f1f3126952b2a3d2b0b42ad3270e63cefafafe0d9b57
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_audio_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+      sha256: c61cbb396e2a8175d8b2da51f0fdac885a4ccd22c9f64dafa5aa2c455dc8a507
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+- !!merge <<: *ltx-2-3-distilled-ggml
+  name: ltx-2.3-22b-distilled-ggml-q8_0
+  description: |
+    LTX-2.3 22B distilled - Q8_0 quantization (~22.8 GB). Highest-quality
+    distilled variant on the cpp backend; useful when you want the
+    distilled sampling cost but the cleanest possible output.
+  overrides:
+    parameters:
+      model: ltx-2.3-22b-distilled-Q8_0.gguf
+    options:
+      - llm_path:gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      - vae_path:ltx-2.3-22b-distilled_video_vae.safetensors
+      - audio_vae_path:ltx-2.3-22b-distilled_audio_vae.safetensors
+      - embeddings_connectors_path:ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+  files:
+    - filename: ltx-2.3-22b-distilled-Q8_0.gguf
+      sha256: ed3be27373771404ed59239e8c2686fb6f8d3cd6a1db7f257d811c8d1a381ef8
+      uri: huggingface://unsloth/LTX-2.3-GGUF/distilled/ltx-2.3-22b-distilled-Q8_0.gguf
+    - filename: gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+      sha256: da98f81c86916ed1c76b3eeda56b25cb7b8352b01093e2edb8028110fe2cb53b
+      uri: huggingface://unsloth/gemma-3-12b-it-qat-GGUF/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
+    - filename: ltx-2.3-22b-distilled_video_vae.safetensors
+      sha256: e68d6d8f8a42942ac9b862cc315beb3bc30805a8876c7ad63ba5bf7a2b8e168a
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_video_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_audio_vae.safetensors
+      sha256: 3cd6a6eb8cb28f5ecc12f1f3126952b2a3d2b0b42ad3270e63cefafafe0d9b57
+      uri: huggingface://unsloth/LTX-2.3-GGUF/vae/ltx-2.3-22b-distilled_audio_vae.safetensors
+    - filename: ltx-2.3-22b-distilled_embeddings_connectors.safetensors
+      sha256: c61cbb396e2a8175d8b2da51f0fdac885a4ccd22c9f64dafa5aa2c455dc8a507
+      uri: huggingface://unsloth/LTX-2.3-GGUF/text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors
 - name: deepseek-v4-flash-q2
   description: |
     DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) - only loadable via the ds4 backend.
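
The entry descriptions quote approximate file sizes and a recommended frame count and frame rate. The small sketch below only does the arithmetic those numbers imply: the clip length for video_frames=33 at fps=24, and the combined size of the model GGUF plus the Gemma encoder. The helper names `clip_seconds` and `total_gb` are made up for this example, and the totals are lower bounds because the descriptions do not state the sizes of the VAEs or the embeddings connectors.

```cpp
#include <vector>

// Clip length in seconds for a given frame count and frame rate.
double clip_seconds(int video_frames, int fps) {
    return static_cast<double>(video_frames) / fps;
}

// Sum of approximate file sizes, in GB, as quoted in the entry descriptions.
double total_gb(const std::vector<double>& sizes_gb) {
    double sum = 0.0;
    for (double s : sizes_gb) {
        sum += s;
    }
    return sum;
}

// clip_seconds(33, 24) is 1.375, so the recommended settings give a clip
// of a little under a second and a half.
// total_gb({16.0, 7.4}) is about 23.4 (dev UD-Q4_K_M plus encoder).
// total_gb({22.8, 7.4}) is about 30.2 (Q8_0 plus encoder).
```

By these figures the Q8_0 weights are roughly 1.4 to 1.6 times the size of the Q4 weights, before counting the shared encoder.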

gallery/ltx-ggml.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+name: "ltx-ggml"
+
+config_file: |
+  backend: stablediffusion-ggml
+  step: 30
+  cfg_scale: 6.0
+  known_usecases:
+    - video
+  options:
+    - "diffusion_model"
+    - "sampler:euler"
+    - "vae_decode_only:false"
+    - "diffusion_flash_attn:true"
+    - "offload_params_to_cpu:true"
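
The template's option list mixes one bare flag ("diffusion_model") with "name:value" pairs. A minimal sketch of how such a list could be turned into typed settings follows. It assumes that a bare option means "true" and that only the literal string "true" enables a boolean; both are assumptions made for this example, not confirmed backend behaviour. `LtxTemplateOptions` and `parse_template_options` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical typed view of the options listed in ltx-ggml.yaml.
struct LtxTemplateOptions {
    bool diffusion_model = false;
    std::string sampler;
    bool vae_decode_only = false;
    bool diffusion_flash_attn = false;
    bool offload_params_to_cpu = false;
};

// Assumption: only the literal "true" turns a boolean option on.
static bool is_true(const std::string& value) {
    return value == "true";
}

LtxTemplateOptions parse_template_options(const std::vector<std::string>& options) {
    LtxTemplateOptions out;
    for (const std::string& opt : options) {
        const std::size_t colon = opt.find(':');
        // substr(0, npos) returns the whole string, so bare flags keep their name.
        const std::string name = opt.substr(0, colon);
        // Assumption: a bare option with no ':' is treated as "true".
        const std::string value =
            colon == std::string::npos ? "true" : opt.substr(colon + 1);
        if (name == "diffusion_model") {
            out.diffusion_model = is_true(value);
        } else if (name == "sampler") {
            out.sampler = value;
        } else if (name == "vae_decode_only") {
            out.vae_decode_only = is_true(value);
        } else if (name == "diffusion_flash_attn") {
            out.diffusion_flash_attn = is_true(value);
        } else if (name == "offload_params_to_cpu") {
            out.offload_params_to_cpu = is_true(value);
        }
    }
    return out;
}
```

Fed the five options from the template, this yields the euler sampler with flash attention and CPU offload enabled and decode-only VAE mode disabled, which is the combination the commit message describes.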
