[Video Generation] Add VAE encoder support to AutoencoderKLLTXVideo by goyaladitya05 · Pull Request #3829 · openvinotoolkit/openvino.genai

goyaladitya05 · 2026-05-08T16:34:40Z

This PR implements AutoencoderKLLTXVideo::encode(), enabling Image-to-Video workflows where a conditioning frame is encoded into latent space and passed to the diffusion pipeline.

This is phase 1 of Image-to-Video support in the LTX Video Generation pipeline.

Changes

- The VAE encoder now compiles and runs. compile() and reshape() previously held a `// TODO: for img2video` placeholder; both are now implemented.
- encode(video, generator) takes a [B, C, F, H, W] video tensor, runs the encoder model, and returns a normalized latent ready for the diffusion transformer. It handles both model output variants:
  - latent_parameters: samples z from the predicted distribution
  - latent_sample: uses the output directly (no sampling needed)
- Added 7 tests covering construction, error messages, output shape, determinism with the same seed, and variation across seeds. Tests that require the encoder are skipped if vae_encoder is not present in the test model.
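The two output variants handled by encode() can be sketched without OpenVINO as follows. This is a minimal illustration, not the PR's implementation: `EncoderOutput` and `to_latent` are hypothetical names, batch size is fixed at 1, and the `[mean, logvar]` layout along the channel axis is an assumption borrowed from the usual diagonal-Gaussian VAE convention. Normalization of the result is left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the encoder output; not an OpenVINO type.
struct EncoderOutput {
    bool is_parameters;       // true: latent_parameters, false: latent_sample
    std::size_t channels;     // number of latent channels C
    std::size_t spatial;      // F * H * W elements per channel
    std::vector<float> data;  // [2C, spatial] if is_parameters, else [C, spatial]
};

// Returns a latent with C * spatial elements.
// `rng` may stay null when the output is already a sample.
std::vector<float> to_latent(const EncoderOutput& out, std::mt19937* rng = nullptr) {
    const std::size_t n = out.channels * out.spatial;
    if (!out.is_parameters) {
        assert(out.data.size() == n);
        return out.data;  // latent_sample: use the output directly
    }
    assert(out.data.size() == 2 * n);
    assert(rng != nullptr && "sampling needs a random generator");
    std::normal_distribution<float> noise(0.0f, 1.0f);
    std::vector<float> z(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float mean = out.data[i];        // assumed: first C channels hold the mean
        const float logvar = out.data[n + i];  // assumed: last C channels hold log-variance
        z[i] = mean + std::exp(0.5f * logvar) * noise(*rng);
    }
    return z;
}
```

Seeding two generators identically yields the same latent, while a different seed changes it, which mirrors the determinism and seed-variation tests described above. Letting the generator default to null keeps the passthrough case callable without one.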

Testing

Encode-decode round trip on a single image (512×512, CPU):

| Step | Value |
| --- | --- |
| Input | [1, 3, 1, 512, 512] |
| Latent | [1, 128, 1, 16, 16] (32× spatial compression) |
| Output | [1, 1, 512, 512, 3] |
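The latent shape in the table follows from the compression factors: 512 / 32 = 16 in each spatial dimension. A small sketch of that arithmetic is below. `latent_shape` is a hypothetical helper; the 32× spatial factor and the 128 latent channels come from the table, while the 8× temporal factor with the `(F - 1) / 8 + 1` rule is an assumption that the single-frame case here cannot confirm.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape5 = std::array<std::size_t, 5>;  // [B, C, F, H, W]

constexpr std::size_t kSpatialFactor = 32;    // stated in the table
constexpr std::size_t kTemporalFactor = 8;    // assumption, not stated in this PR
constexpr std::size_t kLatentChannels = 128;  // taken from the latent shape in the table

// Maps an input video shape to the expected latent shape.
Shape5 latent_shape(const Shape5& video) {
    const std::size_t B = video[0], F = video[2], H = video[3], W = video[4];
    assert(H % kSpatialFactor == 0 && W % kSpatialFactor == 0);
    assert(F >= 1 && (F - 1) % kTemporalFactor == 0);
    return {B, kLatentChannels, (F - 1) / kTemporalFactor + 1,
            H / kSpatialFactor, W / kSpatialFactor};
}
```

For the input `[1, 3, 1, 512, 512]` this gives `[1, 128, 1, 16, 16]`, matching the table.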

Checklist:

This PR follows GenAI Contributing guidelines.
Tests have been updated or added to cover the new code.
This PR fully addresses the ticket.
I have made corresponding changes to the documentation. (No changes were necessary.)

Copilot

Pull request overview

This PR adds VAE encoder support to the LTX video autoencoder, enabling encoding of a conditioning frame/video into latent space for upcoming Image-to-Video workflows.

Changes:

Implemented encoder compilation/reshape and added AutoencoderKLLTXVideo::encode() with support for latent_parameters sampling and latent_sample passthrough + normalization.
Exposed encode() in Python bindings.
Added Python tests covering encoder construction, error paths, output shape, and seed determinism/variation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/python_tests/test_video_generation.py | Adds encoder-focused tests (construction, error messages, shape, determinism/seed variation). |
| src/python/py_video_generation_models.cpp | Exposes `AutoencoderKLLTXVideo.encode()` via pybind11 with GIL release. |
| src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp | Implements encoder compile/reshape and `encode()` including sampling + latent normalization. |
| src/cpp/include/openvino/genai/video_generation/autoencoder_kl_ltx_video.hpp | Adds public `encode()` declaration and stores encoder output name in class state. |

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

goyaladitya05 · 2026-05-14T17:14:50Z

+    // inverse of denormalize_latents used in the decode path
+    const ov::Shape shape = latent.get_shape();
+    OPENVINO_ASSERT(shape.size() == 5, "Encoder output expected to be [B, C, F, H, W]");
+    OPENVINO_ASSERT(latent.get_element_type() == ov::element::f32,
+        "Latent normalization requires f32, got ", latent.get_element_type());
+    const size_t B = shape[0], C = shape[1], spatial = shape[2] * shape[3] * shape[4];


Interesting. This should also affect image generation on GPU when models are exported in full fp16 or bf16. I'll verify this.
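For context, the quoted lines derive B, C, and the per-channel element count from the 5-D shape before normalizing. A standalone sketch of a per-channel normalization over that layout, with its inverse, is shown below. It operates on a plain `std::vector<float>` rather than an `ov::Tensor`, and the formula `(x - mean[c]) * scaling_factor / std[c]` together with the `scaling_factor` parameter is an assumption about the convention, not code copied from this PR or from ltx_pipeline.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalizes a contiguous [B, C, F, H, W] f32 latent in place.
// `spatial` is F * H * W; `mean` and `stddev` hold one value per channel.
void normalize_latents(std::vector<float>& latent, std::size_t B, std::size_t C,
                       std::size_t spatial, const std::vector<float>& mean,
                       const std::vector<float>& stddev, float scaling_factor = 1.0f) {
    assert(latent.size() == B * C * spatial);
    assert(mean.size() == C && stddev.size() == C);
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t c = 0; c < C; ++c) {
            float* p = latent.data() + (b * C + c) * spatial;
            for (std::size_t i = 0; i < spatial; ++i) {
                p[i] = (p[i] - mean[c]) * scaling_factor / stddev[c];
            }
        }
    }
}

// Inverse of normalize_latents under the same assumed convention.
void denormalize_latents(std::vector<float>& latent, std::size_t B, std::size_t C,
                         std::size_t spatial, const std::vector<float>& mean,
                         const std::vector<float>& stddev, float scaling_factor = 1.0f) {
    assert(latent.size() == B * C * spatial);
    assert(mean.size() == C && stddev.size() == C);
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t c = 0; c < C; ++c) {
            float* p = latent.data() + (b * C + c) * spatial;
            for (std::size_t i = 0; i < spatial; ++i) {
                p[i] = p[i] * stddev[c] / scaling_factor + mean[c];
            }
        }
    }
}
```

Because the loop reads and writes raw `float` values, it only makes sense for f32 data, which is why the PR guards the normalization with an element-type assertion.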

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

goyaladitya05 · 2026-05-14T17:25:03Z

cc @sgonorov @likholat

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp:295

encode() asserts the latent is f32 and normalizes via latent.data() in-place. On GPU (or with inference-precision hints) the encoder output/latent may be f16/bf16, which will currently fail at runtime. To make encode() robust across devices, consider either converting the sampled/copied latent to f32 before normalization, or performing normalization in a dtype-generic way similar to denormalize_latents() in video_generation/ltx_pipeline.hpp.

    OPENVINO_ASSERT(shape.size() == 5, "Encoder output expected to be [B, C, F, H, W]");
    OPENVINO_ASSERT(latent.get_element_type() == ov::element::f32,
        "Latent normalization requires f32, got ", latent.get_element_type());
    const size_t B = shape[0], C = shape[1], spatial = shape[2] * shape[3] * shape[4];
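One of the options suggested above is to convert the latent to f32 before normalizing. The sketch below shows what that widening amounts to at the bit level for raw 16-bit buffers. The function names are hypothetical and the code deliberately avoids OpenVINO; real code would more likely rely on OpenVINO's own `ov::float16` and `ov::bfloat16` types than on hand-written bit manipulation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Reinterprets a 32-bit pattern as an IEEE-754 float.
float bits_to_float(std::uint32_t bits) {
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// bf16 is the upper half of an f32, so widening is a 16-bit shift.
float bf16_to_float(std::uint16_t v) {
    return bits_to_float(static_cast<std::uint32_t>(v) << 16);
}

// IEEE-754 half precision: 1 sign bit, 5 exponent bits, 10 mantissa bits.
float f16_to_float(std::uint16_t h) {
    const std::uint32_t sign = (static_cast<std::uint32_t>(h) & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t mant = h & 0x3FFu;
    if (exp == 0) {
        if (mant == 0) return bits_to_float(sign);  // signed zero
        assert(mant != 0);  // guarantees the normalization loop terminates
        std::uint32_t shift = 0;
        while ((mant & 0x400u) == 0) {  // normalize a subnormal value
            mant <<= 1;
            ++shift;
        }
        return bits_to_float(sign | ((113u - shift) << 23) | ((mant & 0x3FFu) << 13));
    }
    if (exp == 31) return bits_to_float(sign | 0x7F800000u | (mant << 13));  // inf or NaN
    return bits_to_float(sign | ((exp + 112u) << 23) | (mant << 13));       // rebias 15 -> 127
}

// Widens a raw 16-bit buffer to f32 so that f32-only code can process it.
std::vector<float> widen_to_f32(const std::vector<std::uint16_t>& raw, bool is_bf16) {
    std::vector<float> out;
    out.reserve(raw.size());
    for (std::uint16_t v : raw) {
        out.push_back(is_bf16 ? bf16_to_float(v) : f16_to_float(v));
    }
    return out;
}
```

Widening first keeps the normalization loop unchanged at the cost of an extra copy and an f32 output, whereas a dtype-generic normalization would preserve the encoder's output type.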

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp:295

encode() asserts that the produced latent must be f32 and then normalizes it via latent.data(). On GPU / fp16 or bf16 exports this will currently fail even though inference succeeds, and it blocks Image-to-Video usage on those configurations. Consider making the normalization path dtype-generic (f16/bf16/f32) or explicitly converting the latent to f32 before normalization, similar to the robustness work tracked in #3865.

    const ov::Shape shape = latent.get_shape();
    OPENVINO_ASSERT(shape.size() == 5, "Encoder output expected to be [B, C, F, H, W]");
    OPENVINO_ASSERT(latent.get_element_type() == ov::element::f32,
        "Latent normalization requires f32, got ", latent.get_element_type());
    const size_t B = shape[0], C = shape[1], spatial = shape[2] * shape[3] * shape[4];

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp:295

encode() hard-requires the latent tensor to be f32 for normalization (latent.get_element_type() == f32). This makes encode() unusable for fp16/bf16 IR exports or GPU inference-precision hints, even though the decode path already supports f16/bf16 (denormalize_latents in ltx_pipeline.hpp is dtype-generic). Consider normalizing in a dtype-generic way (e.g., via OpenVINO ops/broadcast like denormalize_latents does) or explicitly converting the latent to f32 before normalization and documenting the output dtype.

    OPENVINO_ASSERT(shape.size() == 5, "Encoder output expected to be [B, C, F, H, W]");
    OPENVINO_ASSERT(latent.get_element_type() == ov::element::f32,
        "Latent normalization requires f32, got ", latent.get_element_type());
    const size_t B = shape[0], C = shape[1], spatial = shape[2] * shape[3] * shape[4];

sgonorov

The tests could also be refactored a little (extract more fixtures to cut boilerplate), but overall this looks good.

Copilot AI review requested due to automatic review settings May 8, 2026 16:34

github-actions Bot added the labels category: Python API, category: CPP API, category: GGUF, category: video generation May 8, 2026

Copilot started reviewing on behalf of goyaladitya05 May 8, 2026 16:35 View session

Copilot AI reviewed May 8, 2026

goyaladitya05 marked this pull request as ready for review May 8, 2026 17:49

Copilot AI review requested due to automatic review settings May 8, 2026 17:49

goyaladitya05 requested review from Wovchena, as-suvorov, likholat and sgonorov as code owners May 8, 2026 17:49

Copilot started reviewing on behalf of goyaladitya05 May 8, 2026 17:50 View session

goyaladitya05 added this to LTX Video Image-to-Video Support May 8, 2026

github-project-automation Bot moved this to Todo in LTX Video Image-to-Video Support May 8, 2026

goyaladitya05 moved this from Todo to In progress in LTX Video Image-to-Video Support May 8, 2026

goyaladitya05 moved this from In progress to Pull Requests in LTX Video Image-to-Video Support May 8, 2026

goyaladitya05 moved this from Pull Requests to In progress in LTX Video Image-to-Video Support May 8, 2026

Copilot AI reviewed May 8, 2026

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp Outdated

as-suvorov assigned likholat May 11, 2026

Copilot AI review requested due to automatic review settings May 14, 2026 10:03

Copilot started reviewing on behalf of goyaladitya05 May 14, 2026 10:04 View session

Copilot AI reviewed May 14, 2026

goyaladitya05 force-pushed the feature/ltx-vae-encoder-support branch 2 times, most recently from 43ffd94 to 5dab8e8 Compare May 14, 2026 16:05

goyaladitya05 requested a review from Copilot May 14, 2026 16:05

Copilot started reviewing on behalf of goyaladitya05 May 14, 2026 16:06 View session

Copilot AI reviewed May 14, 2026

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp Outdated

Comment thread tests/python_tests/test_video_generation.py Outdated

goyaladitya05 mentioned this pull request May 14, 2026

[BUG] Image2Image pipeline may fail on Intel GPU with full f16 or bf16 IR export #3865

Open

goyaladitya05 changed the title ~~[Video Generation] Add VAE encoder support to AutoencoderKLLTXVideo~~ [Video Generation] Add VAE encoder support for Video Generation May 14, 2026

goyaladitya05 changed the title ~~[Video Generation] Add VAE encoder support for Video Generation~~ [Video Generation] Add VAE encoder support to AutoencoderKLLTXVideo May 14, 2026

Copilot AI review requested due to automatic review settings May 15, 2026 19:46

Copilot started reviewing on behalf of goyaladitya05 May 15, 2026 19:47 View session

Copilot AI reviewed May 15, 2026

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp

Comment thread tests/python_tests/test_video_generation.py

Comment thread src/python/openvino_genai/py_openvino_genai.pyi Outdated

goyaladitya05 force-pushed the feature/ltx-vae-encoder-support branch from 169aaa9 to 995c396 Compare May 15, 2026 20:32

goyaladitya05 added 3 commits May 16, 2026 02:02

[Video Generation] Add VAE encoder support to AutoencoderKLLTXVideo

1f0d9e9

address review comments

1acac69

address copilot comments

4fcfda6

goyaladitya05 force-pushed the feature/ltx-vae-encoder-support branch from 995c396 to 4fcfda6 Compare May 15, 2026 20:32

goyaladitya05 requested a review from Copilot May 15, 2026 20:32

Copilot started reviewing on behalf of goyaladitya05 May 15, 2026 20:33 View session

Copilot AI reviewed May 15, 2026

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp

Comment thread src/cpp/include/openvino/genai/video_generation/autoencoder_kl_ltx_video.hpp Outdated

default generator to nullptr in encode() for latent_sample encoders

3c24055

goyaladitya05 requested a review from Copilot May 18, 2026 10:27

Copilot started reviewing on behalf of goyaladitya05 May 18, 2026 10:27 View session

Copilot AI reviewed May 18, 2026

Comment thread src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp

Merge branch 'master' into feature/ltx-vae-encoder-support

6562ee0

sgonorov approved these changes May 20, 2026
