[MAX] Add Wan I2V diffusion pipeline by jglee-sqbits · Pull Request #18 · SqueezeBits/modular

jglee-sqbits · 2026-04-01T04:19:18Z

Stacked PRs:

[MAX] Add Wan I2V diffusion pipeline

Summary

Add the Wan image-to-video (I2V) diffusion pipeline, extending the T2V pipeline with image conditioning.

Description

- Extends `WanPipeline` (from modular#6302, "[MAX] Add Wan T2V diffusion pipeline with MoE support") with image conditioning support
- Encodes the input image via VAE, zero-pads to full video length, and concatenates with noise latents (36-channel input: 16 noise + 4 mask + 16 condition)
- Compiles a GPU graph for the I2V channel concatenation
- Supports MoE dual-transformer with per-phase LoRA weight swapping
- Input images can be provided as file paths or URLs (downloaded at runtime)
- Architecture registration for `Wan-AI/Wan2.2-I2V-A14B-Diffusers`, `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`
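
The 36-channel layout described above can be sketched in NumPy. This is an illustration only, not the PR's implementation: `build_i2v_condition` and `concat_i2v_input` are hypothetical names, the random array stands in for the VAE encoding of the zero-padded conditioning video, and the mask convention (1 on the first latent frame, 0 elsewhere) is an assumption.

```python
import numpy as np

NOISE_CH, MASK_CH, COND_CH = 16, 4, 16  # split stated in the PR description


def build_i2v_condition(latent_cond: np.ndarray) -> np.ndarray:
    """Prepend a 4-channel temporal mask to the conditioning latents.

    latent_cond: (batch, 16, frames, height, width), standing in for the VAE
    encoding of the zero-padded conditioning video.
    """
    batch, _, frames, height, width = latent_cond.shape
    # Assumption: the mask is 1 on the first latent frame (the real image)
    # and 0 on the padded frames.
    mask = np.zeros((batch, MASK_CH, frames, height, width), dtype=np.float32)
    mask[:, :, 0] = 1.0
    return np.concatenate([mask, latent_cond], axis=1).astype(np.float32)


def concat_i2v_input(noise: np.ndarray, condition: np.ndarray) -> np.ndarray:
    """Channel-wise concatenation that forms the transformer input."""
    return np.concatenate([noise, condition], axis=1)


rng = np.random.default_rng(0)
# Toy latent grid: 5 latent frames of 8x8.
latent_cond = rng.standard_normal((1, COND_CH, 5, 8, 8)).astype(np.float32)
noise = rng.standard_normal((1, NOISE_CH, 5, 8, 8)).astype(np.float32)

condition = build_i2v_condition(latent_cond)      # 4 mask + 16 condition
model_input = concat_i2v_input(noise, condition)  # 16 noise + 20 condition
print(condition.shape, model_input.shape)
```

The condition tensor carries 20 channels and the concatenated input carries 36, matching the split in the description.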

Dependencies

Depends on modular#6302 (T2V pipeline) — inherits from WanPipeline.

Checklist

- [x] PR is small and focused
- [x] I ran `./bazelw run format` to format my changes

Assisted-by: Claude Code

stack-info: PR: #18, branch: jglee-sqbits/stack/6

gemini-code-assist

Code Review

This pull request introduces the WanI2VPipeline for image-to-video generation, extending the WanPipeline with image conditioning via VAE encoding and temporal masking. Feedback identifies a potential runtime error from a dtype mismatch in the condition buffer and recommends removing redundant pre-compilation logic in the execute method.

gemini-code-assist · 2026-04-01T04:21:27Z

+            [mask_expanded, latent_cond_np], axis=1
+        ).astype(np.float32)
+
+        return _numpy_f32_to_buffer(condition, self.vae.config.dtype, device)


The i2v_condition buffer should be created using the transformer's working dtype (self.transformer.config.dtype) rather than the VAE's dtype. In AutoencoderKLWanModel, the VAE dtype is typically hardcoded to bfloat16. If the pipeline is configured to run in float32, using self.vae.config.dtype here will lead to a dtype mismatch during the concatenation step in _concat_i2v_condition, causing a runtime error when the graph attempts to concatenate tensors of different types.

Suggested change

-        return _numpy_f32_to_buffer(condition, self.vae.config.dtype, device)
+        return _numpy_f32_to_buffer(condition, self.transformer.config.dtype, device)
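
The mismatch described in this comment can be modelled with a small NumPy sketch. `strict_concat` is a hypothetical helper that mimics a typed graph rejecting mixed dtypes (NumPy on its own would silently promote them), and `float16` stands in for `bfloat16`, which NumPy does not provide.

```python
import numpy as np


def strict_concat(tensors, axis=1):
    """Concatenate only when all dtypes match, as a typed graph would require.

    Hypothetical helper; NumPy itself would silently promote mixed dtypes.
    """
    dtypes = {t.dtype for t in tensors}
    if len(dtypes) != 1:
        raise TypeError(f"dtype mismatch in concat: {sorted(map(str, dtypes))}")
    return np.concatenate(tensors, axis=axis)


transformer_dtype = np.float32  # pipeline configured for float32
vae_dtype = np.float16          # stand-in for bfloat16, which NumPy lacks

latents = np.zeros((1, 16, 5, 8, 8), dtype=transformer_dtype)
condition_f32 = np.zeros((1, 20, 5, 8, 8), dtype=np.float32)

# Casting the condition to the VAE dtype reproduces the reported mismatch.
try:
    strict_concat([latents, condition_f32.astype(vae_dtype)])
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

# Casting to the transformer dtype keeps both operands aligned.
model_input = strict_concat([latents, condition_f32.astype(transformer_dtype)])
print(mismatch_detected, model_input.dtype)
```

Under these assumptions, only the second call succeeds, which is the behaviour the suggested change aims for.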

gemini-code-assist · 2026-04-01T04:21:27Z

+        if self._i2v_concat_model is None:
+            latent_model_input = self._cast_f32_to_model_dtype.execute(latents)[
+                0
+            ]
+            self._i2v_concat_model = self._compile_i2v_concat(
+                latent_model_input, i2v_condition
+            )
+


This pre-compilation block for the I2V concatenation model is redundant. The _concat_i2v_condition method, which is called at every step within the denoising loop, already includes logic to lazily compile _i2v_concat_model upon its first use. Removing this block simplifies the execute method without affecting functionality or performance.
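
The lazy-compilation pattern this comment relies on can be sketched as follows. `LazyConcat` is a toy stand-in, not the PR's class: list concatenation replaces the compiled graph, and a counter records how often compilation runs.

```python
class LazyConcat:
    """Toy stand-in for a pipeline that compiles a concat graph on first use."""

    def __init__(self):
        self._model = None
        self.compile_count = 0

    def _compile(self):
        # Stand-in for an expensive graph compilation.
        self.compile_count += 1
        return lambda latents, condition: latents + condition

    def concat(self, latents, condition):
        if self._model is None:  # compile lazily on the first call
            self._model = self._compile()
        return self._model(latents, condition)


pipe = LazyConcat()
for _ in range(3):  # three denoising steps
    out = pipe.concat([0.0] * 16, [1.0] * 20)

print(pipe.compile_count, len(out))
```

Because the guard lives inside `concat`, compilation happens once regardless of whether the caller pre-compiles, which is why a separate pre-compilation block in `execute` adds nothing in this model.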

jglee-sqbits force-pushed the jglee-sqbits/stack/5 branch from d96121b to 35ecf0d on April 1, 2026 04:19

jglee-sqbits force-pushed the jglee-sqbits/stack/6 branch from 451c1f7 to cc6ab75 on April 1, 2026 04:19

This was referenced Apr 1, 2026

[MAX] Add UniPC multistep scheduler for Wan diffusion #13 (Draft)

[MAX] Add UMT5 text encoder for Wan diffusion #14 (Draft)

github-actions Bot added the waiting-on-review label Apr 1, 2026

gemini-code-assist Bot reviewed Apr 1, 2026

jglee-sqbits wants to merge 1 commit into jglee-sqbits/stack/5 from jglee-sqbits/stack/6
