[MAX] Add Wan video generation examples and comparison benchmark #19

Draft
jglee-sqbits wants to merge 1 commit into jglee-sqbits/stack/6 from jglee-sqbits/stack/7

@jglee-sqbits jglee-sqbits commented Apr 1, 2026

Stacked PRs:


[MAX] Add Wan video generation examples and comparison benchmark

## Summary

Add a standalone video generation example script for Wan T2V and I2V pipelines.

## Description

- `simple_offline_video_generation.py`: end-to-end script for generating videos from text or image prompts
- Supports all Wan model variants: 2.2-A14B (MoE), 2.1-14B, T2V and I2V
- LoRA turbo support (e.g. Lightning 4-step)
- Built-in profiling with component-level timing breakdown
- Input images can be local files or URLs (downloaded at runtime)
- Outputs MP4 video via `av` (PyAV)
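
For illustration only, a component-level timing breakdown of the kind listed above could be collected roughly as follows. This is a minimal sketch, not the script's actual profiler: `ComponentTimer` and the section names are hypothetical, and the `time.sleep` calls stand in for real pipeline stages.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ComponentTimer:
    """Accumulates wall-clock seconds per named component (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        # Time the enclosed block and add it to the running total for `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self):
        # One line per component, slowest first, with its share of the total.
        total = sum(self.totals.values())
        lines = []
        for name, seconds in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = 100.0 * seconds / total if total else 0.0
            lines.append(f"{name:<16}{seconds:>9.3f}s{share:>7.1f}%")
        return "\n".join(lines)


timer = ComponentTimer()
with timer.section("text_encoder"):  # hypothetical stage name
    time.sleep(0.01)
with timer.section("denoise"):  # hypothetical stage name
    time.sleep(0.02)
print(timer.report())
```

A host-side timer like this only reflects device work if the device is synchronized before each section ends; otherwise asynchronous kernels are attributed to whichever later stage waits on them.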

## Validation (H200 140GB, 720p 81 frames)

```bash
# T2V base (Wan2.2-A14B MoE, 720p, 40 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
    --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
    --negative-prompt "low quality" \
    --height 720 --width 1280 --num-frames 81 \
    --num-inference-steps 40 --guidance-scale 4.0 \
    --guidance-scale-2 3.0 \
    --output t2v_base.mp4

# T2V LoRA turbo (4 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
    --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --prompt "A cat playing piano" \
    --height 720 --width 1280 --num-frames 81 \
    --num-inference-steps 4 --guidance-scale 1.0 \
    --lora-repo-id lightx2v/Wan2.2-Lightning \
    --lora-subfolder Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0 \
    --output t2v_lora.mp4

# I2V base (720p, 40 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
    --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --prompt "A cat surfing on a wave" \
    --negative-prompt "low quality" \
    --height 720 --width 1280 --num-frames 81 \
    --num-inference-steps 40 --guidance-scale 4.0 \
    --guidance-scale-2 3.0 \
    --input-image https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B/resolve/main/examples/i2v_input.JPG \
    --output i2v_base.mp4

# I2V LoRA turbo (4 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
    --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --prompt "A cat surfing on a wave" \
    --height 720 --width 1280 --num-frames 81 \
    --num-inference-steps 4 --guidance-scale 1.0 \
    --lora-repo-id lightx2v/Wan2.2-Lightning \
    --lora-subfolder Wan2.2-I2V-A14B-4steps-lora-rank64-Seko-V1 \
    --input-image https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B/resolve/main/examples/i2v_input.JPG \
    --output i2v_lora.mp4
```

## Dependencies

Depends on all previous PRs: modular#6298 through modular#6303.

## Checklist

- [x] PR is small and focused
- [x] I ran `./bazelw run format` to format my changes

Assisted-by: Claude Code

stack-info: PR: #19, branch: jglee-sqbits/stack/7

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces two new Python utilities for Wan video generation: a benchmarking script for model speed metrics and an offline video generation example. The feedback identifies a critical deadlock risk when executing nested Bazel commands, highlights a mismatch in target names within the documentation, suggests a more robust method for resolving the workspace directory, and points out an inefficiency where input images are downloaded multiple times.

Comment on lines +150 to +152
"./bazelw",
"run",
bazel_target,

high

Running bazel run inside a script that is itself executed via bazel run will cause a deadlock because the first Bazel process holds the workspace lock. Consider running the underlying Python script directly or building the target first and executing the resulting binary.
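
The first option, running the underlying Python script directly, might look like the sketch below. It is a sketch under assumptions: `build_direct_command` and `run_direct` are hypothetical names, the throwaway `demo.py` stands in for the real example script, and the real script's dependencies are assumed to be importable by the current interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_direct_command(script: Path, extra_args: list[str]) -> list[str]:
    """Build a command that runs `script` with the current interpreter.

    No Bazel client is started, so the child never waits on a Bazel lock.
    """
    return [sys.executable, str(script), *extra_args]


def run_direct(script: Path, extra_args: list[str], timeout: float = 7200.0):
    """Run the script as a child process and capture its output as text."""
    return subprocess.run(
        build_direct_command(script, extra_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# Demo with a throwaway script standing in for the real example.
with tempfile.TemporaryDirectory() as tmp:
    demo_script = Path(tmp) / "demo.py"
    demo_script.write_text("import sys\nprint(' '.join(sys.argv[1:]))\n")
    demo_result = run_direct(demo_script, ["--model", "wan2.2-t2v-a14b"], timeout=60)

print(demo_result.stdout.strip())
```

Using `sys.executable` keeps the child on the same interpreter as the benchmark itself, which matters when that interpreter is the one Bazel provisioned.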

Comment on lines +16 to +21
./bazelw run //max/examples/diffusion:full_metric

# Specific model only
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:full_metric -- \
--model wan2.2-t2v-a14b

medium

The Bazel target name in the usage examples (full_metric) does not match the actual target name defined in BUILD.bazel (all_wan_model_speed_metric).

Suggested change
- ./bazelw run //max/examples/diffusion:full_metric
- # Specific model only
- MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
- ./bazelw run //max/examples/diffusion:full_metric -- \
-     --model wan2.2-t2v-a14b
+ ./bazelw run //max/examples/diffusion:all_wan_model_speed_metric
+ # Specific model only
+ MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
+ ./bazelw run //max/examples/diffusion:all_wan_model_speed_metric -- \
+     --model wan2.2-t2v-a14b

text=True,
timeout=7200,
env=env,
cwd=str(Path(__file__).resolve().parents[3]),

medium

Resolving the workspace root using __file__ and parents[3] is brittle and may fail when the script is executed via Bazel (where __file__ points into the runfiles directory). It is safer to use the BUILD_WORKSPACE_DIRECTORY environment variable provided by bazel run.

Suggested change
- cwd=str(Path(__file__).resolve().parents[3]),
+ cwd=os.environ.get("BUILD_WORKSPACE_DIRECTORY", str(Path(__file__).resolve().parents[3])),
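
The same lookup can be factored into a small helper so the fallback is explicit and easy to test. This is a sketch: `workspace_root` and its `env` parameter are hypothetical names; only `BUILD_WORKSPACE_DIRECTORY` comes from Bazel, which sets it for binaries started with `bazel run`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def workspace_root(fallback: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Bazel workspace root, or `fallback` outside `bazel run`."""
    env = os.environ if env is None else env
    root = env.get("BUILD_WORKSPACE_DIRECTORY")
    # An unset or empty variable means the process was not started by `bazel run`.
    return Path(root) if root else fallback


# Under `bazel run` the variable is set; elsewhere the fallback is used.
print(workspace_root(Path.cwd()))
```

The call site would then pass `cwd=str(workspace_root(Path(__file__).resolve().parents[3]))`, keeping the original path arithmetic only as a fallback.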

Comment on lines +178 to +183
if args.input_image.startswith(("http://", "https://")):
    import io
    import urllib.request

    with urllib.request.urlopen(args.input_image) as _resp:
        img = Image.open(io.BytesIO(_resp.read()))

medium

The input image is downloaded here to compute dimensions and then downloaded again in generate_video. This is inefficient. Consider downloading the image once and passing the loaded PIL.Image object or the raw bytes to the rest of the pipeline.
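
One way to fetch the image only once is to read the raw bytes through a cached loader and hand those bytes to every consumer. The sketch below uses only the standard library; `load_image_bytes` is a hypothetical name, and how the bytes would be threaded into `generate_video` is an assumption, since that function's signature is not shown here.

```python
import urllib.request
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=8)
def load_image_bytes(source: str) -> bytes:
    """Return the raw bytes of an image given a URL or a local path.

    Results are cached per source string, so a second call with the same
    argument does not download or read the file again.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    return Path(source).read_bytes()
```

Each consumer can then build its own image with `Image.open(io.BytesIO(load_image_bytes(args.input_image)))`, both for the dimension check and for generation, without a second network request.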
