[MAX] Add Wan video generation examples and comparison benchmark#19
jglee-sqbits wants to merge 1 commit
## Summary
Add a standalone video generation example script for Wan T2V and I2V pipelines.
## Description
- `simple_offline_video_generation.py`: end-to-end script for generating videos from text or image prompts
- Supports all Wan model variants: 2.2-A14B (MoE), 2.1-14B, T2V and I2V
- LoRA turbo support (e.g. Lightning 4-step)
- Built-in profiling with component-level timing breakdown
- Input images can be local files or URLs (downloaded at runtime)
- Outputs MP4 video via `av` (PyAV)
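The PR text does not show how the component-level timing breakdown is collected. As a rough sketch of the general idea, a small context manager can accumulate wall-clock time per named stage. `ComponentTimer` and the stage names below are hypothetical and are not taken from the script. On a GPU, the device would also need to be synchronized before each clock read for the numbers to be meaningful.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ComponentTimer:
    """Accumulates wall-clock time per named component (hypothetical helper)."""

    def __init__(self) -> None:
        self.totals = defaultdict(float)  # seconds spent per component
        self.counts = defaultdict(int)    # number of timed sections per component

    @contextmanager
    def section(self, name: str):
        """Time the enclosed block and add it to the running total for `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self) -> str:
        """Return one line per component, slowest first, with its share of the total."""
        grand_total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return "\n".join(
            f"{name:<16}{secs:>10.4f}s {100 * secs / grand_total:5.1f}%  (n={self.counts[name]})"
            for name, secs in rows
        )


if __name__ == "__main__":
    timer = ComponentTimer()
    with timer.section("text_encoder"):   # stage names are placeholders
        time.sleep(0.01)
    for _ in range(4):
        with timer.section("denoise_step"):
            time.sleep(0.005)
    with timer.section("vae_decode"):
        time.sleep(0.01)
    print(timer.report())
```

Accumulating per name rather than per call keeps the report compact when a stage such as the denoising loop runs many times.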
## Validation (H200 140GB, 720p 81 frames)
```bash
# T2V base (Wan2.2-A14B MoE, 720p, 40 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
--model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--negative-prompt "low quality" \
--height 720 --width 1280 --num-frames 81 \
--num-inference-steps 40 --guidance-scale 4.0 \
--guidance-scale-2 3.0 \
--output t2v_base.mp4
# T2V LoRA turbo (4 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
--model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--prompt "A cat playing piano" \
--height 720 --width 1280 --num-frames 81 \
--num-inference-steps 4 --guidance-scale 1.0 \
--lora-repo-id lightx2v/Wan2.2-Lightning \
--lora-subfolder Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0 \
--output t2v_lora.mp4
# I2V base (720p, 40 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--prompt "A cat surfing on a wave" \
--negative-prompt "low quality" \
--height 720 --width 1280 --num-frames 81 \
--num-inference-steps 40 --guidance-scale 4.0 \
--guidance-scale-2 3.0 \
--input-image https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B/resolve/main/examples/i2v_input.JPG \
--output i2v_base.mp4
# I2V LoRA turbo (4 steps)
MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--prompt "A cat surfing on a wave" \
--height 720 --width 1280 --num-frames 81 \
--num-inference-steps 4 --guidance-scale 1.0 \
--lora-repo-id lightx2v/Wan2.2-Lightning \
--lora-subfolder Wan2.2-I2V-A14B-4steps-lora-rank64-Seko-V1 \
--input-image https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B/resolve/main/examples/i2v_input.JPG \
--output i2v_lora.mp4
```
## Dependencies
Depends on all previous PRs: modular#6298–modular#6303.
## Checklist
- [x] PR is small and focused
- [x] I ran `./bazelw run format` to format my changes
Assisted-by: Claude Code
stack-info: PR: #19, branch: jglee-sqbits/stack/7
Code Review
This pull request introduces two new Python utilities for Wan video generation: a benchmarking script for model speed metrics and an offline video generation example. The feedback identifies a critical deadlock risk when executing nested Bazel commands, highlights a mismatch in target names within the documentation, suggests a more robust method for resolving the workspace directory, and points out an inefficiency where input images are downloaded multiple times.
    "./bazelw",
    "run",
    bazel_target,
    ./bazelw run //max/examples/diffusion:full_metric

    # Specific model only
    MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
    ./bazelw run //max/examples/diffusion:full_metric -- \
    --model wan2.2-t2v-a14b

The Bazel target name in the usage examples (`full_metric`) does not match the actual target name defined in `BUILD.bazel` (`all_wan_model_speed_metric`).

Suggested change:

    -./bazelw run //max/examples/diffusion:full_metric
    -# Specific model only
    -MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
    -./bazelw run //max/examples/diffusion:full_metric -- \
    ---model wan2.2-t2v-a14b
    +./bazelw run //max/examples/diffusion:all_wan_model_speed_metric
    +# Specific model only
    +MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_CHUNK_PERCENT=100 \
    +./bazelw run //max/examples/diffusion:all_wan_model_speed_metric -- \
    +--model wan2.2-t2v-a14b
    text=True,
    timeout=7200,
    env=env,
    cwd=str(Path(__file__).resolve().parents[3]),

Resolving the workspace root using `__file__` and `parents[3]` is brittle and may fail when the script is executed via Bazel (where `__file__` points into the runfiles directory). It is safer to use the `BUILD_WORKSPACE_DIRECTORY` environment variable provided by `bazel run`.

Suggested change:

    -cwd=str(Path(__file__).resolve().parents[3]),
    +cwd=os.environ.get("BUILD_WORKSPACE_DIRECTORY", str(Path(__file__).resolve().parents[3])),
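The suggested lookup can also be pulled into a small helper, which makes the fallback order explicit and easy to test. This is only a sketch: `resolve_workspace_root` and its parameters are hypothetical names, and the fallback depth of 3 simply mirrors the `parents[3]` in the reviewed line.

```python
import os
from pathlib import Path


def resolve_workspace_root(script_path: str, fallback_levels: int = 3) -> Path:
    """Return the workspace root, preferring the variable set by `bazel run`.

    `script_path` would normally be `__file__`; it is a parameter here so the
    helper can be exercised outside of a real script (hypothetical design).
    """
    workspace = os.environ.get("BUILD_WORKSPACE_DIRECTORY")
    if workspace:
        # `bazel run` exports the directory the workspace was invoked from.
        return Path(workspace)
    # Fallback for direct `python script.py` runs: walk up from the script.
    return Path(script_path).resolve().parents[fallback_levels]
```

A call site could then pass `cwd=str(resolve_workspace_root(__file__))` to `subprocess.run`.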
    if args.input_image.startswith(("http://", "https://")):
        import io
        import urllib.request

        with urllib.request.urlopen(args.input_image) as _resp:
            img = Image.open(io.BytesIO(_resp.read()))
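The review summary notes that input images are downloaded more than once. One possible way to avoid repeated fetches, sketched here under the assumption that the same `--input-image` string is loaded at several points in the script, is to cache the raw bytes per source. `load_image_bytes` is a hypothetical helper, not part of the PR.

```python
import functools
import urllib.request
from pathlib import Path


@functools.lru_cache(maxsize=8)
def load_image_bytes(source: str) -> bytes:
    """Return the raw bytes for a local path or URL, fetching each source once.

    Hypothetical helper: repeated calls with the same `source` string are
    served from the in-process cache instead of hitting the network again.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    return Path(source).read_bytes()
```

The caller would decode the result as before, for example `Image.open(io.BytesIO(load_image_bytes(args.input_image)))`. Caching bytes rather than a decoded image means every caller gets its own independent image object.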