Commit f23746b: Revert to default prompts for WAN I2V and update README (#323)
Parent: 94204be

3 files changed
Lines changed: 74 additions & 38 deletions


README.md

Lines changed: 70 additions & 34 deletions
@@ -14,22 +14,23 @@
 limitations under the License.
 -->

-[![Unit Tests](https://github.com/google/maxtext/actions/workflows/UnitTests.yml/badge.svg)](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml)
+[![Unit Tests](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml/badge.svg)](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml)

 # What's new?
-- **`2026/1/15`**: Wan2.1 and Wan2.2 Img2vid generation is now supported
+- **`2026/01/29`**: Wan LoRA for inference is now supported
+- **`2026/01/15`**: Wan2.1 and Wan2.2 Img2vid generation is now supported
 - **`2025/11/11`**: Wan2.2 txt2vid generation is now supported
 - **`2025/10/10`**: Wan2.1 txt2vid training and generation is now supported.
 - **`2025/10/14`**: NVIDIA DGX Spark Flux support.
-- **`2025/8/14`**: LTX-Video img2vid generation is now supported.
-- **`2025/7/29`**: LTX-Video text2vid generation is now supported.
+- **`2025/08/14`**: LTX-Video img2vid generation is now supported.
+- **`2025/07/29`**: LTX-Video text2vid generation is now supported.
 - **`2025/04/17`**: Flux Finetuning.
 - **`2025/02/12`**: Flux LoRA for inference.
 - **`2025/02/08`**: Flux schnell & dev inference.
 - **`2024/12/12`**: Load multiple LoRAs for inference.
 - **`2024/10/22`**: LoRA support for Hyper SDXL.
-- **`2024/8/1`**: Orbax is the new default checkpointer. You can still use `pipeline.save_pretrained` after training to save in diffusers format.
-- **`2024/7/20`**: Dreambooth training for Stable Diffusion 1.x,2.x is now supported.
+- **`2024/08/01`**: Orbax is the new default checkpointer. You can still use `pipeline.save_pretrained` after training to save in diffusers format.
+- **`2024/07/20`**: Dreambooth training for Stable Diffusion 1.x,2.x is now supported.

 # Overview

@@ -68,14 +69,15 @@ MaxDiffusion supports
 - [SD 1.4](#stable-diffusion-14-training)
 - [Dreambooth](#dreambooth)
 - [Inference](#inference)
-- [Wan2.1](#wan21)
-- [Wan2.2](#wan22)
+- [Wan](#wan-models)
 - [LTX-Video](#ltx-video)
 - [Flux](#flux)
 - [Fused Attention for GPU](#fused-attention-for-gpu)
 - [SDXL](#stable-diffusion-xl)
 - [SD 2 base](#stable-diffusion-2-base)
 - [SD 2.1](#stable-diffusion-21)
+- [Wan LoRA](#wan-lora)
+- [Flux LoRA](#flux-lora)
 - [Hyper SDXL LoRA](#hyper-sdxl-lora)
 - [Load Multiple LoRA](#load-multiple-lora)
 - [SDXL Lightning](#sdxl-lightning)
@@ -482,41 +484,48 @@ To generate images, run the following command:

 Add conditioning image path as conditioning_media_paths in the form of ["IMAGE_PATH"] along with other generation parameters in the ltx_video.yml file. Then follow same instruction as above.

-## Wan2.1
+## Wan Models

 Although not required, attaching an external disk is recommended as weights take up a lot of disk space. [Follow these instructions if you would like to attach an external disk](https://cloud.google.com/tpu/docs/attach-durable-block-storage).

-### Text2Vid
+Supports both Text2Vid and Img2Vid pipelines.

-```bash
-HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/
-LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_reduce=true" HF_HUB_ENABLE_HF_TRANSFER=1 python src/maxdiffusion/generate_wan.py src/maxdiffusion/configs/base_wan_14b.yml attention="flash" num_inference_steps=50 num_frames=81 width=1280 height=720 jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ per_device_batch_size=.125 ici_data_parallelism=2 ici_fsdp_parallelism=2 flow_shift=5.0 enable_profiler=True run_name=wan-inference-testing-720p output_dir=gs:/jfacevedo-maxdiffusion fps=16 flash_min_seq_length=0 flash_block_sizes='{"block_q" : 3024, "block_kv_compute" : 1024, "block_kv" : 2048, "block_q_dkv": 3024, "block_kv_dkv" : 2048, "block_kv_dkv_compute" : 2048, "block_q_dq" : 3024, "block_kv_dq" : 2048 }' seed=118445
-```
-
-### Img2Vid
-
-```bash
-HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/
-LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_reduce=true" HF_HUB_ENABLE_HF_TRANSFER=1 python src/maxdiffusion/generate_wan.py src/maxdiffusion/configs/base_wan_i2v_14b.yml attention="flash" num_inference_steps=30 num_frames=81 width=832 height=480 jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ per_device_batch_size=.125 ici_data_parallelism=2 ici_fsdp_parallelism=2 flow_shift=3.0 enable_profiler=True run_name=wan-i2v-inference-testing-480p output_dir=gs:/jfacevedo-maxdiffusion fps=16 flash_min_seq_length=0 flash_block_sizes='{"block_q" : 3024, "block_kv_compute" : 1024, "block_kv" : 2048, "block_q_dkv": 3024, "block_kv_dkv" : 2048, "block_kv_dkv_compute" : 2048, "block_q_dq" : 3024, "block_kv_dq" : 2048 }' seed=118445
-```
-
-## Wan2.2
-
-Although not required, attaching an external disk is recommended as weights take up a lot of disk space. [Follow these instructions if you would like to attach an external disk](https://cloud.google.com/tpu/docs/attach-durable-block-storage).
-
-### Text2Vid
+The following command will run Wan2.1 T2V:

 ```bash
-HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/
-LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_reduce=true" HF_HUB_ENABLE_HF_TRANSFER=1 python src/maxdiffusion/generate_wan.py src/maxdiffusion/configs/base_wan_27b.yml attention="flash" num_inference_steps=50 num_frames=81 width=1280 height=720 jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ per_device_batch_size=.125 ici_data_parallelism=2 ici_fsdp_parallelism=2 flow_shift=5.0 enable_profiler=True run_name=wan-inference-testing-720p output_dir=gs:/jfacevedo-maxdiffusion fps=16 flash_min_seq_length=0 flash_block_sizes='{"block_q" : 3024, "block_kv_compute" : 1024, "block_kv" : 2048, "block_q_dkv": 3024, "block_kv_dkv" : 2048, "block_kv_dkv_compute" : 2048, "block_q_dq" : 3024, "block_kv_dq" : 2048 }' seed=118445
+HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/ \
+LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true \
+--xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true \
+--xla_tpu_enable_async_collective_fusion_multiple_steps=true \
+--xla_tpu_overlap_compute_collective_tc=true \
+--xla_enable_async_all_reduce=true" \
+HF_HUB_ENABLE_HF_TRANSFER=1 \
+python src/maxdiffusion/generate_wan.py \
+src/maxdiffusion/configs/base_wan_14b.yml \
+attention="flash" \
+num_inference_steps=50 \
+num_frames=81 \
+width=1280 \
+height=720 \
+jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ \
+per_device_batch_size=.125 \
+ici_data_parallelism=2 \
+ici_context_parallelism=2 \
+flow_shift=5.0 \
+enable_profiler=True \
+run_name=wan-inference-testing-720p \
+output_dir=gs://jfacevedo-maxdiffusion \
+fps=16 \
+flash_min_seq_length=0 \
+flash_block_sizes='{"block_q" : 3024, "block_kv_compute" : 1024, "block_kv" : 2048, "block_q_dkv": 3024, "block_kv_dkv" : 2048, "block_kv_dkv_compute" : 2048, "block_q_dq" : 3024, "block_kv_dq" : 2048 }' \
+seed=118445
 ```

-### Img2Vid
+To run other Wan model inference pipelines, change the config file in the command above:

-```bash
-HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/
-LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_reduce=true" HF_HUB_ENABLE_HF_TRANSFER=1 python src/maxdiffusion/generate_wan.py src/maxdiffusion/configs/base_wan_i2v_27b.yml attention="flash" num_inference_steps=30 num_frames=81 width=832 height=480 jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ per_device_batch_size=.125 ici_data_parallelism=2 ici_fsdp_parallelism=2 flow_shift=3.0 enable_profiler=True run_name=wan-i2v-inference-testing-480p output_dir=gs:/jfacevedo-maxdiffusion fps=16 flash_min_seq_length=0 flash_block_sizes='{"block_q" : 3024, "block_kv_compute" : 1024, "block_kv" : 2048, "block_q_dkv": 3024, "block_kv_dkv" : 2048, "block_kv_dkv_compute" : 2048, "block_q_dq" : 3024, "block_kv_dq" : 2048 }' seed=118445
-```
+* For Wan2.1 I2V, use `base_wan_i2v_14b.yml`.
+* For Wan2.2 T2V, use `base_wan_27b.yml`.
+* For Wan2.2 I2V, use `base_wan_i2v_27b.yml`.

 ## Flux

@@ -568,6 +577,33 @@ To generate images, run the following command:
 ```bash
 NVTE_FUSED_ATTN=1 HF_HUB_ENABLE_HF_TRANSFER=1 python src/maxdiffusion/generate_flux.py src/maxdiffusion/configs/base_flux_dev.yml jax_cache_dir=/tmp/cache_dir run_name=flux_test output_dir=/tmp/ prompt='A cute corgi lives in a house made out of sushi, anime' num_inference_steps=28 split_head_dim=True per_device_batch_size=1 attention="cudnn_flash_te" hardware=gpu
 ```
+## Wan LoRA
+
+Disclaimer: not all LoRA formats have been tested. Currently supports ComfyUI and AI Toolkit formats. If there is a specific LoRA that doesn't load, please let us know.
+
+First create a copy of the relevant config file, e.g. `src/maxdiffusion/configs/base_wan_{*}.yml`. Update the prompt and LoRA details in the copy and make sure to set `enable_lora: True`. Then run the following command, pointing it at your copy instead of the base config:
+
+```bash
+HF_HUB_CACHE=/mnt/disks/external_disk/maxdiffusion_hf_cache/ \
+LIBTPU_INIT_ARGS="--xla_tpu_enable_async_collective_fusion=true \
+--xla_tpu_enable_async_collective_fusion_fuse_all_reduce=true \
+--xla_tpu_enable_async_collective_fusion_multiple_steps=true \
+--xla_tpu_overlap_compute_collective_tc=true \
+--xla_enable_async_all_reduce=true" \
+HF_HUB_ENABLE_HF_TRANSFER=1 \
+python src/maxdiffusion/generate_wan.py \
+src/maxdiffusion/configs/base_wan_i2v_14b.yml \
+jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ \
+per_device_batch_size=.125 \
+ici_data_parallelism=2 \
+ici_context_parallelism=2 \
+run_name=wan-lora-inference-testing-720p \
+output_dir=gs://jfacevedo-maxdiffusion \
+seed=118445 \
+enable_lora=True
+```
+
+Loading multiple LoRAs is supported as well.

 ## Flux LoRA

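The Wan Models section in the diff above selects the pipeline purely by config file. As a quick reference, here is a small helper sketch (hypothetical, not part of the MaxDiffusion repo) that encodes the variant-to-config mapping listed in the README:

```shell
#!/bin/sh
# Hypothetical helper -- not part of MaxDiffusion. Maps a Wan variant name
# to the config file listed in the README's Wan Models section.
wan_config() {
  case "$1" in
    wan2.1-t2v) echo "src/maxdiffusion/configs/base_wan_14b.yml" ;;
    wan2.1-i2v) echo "src/maxdiffusion/configs/base_wan_i2v_14b.yml" ;;
    wan2.2-t2v) echo "src/maxdiffusion/configs/base_wan_27b.yml" ;;
    wan2.2-i2v) echo "src/maxdiffusion/configs/base_wan_i2v_27b.yml" ;;
    *) echo "unknown Wan variant: $1" >&2; return 1 ;;
  esac
}

# Print the config you would pass to generate_wan.py:
wan_config wan2.2-i2v
```

The printed path would replace `src/maxdiffusion/configs/base_wan_14b.yml` in the generation command; all other flags stay the same.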
src/maxdiffusion/configs/base_wan_i2v_14b.yml

Lines changed: 2 additions & 2 deletions
@@ -276,8 +276,8 @@ profiler_steps: 10
 enable_jax_named_scopes: False

 # Generation parameters
-prompt: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose." #"An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
-prompt_2: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose." #"An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
+prompt: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." #LoRA prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose."
+prompt_2: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." #LoRA prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose."
 negative_prompt: "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
 do_classifier_free_guidance: True
 height: 720

src/maxdiffusion/configs/base_wan_i2v_27b.yml

Lines changed: 2 additions & 2 deletions
@@ -277,8 +277,8 @@ profiler_steps: 10
 enable_jax_named_scopes: False

 # Generation parameters
-prompt: "orbit 180 around an astronaut on the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
-prompt_2: "orbit 180 around an astronaut on the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
+prompt: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." #LoRA prompt "orbit 180 around an astronaut on the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
+prompt_2: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." #LoRA prompt "orbit 180 around an astronaut on the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
 negative_prompt: "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
 do_classifier_free_guidance: True
 height: 720

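The two config diffs above revert `prompt`/`prompt_2` to the default prompt, preserving the LoRA prompt only as a trailing comment. Combined with the README's Wan LoRA instructions, a user's copied config for LoRA inference might restore that commented prompt, as in this sketch. It is a YAML fragment, assuming a copy based on `base_wan_i2v_14b.yml`; only keys that appear in this commit are shown, and the config's actual LoRA weight fields (not visible in this diff) still need to be filled in separately.

```yaml
# Fragment of a user copy of base_wan_i2v_14b.yml, edited for LoRA inference.
# The LoRA weight/path fields are not shown in this commit and must be set
# according to your config file.
enable_lora: True
prompt: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose."
prompt_2: "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. Appearing behind him is a giant, translucent, pink spiritual manifestation (faxiang) that is synchronized with the man's action and pose."
```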