Commit b51a749

Merge branch 'main' into fix-controlnet-tests
2 parents 300223f + 037efda commit b51a749

38 files changed

Lines changed: 9053 additions & 173 deletions

.ai/skills/model-integration/SKILL.md

Lines changed: 30 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -73,10 +73,37 @@ See [../../models.md](../../models.md) for the attention pattern, implementation
7373

7474
**Don't combine structural changes with behavioral changes.** Restructuring code to fit diffusers APIs (ModelMixin, ConfigMixin, etc.) is unavoidable. But don't also "improve" the algorithm, refactor computation order, or rename internal variables for aesthetics. Keep numerical logic as close to the reference as possible, even if it looks unclean. For standard → modular, this is stricter: copy loop logic verbatim and only restructure into blocks. Clean up in a separate commit after parity is confirmed.
7575

76-
### Test setup
76+
### Testing
7777

78-
- Slow tests gated with `@slow` and `RUN_SLOW=1`
79-
- All model-level tests must use the `BaseModelTesterConfig`, `ModelTesterMixin`, `MemoryTesterMixin`, `AttentionTesterMixin`, `LoraTesterMixin`, and `TrainingTesterMixin` classes initially to write the tests. Any additional tests should be added after discussions with the maintainers. Use `tests/models/transformers/test_models_transformer_flux.py` as a reference.
78+
Two test layers must be added for any new pipeline: pipeline-level tests, and (if a new model is introduced) model-level tests. Integration/slow tests and LoRA tests are **not** added in the initial PR — they come later, after discussion with maintainers.
79+
80+
**General rules (apply to both layers):**
81+
- Keep component sizes tiny so the suite runs fast — small `num_layers`, small hidden/attention dims, low resolution, few frames. Reference `tests/pipelines/wan/test_wan.py` (`get_dummy_components` and `get_dummy_inputs`) for the size scale to target.
82+
- No LoRA tests in the initial PR (no `LoraTesterMixin`, no `tests/lora/test_lora_layers_<model>.py`).
83+
- No integration / slow tests in the initial PR — don't add anything gated on `@slow` / `RUN_SLOW=1` yet.
84+
85+
#### Pipeline-level tests
86+
87+
- Location: `tests/pipelines/<model>/test_<model>.py` (one file per pipeline variant, e.g. T2V, I2V).
88+
- Subclass both `PipelineTesterMixin` (from `..test_pipelines_common`) and `unittest.TestCase`.
89+
- Set `pipeline_class`, `params`, `batch_params`, `image_params` from `..pipeline_params`, and any `required_optional_params` / capability flags (`test_xformers_attention`, `supports_dduf`, etc.) that apply.
90+
- Implement `get_dummy_components()` (build all sub-modules with tiny configs and a fixed `torch.manual_seed(0)` before each) and `get_dummy_inputs(device, seed=0)`.
91+
- Skip any inherited tests that don't apply with `@unittest.skip("Test not supported")` rather than deleting them.
92+
- Reference: `tests/pipelines/wan/test_wan.py`.
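The skip-rather-than-delete rule above can be illustrated with a minimal, self-contained sketch. `StandInTesterMixin`, `TinyPipelineTests`, and both test names are hypothetical stand-ins invented for this example; they are not the real `PipelineTesterMixin` or its tests. Only the `unittest` calls are standard library behavior.

```python
import unittest


class StandInTesterMixin:
    """Hypothetical stand-in for a shared tester mixin (not the real PipelineTesterMixin)."""

    def test_inference_runs(self):
        # A shared check that every subclass inherits.
        self.assertEqual(self.run_dummy(), "ok")

    def test_optional_feature(self):
        # A shared check that does not apply to every pipeline.
        self.assertTrue(self.supports_optional_feature)


class TinyPipelineTests(StandInTesterMixin, unittest.TestCase):
    supports_optional_feature = False

    def run_dummy(self):
        return "ok"

    @unittest.skip("Test not supported")
    def test_optional_feature(self):
        # Overridden and skipped, so the test is still reported as skipped.
        pass


def run_suite():
    """Run the test case in-process and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TinyPipelineTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Overriding and skipping keeps the inherited test visible in reports, which makes it easy to see later which shared checks a pipeline opted out of.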
93+
94+
#### Model-level tests
95+
96+
Only required if the pipeline introduces a new model class (transformer, VAE, etc.). Don't write these by hand — generate them (example command below):
97+
98+
```bash
99+
python utils/generate_model_tests.py src/diffusers/models/transformers/transformer_<model>.py
100+
```
101+
102+
- Run with **no `--include` flags** initially. The generator auto-detects mixins/attributes and emits the always-on testers (`ModelTesterMixin`, `MemoryTesterMixin`, `TorchCompileTesterMixin`, plus `AttentionTesterMixin` / `ContextParallelTesterMixin` / `TrainingTesterMixin` as applicable). Optional testers (quantization, caching, single-file, IP adapter, etc.) are added later, after maintainer discussion.
103+
- The generator writes to `tests/models/transformers/test_models_transformer_<model>.py` (or the matching `unets/` / `autoencoders/` subdir).
104+
- Fill in the `TODO`s in the generated `<Model>TesterConfig`: `pretrained_model_name_or_path`, `get_init_dict()` (tiny config), `get_dummy_inputs()`, `input_shape`, `output_shape`. Keep init dims small for speed.
105+
- Do **not** add `LoraTesterMixin` at the start, even if the model subclasses `PeftAdapterMixin` — strip it from the generated file for the initial PR.
106+
- Reference: `tests/models/transformers/test_models_transformer_flux.py`.
80107

81108
---
82109

.ai/skills/parity-testing/SKILL.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ description: >
77
visual artifacts — as these are usually parity bugs.
88
---
99

10+
> **Note**: Parity testing is **separate from** the unit-level tests that ship in `tests/`. If you are integrating a new model, the model-level test suite under `tests/models/` is still required — follow the **"#### Model-level tests"** section in [`../model-integration/SKILL.md`](../model-integration/SKILL.md) (generate via `utils/generate_model_tests.py`, no `--include` flags initially, no `LoraTesterMixin`). Parity tests verify numerical correctness during development; the generated test suite is what CI runs.
11+
1012
## Setup — gather before starting
1113

1214
Before writing any test code, gather:

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -388,6 +388,8 @@
388388
title: LuminaNextDiT2DModel
389389
- local: api/models/mochi_transformer3d
390390
title: MochiTransformer3DModel
391+
- local: api/models/motif_video_transformer_3d
392+
title: MotifVideoTransformer3DModel
391393
- local: api/models/omnigen_transformer
392394
title: OmniGenTransformer2DModel
393395
- local: api/models/ovisimage_transformer2d
@@ -684,6 +686,8 @@
684686
title: LTXVideo
685687
- local: api/pipelines/mochi
686688
title: Mochi
689+
- local: api/pipelines/motif_video
690+
title: Motif-Video
687691
- local: api/pipelines/skyreels_v2
688692
title: SkyReels-V2
689693
- local: api/pipelines/stable_diffusion/svd
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
<!-- Copyright 2026 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# MotifVideoTransformer3DModel
13+
14+
A Diffusion Transformer model for 3D video-like data was introduced in Motif-Video by the Motif Technologies Team.
15+
16+
The model uses a three-stage architecture with 12 dual-stream + 16 single-stream + 8 DDT decoder layers and rotary positional embeddings (RoPE) for video generation.
17+
18+
The model can be loaded with the following code snippet.
19+
20+
```python
21+
import torch
from diffusers import MotifVideoTransformer3DModel
22+
23+
transformer = MotifVideoTransformer3DModel.from_pretrained("Motif-Technologies/Motif-Video-2B", subfolder="transformer", torch_dtype=torch.bfloat16)
24+
```
25+
26+
## MotifVideoTransformer3DModel
27+
28+
[[autodoc]] MotifVideoTransformer3DModel
29+
30+
## Transformer2DModelOutput
31+
32+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
<!-- Copyright 2026 The HuggingFace Team. All rights reserved. -->
2+
3+
# Motif-Video
4+
5+
[Technical Report](https://arxiv.org/abs/2604.16503)
6+
7+
Motif-Video is a 2B parameter diffusion transformer designed for text-to-video and image-to-video generation. It features a three-stage architecture with 12 dual-stream + 16 single-stream + 8 DDT decoder layers, Shared Cross-Attention for stable text-video alignment under long video sequences, T5Gemma2 text encoder, and rectified flow matching for velocity prediction.
8+
9+
<p align="center">
10+
<img src="https://huggingface.co/Motif-Technologies/Motif-Video-2B/resolve/main/assets/architecture.png" width="90%" alt="Motif-Video architecture"/>
11+
</p>
12+
13+
## Text-to-Video Generation
14+
15+
Use `MotifVideoPipeline` for text-to-video generation:
16+
17+
```python
18+
import torch
19+
from diffusers import MotifVideoPipeline
20+
from diffusers.utils import export_to_video
21+
22+
23+
pipe = MotifVideoPipeline.from_pretrained(
24+
"Motif-Technologies/Motif-Video-2B",
25+
torch_dtype=torch.bfloat16,
26+
)
27+
pipe.to("cuda")
28+
29+
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair."
30+
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
31+
32+
video = pipe(
33+
prompt=prompt,
34+
negative_prompt=negative_prompt,
35+
width=1280,
36+
height=736,
37+
num_frames=121,
38+
num_inference_steps=50,
39+
).frames[0]
40+
export_to_video(video, "output.mp4", fps=24)
41+
```
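As a quick sanity check on the settings above, the clip length follows directly from the frame count and the export frame rate. The helper name `clip_duration_seconds` is illustrative only and is not part of diffusers.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Duration of a clip when `num_frames` frames are played back at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Values taken from the text-to-video example: 121 frames exported at 24 fps.
duration = clip_duration_seconds(num_frames=121, fps=24)
print(f"{duration:.2f} s")  # about 5.04 s
```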
42+
43+
## Image-to-Video Generation
44+
45+
Use `MotifVideoImage2VideoPipeline` for image-to-video generation:
46+
47+
```python
48+
import torch
49+
from diffusers import MotifVideoImage2VideoPipeline
50+
from diffusers.utils import export_to_video, load_image
51+
52+
53+
pipe = MotifVideoImage2VideoPipeline.from_pretrained(
54+
"Motif-Technologies/Motif-Video-2B",
55+
torch_dtype=torch.bfloat16,
56+
)
57+
pipe.to("cuda")
58+
59+
image = load_image("input_image.png")
60+
prompt = "A cinematic scene with vivid colors."
61+
negative_prompt = "worst quality, blurry, jittery, distorted"
62+
63+
video = pipe(
64+
image=image,
65+
prompt=prompt,
66+
negative_prompt=negative_prompt,
67+
width=1280,
68+
height=736,
69+
num_frames=121,
70+
num_inference_steps=50,
71+
).frames[0]
72+
export_to_video(video, "i2v_output.mp4", fps=24)
73+
```
74+
75+
### Memory-efficient Inference
76+
77+
For GPUs with less than 30GB VRAM (e.g., RTX 4090), use model CPU offloading:
78+
79+
```bash
80+
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
81+
```
82+
83+
```python
84+
import torch
85+
from diffusers import MotifVideoPipeline
86+
from diffusers.utils import export_to_video
87+
88+
89+
pipe = MotifVideoPipeline.from_pretrained(
90+
"Motif-Technologies/Motif-Video-2B",
91+
torch_dtype=torch.bfloat16,
92+
)
93+
pipe.enable_model_cpu_offload()
94+
95+
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair."
96+
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
97+
98+
video = pipe(
99+
prompt=prompt,
100+
negative_prompt=negative_prompt,
101+
width=1280,
102+
height=736,
103+
num_frames=121,
104+
num_inference_steps=50,
105+
).frames[0]
106+
export_to_video(video, "output.mp4", fps=24)
107+
```
108+
109+
## MotifVideoPipeline
110+
111+
[[autodoc]] MotifVideoPipeline
112+
- all
113+
- __call__
114+
115+
## MotifVideoImage2VideoPipeline
116+
117+
[[autodoc]] MotifVideoImage2VideoPipeline
118+
- all
119+
- __call__
120+
121+
## MotifVideoPipelineOutput
122+
123+
[[autodoc]] pipelines.motif_video.pipeline_output.MotifVideoPipelineOutput

docs/source/en/api/pipelines/overview.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
5757
| [LLaDA2](llada2) | text2text |
5858
| [Lumina-T2X](lumina) | text2image |
5959
| [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
60+
| [Motif-Video](motif_video) | text2video, image2video |
6061
| [PAG](pag) | text2image |
6162
| [PixArt-α](pixart) | text2image |
6263
| [PixArt-Σ](pixart_sigma) | text2image |

docs/source/en/conceptual/contribution.md

Lines changed: 24 additions & 6 deletions
@@ -570,11 +570,29 @@ For documentation strings, 🧨 Diffusers follows the [Google style](https://goo
570570

571571
## Coding with AI agents
572572

573-
The repository keeps AI-agent configuration in `.ai/` and exposes local agent files via symlinks.
574-
575-
- **Source of truth** — edit files under `.ai/` (`AGENTS.md` for coding guidelines, `skills/` for on-demand task knowledge)
576-
- **Don't edit** generated root-level `AGENTS.md`, `CLAUDE.md`, or `.agents/skills`/`.claude/skills` — they are symlinks
577-
- Setup commands:
573+
The repository keeps AI-agent configuration in [`.ai/`](https://github.com/huggingface/diffusers/tree/main/.ai) and exposes local agent files via symlinks. If you use a coding agent (Claude Code, OpenAI Codex, etc.) to help with a contribution, point it at this directory — it contains the project conventions and on-demand task knowledge maintainers expect contributors to follow.
574+
575+
- **Read-only for contributors**: `.ai/` is maintained by the core maintainers. Please do not edit files under `.ai/` (or the generated root-level `AGENTS.md`, `CLAUDE.md`, `.agents/skills`, `.claude/skills`, which are symlinks) in your PR. If you find something missing or wrong, open an issue or flag it on the PR and a maintainer will update it.
576+
- **Guidelines** (loaded into every agent session):
577+
- [`.ai/AGENTS.md`](https://github.com/huggingface/diffusers/blob/main/.ai/AGENTS.md) — top-level coding guidelines
578+
- [`.ai/models.md`](https://github.com/huggingface/diffusers/blob/main/.ai/models.md) — attention pattern, model implementation rules, common conventions
579+
- [`.ai/pipelines.md`](https://github.com/huggingface/diffusers/blob/main/.ai/pipelines.md) — pipeline conventions
580+
- [`.ai/modular.md`](https://github.com/huggingface/diffusers/blob/main/.ai/modular.md) — modular pipeline conventions and conversion checklist
581+
- [`.ai/review-rules.md`](https://github.com/huggingface/diffusers/blob/main/.ai/review-rules.md) — what reviewers look for
582+
- **Skills** (under [`.ai/skills/`](https://github.com/huggingface/diffusers/tree/main/.ai/skills), loaded on demand for specific tasks):
583+
- `model-integration` — adding a new model or pipeline to diffusers end-to-end (file structure, integration checklist, testing layout, weight conversion)
584+
- `parity-testing` — verifying numerical parity between the diffusers implementation and a reference implementation
585+
- **Setup commands**:
578586
- `make codex` — symlink guidelines + skills for OpenAI Codex
579587
- `make claude` — symlink guidelines + skills for Claude Code
580-
- `make clean-ai` — remove all generated symlinks
588+
- `make clean-ai` — remove all generated symlinks
589+
590+
### AI-assisted and agentic contributions
591+
592+
AI-assisted contributions are welcome, but they must be coordinated, scoped, and verified to keep review load manageable. PRs that do not follow these guidelines may be closed without detailed review.
593+
594+
- **Coordinate before opening a PR.** Find or open an issue, review similar PRs (open and recently closed), and wait for an explicit acknowledgment from a maintainer on that issue before opening a PR. This gives us a chance to discuss scope, avoid duplicate work, and confirm the approach.
595+
- **Fix patterns, not one-offs.** If you spot a recurring issue, search the codebase for similar instances and open a *single* issue with a clear, systematic scope (e.g. "fix mutable defaults across all schedulers") rather than many issues or PRs for individual instances.
596+
- **Include in the PR description:**
597+
- A **coordination link** to the issue or discussion where a maintainer acknowledged the work.
598+
- The **test commands you ran** and their results (paste relevant output, not just "tests pass").

src/diffusers/__init__.py

Lines changed: 12 additions & 0 deletions
@@ -266,6 +266,7 @@
266266
"LuminaNextDiT2DModel",
267267
"MochiTransformer3DModel",
268268
"ModelMixin",
269+
"MotifVideoTransformer3DModel",
269270
"MotionAdapter",
270271
"MultiAdapter",
271272
"MultiControlNetModel",
@@ -621,7 +622,9 @@
621622
"LongCatImageEditPipeline",
622623
"LongCatImagePipeline",
623624
"LTX2ConditionPipeline",
625+
"LTX2HDRPipeline",
624626
"LTX2ImageToVideoPipeline",
627+
"LTX2InContextPipeline",
625628
"LTX2LatentUpsamplePipeline",
626629
"LTX2Pipeline",
627630
"LTXConditionPipeline",
@@ -638,6 +641,9 @@
638641
"MarigoldIntrinsicsPipeline",
639642
"MarigoldNormalsPipeline",
640643
"MochiPipeline",
644+
"MotifVideoImage2VideoPipeline",
645+
"MotifVideoPipeline",
646+
"MotifVideoPipelineOutput",
641647
"MusicLDMPipeline",
642648
"NucleusMoEImagePipeline",
643649
"OmniGenPipeline",
@@ -1088,6 +1094,7 @@
10881094
LuminaNextDiT2DModel,
10891095
MochiTransformer3DModel,
10901096
ModelMixin,
1097+
MotifVideoTransformer3DModel,
10911098
MotionAdapter,
10921099
MultiAdapter,
10931100
MultiControlNetModel,
@@ -1418,7 +1425,9 @@
14181425
LongCatImageEditPipeline,
14191426
LongCatImagePipeline,
14201427
LTX2ConditionPipeline,
1428+
LTX2HDRPipeline,
14211429
LTX2ImageToVideoPipeline,
1430+
LTX2InContextPipeline,
14221431
LTX2LatentUpsamplePipeline,
14231432
LTX2Pipeline,
14241433
LTXConditionPipeline,
@@ -1435,6 +1444,9 @@
14351444
MarigoldIntrinsicsPipeline,
14361445
MarigoldNormalsPipeline,
14371446
MochiPipeline,
1447+
MotifVideoImage2VideoPipeline,
1448+
MotifVideoPipeline,
1449+
MotifVideoPipelineOutput,
14381450
MusicLDMPipeline,
14391451
NucleusMoEImagePipeline,
14401452
OmniGenPipeline,

src/diffusers/hooks/_helpers.py

Lines changed: 20 additions & 0 deletions
@@ -188,6 +188,10 @@ def _register_transformer_blocks_metadata():
188188
from ..models.transformers.transformer_kandinsky import Kandinsky5TransformerDecoderBlock
189189
from ..models.transformers.transformer_ltx import LTXVideoTransformerBlock
190190
from ..models.transformers.transformer_mochi import MochiTransformerBlock
191+
from ..models.transformers.transformer_motif_video import (
192+
MotifVideoSingleTransformerBlock,
193+
MotifVideoTransformerBlock,
194+
)
191195
from ..models.transformers.transformer_qwenimage import QwenImageTransformerBlock
192196
from ..models.transformers.transformer_wan import WanTransformerBlock
193197
from ..models.transformers.transformer_z_image import ZImageTransformerBlock
@@ -290,6 +294,22 @@ def _register_transformer_blocks_metadata():
290294
),
291295
)
292296

297+
# MotifVideo
298+
TransformerBlockRegistry.register(
299+
model_class=MotifVideoTransformerBlock,
300+
metadata=TransformerBlockMetadata(
301+
return_hidden_states_index=0,
302+
return_encoder_hidden_states_index=1,
303+
),
304+
)
305+
TransformerBlockRegistry.register(
306+
model_class=MotifVideoSingleTransformerBlock,
307+
metadata=TransformerBlockMetadata(
308+
return_hidden_states_index=0,
309+
return_encoder_hidden_states_index=1,
310+
),
311+
)
312+
293313
# Wan
294314
TransformerBlockRegistry.register(
295315
model_class=WanTransformerBlock,
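To illustrate what the two index fields registered above are for: code that wraps a transformer block needs to know which position in the block's return tuple holds the hidden states and which holds the encoder hidden states. The sketch below is a simplified, hypothetical registry. `BlockMetadata`, `BlockRegistry`, `DualStreamBlock`, and `split_outputs` are invented for illustration, and this is not the diffusers `TransformerBlockRegistry` implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Type


@dataclass(frozen=True)
class BlockMetadata:
    # Position of each value in the tuple returned by the block's forward().
    return_hidden_states_index: int
    return_encoder_hidden_states_index: Optional[int] = None


class BlockRegistry:
    """Hypothetical class-to-metadata lookup table."""

    _registry: Dict[Type, BlockMetadata] = {}

    @classmethod
    def register(cls, model_class: Type, metadata: BlockMetadata) -> None:
        cls._registry[model_class] = metadata

    @classmethod
    def get(cls, model_class: Type) -> BlockMetadata:
        return cls._registry[model_class]


class DualStreamBlock:
    """Hypothetical block returning (hidden_states, encoder_hidden_states)."""

    def forward(self, hidden_states, encoder_hidden_states):
        return hidden_states + 1, encoder_hidden_states + 10


BlockRegistry.register(
    model_class=DualStreamBlock,
    metadata=BlockMetadata(return_hidden_states_index=0, return_encoder_hidden_states_index=1),
)


def split_outputs(block, outputs: Tuple):
    """Pick the two streams out of a block's return tuple using the registered indices."""
    meta = BlockRegistry.get(type(block))
    hidden = outputs[meta.return_hidden_states_index]
    encoder = None
    if meta.return_encoder_hidden_states_index is not None:
        encoder = outputs[meta.return_encoder_hidden_states_index]
    return hidden, encoder


block = DualStreamBlock()
hidden, encoder = split_outputs(block, block.forward(1, 2))
```

Keeping the indices in a registry rather than hard-coding them lets generic wrappers handle blocks whose return order differs, without touching the block classes themselves.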
