Commit 67536f9

Merge branch 'main' into custom-modular-tests
2 parents 3eb1f0e + 2246d2c commit 67536f9

146 files changed

Lines changed: 8874 additions & 3435 deletions

docs/source/en/_toctree.yml

Lines changed: 10 additions & 0 deletions
@@ -375,6 +375,8 @@
375375
title: MochiTransformer3DModel
376376
- local: api/models/omnigen_transformer
377377
title: OmniGenTransformer2DModel
378+
- local: api/models/ovisimage_transformer2d
379+
title: OvisImageTransformer2DModel
378380
- local: api/models/pixart_transformer2d
379381
title: PixArtTransformer2DModel
380382
- local: api/models/prior_transformer
@@ -399,6 +401,8 @@
399401
title: WanAnimateTransformer3DModel
400402
- local: api/models/wan_transformer_3d
401403
title: WanTransformer3DModel
404+
- local: api/models/z_image_transformer2d
405+
title: ZImageTransformer2DModel
402406
title: Transformers
403407
- sections:
404408
- local: api/models/stable_cascade_unet
@@ -549,6 +553,8 @@
549553
title: Kandinsky 2.2
550554
- local: api/pipelines/kandinsky3
551555
title: Kandinsky 3
556+
- local: api/pipelines/kandinsky5_image
557+
title: Kandinsky 5.0 Image
552558
- local: api/pipelines/kolors
553559
title: Kolors
554560
- local: api/pipelines/latent_consistency_models
@@ -567,6 +573,8 @@
567573
title: MultiDiffusion
568574
- local: api/pipelines/omnigen
569575
title: OmniGen
576+
- local: api/pipelines/ovis_image
577+
title: Ovis-Image
570578
- local: api/pipelines/pag
571579
title: PAG
572580
- local: api/pipelines/paint_by_example
@@ -642,6 +650,8 @@
642650
title: VisualCloze
643651
- local: api/pipelines/wuerstchen
644652
title: Wuerstchen
653+
- local: api/pipelines/z_image
654+
title: Z-Image
645655
title: Image
646656
- sections:
647657
- local: api/pipelines/allegro

docs/source/en/api/cache.md

Lines changed: 6 additions & 0 deletions
@@ -34,3 +34,9 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
3434
[[autodoc]] FirstBlockCacheConfig
3535

3636
[[autodoc]] apply_first_block_cache
37+
38+
### TaylorSeerCacheConfig
39+
40+
[[autodoc]] TaylorSeerCacheConfig
41+
42+
[[autodoc]] apply_taylorseer_cache
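The entries above only register `TaylorSeerCacheConfig` and `apply_taylorseer_cache` with autodoc, so the diff does not show how such a cache behaves. As a rough, library-independent illustration of storing an intermediate result and forecasting it on skipped steps, here is a toy sketch. The class `ForecastingCache`, its `interval` parameter, and the first-order finite-difference forecast are assumptions made for this example; they are not the diffusers implementation or its API.

```python
from typing import Callable, Optional


class ForecastingCache:
    """Toy cache (hypothetical): run `fn` every `interval` steps, extrapolate otherwise."""

    def __init__(self, fn: Callable[[float], float], interval: int = 3) -> None:
        self.fn = fn
        self.interval = interval
        self.last_step: Optional[int] = None  # step of the most recent full computation
        self.last_value: float = 0.0          # value produced at that step
        self.slope: float = 0.0               # finite-difference estimate of d(value)/d(step)

    def __call__(self, step: int, x: float) -> float:
        if self.last_step is None or step - self.last_step >= self.interval:
            # Full computation: refresh the cached value and the slope estimate.
            value = self.fn(x)
            if self.last_step is not None:
                self.slope = (value - self.last_value) / (step - self.last_step)
            self.last_step, self.last_value = step, value
            return value
        # Skipped step: first-order Taylor forecast from the cached value.
        return self.last_value + self.slope * (step - self.last_step)


if __name__ == "__main__":
    cache = ForecastingCache(lambda x: 2.0 * x, interval=3)
    print([cache(step, float(step)) for step in range(7)])
```

With `interval=3`, the wrapped function runs only on steps 0, 3, and 6; the remaining steps are served from the cached value plus the estimated slope. A real cache operates on tensors inside transformer blocks rather than on scalars.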
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# OvisImageTransformer2DModel
13+
14+
The model can be loaded with the following code snippet.
15+
16+
```python
import torch
from diffusers import OvisImageTransformer2DModel

transformer = OvisImageTransformer2DModel.from_pretrained(
    "AIDC-AI/Ovis-Image-7B", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
21+
22+
## OvisImageTransformer2DModel
23+
24+
[[autodoc]] OvisImageTransformer2DModel
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
-->
12+
13+
# ZImageTransformer2DModel
14+
15+
A Transformer model for image-like data from [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
16+
17+
## ZImageTransformer2DModel
18+
19+
[[autodoc]] ZImageTransformer2DModel

docs/source/en/api/pipelines/hunyuan_video15.md

Lines changed: 2 additions & 2 deletions
@@ -56,8 +56,8 @@ export_to_video(video, "output.mp4", fps=15)
5656

5757
- HunyuanVideo1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.
5858

59-
- **H100/H800:** `_flash_3_hub` or `_flash_varlen_3`
60-
- **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen`
59+
- **H100/H800:** `_flash_3_hub` or `_flash_3_varlen_hub`
60+
- **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen_hub`
6161
- **Other GPUs:** `sage_hub`
6262

6363
Refer to the [Attention backends](../../optimization/attention_backends) guide for more details about using a different backend.
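The recommendation list in this hunk can be expressed as a small lookup. The sketch below is illustrative only: the helper name `pick_attention_backend`, the substring matching on the GPU name, and the `varlen` flag for choosing between the two listed options are assumptions for this example; only the backend names themselves come from the updated documentation.

```python
def pick_attention_backend(gpu_name: str, varlen: bool = False) -> str:
    """Map a GPU name to one of the backend names listed in the docs (hypothetical helper)."""
    name = gpu_name.upper()
    if any(tag in name for tag in ("H100", "H800")):
        return "_flash_3_varlen_hub" if varlen else "_flash_3_hub"
    if any(tag in name for tag in ("A100", "A800", "RTX 4090")):
        return "flash_varlen_hub" if varlen else "flash_hub"
    # Fallback recommended for all other GPUs.
    return "sage_hub"


if __name__ == "__main__":
    for gpu in ("NVIDIA H100 80GB HBM3", "NVIDIA GeForce RTX 4090", "Tesla T4"):
        print(gpu, "->", pick_attention_backend(gpu))
```

The returned name would then be passed to whatever backend-selection call the Attention backends guide describes, for example `pipe.transformer.set_attention_backend(...)` if that method is available in your diffusers version.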
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
1+
<!--Copyright 2025 The HuggingFace Team and Kandinsky Lab Team. All rights reserved.
2+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
3+
the License. You may obtain a copy of the License at
4+
http://www.apache.org/licenses/LICENSE-2.0
5+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
6+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
7+
specific language governing permissions and limitations under the License.
8+
-->
9+
10+
# Kandinsky 5.0 Image
11+
12+
[Kandinsky 5.0](https://arxiv.org/abs/2511.14993) is a family of diffusion models for Video & Image generation.
13+
14+
Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters).
15+
16+
The model introduces several key innovations:
17+
- **Latent diffusion pipeline** with **Flow Matching** for improved training stability
18+
- **Diffusion Transformer (DiT)** as the main generative backbone with cross-attention to text embeddings
19+
- Dual text encoding using **Qwen2.5-VL** and **CLIP** for comprehensive text understanding
20+
- **Flux VAE** for efficient image encoding and decoding
21+
22+
The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.com/kandinskylab/Kandinsky-5).
23+
24+
> [!TIP]
25+
> Check out the [Kandinsky Lab](https://huggingface.co/kandinskylab) organization on the Hub for the official model checkpoints for text-to-video generation, including pretrained, SFT, no-CFG, and distilled variants.
26+
27+
28+
## Available Models
29+
30+
Kandinsky 5.0 Image Lite:
31+
32+
| model_id | Description | Use Cases |
|------------|-------------|-----------|
| [**kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers) | 6B image Supervised Fine-Tuned model | Highest generation quality |
| [**kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers) | 6B image editing Supervised Fine-Tuned model | Highest generation quality |
| [**kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers) | 6B image Base pretrained model | Research and fine-tuning |
| [**kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers) | 6B image editing Base pretrained model | Research and fine-tuning |
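
The four checkpoint names in the table follow one naming pattern, which can be captured in a small helper when switching between tasks and variants. This is a sketch for convenience: the function name `kandinsky5_image_model_id` and its arguments are hypothetical, and it only covers the four IDs listed in the table.

```python
TASKS = ("T2I", "I2I")          # text-to-image, image-to-image (editing)
VARIANTS = ("sft", "pretrain")  # supervised fine-tuned, base pretrained


def kandinsky5_image_model_id(task: str, variant: str = "sft") -> str:
    """Build one of the four Hub IDs from the table (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    return f"kandinskylab/Kandinsky-5.0-{task}-Lite-{variant}-Diffusers"


if __name__ == "__main__":
    print(kandinsky5_image_model_id("T2I"))
    print(kandinsky5_image_model_id("I2I", "pretrain"))
```

The resulting string can be passed as `model_id` to the matching pipeline class in the usage examples.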
38+
39+
## Usage Examples
40+
41+
### Basic Text-to-Image Generation
42+
43+
```python
import torch
from diffusers import Kandinsky5T2IPipeline

# Load the pipeline
model_id = "kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers"
pipe = Kandinsky5T2IPipeline.from_pretrained(model_id)
_ = pipe.to(device="cuda", dtype=torch.bfloat16)

# Generate image
prompt = "A fluffy, expressive cat wearing a bright red hat with a soft, slightly textured fabric. The hat should look cozy and well-fitted on the cat’s head. On the front of the hat, add clean, bold white text that reads “SWEET”, clearly visible and neatly centered. Ensure the overall lighting highlights the hat’s color and the cat’s fur details."

output = pipe(
    prompt=prompt,
    negative_prompt="",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).image[0]
```
64+
65+
### Basic Image-to-Image Generation
66+
67+
```python
import torch
from diffusers import Kandinsky5I2IPipeline
from diffusers.utils import load_image

# Load the pipeline
model_id = "kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers"
pipe = Kandinsky5I2IPipeline.from_pretrained(model_id)

_ = pipe.to(device="cuda", dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Enable CPU offloading for single-GPU inference

# Edit the input image
image = load_image(
    "https://huggingface.co/kandinsky-community/kandinsky-3/resolve/main/assets/title.jpg?download=true"
)

prompt = "Change the background from a winter night scene to a bright summer day. Place the character on a sandy beach with clear blue sky, soft sunlight, and gentle waves in the distance. Replace the winter clothing with a light short-sleeved T-shirt (in soft pastel colors) and casual shorts. Ensure the character’s fur reflects warm daylight instead of cold winter tones. Add small beach details such as seashells, footprints in the sand, and a few scattered beach toys nearby. Keep the oranges in the scene, but place them naturally on the sand."
negative_prompt = ""

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=3.5,
).image[0]
```
93+
94+
95+
## Kandinsky5T2IPipeline
96+
97+
[[autodoc]] Kandinsky5T2IPipeline
98+
- all
99+
- __call__
100+
101+
## Kandinsky5I2IPipeline
102+
103+
[[autodoc]] Kandinsky5I2IPipeline
104+
- all
105+
- __call__
106+
107+
108+
## Citation
109+
```bibtex
110+
@misc{kandinsky2025,
111+
author = {Alexander Belykh and Alexander Varlamov and Alexey Letunovskiy and Anastasia Aliaskina and Anastasia Maltseva and Anastasiia Kargapoltseva and Andrey Shutkin and Anna Averchenkova and Anna Dmitrienko and Bulat Akhmatov and Denis Dimitrov and Denis Koposov and Denis Parkhomenko and Dmitrii and Ilya Vasiliev and Ivan Kirillov and Julia Agafonova and Kirill Chernyshev and Kormilitsyn Semen and Lev Novitskiy and Maria Kovaleva and Mikhail Mamaev and Mikhailov and Nikita Kiselev and Nikita Osterov and Nikolai Gerasimenko and Nikolai Vaulin and Olga Kim and Olga Vdovchenko and Polina Gavrilova and Polina Mikhailova and Tatiana Nikulina and Viacheslav Vasilev and Vladimir Arkhipkin and Vladimir Korviakov and Vladimir Polovnikov and Yury Kolabushin},
112+
title = {Kandinsky 5.0: A family of diffusion models for Video & Image generation},
113+
howpublished = {\url{https://github.com/kandinskylab/Kandinsky-5}},
114+
year = 2025
115+
}
116+
```
