Commit dfbd485

Merge branch 'main' into enable-cp-kernels

2 parents 9bd8361 + 5e48f46

94 files changed: 4,893 additions and 232 deletions

Some content (including some file names) is hidden by default in this large commit.
docs/source/en/_toctree.yml

Lines changed: 5 additions & 1 deletion
```diff
@@ -365,6 +365,8 @@
       title: HunyuanVideoTransformer3DModel
     - local: api/models/latte_transformer3d
       title: LatteTransformer3DModel
+    - local: api/models/longcat_image_transformer2d
+      title: LongCatImageTransformer2DModel
     - local: api/models/ltx_video_transformer3d
       title: LTXVideoTransformer3DModel
     - local: api/models/lumina2_transformer2d
@@ -402,7 +404,7 @@
     - local: api/models/wan_transformer_3d
       title: WanTransformer3DModel
     - local: api/models/z_image_transformer2d
-      title: ZImageTransformer2DModel
+      title: ZImageTransformer2DModel
     title: Transformers
   - sections:
     - local: api/models/stable_cascade_unet
@@ -563,6 +565,8 @@
       title: Latent Diffusion
     - local: api/pipelines/ledits_pp
       title: LEDITS++
+    - local: api/pipelines/longcat_image
+      title: LongCat-Image
     - local: api/pipelines/lumina2
       title: Lumina 2.0
     - local: api/pipelines/lumina
```

docs/source/en/api/models/controlnet.md

Lines changed: 15 additions & 0 deletions
````diff
@@ -33,6 +33,21 @@ url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/m
 pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
 ```
 
+## Loading from Control LoRA
+
+Control-LoRA was introduced by Stability AI in [stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora), applying low-rank, parameter-efficient fine-tuning to ControlNet. This offers a more efficient and compact way to bring model control to a wider range of consumer GPUs.
+
+```py
+import torch
+from diffusers import ControlNetModel, UNet2DConditionModel
+
+lora_id = "stabilityai/control-lora"
+lora_filename = "control-LoRAs-rank128/control-lora-canny-rank128.safetensors"
+
+unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.bfloat16).to("cuda")
+controlnet = ControlNetModel.from_unet(unet).to(device="cuda", dtype=torch.bfloat16)
+controlnet.load_lora_adapter(lora_id, weight_name=lora_filename, prefix=None, controlnet_config=controlnet.config)
+```
+
 ## ControlNetModel
 
 [[autodoc]] ControlNetModel
````
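An aside on the Control-LoRA section above: the low-rank idea is easy to see outside of diffusers. A rank-128 adapter stores two thin matrices instead of a full weight delta, which is where the memory savings come from. A NumPy toy sketch (all names and the layer size are hypothetical, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 640, 640, 128  # rank 128, as in the checkpoint name above

W = rng.standard_normal((d_out, d_in)).astype(np.float32)          # frozen base weight
B = rng.standard_normal((d_out, rank)).astype(np.float32) * 0.01   # "up" projection
A = rng.standard_normal((rank, d_in)).astype(np.float32) * 0.01    # "down" projection

x = rng.standard_normal((d_in,)).astype(np.float32)

# Adapted forward pass: base output plus the low-rank correction B @ (A @ x),
# equivalent to using the merged weight W + B @ A.
y = W @ x + B @ (A @ x)

# The adapter stores rank * (d_out + d_in) parameters instead of d_out * d_in
# for a full weight delta.
full_params = d_out * d_in
lora_params = rank * (d_out + d_in)
print(lora_params / full_params)  # 0.4 of the full delta at rank 128 for a 640x640 layer
```

At lower ranks the ratio shrinks further, which is why rank-128 and rank-256 variants of the same control can ship as small files.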
Lines changed: 25 additions & 0 deletions
````diff
@@ -0,0 +1,25 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LongCatImageTransformer2DModel
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import LongCatImageTransformer2DModel
+
+transformer = LongCatImageTransformer2DModel.from_pretrained("meituan-longcat/LongCat-Image", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## LongCatImageTransformer2DModel
+
+[[autodoc]] LongCatImageTransformer2DModel
````
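As a quick sizing check for the load call above: LongCat-Image is described elsewhere in this commit as a 6B-parameter model, and bfloat16 stores 2 bytes per parameter, so the transformer weights alone occupy roughly:

```python
# Back-of-the-envelope weight memory for a 6B-parameter model in bfloat16.
# This counts weights only, not activations, the text encoder, or the VAE.
params = 6e9
bytes_per_param = 2  # bfloat16
weight_gib = params * bytes_per_param / 1024**3
print(round(weight_gib, 1))  # about 11.2 GiB for the weights alone
```

This is why `torch_dtype=torch.bfloat16` (rather than float32, which would double the figure) is the practical default for consumer GPUs.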
Lines changed: 114 additions & 0 deletions
````diff
@@ -0,0 +1,114 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LongCat-Image
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+LongCat-Image is a pioneering open-source, bilingual (Chinese-English) foundation model for image generation, designed to address core challenges prevalent in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.
+
+### Key Features
+- 🌟 **Exceptional Efficiency and Performance**: With only **6B parameters**, LongCat-Image surpasses numerous open-source models several times its size across multiple benchmarks, demonstrating the potential of efficient model design.
+- 🌟 **Superior Editing Performance**: The LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, delivering leading instruction following and image quality with superior visual consistency.
+- 🌟 **Powerful Chinese Text Rendering**: LongCat-Image renders common Chinese characters with greater accuracy and stability than existing SOTA open-source models and achieves industry-leading coverage of the Chinese dictionary.
+- 🌟 **Remarkable Photorealism**: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
+- 🌟 **Comprehensive Open-Source Ecosystem**: A complete toolchain is provided, from intermediate checkpoints to full training code, significantly lowering the barrier for further research and development.
+
+For more details, refer to the [***LongCat-Image Technical Report***](https://arxiv.org/abs/2412.11963).
+
+## Usage Example
+
+```py
+import torch
+from diffusers import LongCatImagePipeline
+
+pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
+pipe.to('cuda')
+# pipe.enable_model_cpu_offload()
+
+# Chinese prompt (the model is bilingual): a young Asian woman in a yellow knit
+# sweater with a white necklace, hands on her knees, against a rough brick wall
+# in warm afternoon light; a serene medium-distance portrait with soft lighting.
+prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'
+image = pipe(
+    prompt,
+    height=768,
+    width=1344,
+    guidance_scale=4.0,
+    num_inference_steps=50,
+    num_images_per_prompt=1,
+    generator=torch.Generator("cpu").manual_seed(43),
+    enable_cfg_renorm=True,
+    enable_prompt_rewrite=True,
+).images[0]
+image.save('./longcat_image_t2i_example.png')
+```
+
+This pipeline was contributed by the LongCat-Image team. The original codebase can be found [here](https://github.com/meituan-longcat/LongCat-Image).
+
+Available models:
+<div style="overflow-x: auto; margin-bottom: 16px;">
+  <table style="border-collapse: collapse; width: 100%;">
+    <thead>
+      <tr>
+        <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th>
+        <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Type</th>
+        <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th>
+        <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th>
+      </tr>
+    </thead>
+    <tbody>
+      <tr>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image</td>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">Final release. The standard model for out&#8209;of&#8209;the&#8209;box inference.</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">
+          <span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image">Hugging&nbsp;Face</a></span>
+        </td>
+      </tr>
+      <tr>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Dev</td>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">Development. Mid-training checkpoint, suitable for fine-tuning.</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">
+          <span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Dev">Hugging&nbsp;Face</a></span>
+        </td>
+      </tr>
+      <tr>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Edit</td>
+        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image Editing</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">Specialized model for image editing.</td>
+        <td style="padding: 8px; border: 1px solid #d0d7de;">
+          <span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Edit">Hugging&nbsp;Face</a></span>
+        </td>
+      </tr>
+    </tbody>
+  </table>
+</div>
+
+## LongCatImagePipeline
+
+[[autodoc]] LongCatImagePipeline
+  - all
+  - __call__
+
+## LongCatImagePipelineOutput
+
+[[autodoc]] pipelines.longcat_image.pipeline_output.LongCatImagePipelineOutput
````
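A note on two of the call arguments in the usage example above: `guidance_scale` applies classifier-free guidance, and `enable_cfg_renorm` suggests the guided prediction is additionally rescaled. One common renorm formulation, shown here as an assumption rather than LongCat-Image's exact behavior, rescales the guided output back to the conditional prediction's norm to tame over-saturation at high guidance scales:

```python
import numpy as np

def cfg(uncond, cond, scale):
    # Classifier-free guidance: extrapolate from the unconditional prediction
    # toward the conditional one by `scale`.
    return uncond + scale * (cond - uncond)

def cfg_renorm(uncond, cond, scale):
    # A common "CFG renorm" variant (assumed here, not necessarily what
    # enable_cfg_renorm does): rescale the guided prediction so its norm
    # matches the conditional prediction's norm.
    guided = cfg(uncond, cond, scale)
    return guided * (np.linalg.norm(cond) / np.linalg.norm(guided))

rng = np.random.default_rng(0)
uncond = rng.standard_normal(16).astype(np.float32)
cond = rng.standard_normal(16).astype(np.float32)

out = cfg_renorm(uncond, cond, scale=4.0)
# After renorm, the guided prediction keeps the conditional prediction's norm.
print(np.isclose(np.linalg.norm(out), np.linalg.norm(cond)))
```

At `scale=1.0` plain CFG reduces to the conditional prediction, so renorm is a no-op there; the rescaling only matters at the larger scales (like 4.0 above) where the extrapolated vector grows.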

docs/source/en/training/distributed_inference.md

Lines changed: 68 additions & 25 deletions
````diff
@@ -237,6 +237,8 @@ By selectively loading and unloading the models you need at a given stage and sh
 
 Use [`~ModelMixin.set_attention_backend`] to switch to a more optimized attention backend. Refer to this [table](../optimization/attention_backends#available-backends) for a complete list of available backends.
 
+Most attention backends are compatible with context parallelism. Open an [issue](https://github.com/huggingface/diffusers/issues/new) if a backend is not compatible.
+
 ### Ring Attention
 
 Key (K) and value (V) representations communicate between devices using [Ring Attention](https://huggingface.co/papers/2310.01889). This ensures each split sees every other token's K/V. Each GPU computes attention for its local K/V and passes it to the next GPU in the ring. No single GPU holds the full sequence, which reduces communication latency.
@@ -245,38 +247,58 @@ Pass a [`ContextParallelConfig`] to the `parallel_config` argument of the transf
 
 ```py
 import torch
-from diffusers import AutoModel, QwenImagePipeline, ContextParallelConfig
-
-try:
-    torch.distributed.init_process_group("nccl")
-    rank = torch.distributed.get_rank()
-    device = torch.device("cuda", rank % torch.cuda.device_count())
+from torch import distributed as dist
+from diffusers import DiffusionPipeline, ContextParallelConfig
+
+def setup_distributed():
+    if not dist.is_initialized():
+        dist.init_process_group(backend="nccl")
+    rank = dist.get_rank()
+    device = torch.device(f"cuda:{rank}")
     torch.cuda.set_device(device)
-
-    transformer = AutoModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer", torch_dtype=torch.bfloat16, parallel_config=ContextParallelConfig(ring_degree=2))
-    pipeline = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", transformer=transformer, torch_dtype=torch.bfloat16, device_map="cuda")
-    pipeline.transformer.set_attention_backend("flash")
+    return device
+
+def main():
+    device = setup_distributed()
+    world_size = dist.get_world_size()
+
+    pipeline = DiffusionPipeline.from_pretrained(
+        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, device_map=device
+    )
+    pipeline.transformer.set_attention_backend("_native_cudnn")
+
+    cp_config = ContextParallelConfig(ring_degree=world_size)
+    pipeline.transformer.enable_parallelism(config=cp_config)
 
     prompt = """
    cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
    highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
    """
 
     # Must specify generator so all ranks start with same latents (or pass your own)
     generator = torch.Generator().manual_seed(42)
-    image = pipeline(prompt, num_inference_steps=50, generator=generator).images[0]
-
-    if rank == 0:
-        image.save("output.png")
-
-except Exception as e:
-    print(f"An error occurred: {e}")
-    torch.distributed.breakpoint()
-    raise
-
-finally:
-    if torch.distributed.is_initialized():
-        torch.distributed.destroy_process_group()
+    image = pipeline(
+        prompt,
+        guidance_scale=3.5,
+        num_inference_steps=50,
+        generator=generator,
+    ).images[0]
+
+    if dist.get_rank() == 0:
+        image.save("output.png")
+
+    if dist.is_initialized():
+        dist.destroy_process_group()
+
+
+if __name__ == "__main__":
+    main()
+```
+
+Run the script above with a PyTorch-compatible distributed launcher, such as [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html). Set `--nproc-per-node` to the number of available GPUs.
+
+```shell
+torchrun --nproc-per-node 2 above_script.py
 ```
 
 ### Ulysses Attention
@@ -288,5 +310,26 @@ finally:
 Pass the [`ContextParallelConfig`] to [`~ModelMixin.enable_parallelism`].
 
 ```py
+# Set the degree according to the number of GPUs available.
 pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2))
-```
+```
+
+### parallel_config
+
+Pass `parallel_config` during model initialization to enable context parallelism.
+
+```py
+CKPT_ID = "black-forest-labs/FLUX.1-dev"
+
+cp_config = ContextParallelConfig(ring_degree=2)
+transformer = AutoModel.from_pretrained(
+    CKPT_ID,
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16,
+    parallel_config=cp_config
+)
+
+pipeline = DiffusionPipeline.from_pretrained(
+    CKPT_ID, transformer=transformer, torch_dtype=torch.bfloat16,
+).to(device)
+```
````
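The Ring Attention exchange described in this file can be checked on a single process: split the sequence into per-device blocks (one per unit of `ring_degree`) and rotate the K/V blocks around a ring while each rank accumulates an online softmax over the blocks it has seen. A NumPy sketch of the idea (illustrative only, not the diffusers kernel):

```python
import numpy as np

def attention(q, k, v):
    # Standard softmax attention, used as the reference.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q_blocks, k_blocks, v_blocks):
    # Each "device" i keeps its Q block; K/V blocks rotate around the ring.
    # A running max (m), normalizer (l), and accumulator implement the
    # online softmax, so no device ever materializes the full sequence.
    n = len(q_blocks)
    outputs = []
    for i in range(n):
        q = q_blocks[i]
        m = np.full((q.shape[0], 1), -np.inf)
        l = np.zeros((q.shape[0], 1))
        acc = np.zeros((q.shape[0], v_blocks[0].shape[1]))
        for step in range(n):
            j = (i + step) % n  # block arriving from the ring this step
            scores = q @ k_blocks[j].T / np.sqrt(q.shape[-1])
            m_new = np.maximum(m, scores.max(axis=-1, keepdims=True))
            correction = np.exp(m - m_new)  # rescale previous partial sums
            p = np.exp(scores - m_new)
            l = l * correction + p.sum(axis=-1, keepdims=True)
            acc = acc * correction + p @ v_blocks[j]
            m = m_new
        outputs.append(acc / l)
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
seq, dim, world = 8, 4, 2  # world plays the role of ring_degree
q, k, v = (rng.standard_normal((seq, dim)) for _ in range(3))
split = lambda x: np.split(x, world, axis=0)

out_ring = ring_attention(split(q), split(k), split(v))
out_full = attention(q, k, v)
print(np.allclose(out_ring, out_full))  # the ring result matches full attention
```

This is why the result is exact rather than approximate: the rotation plus online-softmax bookkeeping reproduces full attention while each rank only ever holds one K/V block at a time.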

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -94,7 +94,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.36.0.dev0")
+check_min_version("0.37.0.dev0")
 
 logger = get_logger(__name__)
```
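The gate being bumped above compares PEP 440 version strings, where a `.dev0` pre-release sorts *before* the final release it precedes. A toy sketch of that ordering (hypothetical helpers, not diffusers' actual `check_min_version` implementation):

```python
def parse_version(version):
    # Toy PEP 440-ish parser covering only "X.Y.Z" and "X.Y.Z.devN":
    # a dev pre-release sorts before its final release.
    parts = version.split(".")
    dev = None
    if parts[-1].startswith("dev"):
        dev = int(parts[-1][len("dev"):])
        parts = parts[:-1]
    release = tuple(int(p) for p in parts)
    # Middle element: 0 for final releases, -1 for dev pre-releases,
    # so tuple comparison orders dev builds just below the final.
    return (release, 0 if dev is None else -1, dev if dev is not None else 0)

def check_min_version_sketch(installed, required):
    # Hypothetical equivalent of the check: raise if installed < required.
    if parse_version(installed) < parse_version(required):
        raise ImportError(f"diffusers>={required} is required, found {installed}")

# A source install of 0.37.0.dev0 satisfies a 0.37.0.dev0 requirement,
# while a released 0.36.0 would not.
check_min_version_sketch("0.37.0.dev0", "0.37.0.dev0")
print(parse_version("0.36.0") < parse_version("0.37.0.dev0") < parse_version("0.37.0"))  # True
```

This ordering is why the example scripts bump the string in lockstep with the library: after the version bump, only a source install at least as new as `0.37.0.dev0` passes the gate.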

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -88,7 +88,7 @@
 
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.36.0.dev0")
+check_min_version("0.37.0.dev0")
 
 logger = get_logger(__name__)
```

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -95,7 +95,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.36.0.dev0")
+check_min_version("0.37.0.dev0")
 
 logger = get_logger(__name__)
```

examples/cogvideo/train_cogvideox_image_to_video_lora.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -61,7 +61,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.36.0.dev0")
+check_min_version("0.37.0.dev0")
 
 logger = get_logger(__name__)
```

examples/cogvideo/train_cogvideox_lora.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -52,7 +52,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.36.0.dev0")
+check_min_version("0.37.0.dev0")
 
 logger = get_logger(__name__)
```
