Skip to content

Commit 6899fe3

Browse files
authored
Merge branch 'main' into custom-modular-tests
2 parents 67536f9 + 5851928 commit 6899fe3

95 files changed

Lines changed: 4619 additions & 323 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

docs/source/en/api/models/controlnet.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,21 @@ url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/m
3333
pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
3434
```
3535

36+
## Loading from Control LoRA
37+
38+
Control-LoRA is introduced by Stability AI in [stabilityai/control-lora](https://huggingface.co/stabilityai/control-lora) by adding low-rank parameter efficient fine tuning to ControlNet. This approach offers a more efficient and compact method to bring model control to a wider variety of consumer GPUs.
39+
40+
```py
41+
from diffusers import ControlNetModel, UNet2DConditionModel
42+
43+
lora_id = "stabilityai/control-lora"
44+
lora_filename = "control-LoRAs-rank128/control-lora-canny-rank128.safetensors"
45+
46+
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.bfloat16).to("cuda")
47+
controlnet = ControlNetModel.from_unet(unet).to(device="cuda", dtype=torch.bfloat16)
48+
controlnet.load_lora_adapter(lora_id, weight_name=lora_filename, prefix=None, controlnet_config=controlnet.config)
49+
```
50+
3651
## ControlNetModel
3752

3853
[[autodoc]] ControlNetModel

docs/source/en/training/distributed_inference.md

Lines changed: 68 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -237,6 +237,8 @@ By selectively loading and unloading the models you need at a given stage and sh
237237

238238
Use [`~ModelMixin.set_attention_backend`] to switch to a more optimized attention backend. Refer to this [table](../optimization/attention_backends#available-backends) for a complete list of available backends.
239239

240+
Most attention backends are compatible with context parallelism. Open an [issue](https://github.com/huggingface/diffusers/issues/new) if a backend is not compatible.
241+
240242
### Ring Attention
241243

242244
Key (K) and value (V) representations communicate between devices using [Ring Attention](https://huggingface.co/papers/2310.01889). This ensures each split sees every other token's K/V. Each GPU computes attention for its local K/V and passes it to the next GPU in the ring. No single GPU holds the full sequence, which reduces communication latency.
@@ -245,38 +247,58 @@ Pass a [`ContextParallelConfig`] to the `parallel_config` argument of the transf
245247

246248
```py
247249
import torch
248-
from diffusers import AutoModel, QwenImagePipeline, ContextParallelConfig
249-
250-
try:
251-
torch.distributed.init_process_group("nccl")
252-
rank = torch.distributed.get_rank()
253-
device = torch.device("cuda", rank % torch.cuda.device_count())
250+
from torch import distributed as dist
251+
from diffusers import DiffusionPipeline, ContextParallelConfig
252+
253+
def setup_distributed():
254+
if not dist.is_initialized():
255+
dist.init_process_group(backend="nccl")
256+
rank = dist.get_rank()
257+
device = torch.device(f"cuda:{rank}")
254258
torch.cuda.set_device(device)
255-
256-
transformer = AutoModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer", torch_dtype=torch.bfloat16, parallel_config=ContextParallelConfig(ring_degree=2))
257-
pipeline = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", transformer=transformer, torch_dtype=torch.bfloat16, device_map="cuda")
258-
pipeline.transformer.set_attention_backend("flash")
259+
return device
260+
261+
def main():
262+
device = setup_distributed()
263+
world_size = dist.get_world_size()
264+
265+
pipeline = DiffusionPipeline.from_pretrained(
266+
"black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, device_map=device
267+
)
268+
pipeline.transformer.set_attention_backend("_native_cudnn")
269+
270+
cp_config = ContextParallelConfig(ring_degree=world_size)
271+
pipeline.transformer.enable_parallelism(config=cp_config)
259272

260273
prompt = """
261274
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
262275
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
263276
"""
264-
277+
265278
# Must specify generator so all ranks start with same latents (or pass your own)
266279
generator = torch.Generator().manual_seed(42)
267-
image = pipeline(prompt, num_inference_steps=50, generator=generator).images[0]
268-
269-
if rank == 0:
270-
image.save("output.png")
271-
272-
except Exception as e:
273-
print(f"An error occurred: {e}")
274-
torch.distributed.breakpoint()
275-
raise
276-
277-
finally:
278-
if torch.distributed.is_initialized():
279-
torch.distributed.destroy_process_group()
280+
image = pipeline(
281+
prompt,
282+
guidance_scale=3.5,
283+
num_inference_steps=50,
284+
generator=generator,
285+
).images[0]
286+
287+
if dist.get_rank() == 0:
288+
image.save(f"output.png")
289+
290+
if dist.is_initialized():
291+
dist.destroy_process_group()
292+
293+
294+
if __name__ == "__main__":
295+
main()
296+
```
297+
298+
The script above needs to be run with a distributed launcher, such as [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html), that is compatible with PyTorch. `--nproc-per-node` is set to the number of GPUs available.
299+
300+
```shell
301+
torchrun --nproc-per-node 2 above_script.py
280302
```
281303

282304
### Ulysses Attention
@@ -288,5 +310,26 @@ finally:
288310
Pass the [`ContextParallelConfig`] to [`~ModelMixin.enable_parallelism`].
289311

290312
```py
313+
# Depending on the number of GPUs available.
291314
pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2))
292-
```
315+
```
316+
317+
### parallel_config
318+
319+
Pass `parallel_config` during model initialization to enable context parallelism.
320+
321+
```py
322+
CKPT_ID = "black-forest-labs/FLUX.1-dev"
323+
324+
cp_config = ContextParallelConfig(ring_degree=2)
325+
transformer = AutoModel.from_pretrained(
326+
CKPT_ID,
327+
subfolder="transformer",
328+
torch_dtype=torch.bfloat16,
329+
parallel_config=cp_config
330+
)
331+
332+
pipeline = DiffusionPipeline.from_pretrained(
333+
CKPT_ID, transformer=transformer, torch_dtype=torch.bfloat16,
334+
).to(device)
335+
```

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -94,7 +94,7 @@
9494
import wandb
9595

9696
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
97-
check_min_version("0.36.0.dev0")
97+
check_min_version("0.37.0.dev0")
9898

9999
logger = get_logger(__name__)
100100

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@
8888

8989

9090
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
91-
check_min_version("0.36.0.dev0")
91+
check_min_version("0.37.0.dev0")
9292

9393
logger = get_logger(__name__)
9494

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@
9595
import wandb
9696

9797
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
98-
check_min_version("0.36.0.dev0")
98+
check_min_version("0.37.0.dev0")
9999

100100
logger = get_logger(__name__)
101101

examples/cogvideo/train_cogvideox_image_to_video_lora.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@
6161
import wandb
6262

6363
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
64-
check_min_version("0.36.0.dev0")
64+
check_min_version("0.37.0.dev0")
6565

6666
logger = get_logger(__name__)
6767

examples/cogvideo/train_cogvideox_lora.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@
5252
import wandb
5353

5454
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
55-
check_min_version("0.36.0.dev0")
55+
check_min_version("0.37.0.dev0")
5656

5757
logger = get_logger(__name__)
5858

examples/cogview4-control/train_control_cogview4.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,7 +60,7 @@
6060
import wandb
6161

6262
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
63-
check_min_version("0.36.0.dev0")
63+
check_min_version("0.37.0.dev0")
6464

6565
logger = get_logger(__name__)
6666

examples/community/marigold_depth_estimation.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@
4343

4444

4545
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
46-
check_min_version("0.36.0.dev0")
46+
check_min_version("0.37.0.dev0")
4747

4848

4949
class MarigoldDepthOutput(BaseOutput):

examples/consistency_distillation/train_lcm_distill_lora_sd_wds.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@
7474
import wandb
7575

7676
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
77-
check_min_version("0.36.0.dev0")
77+
check_min_version("0.37.0.dev0")
7878

7979
logger = get_logger(__name__)
8080

0 commit comments

Comments
 (0)