
Commit ed31974 (1 parent e5aa719)

[docs] updates (#13248)

* fixes
* few more links
* update zh
* fix

File tree

10 files changed: +10 −148 lines


docs/source/en/_toctree.yml

Lines changed: 2 additions & 2 deletions
@@ -22,6 +22,8 @@
   title: Reproducibility
 - local: using-diffusers/schedulers
   title: Schedulers
+- local: using-diffusers/guiders
+  title: Guiders
 - local: using-diffusers/automodel
   title: AutoModel
 - local: using-diffusers/other-formats
@@ -110,8 +112,6 @@
   title: ModularPipeline
 - local: modular_diffusers/components_manager
   title: ComponentsManager
-- local: modular_diffusers/guiders
-  title: Guiders
 - local: modular_diffusers/custom_blocks
   title: Building Custom Blocks
 - local: modular_diffusers/mellon

docs/source/en/api/pipelines/hunyuan_video15.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ To update guider configuration, you can run `pipe.guider = pipe.guider.new(...)`
 pipe.guider = pipe.guider.new(guidance_scale=5.0)
 ```

-Read more on Guider [here](../../modular_diffusers/guiders).
+Read more on Guider [here](../../using-diffusers/guiders).

docs/source/en/api/pipelines/hunyuanimage21.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ HunyuanImage-2.1 comes in the following variants:

 ## HunyuanImage-2.1

-HunyuanImage-2.1 applies [Adaptive Projected Guidance (APG)](https://huggingface.co/papers/2410.02416) combined with Classifier-Free Guidance (CFG) in the denoising loop. `HunyuanImagePipeline` has a `guider` component (read more about [Guider](../modular_diffusers/guiders.md)) and does not take a `guidance_scale` parameter at runtime. To change guider-related parameters, e.g., `guidance_scale`, you can update the `guider` configuration instead.
+HunyuanImage-2.1 applies [Adaptive Projected Guidance (APG)](https://huggingface.co/papers/2410.02416) combined with Classifier-Free Guidance (CFG) in the denoising loop. `HunyuanImagePipeline` has a `guider` component (read more about [Guider](../../using-diffusers/guiders)) and does not take a `guidance_scale` parameter at runtime. To change guider-related parameters, e.g., `guidance_scale`, you can update the `guider` configuration instead.

 ```python
 import torch
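The CFG half of the guidance described in this hunk has a one-line core: the unconditional prediction is pushed toward the conditional one by the guidance scale. A minimal numeric sketch (plain lists instead of tensors, purely illustrative):

```python
def cfg_combine(uncond, cond, guidance_scale):
    # classifier-free guidance: uncond + scale * (cond - uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

print(cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0))  # [2.0, 5.0]
```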

docs/source/en/modular_diffusers/modular_pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -338,7 +338,7 @@ guider = ClassifierFreeGuidance(guidance_scale=5.0)
 pipeline.update_components(guider=guider)
 ```

-See the [Guiders](./guiders) guide for more details on available guiders and how to configure them.
+See the [Guiders](../using-diffusers/guiders) guide for more details on available guiders and how to configure them.

 ## Splitting a pipeline into stages

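`update_components` in the hunk above swaps a named component on the pipeline. A toy sketch of that swap-with-validation pattern (hypothetical class, not the `ModularPipeline` implementation):

```python
class ToyPipeline:
    """Illustrative stand-in for a pipeline with swappable components."""

    def __init__(self, **components):
        self._components = dict(components)

    def update_components(self, **components):
        # only replace components the pipeline already knows about
        for name, value in components.items():
            if name not in self._components:
                raise KeyError(f"unknown component: {name!r}")
            self._components[name] = value

pipe = ToyPipeline(guider="cfg", scheduler="ddim")
pipe.update_components(guider="apg")
print(pipe._components["guider"])  # apg
```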
docs/source/en/modular_diffusers/overview.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ The Modular Diffusers docs are organized as shown below.

 - [ModularPipeline](./modular_pipeline) shows you how to create and convert pipeline blocks into an executable [`ModularPipeline`].
 - [ComponentsManager](./components_manager) shows you how to manage and reuse components across multiple pipelines.
-- [Guiders](./guiders) shows you how to use different guidance methods in the pipeline.
+- [Guiders](../using-diffusers/guiders) shows you how to use different guidance methods in the pipeline.

 ## Mellon Integration

docs/source/en/optimization/memory.md

Lines changed: 1 addition & 139 deletions
@@ -482,144 +482,6 @@ print(
 ) # (2880, 1, 960, 320) having a stride of 1 for the 2nd dimension proves that it works
 ```

-## torch.jit.trace
-
-[torch.jit.trace](https://pytorch.org/docs/stable/generated/torch.jit.trace.html) records the operations a model performs on a sample input and creates a new, optimized representation of the model based on the recorded execution path. During tracing, the model is optimized to reduce overhead from Python and dynamic control flows and operations are fused together for more efficiency. The returned executable or [ScriptFunction](https://pytorch.org/docs/stable/generated/torch.jit.ScriptFunction.html) can be compiled.
-
-```py
-import time
-import torch
-from diffusers import StableDiffusionPipeline
-import functools
-
-# torch disable grad
-torch.set_grad_enabled(False)
-
-# set variables
-n_experiments = 2
-unet_runs_per_experiment = 50
-
-# load sample inputs
-def generate_inputs():
-    sample = torch.randn((2, 4, 64, 64), device="cuda", dtype=torch.float16)
-    timestep = torch.rand(1, device="cuda", dtype=torch.float16) * 999
-    encoder_hidden_states = torch.randn((2, 77, 768), device="cuda", dtype=torch.float16)
-    return sample, timestep, encoder_hidden_states
-
-
-pipeline = StableDiffusionPipeline.from_pretrained(
-    "stable-diffusion-v1-5/stable-diffusion-v1-5",
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-).to("cuda")
-unet = pipeline.unet
-unet.eval()
-unet.to(memory_format=torch.channels_last)  # use channels_last memory format
-unet.forward = functools.partial(unet.forward, return_dict=False)  # set return_dict=False as default
-
-# warmup
-for _ in range(3):
-    with torch.inference_mode():
-        inputs = generate_inputs()
-        orig_output = unet(*inputs)
-
-# trace
-print("tracing..")
-unet_traced = torch.jit.trace(unet, inputs)
-unet_traced.eval()
-print("done tracing")
-
-# warmup and optimize graph
-for _ in range(5):
-    with torch.inference_mode():
-        inputs = generate_inputs()
-        orig_output = unet_traced(*inputs)
-
-# benchmarking
-with torch.inference_mode():
-    for _ in range(n_experiments):
-        torch.cuda.synchronize()
-        start_time = time.time()
-        for _ in range(unet_runs_per_experiment):
-            orig_output = unet_traced(*inputs)
-        torch.cuda.synchronize()
-        print(f"unet traced inference took {time.time() - start_time:.2f} seconds")
-    for _ in range(n_experiments):
-        torch.cuda.synchronize()
-        start_time = time.time()
-        for _ in range(unet_runs_per_experiment):
-            orig_output = unet(*inputs)
-        torch.cuda.synchronize()
-        print(f"unet inference took {time.time() - start_time:.2f} seconds")
-
-# save the model
-unet_traced.save("unet_traced.pt")
-```
-
-Replace the pipeline's UNet with the traced version.
-
-```py
-import torch
-from diffusers import StableDiffusionPipeline
-from dataclasses import dataclass
-
-@dataclass
-class UNet2DConditionOutput:
-    sample: torch.Tensor
-
-pipeline = StableDiffusionPipeline.from_pretrained(
-    "stable-diffusion-v1-5/stable-diffusion-v1-5",
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-).to("cuda")
-
-# use jitted unet
-unet_traced = torch.jit.load("unet_traced.pt")
-
-# del pipeline.unet
-class TracedUNet(torch.nn.Module):
-    def __init__(self):
-        super().__init__()
-        self.in_channels = pipe.unet.config.in_channels
-        self.device = pipe.unet.device
-
-    def forward(self, latent_model_input, t, encoder_hidden_states):
-        sample = unet_traced(latent_model_input, t, encoder_hidden_states)[0]
-        return UNet2DConditionOutput(sample=sample)
-
-pipeline.unet = TracedUNet()
-
-with torch.inference_mode():
-    image = pipe([prompt] * 1, num_inference_steps=50).images[0]
-```
-
 ## Memory-efficient attention

-> [!TIP]
-> Memory-efficient attention optimizes for memory usage *and* [inference speed](./fp16#scaled-dot-product-attention)!
-
-The Transformers attention mechanism is memory-intensive, especially for long sequences, so you can try using different and more memory-efficient attention types.
-
-By default, if PyTorch >= 2.0 is installed, [scaled dot-product attention (SDPA)](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) is used. You don't need to make any additional changes to your code.
-
-SDPA supports [FlashAttention](https://github.com/Dao-AILab/flash-attention) and [xFormers](https://github.com/facebookresearch/xformers) as well as a native C++ PyTorch implementation. It automatically selects the most optimal implementation based on your input.
-
-You can explicitly use xFormers with the [`~ModelMixin.enable_xformers_memory_efficient_attention`] method.
-
-```py
-# pip install xformers
-import torch
-from diffusers import StableDiffusionXLPipeline
-
-pipeline = StableDiffusionXLPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16,
-).to("cuda")
-pipeline.enable_xformers_memory_efficient_attention()
-```
-
-Call [`~ModelMixin.disable_xformers_memory_efficient_attention`] to disable it.
-
-```py
-pipeline.disable_xformers_memory_efficient_attention()
-```
+Diffusers supports multiple memory-efficient attention backends (FlashAttention, xFormers, SageAttention, and more) through [`~ModelMixin.set_attention_backend`]. Refer to the [Attention backends](./attention_backends) guide to learn how to switch between them.
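The string-keyed backend switch that `set_attention_backend` exposes can be sketched as a validate-then-store setter. This is a toy class, not diffusers internals, and the backend names here are illustrative rather than the exact strings diffusers accepts:

```python
class ToyAttention:
    SUPPORTED = {"native", "flash", "xformers", "sage"}  # illustrative names

    def __init__(self):
        self.backend = "native"  # default, analogous to SDPA

    def set_attention_backend(self, name: str) -> None:
        # reject unknown names early rather than failing inside the forward pass
        if name not in self.SUPPORTED:
            raise ValueError(f"unknown attention backend: {name!r}")
        self.backend = name

attn = ToyAttention()
attn.set_attention_backend("xformers")
print(attn.backend)  # xformers
```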

docs/source/en/optimization/xformers.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ pip install xformers
 > [!TIP]
 > The xFormers `pip` package requires the latest version of PyTorch. If you need to use a previous version of PyTorch, then we recommend [installing xFormers from the source](https://github.com/facebookresearch/xformers#installing-xformers).

-After xFormers is installed, you can use `enable_xformers_memory_efficient_attention()` for faster inference and reduced memory consumption as shown in this [section](memory#memory-efficient-attention).
+After xFormers is installed, you can use it with [`~ModelMixin.set_attention_backend`] as shown in the [Attention backends](./attention_backends) guide.

 > [!WARNING]
 > According to this [issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training (fine-tune or DreamBooth) in some GPUs. If you observe this problem, please install a development version as indicated in the issue comments.
File renamed without changes.

docs/source/zh/_toctree.yml

Lines changed: 2 additions & 2 deletions
@@ -14,6 +14,8 @@
   sections:
   - local: using-diffusers/schedulers
     title: Load schedulers and models
+  - local: using-diffusers/guiders
+    title: Guiders

 - title: Inference
   isExpanded: false
@@ -80,8 +82,6 @@
     title: ModularPipeline
   - local: modular_diffusers/components_manager
     title: ComponentsManager
-  - local: modular_diffusers/guiders
-    title: Guiders

 - title: Training
   isExpanded: false
File renamed without changes.
