
Commit 4ac2b4a

sayakpaul and stevhliu authored
[docs] polish caching docs. (huggingface#12684)
* polish caching docs.

* Update docs/source/en/optimization/cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* up

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent 418313b commit 4ac2b4a

3 files changed

Lines changed: 20 additions & 5 deletions


docs/source/en/api/cache.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
 
 [[autodoc]] apply_faster_cache
 
-### FirstBlockCacheConfig
+## FirstBlockCacheConfig
 
 [[autodoc]] FirstBlockCacheConfig
 
docs/source/en/optimization/cache.md

Lines changed: 16 additions & 3 deletions
@@ -68,6 +68,20 @@ config = FasterCacheConfig(
 pipeline.transformer.enable_cache(config)
 ```
 
+## FirstBlockCache
+
+[FirstBlock Cache](https://huggingface.co/docs/diffusers/main/en/api/cache#diffusers.FirstBlockCacheConfig) checks how much the early layers of the denoiser changes from one timestep to the next. If the change is small, the model skips the expensive later layers and reuses the previous output.
+
+```py
+import torch
+from diffusers import DiffusionPipeline
+from diffusers.hooks import apply_first_block_cache, FirstBlockCacheConfig
+
+pipeline = DiffusionPipeline.from_pretrained(
+"Qwen/Qwen-Image", torch_dtype=torch.bfloat16
+)
+apply_first_block_cache(pipeline.transformer, FirstBlockCacheConfig(threshold=0.2))
+```
 ## TaylorSeer Cache
 
 [TaylorSeer Cache](https://huggingface.co/papers/2403.06923) accelerates diffusion inference by using Taylor series expansions to approximate and cache intermediate activations across denoising steps. The method predicts future outputs based on past computations, reusing them at specified intervals to reduce redundant calculations.
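The FirstBlock Cache description added in this hunk reduces to a per-step skip test. Below is a minimal pure-Python sketch of that idea, not the diffusers implementation. It assumes "blocks" are plain functions on lists of floats, measures the change as a relative L1 distance between the first block's outputs, compares against the output from the last full pass, and reuses the residual that the later blocks added at that pass. `FirstBlockCacheSketch` and `relative_l1_change` are hypothetical names; only `threshold` mirrors `FirstBlockCacheConfig(threshold=0.2)`.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def relative_l1_change(current: Sequence[float], previous: Sequence[float]) -> float:
    """Sum of |current - previous| divided by sum of |previous| (hypothetical metric choice)."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0 else float("inf")


class FirstBlockCacheSketch:
    """Hypothetical toy: the first block runs every step, the later blocks only when needed."""

    def __init__(self, first_block: Block, later_blocks: List[Block], threshold: float) -> None:
        self.first_block = first_block
        self.later_blocks = later_blocks
        self.threshold = threshold
        self.ref_first_out: Optional[Vector] = None    # first-block output at the last full pass
        self.cached_residual: Optional[Vector] = None  # what the later blocks added at that pass
        self.skipped_steps = 0

    def __call__(self, x: Vector) -> Vector:
        first_out = self.first_block(x)  # the cheap early layers always run
        if (
            self.ref_first_out is not None
            and self.cached_residual is not None
            and relative_l1_change(first_out, self.ref_first_out) < self.threshold
        ):
            # Small change: skip the expensive later blocks and reuse their cached residual.
            self.skipped_steps += 1
            return [f + r for f, r in zip(first_out, self.cached_residual)]
        # Large change (or first step): run every block and refresh the cache.
        out = first_out
        for block in self.later_blocks:
            out = block(out)
        self.ref_first_out = first_out
        self.cached_residual = [o - f for o, f in zip(out, first_out)]
        return out


cache = FirstBlockCacheSketch(
    first_block=lambda x: [2.0 * v for v in x],
    later_blocks=[lambda x: [v + 1.0 for v in x]],
    threshold=0.05,
)
print(cache([1.0, 2.0]))   # full pass
print(cache([1.01, 2.0]))  # tiny change: later blocks skipped
print(cache([3.0, 2.0]))   # large change: full pass again
print("skipped steps:", cache.skipped_steps)
```

In this toy the later block adds a constant, so the skipped step returns exactly what a full pass would. With a real denoiser the reused residual is only an approximation, which is why a larger `threshold` buys speed at the cost of fidelity.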
@@ -87,8 +101,7 @@ from diffusers import FluxPipeline, TaylorSeerCacheConfig
 pipe = FluxPipeline.from_pretrained(
 "black-forest-labs/FLUX.1-dev",
 torch_dtype=torch.bfloat16,
-)
-pipe.to("cuda")
+).to("cuda")
 
 config = TaylorSeerCacheConfig(
 cache_interval=5,
@@ -97,4 +110,4 @@ config = TaylorSeerCacheConfig(
 taylor_factors_dtype=torch.bfloat16,
 )
 pipe.transformer.enable_cache(config)
-```
+```
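The TaylorSeer paragraph in this file describes forecasting activations rather than reusing them verbatim. The following is a minimal sketch of that forecasting step, assuming a scalar stand-in for a feature map and a step index that increases over the denoising loop. Every `cache_interval` steps it runs the real computation and refreshes finite-difference derivative estimates; in between it evaluates the truncated Taylor series F(t0 + k) ≈ Σ_i F^(i)(t0) k^i / i!. `TaylorSeerSketch`, `compute`, and `max_order` are hypothetical; `cache_interval` only echoes the name used in the config example and is not claimed to match its exact semantics in diffusers.

```python
import math
from typing import Callable, List, Optional


class TaylorSeerSketch:
    """Hypothetical toy: forecast a scalar 'activation' between full computations."""

    def __init__(self, compute: Callable[[int], float], cache_interval: int, max_order: int = 1) -> None:
        self.compute = compute              # the expensive computation, indexed by step
        self.cache_interval = cache_interval
        self.max_order = max_order
        self.factors: List[float] = []      # [F, F', F'', ...] estimated at last_full_step
        self.last_full_step: Optional[int] = None
        self.full_computes = 0

    def _refresh(self, step: int) -> float:
        value = self.compute(step)
        self.full_computes += 1
        new_factors = [value]
        if self.last_full_step is not None:
            dt = step - self.last_full_step
            # Each derivative estimate is the finite difference of the next-lower one.
            for i in range(min(self.max_order, len(self.factors))):
                new_factors.append((new_factors[i] - self.factors[i]) / dt)
        self.factors = new_factors
        self.last_full_step = step
        return value

    def __call__(self, step: int) -> float:
        if self.last_full_step is None or step - self.last_full_step >= self.cache_interval:
            return self._refresh(step)
        k = step - self.last_full_step
        # Truncated Taylor series around the last fully computed step.
        return sum(f * k**i / math.factorial(i) for i, f in enumerate(self.factors))


seer = TaylorSeerSketch(compute=lambda t: 3.0 * t + 1.0, cache_interval=5, max_order=1)
outputs = [seer(t) for t in range(11)]
print(outputs, "full computes:", seer.full_computes)
```

Until two full computations exist, the sketch can only repeat the last value, which is plain caching. After that, a first-order forecast is exact for this linear signal; for real activations, higher orders and shorter intervals trade compute for accuracy.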

src/diffusers/models/cache_utils.py

Lines changed: 3 additions & 1 deletion
@@ -41,9 +41,11 @@ def enable_cache(self, config) -> None:
 Enable caching techniques on the model.
 
 Args:
-config (`Union[PyramidAttentionBroadcastConfig]`):
+config (`Union[PyramidAttentionBroadcastConfig, FasterCacheConfig, FirstBlockCacheConfig]`):
 The configuration for applying the caching technique. Currently supported caching techniques are:
 - [`~hooks.PyramidAttentionBroadcastConfig`]
+- [`~hooks.FasterCacheConfig`]
+- [`~hooks.FirstBlockCacheConfig`]
 
 Example:
 
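The widened `Union[...]` annotation means `enable_cache` accepts three unrelated config types and must route each one to its own caching hook. The sketch below shows one way to express that type-based dispatch. The dataclasses are hypothetical empty stand-ins for the diffusers configs (only `threshold` is taken from the FirstBlockCache docs example), and `ToyModel` and its string results are illustrative; this is not how diffusers' `enable_cache` is actually written.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical stand-ins for the diffusers config classes.
@dataclass
class PyramidAttentionBroadcastConfig:
    pass


@dataclass
class FasterCacheConfig:
    pass


@dataclass
class FirstBlockCacheConfig:
    threshold: float


CacheConfig = Union[PyramidAttentionBroadcastConfig, FasterCacheConfig, FirstBlockCacheConfig]


class ToyModel:
    def __init__(self) -> None:
        self.applied: List[str] = []  # records which (pretend) hook was attached

    def enable_cache(self, config: CacheConfig) -> None:
        # Map each supported config type to the routine that would install its hook.
        handlers: Dict[type, Callable[[CacheConfig], str]] = {
            PyramidAttentionBroadcastConfig: lambda c: "pyramid_attention_broadcast",
            FasterCacheConfig: lambda c: "faster_cache",
            FirstBlockCacheConfig: lambda c: f"first_block_cache(threshold={c.threshold})",
        }
        handler = handlers.get(type(config))
        if handler is None:
            raise ValueError(f"Cache config of type {type(config).__name__} is not supported.")
        self.applied.append(handler(config))


model = ToyModel()
model.enable_cache(FirstBlockCacheConfig(threshold=0.2))
print(model.applied)
```

Keeping the dispatch table next to the docstring's list of supported configs makes it easy to keep the two in sync when a new caching technique is added, which is exactly the kind of drift this docstring change corrects.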