[MagCache](https://github.com/Zehong-Ma/MagCache) accelerates inference by skipping transformer blocks based on the magnitude of the residual update. It observes that the magnitude of updates (`output - input`) decays predictably over the diffusion process. By accumulating an "error budget" based on pre-computed magnitude ratios, it dynamically decides when to skip computation and reuse the previous residual.

MagCache relies on **Magnitude Ratios** (`mag_ratios`), which describe this decay curve. These ratios are specific to the model checkpoint and scheduler.
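The skip decision can be illustrated with a simplified, framework-free sketch. It assumes the accumulation rule works roughly as follows: multiply the ratios seen since the last fully computed step, add `|1 - accumulated_ratio|` to an error budget, and reuse the cached residual while the budget stays under a threshold and a cap on consecutive skips is not reached. The names `threshold`, `max_consecutive_skips`, and `retention_steps` are hypothetical and are not necessarily `MagCacheConfig` parameters.

```python
def plan_magcache_skips(mag_ratios, threshold=0.06, max_consecutive_skips=2, retention_steps=1):
    """Return one flag per step: True means the cached residual would be reused.

    Hypothetical sketch of an error-budget skip rule, not the diffusers implementation.
    """
    decisions = []
    accumulated_ratio = 1.0  # Product of ratios since the last computed step.
    accumulated_err = 0.0    # Error budget spent since the last computed step.
    consecutive = 0          # Number of skips in a row.
    for step, ratio in enumerate(mag_ratios):
        if step < retention_steps:
            # Always compute the first few steps, where the update changes fastest.
            decisions.append(False)
            continue
        accumulated_ratio *= ratio
        accumulated_err += abs(1.0 - accumulated_ratio)
        if accumulated_err <= threshold and consecutive < max_consecutive_skips:
            consecutive += 1
            decisions.append(True)
        else:
            # Run the full blocks and reset the budget from this step.
            accumulated_ratio = 1.0
            accumulated_err = 0.0
            consecutive = 0
            decisions.append(False)
    return decisions
```

For example, `plan_magcache_skips([1.0, 1.37, 0.97, 0.87])` returns `[False, False, True, False]`: the large jump at step 1 forces a full computation, the near-1 ratio at step 2 fits within the budget, and the drift accumulated by step 3 exceeds it.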
### Usage

To use MagCache, you typically follow a two-step process: **Calibration** and **Inference**.

1. **Calibration**: Run inference once with `calibrate=True`. The hook measures the residual magnitudes and prints the calculated ratios to the console.
2. **Inference**: Pass these ratios to `MagCacheConfig` to enable acceleration.
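As a rough illustration of what the calibration pass measures, the sketch below computes one ratio per step from recorded residuals (`output - input` of the transformer). It assumes each ratio is the mean, over tokens, of the per-token L2-norm ratio between consecutive steps, with a neutral `1.0` for the first step. The function name and the averaging choice are assumptions for illustration, not a description of the diffusers hook.

```python
import numpy as np


def magnitude_ratios(residuals):
    """Compute one magnitude ratio per denoising step (hypothetical helper).

    residuals: list of arrays shaped (num_tokens, hidden_dim), each holding the
    residual (output - input) recorded at one step.
    """
    ratios = [1.0]  # The first step has no predecessor, so it gets a neutral ratio.
    for prev, curr in zip(residuals[:-1], residuals[1:]):
        prev_norm = np.linalg.norm(prev, axis=-1)  # Per-token L2 norm at step t-1.
        curr_norm = np.linalg.norm(curr, axis=-1)  # Per-token L2 norm at step t.
        ratios.append(float(np.mean(curr_norm / prev_norm)))
    return ratios
```

A residual whose magnitude halves at every step yields ratios of `0.5` after the first entry, which is the kind of decay curve the ratios are meant to capture.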
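```python
import torch
from diffusers import FluxPipeline, MagCacheConfig

# FLUX.1-schnell is assumed here because the example runs 4 inference steps.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# 1. Calibration: the hook measures residual magnitudes and prints the ratios.
calib_config = MagCacheConfig(calibrate=True, num_inference_steps=4)
pipe.transformer.enable_cache(calib_config)
pipe("A cat playing chess", num_inference_steps=4)
pipe.transformer.disable_cache()

# 2. Inference: apply the specific ratios obtained from calibration for optimized speed.
# Note: For Flux models, you can also import defaults:
# from diffusers.hooks.mag_cache import FLUX_MAG_RATIOS
mag_config = MagCacheConfig(
    mag_ratios=[1.0, 1.37, 0.97, 0.87],
    num_inference_steps=4,
)
pipe.transformer.enable_cache(mag_config)

image = pipe("A cat playing chess", num_inference_steps=4).images[0]
```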
> [!NOTE]
> `mag_ratios` represent the model's intrinsic magnitude decay curve. Ratios calibrated for a high number of steps (e.g., 50) can be reused for lower step counts (e.g., 20). The implementation uses interpolation to map the curve to the current number of inference steps.

> [!TIP]
> For pipelines that run Classifier-Free Guidance sequentially (like Kandinsky 5.0), the calibration log may print two arrays: one for the conditional pass and one for the unconditional pass. In most cases, you should use the first array (conditional).

> [!TIP]
> For pipelines that run Classifier-Free Guidance in a **batched** manner (like SDXL), the `hidden_states` processed by the model contain the conditional and unconditional branches concatenated together. The calibration process automatically accounts for this and produces a single array of ratios that represents their joint behavior, which you can use directly without modification.
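To illustrate the step-count remapping described in the note on `mag_ratios`, here is a minimal sketch that resamples a calibrated curve to a different number of inference steps. It assumes plain linear interpolation over normalized step positions using NumPy's `np.interp`; the function name is hypothetical, and the resampling scheme used inside diffusers may differ.

```python
import numpy as np


def resample_mag_ratios(mag_ratios, num_inference_steps):
    """Linearly resample a calibrated ratio curve to a new step count (hypothetical helper)."""
    # Place the calibrated ratios and the target steps on the same [0, 1] axis.
    source_positions = np.linspace(0.0, 1.0, num=len(mag_ratios))
    target_positions = np.linspace(0.0, 1.0, num=num_inference_steps)
    return np.interp(target_positions, source_positions, mag_ratios).tolist()
```

For example, `resample_mag_ratios(ratios_from_50_step_calibration, 20)` would return 20 values spanning the same curve. Linear interpolation keeps the overall shape of the decay but does not compound ratios across merged steps, which is one reason a dedicated calibration at the target step count can still be worthwhile.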