add memory reduction options

EndlessSora · EndlessSora · commit bc2924fa9ec3 · 2025-04-12T17:14:20.000-07:00
diff --git a/README.md b/README.md
@@ -24,6 +24,8 @@ ByteDance Intelligent Creation
 
 ## 🔥 News
 
+- [04/2025] 🔥 Quantization and offloading [options](https://github.com/bytedance/InfiniteYou#memory-requirements) are provided to reduce the memory requirements for InfiniteYou-FLUX v1.0.
+
 - [03/2025] 🔥 The [code](https://github.com/bytedance/InfiniteYou), [model](https://huggingface.co/ByteDance/InfiniteYou), and [demo](https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX) of InfiniteYou-FLUX v1.0 are released.
 
 - [03/2025] 🔥 The [project page](https://bytedance.github.io/InfiniteYou) of InfiniteYou is created.
@@ -35,7 +37,7 @@ ByteDance Intelligent Creation
 
 - We released two model variants of InfiniteYou-FLUX v1.0: [aes_stage2](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/aes_stage2) and [sim_stage1](https://huggingface.co/ByteDance/InfiniteYou/tree/main/infu_flux_v1.0/sim_stage1). The `aes_stage2` is our model after SFT, which is used by default for better text-image alignment and aesthetics. For higher ID similarity, please try `sim_stage1` (using `--model_version` to switch). More details can be found in our [paper](https://arxiv.org/abs/2503.16418).
 
-- To better fit specific personal needs, we find that two arguments are highly useful to adjust: <br />`--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger <br />`--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
+- To better fit specific personal needs, we find that two arguments are highly useful to adjust: <br />`--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger `--infusenet_guidance_start` (*e.g.*, `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (*e.g.*, `0.9`).
 
 - We also provided two LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)) to enable additional usage flexibility. If needed, try `Realism` only first.  They are *entirely optional*, which are examples to try but are NOT used in our paper.
 
@@ -62,9 +64,15 @@ pip install -r requirements.txt
 
 ### Memory Requirements 
 
-Please note that the current full-performance `bf16` model inference requires a **peak VRAM** of around **43GB**. **We are trying to reduce memory usage and will post an update soon.** Community contributions are welcome.
+- **Full-performance**: The original `bf16` model inference requires a **peak VRAM** of around **43GB**.
+
+- **Fast CPU offloading**: By specifying only `--cpu_offload` in [test.py](https://github.com/bytedance/InfiniteYou/blob/main/test.py#L44), the **peak VRAM** is reduced to around **30GB** with **NO** performance degradation.
+
+- **8-bit quantization**: By specifying only `--quantize_8bit` in [test.py](https://github.com/bytedance/InfiniteYou/blob/main/test.py#L44), the **peak VRAM** is reduced to around **24GB** with performance remaining very similar.
+
+- **Combining fast CPU offloading and 8-bit quantization**: By specifying both `--cpu_offload` and <br />`--quantize_8bit`, the **peak VRAM** is further reduced to around **16GB** with performance remaining very similar.
 
-If you want to use our models ASAP but do not have a GPU with sufficient VRAM, please follow [Diffusers memory reduction tips](https://huggingface.co/docs/diffusers/en/optimization/memory) first, where some offloading strategies may be helpful.
+If you want to use our models but your GPU has even less VRAM, please refer to [Diffusers memory reduction tips](https://huggingface.co/docs/diffusers/en/optimization/memory), where some more aggressive strategies may be helpful. Community contributions are also welcome.
 
 
 ## ⚡️ Quick Inference
@@ -100,6 +108,9 @@ python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, c
 - Optional LoRAs:
   - `--enable_realism_lora (store_true)`: Whether to enable the Realism LoRA. Default: `False`.
   - `--enable_anti_blur_lora (store_true)`: Whether to enable the Anti-blur LoRA. Default: `False`.
+- Memory reduction options:
+  - `--quantize_8bit (store_true)`: Whether to quantize the model to the 8-bit format. Default: `False`.
+  - `--cpu_offload (store_true)`: Whether to use fast CPU offloading. Default: `False`.
 
 </details>
 
@@ -134,7 +145,7 @@ InfU features a desirable plug-and-play design, compatible with many existing me
 
 The images used in this repository and related demos are sourced from consented subjects or generated by the models. These pictures are intended solely to showcase the capabilities of our research. If you have any concerns, please feel free to contact us, and we will promptly remove any inappropriate content.
 
-The use of the released code, model, and demo must strictly adhere to the respective licenses. Our code is released under the [Apache 2.0 License](./LICENSE), and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
+The use of the released code, model, and demo must strictly adhere to the respective licenses. Our code is released under the [Apache License 2.0](./LICENSE), and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.
 
 This research aims to positively impact the field of Generative AI. Any usage of this method must be responsible and comply with local laws. The developers do not assume any responsibility for any potential misuse.
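
The two memory reduction flags documented in this commit, `--quantize_8bit` and `--cpu_offload`, are both `store_true` options, and the README gives an approximate peak VRAM for each combination. The sketch below is only an illustration of how those documented numbers relate to the flags. It does not reproduce `test.py`, whose implementation is not part of this diff. The helper names `build_parser`, `estimated_peak_vram_gb`, and `suggest_flags` are hypothetical, and the VRAM figures are the rounded values quoted in the README, not measurements.

```python
import argparse
from typing import List, Optional

# Approximate peak VRAM in GB, copied from the README text in this commit.
# Keys are (cpu_offload, quantize_8bit). These are rough figures, not guarantees.
PEAK_VRAM_GB = {
    (False, False): 43,  # full-performance bf16
    (True, False): 30,   # fast CPU offloading only
    (False, True): 24,   # 8-bit quantization only
    (True, True): 16,    # both options combined
}


def build_parser() -> argparse.ArgumentParser:
    """Declare the two flags the way the README describes them (store_true)."""
    parser = argparse.ArgumentParser(description="Memory reduction flags (sketch)")
    parser.add_argument("--quantize_8bit", action="store_true",
                        help="Quantize the model to the 8-bit format.")
    parser.add_argument("--cpu_offload", action="store_true",
                        help="Use fast CPU offloading.")
    return parser


def estimated_peak_vram_gb(args: argparse.Namespace) -> int:
    """Hypothetical helper: look up the README figure for the chosen flags."""
    return PEAK_VRAM_GB[(args.cpu_offload, args.quantize_8bit)]


def suggest_flags(available_gb: float) -> Optional[List[str]]:
    """Hypothetical helper: pick the least aggressive combination that fits.

    Returns a list of flags (empty for full-performance mode), or None if
    even the combined setting exceeds the available VRAM.
    """
    # Try the highest-VRAM (least aggressive) settings first.
    for (offload, quantize), peak in sorted(PEAK_VRAM_GB.items(),
                                            key=lambda item: -item[1]):
        if peak <= available_gb:
            flags = []
            if offload:
                flags.append("--cpu_offload")
            if quantize:
                flags.append("--quantize_8bit")
            return flags
    return None


if __name__ == "__main__":
    args = build_parser().parse_args(["--cpu_offload", "--quantize_8bit"])
    print("Estimated peak VRAM (GB):", estimated_peak_vram_gb(args))
    print("Suggested flags for a 24 GB GPU:", suggest_flags(24))
```

Because the README figures are stated as "around" a given size, a real GPU at exactly one of these thresholds may still run out of memory, so treat the suggestion as a starting point only.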