Commit 46f14a7

Merge branch 'main' into flax-dep
2 parents 0397776 + 303f3a7 commit 46f14a7

30 files changed

Lines changed: 609 additions & 269 deletions

.github/workflows/build_documentation.yml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ jobs:
       notebook_folder: diffusers_doc
       languages: en ko zh ja pt
       custom_container: diffusers/diffusers-doc-builder
+      pre_command: uv pip uninstall transformers huggingface_hub && UV_PRERELEASE=allow uv pip install -U transformers@git+https://github.com/huggingface/transformers.git
     secrets:
       token: ${{ secrets.HUGGINGFACE_PUSH }}
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

.github/workflows/build_pr_documentation.yml

Lines changed: 1 addition & 0 deletions
@@ -50,3 +50,4 @@ jobs:
       package: diffusers
       languages: en ko zh ja pt
       custom_container: diffusers/diffusers-doc-builder
+      pre_command: uv pip uninstall transformers huggingface_hub && UV_PRERELEASE=allow uv pip install -U transformers@git+https://github.com/huggingface/transformers.git

.github/workflows/pr_tests.yml

Lines changed: 2 additions & 0 deletions
@@ -194,6 +194,8 @@ jobs:
     - name: Install dependencies
       run: |
         uv pip install -e ".[quality]"
+        uv pip uninstall transformers huggingface_hub && UV_PRERELEASE=allow uv pip install -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall tokenizers && uv pip install "tokenizers<=0.23.0"

     - name: Environment
       run: |

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -372,6 +372,8 @@
         title: HunyuanVideo15Transformer3DModel
       - local: api/models/hunyuan_video_transformer_3d
         title: HunyuanVideoTransformer3DModel
+      - local: api/models/transformer_joyimage
+        title: JoyImageEditTransformer3DModel
       - local: api/models/latte_transformer3d
         title: LatteTransformer3DModel
       - local: api/models/longcat_image_transformer2d
@@ -560,6 +562,8 @@
         title: HunyuanImage2.1
       - local: api/pipelines/pix2pix
         title: InstructPix2Pix
+      - local: api/pipelines/joyimage_edit
+        title: JoyImage Edit
       - local: api/pipelines/kandinsky
         title: Kandinsky 2.1
       - local: api/pipelines/kandinsky_v22

docs/source/en/api/cache.md

Lines changed: 7 additions & 1 deletion
@@ -35,8 +35,14 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate

 [[autodoc]] apply_first_block_cache

-### TaylorSeerCacheConfig
+## TaylorSeerCacheConfig

 [[autodoc]] TaylorSeerCacheConfig

 [[autodoc]] apply_taylorseer_cache
+
+## MagCacheConfig
+
+[[autodoc]] MagCacheConfig
+
+[[autodoc]] apply_mag_cache

docs/source/en/api/models/transformer_joyimage.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# JoyImageEditTransformer3DModel
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import JoyImageEditTransformer3DModel
+
+transformer = JoyImageEditTransformer3DModel.from_pretrained("jdopensource/JoyAI-Image-Edit-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## JoyImageEditTransformer3DModel
+
+[[autodoc]] JoyImageEditTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

docs/source/en/api/pipelines/joyimage_edit.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# JoyAI-Image-Edit
+
+[JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) is a unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing. It combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT). A central principle of JoyAI-Image is the closed-loop collaboration between understanding, generation, and editing.
+
+JoyAI-Image-Edit supports general image editing as well as spatial editing capabilities including object move, object rotation, and camera control.
+
+| Model | Description | Download |
+|:-----:|:-----------:|:--------:|
+| JoyAI-Image-Edit | Instruction-guided image editing with precise and controllable spatial manipulation | [Hugging Face](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Diffusers) |
+
+```python
+import torch
+from diffusers import JoyImageEditPipeline
+from diffusers.utils import load_image
+
+pipeline = JoyImageEditPipeline.from_pretrained(
+    "jdopensource/JoyAI-Image-Edit-Diffusers", torch_dtype=torch.bfloat16
+)
+pipeline.to("cuda")
+
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
+prompt = "Add wings to the astronaut."
+
+output = pipeline(
+    image=image,
+    prompt=prompt,
+    num_inference_steps=40,
+    guidance_scale=4.0,
+    generator=torch.Generator("cuda").manual_seed(0),
+).images[0]
+output.save("joyimage_edit_output.png")
+```
+
+## Spatial editing
+
+JoyAI-Image supports three spatial editing prompt patterns: **Object Move**, **Object Rotation**, and **Camera Control**. For best results, follow the prompt templates below as closely as possible. For more information, refer to [SpatialEdit](https://github.com/EasonXiao-888/SpatialEdit).
+
+### Object Move
+
+Move a target object into a specified region marked by a red box in the input image.
+
+```text
+Move the <object> into the red box and finally remove the red box.
+```
+
+### Object Rotation
+
+Rotate an object to a specific canonical view. Supported `<view>` values: `front`, `right`, `left`, `rear`, `front right`, `front left`, `rear right`, `rear left`.
+
+```text
+Rotate the <object> to show the <view> side view.
+```
+
+### Camera Control
+
+Change the camera viewpoint while keeping the 3D scene unchanged.
+
+```text
+Move the camera.
+- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
+- Camera zoom: in/out/unchanged.
+- Keep the 3D scene static; only change the viewpoint.
+```
+
+## JoyImageEditPipeline
+
+[[autodoc]] JoyImageEditPipeline
+  - all
+  - __call__
+
+## JoyImageEditPipelineOutput
+
+[[autodoc]] pipelines.joyimage.pipeline_output.JoyImageEditPipelineOutput
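
The three spatial-editing templates in the added `JoyAI-Image-Edit` doc are plain strings, so they can be filled in programmatically before being passed as `prompt`. The sketch below is illustrative only: the function and constant names are hypothetical and not part of diffusers, and reading `in/out/unchanged` as "pick exactly one of the three" is an assumption about the template, as is the numeric formatting of the angles.

```python
# Hypothetical helpers (not a diffusers API) that fill the prompt templates
# quoted in the documentation diff above.

SUPPORTED_VIEWS = (
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
)
# Assumption: the template's "in/out/unchanged" means one of these values.
SUPPORTED_ZOOMS = ("in", "out", "unchanged")


def object_move_prompt(obj: str) -> str:
    """Object Move: the target region is the red box drawn on the input image."""
    return f"Move the {obj} into the red box and finally remove the red box."


def object_rotation_prompt(obj: str, view: str) -> str:
    """Object Rotation: reject views the doc does not list as supported."""
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view {view!r}; expected one of {SUPPORTED_VIEWS}")
    return f"Rotate the {obj} to show the {view} side view."


def camera_control_prompt(yaw, pitch, zoom: str = "unchanged") -> str:
    """Camera Control: a four-line prompt, one line per template line."""
    if zoom not in SUPPORTED_ZOOMS:
        raise ValueError(f"unsupported zoom {zoom!r}; expected one of {SUPPORTED_ZOOMS}")
    return "\n".join(
        [
            "Move the camera.",
            f"- Camera rotation: Yaw {yaw}°, Pitch {pitch}°.",
            f"- Camera zoom: {zoom}.",
            "- Keep the 3D scene static; only change the viewpoint.",
        ]
    )
```

A string built this way, for example `object_rotation_prompt("car", "front left")`, would take the place of `prompt` in the pipeline call shown in the doc.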

docs/source/en/optimization/cache.md

Lines changed: 0 additions & 2 deletions
@@ -118,8 +118,6 @@ pipe.transformer.enable_cache(config)

 MagCache relies on **Magnitude Ratios** (`mag_ratios`), which describe this decay curve. These ratios are specific to the model checkpoint and scheduler.

-### Usage
-
 To use MagCache, you typically follow a two-step process: **Calibration** and **Inference**.

 1. **Calibration**: Run inference once with `calibrate=True`. The hook will measure the residual magnitudes and print the calculated ratios to the console.

src/diffusers/__init__.py

Lines changed: 5 additions & 4 deletions
@@ -22,6 +22,7 @@
     is_torchao_available,
     is_torchsde_available,
     is_transformers_available,
+    is_transformers_flax_compatible,
     is_transformers_version,
 )

@@ -861,7 +862,6 @@
     _import_structure["models.modeling_flax_utils"] = ["FlaxModelMixin"]
     _import_structure["models.unets.unet_2d_condition_flax"] = ["FlaxUNet2DConditionModel"]
     _import_structure["models.vae_flax"] = ["FlaxAutoencoderKL"]
-    _import_structure["pipelines"].extend(["FlaxDiffusionPipeline"])
     _import_structure["schedulers"].extend(
         [
             "FlaxDDIMScheduler",
@@ -878,7 +878,7 @@


 try:
-    if not (is_flax_available() and is_transformers_available()):
+    if not (is_flax_available() and is_transformers_available() and is_transformers_flax_compatible()):
         raise OptionalDependencyNotAvailable()
 except OptionalDependencyNotAvailable:
     from .utils import dummy_flax_and_transformers_objects # noqa F403
@@ -891,6 +891,7 @@
 else:
     _import_structure["pipelines"].extend(
         [
+            "FlaxDiffusionPipeline",
             "FlaxStableDiffusionControlNetPipeline",
             "FlaxStableDiffusionImg2ImgPipeline",
             "FlaxStableDiffusionInpaintPipeline",
@@ -1620,7 +1621,6 @@
         from .models.modeling_flax_utils import FlaxModelMixin
         from .models.unets.unet_2d_condition_flax import FlaxUNet2DConditionModel
         from .models.vae_flax import FlaxAutoencoderKL
-        from .pipelines import FlaxDiffusionPipeline
         from .schedulers import (
             FlaxDDIMScheduler,
             FlaxDDPMScheduler,
@@ -1634,12 +1634,13 @@
         )

     try:
-        if not (is_flax_available() and is_transformers_available()):
+        if not (is_flax_available() and is_transformers_available() and is_transformers_flax_compatible()):
             raise OptionalDependencyNotAvailable()
     except OptionalDependencyNotAvailable:
         from .utils.dummy_flax_and_transformers_objects import * # noqa F403
     else:
         from .pipelines import (
+            FlaxDiffusionPipeline,
             FlaxStableDiffusionControlNetPipeline,
             FlaxStableDiffusionImg2ImgPipeline,
             FlaxStableDiffusionInpaintPipeline,
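
The net effect of the `__init__.py` hunks is that `FlaxDiffusionPipeline` moves from the Flax-only block into the block guarded by all three checks. A minimal standalone sketch of that guard pattern follows. It is an assumption-laden model of the control flow, not the diffusers implementation: the function name and the returned tuple are hypothetical, the availability checks are passed in as booleans instead of being called, and the "dummy" branch only stands in for whatever `dummy_flax_and_transformers_objects` provides.

```python
class OptionalDependencyNotAvailable(Exception):
    """Local stand-in for the exception of the same name used in the diff."""


# Names taken from the hunk at new line 894; the real list is longer.
FLAX_PIPELINE_NAMES = [
    "FlaxDiffusionPipeline",
    "FlaxStableDiffusionControlNetPipeline",
    "FlaxStableDiffusionImg2ImgPipeline",
    "FlaxStableDiffusionInpaintPipeline",
]


def resolve_flax_pipelines(flax_available, transformers_available, transformers_flax_compatible):
    """Hypothetical helper: report which branch of the guard would be taken.

    Returns ("dummy", names) when any requirement is missing and
    ("real", names) when all three hold, mirroring try/except/else in the diff.
    """
    try:
        if not (flax_available and transformers_available and transformers_flax_compatible):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        return ("dummy", list(FLAX_PIPELINE_NAMES))
    else:
        return ("real", list(FLAX_PIPELINE_NAMES))
```

Under this model, an environment with Flax and a transformers release that is not Flax-compatible resolves every name, including `FlaxDiffusionPipeline`, to the dummy branch, which is the behavioral change the commit introduces.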

src/diffusers/modular_pipelines/ernie_image/encoders.py

Lines changed: 12 additions & 5 deletions
@@ -15,16 +15,23 @@
 import json

 import torch
-from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoTokenizer, Mistral3Model

 from ...configuration_utils import FrozenDict
 from ...guiders import ClassifierFreeGuidance
 from ...utils import logging
+from ...utils.import_utils import is_transformers_version
 from ..modular_pipeline import ModularPipelineBlocks, PipelineState
 from ..modular_pipeline_utils import ComponentSpec, InputParam, OutputParam
 from .modular_pipeline import ErnieImageModularPipeline


+if is_transformers_version("<", "5.0.0"):
+    raise ImportError("`ErnieImageModularPipeline` requires `transformers>=5.0.0` for `Ministral3ForCausalLM`.")
+
+from transformers import Ministral3ForCausalLM # noqa: E402
+
+
 logger = logging.get_logger(__name__) # pylint: disable=invalid-name


@@ -38,7 +45,7 @@ def description(self) -> str:
     @property
     def expected_components(self) -> list[ComponentSpec]:
         return [
-            ComponentSpec("pe", AutoModelForCausalLM),
+            ComponentSpec("pe", Ministral3ForCausalLM),
             ComponentSpec("pe_tokenizer", AutoTokenizer),
         ]

@@ -83,7 +90,7 @@ def intermediate_outputs(self) -> list[OutputParam]:

     @staticmethod
     def _enhance_prompt(
-        pe: AutoModelForCausalLM,
+        pe: Ministral3ForCausalLM,
         pe_tokenizer: AutoTokenizer,
         prompt: str,
         device: torch.device,
@@ -160,7 +167,7 @@ def description(self) -> str:
     @property
     def expected_components(self) -> list[ComponentSpec]:
         return [
-            ComponentSpec("text_encoder", AutoModel),
+            ComponentSpec("text_encoder", Mistral3Model),
             ComponentSpec("tokenizer", AutoTokenizer),
             ComponentSpec(
                 "guider",
@@ -200,7 +207,7 @@ def intermediate_outputs(self) -> list[OutputParam]:

     @staticmethod
     def _encode(
-        text_encoder: AutoModel,
+        text_encoder: Mistral3Model,
         tokenizer: AutoTokenizer,
         prompt: list[str],
         device: torch.device,
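
The first `encoders.py` hunk gates a late import behind `is_transformers_version("<", "5.0.0")` and raises `ImportError` at module import time when the check is true. The sketch below shows the general shape of such a gate using only the standard library. Everything in it is hypothetical: `version_matches` and `require_version` are illustrative names, the installed version is passed in explicitly instead of being detected, and the parser accepts only plain numeric versions such as `5.0.0`. Pre-release or dev suffixes would need a real version parser such as `packaging.version`.

```python
import operator

# Comparison operators keyed by the same string form the diff passes ("<").
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def _parse(version: str) -> tuple[int, ...]:
    """Parse 'X.Y.Z' into a tuple of ints, zero-padded to three components.

    Raises ValueError for anything non-numeric (e.g. '5.0.0rc1').
    """
    parts = [int(piece) for piece in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def version_matches(installed: str, op: str, required: str) -> bool:
    """Hypothetical analogue of a version check: `installed <op> required`."""
    return _OPS[op](_parse(installed), _parse(required))


def require_version(installed: str, minimum: str, feature: str) -> None:
    """Raise ImportError when `installed` is older than `minimum`."""
    if version_matches(installed, "<", minimum):
        raise ImportError(f"`{feature}` requires `transformers>={minimum}`.")
```

Failing at import time, as the diff does, surfaces the incompatibility before any component is constructed, at the cost of making the whole module unimportable on older releases.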
