Commit 2ccc45f

upd
Signed-off-by: Lancer <maruixiang6688@gmail.com>
1 parent 5ebbbd7 commit 2ccc45f

17 files changed

Lines changed: 797 additions & 411 deletions

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -370,6 +370,8 @@
   title: LatteTransformer3DModel
 - local: api/models/longcat_image_transformer2d
   title: LongCatImageTransformer2DModel
+- local: api/models/joyai_image_transformer3d
+  title: JoyAIImageTransformer3DModel
 - local: api/models/ltx2_video_transformer3d
   title: LTX2VideoTransformer3DModel
 - local: api/models/ltx_video_transformer3d
@@ -466,6 +468,8 @@
   title: AutoencoderKLQwenImage
 - local: api/models/autoencoder_kl_wan
   title: AutoencoderKLWan
+- local: api/models/autoencoder_kl_joyai_image
+  title: JoyAIImageVAE
 - local: api/models/autoencoder_rae
   title: AutoencoderRAE
 - local: api/models/consistency_decoder_vae
@@ -558,6 +562,8 @@
   title: Kandinsky 5.0 Image
 - local: api/pipelines/kolors
   title: Kolors
+- local: api/pipelines/joyai_image
+  title: JoyAI-Image
 - local: api/pipelines/latent_consistency_models
   title: Latent Consistency Models
 - local: api/pipelines/latent_diffusion

docs/source/en/api/models/autoencoder_kl_joyai_image.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# JoyAIImageVAE

The 3D variational autoencoder (VAE) model with KL loss used in JoyAI-Image by JDopensource.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import JoyAIImageVAE

vae = JoyAIImageVAE.from_pretrained("path/to/checkpoint", subfolder="vae", torch_dtype=torch.bfloat16)
```

## JoyAIImageVAE

[[autodoc]] JoyAIImageVAE
  - decode
  - all


## AutoencoderKLOutput

[[autodoc]] diffusers.models.autoencoders.autoencoder_kl.AutoencoderKLOutput

docs/source/en/api/models/joyai_image_transformer3d.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# JoyAIImageTransformer3DModel

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import JoyAIImageTransformer3DModel

transformer = JoyAIImageTransformer3DModel.from_pretrained("path/to/checkpoint", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## JoyAIImageTransformer3DModel

[[autodoc]] JoyAIImageTransformer3DModel

docs/source/en/api/pipelines/joyai_image.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# JoyAI-Image

<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

JoyAI-Image is a multimodal foundation model specialized in instruction-guided image editing. It enables precise, controllable edits by leveraging strong spatial understanding (scene parsing, relational grounding, and instruction decomposition), so that complex modifications are applied accurately to the specified regions.

### Key Features
- 🌟 **Unified Multimodal Understanding and Generation**: Combines powerful image understanding with generation capabilities in a single model.
- 🌟 **Spatial Editing**: Supports precise spatial editing, including object movement, rotation, and camera control.
- 🌟 **Instruction Following**: Accurately interprets user instructions for image modifications while preserving image quality.
- 🌟 **Qwen2.5-VL Integration**: Leverages Qwen2.5-VL for enhanced multimodal understanding.

For more details, please refer to the [JoyAI-Image GitHub repository](https://github.com/jd-opensource/JoyAI-Image).

## Usage Example

```py
import torch
from diffusers import JoyAIImagePipeline
from diffusers.utils import load_image

pipe = JoyAIImagePipeline.from_pretrained("path/to/converted/checkpoint", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# The image to edit; replace with your own file path or URL.
input_image = load_image("path/to/input.png")
prompt = "Move the apple into the red box and finally remove the red box."
image = pipe(
    prompt,
    image=input_image,
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("./output.png")
```

### Supported Prompt Patterns

#### 1. Object Move
```text
Move the <object> into the red box and finally remove the red box.
```

#### 2. Object Rotation
```text
Rotate the <object> to show the <view> side view.
```
Supported views: front, right, left, rear, front right, front left, rear right, rear left
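
When rotation prompts are generated in code, a small helper can fill the template and reject views outside the supported list. The function below is a hypothetical sketch, not part of Diffusers or the JoyAI-Image codebase; it only performs string substitution on the Object Rotation template and normalizes case and whitespace in the view name.

```python
# Hypothetical helper (not part of Diffusers): fills the Object Rotation template
# and rejects views that are not in the documented list of supported views.
SUPPORTED_VIEWS = (
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
)


def rotation_prompt(obj: str, view: str) -> str:
    """Return 'Rotate the <object> to show the <view> side view.' for a supported view."""
    # Lowercase and collapse repeated/surrounding whitespace before validating.
    view = " ".join(view.lower().split())
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view {view!r}; expected one of {SUPPORTED_VIEWS}")
    return f"Rotate the {obj} to show the {view} side view."


print(rotation_prompt("chair", "Rear Left"))
# prints: Rotate the chair to show the rear left side view.
```

Validating the view up front keeps typos such as "back" from silently producing a prompt pattern the model was not documented to handle.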

#### 3. Camera Control
```text
Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.
```
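
The camera-control template can also be assembled in code. Below is a hypothetical helper (not part of Diffusers) that fills in the yaw and pitch angles in degrees and the zoom mode. The template does not state a sign convention or a valid range for the angles, so this sketch passes them through unchanged and only validates the zoom value.

```python
# Hypothetical helper (not part of Diffusers): fills the Camera Control template.
# Angles are inserted as given (no range or sign checks); zoom must be one of ZOOM_OPTIONS.
ZOOM_OPTIONS = ("in", "out", "unchanged")


def camera_prompt(yaw: float, pitch: float, zoom: str = "unchanged") -> str:
    if zoom not in ZOOM_OPTIONS:
        raise ValueError(f"zoom must be one of {ZOOM_OPTIONS}, got {zoom!r}")
    lines = [
        "Move the camera.",
        # ':g' drops trailing zeros, so 45 -> "45" and 30.5 -> "30.5".
        f"- Camera rotation: Yaw {yaw:g}°, Pitch {pitch:g}°.",
        f"- Camera zoom: {zoom}.",
        "- Keep the 3D scene static; only change the viewpoint.",
    ]
    return "\n".join(lines)


print(camera_prompt(45, -15, zoom="in"))
```

The returned string can be passed as the `prompt` argument in the usage example above.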

This pipeline was contributed by the JDopensource team. The original codebase can be found [here](https://github.com/jd-opensource/JoyAI-Image).


## Available Models
<div style="overflow-x: auto; margin-bottom: 16px;">
<table style="border-collapse: collapse; width: 100%;">
<thead>
<tr>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Type</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">JoyAI&#8209;Image&#8209;Edit</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image Editing</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Final Release. Specialized model for instruction-guided image editing.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/jdopensource/JoyAI-Image-Edit">Hugging Face</a></span>
</td>
</tr>
</tbody>
</table>
</div>

## Converting Original Checkpoint to Diffusers Format

If you have the original JoyAI-Image checkpoint, you can convert it to the Diffusers format with the provided conversion script:

```bash
python scripts/convert_joyai_image_to_diffusers.py \
  --source_path /path/to/original/JoyAI-Image-Edit \
  --output_path /path/to/converted/checkpoint \
  --dtype bf16
```

After conversion, load the pipeline with:

```py
from diffusers import JoyAIImagePipeline
pipe = JoyAIImagePipeline.from_pretrained("/path/to/converted/checkpoint")
```
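
Before loading, you may want to confirm that the conversion produced a Diffusers-style pipeline directory. Pipelines saved in the Diffusers format have a `model_index.json` at the root that maps each component name to a `[library, class_name]` pair, with each component stored in a subfolder of the same name. The sketch below assumes the converted checkpoint follows that layout and, by default, checks only the `transformer` and `vae` components, whose subfolder names appear in the model loading snippets; `check_converted_checkpoint` is a hypothetical helper, not a Diffusers API.

```python
import json
from pathlib import Path


def check_converted_checkpoint(path, required=("transformer", "vae")):
    """Hypothetical sanity check for a converted, Diffusers-format pipeline directory.

    Returns the non-metadata entries of model_index.json as
    {component_name: [library, class_name]} and raises if a required
    component entry or its subfolder is missing.
    """
    root = Path(path)
    # read_text raises FileNotFoundError if model_index.json does not exist.
    index = json.loads((root / "model_index.json").read_text(encoding="utf-8"))
    # Keys such as "_class_name" and "_diffusers_version" are metadata, not components.
    components = {name: spec for name, spec in index.items() if not name.startswith("_")}
    for name in required:
        if name not in components:
            raise KeyError(f"model_index.json has no {name!r} entry")
        if not (root / name).is_dir():
            raise FileNotFoundError(f"missing component subfolder: {root / name}")
    return components
```

For example, `check_converted_checkpoint("/path/to/converted/checkpoint")` returns the component mapping, or raises `KeyError` or `FileNotFoundError` naming the first missing piece. Checking only the directory layout keeps the sketch free of `torch` and `diffusers` imports; it does not validate the weights themselves.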


## JoyAIImagePipeline

[[autodoc]] JoyAIImagePipeline
  - all
  - __call__


## JoyAIImagePipelineOutput

[[autodoc]] pipelines.joyai_image.pipeline_output.JoyAIImagePipelineOutput

docs/source/en/api/pipelines/overview.md

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Kandinsky 2.2](kandinsky_v22) | text2image, image2image, inpainting |
 | [Kandinsky 3](kandinsky3) | text2image, image2image |
 | [Kolors](kolors) | text2image |
+| [JoyAI-Image](joyai_image) | image editing |
 | [Latent Consistency Models](latent_consistency_models) | text2image |
 | [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
 | [Latte](latte) | text2image |
