<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# JoyAI-Image

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

JoyAI-Image is a multimodal foundation model specialized in instruction-guided image editing. It builds on strong spatial understanding (scene parsing, relational grounding, and instruction decomposition) to apply complex modifications precisely to the specified regions, enabling precise and controllable edits.

## Key Features

- 🌟 **Unified Multimodal Understanding and Generation**: Combines powerful image understanding with generation capabilities in a single model.
- 🌟 **Spatial Editing**: Supports precise spatial edits, including object movement, rotation, and camera control.
- 🌟 **Instruction Following**: Accurately interprets user instructions for image modifications while preserving image quality.
- 🌟 **Qwen2.5-VL Integration**: Leverages Qwen2.5-VL for enhanced multimodal understanding.

For more details, please refer to the [JoyAI-Image GitHub](https://github.com/jd-opensource/JoyAI-Image).

## Usage Example

```py
import torch
from diffusers import JoyAIImagePipeline
from diffusers.utils import load_image

pipe = JoyAIImagePipeline.from_pretrained("path/to/converted/checkpoint", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load the image to edit
input_image = load_image("path/to/input.png")

prompt = "Move the apple into the red box and finally remove the red box."
image = pipe(
    prompt,
    image=input_image,
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("./output.png")
```


### Supported Prompt Patterns

#### 1. Object Move
```text
Move the <object> into the red box and finally remove the red box.
```

#### 2. Object Rotation
```text
Rotate the <object> to show the <view> side view.
```
Supported views: front, right, left, rear, front right, front left, rear right, rear left.

#### 3. Camera Control
```text
Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.
```
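
The templates above can also be filled in programmatically before being passed to the pipeline. The helpers below are an illustrative sketch only; the function names and the `SUPPORTED_VIEWS` set are not part of `diffusers` or the JoyAI codebase, they simply mirror the patterns listed above.

```python
# Hypothetical helpers that fill in the JoyAI-Image prompt templates.
# These names are illustrative; only the template text comes from the docs.
SUPPORTED_VIEWS = {
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
}

def build_move_prompt(obj: str) -> str:
    """Template 1: move an object into the red box marker."""
    return f"Move the {obj} into the red box and finally remove the red box."

def build_rotation_prompt(obj: str, view: str) -> str:
    """Template 2: rotate an object to one of the supported side views."""
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view: {view!r}")
    return f"Rotate the {obj} to show the {view} side view."

def build_camera_prompt(yaw: float, pitch: float, zoom: str = "unchanged") -> str:
    """Template 3: camera control with yaw/pitch rotation and zoom."""
    return (
        "Move the camera.\n"
        f"- Camera rotation: Yaw {yaw}°, Pitch {pitch}°.\n"
        f"- Camera zoom: {zoom}.\n"
        "- Keep the 3D scene static; only change the viewpoint."
    )
```

The resulting string is passed as the `prompt` argument in the usage example above, e.g. `build_rotation_prompt("car", "rear left")`.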

This pipeline was contributed by the JDopensource team. The original codebase can be found [here](https://github.com/jd-opensource/JoyAI-Image).


## Available Models

<div style="overflow-x: auto; margin-bottom: 16px;">
  <table style="border-collapse: collapse; width: 100%;">
    <thead>
      <tr>
        <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th>
        <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Type</th>
        <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th>
        <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">JoyAI‑Image‑Edit</td>
        <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image Editing</td>
        <td style="padding: 8px; border: 1px solid #d0d7de;">Final release. Specialized model for instruction-guided image editing.</td>
        <td style="padding: 8px; border: 1px solid #d0d7de;">
          <span style="white-space: nowrap;">🤗 <a href="https://huggingface.co/jdopensource/JoyAI-Image-Edit">Hugging Face</a></span>
        </td>
      </tr>
    </tbody>
  </table>
</div>

## Converting Original Checkpoint to Diffusers Format

If you have the original JoyAI checkpoint, you can convert it to the Diffusers format using the provided conversion script:

```bash
python scripts/convert_joyai_image_to_diffusers.py \
  --source_path /path/to/original/JoyAI-Image-Edit \
  --output_path /path/to/converted/checkpoint \
  --dtype bf16
```

After conversion, load the model with:

```py
from diffusers import JoyAIImagePipeline

pipe = JoyAIImagePipeline.from_pretrained("/path/to/converted/checkpoint")
```


## JoyAIImagePipeline

[[autodoc]] JoyAIImagePipeline
  - all
  - __call__


## JoyAIImagePipelineOutput

[[autodoc]] pipelines.joyai_image.pipeline_output.JoyAIImagePipelineOutput