Commit 9b0818c

apolinario, JinLiIdeogram, and YiYi Xu authored
Add Ideogram 4 (#13859)
* Add Ideogram 4

  Adds the Ideogram 4 text-to-image model: transformer, standard pipeline, modular pipeline, docs, and tests.

  Checkpoint: ideogram-ai/ideogram-4-nf4

  Co-Authored-By: YiYi Xu <yiyi@huggingface.co>

* Use split q/k/v projections in Ideogram4 attention

  Replace the fused `qkv`/`o` linears with canonical `to_q`/`to_k`/`to_v`/`to_out` projections, matching the standard diffusers attention layout and the split checkpoint format. Mathematically equivalent to the fused form (q/k/v are contiguous row-slices of the fused weight). Drops the now-inapplicable fuse/unfuse overrides.

---------

Co-authored-by: Jin <jin.li@ideogram.ai>
Co-authored-by: YiYi Xu <yiyi@huggingface.co>
1 parent f34de43 commit 9b0818c
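
The equivalence claimed in the commit message (split q/k/v projections give the same result as a fused `qkv` linear because q/k/v are contiguous row-slices of the fused weight) can be checked with a small sketch. This is not code from the commit: the helper names, the tiny sizes, the q-then-k-then-v row order, and the absence of a bias term are all assumptions made for illustration.

```python
import random


def linear(x, weight):
    """Compute y = weight @ x for one input vector x (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def split_fused_qkv(fused_weight, dim):
    """Slice a (3 * dim, in_dim) fused weight into q, k, v row blocks.

    Assumes the fused rows are ordered q, then k, then v.
    """
    return fused_weight[:dim], fused_weight[dim:2 * dim], fused_weight[2 * dim:]


random.seed(0)
dim, in_dim = 4, 6  # hypothetical sizes, chosen only to keep the example small

fused = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(3 * dim)]
x = [random.uniform(-1, 1) for _ in range(in_dim)]

# Fused path: one projection producing q, k, v concatenated.
fused_out = linear(x, fused)

# Split path: three separate projections using row-slices of the same weight.
w_q, w_k, w_v = split_fused_qkv(fused, dim)
split_out = linear(x, w_q) + linear(x, w_k) + linear(x, w_v)

max_abs_diff = max(abs(a - b) for a, b in zip(fused_out, split_out))
```

Each output row depends only on its own weight row, so slicing the weight before or after the projection yields the same values.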

25 files changed

Lines changed: 2951 additions & 0 deletions

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -349,6 +349,8 @@
   title: HunyuanVideo15Transformer3DModel
 - local: api/models/hunyuan_video_transformer_3d
   title: HunyuanVideoTransformer3DModel
+- local: api/models/ideogram4_transformer2d
+  title: Ideogram4Transformer2DModel
 - local: api/models/transformer_joyimage
   title: JoyImageEditTransformer3DModel
 - local: api/models/latte_transformer3d
@@ -541,6 +543,8 @@
   title: Hunyuan-DiT
 - local: api/pipelines/hunyuanimage21
   title: HunyuanImage2.1
+- local: api/pipelines/ideogram4
+  title: Ideogram 4
 - local: api/pipelines/pix2pix
   title: InstructPix2Pix
 - local: api/pipelines/joyimage_edit
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+<!--Copyright 2026 Ideogram AI and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Ideogram4Transformer2DModel
+
+A transformer for image-like data from [Ideogram 4](https://github.com/ideogram-oss/ideogram-4).
+
+## Ideogram4Transformer2DModel
+
+[[autodoc]] Ideogram4Transformer2DModel
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+<!--Copyright 2026 Ideogram AI and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Ideogram 4
+
+Ideogram 4 is a flow-matching text-to-image model that uses a multimodal text encoder and an asymmetric
+classifier-free guidance scheme: a dedicated `unconditional_transformer` produces the negative branch with zeroed text
+features, while the main `transformer` consumes the full packed text + image sequence.
+
+The pipeline defaults are the recommended settings for best quality, so a plain `pipe(prompt)` call produces
+best-quality results out of the box: 48 flow-matching steps on a logit-normal schedule (`mu=0.0`, `std=1.5`) with
+classifier-free guidance held at 7.0 for the main steps and dropped to 3.0 for the final 3 "polish" steps.
+
+Key inference-time knobs are exposed via the pipeline call:
+
+- `num_inference_steps`, `mu`, and `std` control the resolution-aware logit-normal flow-matching schedule.
+- `guidance_scale` (or a full per-step `guidance_schedule`) blends the conditional and unconditional velocities.
+
+## Text-to-image
+
+```python
+import torch
+from diffusers import Ideogram4Pipeline
+
+pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+prompt = "A photo of a cat holding a sign that says hello world"
+# The defaults are the recommended settings for best quality.
+image = pipe(prompt, height=1024, width=1024, generator=torch.Generator("cuda").manual_seed(0)).images[0]
+image.save("ideogram4.png")
+```
+
+## Ideogram4Pipeline
+
+[[autodoc]] Ideogram4Pipeline
+  - all
+  - __call__
+
+## Ideogram4PipelineOutput
+
+[[autodoc]] pipelines.ideogram4.pipeline_output.Ideogram4PipelineOutput

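The defaults described in the Ideogram 4 pipeline docs above (48 steps, a logit-normal schedule with `mu=0.0` and `std=1.5`, guidance 7.0 dropping to 3.0 for the final 3 steps) can be illustrated with a rough sketch. None of this is taken from the pipeline source: the three helper functions are hypothetical, the quantile-based construction is only one possible way to derive timesteps from a logit-normal distribution, the decreasing order assumes t = 1 is pure noise, the resolution-aware part of the schedule is not modeled, and the blend uses the standard classifier-free guidance formula, which the pipeline may or may not apply in exactly this form.

```python
import math
from statistics import NormalDist


def logit_normal_timesteps(num_steps=48, mu=0.0, std=1.5):
    """Return num_steps logit-normal quantiles in (0, 1), largest first.

    Hypothetical helper: takes evenly spaced quantiles of N(mu, std) and
    maps them through a sigmoid, so the values follow a logit-normal law.
    """
    normal = NormalDist(mu=mu, sigma=std)
    timesteps = []
    for i in range(num_steps):
        u = (i + 0.5) / num_steps  # midpoint quantile, strictly inside (0, 1)
        z = normal.inv_cdf(u)
        timesteps.append(1.0 / (1.0 + math.exp(-z)))
    return timesteps[::-1]


def build_guidance_schedule(num_steps=48, main_scale=7.0, polish_scale=3.0, polish_steps=3):
    """Return one guidance scale per step: main_scale, then polish_scale at the end."""
    if not 0 <= polish_steps <= num_steps:
        raise ValueError("polish_steps must be between 0 and num_steps")
    return [main_scale] * (num_steps - polish_steps) + [polish_scale] * polish_steps


def cfg_blend(v_cond, v_uncond, scale):
    """Standard classifier-free guidance: v_uncond + scale * (v_cond - v_uncond)."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


timesteps = logit_normal_timesteps()
guidance_schedule = build_guidance_schedule()
```

Whether `guidance_schedule` in the real pipeline accepts a plain per-step list like this is an assumption; the docs above only say that a full per-step schedule can be passed.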
src/diffusers/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -258,6 +258,7 @@
 "HunyuanVideoFramepackTransformer3DModel",
 "HunyuanVideoTransformer3DModel",
 "I2VGenXLUNet",
+"Ideogram4Transformer2DModel",
 "JoyImageEditTransformer3DModel",
 "Kandinsky3UNet",
 "Kandinsky5Transformer3DModel",
@@ -475,6 +476,8 @@
 "HeliosPyramidModularPipeline",
 "HunyuanVideo15AutoBlocks",
 "HunyuanVideo15ModularPipeline",
+"Ideogram4AutoBlocks",
+"Ideogram4ModularPipeline",
 "LTXAutoBlocks",
 "LTXModularPipeline",
 "QwenImageAutoBlocks",
@@ -590,6 +593,7 @@
 "HunyuanVideoImageToVideoPipeline",
 "HunyuanVideoPipeline",
 "I2VGenXLPipeline",
+"Ideogram4Pipeline",
 "IFImg2ImgPipeline",
 "IFImg2ImgSuperResolutionPipeline",
 "IFInpaintingPipeline",
@@ -1098,6 +1102,7 @@
 HunyuanVideoFramepackTransformer3DModel,
 HunyuanVideoTransformer3DModel,
 I2VGenXLUNet,
+Ideogram4Transformer2DModel,
 JoyImageEditTransformer3DModel,
 Kandinsky3UNet,
 Kandinsky5Transformer3DModel,
@@ -1294,6 +1299,8 @@
 HeliosPyramidModularPipeline,
 HunyuanVideo15AutoBlocks,
 HunyuanVideo15ModularPipeline,
+Ideogram4AutoBlocks,
+Ideogram4ModularPipeline,
 LTXAutoBlocks,
 LTXModularPipeline,
 QwenImageAutoBlocks,
@@ -1405,6 +1412,7 @@
 HunyuanVideoImageToVideoPipeline,
 HunyuanVideoPipeline,
 I2VGenXLPipeline,
+Ideogram4Pipeline,
 IFImg2ImgPipeline,
 IFImg2ImgSuperResolutionPipeline,
 IFInpaintingPipeline,

src/diffusers/models/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -118,6 +118,7 @@
 _import_structure["transformers.transformer_hunyuan_video15"] = ["HunyuanVideo15Transformer3DModel"]
 _import_structure["transformers.transformer_hunyuan_video_framepack"] = ["HunyuanVideoFramepackTransformer3DModel"]
 _import_structure["transformers.transformer_hunyuanimage"] = ["HunyuanImageTransformer2DModel"]
+_import_structure["transformers.transformer_ideogram4"] = ["Ideogram4Transformer2DModel"]
 _import_structure["transformers.transformer_joyimage"] = ["JoyImageEditTransformer3DModel"]
 _import_structure["transformers.transformer_kandinsky"] = ["Kandinsky5Transformer3DModel"]
 _import_structure["transformers.transformer_longcat_audio_dit"] = ["LongCatAudioDiTTransformer"]
@@ -248,6 +249,7 @@
 HunyuanVideo15Transformer3DModel,
 HunyuanVideoFramepackTransformer3DModel,
 HunyuanVideoTransformer3DModel,
+Ideogram4Transformer2DModel,
 JoyImageEditTransformer3DModel,
 Kandinsky5Transformer3DModel,
 LatteTransformer3DModel,

src/diffusers/models/transformers/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@
 from .transformer_hunyuan_video15 import HunyuanVideo15Transformer3DModel
 from .transformer_hunyuan_video_framepack import HunyuanVideoFramepackTransformer3DModel
 from .transformer_hunyuanimage import HunyuanImageTransformer2DModel
+from .transformer_ideogram4 import Ideogram4Transformer2DModel
 from .transformer_joyimage import JoyImageEditTransformer3DModel
 from .transformer_kandinsky import Kandinsky5Transformer3DModel
 from .transformer_longcat_audio_dit import LongCatAudioDiTTransformer
