A Qwen3-VL-based transformer for HiDream-O1-Image that operates on raw pixel patches.
HiDream-O1 does not use a VAE. The transformer predicts raw RGB pixel patches through the O1 denoising path added on top of Qwen3-VL.
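To illustrate what working on raw RGB pixel patches means without a VAE, here is a minimal sketch of splitting an image tensor into flattened patches and reassembling it. It is not the model's implementation: the `patchify` and `unpatchify` helpers are hypothetical names, and the patch size of 16 and the channel-first flattening order are assumptions chosen for the example, since the patch layout HiDream-O1 actually uses is not stated here.

```python
import torch


def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Hypothetical helper: (B, C, H, W) -> (B, num_patches, C * p * p)."""
    b, c, h, w = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Split each spatial axis into (grid, patch) pairs.
    x = images.reshape(b, c, gh, patch_size, gw, patch_size)
    # Move the grid axes forward: (B, gh, gw, C, p, p).
    x = x.permute(0, 2, 4, 1, 3, 5)
    return x.reshape(b, gh * gw, c * patch_size * patch_size)


def unpatchify(
    patches: torch.Tensor,
    patch_size: int,
    height: int,
    width: int,
    channels: int = 3,
) -> torch.Tensor:
    """Hypothetical helper: inverse of patchify, back to (B, C, H, W)."""
    b = patches.shape[0]
    gh, gw = height // patch_size, width // patch_size
    x = patches.reshape(b, gh, gw, channels, patch_size, patch_size)
    # Restore the interleaved layout: (B, C, gh, p, gw, p).
    x = x.permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, channels, height, width)


# Assumed example sizes: one 64x64 RGB image, 16x16 patches.
image = torch.rand(1, 3, 64, 64)
patches = patchify(image, patch_size=16)
restored = unpatchify(patches, patch_size=16, height=64, width=64)
```

Because no latent encoder is involved, a predicted patch sequence of this kind maps back to an image through a pure reshape, which is the property the sketch is meant to show.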
The model can be loaded with the following code snippet.
```python
import torch
from diffusers import HiDreamO1Transformer2DModel

transformer = HiDreamO1Transformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-O1-Image",
    torch_dtype=torch.bfloat16,
)
```

[[autodoc]] HiDreamO1Transformer2DModel