diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
Lines changed: 2 additions & 0 deletions
diff --git a/docs/source/en/api/attnprocessor.md b/docs/source/en/api/attnprocessor.md
Lines changed: 4 additions & 0 deletions
diff --git a/docs/source/en/api/pipelines/dreamlite.md b/docs/source/en/api/pipelines/dreamlite.md
Lines changed: 157 additions & 0 deletions
diff --git a/src/diffusers/__init__.py b/src/diffusers/__init__.py
Lines changed: 10 additions & 0 deletions
diff --git a/src/diffusers/models/__init__.py b/src/diffusers/models/__init__.py
Lines changed: 4 additions & 0 deletions
diff --git a/src/diffusers/models/transformers/__init__.py b/src/diffusers/models/transformers/__init__.py
Lines changed: 1 addition & 0 deletions
@@ -527,6 +527,8 @@
         title: DeepFloyd IF
       - local: api/pipelines/dit
         title: DiT
+      - local: api/pipelines/dreamlite
+        title: DreamLite
       - local: api/pipelines/easyanimate
         title: EasyAnimate
       - local: api/pipelines/ernie_image
 
@@ -44,6 +44,10 @@ An attention processor is a class for applying different types of attention mech
 
 [[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0
 
+## DreamLite
+
+[[autodoc]] models.unets.unet_dreamlite.DreamLiteAttnProcessor2_0
+
 ## CrossFrameAttnProcessor
 
 [[autodoc]] pipelines.deprecated.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
 
@@ -0,0 +1,157 @@
+<!--Copyright 2026 The ByteDance Authors. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# DreamLite
+
+DreamLite is a text-to-image and image-editing model from ByteDance. It pairs a custom 2D U-Net
+(`DreamLiteUNetModel`) with the `Qwen3-VL` multimodal encoder as its prompt / image-instruction encoder,
+and uses an `AutoencoderTiny` (TAESD-style) VAE for fast latent encode/decode.
+
+Two pipelines are exposed:
+
+| Pipeline | Modes | CFG | Use case |
+|---|---|---|---|
+| [`DreamLitePipeline`] | text-to-image **and** image-editing (auto-selected by whether `image` is `None`) | 3-branch dual CFG (`guidance_scale` on text branch, `image_guidance_scale` on image branch, à la InstructPix2Pix) | Highest quality |
+| [`DreamLiteMobilePipeline`] | text-to-image **and** image-editing (auto-selected by whether `image` is `None`) | None — distilled, single UNet forward per step | On-device / low-latency |
+
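The 3-branch guidance in the table can be made concrete with the InstructPix2Pix combination rule. The sketch below assumes DreamLite combines its branches the same way diffusers' `StableDiffusionInstructPix2PixPipeline` does; this has not been checked against DreamLite's own source. `dual_cfg` is a hypothetical helper, and plain Python floats stand in for the U-Net's noise-prediction tensors (the arithmetic is element-wise either way).

```python
from typing import List, Sequence


def dual_cfg(
    noise_uncond: Sequence[float],  # branch 1: no text, no source image
    noise_image: Sequence[float],   # branch 2: source image only
    noise_full: Sequence[float],    # branch 3: source image + text instruction
    guidance_scale: float,          # weight of the text direction (full - image)
    image_guidance_scale: float,    # weight of the image direction (image - uncond)
) -> List[float]:
    """InstructPix2Pix-style 3-branch guidance (hypothetical helper, assumed formula)."""
    if not (len(noise_uncond) == len(noise_image) == len(noise_full)):
        raise ValueError("all three branches must have the same shape")
    return [
        u + image_guidance_scale * (i - u) + guidance_scale * (f - i)
        for u, i, f in zip(noise_uncond, noise_image, noise_full)
    ]


# Toy one-element predictions: 0 + 1.5 * (1 - 0) + 7.5 * (3 - 1) = 16.5
print(dual_cfg([0.0], [1.0], [3.0], guidance_scale=7.5, image_guidance_scale=1.5))  # prints [16.5]
```

Setting both scales to `1.0` reduces the result to `noise_full`, i.e. no extra guidance. The `7.5` / `1.5` values are illustrative only, not DreamLite defaults.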
+Official checkpoints:
+
+* Base model: [carlofkl/DreamLite-base](https://huggingface.co/carlofkl/DreamLite-base)
+* Distilled mobile model: [carlofkl/DreamLite-mobile](https://huggingface.co/carlofkl/DreamLite-mobile)
+
+> [!TIP]
+> Both pipelines auto-detect text-to-image vs. image-editing mode from whether the `image` argument is
+> provided. There is no separate `Img2Img` class.
+
+> [!TIP]
+> When loading an input image for editing, prefer `diffusers.utils.load_image(...)` over raw `PIL.Image.open(...)`.
+> `load_image` enforces an RGB conversion and applies EXIF orientation, both of which the pipeline assumes.
+> A plain `Image.open` of an RGBA / palette / EXIF-rotated source will silently produce a different latent
+> conditioning and degrade output quality.
+
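If the source image is already a `PIL.Image.Image` in memory, it can still go through `load_image`, which accepts PIL images as well as URLs and local paths. To show what the normalization involves, the sketch below approximates it with Pillow directly. `normalize_source_image` is a hypothetical helper, and treating these two Pillow calls as equivalent to `load_image` is an assumption, not a guarantee.

```python
from PIL import Image, ImageOps


def normalize_source_image(img: Image.Image) -> Image.Image:
    """Hypothetical helper approximating the normalization `load_image` applies."""
    # Bake the EXIF Orientation tag into the pixels (an unrotated copy if there is no tag).
    img = ImageOps.exif_transpose(img)
    # Collapse RGBA / palette / grayscale modes into 3-channel RGB.
    return img.convert("RGB")


# An RGBA source (e.g. a PNG with transparency) and a grayscale source.
for src in (Image.new("RGBA", (64, 32), (255, 0, 0, 128)), Image.new("L", (64, 32), 128)):
    out = normalize_source_image(src)
    print(src.mode, "->", out.mode, out.size)
```

Either way, the goal is that the pipeline always receives an upright, 3-channel RGB image.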
+## Text-to-image (Base)
+
+```python
+import torch
+from diffusers import DreamLitePipeline
+
+pipe = DreamLitePipeline.from_pretrained("carlofkl/DreamLite-base", revision="diffusers", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+image = pipe(
+    prompt="a dog running on the grass",
+    negative_prompt="",
+    height=1024,
+    width=1024,
+    num_inference_steps=28,
+    generator=torch.Generator("cpu").manual_seed(42),
+).images[0]
+image.save("dreamlite_t2i.png")
+```
+
+## Image editing (Base)
+
+Pass an `image` to enter edit mode. Both `guidance_scale` (text branch) and `image_guidance_scale`
+(image branch) are active here.
+
+```python
+import torch
+from diffusers import DreamLitePipeline
+from diffusers.utils import load_image
+
+pipe = DreamLitePipeline.from_pretrained("carlofkl/DreamLite-base", revision="diffusers", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+source = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
+
+image = pipe(
+    prompt="turn the cat into a corgi",
+    image=source,
+    height=1024,
+    width=1024,
+    num_inference_steps=28,
+    generator=torch.Generator("cpu").manual_seed(42),
+).images[0]
+image.save("dreamlite_edit.png")
+```
+
+## Text-to-image (Mobile)
+
+The mobile pipeline is distilled and skips CFG entirely, running a single UNet forward pass per step. It accepts the
+same `prompt` / `height` / `width` / `num_inference_steps` arguments but **ignores** `guidance_scale` and
+`image_guidance_scale` if they are passed (a warning is logged).
+
+```python
+import torch
+from diffusers import DreamLiteMobilePipeline
+
+pipe = DreamLiteMobilePipeline.from_pretrained("carlofkl/DreamLite-mobile", revision="diffusers", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+image = pipe(
+    prompt="a dog running on the grass",
+    height=1024,
+    width=1024,
+    num_inference_steps=4,
+    generator=torch.Generator("cpu").manual_seed(42),
+).images[0]
+image.save("dreamlite_mobile_t2i.png")
+```
+
+## Image editing (Mobile)
+
+```python
+import torch
+from diffusers import DreamLiteMobilePipeline
+from diffusers.utils import load_image
+
+pipe = DreamLiteMobilePipeline.from_pretrained("carlofkl/DreamLite-mobile", revision="diffusers", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+source = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
+
+image = pipe(
+    prompt="turn the cat into a corgi",
+    image=source,
+    height=1024,
+    width=1024,
+    num_inference_steps=4,
+    generator=torch.Generator("cpu").manual_seed(42),
+).images[0]
+image.save("dreamlite_mobile_edit.png")
+```
+
+## Notes and limitations
+
+* Both pipelines force `batch_size = 1` internally, so each call handles a single prompt;
+  `num_images_per_prompt` sets how many samples are drawn for that prompt rather than batching prompts in parallel.
+* Because the prompt encoder is the multimodal `Qwen3-VL` model, loading the full pipeline requires enough
+  GPU memory for both the U-Net and the Qwen3-VL encoder (~4 GB + ~0.7 GB
+  in bf16 for the base release).
+* The VAE is `AutoencoderTiny` and exposes `encoder_block_out_channels`; `vae_scale_factor` is derived
+  from it at pipeline init time.
+
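The `vae_scale_factor` derivation can be sketched as follows. This assumes DreamLite uses the same convention as other diffusers pipelines, `2 ** (len(block_out_channels) - 1)`, applied to `encoder_block_out_channels`. In `AutoencoderTiny`, every encoder stage after the first begins with a stride-2 convolution, so each extra entry halves the spatial size. `vae_scale_factor` and `latent_size` are hypothetical helpers, and `(64, 64, 64, 64)` is `AutoencoderTiny`'s default rather than a value read from the DreamLite checkpoint. The divisibility check is also an assumed constraint, and the U-Net may require a larger multiple.

```python
from typing import Sequence, Tuple


def vae_scale_factor(encoder_block_out_channels: Sequence[int]) -> int:
    # Assumed convention: every encoder stage after the first halves H and W.
    return 2 ** (len(encoder_block_out_channels) - 1)


def latent_size(height: int, width: int, scale: int) -> Tuple[int, int]:
    # Hypothetical check: pixel sizes must be multiples of the scale factor.
    if height % scale or width % scale:
        raise ValueError(f"height and width must be divisible by {scale}")
    return height // scale, width // scale


scale = vae_scale_factor((64, 64, 64, 64))  # AutoencoderTiny's default config value
print(scale, latent_size(1024, 1024, scale))  # prints: 8 (128, 128)
```

Under these assumptions, a 1024×1024 request maps to a 128×128 latent.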
+## DreamLitePipeline
+
+[[autodoc]] DreamLitePipeline
+    - all
+    - __call__
+
+## DreamLiteMobilePipeline
+
+[[autodoc]] DreamLiteMobilePipeline
+    - all
+    - __call__
+
+## DreamLitePipelineOutput
+
+[[autodoc]] pipelines.dreamlite.pipeline_output.DreamLitePipelineOutput
@@ -254,6 +254,8 @@
             "CosmosControlNetModel",
             "CosmosTransformer3DModel",
             "DiTTransformer2DModel",
+            "DreamLiteTransformer2DModel",
+            "DreamLiteUNetModel",
             "EasyAnimateTransformer3DModel",
             "ErnieImageTransformer2DModel",
             "Flux2Transformer2DModel",
@@ -570,6 +572,9 @@
             "CosmosTextToWorldPipeline",
             "CosmosVideoToWorldPipeline",
             "CycleDiffusionPipeline",
+            "DreamLiteMobilePipeline",
+            "DreamLitePipeline",
+            "DreamLitePipelineOutput",
             "EasyAnimateControlPipeline",
             "EasyAnimateInpaintPipeline",
             "EasyAnimatePipeline",
@@ -1108,6 +1113,8 @@
             CosmosControlNetModel,
             CosmosTransformer3DModel,
             DiTTransformer2DModel,
+            DreamLiteTransformer2DModel,
+            DreamLiteUNetModel,
             EasyAnimateTransformer3DModel,
             ErnieImageTransformer2DModel,
             Flux2Transformer2DModel,
@@ -1399,6 +1406,9 @@
             CosmosTextToWorldPipeline,
             CosmosVideoToWorldPipeline,
             CycleDiffusionPipeline,
+            DreamLiteMobilePipeline,
+            DreamLitePipeline,
+            DreamLitePipelineOutput,
             EasyAnimateControlPipeline,
             EasyAnimateInpaintPipeline,
             EasyAnimatePipeline,
 
@@ -96,6 +96,7 @@
     _import_structure["transformers.stable_audio_transformer"] = ["StableAudioDiTModel"]
     _import_structure["transformers.t5_film_transformer"] = ["T5FilmDecoder"]
     _import_structure["transformers.transformer_2d"] = ["Transformer2DModel"]
+    _import_structure["transformers.transformer_2d_dreamlite"] = ["DreamLiteTransformer2DModel"]
     _import_structure["transformers.transformer_allegro"] = ["AllegroTransformer3DModel"]
     _import_structure["transformers.transformer_anyflow"] = ["AnyFlowTransformer3DModel"]
     _import_structure["transformers.transformer_anyflow_far"] = ["AnyFlowFARTransformer3DModel"]
@@ -145,6 +146,7 @@
     _import_structure["unets.unet_2d"] = ["UNet2DModel"]
     _import_structure["unets.unet_2d_condition"] = ["UNet2DConditionModel"]
     _import_structure["unets.unet_3d_condition"] = ["UNet3DConditionModel"]
+    _import_structure["unets.unet_dreamlite"] = ["DreamLiteUNetModel"]
     _import_structure["unets.unet_i2vgen_xl"] = ["I2VGenXLUNet"]
     _import_structure["unets.unet_kandinsky3"] = ["Kandinsky3UNet"]
     _import_structure["unets.unet_motion_model"] = ["MotionAdapter", "UNetMotionModel"]
@@ -236,6 +238,7 @@
             Cosmos3OmniTransformer,
             CosmosTransformer3DModel,
             DiTTransformer2DModel,
+            DreamLiteTransformer2DModel,
             DualTransformer2DModel,
             EasyAnimateTransformer3DModel,
             ErnieImageTransformer2DModel,
@@ -282,6 +285,7 @@
             ZImageTransformer2DModel,
         )
         from .unets import (
+            DreamLiteUNetModel,
             I2VGenXLUNet,
             Kandinsky3UNet,
             MotionAdapter,
 
@@ -17,6 +17,7 @@
     from .stable_audio_transformer import StableAudioDiTModel
     from .t5_film_transformer import T5FilmDecoder
     from .transformer_2d import Transformer2DModel
+    from .transformer_2d_dreamlite import DreamLiteTransformer2DModel
     from .transformer_allegro import AllegroTransformer3DModel
     from .transformer_anyflow import AnyFlowTransformer3DModel
     from .transformer_anyflow_far import AnyFlowFARTransformer3DModel