Commit a50ade4
[Pipelines] Add DreamLite text-to-image and image-edit pipelines (#13815)
* feat(pipelines): add DreamLite text-to-image and image-edit pipelines
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:
* DreamLitePipeline - full 3-branch dual CFG (negative, reference, prompt); supports T2I and I2I editing at 1024x1024.
* DreamLiteMobilePipeline - distilled single-branch variant for on-device inference; no CFG.
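The commit does not spell out how the three branches are combined. A common way to merge a negative, a reference-conditioned, and a full-prompt noise prediction with two scales is the InstructPix2Pix-style dual guidance below. Treat the formula as an assumption, not as DreamLite's confirmed combine step. `combine_dual_cfg` is a hypothetical helper; its defaults are the recommended `guidance_scale=3.5` / `image_guidance_scale=1.5` that this PR later promotes to `DreamLitePipeline.__call__` defaults.

```python
def combine_dual_cfg(noise_neg, noise_ref, noise_prompt, guidance_scale=3.5, image_guidance_scale=1.5):
    """Merge three noise predictions with two guidance scales (assumed formula).

    Works on floats, numpy arrays, or torch tensors, since it only uses + - and scalar *.
    """
    # Image guidance: push away from the negative branch toward the reference-conditioned one.
    image_term = image_guidance_scale * (noise_ref - noise_neg)
    # Text guidance: push away from the reference branch toward the full-prompt one.
    text_term = guidance_scale * (noise_prompt - noise_ref)
    return noise_neg + image_term + text_term
```

With both scales set to 1.0 the expression collapses to the prompt branch alone, which is a quick sanity check. In diffusers pipelines such branches are usually computed in one batched UNet call and split with `.chunk(3)` before combining.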
New model code (isolated in new *_dreamlite.py / unet_dreamlite.py files to avoid modifying shared upstream code; the only shared file touched is attention_processor.py, and only additively):
* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D transformer block.
* models/unets/unet_dreamlite.py - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific down/up/mid blocks.
* models/resnet_dreamlite.py - DreamLite ResNet variants.
* models/attention_processor.py - add DreamLiteAttnProcessor2_0 (pure addition, no existing processor modified).
Pipeline + tests + docs:
* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
test_pipeline_dreamlite_mobile.py} with the standard
PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
with a fake so MagicMock text encoders work without per-test
boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
serialisation, custom attention processor, encode_prompt return
shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
(alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.
Two real bugs surfaced by the mixin test suite are fixed in this
commit:
* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
are now repeated along the batch dimension in both pipelines'
T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
lookup so encode_prompt can be tested in isolation per
PipelineTesterMixin convention.
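Both fixes can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: `repeat_for_images_per_prompt` and `vae_scale_factor` are hypothetical helpers, numpy stands in for torch (`np.repeat(x, n, axis=0)` behaves like `torch.Tensor.repeat_interleave(n, dim=0)`), and the `2 ** (len(...) - 1)` formula plus the fallback of 8 are assumptions borrowed from common diffusers conventions. Only the `encoder_block_out_channels` name comes from the commit.

```python
import numpy as np


def repeat_for_images_per_prompt(prompt_embeds, text_attention_mask, num_images_per_prompt):
    """Repeat each prompt's embeddings and mask consecutively along the batch dim.

    (B, S, D) -> (B * n, S, D) and (B, S) -> (B * n, S), so every generated image
    in the UNet batch lines up with the prompt it came from.
    """
    embeds = np.repeat(prompt_embeds, num_images_per_prompt, axis=0)
    mask = np.repeat(text_attention_mask, num_images_per_prompt, axis=0)
    return embeds, mask


def vae_scale_factor(vae, default=8):
    """Derive the latent downsampling factor, tolerating vae=None (e.g. encode_prompt-only tests)."""
    if vae is None:
        # Assumed fallback; lets the pipeline be constructed without a VAE.
        return default
    return 2 ** (len(vae.config.encoder_block_out_channels) - 1)
```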
SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).
Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.
* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch
The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:
* `main` branch - keeps `model_index.json` pointing at ByteDance's internal package path so the original (non-diffusers) reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to `["diffusers", "DreamLiteUNetModel"]` so this integration loads correctly from `diffusers`.
This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase
Mechanical changes after rebasing onto current `main`:
* `pipeline_dreamlite.py::retrieve_timesteps`: re-synced from
`diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
type hints, expanded docstring, plus the new
`accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
default code path uses `num_inference_steps` (uniform schedule) and never
passes custom `timesteps` / `sigmas`, so the added guards are dead code
for this pipeline; behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
registered the dummy classes auto-generated by `make fix-copies` for
`DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
`DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.
Generated by `make fix-copies`. No hand edits.
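The introspection guards follow a simple pattern: inspect the scheduler's `set_timesteps` signature before forwarding custom `timesteps` or `sigmas`. The sketch below is a trimmed-down version written for illustration (no `device` argument, simplified messages), not the exact diffusers source; `ToyScheduler` is hypothetical and accepts `sigmas` but not `timesteps`.

```python
import inspect


def retrieve_timesteps(scheduler, num_inference_steps=None, timesteps=None, sigmas=None, **kwargs):
    """Trimmed sketch of the introspection-guard pattern."""
    if timesteps is not None and sigmas is not None:
        raise ValueError("Only one of `timesteps` or `sigmas` can be passed.")
    # Which keyword arguments does this scheduler's set_timesteps accept?
    accepted = set(inspect.signature(scheduler.set_timesteps).parameters.keys())
    if timesteps is not None:
        if "timesteps" not in accepted:
            raise ValueError(f"{type(scheduler).__name__} does not support custom timesteps.")
        scheduler.set_timesteps(timesteps=timesteps, **kwargs)
        num_inference_steps = len(scheduler.timesteps)
    elif sigmas is not None:
        if "sigmas" not in accepted:
            raise ValueError(f"{type(scheduler).__name__} does not support custom sigmas.")
        scheduler.set_timesteps(sigmas=sigmas, **kwargs)
        num_inference_steps = len(scheduler.timesteps)
    else:
        # The path DreamLite uses: a uniform schedule from num_inference_steps only.
        scheduler.set_timesteps(num_inference_steps, **kwargs)
    return scheduler.timesteps, num_inference_steps


class ToyScheduler:
    """Hypothetical scheduler: supports `sigmas`, not `timesteps`."""

    def set_timesteps(self, num_inference_steps=None, sigmas=None):
        if sigmas is not None:
            self.timesteps = [s * 1000 for s in sigmas]
        else:
            self.timesteps = list(range(num_inference_steps - 1, -1, -1))
```

Because DreamLite only ever takes the final `else` branch, the guards never fire for it, which is why the re-sync is behaviour-neutral.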
* docs(dreamlite): register attention processor + split combined docstring entries
- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md
(fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale'
entries in the two pipeline docstrings; add a complete Args block to
DreamLiteTransformer2DModel.forward
(fixes check_forward_call_docstrings.py).
No behavioral change.
* refactor(dreamlite): address review feedback from #13815
- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
- Wire max_sequence_length into the tokenizer call for generate mode (review #6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)
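A masked-hidden-state extractor paired with a template-offset drop can be sketched as follows. This is modelled on the helper of the same name in the Qwen-Image pipelines in diffusers and written with numpy instead of torch; whether DreamLite's copy is identical is not shown in this commit. `strip_template_prefix` is a hypothetical helper, and `drop_idx` stands for the number of fixed prompt-template tokens to skip, such as the 64 / 34 offsets mentioned above.

```python
import numpy as np


def extract_masked_hidden(hidden_states, mask):
    """Split (B, S, D) hidden states into one (valid_len_i, D) array per sample."""
    bool_mask = mask.astype(bool)
    valid_lengths = bool_mask.sum(axis=1)
    # Boolean indexing flattens to (total_valid, D), sample by sample in order.
    selected = hidden_states[bool_mask]
    # Cut points between samples are the running totals of valid lengths.
    split_points = np.cumsum(valid_lengths)[:-1]
    return np.split(selected, split_points, axis=0)


def strip_template_prefix(per_sample_hidden, drop_idx):
    """Drop the leading template tokens so only the user prompt's states remain."""
    return [h[drop_idx:] for h in per_sample_hidden]
```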
* refactor(dreamlite): address dg845 follow-up review
- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
into unet_dreamlite.py and delete the standalone module (review #1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
unet_dreamlite.py to keep all DreamLite-specific code in one place;
update docs autodoc reference accordingly (review #2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
torch>=2.0; matches Wan deprecation) (review #3)
- Drop the deprecated `scale` argument handling from
DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
(review #4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
selectable (review #5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
default down/up/mid block_types in DreamLiteUNetModel and the test
fixtures are updated to the new keys (review #6, #7); the
carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
`# Copied from` marker, removing the cross-pipeline import; mirrors
the mobile pipeline (review #8)
- Add a check_inputs() method to both DreamLitePipeline and
DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
__call__; pulls the image-type validation out of prepare_image_latents
and adds prompt-type and h/w-divisibility checks (review #9)
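A `check_inputs` of this kind can be sketched as below, also showing the "explicit parentheses around each clause" style applied in a later commit. This is a hypothetical stand-alone version: the divisor of 16 is an assumption (the real value depends on the VAE scale factor and patching), and the image-type check is omitted.

```python
def check_inputs(prompt, height, width, multiple_of=16):
    """Hypothetical validation sketch; DreamLite's real divisor and messages may differ."""
    # Each clause of the mixed and/or condition is parenthesised so precedence is explicit.
    if (prompt is None) or (not isinstance(prompt, (str, list))) or (
        isinstance(prompt, list) and (not all(isinstance(p, str) for p in prompt))
    ):
        raise ValueError(f"`prompt` must be a str or a list of str, got {type(prompt)}.")
    if (height % multiple_of != 0) or (width % multiple_of != 0):
        raise ValueError(
            f"`height` and `width` must be divisible by {multiple_of}, got {height} and {width}."
        )
```

Running these checks at the top of `__call__`, rather than inside `prepare_image_latents`, means bad inputs fail before any encoder or VAE work starts.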
* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn
dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
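The layout difference is just whether the head axis is transposed in front of the sequence axis. The sketch uses numpy reshapes to show the two layouts; in PyTorch the fix amounts to `query.view(batch, -1, heads, head_dim)` without the trailing `.transpose(1, 2)`. The helper names are hypothetical.

```python
import numpy as np


def to_dispatch_layout(x, num_heads):
    """(batch, seq, heads * head_dim) -> (batch, seq, heads, head_dim); no transpose."""
    batch, seq, inner = x.shape
    return x.reshape(batch, seq, num_heads, inner // num_heads)


def to_sdpa_layout(x, num_heads):
    """(batch, seq, heads * head_dim) -> (batch, heads, seq, head_dim).

    This is the layout torch.nn.functional.scaled_dot_product_attention takes,
    and the one that must NOT be passed to dispatch_attention_fn.
    """
    return to_dispatch_layout(x, num_heads).transpose(0, 2, 1, 3)
```

Passing the SDPA layout to a function that transposes internally swaps `heads` and `seq` a second time, which is how a "size mismatch at non-singleton dimension 1" can arise when the sequence lengths of Q and K/V differ.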
* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture
Address dg845's review: rebuild the DreamLite fast-test fixture around a
real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the
standard PipelineTesterMixin save/load, dtype, and offload tests run
end-to-end against the actual encode_prompt code path. Override
DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA
processor so mixin utilities that round-trip through it keep working.
* Apply style fixes
* fix(dreamlite): address blocking review issues from #13815
- Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel
with the actual DreamLite class names (BasicTransformerBlockDreamLite,
ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D,
DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks()
match correctly.
- Keep attention masks as bool tensors in DreamLiteTransformer2DModel
instead of converting them to dense additive float biases. The dense
format hard-raises on flash / _flash_3 / _sage backends in
dispatch_attention_fn (which requires dtype == torch.bool).
- Add explicit parentheses around each clause in check_inputs's mixed
and/or condition (both pipelines) for readability.
- Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in
DreamLiteUNetModel.__init__ so mixin state (e.g.
_gradient_checkpointing_func) is properly initialised. ConfigMixin /
PushToHubMixin don't define their own __init__, so this covers the
full chain without re-running UNet2DConditionModel.__init__.
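A boolean mask and a dense additive bias encode the same information; only the dtype differs, and the flash / _flash_3 / _sage backends accept only the boolean form. The numpy sketch below (hypothetical helpers, torch-free) shows both encodings and checks that they yield identical attention weights.

```python
import numpy as np


def to_bool_mask(padding_mask):
    # True = this key may be attended to; the form the bool-only backends require.
    return padding_mask.astype(bool)


def to_additive_bias(padding_mask, dtype=np.float32):
    # Dense float form the commit moved away from: 0 for kept keys, -inf for padded keys.
    return np.where(padding_mask.astype(bool), 0.0, -np.inf).astype(dtype)


def masked_softmax(scores, bool_mask):
    # Attention weights over the last axis, ignoring keys where bool_mask is False.
    scores = np.where(bool_mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

Each row needs at least one `True` entry; a fully masked row would produce NaNs under either encoding.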
* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder
Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and
Qwen3VLModel.compute_3d_position_ids raises ValueError whenever
multimodal inputs are present (image_grid_thw is not None) but
mm_token_type_ids is None.
encode_prompt previously forwarded only input_ids / attention_mask /
pixel_values / image_grid_thw, dropping the new field and breaking the
fast pipeline tests against transformers main.
Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)``
(matching NucleusMoEImagePipeline) so all processor outputs are
forwarded automatically and future additions don't regress this path.
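The regression and the fix can be reproduced with a toy encoder. Everything here is hypothetical: `toy_text_encoder` only mimics the described check (multimodal inputs without `mm_token_type_ids` raise `ValueError`), and the two helpers contrast a fixed key subset with `**tk_out` forwarding.

```python
OLD_KEYS = ("input_ids", "attention_mask", "pixel_values", "image_grid_thw")


def encode_selected_keys(text_encoder, tk_out):
    # Previous approach: only a hand-picked subset of processor outputs reaches the encoder.
    return text_encoder(**{k: tk_out[k] for k in OLD_KEYS if k in tk_out}, output_hidden_states=True)


def encode_all_keys(text_encoder, tk_out):
    # Fixed approach: every processor output is forwarded, including new fields.
    return text_encoder(**tk_out, output_hidden_states=True)


def toy_text_encoder(input_ids, attention_mask=None, pixel_values=None, image_grid_thw=None,
                     mm_token_type_ids=None, output_hidden_states=False):
    # Mimics the check described above: multimodal inputs require mm_token_type_ids.
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required when image_grid_thw is set")
    return {"hidden_states": [input_ids], "used_mm_token_type_ids": mm_token_type_ids is not None}
```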
* Apply style fixes
* docs(dreamlite): address final review nits from #13815
- Replace broken cat.png URL in editing examples (both base and mobile)
with the standard `huggingface/documentation-images` source used
elsewhere in the diffusers docs.
- Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5
to the default values of DreamLitePipeline.__call__, and drop the
now-redundant explicit args from the docs examples.
- Switch the EXAMPLE_DOC_STRING examples in both pipelines from
torch.float16 to torch.bfloat16 for consistency with the rest of the
docs.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 0b83812
19 files changed
Lines changed: 4828 additions & 0 deletions
File tree
- docs/source/en
  - api
    - pipelines
- src/diffusers
  - models
    - transformers
    - unets
  - pipelines
    - dreamlite
  - utils
- tests/pipelines/dreamlite