You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[agents docs] restructure modular.md: standalone reusability + IO-respect patterns
Distilled from the ErnieImage modular pipeline review (PR huggingface#13498):
- New "Common modular conventions" section: skim qwenimage / flux2 / wan /
helios first, mirroring the references-driven shape of models.md / pipelines.md.
- Promoted "Standalone block reusability" to a Key pattern. Each block (text
encoder, VAE encoder, prepare-latents, denoise, decoder) must run on its
own; encoders take raw inputs only, per-prompt expansion happens in a
dedicated input step inside the core denoise sequence. Replaces old
gotchas huggingface#4 (pre-computed encoder outputs) and huggingface#5 (VAE encode in
prepare-latents).
- Promoted "Flat block assembly" to a Key pattern (was gotcha huggingface#7).
- New gotcha "Respect the declared IO system": one rule covering three
bypass directions — defensive `getattr` reads of declared
components/state, undeclared `block_state` writes, and direct
`state.set()` calls that skip `set_block_state` entirely.
- Reworked InputParam/OutputParam section to link to INPUT_PARAM_TEMPLATES /
OUTPUT_PARAM_TEMPLATES in modular_pipeline_utils.py (the registry is
dynamic) and added a non-template example.
- Added a distilled-checkpoint exception to the `guidance_scale`-as-input
gotcha — distilled flux-style models legitimately accept it.
- Dropped the "inputs duplicating derivable state" gotcha (uncommon).
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: .ai/modular.md
+54-23Lines changed: 54 additions & 23 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2,6 +2,10 @@
2
2
3
3
Shared reference for modular pipeline conventions, patterns, and gotchas.
4
4
5
+
## Common modular conventions
6
+
7
+
When adding a new modular pipeline (or reviewing one), skim `src/diffusers/modular_pipelines/qwenimage/`, `src/diffusers/modular_pipelines/flux2/`, `src/diffusers/modular_pipelines/wan/`, and `src/diffusers/modular_pipelines/helios/` first to establish the pattern. Most conventions (file split between `encoders.py` / `before_denoise.py` / `denoise.py` / `decoders.py`, how `expected_components` / `inputs` / `intermediate_outputs` are declared, the denoise-loop wrapping with `LoopSequentialPipelineBlocks`, top-level assembly via `AutoPipelineBlocks` / `SequentialPipelineBlocks` in `modular_blocks_<model>.py`, the `ModularPipeline` subclass shape, the guider-abstracted denoise body, `kwargs_type="denoiser_input_fields"` plumbing) are easiest to internalize by comparison rather than from a fixed list.
8
+
5
9
## File structure
6
10
7
11
```
@@ -107,34 +111,60 @@ class AutoDenoise(ConditionalPipelineBlocks):
107
111
default_block_name ="text2video"
108
112
```
109
113
110
-
## Standard InputParam/OutputParam templates
114
+
## Key pattern: Standalone block reusability
115
+
116
+
One of the core reason a pipeline is split into blocks at all: each block (text encoder, VAE encoder, prepare-latents, denoise, decoder) must be runnable on its own, and its output must be reusable as the input to a different downstream chain.
117
+
118
+
Concretely:
119
+
- The text encoder block returns `prompt_embeds`. A user can run only that block, save the embeddings, and feed them to the denoise loop later — possibly with a different `num_images_per_prompt`, possibly across multiple runs.
120
+
- The VAE encoder is its own block in `encoders.py` (e.g. `WanVaeEncoderStep`) returning `image_latents`. The prepare-latents block accepts `image_latents`, not raw images, so users can swap in pre-encoded latents.
121
+
- The decoder block accepts denoised latents from any source — directly from the denoise loop, or after an injected step (upscale, latent edit). Don't bundle decoding into the denoise loop.
122
+
123
+
Two consequences for input plumbing:
124
+
125
+
1.**Encoder / VAE-encoder blocks accept raw inputs only** (`prompt`, `image`, ...) and emit per-prompt outputs (`prompt_embeds`, `image_latents`). They do **not** bake in `num_images_per_prompt`.
126
+
2.**Per-prompt expansion happens in a dedicated input step** inside the core denoise sequence (e.g. `<Model>TextInputStep`). That keeps pre-encoded embeds reusable across runs with different `num_images_per_prompt`. See `qwenimage/before_denoise.py` for the canonical input step.
127
+
128
+
Standard pipelines accept `prompt_embeds` / `image_latents` as `__call__` inputs so users can skip encoding. In modular pipelines this is unnecessary — users just pop out the encoder block and run it standalone. Don't accept pre-computed encoder outputs as `__call__` inputs of an encoder block.
129
+
130
+
## Key pattern: Flat block assembly
131
+
132
+
Prefer flat sequences over nested compositions. Put the `Auto` / `Conditional` selection at the top level and make each workflow variant a flat `InsertableDict` of leaf blocks. Try not to nest `AutoPipelineBlocks` inside `SequentialPipelineBlocks` inside `AutoPipelineBlocks` — debugging which workflow was selected, and which block inside which sub-block touched which state, becomes painful. See `flux2/modular_blocks_flux2_klein.py` for the canonical shape.
133
+
134
+
## InputParam / OutputParam
135
+
136
+
Use `.template("<name>")` for params with a canonical meaning (`prompt`, `negative_prompt`, `image`, `generator`, `num_inference_steps`, `latents`, `prompt_embeds`, `images`, `videos`, etc.) — the template carries a vetted description and type hint. The full registry lives in [`src/diffusers/modular_pipelines/modular_pipeline_utils.py`](../src/diffusers/modular_pipelines/modular_pipeline_utils.py) (`INPUT_PARAM_TEMPLATES`, `OUTPUT_PARAM_TEMPLATES`); read that file rather than relying on a hardcoded list here, since names get added.
137
+
138
+
For params that don't match a template (model-specific names, custom semantics), declare the field directly:
description="Per-prompt text lengths used by the transformer attention mask.",
147
+
)
120
148
121
149
# Outputs
122
-
OutputParam.template("prompt_embeds")
123
-
OutputParam.template("negative_prompt_embeds")
124
-
OutputParam.template("image_latents")
125
-
OutputParam.template("latents")
126
-
OutputParam.template("videos")
127
-
OutputParam.template("images")
150
+
OutputParam(
151
+
"text_bth",
152
+
type_hint=torch.Tensor,
153
+
kwargs_type="denoiser_input_fields",
154
+
description="Padded text hidden states of shape (B, T_max, H) fed into the transformer.",
155
+
)
128
156
```
129
157
158
+
If a template's predefined description doesn't fit (e.g. the `"latents"` output template means "Denoised latents", which is wrong for the noisy latents out of a prepare-latents step) — drop the template and declare the field directly with an accurate description. See gotcha #5.
2.**Cross-importing between modular pipelines.** Don't import utilities from another model's modular pipeline (e.g. SD3 importing from `qwenimage.inputs`). If a utility is shared, move it to `modular_pipeline_utils.py` or copy it with a `# Copied from` header.
151
181
152
-
3.**Accepting `guidance_scale` as a pipeline input.** Users configure the guider separately (see [guider docs](https://huggingface.co/docs/diffusers/main/en/api/guiders)). Different guider types have different parameters; forwarding them through the pipeline doesn't scale. Don't manually set `components.guider.guidance_scale = ...` inside blocks. Same applies to computing `do_classifier_free_guidance` — that logic belongs in the guider.
153
-
154
-
4.**Accepting pre-computed outputs as inputs to skip encoding.** In standard pipelines we accept `prompt_embeds`, `negative_prompt_embeds`, `image_latents`, etc. so users can skip encoding steps. In modular pipelines this is unnecessary — users just pop out the encoder block and run it separately. Encoder blocks should only accept raw inputs (`prompt`, `image`, etc.).
182
+
3.**Accepting `guidance_scale` as a pipeline input.** Users configure the guider separately (see [guider docs](https://huggingface.co/docs/diffusers/main/en/api/guiders)). Different guider types have different parameters; forwarding them through the pipeline doesn't scale. Don't manually set `components.guider.guidance_scale = ...` inside blocks. Same applies to computing `do_classifier_free_guidance` — that logic belongs in the guider. **Exception:** some pipeline only support distilled checkpoints (e.g. distilled Flux) skip CFG entirely and don't carry a guider — `guidance_scale` is then a real model input, not a guider knob, and accepting it as a pipeline input is fine. If you're reviewing a pipeline that doesn't have a `guider` in `expected_components`, flag it explicitly so the choice is intentional.
155
183
156
-
5.**VAE encoding inside prepare-latents.**Image encoding should be its own block in `encoders.py` (e.g. `MyModelVaeEncoderStep`). The prepare-latents block should accept `image_latents`, not raw images. This lets users run encoding standalone. See `WanVaeEncoderStep` for reference.
184
+
4.**Instantiating components inline.**If a class like `VideoProcessor` is needed, register it as a `ComponentSpec` and access via `components.video_processor`. Don't create new instances inside block `__call__`.
157
185
158
-
6.**Instantiating components inline.**If a class like `VideoProcessor` is needed, register it as a `ComponentSpec` and access via `components.video_processor`. Don't create new instances inside block `__call__`.
186
+
5.**Using `InputParam.template()` / `OutputParam.template()` when semantics don't match.**Templates carry predefined descriptions — e.g. the `"latents"` output template means "Denoised latents". Don't use it for initial noisy latents from a prepare-latents step. Use a plain `InputParam(...)` / `OutputParam(...)` with an accurate description instead.
159
187
160
-
7.**Deeply nested block structure.** Prefer flat sequences over nesting Auto blocks inside Sequential blocks inside Auto blocks. Put the `Auto` selection at the top level and make each workflow variant a flat `InsertableDict` of leaf blocks. See `flux2/modular_blocks_flux2_klein.py` for the pattern.
188
+
6.**Test model paths pointing to contributor repos.** Tiny test models must live under `hf-internal-testing/`, not personal repos like `username/tiny-model`. Move the model before merge.
161
189
162
-
8.**Using `InputParam.template()` / `OutputParam.template()` when semantics don't match.** Templates carry predefined descriptions — e.g. the `"latents"` output template means "Denoised latents". Don't use it for initial noisy latents from a prepare-latents step. Use a plain `InputParam(...)` / `OutputParam(...)` with an accurate description instead.
190
+
7.**Respect the declared IO system.** Components in `expected_components`, fields in `inputs` / `intermediate_outputs` — once declared, the modular framework guarantees them. So:
191
+
-**Don't read defensively.** Declared components are always set as attributes (possibly `None`); declared upstream outputs are always populated in `block_state` after the upstream block runs. `getattr(components, "vae", None)`, `hasattr(self, "vae")`, `getattr(block_state, "prompt_embeds", None)` are dead code that hides typos. Use `components.vae` / `block_state.prompt_embeds` directly. Check `is not None` only when nullability is meaningful (a component the user might not have loaded).
192
+
-**Don't write undeclared.** If a block sets `block_state.foo = ...`, declare `OutputParam("foo", ...)` in `intermediate_outputs`. The declarations are the public contract — undeclared writes can't be wired to downstream blocks.
193
+
-**Don't call `state.set()` directly inside a block.** Write to state only through declared `intermediate_outputs` via `self.get_block_state(state)` / `self.set_block_state(state, block_state)`. A direct `state.set("foo", value)` bypasses the block's interface entirely — the field never appears as a declared output, so downstream blocks can't see it through the normal wiring and the framework can't generate docs / validate types for it.
163
194
164
-
9.**Test model paths pointing to contributor repos.**Tiny test models must live under `hf-internal-testing/`, not personal repos like `username/tiny-model`. Move the model before merge.
195
+
8.**No-op skip logic inside an optional block.**If a step is conditional (e.g. an optional prompt enhancer), don't have the block check a flag at the top of `__call__` and `return` early. Wrap it in an `AutoPipelineBlocks` with `block_trigger_inputs = ["use_xxx"]` so the block is only assembled into the pipeline when the trigger input is provided. The block's own `__call__` should always assume its components and inputs are present.
0 commit comments