
Commit d67491e
docs(flux2): clarify image= is reference conditioning, not img2img
1 parent: f7fd76a

3 files changed

Lines changed: 38 additions & 10 deletions

File tree

docs/source/en/api/pipelines/flux2.md

Lines changed: 20 additions & 0 deletions
@@ -32,6 +32,26 @@

an input prompt by setting the `caption_upsample_temperature` argument in the pipeline call arguments.
The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L140) recommends this value to be 0.15.

+## Reference conditioning vs. img2img
+
+The `image` argument on `Flux2Pipeline` and `Flux2KleinPipeline` is **reference conditioning**, not
+img2img. Reference images are encoded into additional attention tokens that flow through the
+transformer alongside the text prompt; there is no noisy latent initialization, and so no `strength`
+parameter to scale.
+
+This differs from `StableDiffusionImg2ImgPipeline`, `FluxImg2ImgPipeline`, and
+`FluxKontextInpaintPipeline`, which add noise to a latent encoding of the input image and then
+partially denoise it. If you port code from those pipelines and pass `strength=...` to a Flux.2
+pipeline, you will see:
+
+```
+TypeError: Flux2Pipeline.__call__() got an unexpected keyword argument 'strength'
+```
+
+Drop the `strength` kwarg and pass references via `image=` (a single image, or a list for multiple
+references). For Flux.2 inpainting (which does add noise to a latent and therefore does take a
+`strength` parameter), use `Flux2KleinInpaintPipeline` instead.
## Flux2Pipeline

[[autodoc]] Flux2Pipeline
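The difference the new section describes can be sketched with two toy functions. These are simplified stand-ins, not the real diffusers signatures; the step arithmetic is only illustrative. The point is structural: the reference-conditioning call has no `strength` parameter at all, so passing one raises `TypeError`, exactly as the docs warn.

```python
# Toy sketch of the two calling conventions (hypothetical helpers, not the
# real diffusers API).

def flux2_style_call(prompt, image=None):
    """Reference conditioning: images become extra conditioning inputs.

    Deliberately no `strength` parameter -- passing one raises TypeError,
    mirroring the real Flux2Pipeline behavior described above.
    """
    refs = [] if image is None else (image if isinstance(image, list) else [image])
    # Each reference contributes extra attention tokens; no noisy latent is built.
    return {"prompt": prompt, "reference_count": len(refs)}

def img2img_style_call(prompt, image, strength=0.8, num_inference_steps=50):
    """SD/Flux.1-style img2img: noise a latent of `image`, partially denoise.

    `strength` controls how far along the noise schedule denoising starts,
    i.e. how many of the scheduled steps actually run.
    """
    steps_run = int(num_inference_steps * strength)
    return {"prompt": prompt, "steps_run": steps_run}
```

Porting code that calls `img2img_style_call(prompt, img, strength=0.5)` over to the reference-conditioning signature means dropping `strength` and passing `image=img` (or a list of images) only.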

src/diffusers/pipelines/flux2/pipeline_flux2.py

Lines changed: 9 additions & 5 deletions
@@ -769,11 +769,15 @@ def __call__(

        Args:
            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`):
-                `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
-                numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
-                or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
-                list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
-                latents as `image`, but if passing latents directly it is not encoded again.
+                Reference image(s) used to condition generation. Flux.2 encodes them as additional attention tokens that
+                flow through the transformer alongside the text prompt; this is **reference conditioning**, not
+                SD/Flux.1 style img2img, so there is no companion `strength` argument. Pass a list to provide multiple
+                references.
+
+                For both numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor
+                or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy
+                array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. Can also accept
+                image latents directly, in which case they will not be re-encoded.
            prompt (`str` or `list[str]`, *optional*):
                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
                instead.
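The shape and range conventions in the revised docstring can be made concrete with a small normalizer. This is a hypothetical helper (the name and error message are mine, not from diffusers) covering only the channels-last numpy case: a single `(H, W, C)` array or a list of them is batched to `(B, H, W, C)`, and values are checked against the documented `[0, 1]` range.

```python
import numpy as np

def normalize_image_input(image):
    """Hypothetical helper: batch numpy `image` inputs per the documented
    conventions -- channels-last (H, W, C) / (B, H, W, C), values in [0, 1]."""
    if isinstance(image, list):
        image = np.stack(image)      # list of (H, W, C) -> (B, H, W, C)
    if image.ndim == 3:
        image = image[None]          # single (H, W, C) -> (1, H, W, C)
    if image.min() < 0.0 or image.max() > 1.0:
        raise ValueError("expected pixel values in the range [0, 1]")
    return image
```

The tensor path would be analogous but channels-first, `(C, H, W)` / `(B, C, H, W)`, as the docstring states.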

src/diffusers/pipelines/flux2/pipeline_flux2_klein.py

Lines changed: 9 additions & 5 deletions
@@ -635,11 +635,15 @@ def __call__(

        Args:
            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
-                `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
-                numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
-                or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
-                list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
-                latents as `image`, but if passing latents directly it is not encoded again.
+                Reference image(s) used to condition generation. Flux.2 encodes them as additional attention tokens that
+                flow through the transformer alongside the text prompt; this is **reference conditioning**, not
+                SD/Flux.1 style img2img, so there is no companion `strength` argument. Pass a list to provide multiple
+                references.
+
+                For both numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor
+                or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy
+                array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. Can also accept
+                image latents directly, in which case they will not be re-encoded.
            prompt (`str` or `List[str]`, *optional*):
                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
                instead.
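Both revised docstrings end with the same note: image latents passed directly are not re-encoded. A toy sketch of that pass-through, with hypothetical names and made-up channel counts (the real pipeline's detection logic may differ), operating on `(channels, height, width)` shape tuples standing in for tensors:

```python
PIXEL_CHANNELS = 3    # RGB pixel input
LATENT_CHANNELS = 16  # assumed VAE latent channel count for this sketch

def retrieve_or_encode(x):
    """Hypothetical helper: return (latents_shape, encode_calls) for input
    shape tuple `x`. Inputs that already have LATENT_CHANNELS channels are
    treated as latents and passed through without re-encoding; pixel inputs
    are "encoded" once (VAE-like shape math: channels up, spatial dims / 8).
    """
    if x[0] == LATENT_CHANNELS:
        return x, 0  # already latents: used as-is, never re-encoded
    return (LATENT_CHANNELS, x[1] // 8, x[2] // 8), 1
```

Precomputing latents once and reusing them across calls is the practical payoff: the encode cost is paid a single time rather than on every generation.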
