Commit 4429c94

Updated the docstring with the shape requirements
1 parent 7dbf6f6 commit 4429c94

1 file changed

Lines changed: 9 additions & 7 deletions

src/diffusers/pipelines/flux2/pipeline_flux2_klein_inpaint.py

@@ -855,17 +855,19 @@ def __call__(
  instead.
  image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
  `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
- numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
- or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
- list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
- latents as `image`, but if passing latents directly it is not encoded again.
+ numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list
+ of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
+ list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents directly,
+ in which case encoding is skipped. Latents must be in patchified form of shape `(B, latent_channels * 4, H // 2, W // 2)`, where
+ each 2×2 spatial patch has been folded into the channel dimension.
  image_reference (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`, *optional*):
  `Image`, numpy array or tensor representing an image batch to be used as the reference for the masked
  area. This allows conditioning the inpainted region on a specific reference image. For both numpy array
- and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list or tensors,
+ and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list of tensors,
  the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays,
- the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image latents as
- `image_reference`, but if passing latents directly it is not encoded again.
+ the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents directly,
+ in which case encoding is skipped. Latents must be in patchified form of shape `(B, latent_channels * 4, H // 2, W // 2)`, where
+ each 2×2 spatial patch has been folded into the channel dimension.
  mask_image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
  `Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
  are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
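
The patchified latent shape described in the updated docstring can be illustrated with a minimal sketch. It is not taken from the pipeline source. The function name `patchify_latents` is hypothetical, NumPy stands in for PyTorch so the snippet runs without a GPU stack, and the channel ordering (original channel first, then row offset, then column offset within each 2×2 patch) is an assumption that should be checked against the pipeline's own packing code. `H` and `W` are taken to be the spatial size of the unpatchified latents.

```python
import numpy as np


def patchify_latents(latents: np.ndarray) -> np.ndarray:
    """Fold each 2x2 spatial patch into the channel dimension.

    (B, C, H, W) -> (B, C * 4, H // 2, W // 2)

    Hypothetical helper for illustration only; the channel ordering is an
    assumption, not confirmed by the commit shown above.
    """
    b, c, h, w = latents.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must both be even to form 2x2 patches")
    # Split H into (H // 2, 2) and W into (W // 2, 2).
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    # Move the two patch-offset axes next to the channel axis:
    # (B, C, 2, 2, H // 2, W // 2).
    x = x.transpose(0, 1, 3, 5, 2, 4)
    # Merge (C, 2, 2) into a single channel axis of size C * 4.
    return x.reshape(b, c * 4, h // 2, w // 2)


# Example with arbitrary sizes: 2 latent channels on a 4x4 grid.
example = np.arange(1 * 2 * 4 * 4, dtype=np.float32).reshape(1, 2, 4, 4)
packed = patchify_latents(example)
print(packed.shape)
```

With this layout, output channel `c * 4 + dy * 2 + dx` at position `(i, j)` holds the input value at channel `c`, row `2 * i + dy`, column `2 * j + dx`. The PyTorch equivalent would use `view`, `permute`, and `reshape` with the same axis order.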
