@@ -855,19 +855,21 @@ def __call__(
855855 instead.
856856 image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
857857 `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
858- numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list
859- of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
860- list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents directly,
861- in which case encoding is skipped. Latents must be in patchified form of shape `(B, latent_channels * 4, H // 2, W // 2)`, where
862- each 2×2 spatial patch has been folded into the channel dimension.
858+ numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a
859+ list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or
860+ a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image
861+ latents directly, in which case encoding is skipped. Latents must be in patchified form of shape `(B,
862+ latent_channels * 4, H // 2, W // 2)`, where each 2×2 spatial patch has been folded into the channel
863+ dimension.
863864 image_reference (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`, *optional*):
864865 `Image`, numpy array or tensor representing an image batch to be used as the reference for the masked
865866 area. This allows conditioning the inpainted region on a specific reference image. For both numpy array
866- and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list of tensors,
867- the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays,
868- the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents directly,
869- in which case encoding is skipped. Latents must be in patchified form of shape `(B, latent_channels * 4, H // 2, W // 2)`, where
870- each 2×2 spatial patch has been folded into the channel dimension.
867+ and pytorch tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list of
868+ tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list
869+ of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents
870+ directly, in which case encoding is skipped. Latents must be in patchified form of shape `(B,
871+ latent_channels * 4, H // 2, W // 2)`, where each 2×2 spatial patch has been folded into the channel
872+ dimension.
871873 mask_image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
872874 `Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
873875 are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
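The patchified latent layout the docstring describes can be sketched as follows. This is a hypothetical pure-Python illustration of the 2×2 space-to-depth fold (real latents are `torch.Tensor`s, and `patchify` here is not a diffusers function); each of the four offsets inside a 2×2 spatial patch becomes its own channel, turning shape `(C, H, W)` into `(C * 4, H // 2, W // 2)`:

```python
def patchify(latent):
    """Fold each 2x2 spatial patch of a (C, H, W) nested list into the
    channel dimension, yielding shape (C * 4, H // 2, W // 2)."""
    C = len(latent)
    H = len(latent[0])
    W = len(latent[0][0])
    out = []
    for c in range(C):
        # The four (dy, dx) offsets of a 2x2 patch each become a channel.
        for dy in (0, 1):
            for dx in (0, 1):
                out.append([
                    [latent[c][2 * y + dy][2 * x + dx] for x in range(W // 2)]
                    for y in range(H // 2)
                ])
    return out


# A single-channel 4x4 "latent" folds into 4 channels of 2x2.
lat = [[[y * 4 + x for x in range(4)] for y in range(4)]]
p = patchify(lat)
print(len(p), len(p[0]), len(p[0][0]))  # 4 2 2
```

With a batch dimension prepended, this is the `(B, latent_channels * 4, H // 2, W // 2)` form the pipeline expects when latents are passed directly and encoding is skipped.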