latent_consistency_models model/pipeline review
Commit tested: 0f1abc4ae8b0eb2a3b40e82a310507281144c423
Review performed against the repository review rules.
Issue 1: Img2Img does not serialize requires_safety_checker
Affected code (diffusers/src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py, lines 216 to 239 at 0f1abc4):

    self.register_modules(
        vae=vae,
        text_encoder=text_encoder,
        tokenizer=tokenizer,
        unet=unet,
        scheduler=scheduler,
        safety_checker=safety_checker,
        feature_extractor=feature_extractor,
        image_encoder=image_encoder,
    )

    if safety_checker is None and requires_safety_checker:
        logger.warning(
            f"You have disabled the safety checker for {self.__class__} by passing `safety_checker=None`. Ensure"
            " that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered"
            " results in services or applications open to the public. Both the diffusers team and Hugging Face"
            " strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling"
            " it only for use-cases that involve analyzing network behavior or auditing its results. For more"
            " information, please have a look at https://github.com/huggingface/diffusers/pull/254 ."
        )

    self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if getattr(self, "vae", None) else 8
    self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
Problem:
LatentConsistencyModelImg2ImgPipeline.__init__ accepts requires_safety_checker, but never calls self.register_to_config(...). The text2img LCM pipeline does register it, so img2img saved configs lose this user choice.
Impact:
Pipelines constructed with requires_safety_checker=False do not persist that setting through config/save/load paths, causing inconsistent serialization behavior between the two LCM pipelines.
Reproduction:
from diffusers import LatentConsistencyModelPipeline, LatentConsistencyModelImg2ImgPipeline
kwargs = dict(
    vae=None,
    text_encoder=None,
    tokenizer=None,
    unet=None,
    scheduler=None,
    safety_checker=None,
    feature_extractor=None,
    requires_safety_checker=False,
)
for cls in (LatentConsistencyModelPipeline, LatentConsistencyModelImg2ImgPipeline):
    pipe = cls(**kwargs)
    print(cls.__name__, pipe.config.get("requires_safety_checker"))
# Text2Img prints False; Img2Img prints None.
Relevant precedent:

diffusers/src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py, lines 211 to 223 at 0f1abc4:

    self.register_modules(
        vae=vae,
        text_encoder=text_encoder,
        tokenizer=tokenizer,
        unet=unet,
        scheduler=scheduler,
        safety_checker=safety_checker,
        feature_extractor=feature_extractor,
        image_encoder=image_encoder,
    )
    self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if getattr(self, "vae", None) else 8
    self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
    self.register_to_config(requires_safety_checker=requires_safety_checker)

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, lines 314 to 322 at 0f1abc4 (excerpt begins mid-way through the register_modules call):

        unet=unet,
        scheduler=scheduler,
        safety_checker=safety_checker,
        feature_extractor=feature_extractor,
        image_encoder=image_encoder,
    )
    self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if getattr(self, "vae", None) else 8
    self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
    self.register_to_config(requires_safety_checker=requires_safety_checker)
Suggested fix:
self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if getattr(self, "vae", None) else 8
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.register_to_config(requires_safety_checker=requires_safety_checker)
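Why the missing call matters can be sketched with a toy config mixin. This is a hypothetical stand-in (MiniConfigMixin, Text2ImgLike, and Img2ImgLike are illustrative names, not the real diffusers ConfigMixin): only values explicitly passed to register_to_config reach the serialized config, so the img2img kwarg silently disappears.

```python
import json

class MiniConfigMixin:
    """Hypothetical stand-in for diffusers' ConfigMixin, for illustration only."""
    def __init__(self):
        self._internal_dict = {}

    def register_to_config(self, **kwargs):
        # Only explicitly registered values reach the serialized config.
        self._internal_dict.update(kwargs)

    def to_config_json(self):
        return json.dumps(self._internal_dict)

class Text2ImgLike(MiniConfigMixin):
    def __init__(self, requires_safety_checker=True):
        super().__init__()
        self.register_to_config(requires_safety_checker=requires_safety_checker)

class Img2ImgLike(MiniConfigMixin):
    def __init__(self, requires_safety_checker=True):
        super().__init__()
        # No register_to_config call: the kwarg never reaches the config.

print(json.loads(Text2ImgLike(requires_safety_checker=False).to_config_json()))
# {'requires_safety_checker': False}
print(json.loads(Img2ImgLike(requires_safety_checker=False).to_config_json()))
# {}
```

The one-line fix above restores the same round-trip behavior the text2img pipeline already has.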
Duplicate search:
No matching issue or PR found for LatentConsistencyModelImg2ImgPipeline requires_safety_checker.
Issue 2: Img2Img accepts a safety checker without a feature extractor
Affected code (diffusers/src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py, lines 216 to 239 and 494 to 506 at 0f1abc4):

    self.register_modules(
        vae=vae,
        text_encoder=text_encoder,
        tokenizer=tokenizer,
        unet=unet,
        scheduler=scheduler,
        safety_checker=safety_checker,
        feature_extractor=feature_extractor,
        image_encoder=image_encoder,
    )

    if safety_checker is None and requires_safety_checker:
        logger.warning(
            f"You have disabled the safety checker for {self.__class__} by passing `safety_checker=None`. Ensure"
            " that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered"
            " results in services or applications open to the public. Both the diffusers team and Hugging Face"
            " strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling"
            " it only for use-cases that involve analyzing network behavior or auditing its results. For more"
            " information, please have a look at https://github.com/huggingface/diffusers/pull/254 ."
        )

    self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if getattr(self, "vae", None) else 8
    self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
    def run_safety_checker(self, image, device, dtype):
        if self.safety_checker is None:
            has_nsfw_concept = None
        else:
            if torch.is_tensor(image):
                feature_extractor_input = self.image_processor.postprocess(image, output_type="pil")
            else:
                feature_extractor_input = self.image_processor.numpy_to_pil(image)
            safety_checker_input = self.feature_extractor(feature_extractor_input, return_tensors="pt").to(device)
            image, has_nsfw_concept = self.safety_checker(
                images=image, clip_input=safety_checker_input.pixel_values.to(dtype)
            )
Problem:
The img2img constructor does not reject safety_checker != None with feature_extractor=None. Later, run_safety_checker unconditionally calls self.feature_extractor(...), producing a late TypeError.
Impact:
Users can construct an invalid pipeline successfully and only fail during inference/safety checking with an unclear 'NoneType' object is not callable error.
Reproduction:
import torch
from diffusers import LatentConsistencyModelImg2ImgPipeline
class DummySafetyChecker:
    def __call__(self, images, clip_input):
        return images, [False] * images.shape[0]

pipe = LatentConsistencyModelImg2ImgPipeline(
    vae=None,
    text_encoder=None,
    tokenizer=None,
    unet=None,
    scheduler=None,
    safety_checker=DummySafetyChecker(),
    feature_extractor=None,
    requires_safety_checker=True,
)
try:
    pipe.run_safety_checker(torch.zeros(1, 3, 8, 8), "cpu", torch.float32)
except Exception as e:
    print(type(e).__name__, str(e))
# TypeError 'NoneType' object is not callable
Relevant precedent:

diffusers/src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py, lines 205 to 209 at 0f1abc4:

    if safety_checker is not None and feature_extractor is None:
        raise ValueError(
            "Make sure to define a feature extractor when loading {self.__class__} if you want to use the safety"
            " checker. If you do not want to use the safety checker, you can pass `'safety_checker=None'` instead."
        )

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, lines 279 to 283 at 0f1abc4, contains the identical check.
Suggested fix:
if safety_checker is not None and feature_extractor is None:
    raise ValueError(
        "Make sure to define a feature extractor when loading {self.__class__} if you want to use the safety"
        " checker. If you do not want to use the safety checker, you can pass `'safety_checker=None'` instead."
    )
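The difference between the current late failure and the suggested fail-fast check can be sketched with toy classes (LateCheckPipeline and EagerCheckPipeline are hypothetical, not the diffusers pipelines): the late variant only surfaces the misconfiguration when the safety checker actually runs, while the eager variant rejects it at construction time.

```python
# Hypothetical sketch contrasting late failure with a fail-fast constructor check.
class LateCheckPipeline:
    def __init__(self, safety_checker=None, feature_extractor=None):
        self.safety_checker = safety_checker
        self.feature_extractor = feature_extractor

    def run_safety_checker(self, images):
        if self.safety_checker is None:
            return images, None
        # With feature_extractor=None this is None(...) -> TypeError at inference time.
        inputs = self.feature_extractor(images)
        return self.safety_checker(images, inputs)

class EagerCheckPipeline(LateCheckPipeline):
    def __init__(self, safety_checker=None, feature_extractor=None):
        # Fail fast, mirroring the precedent check above.
        if safety_checker is not None and feature_extractor is None:
            raise ValueError("Define a feature extractor to use the safety checker.")
        super().__init__(safety_checker, feature_extractor)

checker = lambda images, inputs: (images, [False])
try:
    LateCheckPipeline(safety_checker=checker).run_safety_checker(["img"])
except TypeError as e:
    print("late:", e)   # late: 'NoneType' object is not callable
try:
    EagerCheckPipeline(safety_checker=checker)
except ValueError as e:
    print("eager:", e)  # eager: Define a feature extractor to use the safety checker.
```

The eager error points at the constructor argument the user got wrong, instead of an opaque TypeError mid-inference.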
Duplicate search:
No matching issue or PR found for LatentConsistencyModelImg2ImgPipeline feature_extractor.
Issue 3: LCM pipeline __call__ docs have stale parameters and defaults
Affected code (diffusers/src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py, lines 642 to 690, and pipeline_latent_consistency_img2img.py, lines 711 to 760, at 0f1abc4):

Text2Img:

    def __call__(
        self,
        prompt: str | list[str] = None,
        height: int | None = None,
        width: int | None = None,
        num_inference_steps: int = 4,
        original_inference_steps: int = None,
        timesteps: list[int] = None,
        guidance_scale: float = 8.5,
        num_images_per_prompt: int | None = 1,
        generator: torch.Generator | list[torch.Generator] | None = None,
        latents: torch.Tensor | None = None,
        prompt_embeds: torch.Tensor | None = None,
        ip_adapter_image: PipelineImageInput | None = None,
        ip_adapter_image_embeds: list[torch.Tensor] | None = None,
        output_type: str | None = "pil",
        return_dict: bool = True,
        cross_attention_kwargs: dict[str, Any] | None = None,
        clip_skip: int | None = None,
        callback_on_step_end: Callable[[int, int], None] | None = None,
        callback_on_step_end_tensor_inputs: list[str] = ["latents"],
        **kwargs,
    ):
        r"""
        The call function to the pipeline for generation.

        Args:
            prompt (`str` or `list[str]`, *optional*):
                The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
            height (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
                The height in pixels of the generated image.
            width (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
                The width in pixels of the generated image.
            num_inference_steps (`int`, *optional*, defaults to 50):
                The number of denoising steps. More denoising steps usually lead to a higher quality image at the
                expense of slower inference.
            original_inference_steps (`int`, *optional*):
                The original number of inference steps use to generate a linearly-spaced timestep schedule, from which
                we will draw `num_inference_steps` evenly spaced timesteps from as our final timestep schedule,
                following the Skipping-Step method in the paper (see Section 4.3). If not set this will default to the
                scheduler's `original_inference_steps` attribute.
            timesteps (`list[int]`, *optional*):
                Custom timesteps to use for the denoising process. If not defined, equal spaced `num_inference_steps`
                timesteps on the original LCM training/distillation timestep schedule are used. Must be in descending
                order.
            guidance_scale (`float`, *optional*, defaults to 7.5):
                A higher guidance scale value encourages the model to generate images closely linked to the text
                `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
                Note that the original latent consistency models paper uses a different CFG formulation where the

Img2Img:

    def __call__(
        self,
        prompt: str | list[str] = None,
        image: PipelineImageInput = None,
        num_inference_steps: int = 4,
        strength: float = 0.8,
        original_inference_steps: int = None,
        timesteps: list[int] = None,
        guidance_scale: float = 8.5,
        num_images_per_prompt: int | None = 1,
        generator: torch.Generator | list[torch.Generator] | None = None,
        latents: torch.Tensor | None = None,
        prompt_embeds: torch.Tensor | None = None,
        ip_adapter_image: PipelineImageInput | None = None,
        ip_adapter_image_embeds: list[torch.Tensor] | None = None,
        output_type: str | None = "pil",
        return_dict: bool = True,
        cross_attention_kwargs: dict[str, Any] | None = None,
        clip_skip: int | None = None,
        callback_on_step_end: Callable[[int, int], None] | None = None,
        callback_on_step_end_tensor_inputs: list[str] = ["latents"],
        **kwargs,
    ):

The img2img docstring is identical to the text2img one above: it documents `height` and `width` (which do not exist in this signature), omits `image` and `strength`, repeats the stale "defaults to 50" and "defaults to 7.5" claims, and the excerpt likewise ends mid-sentence at "CFG is enabled when `guidance_scale >".
Problem:
Both __call__ docstrings say num_inference_steps defaults to 50 and guidance_scale defaults to 7.5, while the signatures default to 4 and 8.5. The img2img docstring also documents nonexistent height and width parameters and omits its real image and strength parameters.
Impact:
The generated API docs mislead users about LCM's fast-step defaults and img2img inputs.
Reproduction:
import inspect
from diffusers import LatentConsistencyModelPipeline, LatentConsistencyModelImg2ImgPipeline
for cls in (LatentConsistencyModelPipeline, LatentConsistencyModelImg2ImgPipeline):
    sig = inspect.signature(cls.__call__)
    doc = inspect.getdoc(cls.__call__) or ""
    print(cls.__name__, sig.parameters["num_inference_steps"].default, sig.parameters["guidance_scale"].default)
    print("doc says steps default 50:", "defaults to 50" in doc)
    print("doc says guidance default 7.5:", "defaults to 7.5" in doc)
img_doc = inspect.getdoc(LatentConsistencyModelImg2ImgPipeline.__call__) or ""
print("img2img signature has image:", "image" in inspect.signature(LatentConsistencyModelImg2ImgPipeline.__call__).parameters)
print("img2img docs image:", "image (`" in img_doc)
print("img2img docs height:", "height (`int`" in img_doc)
Relevant precedent:

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, lines 891 to 902 at 0f1abc4:

    image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`):
        `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
        numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
        or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
        list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
        latents as `image`, but if passing latents directly it is not encoded again.
    strength (`float`, *optional*, defaults to 0.8):
        Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
        starting point and more noise is added the higher the `strength`. The number of denoising steps depends
        on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
        process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
        essentially ignores `image`.
Suggested fix:
Update the LCM docstrings to match their signatures: num_inference_steps default 4, guidance_scale default 8.5, and for img2img replace the stale height/width entries with image and strength documentation.
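This class of mismatch is easy to detect mechanically. A rough sketch of such a check (the regex, stale_default_claims, and demo are illustrative, not an existing diffusers utility; it only handles numeric "defaults to N" claims in the diffusers docstring style):

```python
import inspect
import re

def stale_default_claims(fn):
    """Report numeric docstring 'defaults to N' claims that disagree with the signature."""
    sig = inspect.signature(fn)
    doc = inspect.getdoc(fn) or ""
    stale = []
    # Matches lines like: name (`int`, *optional*, defaults to 50):
    for name, claimed in re.findall(r"(\w+) \(`[^`]*`, \*optional\*, defaults to ([\d.]+)\)", doc):
        param = sig.parameters.get(name)
        if param is not None and param.default is not inspect.Parameter.empty:
            if str(param.default) != claimed:
                stale.append((name, claimed, param.default))
    return stale

def demo(num_inference_steps: int = 4, guidance_scale: float = 8.5):
    """Args:
        num_inference_steps (`int`, *optional*, defaults to 50):
            The number of denoising steps.
        guidance_scale (`float`, *optional*, defaults to 7.5):
            Guidance scale.
    """

print(stale_default_claims(demo))
# [('num_inference_steps', '50', 4), ('guidance_scale', '7.5', 8.5)]
```

Run against the two LCM `__call__` methods, the same function would flag exactly the two stale defaults reported above.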
Duplicate search:
No matching issue or PR found for LCM img2img docstring/default mismatches.
Coverage and duplicate-search status:
Fast and slow tests exist for both LCM pipelines under tests/pipelines/latent_consistency_models/, so slow coverage is not missing. Local targeted pytest collection was blocked because the venv's torch build lacks torch._C._distributed_c10d, which the shared test utilities import. python utils/check_copies.py passed. Broad duplicate searches surfaced historical LCM issues and PRs, but no duplicates of the three findings above.