-
Notifications
You must be signed in to change notification settings - Fork 7k
fix(qwen): fix CFG failing when passing neg prompt embeds with none mask #13379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
dbe4f95
d7814c3
f600a36
5b2a1d2
c774885
908c304
eb86a2e
42587ae
7fa33ee
17938f5
30e10dd
4f736f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -584,9 +584,7 @@ def __call__( | |||||||||||||||
|
|
||||||||||||||||
| device = self._execution_device | ||||||||||||||||
|
|
||||||||||||||||
| has_neg_prompt = negative_prompt is not None or ( | ||||||||||||||||
| negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None | ||||||||||||||||
| ) | ||||||||||||||||
| has_neg_prompt = negative_prompt is not None or negative_prompt_embeds is not None | ||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we handle diffusers/src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py Lines 261 to 267 in 71a6fd9
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If Interestingly, while working on this, I found that some other variants (like |
||||||||||||||||
|
|
||||||||||||||||
| if true_cfg_scale > 1 and not has_neg_prompt: | ||||||||||||||||
| logger.warning( | ||||||||||||||||
|
|
||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -302,11 +302,13 @@ def encode_prompt( | |||
| _, seq_len, _ = prompt_embeds.shape | ||||
| prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1) | ||||
| prompt_embeds = prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1) | ||||
| prompt_embeds_mask = prompt_embeds_mask.repeat(1, num_images_per_prompt, 1) | ||||
| prompt_embeds_mask = prompt_embeds_mask.view(batch_size * num_images_per_prompt, seq_len) | ||||
|
|
||||
| if prompt_embeds_mask is not None and prompt_embeds_mask.all(): | ||||
| prompt_embeds_mask = None | ||||
| if prompt_embeds_mask is not None: | ||||
| prompt_embeds_mask = prompt_embeds_mask.repeat(1, num_images_per_prompt, 1) | ||||
| prompt_embeds_mask = prompt_embeds_mask.view(batch_size * num_images_per_prompt, seq_len) | ||||
|
|
||||
| if prompt_embeds_mask.all(): | ||||
| prompt_embeds_mask = None | ||||
|
Comment on lines
+308
to
+314
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like an unrelated change? Since it already has diffusers/src/diffusers/pipelines/qwenimage/pipeline_qwenimage_controlnet_inpaint.py Line 272 in 71a6fd9
if we run
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll revert the manual edits on the copied methods, run make fix-copies to sync them up properly, and update the PR. |
||||
|
|
||||
| return prompt_embeds, prompt_embeds_mask | ||||
|
|
||||
|
|
@@ -353,15 +355,6 @@ def check_inputs( | |||
| f" {negative_prompt_embeds}. Please make sure to only forward one of the two." | ||||
| ) | ||||
|
|
||||
| if prompt_embeds is not None and prompt_embeds_mask is None: | ||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the reasoning behind this? What happens when users pass the embeds and not the masks? Maybe we should warn them?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I removed the hard ValueError here because the base QwenImagePipeline actually allows passing embeds without masks, and the underlying transformer natively handles a None mask by treating all tokens as valid. Throwing a strict exception here was completely blocking users from doing exactly what this PR is fixing (passing negative_prompt_embeds without a mask to trigger True CFG). That said, I totally agree we shouldn't just let it pass silently, especially since the text encoder's output often relies on masks for sequence packing. I'll swap these hard exceptions out for a logger.warning to let users know they should probably pass the mask if they have it. I'll get that updated! |
||||
| raise ValueError( | ||||
| "If `prompt_embeds` are provided, `prompt_embeds_mask` also have to be passed. Make sure to generate `prompt_embeds_mask` from the same text encoder that was used to generate `prompt_embeds`." | ||||
| ) | ||||
| if negative_prompt_embeds is not None and negative_prompt_embeds_mask is None: | ||||
| raise ValueError( | ||||
| "If `negative_prompt_embeds` are provided, `negative_prompt_embeds_mask` also have to be passed. Make sure to generate `negative_prompt_embeds_mask` from the same text encoder that was used to generate `negative_prompt_embeds`." | ||||
| ) | ||||
|
|
||||
| if max_sequence_length is not None and max_sequence_length > 1024: | ||||
| raise ValueError(f"`max_sequence_length` cannot be greater than 1024 but is {max_sequence_length}") | ||||
|
|
||||
|
|
@@ -739,9 +732,7 @@ def __call__( | |||
|
|
||||
| device = self._execution_device | ||||
|
|
||||
| has_neg_prompt = negative_prompt is not None or ( | ||||
| negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None | ||||
| ) | ||||
| has_neg_prompt = negative_prompt is not None or negative_prompt_embeds is not None | ||||
| do_true_cfg = true_cfg_scale > 1 and has_neg_prompt | ||||
| prompt_embeds, prompt_embeds_mask = self.encode_prompt( | ||||
| prompt=prompt, | ||||
|
|
||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -234,3 +234,29 @@ def test_vae_tiling(self, expected_diff_max: float = 0.2): | |||||
| expected_diff_max, | ||||||
| "VAE tiling should not affect the inference results", | ||||||
| ) | ||||||
|
|
||||||
| def test_true_cfg_without_negative_prompt_embeds_mask(self): | ||||||
| components = self.get_dummy_components() | ||||||
| pipe = self.pipeline_class(**components) | ||||||
| pipe.to("cpu") | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| pipe.set_progress_bar_config(disable=None) | ||||||
|
|
||||||
| inputs = self.get_dummy_inputs("cpu") | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| prompt = inputs.pop("prompt") | ||||||
|
|
||||||
| prompt_embeds, prompt_embeds_mask = pipe.encode_prompt( | ||||||
| prompt=prompt, | ||||||
| device="cpu", | ||||||
| num_images_per_prompt=1, | ||||||
| max_sequence_length=inputs.get("max_sequence_length", 16), | ||||||
| ) | ||||||
|
|
||||||
| inputs["prompt_embeds"] = prompt_embeds | ||||||
| inputs["prompt_embeds_mask"] = prompt_embeds_mask | ||||||
| inputs["negative_prompt_embeds"] = prompt_embeds | ||||||
| inputs["negative_prompt"] = None | ||||||
| inputs["negative_prompt_embeds_mask"] = None | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we not be using it because it's the core thing that we're fixing here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The intention here is explicitly to test the pipeline without providing the negative_prompt_embeds_mask (hence the test name test_true_cfg_without_negative_prompt_embeds_mask), to verify that our fix correctly catches the missing mask, issues the logger.warning, and successfully proceeds with CFG instead of throwing a hard ValueError or silently disabling CFG. To make this intent clearer and avoid explicitly assigning None to the inputs dict, I will update the test to use inputs.pop(...) so that it relies directly on the pipeline's default None arguments. |
||||||
| inputs["true_cfg_scale"] = 2.0 | ||||||
|
|
||||||
| image = pipe(**inputs).images | ||||||
| self.assertIsNotNone(image) | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When
image=Noneis passed to theQwen2VLProcessor(for example, when encoding thenegative_prompt), the processor returns aBatchFeatureobject that does not contain thepixel_valuesandimage_grid_thwkeys.Attempting to access them directly via dot notation (e.g.
model_inputs.pixel_values) raises anAttributeErrorfrom the underlyingtransformers.utils.logging.FeatureExtractionUtilsmapping class. Using.get()safely defaults toNone, which prevents the pipeline from crashing during negative prompt generations. I also updatedinput_idsandattention_maskto use.get()here to maintain a consistent access pattern within the same function call.