Fix the attention mask in Ulysses SP for QwenImage #13278
Merged
Commits (6):
- 832b4d2 fix mask in SP (zhtmike)
- 1e535f0 change the modification to qwen specific (zhtmike)
- 3e212e7 Merge branch 'main' into fix_sp (sayakpaul)
- c76fc30 Merge branch 'main' into fix_sp (sayakpaul)
- ea43309 drop xfail since qwen-image mask is fixed (zhtmike)
- 0a5c6ad Merge branch 'main' into fix_sp (sayakpaul)
Is this Qwen specific?
I haven't tested with other models, but I think models with a 2D mask input would have a similar problem.
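To illustrate the point above, here is a minimal, self-contained sketch (not the actual diffusers code; the shapes and the naive slicing step are illustrative) of why a 2D padding mask can break once the query sequence is sharded for Ulysses SP while keys and values remain full length:

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 4, 8, 16
world_size = 2  # hypothetical number of Ulysses SP ranks

q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# 2D padding mask of shape (batch, seq_len): True = attend to this token.
mask_2d = torch.ones(batch, seq_len, dtype=torch.bool)
mask_2d[1, 6:] = False  # second sample has two padded tokens

# Without SP, broadcasting the mask to (batch, 1, 1, seq_len) works fine:
full = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_2d[:, None, None, :])

# Under Ulysses SP each rank holds a slice of the query sequence, while
# keys/values are full length after the all-to-all. Naively slicing the
# 2D mask along with the queries shrinks its key dimension, so it no
# longer broadcasts against the (q_len, kv_len) attention scores:
local_q = q[:, :, : seq_len // world_size]       # rank 0's query shard
bad_mask = mask_2d[:, : seq_len // world_size]   # wrong: this is the key axis
try:
    F.scaled_dot_product_attention(local_q, k, v, attn_mask=bad_mask[:, None, None, :])
except RuntimeError as err:
    print("shape mismatch:", err)

# Keeping the mask full length on the key dimension works on every rank:
out = F.scaled_dot_product_attention(local_q, k, v, attn_mask=mask_2d[:, None, None, :])
print(out.shape)  # torch.Size([2, 4, 4, 16])
```

The mismatch only appears once the sequence is sharded, which is consistent with single-GPU runs being unaffected.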
Possible to check out one other? And also run the test at diffusers/tests/models/transformers/test_models_transformer_flux.py (line 227 in 16e7067).
Thanks. From a quick scan, most models seem to handle the mask shape correctly in their own implementations. So I’ve limited the modification to QwenImage only.
Should I run any test cases?
Thanks! Maybe we could add a similar test to diffusers/tests/models/transformers/test_models_transformer_flux.py (line 227 in 16e7067).
I will give you a ping once it's refactored to follow the latest pattern.
Sorry, disregard my suggestion on using the cuDNN backend.
Is it the case just for Qwen, or does the same happen for Flux as well? Also, the test under consideration (diffusers/tests/models/transformers/test_models_transformer_qwenimage.py, line 81 in 11a3284): does it not use a single prompt?
So far, I have only found that Qwen has this problem. Other models, such as Z-Image and HunyuanImage, expand the attention mask in a similar way before entering the attention block. For Flux, I tested with the main branch, and it works fine with both CP and batch inputs.
I am wondering whether we should add a batch-input test if possible. For now, I think we should first ensure that all existing unit tests pass without modifying them.
For background: we are building a training engine on top of the Diffusers backend, using QwenImage as the first example, so we need batch inputs (for high throughput) combined with Ulysses SP support. That is how we encountered this bug during the forward pass.
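The batched-training scenario described above is exactly what produces a 2D mask in the first place: prompts of different lengths padded to a common length. A small sketch (the helper name `build_padding_mask` is hypothetical, not a diffusers API):

```python
import torch

def build_padding_mask(lengths, max_len):
    """Hypothetical helper: (batch, max_len) bool mask, True for real tokens."""
    positions = torch.arange(max_len)
    return positions[None, :] < torch.tensor(lengths)[:, None]

# Three prompts of different lengths padded to a common length of 8:
mask = build_padding_mask([5, 8, 3], max_len=8)
print(mask.int())
```

With a batch size of one (as in many inference tests) no padding is needed, which may be why the bug went unnoticed until batched training with SP.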
Agreed and thanks so much for the context!
Would you like to take a crack at this? We'll be quick to review.
I think first we need to ensure that the `test_context_parallel_inference()` test is xfailed when ring attention is enabled with SDPA (#13182 is adding a test suite for CP backends and attention backends), and then add a test for batched inputs. Then let's revisit this PR?
WDYT?
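A minimal sketch of the xfail marking suggested above (the condition flag and reason string are placeholders, not the actual diffusers test code):

```python
import pytest

# Stand-in for the real backend check in the diffusers test suite.
ring_attention_with_sdpa = True

@pytest.mark.xfail(
    condition=ring_attention_with_sdpa,
    reason="ring attention is not yet supported with the SDPA backend",
    strict=False,
)
def test_context_parallel_inference():
    ...  # the actual CP inference test body lives in diffusers
```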
Sure, no problem. I will add a unit test for batch inputs.
Hi @sayakpaul, I have added PR #13312 for this. Could you take a look?