perf: Qwen image optimize. #1230
Conversation
Code Review
This pull request introduces a communication and computation overlap mechanism for sequence parallelism in the DiT model. It adds a new `QwenDoubleStreamAttnProcessorCMO2_0` implementation, updates the transformer block to support this mode via a new global flag, and refactors positional embedding handling. The review feedback highlights several style improvements, including replacing `auto` with explicit types, using the `torch::` namespace, marking implementation classes as `final`, and fixing typos in parameter annotations.
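As a rough illustration of the flag-based dispatch described above and the style points raised below, here is a minimal C++ sketch. The flag and class names (`g_enable_comm_compute_overlap`, `StandardAttnProcessor`, `OverlapAttnProcessor`, `make_processor`) are hypothetical stand-ins for the PR's actual `QwenDoubleStreamAttnProcessorCMO2_0` wiring, and the overlap body is stubbed out:

```cpp
#include <torch/torch.h>
#include <memory>

// Hypothetical global flag enabling the communication/computation-overlap
// path; the real PR's flag name may differ.
bool g_enable_comm_compute_overlap = false;

class AttnProcessor {
 public:
  virtual ~AttnProcessor() = default;
  virtual torch::Tensor forward(const torch::Tensor& hidden_states) = 0;
};

// Implementation classes are marked final, per the review feedback.
class StandardAttnProcessor final : public AttnProcessor {
 public:
  torch::Tensor forward(const torch::Tensor& hidden_states) override {
    return torch::softmax(hidden_states, /*dim=*/-1);
  }
};

class OverlapAttnProcessor final : public AttnProcessor {
 public:
  torch::Tensor forward(const torch::Tensor& hidden_states) override {
    // In the real PR this path would interleave the sequence-parallel
    // collectives with local attention compute; here we only mimic the
    // dispatch structure.
    return torch::softmax(hidden_states, /*dim=*/-1);
  }
};

// Explicit return type instead of auto, per the review feedback.
std::unique_ptr<AttnProcessor> make_processor() {
  if (g_enable_comm_compute_overlap) {
    return std::make_unique<OverlapAttnProcessor>();
  }
  return std::make_unique<StandardAttnProcessor>();
}
```

Hiding the flag behind a factory keeps the mode check out of the transformer block's forward path, so the block only ever sees the `AttnProcessor` interface.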
```cpp
const torch::Tensor& encoder_hidden_states_mask = torch::Tensor(),
const torch::Tensor& attention_mask = torch::Tensor(),
const std::tuple<at::Tensor, at::Tensor>& image_rotary_emb = {}) {
```

```cpp
const std::tuple<at::Tensor, at::Tensor>& image_rotary_emb = {}) = 0;
```
Use the `torch::` namespace instead of `at::` for tensor declarations.
```diff
-const std::tuple<at::Tensor, at::Tensor>& image_rotary_emb = {}) = 0;
+const std::tuple<torch::Tensor, torch::Tensor>& image_rotary_emb = {}) = 0;
```
References
- Use the `torch::` namespace instead of `at::` or `c10::` wherever possible. Prefer the highest-level PyTorch C++ API. (link)
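A minimal sketch of this convention, using a hypothetical helper rather than the PR's code. `torch::Tensor` is an alias of `at::Tensor`, so only the spelling in declarations changes, not the behavior:

```cpp
#include <torch/torch.h>
#include <tuple>

// Hypothetical helper illustrating the convention: spell signatures with
// torch:: rather than at::.
std::tuple<torch::Tensor, torch::Tensor> make_rotary_emb(int64_t seq_len,
                                                         int64_t dim) {
  torch::Tensor cos = torch::ones({seq_len, dim});
  torch::Tensor sin = torch::zeros({seq_len, dim});
  return {cos, sin};
}
```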
```cpp
/*pre_tockens=*/65535,
/*next_tockens=*/65535);
```
Correct the typos in the parameter annotations: `tockens` should be `tokens`.
```diff
-/*pre_tockens=*/65535,
-/*next_tockens=*/65535);
+/*pre_tokens=*/65535,
+/*next_tokens=*/65535);
```
References
- Annotate constant arguments with a comment indicating the parameter name when calling functions or constructors. (link)
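A minimal sketch of this annotation style. The function `windowed_attention` and its parameters are invented purely to show the call site; the real PR passes these constants to an attention kernel:

```cpp
#include <torch/torch.h>

// Hypothetical function; the parameter names exist only to demonstrate
// the /*name=*/value annotation style at the call site below.
torch::Tensor windowed_attention(const torch::Tensor& scores,
                                 int64_t pre_tokens, int64_t next_tokens) {
  (void)pre_tokens;   // unused in this stub
  (void)next_tokens;  // unused in this stub
  return torch::softmax(scores, /*dim=*/-1);
}

torch::Tensor call_site(const torch::Tensor& scores) {
  // Annotating each constant with its parameter name keeps the call site
  // readable and makes argument-order mistakes visible in review.
  return windowed_attention(scores,
                            /*pre_tokens=*/65535,
                            /*next_tokens=*/65535);
}
```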
Force-pushed from 1ae65d3 to bd0a8b0.