🚀 The feature, motivation and pitch
For visual gen attention blocks, all current models have adopted the qk-norm attention path, i.e.,
qkv_proj → qk_rmsnorm → qk_rope → attn (BMMs)
Unlike the LLM qk-norm attention module, visual gen does not yet have a unified fused_qk_norm_rope path, because there are many variants: qk_rmsnorm can be per-head norm or cross-head norm; RoPE can be interleaved or split-half; and q/k can have different sequence lengths (cross-attn vs. self-attn).
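For concreteness, here is a minimal PyTorch sketch of the unfused path above, assuming the per-head norm + interleaved RoPE + self-attn combination (the Flux-style variant). Shapes, argument names, and the eps value are illustrative assumptions, not the actual kernel interface this issue targets.

```python
# Minimal sketch of the unfused qkv_proj -> qk_rmsnorm -> qk_rope path
# (per-head RMSNorm, interleaved RoPE). Illustrative shapes/names only.
import torch

def qk_norm_rope_unfused(qkv, q_weight, k_weight, cos, sin,
                         num_heads, head_dim, eps=1e-6):
    # qkv: [batch, seq, 3 * num_heads * head_dim] from qkv_proj
    b, s, _ = qkv.shape
    q, k, v = qkv.chunk(3, dim=-1)
    q = q.view(b, s, num_heads, head_dim)
    k = k.view(b, s, num_heads, head_dim)

    # Per-head RMSNorm: statistics over head_dim, weight of size [head_dim].
    def rms_norm(x, weight):
        var = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(var + eps) * weight

    q, k = rms_norm(q, q_weight), rms_norm(k, k_weight)

    # Interleaved RoPE: rotate (even, odd) pairs within each head_dim.
    # cos/sin: [seq, head_dim // 2], broadcast over batch and heads.
    def rope_interleaved(x):
        x1, x2 = x[..., 0::2], x[..., 1::2]
        c = cos[None, :, None, :]
        s_ = sin[None, :, None, :]
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * c - x2 * s_
        out[..., 1::2] = x2 * c + x1 * s_
        return out

    q, k = rope_interleaved(q), rope_interleaved(k)
    # q/k then feed the attention BMMs together with v
    return q, k, v.view(b, s, num_heads, head_dim)
```

A fused kernel would perform the norm and rotation in a single pass over q/k instead of the separate elementwise ops above.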
So far, the fused qk_norm_rope kernel for Flux (per-head norm, interleaved RoPE, self-attn path) shows a ~5–8% perf gain.
It would be great to extend this fused path to the Wan / LTX2 models. In particular, qk-norm in Wan/LTX2 uses cross-head norm kernels, so this is also an opportunity to further optimize cross-head norm kernel performance (especially for LTX2, where the norm takes 5–10% of runtime).
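To make the per-head vs. cross-head distinction concrete, here is a hedged sketch of the two norm variants; the exact Wan/LTX2 module layouts and weight shapes are assumptions, not their actual implementations.

```python
# Sketch of per-head vs. cross-head RMSNorm; layouts are illustrative.
import torch

def rms_norm(x, weight, eps=1e-6):
    var = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(var + eps) * weight

def per_head_norm(q, weight):
    # q: [batch, seq, num_heads, head_dim]; weight: [head_dim]
    # Statistics computed independently per head over head_dim.
    return rms_norm(q, weight)

def cross_head_norm(q, weight):
    # q: [batch, seq, num_heads, head_dim]; weight: [num_heads * head_dim]
    # Statistics computed over the full inner dim spanning all heads,
    # so the tensor is flattened across heads before normalization.
    b, s, h, d = q.shape
    q_flat = q.reshape(b, s, h * d)
    return rms_norm(q_flat, weight).reshape(b, s, h, d)
```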
References:
Flux fused qknorm+rope PR #11869
Similar effort in SGLang: sgl-project/sglang#21503 / sgl-project/sglang#21440
Alternatives
No response
Additional context
No response
Before submitting a new issue...