[Feature] VisualGen: Add qknorm + rope fuse kernel for cross-head norm (Wan/LTX-2) #12716

@schetlur-nv

Description

🚀 The feature, motivation and pitch

For visual gen attention blocks, all current models have adopted the qk-norm attention path, i.e.,
 qkv_proj → qk_rmsnorm → qk_rope → attn (BMMs)
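
To make the target concrete, below is a minimal unfused PyTorch sketch of that path. The shapes, weight layout, and the per-head/interleaved choices are illustrative only (this is not the TRT-LLM API); it just shows the four stages the fused kernel would cover.

```python
import torch
import torch.nn.functional as F

def rmsnorm(x, weight, eps=1e-6):
    # RMS-normalize over the last dim; with x shaped [..., head_dim] this is the per-head variant.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def rope_interleaved(x, cos, sin):
    # Interleaved RoPE: rotate adjacent even/odd pairs within each head.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

def unfused_qknorm_rope_attn(x, w_qkv, q_weight, k_weight, cos, sin, num_heads):
    b, s, hidden = x.shape
    head_dim = hidden // num_heads
    # qkv_proj
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    q, k, v = (t.view(b, s, num_heads, head_dim).transpose(1, 2) for t in (q, k, v))
    # qk_rmsnorm (per-head shown here)
    q, k = rmsnorm(q, q_weight), rmsnorm(k, k_weight)
    # qk_rope (interleaved shown here)
    q, k = rope_interleaved(q, cos, sin), rope_interleaved(k, cos, sin)
    # attn: the QK^T and PV BMMs inside SDPA
    return F.scaled_dot_product_attention(q, k, v)
```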

Unlike the LLM qk-norm attention module, visual gen currently does not have a unified fused_qk_norm_rope path, because of the many variants: qk_rmsnorm can be per-head norm or cross-head norm, RoPE can be interleaved or split-half, and q/k can have different sequence lengths (cross-attn vs. self-attn). Two of these variants are sketched below.
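
A rough sketch of how two of those variants differ, assuming "cross-head norm" means the RMS statistics and weight span the concatenated heads (num_heads * head_dim) rather than a single head_dim slice; names and shapes here are hypothetical:

```python
import torch

def rms(x, weight, eps=1e-6):
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

b, s, num_heads, head_dim = 2, 16, 8, 64
q = torch.randn(b, s, num_heads * head_dim)

# Per-head norm (Flux-style): split into heads first, so RMS statistics and the
# learned weight cover a single head_dim slice.
q_per_head = rms(q.view(b, s, num_heads, head_dim), torch.ones(head_dim))

# Cross-head norm (the Wan/LTX2 style referenced above, as I read it): normalize
# the concatenated q of width num_heads * head_dim before the head split, so the
# statistics and weight span all heads.
q_cross_head = rms(q, torch.ones(num_heads * head_dim)).view(b, s, num_heads, head_dim)

def rope_split_half(x, cos, sin):
    # Split-half RoPE: rotate the first half of head_dim against the second half,
    # instead of adjacent even/odd pairs as in the interleaved variant.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
```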

So far, the fused qk_norm_rope kernel for Flux (per-head, interleaved, self-attn path) shows a ~5–8% perf gain.
It would be great to extend this fused path to the Wan / LTX2 models. In particular, qk-norm in Wan/LTX2 uses cross-head norm kernels, so this is also an opportunity to further optimize cross-head norm kernel performance (especially for LTX2, where the norm takes 5–10% of runtime).

References:

  • Flux fused qknorm+rope: PR #11869
  • Similar effort in SGLang: sgl-project/sglang#21503 / sgl-project/sglang#21440

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
