
add partial_rope_qk_inplace #416

Open
xiaobaicxy wants to merge 2 commits into sgl-project:main from xiaobaicxy:main

Conversation

@xiaobaicxy
Contributor

No description provided.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a Triton-based in-place partial RoPE kernel for NPUs and a corresponding test suite. The review feedback identifies a logic error in the head indexing for Grouped Query Attention (GQA) and suggests adding input validation for the rotary dimension. Additionally, the reviewer notes that the kernel may fail for non-power-of-two dimensions and recommends expanding the test cases to include GQA scenarios.

Comment on lines +39 to +40
for g_id in range(groups):
    hq_id = hk_id + g_id


high

The calculation of hq_id is incorrect for Grouped Query Attention (GQA) where Hk > 1. It should be hk_id * groups + g_id to correctly map each query head to its corresponding key head. The current implementation would incorrectly reuse query head indices across different key heads (e.g., if Hk=2 and groups=2, hk_id=1 would incorrectly process hq_id=1, 2 instead of 2, 3).

Suggested change
     for g_id in range(groups):
-        hq_id = hk_id + g_id
+        hq_id = hk_id * groups + g_id
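
As a quick stand-alone illustration (plain Python, not part of the kernel), here is how the two mappings diverge once Hk > 1, using the Hk=2, groups=2 example above:

    Hk, groups = 2, 2
    for hk_id in range(Hk):
        current = [hk_id + g for g in range(groups)]          # current kernel
        fixed = [hk_id * groups + g for g in range(groups)]   # suggested fix
        print(hk_id, current, fixed)
    # hk_id=0 -> current [0, 1], fixed [0, 1]
    # hk_id=1 -> current [1, 2], fixed [2, 3]
    # With the current mapping, query head 1 is rotated twice (in place) and
    # query heads 2 and 3 are never rotated at all.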

Comment on lines +68 to +70
T, Hq, D = query.shape
_, Hk, _ = key.shape
assert Hq % Hk == 0


high

It is important to validate that rotary_dim is within the bounds of the head dimension D and that it is an even number. If rotary_dim > D, the kernel will perform out-of-bounds memory accesses on the query and key tensors.

Suggested change
     T, Hq, D = query.shape
     _, Hk, _ = key.shape
     assert Hq % Hk == 0
+    assert rotary_dim <= D
+    assert rotary_dim % 2 == 0

t_id = tl.program_id(0)
hk_id = tl.program_id(1)

d = tl.arange(0, D_ROPE // 2)


medium

Triton's tl.arange requires the size to be a power of 2. If D_ROPE // 2 is not a power of 2 (e.g., for a rotary dimension of 96), this kernel will fail to compile. Consider using a block size that is the next power of 2 and applying a mask during loads and stores to ensure compatibility with non-power-of-2 rotary dimensions.
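
A minimal sketch of that approach, assuming the same NeoX-style half-split layout as the quoted kernel; the kernel name, pointer arguments, and stride names below are illustrative, not the PR's actual signature:

    import triton
    import triton.language as tl

    @triton.jit
    def _rope_rotate_masked(x_ptr, cos_ptr, sin_ptr, stride_t, stride_h,
                            D_ROPE: tl.constexpr, BLOCK: tl.constexpr):
        # BLOCK = triton.next_power_of_2(D_ROPE // 2), computed on the host side.
        t_id = tl.program_id(0)
        h_id = tl.program_id(1)
        d = tl.arange(0, BLOCK)        # power-of-two range, may overshoot D_ROPE // 2
        mask = d < D_ROPE // 2         # keep only the valid rotary lanes
        base = x_ptr + t_id * stride_t + h_id * stride_h
        x1 = tl.load(base + d, mask=mask, other=0.0)
        x2 = tl.load(base + D_ROPE // 2 + d, mask=mask, other=0.0)
        cos = tl.load(cos_ptr + t_id * (D_ROPE // 2) + d, mask=mask, other=0.0)
        sin = tl.load(sin_ptr + t_id * (D_ROPE // 2) + d, mask=mask, other=0.0)
        tl.store(base + d, x1 * cos - x2 * sin, mask=mask)
        tl.store(base + D_ROPE // 2 + d, x2 * cos + x1 * sin, mask=mask)

For rotary_dim = 96 this gives D_ROPE // 2 = 48, so BLOCK = 64 and the last 16 lanes are masked out of every load and store.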

Comment on lines +40 to +43
shapes = [
[32, 4, 1, 256, 64], # partial
[32, 4, 1, 64, 64] # no partial
]


medium

The current test cases only cover Hk = 1, which does not exercise the GQA logic and hides the bug in the hq_id calculation. Adding a test case with Hk > 1 (e.g., Hq=8, Hk=2) is necessary to ensure correctness for Grouped Query Attention.

    shapes = [
        [32, 4, 1, 256, 64],  # partial
        [32, 8, 2, 128, 64],  # GQA
        [32, 4, 1, 64, 64]    # no partial
    ]
