support ssd chunk scan triton & ssd chunk state triton on npu by sigama-w · Pull Request #448 · sgl-project/sgl-kernel-npu

sigama-w · 2026-04-27T01:02:10Z

No description provided.

gemini-code-assist

Code Review

This pull request implements NPU-specific Mamba kernels for causal convolution and SSD operations. Feedback highlights a bug where the chunk scan returns the wrong tensor, a missing null check for the output parameter, and the need to replace CUDA device contexts with NPU-specific ones for proper kernel execution.

gemini-code-assist · 2026-04-27T01:07:16Z

+        IS_TRITON_22=TRITON_22,
+        HAS_INITSTATES=initial_states is not None,
+    )
+    return out_x


The function currently returns out_x, which is None if z is None. This is likely a bug as the user would expect the result of the scan (which is in out). It is recommended to return out as the primary result.

Suggested change

-    return out_x
+    return out

sigama-w · Apr 27, 2026

This matches how the GPU version is implemented, so the NPU version is kept consistent with it.
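The two conventions discussed in this thread can be sketched in plain Python. Lists stand in for tensors, and `scan_sketch` with its doubling and gating arithmetic is a hypothetical placeholder, not the kernel's real computation. Treating `out_x` as the values before gating by `z` is also an assumption made for illustration. The sketch only shows that when the caller preallocates `out` and the function fills it in place, the returned `out_x` is `None` whenever `z` is `None`, so the caller has to read the result from `out`.

```python
def scan_sketch(x, out, z=None):
    """Hypothetical stand-in for a chunk-scan wrapper (not the real kernel).

    Writes the final result into the caller-provided `out` in place and
    returns `out_x`, which exists only when `z` is given.
    """
    assert len(out) == len(x)
    pre_gate = [2 * v for v in x]  # placeholder for the scan itself
    if z is None:
        out[:] = pre_gate  # no gating: the scan output is the result
        return None  # out_x is None in this case
    out[:] = [p * g for p, g in zip(pre_gate, z)]  # placeholder gating by z
    return pre_gate  # out_x: values before gating (assumed meaning)


x = [1.0, 2.0, 3.0]
out = [0.0] * len(x)
out_x = scan_sketch(x, out)  # z omitted, so the return value is None
```

Under this convention the return value is a secondary output, which is why returning `out_x` is not necessarily a bug as long as callers pass `out` and read from it.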

gemini-code-assist · 2026-04-27T01:07:17Z

+    else:
+        chunk_indices, chunk_offsets = None, None
+
+    assert out.shape == x.shape


The out parameter is optional in the function signature but its shape is asserted immediately without checking for None. This will cause an AttributeError if the user does not provide an output tensor. It is better to initialize it if it's missing.

    if out is None:
        out = torch.empty_like(x)
    else:
        assert out.shape == x.shape

sigama-w · Apr 27, 2026

This matches how the GPU version is implemented, so the NPU version is kept consistent with it.
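The failure mode the reviewer describes, and the suggested guard, can be reproduced without PyTorch. In this minimal sketch `SimpleNamespace` objects with a `shape` attribute stand in for tensors, and both function names are hypothetical. The allocation in the guarded version plays the role of `torch.empty_like(x)`.

```python
from types import SimpleNamespace


def check_out_unguarded(x, out=None):
    # Mirrors the pattern under review: reads out.shape even when out is None.
    assert out.shape == x.shape
    return out


def check_out_guarded(x, out=None):
    # Mirrors the reviewer's suggestion: allocate when missing, else validate.
    if out is None:
        out = SimpleNamespace(shape=x.shape)  # stand-in for torch.empty_like(x)
    else:
        assert out.shape == x.shape
    return out


x = SimpleNamespace(shape=(2, 8))

try:
    check_out_unguarded(x)
    failure = None
except AttributeError as exc:
    failure = type(exc).__name__  # None has no attribute 'shape'

allocated = check_out_guarded(x)
```

If every caller is required to pass `out`, as the author's reply implies, the alternative is to make `out` a required parameter so the signature matches the actual contract.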

gemini-code-assist · 2026-04-27T01:07:17Z

+        batch * nchunks,
+        nheads,
+    )
+    with torch.cuda.device(x.device.index):


Since this kernel is specifically for NPU, you should use torch.npu.device instead of torch.cuda.device.

Suggested change

-    with torch.cuda.device(x.device.index):
+    with torch.npu.device(x.device.index):
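One way to avoid hard-coding the backend is to pick the device context from the tensor's device type. The helper below is a hypothetical sketch, not code from this PR. It takes the `torch` module as an argument so that it can be exercised here with a fake module instead of a real NPU, and it falls back to a no-op context when the backend has no `device` context manager.

```python
import contextlib
from types import SimpleNamespace


def device_guard(torch_module, device_type, index):
    """Return torch_module.<device_type>.device(index), or a no-op context."""
    backend = getattr(torch_module, device_type, None)
    make_guard = getattr(backend, "device", None)
    if make_guard is None:
        return contextlib.nullcontext()
    return make_guard(index)


# Fake module for demonstration only; it records which guard was entered.
entered = []


@contextlib.contextmanager
def fake_npu_device(index):
    entered.append(("npu", index))
    yield


fake_torch = SimpleNamespace(npu=SimpleNamespace(device=fake_npu_device))

with device_guard(fake_torch, "npu", 0):
    pass  # a kernel launch would go here
with device_guard(fake_torch, "cuda", 0):
    pass  # backend absent in the fake module, so this is a no-op
```

With real PyTorch the call would presumably look like `device_guard(torch, x.device.type, x.device.index)`. On Ascend hardware `torch.npu` is only available once `torch_npu` has been imported.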

support ssd chunk scan triton & ssd chunk state triton on npu

e5bb6af

gemini-code-assist Bot reviewed Apr 27, 2026

fix cuda bug

b4ac4ac

sigama-w wants to merge 2 commits into sgl-project:main from sigama-w:main
