Add native swiglu-silu-clamp-mul by McZyWu · Pull Request #507 · sgl-project/sgl-kernel-npu

McZyWu · 2026-05-20T12:10:36Z

Add native swiglu-silu-clamp-mul

gemini-code-assist

Code Review

This pull request introduces the swiglu_silu_clamp_mul activation function and its native PyTorch implementation. Feedback suggests correcting misleading docstrings regarding output behavior, optimizing memory usage through in-place clamping, and improving type safety and parameter naming consistency.

gemini-code-assist · 2026-05-20T12:12:05Z

+def swiglu_silu_clamp_mul_native(x: torch.Tensor, limit: float = 7.0) -> torch.Tensor:
+    """Out-variant of swiglustep activation.
+
+    Writes into `out`:
+      silu(x[:d]).clamp(max=limit) * x[d:].clamp(-limit, limit)
+    """
+    gate, up = x.chunk(2, dim=-1)
+    gate = F.silu(gate)
+    gate = gate.clamp(max=limit)
+    up = up.clamp(min=-limit, max=limit)
+    out = gate * up
+    return out


The docstring for swiglu_silu_clamp_mul_native is slightly misleading as it refers to an "Out-variant" and "Writes into out", which typically implies a function that accepts an output buffer as an argument. Since this is a standard PyTorch implementation returning a new tensor, the docstring should be updated for clarity. Additionally, we can optimize memory usage by using an in-place clamp on the temporary tensor produced by F.silu, and rename the input parameter to hidden_states for consistency with the rest of the module.

Suggested change

-def swiglu_silu_clamp_mul_native(x: torch.Tensor, limit: float = 7.0) -> torch.Tensor:
-    """Out-variant of swiglustep activation.
-
-    Writes into `out`:
-      silu(x[:d]).clamp(max=limit) * x[d:].clamp(-limit, limit)
-    """
-    gate, up = x.chunk(2, dim=-1)
-    gate = F.silu(gate)
-    gate = gate.clamp(max=limit)
-    up = up.clamp(min=-limit, max=limit)
-    out = gate * up
-    return out
+def swiglu_silu_clamp_mul_native(hidden_states: torch.Tensor, limit: float = 7.0) -> torch.Tensor:
+    """Reference implementation of swiglu-silu-clamp-mul activation.
+
+    Computes:
+      silu(x[:d]).clamp(max=limit) * x[d:].clamp(-limit, limit)
+    """
+    gate, up = hidden_states.chunk(2, dim=-1)
+    return F.silu(gate).clamp_(max=limit) * up.clamp(min=-limit, max=limit)
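To make the formula in the docstring concrete, here is a minimal pure-Python sketch of the same computation on a single row, written without torch so it runs anywhere. The helper names `silu` and `swiglu_silu_clamp_mul_scalar` are hypothetical and are not part of this PR. The sketch only illustrates the asymmetry described in the docstring: the activated gate half is clamped from above only, while the up half is clamped on both sides.

```python
import math
from typing import List


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_silu_clamp_mul_scalar(row: List[float], limit: float = 7.0) -> List[float]:
    """Hypothetical scalar reference for one row (illustration only).

    The first half of `row` is the gate, the second half is the up projection,
    mirroring `chunk(2, dim=-1)` in the tensor version.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    out = []
    for g, u in zip(gate, up):
        g_act = min(silu(g), limit)             # upper clamp only
        u_clamped = max(-limit, min(u, limit))  # symmetric clamp
        out.append(g_act * u_clamped)
    return out


row = [10.0, -10.0, 0.0, 9.0, -9.0, 3.0]
result = swiglu_silu_clamp_mul_scalar(row)
# result[0] is 49.0: silu(10.0) is just under 10 and is clamped to 7.0, and 9.0 is clamped to 7.0.
# result[1] is a small positive number: silu(-10.0) is slightly negative and is not clamped,
# while -9.0 is clamped to -7.0.
```

A check like this can serve as a sanity test for a tensor implementation on small inputs, assuming both use the same default `limit`.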

gemini-code-assist · 2026-05-20T12:12:05Z

+    return out
+
+
+def swiglu_silu_clamp_mul(hidden_states, limit: float = 7.0) -> torch.Tensor:


Add a type hint to the hidden_states parameter for consistency with the native implementation and better type safety.

Suggested change

-def swiglu_silu_clamp_mul(hidden_states, limit: float = 7.0) -> torch.Tensor:
+def swiglu_silu_clamp_mul(hidden_states: torch.Tensor, limit: float = 7.0) -> torch.Tensor:

add native swiglu-silu-clamp-mul

6cc0621

gemini-code-assist Bot reviewed May 20, 2026


add triton kernel swiglu-silu-clamp-mul

48b6d6c

McZyWu wants to merge 2 commits into sgl-project:main from McZyWu:swiglu-silu-clamp-mul
