optimized fused_grouped_topk SYCL kernel for MoE expert routing by xiaolong-intel · Pull Request #253 · vllm-project/vllm-xpu-kernels

xiaolong-intel · 2026-04-07T02:54:25Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

optimized fused_grouped_topk SYCL kernel for MoE expert routing

Test Plan

I wrote test cases in https://github.com/xiaolong-intel/vllm-xpu-kernels/blob/grouped_topk/tests/test_grouped_topk.py. Tested the consistency of the forward_xpu operator with the torch version of grouped_topk on B60

python -m pytest test_grouped_topk.py -v

Test Result

test cases:

test results:

All test cases passed successfully

Tested operator performance on GPU B60 with the following configuration:

num_expert_group = 8
topk_group = 4
topk = 8

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Copilot

Pull request overview

This PR updates the XPU MoE “grouped topk” path by introducing a substantially revised SYCL implementation for fused_grouped_topk, updating its Torch bindings, and adjusting test coverage to validate correctness against a PyTorch baseline.

Changes:

Add _moe_C import to the Python fused MoE interface to ensure MoE ops are registered.
Rework csrc/moe/fused_grouped_topk.cpp with a new SYCL kernel implementation and update the Torch op schema formatting.
Update grouped-topk tests and baselines (param ranges, determinism settings, and wrapper behavior).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`vllm_xpu_kernels/fused_moe_interface.py`	Ensures `_moe_C` is imported so MoE ops (including grouped topk) are available.
`tests/test_grouped_topk.py`	Expands parameter coverage (notably token counts and expert counts) for grouped topk validation.
`tests/ops/grouped_topk_op.py`	Adjusts the reference implementation for determinism and updates the SYCL wrapper behavior.
`csrc/moe/torch_bindings.cpp`	Reformats/declares the `fused_grouped_topk` op schema.
`csrc/moe/moe_ops.h`	Minor header formatting around the `fused_grouped_topk` declaration.
`csrc/moe/fused_grouped_topk.cpp`	Replaces the previous kernel with a new SYCL implementation and host-side dispatch/validation logic.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-07T02:59:29Z

+@pytest.mark.parametrize("n_token", [64, 50000,100000])
 @pytest.mark.parametrize("n_hidden", [1024, 2048])
-@pytest.mark.parametrize("n_expert", [16])
-@pytest.mark.parametrize("topk", [2])
+@pytest.mark.parametrize("n_expert", [128, 256])
+@pytest.mark.parametrize("topk", [8])


The default parametrization makes this test extremely large (n_token up to 100000, plus multiple values for n_hidden/n_expert/dtype/etc.), which is likely to cause very long runtimes and/or XPU OOM in the normal (non-MINI) test run. Consider keeping unit-test sizes small by default and gating large shapes behind the existing MINI_PYTEST_PARAMS/CI env, or moving the large-shape coverage to a dedicated benchmark/perf test.

Copilot · 2026-04-07T02:59:30Z

+    auto input_size = gating_output.sizes();
+    int64_t num_tokens = input_size[0];
+    int64_t num_experts = input_size[1];
+    int64_t n_group = n_expert_group;
+    int64_t topk_group = n_topk_group;
+    int64_t topk = n_topk;
+
+    TORCH_CHECK(hidden_states.sizes()[0] == gating_output.sizes()[0],
+                "Number of tokens mismatch");
+    TORCH_CHECK(input_size.size() == 2, "gating_output must be a 2D Tensor");


num_tokens = input_size[0] / num_experts = input_size[1] is computed before verifying gating_output is 2D, which can read out of bounds for unexpected inputs. Also, the previous implementation validated bias shape matched num_experts; that check is now missing, but the kernel indexes routingBias[...] assuming length == num_experts. Please add TORCH_CHECKs to validate gating_output.dim()==2 before indexing sizes, and if bias is provided ensure it is 1D (or broadcastable in an explicitly supported way) with bias.numel()==num_experts (and ideally on the same device).

Suggested change

auto input_size = gating_output.sizes();

int64_t num_tokens = input_size[0];

int64_t num_experts = input_size[1];

int64_t n_group = n_expert_group;

int64_t topk_group = n_topk_group;

int64_t topk = n_topk;

TORCH_CHECK(hidden_states.sizes()[0] == gating_output.sizes()[0],

"Number of tokens mismatch");

TORCH_CHECK(input_size.size() == 2, "gating_output must be a 2D Tensor");

TORCH_CHECK(gating_output.dim() == 2, "gating_output must be a 2D Tensor");

auto input_size = gating_output.sizes();

int64_t num_tokens = input_size[0];

int64_t num_experts = input_size[1];

int64_t n_group = n_expert_group;

int64_t topk_group = n_topk_group;

int64_t topk = n_topk;

TORCH_CHECK(hidden_states.sizes()[0] == gating_output.sizes()[0],

"Number of tokens mismatch");

if (has_bias) {

TORCH_CHECK(bias->dim() == 1, "bias must be a 1D Tensor");

TORCH_CHECK(bias->numel() == num_experts,

"bias must have num_experts elements");

TORCH_CHECK(bias->device() == gating_output.device(),

"bias must be on the same device as gating_output");

}

Copilot · 2026-04-07T02:59:31Z

+
+        constexpr int MaxExpertCandidatesPerLane = NumDeepseekExperts / WARP_SIZE;
+        T localCandidateScores[MaxExpertCandidatesPerLane];
+        IdxT localCandidateIdx[MaxExpertCandidatesPerLane];


In the grouped-routing path, selectedExpertScores/Idx are sized to DefaultMaxNumTopExperts (8), but topk is only checked as <= 32. If topk > 8 with UseGroups=true, reduceTopK writes past these arrays. Either enforce topk <= DefaultMaxNumTopExperts for the grouped path, or add additional kernel instantiations / dynamic storage sized for larger topk.

Suggested change

IdxT localCandidateIdx[MaxExpertCandidatesPerLane];

IdxT localCandidateIdx[MaxExpertCandidatesPerLane];

TORCH_CHECK(

topk <= DefaultMaxNumTopExperts,

"Grouped routing only supports topk <= ", DefaultMaxNumTopExperts,

", but got topk=", topk,

". Increase grouped-path storage/kernel support for larger topk values.");

Copilot · 2026-04-07T02:59:31Z

+        int32_t totalCandidates = topkGroup * numExpertsPerGroup;
+        for (int32_t candidate = laneIdx; candidate < totalCandidates;
+             candidate += WARP_SIZE) {
+            int32_t localSlot = candidate / WARP_SIZE;
+            int32_t selectedGroup = candidate / numExpertsPerGroup;
+            int32_t expertInGroup = candidate % numExpertsPerGroup;
+            int32_t gid = selectedGroupIdx[selectedGroup];
+            int32_t idx = gid * numExpertsPerGroup + expertInGroup;
+            T candidateScore = neg_inf<T>();
+
+            T input = scoresToken[idx];
+            if (is_finite(input)) {
+                T score = apply_scoring<SF>(input);
+                candidateScore = score;
+                if (has_bias) {
+                    candidateScore = candidateScore + sycl_cast<T, BiasT>(routingBias[idx]);
+                }
+            }
+
+            localCandidateScores[localSlot] = candidateScore;
+            localCandidateIdx[localSlot] = static_cast<IdxT>(idx);


localCandidateScores/localCandidateIdx are sized assuming at most NumDeepseekExperts candidates (256 total, 8 per lane), but totalCandidates = topkGroup * numExpertsPerGroup can exceed 256 for valid inputs (e.g., small numGroup with larger numExpertsPerGroup). When that happens, localSlot = candidate / WARP_SIZE will exceed MaxExpertCandidatesPerLane and write out of bounds. Add a host-side TORCH_CHECK that topkGroup * numExpertsPerGroup <= 256 (or dispatch a kernel variant sized for the actual maximum candidates).

jikunshang · 2026-04-07T03:33:20Z

what's benefit come from? how do you benchmark?

xiaolong-intel · 2026-04-07T05:23:09Z

what's benefit come from? how do you benchmark?

Benifit comes from:

The old kernel first keeps all experts around, masks out the unwanted groups, and then runs top-k on that larger set. The new kernel first picks the best groups, then only looks at experts inside those groups, and finally does top-k on this much smaller candidate set.
In old kernel,a group often contains more than 32 experts, so one warp cannot directly keep all group values in registers and do the full selection in one step. So each lane needs to introduce extra loops.

I used https://github.com/xiaolong-intel/vllm-xpu-kernels/blob/grouped_topk/tests/test_grouped_topk.py. to test the precision and accuracy of the op.
Using torch.profile, tested the kernel_time of two ops under the same input and precision.Calculate ten times and take the average after warmup.

def _run_profile(num_tokens: int, hidden_dim: int, num_experts: int):
    if not torch.xpu.is_available():
        raise RuntimeError("XPU is not available for profiling")
    
    device = DEVICE
    print("Allocating profiling tensors...")

    test_hidden = torch.randn(num_tokens, hidden_dim, device=device, dtype=torch.bfloat16)
    test_gating = torch.randn(num_tokens, num_experts, device=device, dtype=torch.bfloat16)
    test_bias = torch.zeros(num_experts, device=device, dtype=torch.bfloat16)

    print("Profiling Fused Kernel...")
    # Increase warmup and active iterations for better stability
    for _ in range(50):
        _run_op_once(num_tokens, num_experts, test_hidden, test_gating, test_bias)
    torch.xpu.synchronize()

    activities = [torch.profiler.ProfilerActivity.CPU]
    if hasattr(torch.profiler.ProfilerActivity, "XPU"):
        activities.append(torch.profiler.ProfilerActivity.XPU)

    default_sort = (
        "self_xpu_time_total"
        if hasattr(torch.profiler.ProfilerActivity, "XPU")
        else "self_cpu_time_total"
    )
    sort_by = PROFILE_SORT or default_sort
    if PROFILE_TARGET in ("both", "fused"):
        _profile_kernel(
            "fused",
            lambda: _run_op_once(
                num_tokens, hidden_dim, test_hidden, test_gating, test_bias),
            PROFILE_WARMUP,
            PROFILE_ACTIVE,
            PROFILE_REPEAT,
            activities,
            sort_by,
            PROFILE_TRACE_DIR,
        )

    torch.xpu.synchronize()
    print("Profiling Grouped Kernel...")
    if PROFILE_TARGET in ("both", "grouped"):
        _profile_kernel(
            "grouped",
            lambda: _run_grouped_topk_once(
                num_tokens, test_hidden, test_gating, test_bias),
            PROFILE_WARMUP,
            PROFILE_ACTIVE,
            PROFILE_REPEAT,
            activities,
            sort_by,
            PROFILE_TRACE_DIR,
        )

mayuyuace · 2026-04-07T05:58:02Z

+    assert hidden_states.size(0) == gating_output.size(0), (
+        "Number of tokens mismatch")
+    if scoring_func == "softmax":
+        scores = torch.softmax(gating_output, dim=-1)


Have you compared the performance of the softmax scoring_func?

yes. I tested the kernel_time under softmax scoring_func with seqlenth=100000. The above is the original version, and the below is the optimized version.
I am currently testing with https://github.com/vllm-project/vllm-xpu-kernels/blob/main/benchmark/benchmark_grouped_topk.py , and I will provide the results later.

mayuyuace · 2026-04-07T05:59:19Z

 try:
    from . import _C  # noqa: F401
    from . import _xpu_C  # noqa: F401
+    from . import _moe_C  # noqa: F401


Please do not add unnecessary import.

mayuyuace · 2026-04-07T06:18:28Z

Please provide benchmark results as benchmark_grouped_topk do.
https://github.com/vllm-project/vllm-xpu-kernels/blob/main/benchmark/benchmark_grouped_topk.py

jikunshang · 2026-04-07T06:29:22Z

 #include <sycl/sycl.hpp>
-
-#include "../utils.h"
+#include <c10/xpu/XPUStream.h>


please don't include here.
../utils.h should cover this.

jikunshang · 2026-04-07T06:29:38Z

-    return 1.0f / (1.0f + sycl::native::exp(-x));
-  }
+// Type trait: bfloat16 -> float for computation, everything else stays as-is
+template <typename T>


for such util, move to utils.h

Okay, thank you for your suggestion, I will make the correction.

xiaolong-intel · 2026-04-08T02:24:06Z

Please provide benchmark results as benchmark_grouped_topk do. https://github.com/vllm-project/vllm-xpu-kernels/blob/main/benchmark/benchmark_grouped_topk.py

Hi mayuyuace:
I used https://github.com/vllm-project/vllm-xpu-kernels/blob/main/benchmark/benchmark_grouped_topk.py to test operator performance on the B60, with seq_length using a large length of 50000.
I conducted tests under the DeepSeek conditions [256,8,4,8] and the Nemo conditions [128,1,1,6].
Benchmark.py is a useful tool, thanks for your suggestion. : )

jikunshang

pls fix pre-commit issue and do not change oneDNN commit.

jikunshang · 2026-04-09T14:39:48Z

-n_expert_range = [16, 64, 128]
-topk_range = [2, 4]
-topk_group_range = [4, 8]
+n_token_range = [50000]


please don't remove previous one.

jikunshang · 2026-04-09T14:40:03Z


    benchmark = get_benchmark()
-    benchmark.run(print_data=True, save_path=args.save_path)
+    benchmark.run(print_data=True, save_path=args.save_path)


pls add blank line

jikunshang · 2026-04-09T14:41:46Z

+      std::is_same_v<T, sycl::ext::oneapi::bfloat16> ||
+      std::is_same_v<T, sycl::half>;;


jikunshang · 2026-04-09T14:42:51Z



-@pytest.mark.parametrize("n_token", [1, 33, 64])
+@pytest.mark.parametrize("n_token", [64, 50000,100000])


we don't want use such large shape in CI which may make job longer to complete.

jikunshang · 2026-04-09T14:43:17Z

-@pytest.mark.parametrize("topk_group", [2])
-@pytest.mark.parametrize("scoring_func", ["softmax", "sigmoid"])
+@pytest.mark.parametrize("topk_group", [4])
+@pytest.mark.parametrize("scoring_func", ["sigmoid","softmax"])


please revert unnecessary change. same for blank line

xiaolong-intel · 2026-04-10T06:13:44Z

oneDNN commit.

pre-commit is completed. onednn. It seems I haven't made any changes😂, just synced with the latest vllm-xpu-kernels.

xinyu-intel · 2026-04-20T07:49:26Z

what's this for?

xinyu-intel · 2026-04-20T07:49:43Z

why update onednn?

mayuyuace · 2026-04-20T07:54:50Z

You may need rebase the branch and only keep the key code changing.
Onednn should not be changed in this PR.

xiaolong-intel · 2026-04-20T07:59:11Z

You may need rebase the branch and only keep the key code changing. Onednn should not be changed in this PR.

Okay, I understand, thank you

xinyu-intel · 2026-04-28T11:45:30Z

can you rebase and fix DCO?

Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

…onversion of renormalization Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

xiaolong-intel · 2026-04-29T06:41:00Z

can you rebase and fix DCO?

Done.Thanks

Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Signed-off-by: root <xiaolong.guo@intel.com>

Copilot AI review requested due to automatic review settings April 7, 2026 02:54

Copilot started reviewing on behalf of xiaolong-intel April 7, 2026 02:55 View session

xiaolong-intel mentioned this pull request Apr 7, 2026

optimizing grouped_topk_multi_group SYCL kernel for MoE routing #247

Closed

Copilot AI reviewed Apr 7, 2026

View reviewed changes

mayuyuace reviewed Apr 7, 2026

View reviewed changes

Comment thread tests/ops/grouped_topk_op.py

mayuyuace reviewed Apr 7, 2026

View reviewed changes

jikunshang reviewed Apr 7, 2026

View reviewed changes

xiaolong-intel changed the title ~~Grouped topk~~ optimized fused_grouped_topk SYCL kernel for MoE expert routin Apr 8, 2026

xiaolong-intel changed the title ~~optimized fused_grouped_topk SYCL kernel for MoE expert routin~~ optimized fused_grouped_topk SYCL kernel for MoE expert routing Apr 8, 2026

xiaolong-intel marked this pull request as draft April 8, 2026 07:08

xiaolong-intel marked this pull request as ready for review April 9, 2026 02:47

jikunshang reviewed Apr 9, 2026

View reviewed changes

xiaolong-intel marked this pull request as draft April 10, 2026 05:25

xiaolong-intel marked this pull request as ready for review April 10, 2026 05:57

xinyu-intel reviewed Apr 20, 2026

View reviewed changes

Comment thread csrc/moe/moe_ops.h

Copy link
Copy Markdown

Collaborator

xinyu-intel Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this for?

xinyu-intel reviewed Apr 20, 2026

View reviewed changes

Comment thread third_party/oneDNN

Copy link
Copy Markdown

Collaborator

xinyu-intel Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why update onednn?

xiaolong-intel marked this pull request as draft April 22, 2026 02:40

xiaolong-intel marked this pull request as ready for review April 22, 2026 03:11

xiaolong-intel force-pushed the grouped_topk branch from 367b868 to fa68c74 Compare April 29, 2026 05:31

xiaolong-intel marked this pull request as draft April 29, 2026 05:34

xiaolong-intel force-pushed the grouped_topk branch from fa68c74 to 367b868 Compare April 29, 2026 05:53

xiaolong-intel force-pushed the grouped_topk branch 2 times, most recently from 515d662 to 367b868 Compare April 29, 2026 05:57

xiaolong-intel added 10 commits April 29, 2026 06:16

optimizing grouped_topk_multi_group SYCL kernel for MoE routing

8083713

Signed-off-by: xiaolong <xiaolong.guo@intel.com>

optimizing fused_grouped_topk SYCL kernel for MoE expert routing

e0521b0

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Update fused_grouped_topk.cpp

bf44a73

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

import vllm_xpu_kernels._moe_C in fused_moe_interface.py

5bc1bc9

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Update torch_bindings.cpp

da3a68d

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Update grouped_topk_op.py

72e3afd

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Merge branch 'main' into grouped_topk

7143bb8

Modify to make the code more standardized

2ef197a

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Update utils.h

8116714

Signed-off-by: xiaolong-intel <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

Removed unnecessary modifications and made changes to the precision c…

9a46cf9

…onversion of renormalization Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

xiaolong-intel force-pushed the grouped_topk branch from 367b868 to 5fb85ba Compare April 29, 2026 06:22

pre-commit check

5fb85ba

Signed-off-by: root <xiaolong.guo@intel.com> Signed-off-by: <xiaolong.guo@intel.com> Signed-off-by: xiaolong <xiaolong.guo@intel.com>

xiaolong-intel force-pushed the grouped_topk branch 2 times, most recently from 91fe795 to b6e2254 Compare April 29, 2026 06:39

xiaolong-intel marked this pull request as ready for review April 29, 2026 06:41

xiaolong-intel added 2 commits April 29, 2026 06:44

Revert unintended oneDNN and moe_ops.h changes

b6e2254

Signed-off-by: xiaolong <xiaolong.guo@intel.com>

add optimization for single group

6d9c006

Signed-off-by: root <xiaolong.guo@intel.com>

-        IdxT localCandidateIdx[MaxExpertCandidatesPerLane];
+        IdxT localCandidateIdx[MaxExpertCandidatesPerLane];
+        TORCH_CHECK(
+            topk <= DefaultMaxNumTopExperts,
+            "Grouped routing only supports topk <= ", DefaultMaxNumTopExperts,
+            ", but got topk=", topk,
+            ". Increase grouped-path storage/kernel support for larger topk values.");

		std::is_same_v<T, sycl::ext::oneapi::bfloat16> \|\|
		std::is_same_v<T, sycl::half>;;



		@pytest.mark.parametrize("n_token", [1, 33, 64])
		@pytest.mark.parametrize("n_token", [64, 50000,100000])

Conversation

xiaolong-intel commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Essential Elements of an Effective PR Description Checklist

Purpose

Test Plan

Test Result

(Optional) Documentation Update

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

jikunshang commented Apr 7, 2026

Uh oh!

xiaolong-intel commented Apr 7, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mayuyuace commented Apr 7, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xiaolong-intel commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jikunshang left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xiaolong-intel commented Apr 10, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mayuyuace commented Apr 20, 2026

Uh oh!

xiaolong-intel commented Apr 20, 2026

Uh oh!

xinyu-intel commented Apr 28, 2026

Uh oh!

xiaolong-intel commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

xiaolong-intel commented Apr 7, 2026 •

edited

Loading

xiaolong-intel commented Apr 8, 2026 •

edited

Loading