Split kv decode #146

Merged

sunjiweiswift merged 25 commits into split_kv_decode from copilot/sub-pr-145
Mar 24, 2026
Conversation

Contributor

Copilot AI commented Mar 23, 2026

  • Add bool use_sink = false; and bool use_causal_mask = false; to Arguments struct in xe_fmha_fwd_decode_runner.hpp
  • Remove standalone bool use_sink parameter from FmhaDecodeRunner::operator() and FmhaSplitDecodeRunner::operator() signatures
  • Update .cpp.in bodies: dispatch on params.use_sink and params.use_causal_mask
  • Update flash_attention.cpp: set params.use_sink and params.use_causal_mask; update DISPATCH_DECODE_KERNEL macro
  • Fix non-ASCII em-dash character in flash_attention.cpp comment (replace with ASCII -)


Copilot AI and others added 5 commits March 18, 2026 14:13
…el compilation units (#140)

Split the monolithic template instantiation of xe_fmha_fwd_decode_runner.hpp
into 72 separate .cpp files (one per QG_SZ × HEAD_DIM × PAGE_SIZE combination),
each compiled as its own library. This enables parallel compilation and
significantly speeds up build times.

Changes:
- Create xe_fmha_fwd_decode_kernel.cpp.in template for per-combination compilation
- Create xe_fmha_fwd_decode_dispatch.hpp with function declarations for all 72 kernels
- Move decode::mha_fwd() from header to flash_attention.cpp with dispatch table
- Update src/CMakeLists.txt to generate .cpp files via configure_file()
- Remove mha_fwd() definition from xe_fmha_fwd_decode_runner.hpp header
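The configure_file() generation step above can be sketched roughly as below. This is an assumption-laden illustration: the placeholder names (QG_SZ, HEAD_DIM, PAGE_SIZE) match the .cpp.in substitution variables mentioned in this PR, but the actual value lists, paths, and target wiring in src/CMakeLists.txt may differ.

```cmake
# Hypothetical sketch: stamp out one .cpp per (QG_SZ, HEAD_DIM, PAGE_SIZE)
# combination from the shared template, so each compiles independently.
foreach(QG_SZ 8 16)
  foreach(HEAD_DIM 64 128 192)
    foreach(PAGE_SIZE 16 64 128)
      set(_out
        ${CMAKE_CURRENT_BINARY_DIR}/xe_fmha_fwd_decode_${QG_SZ}_${HEAD_DIM}_${PAGE_SIZE}.cpp)
      # @ONLY restricts substitution to @VAR@ placeholders in the template
      configure_file(xe_fmha_fwd_decode_kernel.cpp.in ${_out} @ONLY)
      list(APPEND DECODE_KERNEL_SRCS ${_out})
    endforeach()
  endforeach()
endforeach()
```

Because every generated file is its own translation unit, the build system can compile them in parallel instead of instantiating all combinations in one monolithic unit.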

Co-authored-by: airMeng <39229107+airMeng@users.noreply.github.com>
Co-authored-by: jiwei1.sun <jiwei1.sun@intel.com>
* fix norm with noncontiguous input

* remove comment out test

* support in kernel
* Add MXFP4 Per Token Group Quant kernel and tests

Remove commented out fp8 blockwise group gemm registration

* Add benchmarking for per token group quant mxfp4

* Add test to run_suite.py

* Fix group size constraint for mxfp4; Add benchmark test to CI flow

* Remove reference provider from the benchmark script

- Add check for quantized and scale values separately
- Include eps value in ref quant function call

* Fix MXFP4 quantization to match OCP MX spec

- Replace ceil(log2(max/6.0)) scale computation with floor(log2(max)) -
  E2M1_EMAX per OCP MX spec
- Fix roundTiesToEven at midpoints in SYCL kernel (change <= to < at
  odd-mantissa boundaries)
- Replace naive argmin-based quantize_to_e2m1 reference with
  microxcaling _quantize_elemwise_core algorithm
- Normalize signed zeros (+0.0 vs -0.0) before packed byte comparison in
  tests and benchmark

* Fix lint issues

* Remove unsupported group sizes

* Apply formatting check

* Add TODO for quantize_to_e2m1

* Update tests/test_per_token_group_quant_mxfp4.py

Co-authored-by: Meng, Hengyu <airdldl@163.com>

* trigger CI

---------

Co-authored-by: Meng, Hengyu <airdldl@163.com>
Copilot AI changed the title [WIP] Split kv decode to enhance performance Split kv decode Mar 23, 2026
Copilot AI requested a review from sunjiweiswift March 23, 2026 07:33
@sunjiweiswift
Collaborator

@copilot "FMHAConfig" is undefined, and split_kv_decode has updates. Please rebase on main again.


…nditional dispatch between DecodeConfig and SplitDecodeConfig, comment out page_size 32

Co-authored-by: sunjiweiswift <16934286+sunjiweiswift@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sgl-project/sgl-kernel-xpu/sessions/82c107d1-3f61-4ce0-9444-d7f19f27a292
Contributor Author

Copilot AI commented Mar 23, 2026

@copilot "FMHAConfig" is undefined, and split_kv_decode has updates. Please rebase on main again.

Fixed in 8ce3170. Three changes:

  1. Renamed FMHAConfig to DecodeConfig in xe_fmha_fwd_decode_kernel.cpp.in (fixes the undefined symbol)
  2. Added conditional dispatch in flash_attention.cpp: uses SplitDecodeConfig when params.use_split_kv_decode is true, otherwise falls back to DecodeConfig
  3. Commented out page_size 32 support (matching base branch update)

Contributor Author

Copilot AI commented Mar 23, 2026

This branch has conflicts that must be resolved @copilot

Resolved in eeee619. Merged latest origin/split_kv_decode, resolved the conflict in xe_fmha_fwd_decode_runner.hpp (kept mha_fwd in flash_attention.cpp), and renamed kernel_dispatch to run to match the base branch update.

@copilot "FMHAConfig" is undefined, and split_kv_decode has updates. Please rebase on main again.

Addressed in 8ce3170 (FMHAConfig fix) and eeee619 (merge conflicts resolved with latest split_kv_decode).

@sunjiweiswift
Collaborator

@copilot Use a function pointer table to perform the calls, rather than instantiating a template object. You can refer to flash_attention.cpp in the main branch.

Contributor Author

Copilot AI commented Mar 24, 2026

@copilot Use a function pointer table to perform the calls, rather than instantiating a template object. You can refer to flash_attention.cpp in the main branch.

Done in 1614a4c. Following the GroupGemmXe20 pattern:

  • xe_fmha_fwd_decode_runner.hpp: Added launch_fmha_decode<QG, HD, PS> and launch_fmha_split_decode<QG, HD, PS> template function definitions here.
  • .cpp.in files: Reduced to a single explicit instantiation line (template void launch_fmha_decode<@QG_SZ@, @HEAD_DIM@, @PAGE_SIZE@>(bool, const Arguments&)) — matching GroupGemmXe20LauncherInstance.cpp.in.
  • xe_fmha_fwd_decode_dispatch.hpp: Changed to template declarations + extern template instantiation declarations (equivalent to GroupGemmXe20.cpp's DECLARE_XE20_MOE_EXTERN pattern).
  • flash_attention.cpp: Function pointer table uses &launch_fmha_decode<QG, HD, PS> addresses — no template objects instantiated inline.

@sunjiweiswift
Collaborator

@copilot
In flash_attention.cpp, include only the struct declarations and calls to the corresponding structs' operator(); the actual struct definitions should be implemented in cpp.in. You may refer to GroupGemmXe20.cpp as a reference.

Contributor Author

Copilot AI commented Mar 24, 2026

@copilot
In flash_attention.cpp, include only the struct declarations and calls to the corresponding structs' operator(); the actual struct definitions should be implemented in ...

Already done in 25d22d1:

  • flash_attention.cpp: Contains only extern template struct declarations (via dispatch.hpp) and calls via &FmhaDecodeRunner<QG,HD,PS>::call / &FmhaSplitDecodeRunner<QG,HD,PS>::call — no kernel definitions.
  • runner.hpp: Structs with operator() declared only (no body), plus a static call() helper.
  • .cpp.in files: Full operator() body + template struct explicit instantiation — mirroring the GroupGemmXe20LauncherInstance.cpp.in pattern.

… (GroupGemmXe20 pattern)

Co-authored-by: sunjiweiswift <16934286+sunjiweiswift@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sgl-project/sgl-kernel-xpu/sessions/85877884-e91a-400f-adcc-5b9a55692df9
…rom operator() signature

Co-authored-by: sunjiweiswift <16934286+sunjiweiswift@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sgl-project/sgl-kernel-xpu/sessions/f88d0fec-8f93-4f48-99b8-714d68fd14f4
@sunjiweiswift sunjiweiswift marked this pull request as ready for review March 24, 2026 05:12
@sunjiweiswift sunjiweiswift merged commit 48bcf2e into split_kv_decode Mar 24, 2026
@sunjiweiswift sunjiweiswift deleted the copilot/sub-pr-145 branch March 24, 2026 05:21