Merged
25 commits
4421989
Split FMHA decode and GroupGemm template instantiations into per-kern…
Copilot Mar 18, 2026
b6f0d63
Fix noncontiguous input for rmsnorm (#117)
yangw1234 Mar 20, 2026
3796408
Add MXFP4 Per Token Group Quant kernel and tests (#106)
sspintel Mar 23, 2026
9bdad0b
add page 64
sunjiweiswift Mar 23, 2026
3c14768
Initial plan
Copilot Mar 23, 2026
644ff83
add reduce.h
sunjiweiswift Mar 13, 2026
ab884e9
add XeFMHAFwdSplitKVKernel
sunjiweiswift Mar 13, 2026
a732778
const tensor for Q
sunjiweiswift Mar 13, 2026
151122d
add split kernel
sunjiweiswift Mar 17, 2026
2bb49ae
save
sunjiweiswift Mar 17, 2026
ea443ce
cache_seqlens
sunjiweiswift Mar 17, 2026
e073c7a
head_dim =128
sunjiweiswift Mar 17, 2026
294cdcf
2026
sunjiweiswift Mar 18, 2026
b4ec20b
test for mingxu
sunjiweiswift Mar 23, 2026
40ba59f
Initial plan
Copilot Mar 23, 2026
64e9a55
Rebase onto main: integrate split-KV changes into flash_attention.cpp…
Copilot Mar 23, 2026
5f159b0
Merge remote after rebase onto main
Copilot Mar 23, 2026
8ce3170
Rebase onto updated split_kv_decode: fix FMHAConfig undefined, add co…
Copilot Mar 23, 2026
eeee619
Merge origin/split_kv_decode: resolve conflicts, rename kernel_dispat…
Copilot Mar 23, 2026
6f57253
bugfix
sunjiweiswift Mar 24, 2026
1614a4c
Refactor dispatch to function pointer tables following GroupGemmXe20 …
Copilot Mar 24, 2026
25d22d1
Refactor decode dispatch to struct operator() following GroupGemmXe20…
Copilot Mar 24, 2026
d2157bb
Replace function pointer table with direct struct operator() dispatch…
Copilot Mar 24, 2026
5c29480
Add use_sink and use_causal_mask to Arguments; remove bool use_sink f…
Copilot Mar 24, 2026
a7eaeee
Replace non-ASCII em-dash in flash_attention.cpp comment with ASCII h…
Copilot Mar 24, 2026
2 changes: 2 additions & 0 deletions .github/workflows/pr-test-xpu.yml
@@ -67,6 +67,8 @@ jobs:
python3 bench_merge_states_v2.py 2>&1 | tee merge_states.py.log \
python3 bench_swiglu_alpha_limit.py 2>&1 | tee swiglu_alpha_limit.py.log \
python3 bench_fused_qk_norm_rope.py 2>&1 | tee fused_qk_norm_rope.py.log \
+ python3 bench_per_token_group_quant_8bit.py 2>&1 | tee per_token_group_quant_8bit.py.log \
+ python3 bench_per_token_group_quant_mxfp4.py 2>&1 | tee per_token_group_quant_mxfp4.py.log \
"

- name: Copy logs from container
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -42,7 +42,7 @@ set(CUTLASS_ENABLE_HEADERS_ONLY ON CACHE BOOL "Enable headers only mode in cutla
FetchContent_Declare(
repo-cutlass-sycl
GIT_REPOSITORY https://github.com/intel/sycl-tla.git
- GIT_TAG 482b40e8bed0e9204311d1569c876b4573dfb952
+ GIT_TAG 64584484b4279b1b4184b508af445698a4a1b603
GIT_SHALLOW OFF
)
FetchContent_MakeAvailable(repo-cutlass-sycl)
5 changes: 3 additions & 2 deletions Dockerfile.xpu_kernel
@@ -22,10 +22,10 @@ ARG SG_LANG_KERNEL_BRANCH=main
# Install the latest UMD driver for SYCL-TLA
RUN apt-get install -y software-properties-common && \
add-apt-repository -y ppa:kobuk-team/intel-graphics && \
apt-get update && \
apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && \
apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo && \
apt-get install -y libze-dev intel-ocloc && \
apt-get update
apt-get install -y libze-dev intel-ocloc

# Install Miniforge & PyTorch/Triton
RUN curl -fsSL -v -o miniforge.sh -O https://github.com/conda-forge/miniforge/releases/download/25.1.1-0/Miniforge3-Linux-x86_64.sh && \
@@ -66,3 +66,4 @@ RUN --mount=type=secret,id=github_token \
# Set the default shell to bash
SHELL ["bash", "-c"]
CMD ["bash", "-c", "source /root/.bashrc && exec bash"]
USER root