Split-KV decode, refactor prefill instantiation, and add flash_attn CI benchmarking #145
Merged
Commits (37):
- `a20df4d` add reduce.h (sunjiweiswift)
- `5ac747d` add XeFMHAFwdSplitKVKernel (sunjiweiswift)
- `902a29c` const tensor for Q (sunjiweiswift)
- `23bfd0c` add split kernel (sunjiweiswift)
- `b7dab66` save (sunjiweiswift)
- `09f27ce` cache_seqlens (sunjiweiswift)
- `121b736` head_dim =128 (sunjiweiswift)
- `f47daf2` 2026 (sunjiweiswift)
- `61c3ddc` test for mingxu (sunjiweiswift)
- `6f76a31` add seqlen_k (sunjiweiswift)
- `25a95a3` add page 64 (sunjiweiswift)
- `b6257f2` add (sunjiweiswift)
- `48bcf2e` Split kv decode (#146) (Copilot)
- `10d7dfe` lint and bench (sunjiweiswift)
- `cb20d54` lint (sunjiweiswift)
- `368ff64` Merge branch 'main' into split_kv_decode (sunjiweiswift)
- `d002cf0` fix (sunjiweiswift)
- `b1e152e` add HD 256 (sunjiweiswift)
- `056dd99` code opt (sunjiweiswift)
- `38e2efa` lint (sunjiweiswift)
- `b0b8f6b` Merge branch 'main' into split_kv_decode (sunjiweiswift)
- `153b637` Merge branch 'main' into split_kv_decode (sunjiweiswift)
- `4971500` add 512 (sunjiweiswift)
- `84f43de` Merge branch 'main' into split_kv_decode (sunjiweiswift)
- `bca059a` lint (sunjiweiswift)
- `63165db` add 256 and 512 prefill (sunjiweiswift)
- `1c3030b` Refactor prefill instantiation to match decode/splitdecode pattern (Copilot)
- `0831deb` Refactor FMHAPrefillXe20.cmake to match FMHADecodeXe20.cmake structure (sunjiweiswift)
- `952184d` delete template page 32 and QZ 32 (sunjiweiswift)
- `af91f16` Merge branch 'main' into split_kv_decode (sunjiweiswift)
- `c392fc6` fix oom (sunjiweiswift)
- `9ad6f4d` lint (sunjiweiswift)
- `b23cc66` delete some case (sunjiweiswift)
- `9130e35` fix ci benchmark scripts for mla_decode (pralay-das)
- `9a03181` Add flash_attn benchmark support to CI pipeline and update_baseline_f… (Copilot)
- `81498c1` update baseline (sunjiweiswift)
- `b540bc4` add baseline and line (sunjiweiswift)
New file, hunk `@@ -0,0 +1,44 @@`:

```cmake
# Generate FMHA prefill kernel instantiation files.
# Each HEAD_DIM is compiled as a separate translation unit to parallelize
# and speed up compilation.

set(FMHA_PREFILL_HEAD_DIMS 64 96 128 192 256 512)

set(FMHA_PREFILL_TEMPLATE
    "${CMAKE_CURRENT_SOURCE_DIR}/sycl/xe_fmha_fwd_prefill_kernel.cpp.in")

# Per-HEAD_DIM tile shape parameters (TILED_Q, TILED_KV, NUM_SG)
set(FMHA_PREFILL_TILED_Q_64 128)
set(FMHA_PREFILL_TILED_KV_64 64)
set(FMHA_PREFILL_NUM_SG_64 8)

set(FMHA_PREFILL_TILED_Q_96 128)
set(FMHA_PREFILL_TILED_KV_96 64)
set(FMHA_PREFILL_NUM_SG_96 8)

set(FMHA_PREFILL_TILED_Q_128 256)
set(FMHA_PREFILL_TILED_KV_128 32)
set(FMHA_PREFILL_NUM_SG_128 16)

set(FMHA_PREFILL_TILED_Q_192 256)
set(FMHA_PREFILL_TILED_KV_192 64)
set(FMHA_PREFILL_NUM_SG_192 32)

set(FMHA_PREFILL_TILED_Q_256 256)
set(FMHA_PREFILL_TILED_KV_256 64)
set(FMHA_PREFILL_NUM_SG_256 32)

set(FMHA_PREFILL_TILED_Q_512 256)
set(FMHA_PREFILL_TILED_KV_512 64)
set(FMHA_PREFILL_NUM_SG_512 32)

foreach(HEAD_DIM ${FMHA_PREFILL_HEAD_DIMS})
  set(TILED_Q ${FMHA_PREFILL_TILED_Q_${HEAD_DIM}})
  set(TILED_KV ${FMHA_PREFILL_TILED_KV_${HEAD_DIM}})
  set(NUM_SG ${FMHA_PREFILL_NUM_SG_${HEAD_DIM}})

  set(GENERATED_FILE
      "${CMAKE_CURRENT_BINARY_DIR}/sycl/xe_fmha_fwd_prefill_kernel_${HEAD_DIM}.cpp")
  configure_file(${FMHA_PREFILL_TEMPLATE} ${GENERATED_FILE} @ONLY)
  list(APPEND device_cpp_common ${GENERATED_FILE})
endforeach()
```
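The per-HEAD_DIM lookup in this file relies on nested variable references such as `${FMHA_PREFILL_TILED_Q_${HEAD_DIM}}`, and a missing entry expands to an empty value rather than raising an error. The following is a minimal standalone sketch, not part of this PR, meant to be run with `cmake -P`. It re-declares a subset of the tile parameters from the diff, replaces the real `xe_fmha_fwd_prefill_kernel.cpp.in` (whose contents are not shown here) with a hypothetical one-line `TEMPLATE_TEXT` string, and uses `string(CONFIGURE ... @ONLY)` to mimic the `@VAR@` substitution that `configure_file(... @ONLY)` performs. The `TEMPLATE_TEXT` variable and the fail-fast check are illustrative assumptions.

```cmake
# Hypothetical helper script (not part of this PR). Run with: cmake -P <this file>
# Policy CMP0053 is NEW at 3.12, so @VAR@ below stays literal in normal code.
cmake_minimum_required(VERSION 3.12)

# Subset of the per-HEAD_DIM tile parameters from the diff.
set(FMHA_PREFILL_HEAD_DIMS 64 128 512)
set(FMHA_PREFILL_TILED_Q_64 128)
set(FMHA_PREFILL_TILED_KV_64 64)
set(FMHA_PREFILL_NUM_SG_64 8)
set(FMHA_PREFILL_TILED_Q_128 256)
set(FMHA_PREFILL_TILED_KV_128 32)
set(FMHA_PREFILL_NUM_SG_128 16)
set(FMHA_PREFILL_TILED_Q_512 256)
set(FMHA_PREFILL_TILED_KV_512 64)
set(FMHA_PREFILL_NUM_SG_512 32)

# Hypothetical stand-in for the real .cpp.in template.
set(TEMPLATE_TEXT
    "// head_dim=@HEAD_DIM@ tiled_q=@TILED_Q@ tiled_kv=@TILED_KV@ num_sg=@NUM_SG@")

foreach(HEAD_DIM ${FMHA_PREFILL_HEAD_DIMS})
  # Nested expansion: ${HEAD_DIM} is substituted first, yielding the
  # name of the per-dim variable, which is then dereferenced.
  set(TILED_Q ${FMHA_PREFILL_TILED_Q_${HEAD_DIM}})
  set(TILED_KV ${FMHA_PREFILL_TILED_KV_${HEAD_DIM}})
  set(NUM_SG ${FMHA_PREFILL_NUM_SG_${HEAD_DIM}})

  # If a per-dim variable is undefined, set() receives no value and unsets
  # the target, which would then be substituted as an empty string.
  # Fail early instead of generating a source file with a blank parameter.
  foreach(param TILED_Q TILED_KV NUM_SG)
    if("${${param}}" STREQUAL "")
      message(FATAL_ERROR "Missing ${param} for HEAD_DIM=${HEAD_DIM}")
    endif()
  endforeach()

  # Same @VAR@-only substitution that configure_file(... @ONLY) applies,
  # performed on a string instead of a file.
  string(CONFIGURE "${TEMPLATE_TEXT}" generated @ONLY)
  message(STATUS "${generated}")
endforeach()
```

Deleting any one of the `set(FMHA_PREFILL_*_<dim> ...)` lines should make the script stop with the `FATAL_ERROR` message instead of emitting a template line with a blank value.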