Enable split-K decode SDPA by default with --no-splitk opt-out
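Add a `use_splitk_decode` config flag that controls whether FullAttention
uses the split-K (flash-decoding) SDPA kernel or the tiled SDPA kernel for
decode (T=1). The split-K kernel partitions the KV sequence across
CTAs, so a single query token is processed by many CTAs in parallel and
the partial softmax results are combined in a final reduction. This
yields ~21% higher decode throughput on H100: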
| Variant | Decode tok/s (avg across prompts) |
|---|---|
| Tiled SDPA | 88.5 |
| Split-K SDPA | 107.5 (+21%) |
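To illustrate why splitting the KV sequence is exact rather than approximate, here is a minimal pure-Python sketch of the flash-decoding reduction. It assumes a single query vector for one head (T=1), and each chunk in the loop stands in for one CTA. The function names (`attend_full`, `attend_split_k`, etc.) are hypothetical and are not taken from this PR's kernel code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_full(q, keys, values, scale) -> Vector:
    """Reference: softmax(q . K^T * scale) @ V for a single query token."""
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attend_chunk(q, keys, values, scale) -> Tuple[float, float, Vector]:
    """Work done by one split (one CTA in the real kernel).

    Returns the chunk-local max score, the sum of exp(score - max), and the
    unnormalized weighted sum of values.
    """
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return m, sum(weights), acc


def attend_split_k(q, keys, values, scale, num_splits: int) -> Vector:
    """Split the KV sequence into chunks, attend to each, then reduce."""
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = [
        attend_chunk(q, keys[i:i + chunk], values[i:i + chunk], scale)
        for i in range(0, n, chunk)
    ]
    # Reduction step: rescale every partial result to the global max so the
    # exp terms share one reference point, then normalize once at the end.
    global_m = max(m for m, _, _ in partials)
    dim = len(values[0])
    denom = 0.0
    out = [0.0] * dim
    for m, s, acc in partials:
        c = math.exp(m - global_m)
        denom += s * c
        for j in range(dim):
            out[j] += acc[j] * c
    return [x / denom for x in out]
```

Because each chunk carries its own max and exp-sum, the final rescaling reproduces the full softmax exactly (up to floating-point rounding), which is consistent with the identical temperature=0 outputs reported below.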
The flag defaults to True (split-K on). Pass `--no-splitk` at export
time to disable it and fall back to the tiled SDPA kernel. Output quality
was verified to be identical at temperature=0.
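A minimal sketch of how a default-on flag with a `--no-...` opt-out can be wired up, assuming an `argparse`-based export CLI. Only `use_splitk_decode` and `--no-splitk` come from this PR; `AttentionConfig`, `build_parser`, `select_sdpa_kernel`, and the `"prefill"` label are hypothetical names for illustration, and the sketch does not describe which kernel prefill actually uses.

```python
import argparse
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical config holder; only the field name comes from the PR.
    use_splitk_decode: bool = True


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="export (sketch)")
    # store_false gives the destination a default of True; passing
    # --no-splitk sets it to False.
    parser.add_argument(
        "--no-splitk",
        dest="use_splitk_decode",
        action="store_false",
        help="Use the tiled SDPA kernel for decode instead of split-K.",
    )
    return parser


def select_sdpa_kernel(cfg: AttentionConfig, seq_len: int) -> str:
    """Return a kernel label for a forward pass with `seq_len` query tokens."""
    if seq_len != 1:
        # Hypothetical placeholder: the flag only affects decode (T=1).
        return "prefill"
    return "splitk" if cfg.use_splitk_decode else "tiled"
```

For example, `build_parser().parse_args(["--no-splitk"]).use_splitk_decode` is `False`, while `parse_args([])` leaves it `True`, so split-K stays on unless the user opts out.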
This PR was authored with the assistance of Claude.