Commit 4de4538
Add --splitk flag and enable split-K decode SDPA by default
Add `use_splitk_decode` config flag to control whether FullAttention
uses the split-K (flash-decoding) SDPA kernel or the tiled SDPA for
decode (T=1). The split-K kernel partitions the KV sequence across
CTAs, yielding ~20% higher decode throughput on H100:
| Variant | Decode tok/s (avg across prompts) |
|---|---|
| Tiled SDPA | 88.5 |
| Split-K SDPA | 107.5 (+21%) |
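For readers unfamiliar with the technique, here is a minimal NumPy sketch of the split-K (flash-decoding) idea for a single decode query. It is not the kernel added in this commit: the function names and the `num_splits` parameter are illustrative assumptions, and the Python loop stands in for the per-CTA work a GPU would do in parallel. Each chunk of the KV sequence produces a partial softmax (local max, local sum, unnormalized output), and the partials are merged with a log-sum-exp rescale.

```python
import numpy as np


def decode_attention_reference(q, k, v):
    """Plain single-query attention. q: (d,), k: (S, d), v: (S, d)."""
    scores = (k @ q) / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    return (weights @ v) / weights.sum()


def decode_attention_splitk(q, k, v, num_splits=4):
    """Split-K sketch: attend to each KV chunk separately, then merge.

    `num_splits` is a hypothetical knob; a real kernel would choose the
    split count from the sequence length and the number of SMs.
    """
    scale = 1.0 / np.sqrt(q.shape[0])
    maxes, sums, outs = [], [], []
    # Each chunk is independent, which is what lets a GPU assign one CTA per chunk.
    for k_chunk, v_chunk in zip(np.array_split(k, num_splits),
                                np.array_split(v, num_splits)):
        if k_chunk.shape[0] == 0:
            continue  # more splits than KV positions
        scores = (k_chunk @ q) * scale
        local_max = scores.max()
        w = np.exp(scores - local_max)
        maxes.append(local_max)
        sums.append(w.sum())
        outs.append(w @ v_chunk)  # unnormalized partial output
    # Reduction step: bring every partial onto the global max before summing.
    maxes = np.array(maxes)
    rescale = np.exp(maxes - maxes.max())
    denom = (rescale * np.array(sums)).sum()
    numer = (rescale[:, None] * np.stack(outs)).sum(axis=0)
    return numer / denom
```

Because the merge is exact up to floating-point rounding, both functions should agree to within tolerance for any split count. That is consistent with the commit's observation that output quality is unchanged.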
The flag defaults to True (split-K on) and can be disabled at export
time by omitting `--splitk`. Quality is verified identical at
temperature=0.
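One way to read the flag behaviour described above is sketched below. This is an assumption about the wiring, not code from the commit: `AttentionConfig`, `select_decode_kernel`, and `config_from_args` are hypothetical names. Only `use_splitk_decode`, `--splitk`, and the T=1 decode condition come from the message. The config field defaults to True, while the export script overwrites it with the CLI value, so omitting `--splitk` turns split-K off for that export.

```python
import argparse
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical config container; only the field name comes from the commit.
    use_splitk_decode: bool = True


def select_decode_kernel(config, seq_len_q):
    """Pick the SDPA variant: split-K only applies to decode (T=1)."""
    if seq_len_q == 1 and config.use_splitk_decode:
        return "splitk_sdpa"
    return "tiled_sdpa"


def config_from_args(argv):
    """Assumed export-time wiring: the CLI value overrides the config default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--splitk", action="store_true",
                        help="use the split-K SDPA kernel for decode")
    args = parser.parse_args(argv)
    return AttentionConfig(use_splitk_decode=args.splitk)
```

Under this reading, `config_from_args(["--splitk"])` selects the split-K kernel for decode steps, while prefill (T > 1) always uses the tiled kernel.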
This PR was authored with the assistance of Claude.
1 parent 44b69df commit 4de4538
2 files changed
Lines changed: 23 additions & 3 deletions