Commit 4d08bf0
Enable split-K decode SDPA by default with --no-splitk opt-out
Add `use_splitk_decode` config flag to control whether FullAttention
uses the split-K (flash-decoding) SDPA kernel or the tiled SDPA for
decode (T=1). The split-K kernel partitions the KV sequence across
CTAs, yielding ~20% higher decode throughput on H100:
| Variant | Decode tok/s (avg across prompts) |
|---|---|
| Tiled SDPA | 88.5 |
| Split-K SDPA | 107.5 (+21%) |
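To illustrate why partitioning the KV sequence leaves the result unchanged, here is a minimal pure-Python sketch of the split-K idea for a single decode query. It is not the kernel from this commit: the function names (`attend_full`, `attend_chunk`, `attend_splitk`, `random_case`) are hypothetical, and the chunks are processed in a sequential loop, whereas a GPU kernel would presumably assign each chunk to its own CTA and merge the partials in a reduction step.

```python
import math
import random


def attend_full(q, keys, values):
    """Reference single-query attention: softmax(q . K^T / sqrt(d)) @ V."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)]


def attend_chunk(q, keys, values):
    """Partial attention over one KV chunk.

    Returns (local max score, local sum of exps, unnormalized output).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim_v = len(values[0])
    acc = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return m, sum(weights), acc


def attend_splitk(q, keys, values, num_splits):
    """Split the KV sequence into chunks, then merge the partial results."""
    chunk = max(1, math.ceil(len(keys) / num_splits))
    partials = [attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, len(keys), chunk)]
    # Rescale every partial to a shared maximum so the exponentials are comparable.
    m_global = max(m for m, _, _ in partials)
    dim_v = len(values[0])
    denom = 0.0
    out = [0.0] * dim_v
    for m, local_sum, acc in partials:
        w = math.exp(m - m_global)
        denom += w * local_sum
        for j in range(dim_v):
            out[j] += w * acc[j]
    return [x / denom for x in out]


def random_case(seed, seq_len, head_dim):
    """Build a reproducible random query and KV cache for experimentation."""
    rng = random.Random(seed)
    vec = lambda: [rng.uniform(-1.0, 1.0) for _ in range(head_dim)]
    return vec(), [vec() for _ in range(seq_len)], [vec() for _ in range(seq_len)]
```

For any number of splits, `attend_splitk` should agree with `attend_full` up to floating-point rounding, since the merge step only re-expresses the same softmax normalization across chunks.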
The flag defaults to True (split-K on). Pass `--no-splitk` at export
time to disable. Quality is verified identical at temperature=0.
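A default-on flag with a `--no-splitk` opt-out could be wired up roughly as follows. This is a sketch under assumptions, not the code from this commit: `ExportConfig`, `build_parser`, `config_from_args`, and `select_decode_kernel` are hypothetical names, and the choice of the tiled kernel whenever T is not 1 is an assumption based on the description above.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ExportConfig:
    """Hypothetical config object; only use_splitk_decode comes from the commit message."""
    use_splitk_decode: bool = True  # split-K is on by default


def build_parser():
    parser = argparse.ArgumentParser(description="Export script sketch")
    # store_false makes the destination default to True and flips it when the flag is passed.
    parser.add_argument(
        "--no-splitk",
        dest="use_splitk_decode",
        action="store_false",
        help="Disable the split-K decode SDPA kernel and use tiled SDPA instead.",
    )
    return parser


def config_from_args(argv):
    args = build_parser().parse_args(argv)
    return ExportConfig(use_splitk_decode=args.use_splitk_decode)


def select_decode_kernel(config, query_len):
    """Pick a kernel name; split-K applies only to decode steps (T == 1)."""
    if query_len == 1 and config.use_splitk_decode:
        return "splitk"
    return "tiled"  # assumption: every other case stays on tiled SDPA
```

With no arguments the config keeps split-K enabled; passing `--no-splitk` switches decode back to the tiled path.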
This PR was authored with the assistance of Claude.
1 parent 4e72d4b, commit 4d08bf0
2 files changed
Lines changed: 24 additions & 3 deletions