Commit af26ee9
Re-enable warp-agnostic ROCm SDPA kernel
Re-enable the optimized SDPA kernel with the warp-size agnostic
implementation. The kernel uses 32-thread tiles for consistent
behavior across RDNA and CDNA architectures.
The memory fault issue appears to be elsewhere in the inference
pipeline, not in SDPA.
1 parent a6bf8cb commit af26ee9
1 file changed
Lines changed: 20 additions & 3 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 216-218 | 216-218 | unchanged context |
| 219-221 | | - (3 lines removed) |
| | 219-238 | + (20 lines added) |
| 222-224 | 239-241 | unchanged context |
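The commit message says the kernel works on 32-thread tiles so that it behaves the same on RDNA and CDNA hardware. The changed lines themselves are not visible here, so the following is only a host-side C++ sketch of that idea, not the code from this commit. It simulates a shuffle-down style sum reduction confined to 32-lane tiles. The names `kTile` and `tile_reduce_sum` are hypothetical, and the wavefront width is assumed to be a multiple of 32 (for example 32 or 64).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Logical tile width taken from the commit message (32-thread tiles).
constexpr std::size_t kTile = 32;

// Simulates a shuffle-down tree reduction that never crosses a 32-lane
// tile boundary. `wave` holds one value per lane; its size stands in for
// the hardware wavefront width (assumed to be a multiple of kTile).
// Returns one sum per tile, read from the first lane of each tile.
std::vector<float> tile_reduce_sum(std::vector<float> wave) {
    assert(wave.size() % kTile == 0);
    for (std::size_t offset = kTile / 2; offset > 0; offset /= 2) {
        std::vector<float> next = wave;  // all lanes read before any lane writes
        for (std::size_t lane = 0; lane < wave.size(); ++lane) {
            std::size_t lane_in_tile = lane % kTile;
            // Only pull from a partner lane that lies inside the same tile.
            if (lane_in_tile + offset < kTile) {
                next[lane] = wave[lane] + wave[lane + offset];
            }
        }
        wave = next;
    }
    std::vector<float> sums;
    for (std::size_t base = 0; base < wave.size(); base += kTile) {
        sums.push_back(wave[base]);
    }
    return sums;
}
```

In this sketch a 64-lane input yields two tile sums, and each matches what a 32-lane input with the same values would produce, because the reduction never depends on the physical wavefront width.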