Commit 48773c2
Add GEMM-based standard SDPA benchmark
Pull Request resolved: #18646
Add bench_sdpa.cpp with a standalone GEMM-based SDPA implementation
(run_standard_sdpa) alongside ExecuTorch's tiled flash attention
(custom_sdpa_out) for comparative benchmarking.
The standalone SDPA uses a full GEMM per head with a 3-pass softmax and
supports both [B,S,H,D] and [B,H,S,D] layouts via BLAS leading
dimension parameters, so algorithmic effects can be isolated from layout effects.
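A minimal sketch of a non-tiled, GEMM-based SDPA may help show how one code path serves both layouts. It assumes float32, no attention mask or dropout, a naive triple-loop GEMM in place of a BLAS sgemm call, and an output stored in the same layout as the query. `naive_gemm`, `standard_sdpa_sketch` and `Layout` are hypothetical names, not the commit's `run_standard_sdpa`. The only difference between [B,S,H,D] and [B,H,S,D] is the base offset and leading dimension handed to each GEMM.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Row-major C[M x N] = alpha * A[M x K] * op(B), where op(B) is B (K x N) or,
// if trans_b, B^T with B stored as N x K. lda/ldb/ldc are row strides
// (leading dimensions). A naive triple loop stands in for a real BLAS call.
void naive_gemm(bool trans_b, int64_t M, int64_t N, int64_t K, float alpha,
                const float* A, int64_t lda, const float* B, int64_t ldb,
                float* C, int64_t ldc) {
  for (int64_t i = 0; i < M; ++i) {
    for (int64_t j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int64_t k = 0; k < K; ++k) {
        const float b = trans_b ? B[j * ldb + k] : B[k * ldb + j];
        acc += A[i * lda + k] * b;
      }
      C[i * ldc + j] = alpha * acc;  // overwrite (beta = 0)
    }
  }
}

enum class Layout { BSHD, BHSD };

// Hypothetical non-tiled SDPA: per (batch, head), S = scale * Q K^T with one
// full GEMM, a 3-pass softmax over each row of S, then O = P V with a second
// GEMM. Layout only changes the base offset and leading dimension.
void standard_sdpa_sketch(const float* q, const float* k, const float* v,
                          float* out, int64_t B, int64_t H, int64_t Sq,
                          int64_t Skv, int64_t D, Layout layout) {
  assert(B > 0 && H > 0 && Sq > 0 && Skv > 0 && D > 0);
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> scores(Sq * Skv);  // full Sq x Skv score matrix per head
  const bool bshd = layout == Layout::BSHD;
  // [B,S,H,D]: consecutive rows of one head are H*D apart; [B,H,S,D]: D apart.
  const int64_t ld = bshd ? H * D : D;
  for (int64_t b = 0; b < B; ++b) {
    for (int64_t h = 0; h < H; ++h) {
      const int64_t q_off = bshd ? b * Sq * H * D + h * D : (b * H + h) * Sq * D;
      const int64_t kv_off = bshd ? b * Skv * H * D + h * D : (b * H + h) * Skv * D;
      // scores = scale * Q_h * K_h^T
      naive_gemm(true, Sq, Skv, D, scale, q + q_off, ld, k + kv_off, ld,
                 scores.data(), Skv);
      for (int64_t i = 0; i < Sq; ++i) {
        float* row = scores.data() + i * Skv;
        const float mx = *std::max_element(row, row + Skv);  // pass 1: max
        float sum = 0.f;
        for (int64_t j = 0; j < Skv; ++j) {  // pass 2: exp and sum
          row[j] = std::exp(row[j] - mx);
          sum += row[j];
        }
        for (int64_t j = 0; j < Skv; ++j) row[j] /= sum;  // pass 3: normalize
      }
      // out_h = P * V_h, written back with the query's layout and stride
      naive_gemm(false, Sq, D, Skv, 1.0f, scores.data(), Skv, v + kv_off, ld,
                 out + q_off, ld);
    }
  }
}
```

Because each head's scores are materialized as a full Sq x Skv matrix, memory per head grows with Sq * Skv; a tiled flash-attention kernel avoids holding that whole matrix at once.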
Includes validation tests that verify the GEMM-based implementation
matches custom_sdpa_out within tolerance.
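A GEMM-based reference can be checked against a tiled flash-attention kernel because the online (running-max) softmax used by flash attention is mathematically equal to the 3-pass softmax; the two differ only in floating-point rounding, hence the tolerance. The sketch below illustrates this for a single query row, assuming float32, no mask, and keys/values stored as Skv x D rows. `attend_three_pass`, `attend_online_tiled` and `all_close` are hypothetical helpers, not ExecuTorch's `custom_sdpa_out` or the commit's test code, and the tolerances used with them are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Reference for one query row: full score row, then 3-pass softmax.
std::vector<float> attend_three_pass(const std::vector<float>& q,
                                     const std::vector<float>& k,
                                     const std::vector<float>& v,
                                     int64_t Skv, int64_t D) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> s(Skv);
  for (int64_t j = 0; j < Skv; ++j) {
    float dot = 0.f;
    for (int64_t d = 0; d < D; ++d) dot += q[d] * k[j * D + d];
    s[j] = dot * scale;
  }
  const float mx = *std::max_element(s.begin(), s.end());  // pass 1
  float sum = 0.f;
  for (float& x : s) { x = std::exp(x - mx); sum += x; }   // pass 2
  std::vector<float> o(D, 0.f);
  for (int64_t j = 0; j < Skv; ++j)                         // pass 3 (fused)
    for (int64_t d = 0; d < D; ++d) o[d] += (s[j] / sum) * v[j * D + d];
  return o;
}

// Flash-style: visit keys in tiles, keeping a running max m, running
// denominator l and unnormalized output acc; rescale both when m grows.
std::vector<float> attend_online_tiled(const std::vector<float>& q,
                                       const std::vector<float>& k,
                                       const std::vector<float>& v,
                                       int64_t Skv, int64_t D, int64_t tile) {
  assert(tile > 0);
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  float m = -std::numeric_limits<float>::infinity();
  float l = 0.f;
  std::vector<float> acc(D, 0.f);
  for (int64_t j0 = 0; j0 < Skv; j0 += tile) {
    const int64_t j1 = std::min(j0 + tile, Skv);
    std::vector<float> s(j1 - j0);
    float tile_max = -std::numeric_limits<float>::infinity();
    for (int64_t j = j0; j < j1; ++j) {
      float dot = 0.f;
      for (int64_t d = 0; d < D; ++d) dot += q[d] * k[j * D + d];
      s[j - j0] = dot * scale;
      tile_max = std::max(tile_max, s[j - j0]);
    }
    const float m_new = std::max(m, tile_max);
    const float correction = std::exp(m - m_new);  // 0 on the first tile
    l *= correction;
    for (float& a : acc) a *= correction;
    for (int64_t j = j0; j < j1; ++j) {
      const float p = std::exp(s[j - j0] - m_new);
      l += p;
      for (int64_t d = 0; d < D; ++d) acc[d] += p * v[j * D + d];
    }
    m = m_new;
  }
  for (float& a : acc) a /= l;  // single normalization at the end
  return acc;
}

// Element-wise |a - b| <= atol + rtol * |b|, in the spirit of allclose.
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               float atol, float rtol) {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); ++i)
    if (std::abs(a[i] - b[i]) > atol + rtol * std::abs(b[i])) return false;
  return true;
}
```

Running `attend_online_tiled` with several tile sizes against `attend_three_pass` exercises the rescaling path at different tile boundaries, which is where a tiled kernel is most likely to go wrong.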
ghstack-source-id: 374666323
@exported-using-ghexport
Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)
1 parent b2cc00d
2 files changed: 590 additions, 0 deletions