Commit 4a828d4
committed
Add GEMM-based standard SDPA benchmark
Add bench_sdpa.cpp with a standalone GEMM-based SDPA implementation
(run_standard_sdpa) alongside ExecuTorch's tiled flash attention
(custom_sdpa_out) for comparative benchmarking.
The standalone SDPA uses full GEMM per head with 3-pass softmax and
supports both [B,S,H,D] and [B,H,S,D] layouts via BLAS leading
dimension parameters, allowing isolation of algorithm vs layout effects.
Includes validation tests that verify the GEMM-based implementation
matches custom_sdpa_out within tolerance.
Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)
ghstack-source-id: 361224784
Pull Request resolved: #186461 parent 44e344c commit 4a828d4
2 files changed
+503
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
| |||
89 | 90 | | |
90 | 91 | | |
91 | 92 | | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
0 commit comments