Commit dc0d80e
Update base for Update on "Add GEMM-based standard SDPA benchmark"
Add bench_sdpa.cpp with a standalone GEMM-based SDPA implementation
(run_standard_sdpa) alongside ExecuTorch's tiled flash attention
(custom_sdpa_out) for comparative benchmarking.
The standalone SDPA uses full GEMM per head with 3-pass softmax and
supports both [B,S,H,D] and [B,H,S,D] layouts via BLAS leading
dimension parameters, allowing isolation of algorithm vs layout effects.
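The following minimal C++ sketch shows how one GEMM-based kernel can serve both layouts through leading-dimension parameters. It is not the code in bench_sdpa.cpp. The names `gemm_ld` and `standard_sdpa_sketch` are hypothetical, a naive triple loop stands in for a real BLAS call, there is no attention mask, query and key lengths are assumed equal, and "3-pass softmax" is assumed to mean max, then exp-and-sum, then normalize.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: naive row-major GEMM with BLAS-style leading dimensions.
// C[M x N] = alpha * A[M x K] * op(B), where op(B) = B^T if transB, else B.
// lda/ldb/ldc are row strides, so each operand may be a strided view into a
// larger tensor. That is what lets one kernel handle both layouts.
void gemm_ld(bool transB, int M, int N, int K, float alpha,
             const float* A, int lda, const float* B, int ldb,
             float* C, int ldc) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < K; ++p) {
        const float b = transB ? B[j * ldb + p] : B[p * ldb + j];
        acc += A[i * lda + p] * b;
      }
      C[i * ldc + j] = alpha * acc;
    }
  }
}

// Hypothetical sketch of GEMM-based SDPA (no mask, query/key lengths both S).
// bshd == true: tensors are [B, S, H, D]; false: [B, H, S, D].
void standard_sdpa_sketch(const float* q, const float* k, const float* v,
                          float* out, int B, int S, int H, int D, bool bshd) {
  assert(B > 0 && S > 0 && H > 0 && D > 0);
  // Stride between consecutive sequence positions of one head.
  const int ld = bshd ? H * D : D;
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> scores(static_cast<std::size_t>(S) * S);
  for (int b = 0; b < B; ++b) {
    for (int h = 0; h < H; ++h) {
      // Offset of element (b, s = 0, h, d = 0) in the chosen layout.
      const std::size_t off =
          bshd ? (static_cast<std::size_t>(b) * S * H + h) * D
               : (static_cast<std::size_t>(b) * H + h) * S * D;
      // Full score matrix for this head: scale * Q K^T, shape [S x S].
      gemm_ld(true, S, S, D, scale, q + off, ld, k + off, ld, scores.data(), S);
      // 3-pass softmax over each row of scores.
      for (int i = 0; i < S; ++i) {
        float* row = scores.data() + static_cast<std::size_t>(i) * S;
        const float m = *std::max_element(row, row + S);  // pass 1: row max
        float sum = 0.0f;
        for (int j = 0; j < S; ++j) {                      // pass 2: exp and sum
          row[j] = std::exp(row[j] - m);
          sum += row[j];
        }
        for (int j = 0; j < S; ++j) row[j] /= sum;         // pass 3: normalize
      }
      // Output for this head: P V, shape [S x D], written with the same stride.
      gemm_ld(false, S, D, S, 1.0f, scores.data(), S, v + off, ld, out + off, ld);
    }
  }
}
```

In this sketch, switching layouts changes only the base offset and the leading dimension passed to the two GEMM calls; the per-head arithmetic is identical. In a real benchmark the naive loop would be replaced by a BLAS routine such as `cblas_sgemm`, which takes the same `lda`/`ldb`/`ldc` parameters.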
Includes validation tests that verify the GEMM-based implementation
matches custom_sdpa_out within tolerance.
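A tolerance check of this kind can be sketched as an allclose-style element-wise comparison. The helper name `allclose` and the default `rtol`/`atol` values below are assumptions for illustration; this commit does not state which tolerances the validation tests in bench_sdpa.cpp actually use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise closeness check in the style of
// torch.allclose / numpy.allclose: |actual - expected| <= atol + rtol * |expected|.
// The default tolerances are illustrative assumptions.
bool allclose(const std::vector<float>& actual, const std::vector<float>& expected,
              float rtol = 1e-5f, float atol = 1e-6f) {
  assert(rtol >= 0.0f && atol >= 0.0f);
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const float diff = std::fabs(actual[i] - expected[i]);
    // Written as !(diff <= bound) so that a NaN difference counts as a mismatch.
    if (!(diff <= atol + rtol * std::fabs(expected[i]))) return false;
  }
  return true;
}
```

Combining a relative and an absolute term keeps the check meaningful both for large outputs, where rounding error scales with magnitude, and for outputs near zero, where a purely relative bound would be too strict.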
Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)
[ghstack-poisoned]
1 parent d62b5f4, commit dc0d80e
0 files changed