Commit dc0d80e

Update base for Update on "Add GEMM-based standard SDPA benchmark"
Add bench_sdpa.cpp with a standalone GEMM-based SDPA implementation (run_standard_sdpa) alongside ExecuTorch's tiled flash attention (custom_sdpa_out) for comparative benchmarking.

The standalone SDPA uses a full GEMM per head with a 3-pass softmax. It supports both [B,S,H,D] and [B,H,S,D] layouts via BLAS leading dimension parameters, so algorithm effects can be isolated from layout effects.

Includes validation tests that verify the GEMM-based implementation matches custom_sdpa_out within tolerance.

Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)

[ghstack-poisoned]
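To illustrate the idea described above, here is a minimal sketch of per-head SDPA with a 3-pass softmax, where a single leading dimension (the row stride) selects between the [B,S,H,D] and [B,H,S,D] layouts. It is not the code from bench_sdpa.cpp: the names `Layout`, `make_bshd`, `make_bhsd`, and `sdpa_sketch` are hypothetical, the GEMMs are written as plain loops rather than BLAS calls, and the sketch assumes float data, non-causal attention, and a 1/sqrt(D) scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: strides (in elements) for a 4-D tensor whose last
// dimension D is contiguous. seq_stride plays the role of the BLAS leading
// dimension for one head's S x D matrix.
struct Layout {
    std::size_t batch_stride, head_stride, seq_stride;
};

// [B,S,H,D]: rows of one head are H*D elements apart.
Layout make_bshd(std::size_t S, std::size_t H, std::size_t D) {
    return {S * H * D, D, H * D};
}

// [B,H,S,D]: rows of one head are D elements apart (dense matrix).
Layout make_bhsd(std::size_t S, std::size_t H, std::size_t D) {
    return {H * S * D, S * D, D};
}

// Non-causal SDPA, one head at a time: P = softmax(scale * Q K^T), O = P V.
void sdpa_sketch(const float* q, const float* k, const float* v, float* out,
                 std::size_t B, std::size_t H, std::size_t S, std::size_t D,
                 const Layout& lay) {
    assert(S > 0 && D > 0);
    const float scale = 1.0f / std::sqrt(static_cast<float>(D));
    std::vector<float> scores(S * S);
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t h = 0; h < H; ++h) {
            const std::size_t base = b * lay.batch_stride + h * lay.head_stride;
            const std::size_t ld = lay.seq_stride;  // leading dimension
            // GEMM 1: scores = scale * Q * K^T (S x S).
            for (std::size_t i = 0; i < S; ++i) {
                for (std::size_t j = 0; j < S; ++j) {
                    float acc = 0.0f;
                    for (std::size_t d = 0; d < D; ++d)
                        acc += q[base + i * ld + d] * k[base + j * ld + d];
                    scores[i * S + j] = scale * acc;
                }
            }
            // 3-pass softmax over each row.
            for (std::size_t i = 0; i < S; ++i) {
                float* row = &scores[i * S];
                float mx = row[0];
                for (std::size_t j = 1; j < S; ++j) mx = std::max(mx, row[j]);  // pass 1: max
                float sum = 0.0f;
                for (std::size_t j = 0; j < S; ++j) {                           // pass 2: exp, sum
                    row[j] = std::exp(row[j] - mx);
                    sum += row[j];
                }
                for (std::size_t j = 0; j < S; ++j) row[j] /= sum;              // pass 3: normalize
            }
            // GEMM 2: out = P * V (S x D), written back in the same layout.
            for (std::size_t i = 0; i < S; ++i) {
                for (std::size_t d = 0; d < D; ++d) {
                    float acc = 0.0f;
                    for (std::size_t j = 0; j < S; ++j)
                        acc += scores[i * S + j] * v[base + j * ld + d];
                    out[base + i * ld + d] = acc;
                }
            }
        }
    }
}
```

Because only `base` and `ld` change between layouts, the same arithmetic runs in both cases, which is what lets a benchmark attribute timing differences to memory layout rather than to the algorithm. In a BLAS-backed version, `ld` would presumably be passed as the `lda`/`ldb`/`ldc` arguments of the GEMM calls.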
1 parent d62b5f4 commit dc0d80e

0 file changed
