Commit dc0d80e


Update base for Update on "Add GEMM-based standard SDPA benchmark"

Add bench_sdpa.cpp, which contains a standalone GEMM-based SDPA implementation (run_standard_sdpa) alongside ExecuTorch's tiled flash attention (custom_sdpa_out) for comparative benchmarking. The standalone SDPA computes a full GEMM per head followed by a 3-pass softmax. It supports both [B,S,H,D] and [B,H,S,D] layouts through BLAS leading-dimension parameters, so algorithm effects can be measured separately from layout effects. Includes validation tests that verify the GEMM-based implementation matches custom_sdpa_out within tolerance.

Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)

[ghstack-poisoned]
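The commit describes the GEMM-based path only in prose. The following is a minimal C++ sketch of the idea, not the code in bench_sdpa.cpp. Assumptions: naive loops stand in for BLAS sgemm calls (keeping the same leading-dimension convention); the names `standard_sdpa_sketch`, `gemm_abt`, `gemm_ab`, and `Layout` are hypothetical; there is no masking; queries and keys share one sequence length S; and the "3 passes" are taken to be row max, exponentiate-and-sum, and normalize.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum for the two layouts named in the commit message.
enum class Layout { BSHD, BHSD };

// Naive stand-in for a BLAS GEMM with B transposed:
// C[i][j] = alpha * sum_p A[i][p] * B[j][p], rows addressed via leading dims.
static void gemm_abt(int m, int n, int k, float alpha, const float* a, int lda,
                     const float* b, int ldb, float* c, int ldc) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = 0.f;
      for (int p = 0; p < k; ++p) acc += a[i * lda + p] * b[j * ldb + p];
      c[i * ldc + j] = alpha * acc;
    }
}

// Naive stand-in for a plain GEMM: C[i][j] = sum_p A[i][p] * B[p][j].
static void gemm_ab(int m, int n, int k, const float* a, int lda,
                    const float* b, int ldb, float* c, int ldc) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = 0.f;
      for (int p = 0; p < k; ++p) acc += a[i * lda + p] * b[p * ldb + j];
      c[i * ldc + j] = acc;
    }
}

// Hypothetical GEMM-based SDPA: one full S x S score matrix per (batch, head).
// q, k, v, out share one layout: [B,S,H,D] or [B,H,S,D].
void standard_sdpa_sketch(const float* q, const float* k, const float* v,
                          float* out, int B, int S, int H, int D,
                          Layout layout) {
  assert(S > 0 && D > 0);
  // Only the row stride between consecutive tokens differs between layouts.
  const int ld = (layout == Layout::BSHD) ? H * D : D;
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> scores(static_cast<std::size_t>(S) * S);
  for (int b = 0; b < B; ++b)
    for (int h = 0; h < H; ++h) {
      // Offset of element (b, s=0, h, d=0) in the chosen layout.
      const std::size_t base =
          (layout == Layout::BSHD)
              ? (static_cast<std::size_t>(b) * S * H + h) * D
              : (static_cast<std::size_t>(b) * H + h) * S * D;
      // scores = scale * Q K^T
      gemm_abt(S, S, D, scale, q + base, ld, k + base, ld, scores.data(), S);
      for (int i = 0; i < S; ++i) {
        float* row = scores.data() + static_cast<std::size_t>(i) * S;
        // Pass 1: row max, for numerical stability.
        const float mx = *std::max_element(row, row + S);
        // Pass 2: exponentiate and accumulate the row sum.
        float sum = 0.f;
        for (int j = 0; j < S; ++j) {
          row[j] = std::exp(row[j] - mx);
          sum += row[j];
        }
        // Pass 3: normalize to probabilities.
        for (int j = 0; j < S; ++j) row[j] /= sum;
      }
      // out = P V, written back with the same leading dimension.
      gemm_ab(S, D, S, scores.data(), S, v + base, ld, out + base, ld);
    }
}
```

Because switching layouts changes only the base offset and the leading dimension, both layouts perform identical arithmetic in identical order; any timing difference between them can then be attributed to memory access patterns rather than to the algorithm.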

1 parent d62b5f4, commit dc0d80e

0 files changed
