Commit e968c4a
Add GEMM-based standard SDPA benchmark and fix custom_sdpa_out signature
Adds a standalone GEMM-based (non-tiled) attention benchmark alongside
the existing ET flash attention benchmark, allowing direct comparison
of the two algorithms. The standard SDPA uses cblas_sgemm for Q@K^T
and scores@V with 3-pass softmax, matching the approach used by
ONNX Runtime's GQA operator.
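
For orientation, the non-tiled approach described above can be sketched as follows. This is a minimal illustration, not the benchmark's code: `softmax_3pass` and `naive_sdpa` are hypothetical names, the plain loops stand in for the two `cblas_sgemm` calls, the usual 1/sqrt(D) scaling is assumed, and masking is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Three-pass softmax over one row, in place:
// pass 1 finds the max, pass 2 exponentiates and sums, pass 3 normalizes.
inline void softmax_3pass(float* row, std::size_t n) {
  assert(n > 0);
  float max_val = row[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, row[i]);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    row[i] = std::exp(row[i] - max_val);  // subtracting the max avoids overflow
    sum += row[i];
  }
  const float inv = 1.0f / sum;
  for (std::size_t i = 0; i < n; ++i) row[i] *= inv;
}

// Non-tiled attention for a single head (hypothetical helper).
// q is [s_q, d]; k and v are [s_kv, d]; all contiguous and row-major.
inline std::vector<float> naive_sdpa(const std::vector<float>& q,
                                     const std::vector<float>& k,
                                     const std::vector<float>& v,
                                     std::size_t s_q, std::size_t s_kv,
                                     std::size_t d) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  // Stand-in for the first GEMM: scores = scale * Q @ K^T, shape [s_q, s_kv].
  std::vector<float> scores(s_q * s_kv, 0.0f);
  for (std::size_t i = 0; i < s_q; ++i) {
    for (std::size_t j = 0; j < s_kv; ++j) {
      float dot = 0.0f;
      for (std::size_t c = 0; c < d; ++c) dot += q[i * d + c] * k[j * d + c];
      scores[i * s_kv + j] = scale * dot;
    }
  }
  // Row-wise softmax over the full score matrix (no tiling).
  for (std::size_t i = 0; i < s_q; ++i) softmax_3pass(&scores[i * s_kv], s_kv);
  // Stand-in for the second GEMM: out = scores @ V, shape [s_q, d].
  std::vector<float> out(s_q * d, 0.0f);
  for (std::size_t i = 0; i < s_q; ++i)
    for (std::size_t j = 0; j < s_kv; ++j)
      for (std::size_t c = 0; c < d; ++c)
        out[i * d + c] += scores[i * s_kv + j] * v[j * d + c];
  return out;
}
```

Unlike a flash-attention kernel, this materializes the full [s_q, s_kv] score matrix, which is what makes the two algorithms worth benchmarking side by side.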
Both the transposed [B,H,S,D] and standard [B,S,H,D] cache layouts are
supported via the BLAS leading-dimension parameter. Validation tests run
before the benchmarks to check correctness against ET's custom_sdpa_out.
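
The leading-dimension trick can be sketched as below, assuming row-major storage. `HeadView`, `view_bhsd`, `view_bshd`, and `qk_transposed` are hypothetical names for illustration only; the loop in `qk_transposed` stands in for a GEMM call with a transposed second operand and is not the benchmark's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A view of one head's [S, D] matrix inside a 4-D cache buffer.
struct HeadView {
  std::size_t offset;  // index of element (s = 0, d = 0) for this (b, h)
  std::size_t ld;      // leading dimension: elements between consecutive rows
};

// Transposed cache [B, H, S, D]: a head's rows are adjacent, so ld = D.
inline HeadView view_bhsd(std::size_t b, std::size_t h, std::size_t H,
                          std::size_t S, std::size_t D) {
  return {(b * H + h) * S * D, D};
}

// Standard cache [B, S, H, D]: a head's rows are interleaved with the
// other heads, so consecutive rows are H * D elements apart.
inline HeadView view_bshd(std::size_t b, std::size_t h, std::size_t H,
                          std::size_t S, std::size_t D) {
  return {b * S * H * D + h * D, H * D};
}

// scores[i, j] = sum_c q[i, c] * k[j, c], reading q and k through their
// leading dimensions. No data is copied or permuted for either layout.
inline std::vector<float> qk_transposed(const float* q, std::size_t ldq,
                                        const float* k, std::size_t ldk,
                                        std::size_t s_q, std::size_t s_kv,
                                        std::size_t d) {
  assert(ldq >= d && ldk >= d);
  std::vector<float> scores(s_q * s_kv, 0.0f);
  for (std::size_t i = 0; i < s_q; ++i)
    for (std::size_t j = 0; j < s_kv; ++j)
      for (std::size_t c = 0; c < d; ++c)
        scores[i * s_kv + j] += q[i * ldq + c] * k[j * ldk + c];
  return scores;
}
```

A validation step in the same spirit would compute the scores for one head through both views and check that they agree, since only the offset and leading dimension differ between the layouts.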
Also fixes broken custom_sdpa_out calls to match the new 3-bool
signature (is_seq_dim_2, is_k_seq_dim_2, is_v_seq_dim_2).
Authored with Claude.
Differential Revision: [D99677686](https://our.internmc.facebook.com/intern/diff/D99677686/)
[ghstack-poisoned]
1 parent 45468e1, commit e968c4a
2 files changed
Lines changed: 381 additions & 0 deletions