Commit bb4e73f
Update on "Limit softmax to causally-valid elements in cpu_sdpa"
Instead of setting masked positions to -inf and computing
max/exp/normalize over all kvSize elements, limit the softmax
to the causally-valid range of each row. This skips the max, exp,
and normalize work on masked positions, which are zero-filled
instead so that GEMM 2 reads zeros there.
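
A minimal per-row sketch of the idea in plain Python, not the actual cpu_sdpa kernel. The function names and the `num_valid` parameter (the count of causally-valid leading elements in a row) are hypothetical, and how the real kernel derives that range per row is not shown here.

```python
import math


def causal_softmax_row(scores, num_valid):
    """Softmax over scores[:num_valid] only; the masked tail is zero-filled."""
    kv_size = len(scores)
    out = [0.0] * kv_size  # masked positions stay 0.0 for the second GEMM
    num_valid = min(num_valid, kv_size)
    if num_valid <= 0:
        return out
    row_max = max(scores[:num_valid])  # max over the valid range only
    exps = [math.exp(s - row_max) for s in scores[:num_valid]]
    total = sum(exps)
    for j, e in enumerate(exps):
        out[j] = e / total
    return out


def masked_softmax_row(scores, num_valid):
    """Reference: write -inf into masked positions, then softmax the full row.

    Assumes num_valid >= 1, otherwise the row max is -inf.
    """
    masked = [s if j < num_valid else -math.inf for j, s in enumerate(scores)]
    row_max = max(masked)
    exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Both functions should agree on any row with at least one valid element; the first one simply never touches the masked tail beyond the initial zero fill.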
Differential Revision: [D96044307](https://our.internmc.facebook.com/intern/diff/D96044307/)
[ghstack-poisoned]