Limit softmax to causally-valid elements in cpu_sdpa
Pull Request resolved: #18650
Instead of setting masked positions to -inf and computing
max/exp/normalize over all kvSize elements, limit the softmax
to the causally valid range of each row. This skips the
max/exp/normalize work on masked positions, which are instead
zero-filled (the value softmax assigns to a -inf score) for GEMM 2.
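
For illustration only, here is a minimal pure-Python sketch of the idea, not the actual cpu_sdpa code. The function names, the `start_pos` parameter, and the rule that row `i` may attend to the first `start_pos + i + 1` keys are assumptions made for this example. It compares the masked full-row softmax with a softmax restricted to the valid prefix and zero-filled afterwards.

```python
import math


def softmax_full_masked(scores, start_pos):
    """Baseline: set masked scores to -inf, then softmax over the whole row."""
    out = []
    for i, row in enumerate(scores):
        valid = start_pos + i + 1  # assumed causal rule for this sketch
        masked = [s if j < valid else -math.inf for j, s in enumerate(row)]
        row_max = max(masked)
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def softmax_causal_limited(scores, start_pos):
    """Softmax over the causally valid prefix only; zero-fill the rest."""
    out = []
    for i, row in enumerate(scores):
        valid = min(start_pos + i + 1, len(row))
        prefix = row[:valid]
        row_max = max(prefix)
        exps = [math.exp(s - row_max) for s in prefix]
        total = sum(exps)
        # Masked positions never enter max/exp/sum; they are written as 0.0
        # so a following weights-times-V product sees no contribution from them.
        out.append([e / total for e in exps] + [0.0] * (len(row) - valid))
    return out
```

Both functions should return the same probabilities. The second one touches only `valid` elements per row for the max, exp, and normalize steps, which is where the saving would come from when many positions are masked.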
ghstack-source-id: 374666321
@exported-using-ghexport
Differential Revision: [D96044307](https://our.internmc.facebook.com/intern/diff/D96044307/)