Limit softmax to causally-valid elements in cpu_sdpa
Instead of setting masked positions to -inf and computing
max/exp/normalize over all kvSize elements, limit the softmax
to the causally-valid range of each row. This skips the
max/exp/normalize work on masked positions; those positions are
zero-filled instead, so they contribute nothing to GEMM 2.
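
For illustration, the per-row idea can be sketched roughly as below.
This is not the cpu_sdpa code from this change: the function names
(causal_softmax_row, causal_softmax), the start_pos parameter, and the
row-major float layout are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the actual cpu_sdpa implementation): softmax in
// place over the first `valid` entries of one score row. Entries in
// [valid, kv_size) are written as 0 so that a following scores * V matrix
// multiply gets no contribution from them.
void causal_softmax_row(float* row, std::size_t kv_size, std::size_t valid) {
  valid = std::min(valid, kv_size);
  if (valid == 0) {
    std::fill(row, row + kv_size, 0.0f);
    return;
  }
  // Max over the valid prefix only, for numerical stability.
  const float max_val = *std::max_element(row, row + valid);
  float sum = 0.0f;
  for (std::size_t j = 0; j < valid; ++j) {
    row[j] = std::exp(row[j] - max_val);
    sum += row[j];
  }
  const float inv_sum = 1.0f / sum;
  for (std::size_t j = 0; j < valid; ++j) {
    row[j] *= inv_sum;
  }
  // Zero-fill the masked tail instead of computing exp(-inf) there.
  std::fill(row + valid, row + kv_size, 0.0f);
}

// Assumed convention for this sketch: `scores` is a row-major
// q_len x kv_size matrix, and query row i sits at absolute position
// start_pos + i, so it may attend to keys 0 .. start_pos + i inclusive.
void causal_softmax(std::vector<float>& scores, std::size_t q_len,
                    std::size_t kv_size, std::size_t start_pos) {
  for (std::size_t i = 0; i < q_len; ++i) {
    causal_softmax_row(scores.data() + i * kv_size, kv_size,
                       start_pos + i + 1);
  }
}
```

With this convention, row i does work proportional to start_pos + i + 1
rather than kvSize, and no -inf values are ever written to the buffer.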
Differential Revision: [D96044307](https://our.internmc.facebook.com/intern/diff/D96044307/)
ghstack-source-id: 361224791
Pull Request resolved: #18650