
Commit 6b3cb43

Update base for Update on "Reduce allocation overhead in quantized sdpa"
For small models, dequantizing portions of the V cache causes extra allocation overhead. A better way to handle this is probably to dequantize the entire V cache outside the model. There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.

Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)

[ghstack-poisoned]
1 parent b11ca01 commit 6b3cb43

0 files changed
