perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch#53
Open
signalnine wants to merge 153 commits into