Pull Request #5157 did excellent work and improved the performance of non-transposed [SD]GEMV.
I think Neoverse V1 still has room for further performance improvement, and a different loop unrolling factor is a better fit for the A64FX. I would therefore like to propose a patch for the A64FX and Neoverse V1.