kernel/riscv64: Optimized the implementation of axpby on TARGET=RISCV64_ZVL256B. #5288
martin-frbg merged 2 commits into OpenMathLib:develop
Just out of curiosity and for my own learning, I have a question: in which practical scenarios or algorithms does a dedicated AXPBY kernel in OpenBLAS provide a significant performance advantage, given that we already have a highly optimized AXPY? Thanks
We may not have a highly optimized AXPY on all architectures, and the current default for AXPBY is a naive C loop instead of combining calls to SCAL and AXPY in the interface. (The git log suggests that axpby was added a decade ago for compatibility with MKL - #285 - and nobody looked at it - or its performance - ever since.)
(Small correction: the Loongson crew did add optimized kernels for their hardware in late 2023, so this is not entirely without precedent. There are no callers in Reference-LAPACK, and the only user in OpenBLAS itself seems to be the generic GEADD, so this may have gone mostly unnoticed.)
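For readers following the AXPY vs. AXPBY question, here is a minimal sketch of the two approaches mentioned in the reply above: a single-pass loop computing `y := alpha*x + beta*y`, and the same result obtained by a SCAL-style pass followed by an AXPY-style pass. All function names (`ref_axpby`, `ref_scal`, `ref_axpy`, `ref_axpby_two_pass`) are hypothetical illustrations with unit stride, not the OpenBLAS kernels or the code in this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-pass reference: y := alpha*x + beta*y.
 * Each element of x and y is loaded once and y is stored once. */
void ref_axpby(size_t n, double alpha, const double *x,
               double beta, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + beta * y[i];
}

/* Hypothetical SCAL-style pass: y := beta*y. */
void ref_scal(size_t n, double beta, double *y)
{
    assert(n == 0 || y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = beta * y[i];
}

/* Hypothetical AXPY-style pass: y := alpha*x + y. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* Two-pass composition: y is loaded and stored twice instead of once. */
void ref_axpby_two_pass(size_t n, double alpha, const double *x,
                        double beta, double *y)
{
    ref_scal(n, beta, y);
    ref_axpy(n, alpha, x, y);
}
```

The two-pass version touches `y` twice, so on long vectors a fused single-pass kernel has less memory traffic to begin with, and vectorizing that single loop is where a dedicated kernel can help. How much that matters on a given target is a measurement question, not something this sketch shows.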
The specific improvements are shown in the figure below.

