kernel/riscv64:fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small by guoyuanplct · Pull Request #5291 · OpenMathLib/OpenBLAS

guoyuanplct · 2025-06-05T14:06:17Z

I made these two code changes to address the HBMV issue. When the problem size is very small, the RVV kernels perform very poorly, so I now call the unvectorized code in that case.
The numbers 8 and 16 in the code are the break-even points I found: around these values, the RVV version and the unvectorized version perform about the same.
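
For readers skimming the thread, the idea can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names, the use of a dot product, and the choice of 16 as the cutoff are assumptions, and `dot_blocked` is only a portable stand-in for a real RVV kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff, borrowed from the balance points mentioned above.
 * The real thresholds live in the patched kernels and may differ per routine. */
#define SMALL_N_CUTOFF 16

/* Plain scalar loop: no vector setup cost, so it wins for tiny n. */
double dot_scalar(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Portable stand-in for a vectorized kernel (NOT real RVV code):
 * four independent partial sums, combined at the end, plus a scalar tail. */
double dot_blocked(size_t n, const double *x, const double *y)
{
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t j = 0; j < 4; j++)
            acc[j] += x[i + j] * y[i + j];
    double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Size-based dispatch: below the cutoff take the scalar path,
 * otherwise take the (stand-in) vector path. */
double dot_dispatch(size_t n, const double *x, const double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    if (n < SMALL_N_CUTOFF)
        return dot_scalar(n, x, y);
    return dot_blocked(n, x, y);
}
```

A cutoff like this is target-specific, so it would need to be re-measured on other hardware.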


guoyuanplct · 2025-06-05T14:17:45Z

This PR is directly related to issue #5286.

martin-frbg · 2025-06-05T14:34:36Z

Thanks - that's pretty much the same as what I came up with in my initial experimentation. I do wonder if the _rvv kernels (used by the ZVL128B and x280 targets) perform better - one oddity I noticed is that the _vector kernels always request maximum vector length (VSETVL_MAX) while their _rvv counterparts seem to try to match the vector length to the actual amount of data. (I'm still rather new to RISCV though, so may be misreading the code...)

guoyuanplct · 2025-06-05T14:59:55Z

I'm also relatively new to RVV. My understanding is that both kernels will try to match the vector length to the actual amount of data. VSETVL_MAX is generally only used outside the computation loop to initialize some variables (or perform other types of operations), which need to be long enough to ensure they can be used for subsequent vector computations. Once inside the loop, both kernels will use VSETVL to match the appropriate length.
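
The pattern described here (maximum vector length for setup outside the loop, a per-iteration length inside it) can be sketched in portable C. This simulates the behaviour and is not taken from the OpenBLAS kernels: `sim_vsetvl_max`, `sim_vsetvl` and `SIM_VLMAX` are hypothetical stand-ins for the VSETVL_MAX and VSETVL macros, and the simulated `vsetvl` simply returns `min(remaining, VLMAX)`, which is a simplification of what real hardware may grant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of elements one simulated vector register holds. */
#define SIM_VLMAX 4

/* Simulated VSETVL_MAX: always the full register length. */
size_t sim_vsetvl_max(void)
{
    return SIM_VLMAX;
}

/* Simulated VSETVL: grant at most SIM_VLMAX of the remaining elements. */
size_t sim_vsetvl(size_t avl)
{
    return avl < SIM_VLMAX ? avl : SIM_VLMAX;
}

double dot_stripmined(size_t n, const double *x, const double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    /* Outside the loop: initialise the accumulator at full length so that
     * every lane is valid no matter which vl later iterations use. */
    size_t vlmax = sim_vsetvl_max();
    double acc[SIM_VLMAX];
    for (size_t j = 0; j < vlmax; j++)
        acc[j] = 0.0;

    /* Inside the loop: match the vector length to the data that is left. */
    for (size_t i = 0; i < n; ) {
        size_t vl = sim_vsetvl(n - i);
        for (size_t j = 0; j < vl; j++)
            acc[j] += x[i + j] * y[i + j];
        i += vl;
    }

    /* Final reduction runs over the full-length accumulator. */
    double sum = 0.0;
    for (size_t j = 0; j < vlmax; j++)
        sum += acc[j];
    return sum;
}
```

With `n = 10` and `SIM_VLMAX = 4`, the loop runs with `vl` equal to 4, 4 and then 2, while the accumulator stays at its full length of 4 throughout.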

martin-frbg · 2025-06-05T18:20:46Z

Ah, you're right of course, I missed the later vsetvl() in zdot_vector.c

guoyuanplct added 2 commits June 5, 2025 21:53

fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small

2ae0191

Merge branch 'develop' of https://github.com/guoyuanplct/OpenBLAS int…

83fcab7

…o develop

martin-frbg added this to the 0.3.30 milestone Jun 5, 2025

martin-frbg merged commit fe220a0 into OpenMathLib:develop Jun 10, 2025
83 of 86 checks passed

martin-frbg mentioned this pull request Jun 10, 2025

The Performance Issues Caused by TARGET=RISCV64_ZVL256B #5286

Closed

martin-frbg merged 2 commits into OpenMathLib:develop from guoyuanplct:develop
