Feature/rvv support cpu#4425
Conversation
MNNBinaryAddInt8 / Sub / Mul / Min / Max / SqdInt8 MNNPackC4Int8ForMatMul_ASparse MNNScaleAndAddBiasInt8 MNNSumByAxisLForMatmul_A Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
wangzhaode
left a comment
There was a problem hiding this comment.
Thanks for adding RVV support for the Int8 operations! Here are some issues that need to be addressed before merging:
1. 🔴 Missing semicolon — compilation error
In Int8FunctionsOpt.cpp, the function registration is missing a semicolon:
core->MNNSumByAxisLForMatmul_A = MNNSumByAxisLForMatmul_A // <-- missing ';'
#ifdef MNN_USE_SPARSE_COMPUTEThis will cause a compilation error.
2. 🔴 #elif MNN_USE_RVV should be #elif defined(MNN_USE_RVV)
Throughout Int8FunctionsOpt.cpp, the conditional branches use #elif MNN_USE_RVV instead of #elif defined(MNN_USE_RVV). The existing codebase uses #ifdef MNN_USE_RVV for consistency. While -DMNN_USE_RVV typically defines the macro as 1, using defined() is the safer and more consistent pattern.
3. 🟡 VLA (Variable Length Array) in C++ — non-standard
In MNNPackC4Int8ForMatMul_ASparse.cpp:
uint32_t src_idx[vl]; // vl is a runtime variable
uint32_t dst_idx[vl];VLAs are not part of the C++ standard (only C99). While GCC supports them as an extension, this is not portable. Also, computing indices with a scalar loop and then using gather/scatter (vloxei32/vsoxei32) may not provide meaningful speedup over a pure scalar implementation.
4. 🟡 Use shift instead of vdiv in MNNScaleAndAddBiasInt8_RVV
val = __riscv_vdiv_vx_i32m4(val, shift_div, vl); // shift_div = 1 << mShiftBitsDivision by a power of 2 should use arithmetic right shift (vsra) instead of vdiv, as integer division is typically very slow on RVV hardware.
5. 🟡 Missing performance benchmarks
Unlike PR #4426, this PR doesn't include any performance comparison data. It would be very helpful to show before/after numbers for the optimized functions, especially since some implementations (like the sparse packing) may not yield significant speedup.
6. 🟡 All checklist items are unchecked
Please verify and check the relevant items in the PR checklist (commit message format, compilation, testing, etc.).
7. Minor: All new files are missing the trailing newline at end of file (POSIX requirement).
Overall the RVV intrinsic usage looks solid for the binary operations. Please fix the compilation errors and address the other concerns. Thanks!
…h "#elif defined(MNN_USE_RVV)" in Int8FunctionsOpt.cpp, Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
|
performance benchmarks are presented below: performance
|
Build Failure: RVV compilation breaks with 53 errorsI tested this PR with RISC-V RVV cross-compilation ( Environment
Error Categories1. Undeclared variable
|
Description
Optimize functions below with RVV:
MNNBinaryMaxInt8
MNNBinaryMinInt8
MNNBinaryMulInt8
MNNBinarySqdInt8
MNNBinarySubInt8
_ArmBasicMNNPackC4ForMatMul_A
_MNNPackC4Int8ForMatMul_ASparse
MNNScaleAndAddBiasInt8
MNNSumByAxisLForMatmul_A
Module
CPU
Type
Checklist
[Module:Type] Descriptionformat