
Feature/rvv support cpu #4425

Open
Sherlockzhangjinge wants to merge 5 commits into alibaba:master from Sherlockzhangjinge:feature/rvv_support_cpu

Conversation

Contributor

@Sherlockzhangjinge Sherlockzhangjinge commented May 6, 2026

Description

Optimize functions below with RVV:
MNNBinaryMaxInt8
MNNBinaryMinInt8
MNNBinaryMulInt8
MNNBinarySqdInt8
MNNBinarySubInt8
_ArmBasicMNNPackC4ForMatMul_A
_MNNPackC4Int8ForMatMul_ASparse
MNNScaleAndAddBiasInt8
MNNSumByAxisLForMatmul_A

Module

CPU

Type

  • Feature
  • Bugfix
  • Perf
  • Refact
  • Style
  • Doc
  • Test
  • Chore

Checklist

  • Commit message follows [Module:Type] Description format
  • Code compiles without errors
  • Tested on relevant platform(s)
  • No unrelated format or style changes included

Sherlockzhangjinge and others added 3 commits May 6, 2026 12:13
MNNBinaryAddInt8 / Sub / Mul / Min / Max / SqdInt8

MNNPackC4Int8ForMatMul_ASparse

MNNScaleAndAddBiasInt8

MNNSumByAxisLForMatmul_A

Signed-off-by: Jixingguang <1955992348@qq.com>

Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>

Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: Jixingguang <1955992348@qq.com>
Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: Jixingguang <1955992348@qq.com>
Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
@wangzhaode wangzhaode self-assigned this May 9, 2026
Collaborator

@wangzhaode wangzhaode left a comment


Thanks for adding RVV support for the Int8 operations! Here are some issues that need to be addressed before merging:

1. 🔴 Missing semicolon — compilation error
In Int8FunctionsOpt.cpp, the function registration is missing a semicolon:

core->MNNSumByAxisLForMatmul_A = MNNSumByAxisLForMatmul_A  // <-- missing ';'
#ifdef MNN_USE_SPARSE_COMPUTE

This will cause a compilation error.

2. 🔴 #elif MNN_USE_RVV should be #elif defined(MNN_USE_RVV)
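Throughout Int8FunctionsOpt.cpp, the conditional branches use #elif MNN_USE_RVV instead of #elif defined(MNN_USE_RVV). The rest of the codebase tests this macro with #ifdef MNN_USE_RVV, so defined() keeps the checks consistent. -DMNN_USE_RVV typically defines the macro as 1, so the value test usually works, but it silently takes the false branch if the macro is defined as 0 and can fail to preprocess if the macro is defined with an empty value. Using defined() is the safer pattern.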
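To illustrate the difference between a value test and a definedness test, here is a minimal sketch. The macros DEMO_FLAG_ZERO and DEMO_FLAG_UNSET are hypothetical names used only for this example and are not part of MNN.

```cpp
// Hypothetical macros for illustration only (not MNN names).
#define DEMO_FLAG_ZERO 0   // defined, but expands to 0
// DEMO_FLAG_UNSET is intentionally never defined.

// Value test: the preprocessor evaluates the macro's expansion.
constexpr bool valueTestZero() {
#if DEMO_FLAG_ZERO
    return true;
#else
    return false;   // taken: the macro expands to 0
#endif
}

// Definedness test: only asks whether the macro exists.
constexpr bool definedTestZero() {
#if defined(DEMO_FLAG_ZERO)
    return true;    // taken: the macro is defined, regardless of value
#else
    return false;
#endif
}

// An undefined identifier in a value test is replaced by 0.
constexpr bool valueTestUnset() {
#if DEMO_FLAG_UNSET
    return true;
#else
    return false;   // taken: undefined identifiers evaluate to 0
#endif
}

constexpr bool definedTestUnset() {
#if defined(DEMO_FLAG_UNSET)
    return true;
#else
    return false;   // taken: the macro does not exist
#endif
}
```

The two tests agree when the macro is undefined or defined as 1, which is why #elif MNN_USE_RVV works with a plain -DMNN_USE_RVV. They diverge only for a macro defined as 0 or as empty.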

3. 🟡 VLA (Variable Length Array) in C++ — non-standard
In MNNPackC4Int8ForMatMul_ASparse.cpp:

uint32_t src_idx[vl];  // vl is a runtime variable
uint32_t dst_idx[vl];

VLAs are not part of standard C++ (they are a C99 feature). GCC accepts them as an extension, but the code is not portable. Also, computing the indices in a scalar loop and then using gather/scatter (vloxei32/vsoxei32) may not provide a meaningful speedup over a pure scalar implementation.
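One portable alternative to the VLA is a heap-backed scratch buffer that is sized once to the maximum vector length and reused across iterations. The sketch below is an assumption about how that could look: IndexScratch, fillOffsets, and the stride parameters are hypothetical names, and the offset pattern is invented for illustration rather than taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scratch storage replacing `uint32_t src_idx[vl]` / `dst_idx[vl]`.
struct IndexScratch {
    std::vector<uint32_t> src;
    std::vector<uint32_t> dst;

    // Allocate once, outside the hot loop, for the largest vl expected.
    void resize(size_t maxVl) {
        src.resize(maxVl);
        dst.resize(maxVl);
    }
};

// Fill the first `vl` entries with offsets built from hypothetical strides.
// Grows the buffers if the caller asks for more than was reserved.
inline void fillOffsets(IndexScratch& s, size_t vl,
                        uint32_t srcStride, uint32_t dstStride) {
    if (s.src.size() < vl) {
        s.resize(vl);
    }
    for (size_t i = 0; i < vl; ++i) {
        s.src[i] = static_cast<uint32_t>(i) * srcStride;
        s.dst[i] = static_cast<uint32_t>(i) * dstStride;
    }
}
```

The pointers from s.src.data() and s.dst.data() can then be passed wherever the VLA pointers were used. This fixes the portability concern only; it does not address the separate question of whether the gather/scatter approach is faster than scalar code.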

4. 🟡 Use shift instead of vdiv in MNNScaleAndAddBiasInt8_RVV

val = __riscv_vdiv_vx_i32m4(val, shift_div, vl);  // shift_div = 1 << mShiftBits

Division by a power of 2 should use arithmetic right shift (vsra) instead of vdiv, as integer division is typically very slow on RVV hardware.
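One caveat when making this change: integer division truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so the two differ for negative inputs that are not exact multiples. The scalar sketch below shows the difference and one common bias correction. The helper names are hypothetical, and whether the kernel must match the scalar reference bit-for-bit is an assumption the author would need to confirm.

```cpp
#include <cstdint>

// Truncating division by 2^shift, the behaviour of integer '/'.
// Assumes 0 <= shift <= 30.
inline int32_t divPow2(int32_t x, int shift) {
    return x / (int32_t(1) << shift);
}

// Plain arithmetic right shift: rounds toward negative infinity.
// (Right-shifting a negative value is arithmetic on mainstream compilers
// and is guaranteed to be so from C++20.)
inline int32_t sraFloor(int32_t x, int shift) {
    return x >> shift;
}

// Shift with a bias so the result matches truncating division:
// negative inputs get (2^shift - 1) added before the shift.
inline int32_t sraTrunc(int32_t x, int shift) {
    const int32_t mask = (int32_t(1) << shift) - 1;
    const int32_t bias = (x >> 31) & mask;  // mask if x < 0, else 0
    return (x + bias) >> shift;
}
```

In a vector version the same bias could be formed with a shift, an AND, and an add before the final vsra, which would still avoid the division.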

5. 🟡 Missing performance benchmarks
Unlike PR #4426, this PR doesn't include any performance comparison data. It would be very helpful to show before/after numbers for the optimized functions, especially since some implementations (like the sparse packing) may not yield significant speedup.

6. 🟡 All checklist items are unchecked
Please verify and check the relevant items in the PR checklist (commit message format, compilation, testing, etc.).

7. Minor: All new files are missing the trailing newline at end of file (POSIX requirement).

Overall the RVV intrinsic usage looks solid for the binary operations. Please fix the compilation errors and address the other concerns. Thanks!

…h "#elif defined(MNN_USE_RVV)" in Int8FunctionsOpt.cpp,

Signed-off-by: Jixingguang <1955992348@qq.com>
Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
@Sherlockzhangjinge
Contributor Author

Performance benchmarks are presented below:

Scalar:

| Function | duration_time/s | cycles | instructions | cache-references | cache-misses | branches | branch-misses |
|---|---|---|---|---|---|---|---|
| MNNBinaryAddInt8 | 66.53 | 172,903,860,156 | 349,852,703,454 | 680,846,066 | 331,931,392 | 42,397,515,963 | 636,166 |
| MNNBinarySubInt8 | 66.48 | 172,763,396,099 | 349,855,095,439 | 680,914,625 | 330,409,288 | 42,397,127,206 | 681,841 |
| MNNBinaryMulInt8 | 203.10 | 480,912,544,242 | 375,418,956,351 | 1,754,654,965 | 378,504,360 | 43,481,933,534 | 7,864,684,046 |
| MNNBinaryMinInt8 | 134.31 | 348,721,335,260 | 375,598,884,514 | 1,779,000,265 | 344,254,891 | 53,043,679,856 | 8,201,427,052 |
| MNNBinaryMaxInt8 | 101.91 | 264,218,429,256 | 314,743,657,531 | 1,985,868,509 | 349,306,085 | 53,128,780,858 | 7,883,763,253 |
| MNNBinarySqdInt8 | 114.14 | 296,659,717,650 | 368,717,655,903 | 1,934,702,827 | 331,174,957 | 42,404,845,000 | 3,176,002,037 |
| _ArmBasicMNNPackC4ForMatMul_A | 0.61 | 1,551,709,009 | 1,435,828,038 | 37,703,094 | 199,039 | 292,163,779 | 7,182,908 |
| MNNPackC4Int8ForMatMul_ASparse | 2.38 | 6,133,501,634 | 8,696,697,664 | 1,399,052,972 | 297,685 | 756,538,167 | 1,265,049 |
| MNNSumByAxisLForMatmul_A | 0.14 | 355,188,776 | 634,547,779 | 186,471 | 31,358 | 126,589,595 | 683,611 |
| MNNScaleAndAddBiasInt8 | 4.53 | 11,761,995,284 | 13,194,406,480 | 26,196,075 | 132,992 | 2,030,835,011 | 378,039,067 |

RVV:

| Function | duration_time/s | cycles | instructions | cache-references | cache-misses | branches | branch-misses |
|---|---|---|---|---|---|---|---|
| MNNBinaryAddInt8 | 12.06 | 31,330,033,068 | 19,418,133,938 | 666,854,980 | 221,998,791 | 686,983,222 | 307,455 |
| MNNBinarySubInt8 | 12.04 | 31,278,118,708 | 19,419,414,145 | 667,090,046 | 206,141,287 | 687,082,496 | 330,229 |
| MNNBinaryMulInt8 | 12.09 | 31,415,867,812 | 19,414,891,024 | 666,894,622 | 192,042,720 | 687,005,444 | 314,552 |
| MNNBinaryMinInt8 | 21.68 | 56,346,829,053 | 20,765,124,442 | 668,407,916 | 214,227,216 | 688,709,028 | 374,613 |
| MNNBinaryMaxInt8 | 21.55 | 55,994,757,129 | 20,767,992,937 | 668,610,267 | 250,477,984 | 688,829,526 | 423,480 |
| MNNBinarySqdInt8 | 12.52 | 32,545,006,223 | 20,083,250,785 | 666,872,870 | 193,599,975 | 687,168,233 | 325,492 |
| _ArmBasicMNNPackC4ForMatMul_A | 0.63 | 1,601,333,949 | 1,369,221,821 | 30,564,334 | 198,645 | 178,398,338 | 1,258,289 |
| MNNPackC4Int8ForMatMul_ASparse | 5.79 | 14,974,071,109 | 10,523,703,951 | 793,246,162 | 456,525 | 508,815,102 | 2,002,872 |
| MNNSumByAxisLForMatmul_A | 0.44 | 1,142,175,936 | 492,564,678 | 230,155 | 38,927 | 64,745,342 | 508,400 |
| MNNScaleAndAddBiasInt8 | 1.30 | 3,361,665,880 | 1,420,243,319 | 13,887,336 | 60,642 | 84,232,891 | 125,620 |
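For readers comparing the two sets of numbers, a speedup ratio (scalar duration divided by RVV duration) is a convenient summary; a value above 1 means the RVV version ran faster. The sketch below uses only the duration figures reported above, and the Timing struct and function names are hypothetical. By this measure the binary operations and MNNScaleAndAddBiasInt8 improve, _ArmBasicMNNPackC4ForMatMul_A is roughly flat (0.61 vs 0.63), and MNNPackC4Int8ForMatMul_ASparse (2.38 vs 5.79) and MNNSumByAxisLForMatmul_A (0.14 vs 0.44) are slower with RVV.

```cpp
#include <string>
#include <vector>

// Hypothetical record: one row of the reported durations.
struct Timing {
    std::string name;
    double scalarDuration;  // scalar duration_time as reported
    double rvvDuration;     // RVV duration_time as reported
};

// Durations copied from the benchmark comment.
inline std::vector<Timing> reportedTimings() {
    return {
        {"MNNBinaryAddInt8", 66.53, 12.06},
        {"MNNBinarySubInt8", 66.48, 12.04},
        {"MNNBinaryMulInt8", 203.10, 12.09},
        {"MNNBinaryMinInt8", 134.31, 21.68},
        {"MNNBinaryMaxInt8", 101.91, 21.55},
        {"MNNBinarySqdInt8", 114.14, 12.52},
        {"_ArmBasicMNNPackC4ForMatMul_A", 0.61, 0.63},
        {"MNNPackC4Int8ForMatMul_ASparse", 2.38, 5.79},
        {"MNNSumByAxisLForMatmul_A", 0.14, 0.44},
        {"MNNScaleAndAddBiasInt8", 4.53, 1.30},
    };
}

// Ratio above 1.0 means the RVV version was faster in this run.
inline double speedup(const Timing& t) {
    return t.scalarDuration / t.rvvDuration;
}

// Names of the functions whose RVV version was slower than scalar.
inline std::vector<std::string> regressions(const std::vector<Timing>& rows) {
    std::vector<std::string> out;
    for (const Timing& t : rows) {
        if (speedup(t) < 1.0) {
            out.push_back(t.name);
        }
    }
    return out;
}
```

These ratios describe a single reported run; the comment does not state the hardware, input sizes, or iteration counts, so they should be read as indicative only.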

@wangzhaode
Collaborator

Build Failure: RVV compilation breaks with 53 errors

I tested this PR with RISC-V RVV cross-compilation (-DMNN_USE_RVV=ON) using riscv64-linux-gnu-g++-13 on Ubuntu 24.04. The build fails completely with 53 errors in source/backend/cpu/compute/Int8FunctionsOpt.cpp.

Environment

  • Docker: Ubuntu 24.04 + gcc-13-riscv64-linux-gnu
  • CMake flags: -DCMAKE_TOOLCHAIN_FILE=riscv64.cmake -DMNN_USE_RVV=ON

Error Categories

1. Undeclared variable offset (lines 1837, 1880, 1923, 1966, 2012, 2057, 2097)

The Binary Int8 fallback functions use offset but it is never defined:

const int maxValue = static_cast<int32_t>(params->maxValue) + offset; // 'offset' undeclared

2. Wrong variable names: inputData0/inputData1/outputData (lines 1837-2080)

Functions use inputData0, inputData1, outputData but the actual parameter names are inputRaw0, inputRaw1, outputRaw:

float inp0 = static_cast<int32_t>(inputData0[i] - ...); // should be inputRaw0
outputData[i] = value; // should be outputRaw

3. Undeclared dstPtr/srcPtr in MNNInt8ToUInt8 (lines 2102-2103)

auto dstZ       = dstPtr + planeNumber * pack * z;  // 'dstPtr' undeclared
const auto srcZ = srcPtr + planeNumber * pack * z;  // 'srcPtr' undeclared

4. Undeclared srcInt8 in MNNSumByAxisLForMatmul_A (lines 2452, 2478)

const auto src_x = srcInt8 + k * (step * LP * blockSizeQuad * kernelxy); // 'srcInt8' undeclared
srcInt8 += col_buffer_unit_size; // 'srcInt8' undeclared

5. Missing semicolon (line 2616)

core->MNNSumByAxisLForMatmul_A = MNNSumByAxisLForMatmul_A  // missing ';'
gCoreFunc = core;

Root Cause

These appear to be copy-paste errors — code was copied from another context where variables had different names (inputData0 vs inputRaw0, dstPtr vs actual parameter names, etc.) and was not updated to match the current function signatures.

Suggestion

Please fix the variable naming throughout the fallback implementations and verify with -DMNN_USE_RVV=ON before re-submitting.

