Feature/rvv support cpu by Sherlockzhangjinge · Pull Request #4425 · alibaba/MNN

Sherlockzhangjinge · 2026-05-06T06:13:52Z

Description

Optimize the following functions with RVV:
MNNBinaryMaxInt8
MNNBinaryMinInt8
MNNBinaryMulInt8
MNNBinarySqdInt8
MNNBinarySubInt8
_ArmBasicMNNPackC4ForMatMul_A
_MNNPackC4Int8ForMatMul_ASparse
MNNScaleAndAddBiasInt8
MNNSumByAxisLForMatmul_A

Module

CPU

Type

Checklist

- [ ] Commit message follows [Module:Type] Description format
- [ ] Code compiles without errors
- [ ] Tested on relevant platform(s)
- [ ] No unrelated format or style changes included

MNNBinaryAddInt8 / Sub / Mul / Min / Max / SqdInt8
MNNPackC4Int8ForMatMul_ASparse
MNNScaleAndAddBiasInt8
MNNSumByAxisLForMatmul_A

Signed-off-by: Jixingguang <1955992348@qq.com>
Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>

Signed-off-by: Jixingguang <1955992348@qq.com>
Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>

wangzhaode

Thanks for adding RVV support for the Int8 operations! Here are some issues that need to be addressed before merging:

1. 🔴 Missing semicolon — compilation error
In Int8FunctionsOpt.cpp, the function registration is missing a semicolon:

core->MNNSumByAxisLForMatmul_A = MNNSumByAxisLForMatmul_A  // <-- missing ';'
#ifdef MNN_USE_SPARSE_COMPUTE

This will cause a compilation error.

2. 🔴 #elif MNN_USE_RVV should be #elif defined(MNN_USE_RVV)
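Throughout Int8FunctionsOpt.cpp, the conditional branches use #elif MNN_USE_RVV instead of #elif defined(MNN_USE_RVV). The rest of the codebase tests this macro with #ifdef MNN_USE_RVV. A bare #elif MNN_USE_RVV evaluates the macro's value, so it only works when the macro expands to a nonzero integer (as it does with -DMNN_USE_RVV, which defines it as 1). defined() tests only whether the macro exists, which makes it the safer pattern and keeps the file consistent with the existing code.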

3. 🟡 VLA (Variable Length Array) in C++ — non-standard
In MNNPackC4Int8ForMatMul_ASparse.cpp:

uint32_t src_idx[vl];  // vl is a runtime variable
uint32_t dst_idx[vl];

VLAs are a C99 feature (optional since C11) and are not part of standard C++. GCC supports them as an extension, but the code is not portable. In addition, computing the indices with a scalar loop and then using gather/scatter (vloxei32/vsoxei32) may not provide a meaningful speedup over a pure scalar implementation.
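One portable alternative is to hold the index buffers in a `std::vector` sized at runtime. The sketch below is an assumption about how that could look, not the PR's code: `makeStridedOffsets` and its parameters are hypothetical names, and the stride pattern is only an example of an index computation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds `vl` byte offsets spaced `strideBytes` apart,
// the kind of index array an indexed load/store would consume.
// std::vector replaces the non-standard `uint32_t idx[vl]` VLA.
std::vector<uint32_t> makeStridedOffsets(std::size_t vl, uint32_t strideBytes) {
    std::vector<uint32_t> idx(vl);
    for (std::size_t i = 0; i < vl; ++i) {
        idx[i] = static_cast<uint32_t>(i) * strideBytes;
    }
    return idx;
}
```

To avoid a heap allocation on every call, the buffer can be allocated once outside the hot loop and reused, or replaced with a fixed-size array sized for the largest vector length the code is expected to see.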

4. 🟡 Use shift instead of vdiv in MNNScaleAndAddBiasInt8_RVV

val = __riscv_vdiv_vx_i32m4(val, shift_div, vl);  // shift_div = 1 << mShiftBits

Division by a power of 2 should use arithmetic right shift (vsra) instead of vdiv, as integer division is typically very slow on RVV hardware.
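One detail to watch when making this swap: a plain arithmetic right shift rounds toward negative infinity, while integer division truncates toward zero, so the two differ for negative inputs that are not exact multiples. The scalar sketch below shows the difference and one common way to keep division semantics with a shift. The function names are hypothetical, and whether the kernel needs truncating behaviour is an assumption to check against the scalar reference.

```cpp
#include <cstdint>

// Reference: integer division truncates toward zero.
int32_t divPow2(int32_t v, int bits) {
    return v / (int32_t(1) << bits);
}

// Plain arithmetic shift: rounds toward negative infinity.
// (Right-shifting a negative value is arithmetic on GCC/Clang and
// guaranteed to be arithmetic from C++20 on.)
int32_t sraPow2(int32_t v, int bits) {
    return v >> bits;
}

// Shift with a bias for negative inputs, matching the division result.
// (v >> 31) is all ones for negative v, so the mask adds 2^bits - 1
// only when v < 0.
int32_t sraTruncPow2(int32_t v, int bits) {
    const int32_t bias = (v >> 31) & ((int32_t(1) << bits) - 1);
    return (v + bias) >> bits;
}
```

For example, with v = -7 and bits = 1 the division gives -3, the plain shift gives -4, and the biased shift gives -3 again. The same bias-then-shift sequence maps onto vector and, add, and vsra operations.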

5. 🟡 Missing performance benchmarks
Unlike PR #4426, this PR doesn't include any performance comparison data. It would be very helpful to show before/after numbers for the optimized functions, especially since some implementations (like the sparse packing) may not yield significant speedup.

6. 🟡 All checklist items are unchecked
Please verify and check the relevant items in the PR checklist (commit message format, compilation, testing, etc.).

7. Minor: All new files are missing the trailing newline at end of file (POSIX requirement).

Overall the RVV intrinsic usage looks solid for the binary operations. Please fix the compilation errors and address the other concerns. Thanks!

…h "#elif defined(MNN_USE_RVV)" in Int8FunctionsOpt.cpp, Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>

Sherlockzhangjinge · 2026-05-12T15:28:23Z

Performance benchmarks are presented below:

performance

Scalar

| Function | duration_time/s | cycles | instructions | cache-references | cache-misses | branches | branch-misses |
|---|---|---|---|---|---|---|---|
| MNNBinaryAddInt8 | 66.53 | 172,903,860,156 | 349,852,703,454 | 680,846,066 | 331,931,392 | 42,397,515,963 | 636,166 |
| MNNBinarySubInt8 | 66.48 | 172,763,396,099 | 349,855,095,439 | 680,914,625 | 330,409,288 | 42,397,127,206 | 681,841 |
| MNNBinaryMulInt8 | 203.10 | 480,912,544,242 | 375,418,956,351 | 1,754,654,965 | 378,504,360 | 43,481,933,534 | 7,864,684,046 |
| MNNBinaryMinInt8 | 134.31 | 348,721,335,260 | 375,598,884,514 | 1,779,000,265 | 344,254,891 | 53,043,679,856 | 8,201,427,052 |
| MNNBinaryMaxInt8 | 101.91 | 264,218,429,256 | 314,743,657,531 | 1,985,868,509 | 349,306,085 | 53,128,780,858 | 7,883,763,253 |
| MNNBinarySqdInt8 | 114.14 | 296,659,717,650 | 368,717,655,903 | 1,934,702,827 | 331,174,957 | 42,404,845,000 | 3,176,002,037 |
| _ArmBasicMNNPackC4ForMatMul_A | 0.61 | 1,551,709,009 | 1,435,828,038 | 37,703,094 | 199,039 | 292,163,779 | 7,182,908 |
| MNNPackC4Int8ForMatMul_ASparse | 2.38 | 6,133,501,634 | 8,696,697,664 | 1,399,052,972 | 297,685 | 756,538,167 | 1,265,049 |
| MNNSumByAxisLForMatmul_A | 0.14 | 355,188,776 | 634,547,779 | 186,471 | 31,358 | 126,589,595 | 683,611 |
| MNNScaleAndAddBiasInt8 | 4.53 | 11,761,995,284 | 13,194,406,480 | 26,196,075 | 132,992 | 2,030,835,011 | 378,039,067 |

RVV

| Function | duration_time/s | cycles | instructions | cache-references | cache-misses | branches | branch-misses |
|---|---|---|---|---|---|---|---|
| MNNBinaryAddInt8 | 12.06 | 31,330,033,068 | 19,418,133,938 | 666,854,980 | 221,998,791 | 686,983,222 | 307,455 |
| MNNBinarySubInt8 | 12.04 | 31,278,118,708 | 19,419,414,145 | 667,090,046 | 206,141,287 | 687,082,496 | 330,229 |
| MNNBinaryMulInt8 | 12.09 | 31,415,867,812 | 19,414,891,024 | 666,894,622 | 192,042,720 | 687,005,444 | 314,552 |
| MNNBinaryMinInt8 | 21.68 | 56,346,829,053 | 20,765,124,442 | 668,407,916 | 214,227,216 | 688,709,028 | 374,613 |
| MNNBinaryMaxInt8 | 21.55 | 55,994,757,129 | 20,767,992,937 | 668,610,267 | 250,477,984 | 688,829,526 | 423,480 |
| MNNBinarySqdInt8 | 12.52 | 32,545,006,223 | 20,083,250,785 | 666,872,870 | 193,599,975 | 687,168,233 | 325,492 |
| _ArmBasicMNNPackC4ForMatMul_A | 0.63 | 1,601,333,949 | 1,369,221,821 | 30,564,334 | 198,645 | 178,398,338 | 1,258,289 |
| MNNPackC4Int8ForMatMul_ASparse | 5.79 | 14,974,071,109 | 10,523,703,951 | 793,246,162 | 456,525 | 508,815,102 | 2,002,872 |
| MNNSumByAxisLForMatmul_A | 0.44 | 1,142,175,936 | 492,564,678 | 230,155 | 38,927 | 64,745,342 | 508,400 |
| MNNScaleAndAddBiasInt8 | 1.30 | 3,361,665,880 | 1,420,243,319 | 13,887,336 | 60,642 | 84,232,891 | 125,620 |
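
As a reading aid, the duration columns can be reduced to a speedup ratio (scalar time divided by RVV time). The sketch below is not part of the PR: `speedup` and `printSpeedups` are hypothetical helpers, and the inputs are a few of the durations reported above.

```cpp
#include <cstdio>

// Hypothetical helper: a ratio above 1 means the RVV version ran faster,
// a ratio below 1 means it ran slower than the scalar version.
double speedup(double scalarSeconds, double rvvSeconds) {
    return scalarSeconds / rvvSeconds;
}

// Prints the ratio for a few rows copied from the benchmark data.
void printSpeedups() {
    struct Row { const char* name; double scalar; double rvv; };
    const Row rows[] = {
        {"MNNBinaryAddInt8",               66.53, 12.06},
        {"MNNScaleAndAddBiasInt8",          4.53,  1.30},
        {"_ArmBasicMNNPackC4ForMatMul_A",   0.61,  0.63},
        {"MNNPackC4Int8ForMatMul_ASparse",  2.38,  5.79},
        {"MNNSumByAxisLForMatmul_A",        0.14,  0.44},
    };
    for (const Row& r : rows) {
        std::printf("%-32s %.2fx\n", r.name, speedup(r.scalar, r.rvv));
    }
}
```

By this measure the binary operations and MNNScaleAndAddBiasInt8 are faster with RVV, _ArmBasicMNNPackC4ForMatMul_A is roughly unchanged, and MNNPackC4Int8ForMatMul_ASparse and MNNSumByAxisLForMatmul_A are slower (ratio below 1).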

wangzhaode · 2026-05-18T05:04:33Z

Build Failure: RVV compilation breaks with 53 errors

I tested this PR with RISC-V RVV cross-compilation (-DMNN_USE_RVV=ON) using riscv64-linux-gnu-g++-13 on Ubuntu 24.04. The build fails completely with 53 errors in source/backend/cpu/compute/Int8FunctionsOpt.cpp.

Environment

Docker: Ubuntu 24.04 + gcc-13-riscv64-linux-gnu
CMake flags: -DCMAKE_TOOLCHAIN_FILE=riscv64.cmake -DMNN_USE_RVV=ON

Error Categories

1. Undeclared variable `offset` (lines 1837, 1880, 1923, 1966, 2012, 2057, 2097)

The Binary Int8 fallback functions use offset but it is never defined:

const int maxValue = static_cast<int32_t>(params->maxValue) + offset; // 'offset' undeclared

2. Wrong variable names: `inputData0`/`inputData1`/`outputData` (lines 1837-2080)

Functions use inputData0, inputData1, outputData but the actual parameter names are inputRaw0, inputRaw1, outputRaw:

float inp0 = static_cast<int32_t>(inputData0[i] - ...); // should be inputRaw0
outputData[i] = value; // should be outputRaw

3. Undeclared `dstPtr`/`srcPtr` in MNNInt8ToUInt8 (lines 2102-2103)

auto dstZ       = dstPtr + planeNumber * pack * z;  // 'dstPtr' undeclared
const auto srcZ = srcPtr + planeNumber * pack * z;  // 'srcPtr' undeclared

4. Undeclared `srcInt8` in MNNSumByAxisLForMatmul_A (lines 2452, 2478)

const auto src_x = srcInt8 + k * (step * LP * blockSizeQuad * kernelxy); // 'srcInt8' undeclared
srcInt8 += col_buffer_unit_size; // 'srcInt8' undeclared

5. Missing semicolon (line 2616)

core->MNNSumByAxisLForMatmul_A = MNNSumByAxisLForMatmul_A  // missing ';'
gCoreFunc = core;

Root Cause

These appear to be copy-paste errors — code was copied from another context where variables had different names (inputData0 vs inputRaw0, dstPtr vs actual parameter names, etc.) and was not updated to match the current function signatures.

Suggestion

Please fix the variable naming throughout the fallback implementations and verify with -DMNN_USE_RVV=ON before re-submitting.

Sherlockzhangjinge and others added 3 commits May 6, 2026 12:13

[CPU:Feature] register optimized function

fb7e9f6

Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>

[CPU:feature] hotfix :inappropriate insertion

efbe945

Signed-off-by: Jixingguang <1955992348@qq.com> Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>

wangzhaode self-assigned this May 9, 2026

wangzhaode requested changes May 9, 2026


Sherlockzhangjinge added 2 commits May 11, 2026 13:00

Merge branch 'master' into feature/rvv_support_cpu

6bacb83

Sherlockzhangjinge wants to merge 5 commits into alibaba:master from Sherlockzhangjinge:feature/rvv_support_cpu
