RVV1.0 Dequantize Layer #6658
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6658    +/-   ##
========================================
  Coverage   95.85%   95.85%
========================================
  Files         934      936      +2
  Lines      312692   312319    -373
========================================
- Hits       299720   299371    -349
+ Misses      12972    12948     -24
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01fa3a135c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds a RISC-V (RVV 1.0) optimized implementation of the Dequantize layer as part of the int8 optimization work, including optional fp16-storage support, and adjusts the unit test suite behavior on RISC-V.
Changes:
- Added RVV-accelerated `Dequantize_riscv` forward path for fp32 output.
- Added `NCNN_ZFH` fp16-storage forward path for `Dequantize_riscv`.
- Disabled the pack8 dequantize test case on RISC-V builds.
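For orientation, here is a minimal scalar sketch of the arithmetic the vectorized paths compute. The code excerpts in this review show it as a fused multiply-add: bias plus converted int32 value times scale. The function name `dequantize_ref` and the flat channel-major layout are assumptions made for illustration; this is not the ncnn implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference: out[c][i] = float(in[c][i]) * scale(c) + bias(c).
// A scale or bias vector of size 1 is shared by every channel;
// an empty bias vector means "no bias".
std::vector<float> dequantize_ref(const std::vector<int32_t>& in, size_t channels, size_t size,
                                  const std::vector<float>& scale, const std::vector<float>& bias)
{
    std::vector<float> out(in.size());
    for (size_t c = 0; c < channels; c++)
    {
        const float s = scale.size() == 1 ? scale[0] : scale[c];
        const float b = bias.empty() ? 0.f : (bias.size() == 1 ? bias[0] : bias[c]);
        for (size_t i = 0; i < size; i++)
            out[c * size + i] = static_cast<float>(in[c * size + i]) * s + b;
    }
    return out;
}
```

A scalar reference like this can serve as the expected output when checking an RVV path across scale and bias shapes.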
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_dequantize.cpp | Skips the pack8 dequantize test on __riscv. |
| src/layer/riscv/dequantize_riscv.h | Declares the RISC-V optimized Dequantize layer class and fp16-storage hook. |
| src/layer/riscv/dequantize_riscv.cpp | Implements RVV-based dequantization to fp32 and dispatch to fp16-storage when enabled. |
| src/layer/riscv/dequantize_riscv_zfh.cpp | Implements fp16-storage output path (optionally using RVV). |
Comments suppressed due to low confidence (2)
src/layer/riscv/dequantize_riscv.cpp:94
- Similarly, `_bias` is only initialized for `bias_data.w == 1` or `elempack == vlm1`. If `bias_data.w > 1` and `elempack` is not `vlm1`, `_bias` is used uninitialized in `vfmacc`, leading to undefined behavior and wrong outputs for per-channel bias with other pack widths. Add initialization + pack-width handling consistent with the `_scale` logic.
    #if __riscv_vector
        vfloat32m8_t _bias;
        if (bias_data.w == 1)
        {
            _bias = __riscv_vfmv_v_f_f32m8(bias, __riscv_vsetvlmax_e32m8());
        }
        else if (elempack == vlm1)
        {
            vfloat32m1_t _b = __riscv_vle32_v_f32m1(bias_data, vlm1);
            _bias = __riscv_vcreate_v_f32m1_f32m8(_b, _b, _b, _b, _b, _b, _b, _b);
        }
        int n = size;
        while (n > 0)
        {
            size_t vl = __riscv_vsetvl_e32m8(n);
            vfloat32m8_t _v = __riscv_vfcvt_f_x_v_f32m8(__riscv_vle32_v_i32m8(intptr, vl), vl);
            _v = __riscv_vfmacc_vv_f32m8(_bias, _v, _scale, vl);
            __riscv_vse32_v_f32m8(ptr, _v, vl);
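One way to reason about the missing case is to write down which bias value each vector lane needs for an arbitrary pack width. The sketch below is a portable scalar model, not RVV code. The helper name `bias_lane_pattern` is hypothetical, and it assumes a packed layout that interleaves `elempack` channels per element and a lane count that is a multiple of `elempack`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: per-lane bias values for channel group q of a layout
// packed `elempack` wide, for a vector register holding `lanes` elements.
// Assumption: lane i holds channel (i % elempack) of the pack, so it needs
// bias[q * elempack + i % elempack]. A size-1 bias is broadcast to all lanes.
std::vector<float> bias_lane_pattern(const std::vector<float>& bias, size_t q,
                                     size_t elempack, size_t lanes)
{
    std::vector<float> pattern(lanes);
    for (size_t i = 0; i < lanes; i++)
        pattern[i] = bias.size() == 1 ? bias[0] : bias[q * elempack + i % elempack];
    return pattern;
}
```

Under these assumptions every combination of `bias.size()` and `elempack` yields a fully defined pattern, which is the property the review comment says the excerpt above lacks.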
src/layer/riscv/dequantize_riscv_zfh.cpp:80
- In the RVV fp16-storage path, `_bias` is also potentially used uninitialized when `bias_data.w > 1` and `elempack` is not `vlm1`. Add initialization and support for other pack widths so per-channel bias works for all packed layouts that can reach this code.
    #if __riscv_vector
        vfloat32m8_t _bias;
        if (bias_data.w == 1)
        {
            _bias = __riscv_vfmv_v_f_f32m8(bias, __riscv_vsetvlmax_e32m8());
        }
        else if (elempack == vlm1)
        {
            vfloat32m1_t _b = __riscv_vle32_v_f32m1(bias_data, vlm1);
            _bias = __riscv_vcreate_v_f32m1_f32m8(_b, _b, _b, _b, _b, _b, _b, _b);
        }
        int n = size;
        while (n > 0)
        {
            size_t vl = __riscv_vsetvl_e16m4(n);
            vfloat32m8_t _v = __riscv_vfcvt_f_x_v_f32m8(__riscv_vle32_v_i32m8(intptr, vl), vl);
            _v = __riscv_vfmacc_vv_f32m8(_bias, _v, _scale, vl);
            __riscv_vse16_v_f16m4(ptr, __riscv_vfncvt_f_f_w_f16m4(_v, vl), vl);
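The excerpt above requests `vl` with `__riscv_vsetvl_e16m4` but loads and converts at `e32m8`. The two settings share the same SEW/LMUL ratio, so they have the same maximum vector length. The small sketch below checks that arithmetic. The `vlmax` helper is hypothetical and only covers integer LMUL.

```cpp
#include <cstddef>

// Hypothetical helper: VLMAX = VLEN / SEW * LMUL for integer LMUL (RVV 1.0).
constexpr size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    return vlen_bits / sew_bits * lmul;
}

// e16m4 and e32m8 give the same VLMAX for any VLEN, e.g. 128 and 256 bits.
static_assert(vlmax(128, 16, 4) == vlmax(128, 32, 8), "same SEW/LMUL ratio");
static_assert(vlmax(256, 16, 4) == vlmax(256, 32, 8), "same SEW/LMUL ratio");
```

Under that reading, one `vl` can drive both the 32-bit load and compute and the 16-bit narrowing store in the same loop iteration.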
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee2084d509
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27a556049e
Thanks for your contribution!
Road to int8 optimization Episode 2: Dequantize Layer