tools:Move Embed/MHA/RNN/LSTM/GRU weight scale generation to ncnn2table by Roundaboutt · Pull Request #6688 · Tencent/ncnn

Roundaboutt · 2026-04-20T06:08:01Z

Description

This PR moves static weight scale generation for several non-convolution layers from ncnn2int8 to ncnn2table, following the same table-driven workflow already used by other quantized layers.

Changes

Add Embed and MultiHeadAttention weight scale generation to ncnn2table
Add RNN, LSTM, and GRU weight scale generation to ncnn2table
Update ncnn2int8 to read these scales from the calibration table instead of recomputing them locally
Make calibration dataset optional for models that only need static weight scales and do not require activation calibration
Keep SDPA unchanged, since it uses dynamic activation quantization in forward_int8
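
The static weight scales listed above depend only on the weight values, which is why they can be produced without input data. A minimal sketch of one common scheme follows, assuming symmetric absmax quantization with one scale per weight row (scale = 127 / max|w|). The helper names `compute_row_scales` and `quantize_value` are hypothetical, and this is not the code from ncnn2table:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one scale per row of a row-major weight matrix.
// Assumes weights.size() >= rows * cols and symmetric absmax scaling.
std::vector<float> compute_row_scales(const std::vector<float>& weights, std::size_t rows, std::size_t cols)
{
    std::vector<float> scales(rows, 1.f);
    for (std::size_t r = 0; r < rows; r++)
    {
        float absmax = 0.f;
        for (std::size_t c = 0; c < cols; c++)
        {
            absmax = std::max(absmax, std::fabs(weights[r * cols + c]));
        }
        // An all-zero row keeps scale 1 to avoid dividing by zero.
        if (absmax > 0.f)
            scales[r] = 127.f / absmax;
    }
    return scales;
}

// Quantize one value: multiply by the scale, round to nearest, clamp to [-127, 127].
signed char quantize_value(float w, float scale)
{
    int q = static_cast<int>(std::round(w * scale));
    return static_cast<signed char>(std::min(127, std::max(-127, q)));
}
```

Because nothing here needs activations, a tool built this way can write the scales to the table once and let a later stage only read them.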

Test

Tested with minimal RNN, LSTM, GRU, and Embed-Attn networks:

Embed-Attn

quantized param files:

7767517
3 3
Input                    in0                      0 1 in0
Embed                    embed_0                  1 1 in0 1 0=8 1=16 3=128 18=2
MultiHeadAttention       attention_1              1 1 1 out0 0=8 1=2 2=64 3=8 4=8 6=5.000000e-01 18=2
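
For readers unfamiliar with the param listing above: after the magic number and the layer/blob counts, each line gives the layer type, the layer name, the input and output counts, the blob names, and then `key=value` parameters. A rough sketch of splitting one such line is shown here; the `ParamLine` struct and `parse_param_line` function are hypothetical and not part of ncnn:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one layer line of a .param file.
struct ParamLine
{
    std::string type;
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
    std::vector<std::pair<std::string, std::string> > params; // key, value as text
};

// Returns true when the line has the expected shape, false otherwise.
bool parse_param_line(const std::string& line, ParamLine& out)
{
    std::istringstream iss(line);
    int input_count = 0;
    int output_count = 0;
    if (!(iss >> out.type >> out.name >> input_count >> output_count))
        return false;
    if (input_count < 0 || output_count < 0)
        return false;

    out.inputs.clear();
    out.outputs.clear();
    out.params.clear();

    std::string token;
    for (int i = 0; i < input_count; i++)
    {
        if (!(iss >> token)) return false;
        out.inputs.push_back(token);
    }
    for (int i = 0; i < output_count; i++)
    {
        if (!(iss >> token)) return false;
        out.outputs.push_back(token);
    }
    // Everything left is key=value; values stay as strings in this sketch.
    while (iss >> token)
    {
        std::string::size_type eq = token.find('=');
        if (eq == std::string::npos) return false;
        out.params.push_back(std::make_pair(token.substr(0, eq), token.substr(eq + 1)));
    }
    return true;
}
```

Applied to the Embed line above, this would yield type `Embed`, name `embed_0`, one input blob `in0`, one output blob `1`, and four parameters.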

precision analysis:

fp32 model : tiny_embed_attn.ncnn.param/.bin
int8 model : tiny_embed_attn_int8.ncnn.param/.bin
samples    : 100
seq_len    : 4
input_size : 8
seed       : 0

overall metrics
  max_abs  = 0.00712827
  mean_abs = 0.00212720
  rmse     = 0.00247913
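
The three metrics reported above can be computed from the flattened fp32 and int8 outputs as follows. This is a sketch under the usual definitions (maximum absolute error, mean absolute error, root mean squared error); the `ErrorMetrics` struct and `compare_outputs` function are hypothetical names, not the script used for this PR:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring the report fields.
struct ErrorMetrics
{
    double max_abs;
    double mean_abs;
    double rmse;
};

// Compare a reference output with a test output element by element.
// Only the common prefix is compared if the sizes differ.
ErrorMetrics compare_outputs(const std::vector<float>& ref, const std::vector<float>& test)
{
    ErrorMetrics m = {0.0, 0.0, 0.0};
    const std::size_t n = std::min(ref.size(), test.size());
    if (n == 0)
        return m;

    double sum_abs = 0.0;
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; i++)
    {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(test[i]);
        const double a = std::fabs(d);
        m.max_abs = std::max(m.max_abs, a);
        sum_abs += a;
        sum_sq += d * d;
    }
    m.mean_abs = sum_abs / n;
    m.rmse = std::sqrt(sum_sq / n);
    return m;
}
```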

RNN

quantized param files:

7767517
3 3
Input                    in0                      0 1 in0
RNN                      rnn_1                    1 1 in0 1 0=8 1=64 8=2
Gemm                     gemm_0                   1 1 1 out0 3=1 5=1 6=1 7=4 8=4 9=8 10=4 18=2

precision analysis:

fp32 model : tiny_rnn.ncnn.param/.bin
int8 model : tiny_rnn_int8.ncnn.param/.bin
samples    : 100
seq_len    : 4
input_size : 8
seed       : 0

overall metrics
  max_abs  = 0.04329279
  mean_abs = 0.00797669
  rmse     = 0.01239488

GRU

quantized param files:

7767517
3 3
Input                    in0                      0 1 in0
GRU                      gru_1                    1 1 in0 1 0=8 1=192 8=2
Gemm                     gemm_0                   1 1 1 out0 3=1 5=1 6=1 7=4 8=4 9=8 10=4 18=2

precision analysis:

fp32 model : tiny_gru.ncnn.param/.bin
int8 model : tiny_gru_int8.ncnn.param/.bin
samples    : 100
seq_len    : 4
input_size : 8
seed       : 0

overall metrics
  max_abs  = 0.00559735
  mean_abs = 0.00107971
  rmse     = 0.00136703

LSTM

quantized param files:

7767517
3 3
Input                    in0                      0 1 in0
LSTM                     lstm_1                   1 1 in0 1 0=8 1=256 3=8 8=2
Gemm                     gemm_0                   1 1 1 out0 3=1 5=1 6=1 7=4 8=4 9=8 10=4 18=2

precision analysis:

fp32 model : tiny_lstm.ncnn.param/.bin
int8 model : tiny_lstm_int8.ncnn.param/.bin
samples    : 100
seq_len    : 4
input_size : 8
seed       : 0

overall metrics
  max_abs  = 0.00386286
  mean_abs = 0.00055465
  rmse     = 0.00072828

nihui · 2026-04-27T02:48:39Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe827598da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Roundaboutt · 2026-05-05T12:05:25Z

@codex review

The same approach is already used in existing functions such as quantize_convolution():

        if (iter == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

Since the main function doesn't check for this return value, I'm not entirely sure if it's a minor bug. Therefore, I decided to preserve the original implementation.
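
For reference, a caller that does act on such return values could look roughly like the sketch below. It is only an illustration of the pattern: `run_steps` and the step list are hypothetical, and nothing here is taken from ncnn2int8's actual main function.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

// Run each quantize step in order and stop at the first non-zero return value.
// Returns 0 when every step succeeded, otherwise the failing step's code.
int run_steps(const std::vector<std::function<int()> >& steps)
{
    for (std::size_t i = 0; i < steps.size(); i++)
    {
        const int ret = steps[i]();
        if (ret != 0)
        {
            fprintf(stderr, "quantize step %d failed with code %d\n", static_cast<int>(i), ret);
            return ret;
        }
    }
    return 0;
}
```

With this shape, a missing table entry would abort the conversion instead of silently producing a partially quantized model.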

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe827598da

Copilot

Pull request overview

This PR extends the int8 calibration-table workflow so that static weight scale generation for non-convolution layers (Embed, MultiHeadAttention, RNN/LSTM/GRU) is produced by ncnn2table and then consumed by ncnn2int8, while also allowing ncnn2table to run without a calibration dataset when only static weight scales are needed.

Changes:

Add static weight scale generation + table serialization for Embed, MultiHeadAttention, RNN, LSTM, and GRU in ncnn2table.
Update ncnn2int8 to read these weight scales from the calibration table (instead of recomputing).
Update documentation and ncnn2table CLI parsing to make the calibration dataset optional for models without conv/activation calibration needs.
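
The dataset-optional behavior described above amounts to checking whether any layer in the model needs activation statistics. A hedged sketch of that decision follows; the function name `needs_calibration_dataset` is hypothetical, and the set of activation-calibrated layer types (Convolution, ConvolutionDepthWise, InnerProduct) is an assumption rather than something stated in this PR:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns true if at least one layer type needs activation calibration,
// in which case a calibration dataset would be required.
// The type list below is an assumption made for illustration.
bool needs_calibration_dataset(const std::vector<std::string>& layer_types)
{
    std::set<std::string> activation_calibrated;
    activation_calibrated.insert("Convolution");
    activation_calibrated.insert("ConvolutionDepthWise");
    activation_calibrated.insert("InnerProduct");

    for (std::size_t i = 0; i < layer_types.size(); i++)
    {
        if (activation_calibrated.count(layer_types[i]) != 0)
            return true;
    }
    // Only static-weight layers (for example Embed, MultiHeadAttention,
    // RNN, LSTM, GRU): weight scales alone are enough.
    return false;
}
```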

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
tools/quantize/ncnn2table.cpp	Detect Embed/MHA/RNN/LSTM/GRU layers, generate and save their weight scales, and make dataset arguments optional when activation calibration isn’t needed.
tools/quantize/ncnn2int8.cpp	Switch recurrent/attention/embed weight quantization to consume per-layer scale entries from the table.
docs/how-to-use-and-FAQ/quantized-int8-inference.md	Document the dataset-less table generation flow for static-weight-only models.

tencent-adm · 2026-05-18T09:02:45Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Roundaboutt
❌ nihui
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (4)

tools/quantize/ncnn2int8.cpp:421

Same as above: the missing-scale error message is too generic. Please include layer name/type and the specific missing key (param_0/param_1) so users can fix or regenerate the calibration table correctly.

        char key_xc[256];
        snprintf(key_xc, 256, "%s_param_0", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_xc = weight_int8scale_table.find(key_xc);
        if (iter_xc == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

tools/quantize/ncnn2int8.cpp:503

Same as above: the missing-scale error message is too generic. Please include layer name/type and the specific missing key (param_0/param_1) so users can fix or regenerate the calibration table correctly.

        char key_xc[256];
        snprintf(key_xc, 256, "%s_param_0", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_xc = weight_int8scale_table.find(key_xc);
        if (iter_xc == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

        char key_hc[256];
        snprintf(key_hc, 256, "%s_param_1", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_hc = weight_int8scale_table.find(key_hc);
        if (iter_hc == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

tools/quantize/ncnn2int8.cpp:567

Same as above: the missing-scale error message is too generic. Please include the layer name/type and expected key so users can determine which table entry is required.

        char key[256];
        snprintf(key, 256, "%s_param_0", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter = weight_int8scale_table.find(key);
        if (iter == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

tools/quantize/ncnn2int8.cpp:721

Same as above: the missing-scale error message is too generic. Please include the layer name/type and expected key (param_0..param_3) so users can determine which entry is missing from the calibration table.

        char key_q[256];
        snprintf(key_q, 256, "%s_param_0", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_q = weight_int8scale_table.find(key_q);
        if (iter_q == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

tools/quantize/ncnn2int8.cpp:747

Same as above: missing-scale failures in the MultiHeadAttention path should report which key is absent (q/k/v/out) and the layer name, rather than the generic "no scale param" message, to help users regenerate/fix their table quickly.

        char key_q[256];
        snprintf(key_q, 256, "%s_param_0", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_q = weight_int8scale_table.find(key_q);
        if (iter_q == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

        char key_k[256];
        snprintf(key_k, 256, "%s_param_1", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_k = weight_int8scale_table.find(key_k);
        if (iter_k == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

        char key_v[256];
        snprintf(key_v, 256, "%s_param_2", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_v = weight_int8scale_table.find(key_v);
        if (iter_v == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }

        char key_out[256];
        snprintf(key_out, 256, "%s_param_3", layers[i]->name.c_str());
        std::map<std::string, ncnn::Mat>::iterator iter_out = weight_int8scale_table.find(key_out);
        if (iter_out == weight_int8scale_table.end())
        {
            fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
            return -1;
        }
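
One way to act on the suggestion above is to route every lookup through a small helper that names the layer and the missing key. The sketch below uses `std::vector<float>` as a stand-in for `ncnn::Mat` so that it compiles on its own; `make_scale_key`, `find_scale`, and `check_layer_scales` are hypothetical names and this is not the code that was merged:

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Stand-in for std::map<std::string, ncnn::Mat> from the quoted code.
typedef std::map<std::string, std::vector<float> > ScaleTable;

// Build the "<layer name>_param_<index>" key used in the quoted snippets.
std::string make_scale_key(const std::string& layer_name, int param_index)
{
    char key[256];
    std::snprintf(key, sizeof(key), "%s_param_%d", layer_name.c_str(), param_index);
    return std::string(key);
}

// Look up one scale entry; on failure report the layer type, name, and key.
const std::vector<float>* find_scale(const ScaleTable& table, const std::string& layer_type,
                                     const std::string& layer_name, int param_index)
{
    const std::string key = make_scale_key(layer_name, param_index);
    ScaleTable::const_iterator it = table.find(key);
    if (it == table.end())
    {
        fprintf(stderr, "%s layer %s needs to be quantized, but table entry %s is missing\n",
                layer_type.c_str(), layer_name.c_str(), key.c_str());
        return NULL;
    }
    return &it->second;
}

// Check param_0 .. param_(count-1); returns 0 if all exist, -1 on the first gap.
int check_layer_scales(const ScaleTable& table, const std::string& layer_type,
                       const std::string& layer_name, int param_count)
{
    for (int i = 0; i < param_count; i++)
    {
        if (find_scale(table, layer_type, layer_name, i) == NULL)
            return -1;
    }
    return 0;
}
```

For the MultiHeadAttention case quoted above, `check_layer_scales(table, "MultiHeadAttention", "attention_1", 4)` would cover the four q/k/v/out entries and print the first key that is absent.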

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a11ec27a8

nihui · 2026-05-18T11:38:13Z

Thanks for your contribution !

Roundaboutt added 7 commits April 17, 2026 22:03

tools: add embed weight calibration and int8 quantization

5173dfb

tools: add embed weight calibration and int8 quantization

db9850a

tools:add MutiHeadAttention layers' weight scales in ncnn2table

0207b6b

tools:add weight-only mode without calibration in ncnn2table

08988f2

tools:Change the MultiHeadAttention layer scaling factors to be read from the table in ncnn2int8

bd39d13

complete rnn,gru,lstm layers

94c834a

supplement documents and printing information

43caf20

github-actions Bot added tool doc labels Apr 20, 2026

apply code-format changes

fe82759

chatgpt-codex-connector Bot reviewed Apr 27, 2026

Comment thread tools/quantize/ncnn2int8.cpp

chatgpt-codex-connector Bot reviewed May 5, 2026

Comment thread tools/quantize/ncnn2int8.cpp

nihui requested a review from Copilot May 7, 2026 11:52

Copilot started reviewing on behalf of nihui May 7, 2026 11:53

Copilot AI reviewed May 7, 2026

Comment thread tools/quantize/ncnn2table.cpp Outdated

Comment thread tools/quantize/ncnn2table.cpp Outdated

Comment thread docs/how-to-use-and-FAQ/quantized-int8-inference.md Outdated

Roundaboutt and others added 3 commits May 8, 2026 14:48

Fix the issue of 'const' in ncnn2table and correct docs

943151c

Fix the issue of 'const' in ncnn2table and correct docs

43bec3d

cc

a683628

Merge branch 'master' into opt-quantize-int8

8873d08

nihui requested a review from Copilot May 18, 2026 09:04

Copilot started reviewing on behalf of nihui May 18, 2026 09:05

allow data-free quantize

7a11ec2

Copilot AI reviewed May 18, 2026

Comment thread tools/quantize/ncnn2int8.cpp

Comment thread docs/how-to-use-and-FAQ/quantized-int8-inference.md

Comment thread tools/quantize/ncnn2table.cpp

nihui requested a review from Copilot May 18, 2026 09:22

Copilot started reviewing on behalf of nihui May 18, 2026 09:23

Copilot AI reviewed May 18, 2026

Comment thread docs/how-to-use-and-FAQ/quantized-int8-inference.md

Comment thread tools/quantize/ncnn2int8.cpp

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread tools/quantize/ncnn2table.cpp

nihui merged commit 3724d10 into Tencent:master May 18, 2026
26 of 27 checks passed
