tools:Move Embed/MHA/RNN/LSTM/GRU weight scale generation to ncnn2table#6688
Conversation
…from the table in ncnn2int8
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe827598da
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
This identical approach is used in previous functions like quantize_convolution(): Since the main function doesn't check for this return value, I'm not entirely sure if it's a minor bug. Therefore, I decided to preserve the original implementation. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe827598da
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR extends the int8 calibration-table workflow so that static weight scale generation for non-convolution layers (Embed, MultiHeadAttention, RNN/LSTM/GRU) is produced by ncnn2table and then consumed by ncnn2int8, while also allowing ncnn2table to run without a calibration dataset when only static weight scales are needed.
Changes:
- Add static weight scale generation + table serialization for Embed, MultiHeadAttention, RNN, LSTM, and GRU in
ncnn2table. - Update
ncnn2int8to read these weight scales from the calibration table (instead of recomputing). - Update documentation and
ncnn2tableCLI parsing to make the calibration dataset optional for models without conv/activation calibration needs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/quantize/ncnn2table.cpp | Detect Embed/MHA/RNN/LSTM/GRU layers, generate and save their weight scales, and make dataset arguments optional when activation calibration isn’t needed. |
| tools/quantize/ncnn2int8.cpp | Switch recurrent/attention/embed weight quantization to consume per-layer scale entries from the table. |
| docs/how-to-use-and-FAQ/quantized-int8-inference.md | Document the dataset-less table generation flow for static-weight-only models. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (4)
tools/quantize/ncnn2int8.cpp:421
- Same as above: the missing-scale error message is too generic. Please include layer name/type and the specific missing key (param_0/param_1) so users can fix or regenerate the calibration table correctly.
char key_xc[256];
snprintf(key_xc, 256, "%s_param_0", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_xc = weight_int8scale_table.find(key_xc);
if (iter_xc == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
tools/quantize/ncnn2int8.cpp:503
- Same as above: the missing-scale error message is too generic. Please include layer name/type and the specific missing key (param_0/param_1) so users can fix or regenerate the calibration table correctly.
char key_xc[256];
snprintf(key_xc, 256, "%s_param_0", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_xc = weight_int8scale_table.find(key_xc);
if (iter_xc == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
char key_hc[256];
snprintf(key_hc, 256, "%s_param_1", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_hc = weight_int8scale_table.find(key_hc);
if (iter_hc == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
tools/quantize/ncnn2int8.cpp:567
- Same as above: the missing-scale error message is too generic. Please include the layer name/type and expected key so users can determine which table entry is required.
char key[256];
snprintf(key, 256, "%s_param_0", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter = weight_int8scale_table.find(key);
if (iter == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
tools/quantize/ncnn2int8.cpp:721
- Same as above: the missing-scale error message is too generic. Please include the layer name/type and expected key (param_0..param_3) so users can determine which entry is missing from the calibration table.
char key_q[256];
snprintf(key_q, 256, "%s_param_0", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_q = weight_int8scale_table.find(key_q);
if (iter_q == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
tools/quantize/ncnn2int8.cpp:747
- Same as above: missing-scale failures in the MultiHeadAttention path should report which key is absent (q/k/v/out) and the layer name, rather than the generic "no scale param" message, to help users regenerate/fix their table quickly.
char key_q[256];
snprintf(key_q, 256, "%s_param_0", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_q = weight_int8scale_table.find(key_q);
if (iter_q == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
char key_k[256];
snprintf(key_k, 256, "%s_param_1", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_k = weight_int8scale_table.find(key_k);
if (iter_k == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
char key_v[256];
snprintf(key_v, 256, "%s_param_2", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_v = weight_int8scale_table.find(key_v);
if (iter_v == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
char key_out[256];
snprintf(key_out, 256, "%s_param_3", layers[i]->name.c_str());
std::map<std::string, ncnn::Mat>::iterator iter_out = weight_int8scale_table.find(key_out);
if (iter_out == weight_int8scale_table.end())
{
fprintf(stderr, "this layer need to be quantized, but no scale param!\n");
return -1;
}
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a11ec27a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Thanks for your contribution ! |
Description
This PR moves static weight scale generation for several non-convolution layers from ncnn2int8 to ncnn2table, following the same table-driven workflow already used by other quantized layers.
Changes
Test
using minimal RNN,LSTM,GRU,Eembed-Attn network to test:
Eembed-Attn
quantized param files:
precision analysis:
RNN
quantized param files:
precision analysis:
GRU
quantized param files:
precision analysis:
LSTM
quantized param files:
precision analysis: