Merged
8 changes: 4 additions & 4 deletions docs/how-to-use-and-FAQ/quantized-int8-inference.md
@@ -89,16 +89,16 @@ filelist_in2.txt
 ```
 **Here shape is WHC, because the order of the arguments to `ncnn::Mat`.**

-### 3. Quantize model
+ncnn2table can generate static weight scales without a calibration dataset for RNN,GRU,LSTM,MultiHeadAttention and Embed layers

 ```shell
-./ncnn2int8 mobilenet-opt.param mobilenet-opt.bin mobilenet-int8.param mobilenet-int8.bin mobilenet.table
+./ncnn2table rnn.param rnn.bin rnn.table method=kl
 ```

-If you don’t need static quantization, ncnn supports RNN/LSTM/GRU dynamic quantization. In this case, you can omit the table file.
+### 3. Quantize model

 ```shell
-./ncnn2int8 rnn-model.param rnn-model.bin rnn-model-int8.param rnn-model-int8.bin
+./ncnn2int8 mobilenet-opt.param mobilenet-opt.bin mobilenet-int8.param mobilenet-int8.bin mobilenet.table
 ```

 ## use ncnn int8 inference
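A minimal dry-run sketch of how the two commands in this doc change might be chained for a recurrent model. The argument order is copied from the `ncnn2table` and `ncnn2int8` invocations in the diff. The helper name `quantize_cmds` and the `rnn*` file names are hypothetical. Passing the generated table to `ncnn2int8` in the same position as `mobilenet.table` is an assumption, not something the diff states.

```shell
#!/bin/sh
# Dry-run sketch: prints the two commands instead of running them.
# quantize_cmds and the derived file names are hypothetical; only the
# argument order is taken from the commands shown in the diff.
quantize_cmds() {
    model=$1
    # Step 1: write static weight scales to a table file (no calibration set).
    echo "./ncnn2table ${model}.param ${model}.bin ${model}.table method=kl"
    # Step 2: quantize using that table (assumed to be accepted in this position).
    echo "./ncnn2int8 ${model}.param ${model}.bin ${model}-int8.param ${model}-int8.bin ${model}.table"
}

quantize_cmds rnn
```

To run the commands rather than print them, the output could be piped into `sh` from the directory that contains both tools.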
1 change: 1 addition & 0 deletions tools/modelwriter.h
@@ -2057,6 +2057,7 @@ int ModelWriter::save(const char* parampath, const char* binpath)
 fprintf_param_value(" 4=%d", vdim)
 fprintf_param_value(" 5=%d", attn_mask)
 fprintf_param_value(" 6=%e", scale)
+fprintf_param_value(" 7=%d", kv_cache)
 fprintf_param_value(" 18=%d", int8_scale_term)

 fwrite_weight_tag_data(op->q_weight_data, bp);