unstable_source/openvino_quantizer.rst
9 additions & 9 deletions
@@ -118,29 +118,29 @@ After we capture the FX Module to be quantized, we will import the OpenVINOQuant
 .. code-block:: python

-    from nncf.experimental.torch.fx import OpenVINOQuantizer
+    from executorch.backends.openvino.quantizer import OpenVINOQuantizer
+    from executorch.backends.openvino.quantizer import QuantizationMode

     quantizer = OpenVINOQuantizer()

 ``OpenVINOQuantizer`` has several optional parameters that allow tuning the quantization process to get a more accurate model.
 Below is the list of essential parameters and their description:

-* ``preset`` - defines quantization scheme for the model. Two types of presets are available:
+* ``mode`` - defines quantization scheme for the model. Multiple modes are supported:

-    * ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations
+    * ``INT8_SYM`` (default) - defines symmetric quantization of weights and activations. This is the best for performance

-    * ``MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.
+    * ``INT8_MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This mode is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.
+    * ``INT8_TRANSFORMER`` - special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.).

-* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.). None is default, i.e. no specific scheme is defined.
+    * ``INT8WO_SYM``, ``INT8WO_ASYM``, ``INT4WO_SYM``, ``INT4WO_ASYM`` - these are weights-only quantization schemes. They apply vanilla min-max quantization to model weights to INT8/INT4 with Symmetric and Asymmetric schemes.

 * ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. For example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:
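
Separately from ``ignored_scope``, the difference between the symmetric and asymmetric schemes named in the ``mode`` list can be illustrated with a small, self-contained sketch of min-max quantization. This is a simplified textbook formulation written in plain Python for illustration only. The helper names (``quantize_symmetric``, ``quantize_asymmetric``, ``dequantize``) are hypothetical, and the exact ranges and rounding used by ``OpenVINOQuantizer`` may differ.

```python
# Illustrative sketch only: textbook min-max quantization, NOT the
# OpenVINO/NNCF implementation. All function names here are hypothetical.

def quantize_symmetric(weights, bits=8):
    """Symmetric min-max: the range is centred on zero, zero-point is 0."""
    qmax = 2 ** (bits - 1) - 1              # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale, 0


def quantize_asymmetric(weights, bits=8):
    """Asymmetric min-max: [min, max] is mapped onto [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1                    # 255 for INT8, 15 for INT4
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)         # integer that represents 0.0
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate floating-point values."""
    return [(v - zero_point) * scale for v in q]


# A weight range skewed towards positive values.
weights = [-0.1, 0.0, 0.4, 1.0]

q_sym, s_sym, zp_sym = quantize_symmetric(weights, bits=8)
q_asym, s_asym, zp_asym = quantize_asymmetric(weights, bits=8)

# The asymmetric scale is smaller because no codes are spent on the
# unused part of the negative range.
print("symmetric :", q_sym, "scale =", s_sym)
print("asymmetric:", q_asym, "scale =", s_asym, "zero_point =", zp_asym)
```

For a skewed range like the one above, the asymmetric variant uses a finer step size, which is one reason asymmetric activation quantization can help with non-ReLU activations. The symmetric variant needs no zero-point, which tends to make it cheaper at inference time.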