Commit ba70b4a

Update openvino_quantizer.rst
1 parent 1655d0a commit ba70b4a

1 file changed

Lines changed: 9 additions & 9 deletions

unstable_source/openvino_quantizer.rst

@@ -118,29 +118,29 @@ After we capture the FX Module to be quantized, we will import the OpenVINOQuant

 .. code-block:: python

-    from nncf.experimental.torch.fx import OpenVINOQuantizer
+    from executorch.backends.openvino.quantizer import OpenVINOQuantizer
+    from executorch.backends.openvino.quantizer import QuantizationMode

     quantizer = OpenVINOQuantizer()

 ``OpenVINOQuantizer`` has several optional parameters that allow tuning the quantization process to get a more accurate model.
 Below is the list of essential parameters and their description:


-* ``preset`` - defines quantization scheme for the model. Two types of presets are available:
+* ``mode`` - defines quantization scheme for the model. Multiple modes are supported:

-    * ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations
+    * ``INT8_SYM`` (default) - defines symmetric quantization of weights and activations. This is the best for performance

-    * ``MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.
+    * ``INT8_MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.

-    .. code-block:: python
-
-        OpenVINOQuantizer(preset=nncf.QuantizationPreset.MIXED)
+    * ``INT8_TRANSFORMER`` - special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.). None is default, i.e. no specific scheme is defined.

-* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.). None is default, i.e. no specific scheme is defined.
+    * ``INT8WO_SYM``, ``INT8WO_ASYM``, ``INT4WO_SYM``, ``INT4WO_ASYM`` - these are weights-only quantization schemes. They apply vanilla min-max quantization to model weights to INT8/INT4 with Symmetric and Asymmetric schemes.

     .. code-block:: python

-        OpenVINOQuantizer(model_type=nncf.ModelType.Transformer)
+        OpenVINOQuantizer(mode=QuantizationMode.INT8_SYM)
+


 * ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. For example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:
