diff --git a/Deployment/Quant_guide.md b/Deployment/Quant_guide.md
index e76a7566..7442e724 100644
--- a/Deployment/Quant_guide.md
+++ b/Deployment/Quant_guide.md
@@ -1,4 +1,4 @@
-Deep learning models are typically trained with floating point data but they can quantized into integers during inference without any loss of performance (i.e. accuracy). Quantizing models includes quantizing both the weights and activation data (or layer input/outputs). In this work, we quantize the floating point weights/activation data to [Qm.n format](https://en.wikipedia.org/wiki/Q_(number_format)), where m,n are fixed within a layer but can vary across different network layers.
+Deep learning models are typically trained with floating point data but they can be quantized into integers during inference without any loss of performance (i.e. accuracy). Quantizing models includes quantizing both the weights and activation data (or layer input/outputs). In this work, we quantize the floating point weights/activation data to [Qm.n format](https://en.wikipedia.org/wiki/Q_(number_format)), where m,n are fixed within a layer but can vary across different network layers.
 
 ## Quantize weights
 Quantizing weights is fairly simple, as the weights are fixed after the training and we know their min/max range. Using these ranges, the weights are quantized or discretized to 256 levels. Here is the code snippet for quantizing the weights and biases to 8-bit integers.
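
The hunk ends at the sentence that introduces the weight-quantization snippet, so the snippet itself is not shown here. As a minimal sketch of the Qm.n scheme described above (not the file's actual code), the NumPy function below picks the integer bits m from the weights' max absolute value, assigns the remaining 7 magnitude bits to the fractional part n, and rounds to int8; the helper name `quantize_to_q8` is an assumption for illustration.

```python
import numpy as np

def quantize_to_q8(data):
    # Hypothetical helper (not the repository's snippet): quantize a float
    # array to 8-bit Qm.n fixed point based on its min/max range.
    max_abs = np.max(np.abs(data))
    # Integer bits m: smallest count that covers the range; the fractional
    # bits n take whatever is left of the 7 magnitude bits after the sign bit.
    m = max(int(np.ceil(np.log2(max_abs))), 0) if max_abs > 0 else 0
    n = 7 - m
    # Scale by 2^n, round to the nearest level, and saturate to int8,
    # which discretizes the weights to at most 256 levels.
    q = np.clip(np.round(data * (2 ** n)), -128, 127).astype(np.int8)
    return q, n

# Example: weights in (-1, 1) land in Q0.7, i.e. n = 7 fractional bits.
weights = np.array([-0.52, 0.013, 0.89, -0.004], dtype=np.float32)
q_weights, frac_bits = quantize_to_q8(weights)
print(q_weights, frac_bits)  # e.g. [-67 2 114 -1] 7
```

At inference time the stored integer is interpreted as the value divided by 2^n, so recording n per layer is enough to recover the approximate floating-point weights; biases can be handled with the same scheme.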