## Describe your changes
Add documentation for AimetQuantization pass
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
## (Optional) Issue link
Signed-off-by: Michael Tuttle <mtuttle@qti.qualcomm.com>
Please refer to the [Phi3.5 example](https://github.com/microsoft/olive-recipes/tree/main/microsoft-Phi-3.5-mini-instruct/NvTensorRtRtx) for usability and setup details.

## Quantize with AI Model Efficiency Toolkit

Olive supports quantizing models with Qualcomm's [AI Model Efficiency Toolkit](https://github.com/quic/aimet) (AIMET).

AIMET is a software toolkit for quantizing trained ML models to optimize deployment on edge devices such as mobile phones or laptops. AIMET employs post-training and fine-tuning techniques to minimize accuracy loss during quantization.

Olive consolidates AIMET quantization into a single pass called `AimetQuantization`, which supports LPBQ, SeqMSE, and AdaRound. Multiple techniques can be applied in a single pass by listing them in the `techniques` array; a combined example is shown after the basic configuration below. If no techniques are specified, AIMET applies basic static quantization to the model using the provided data.

| Technique | Description |
|-----------|-------------|
| **LPBQ** | An alternative to blockwise quantization that allows backends to leverage existing per-channel quantization kernels while significantly improving encoding granularity. |
| **SeqMSE** | Optimizes the weight encodings of each layer of a model to minimize the difference between the layer's original and quantized outputs. |
| **AdaRound** | Tunes the rounding direction for quantized model weights to minimize the local quantization error at each layer output. |

### Example Configuration

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config"
}
```
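
Techniques are combined by adding multiple entries to the `techniques` array. The following sketch pairs LPBQ with SeqMSE, reusing the option values from the individual examples below; whether a particular combination is sensible depends on the model and target:

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64},
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```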

#### LPBQ

Configurations:

- `block_size`: Number of input channels to group in each block (default: `64`).
- `op_types`: List of operator types for which to enable LPBQ (default: `["Gemm", "MatMul", "Conv"]`).
- `nodes_to_exclude`: List of node names to exclude from LPBQ weight quantization (default: `None`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64}
    ]
}
```
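
The remaining LPBQ options go in the same technique entry. A sketch with illustrative values (the node name in `nodes_to_exclude` is hypothetical, not taken from a real model):

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {
            "name": "lpbq",
            "block_size": 64,
            "op_types": ["Gemm", "MatMul"],
            "nodes_to_exclude": ["/lm_head/MatMul"]
        }
    ]
}
```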

#### SeqMSE

Configurations:

- `data_config`: Data config to use for SeqMSE optimization. Defaults to the calibration data config if not specified.
- `num_candidates`: Number of encoding candidates to sweep for each weight (default: `20`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "precision": "int4",
    "techniques": [
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```
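
Because SeqMSE accepts its own `data_config`, the optimization can run on a dataset other than the calibration set. A minimal sketch, assuming the technique-level `data_config` sits alongside the other technique options and that a `seqmse_data_config` has been defined elsewhere in the workflow:

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "seqmse", "data_config": "seqmse_data_config"}
    ]
}
```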

#### AdaRound

Configurations:

- `num_iterations`: Number of optimization steps to take for each layer (default: `10000`). The recommended value is 10K for weight bitwidths >= 8 bits and 15K for weight bitwidths < 8 bits.
- `nodes_to_exclude`: List of node names to exclude from AdaRound optimization (default: `None`).
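
Following the pattern of the examples above, an AdaRound configuration might look like the following sketch; the lowercase technique name `adaround` is assumed by analogy with `lpbq` and `seqmse`:

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "adaround", "num_iterations": 10000}
    ]
}
```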