
Commit 4021ed1

Add AimetQuantization pass documentation (microsoft#2193)
## Describe your changes

Add documentation for AimetQuantization pass

## Checklist before requesting a review

- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link

Signed-off-by: Michael Tuttle <mtuttle@qti.qualcomm.com>
1 parent f9537b1 commit 4021ed1

2 files changed

Lines changed: 91 additions & 0 deletions

File tree

docs/source/features/quantization.md

Lines changed: 84 additions & 0 deletions
@@ -198,3 +198,87 @@ Olive consolidates the NVIDIA TensorRT Model Optimizer-Windows quantization into
```

Please refer to [Phi3.5 example](https://github.com/microsoft/olive-recipes/tree/main/microsoft-Phi-3.5-mini-instruct/NvTensorRtRtx) for usability and setup details.

## Quantize with AI Model Efficiency Toolkit

Olive supports quantizing models with Qualcomm's [AI Model Efficiency Toolkit](https://github.com/quic/aimet) (AIMET).

AIMET is a software toolkit for quantizing trained ML models to optimize deployment on edge devices such as mobile phones and laptops. AIMET employs post-training and fine-tuning techniques to minimize accuracy loss during quantization.

Olive consolidates AIMET quantization into a single pass called `AimetQuantization`, which supports LPBQ, SeqMSE, and AdaRound. Multiple techniques can be applied in a single pass by listing them in the `techniques` array. If no techniques are specified, AIMET applies basic static quantization to the model using the provided data.

| Technique    | Description |
|--------------|-------------|
| **LPBQ**     | An alternative to blockwise quantization that allows backends to leverage existing per-channel quantization kernels while significantly improving encoding granularity. |
| **SeqMSE**   | Optimizes the weight encodings of each layer of a model to minimize the difference between the layer's original and quantized outputs. |
| **AdaRound** | Tunes the rounding direction for quantized model weights to minimize the local quantization error at each layer output. |

### Example Configuration

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config"
}
```
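
Since multiple techniques can be applied in one pass by listing them in the `techniques` array, a combined configuration might look like the sketch below. The specific pairing of LPBQ and SeqMSE is only illustrative; which technique combinations are supported should be checked against the AimetQuantization pass reference.

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64},
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```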
#### LPBQ

Configurations:

- `block_size`: Number of input channels to group in each block (default: `64`).
- `op_types`: List of operator types for which to enable LPBQ (default: `["Gemm", "MatMul", "Conv"]`).
- `nodes_to_exclude`: List of node names to exclude from LPBQ weight quantization (default: `None`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64}
    ]
}
```
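
To make the table's description of LPBQ concrete, here is a minimal pure-Python sketch of the underlying idea, with hypothetical helper names and a simplified scale search; it is not AIMET's implementation:

```python
# LPBQ idea (simplified): express each per-block scale as one shared
# per-channel float scale times a small integer multiplier, so a backend
# with only per-channel kernels still gets blockwise granularity.

def blockwise_scales(weights, block_size):
    """Max-abs scale per block for symmetric int4 weights ([-8, 7])."""
    return [
        max(abs(w) for w in weights[i:i + block_size]) / 7.0
        for i in range(0, len(weights), block_size)
    ]

def lpbq_factorize(block_scales, scale_bits=4):
    """Factor block scales into per_channel_scale * integer multipliers."""
    levels = 2 ** scale_bits - 1  # e.g. 15 for 4-bit scale multipliers
    per_channel = max(block_scales) / levels
    int_scales = [max(1, round(s / per_channel)) for s in block_scales]
    return per_channel, int_scales

# One output channel with 8 weights, grouped into blocks of 4
channel = [0.02, -0.5, 0.1, 0.3, 0.01, 0.04, -0.06, 0.02]
scales = blockwise_scales(channel, block_size=4)
per_channel, int_scales = lpbq_factorize(scales)
# Effective per-block scales per_channel * m approximate the originals
approx = [per_channel * m for m in int_scales]
```

Only one float scale per channel needs to be stored alongside the small integer multipliers, which is how per-channel kernels can be reused.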
#### SeqMSE

Configurations:

- `data_config`: Data config to use for SeqMSE optimization. Defaults to the calibration set if not specified.
- `num_candidates`: Number of encoding candidates to sweep for each weight (default: `20`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "precision": "int4",
    "techniques": [
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```
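
The candidate sweep that `num_candidates` controls can be illustrated with a small pure-Python sketch (hypothetical helper names, symmetric int4, a single linear layer; not AIMET's implementation):

```python
# SeqMSE idea (simplified): for one layer, sweep scale candidates and keep
# the one whose quantized weights best reproduce the layer's original output.

def quantize(weights, scale, qmin=-8, qmax=7):  # symmetric int4
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]

def layer_out(weights, inputs):
    return [sum(w * x for w, x in zip(weights, xs)) for xs in inputs]

def seqmse_pick_scale(weights, inputs, num_candidates=20):
    max_scale = max(abs(w) for w in weights) / 7.0
    ref = layer_out(weights, inputs)
    best_scale, best_err = max_scale, float("inf")
    for i in range(1, num_candidates + 1):
        scale = max_scale * i / num_candidates  # sweep toward finer scales
        out = layer_out(quantize(weights, scale), inputs)
        err = sum((a - b) ** 2 for a, b in zip(ref, out))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

weights = [0.1, -0.7, 0.25, 0.05]
inputs = [[1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.1]]
best = seqmse_pick_scale(weights, inputs, num_candidates=20)
```

More candidates means a finer sweep at the cost of more forward evaluations per layer, which is the trade-off `num_candidates` exposes.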
#### AdaRound

Configurations:

- `num_iterations`: Number of optimization steps to take for each layer (default: `10000`). The recommended value is 10K for weight bitwidths >= 8 bits and 15K for weight bitwidths < 8 bits.
- `nodes_to_exclude`: List of node names to exclude from AdaRound optimization (default: `None`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "adaround", "num_iterations": 10000, "nodes_to_exclude": ["/lm_head/MatMul"]}
    ]
}
```
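
The rounding-direction search AdaRound performs can be caricatured in a few lines of pure Python. This is an exhaustive search over a tiny layer with hypothetical helper names; the real pass learns these choices over `num_iterations` gradient steps and this is not AIMET's code:

```python
import itertools
import math

# AdaRound idea (simplified): instead of round-to-nearest, choose floor or
# ceil per weight so the layer's output error on calibration data is minimal.

def layer_out(weights, inputs):
    return [sum(w * x for w, x in zip(weights, xs)) for xs in inputs]

def adaround_exhaustive(weights, inputs, scale):
    ref = layer_out(weights, inputs)
    best, best_err = None, float("inf")
    # Brute force only works for tiny layers; AdaRound optimizes a soft
    # rounding variable per weight instead.
    for choice in itertools.product((math.floor, math.ceil), repeat=len(weights)):
        qw = [f(w / scale) * scale for f, w in zip(choice, weights)]
        err = sum((a - b) ** 2 for a, b in zip(ref, layer_out(qw, inputs)))
        if err < best_err:
            best, best_err = qw, err
    return best, best_err

weights = [0.24, -0.31, 0.57]
inputs = [[1.0, 2.0, -1.0], [0.5, -0.5, 1.5]]
scale = 0.2
best, best_err = adaround_exhaustive(weights, inputs, scale)
```

Because round-to-nearest is one of the candidates searched, the chosen rounding can never be worse than naive rounding on the calibration data.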
Please refer to [AimetQuantization](aimet_quantization) for more details about the pass and its config parameters.

docs/source/reference/pass.rst

Lines changed: 7 additions & 0 deletions
@@ -194,6 +194,13 @@ ModelBuilder
------------
.. autoconfigclass:: olive.passes.ModelBuilder

.. _aimet_quantization:

AimetQuantization
-----------------

.. autoconfigclass:: olive.passes.AimetQuantization

Pytorch
=================================
