
Commit 4021ed1

Add AimetQuantization pass documentation (microsoft#2193)
## Describe your changes

Add documentation for AimetQuantization pass

## Checklist before requesting a review

- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link

Signed-off-by: Michael Tuttle <mtuttle@qti.qualcomm.com>
1 parent f9537b1 commit 4021ed1

2 files changed

Lines changed: 91 additions & 0 deletions

File tree

docs/source/features/quantization.md

Lines changed: 84 additions & 0 deletions
@@ -198,3 +198,87 @@ Olive consolidates the NVIDIA TensorRT Model Optimizer-Windows quantization into
```

Please refer to [Phi3.5 example](https://github.com/microsoft/olive-recipes/tree/main/microsoft-Phi-3.5-mini-instruct/NvTensorRtRtx) for usability and setup details.

## Quantize with AI Model Efficiency Toolkit

Olive supports quantizing models with Qualcomm's [AI Model Efficiency Toolkit](https://github.com/quic/aimet) (AIMET).

AIMET is a software toolkit for quantizing trained ML models to optimize deployment on edge devices such as mobile phones and laptops. AIMET employs post-training and fine-tuning techniques to minimize accuracy loss during quantization.

Olive consolidates AIMET quantization into a single pass called `AimetQuantization`, which supports LPBQ, SeqMSE, and AdaRound. Multiple techniques can be applied in a single pass by listing them in the `techniques` array. If no techniques are specified, AIMET applies basic static quantization to the model using the provided data.

| Technique    | Description |
|--------------|-------------|
| **LPBQ**     | An alternative to blockwise quantization that allows backends to leverage existing per-channel quantization kernels while significantly improving encoding granularity. |
| **SeqMSE**   | Optimizes the weight encodings of each layer of a model to minimize the difference between the layer's original and quantized outputs. |
| **AdaRound** | Tunes the rounding direction for quantized model weights to minimize the local quantization error at each layer output. |

### Example Configuration

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config"
}
```
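
Since multiple techniques can be applied in one pass by listing them in the `techniques` array, a combined configuration might look like the sketch below. The specific pairing of LPBQ and SeqMSE is only illustrative; which technique combinations are supported should be checked against the AimetQuantization pass reference.

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64},
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```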
#### LPBQ

Configurations:

- `block_size`: Number of input channels to group in each block (default: `64`).
- `op_types`: List of operator types for which to enable LPBQ (default: `["Gemm", "MatMul", "Conv"]`).
- `nodes_to_exclude`: List of node names to exclude from LPBQ weight quantization (default: `None`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "lpbq", "block_size": 64}
    ]
}
```
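
To make the table's description of LPBQ concrete, here is a minimal pure-Python sketch of the underlying idea, with hypothetical helper names and a simplified scale search; it is not AIMET's implementation:

```python
# LPBQ idea (simplified): express each per-block scale as one shared
# per-channel float scale times a small integer multiplier, so a backend
# with only per-channel kernels still gets blockwise granularity.

def blockwise_scales(weights, block_size):
    """Max-abs scale per block for symmetric int4 weights ([-8, 7])."""
    return [
        max(abs(w) for w in weights[i:i + block_size]) / 7.0
        for i in range(0, len(weights), block_size)
    ]

def lpbq_factorize(block_scales, scale_bits=4):
    """Factor block scales into per_channel_scale * integer multipliers."""
    levels = 2 ** scale_bits - 1  # e.g. 15 for 4-bit scale multipliers
    per_channel = max(block_scales) / levels
    int_scales = [max(1, round(s / per_channel)) for s in block_scales]
    return per_channel, int_scales

# One output channel with 8 weights, grouped into blocks of 4
channel = [0.02, -0.5, 0.1, 0.3, 0.01, 0.04, -0.06, 0.02]
scales = blockwise_scales(channel, block_size=4)
per_channel, int_scales = lpbq_factorize(scales)
# Effective per-block scales per_channel * m approximate the originals
approx = [per_channel * m for m in int_scales]
```

Only one float scale per channel needs to be stored alongside the small integer multipliers, which is how per-channel kernels can be reused.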
#### SeqMSE

Configurations:

- `data_config`: Data config to use for SeqMSE optimization. Defaults to the calibration set if not specified.
- `num_candidates`: Number of encoding candidates to sweep for each weight (default: `20`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "precision": "int4",
    "techniques": [
        {"name": "seqmse", "num_candidates": 20}
    ]
}
```
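
The candidate sweep that `num_candidates` controls can be illustrated with a small pure-Python sketch (hypothetical helper names, symmetric int4, a single linear layer; not AIMET's implementation):

```python
# SeqMSE idea (simplified): for one layer, sweep scale candidates and keep
# the one whose quantized weights best reproduce the layer's original output.

def quantize(weights, scale, qmin=-8, qmax=7):  # symmetric int4
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]

def layer_out(weights, inputs):
    return [sum(w * x for w, x in zip(weights, xs)) for xs in inputs]

def seqmse_pick_scale(weights, inputs, num_candidates=20):
    max_scale = max(abs(w) for w in weights) / 7.0
    ref = layer_out(weights, inputs)
    best_scale, best_err = max_scale, float("inf")
    for i in range(1, num_candidates + 1):
        scale = max_scale * i / num_candidates  # sweep toward finer scales
        out = layer_out(quantize(weights, scale), inputs)
        err = sum((a - b) ** 2 for a, b in zip(ref, out))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

weights = [0.1, -0.7, 0.25, 0.05]
inputs = [[1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.1]]
best = seqmse_pick_scale(weights, inputs, num_candidates=20)
```

More candidates means a finer sweep at the cost of more forward evaluations per layer, which is the trade-off `num_candidates` exposes.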
#### AdaRound

Configurations:

- `num_iterations`: Number of optimization steps to take for each layer (default: `10000`). The recommended value is 10K for weight bitwidths >= 8 bits and 15K for weight bitwidths < 8 bits.
- `nodes_to_exclude`: List of node names to exclude from AdaRound optimization (default: `None`).

```json
{
    "type": "AimetQuantization",
    "data_config": "calib_data_config",
    "techniques": [
        {"name": "adaround", "num_iterations": 10000, "nodes_to_exclude": ["/lm_head/MatMul"]}
    ]
}
```
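
The rounding-direction search AdaRound performs can be caricatured in a few lines of pure Python. This is an exhaustive search over a tiny layer with hypothetical helper names; the real pass learns these choices over `num_iterations` gradient steps and this is not AIMET's code:

```python
import itertools
import math

# AdaRound idea (simplified): instead of round-to-nearest, choose floor or
# ceil per weight so the layer's output error on calibration data is minimal.

def layer_out(weights, inputs):
    return [sum(w * x for w, x in zip(weights, xs)) for xs in inputs]

def adaround_exhaustive(weights, inputs, scale):
    ref = layer_out(weights, inputs)
    best, best_err = None, float("inf")
    # Brute force only works for tiny layers; AdaRound optimizes a soft
    # rounding variable per weight instead.
    for choice in itertools.product((math.floor, math.ceil), repeat=len(weights)):
        qw = [f(w / scale) * scale for f, w in zip(choice, weights)]
        err = sum((a - b) ** 2 for a, b in zip(ref, layer_out(qw, inputs)))
        if err < best_err:
            best, best_err = qw, err
    return best, best_err

weights = [0.24, -0.31, 0.57]
inputs = [[1.0, 2.0, -1.0], [0.5, -0.5, 1.5]]
scale = 0.2
best, best_err = adaround_exhaustive(weights, inputs, scale)
```

Because round-to-nearest is one of the candidates searched, the chosen rounding can never be worse than naive rounding on the calibration data.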
Please refer to [AimetQuantization](aimet_quantization) for more details about the pass and its config parameters.

docs/source/reference/pass.rst

Lines changed: 7 additions & 0 deletions
@@ -194,6 +194,13 @@ ModelBuilder
------------
.. autoconfigclass:: olive.passes.ModelBuilder

.. _aimet_quantization:

AimetQuantization
-----------------

.. autoconfigclass:: olive.passes.AimetQuantization

Pytorch
=================================
