Commit 2651f23

author
Abhijeet
committed
adding proper thresholds and using them from config.yaml file
1 parent 77afaa7 commit 2651f23

10 files changed

Lines changed: 338 additions & 66 deletions

436 Bytes
Binary file not shown.
14.6 KB
Binary file not shown.

docs/_build/html/_sources/features/auto_quantization.rst.txt

Lines changed: 106 additions & 12 deletions
@@ -92,28 +92,33 @@ the pipeline determines it automatically using binary search:
9292

9393
.. list-table::
9494
:header-rows: 1
95-
:widths: 30 20 20 30
95+
:widths: 25 20 15 20 20
9696

9797
* - Task Type
9898
- Search Range
9999
- Metric
100-
- Tolerance
100+
- Default Tolerance
101+
- Config Key
101102
* - Classification
102103
- ``[4, 8]``
103104
- Accuracy
104-
- 5% drop
105-
* - Anomaly Detection
106-
- ``[4, 8]``
107-
- MSE
108-
- 2x increase
109-
* - Forecasting
110-
- ``[4, 32]``
111-
- SMAPE
112-
- 2x increase
105+
- 5% drop (``0.05``)
106+
- ``autoquant_tolerance_classification``
113107
* - Regression
114108
- ``[4, 12]``
115109
- R²
116-
- 5% drop
110+
- 5% drop (``0.05``)
111+
- ``autoquant_tolerance_regression``
112+
* - Forecasting
113+
- ``[4, 32]``
114+
- SMAPE
115+
- 3× float baseline (``2.0``)
116+
- ``autoquant_tolerance_forecasting``
117+
* - Anomaly Detection
118+
- ``[4, 8]``
119+
- MSE
120+
- 3× float baseline (``2.0``)
121+
- ``autoquant_tolerance_anomaly``
117122

118123
At each candidate average bit width, a fast calibration pass (no full
119124
QAT retraining) is run and the metric is checked against the tolerance
@@ -123,6 +128,59 @@ The binary search typically converges in two to three iterations.
123128
If the Hessian estimation fails for any reason, the pipeline falls back
124129
to standard uniform 8-bit QAT automatically.
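
The binary search over the average bit width can be sketched roughly as
follows. This is a minimal illustration, not the pipeline's actual code. It
assumes integer candidate bit widths, that the smallest passing bit width is
the one wanted, and that a candidate which passes the tolerance check would
also pass at every higher bit width. The names ``search_min_bit_width`` and
``passes_tolerance`` are hypothetical; ``passes_tolerance`` stands in for the
fast calibration pass plus the tolerance check.

.. code-block:: python

   def search_min_bit_width(low, high, passes_tolerance):
       """Return the smallest bit width in [low, high] that passes, or None.

       passes_tolerance is a hypothetical callback: it takes a candidate
       average bit width and returns True if the calibrated metric is
       within tolerance of the float baseline.
       """
       best = None
       while low <= high:
           mid = (low + high) // 2
           if passes_tolerance(mid):
               best = mid        # candidate accepted, try fewer bits
               high = mid - 1
           else:
               low = mid + 1     # candidate rejected, try more bits
       return best


   if __name__ == "__main__":
       # Toy check: pretend every width of 6 bits or more passes.
       print(search_min_bit_width(4, 8, lambda bits: bits >= 6))

With a search range of ``[4, 8]`` there are five candidates, so this loop
runs at most three times.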
125130

131+
Tolerance Thresholds
132+
--------------------
133+
134+
The tolerance thresholds control how much metric degradation versus the
135+
float baseline is acceptable during the binary-search calibration. They
136+
are set in ``params.py`` and can be overridden per run in ``config.yaml``
137+
under the ``training`` section.
138+
139+
For accuracy and R², **higher values are better**, so the tolerance is a
140+
fraction representing the maximum allowed *drop* from the float baseline.
141+
The quantized metric must stay above ``float_metric × (1 − tolerance)``.
142+
143+
For SMAPE and MSE, **lower values are better**, so the tolerance is a
144+
value added to ``1.0`` to form a ceiling multiplier. The quantized metric
145+
must stay below ``float_metric × (1 + tolerance)``.
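
The two rules above can be sketched in Python as follows. This is an
illustrative sketch under stated assumptions, not the pipeline's
implementation: the function names ``tolerance_threshold`` and
``is_acceptable`` and the short task labels are hypothetical, and treating a
metric exactly equal to the threshold as acceptable is an assumption.

.. code-block:: python

   HIGHER_IS_BETTER = {"classification", "regression"}  # accuracy, R²
   LOWER_IS_BETTER = {"forecasting", "anomaly"}         # SMAPE, MSE


   def tolerance_threshold(task, float_metric, tolerance):
       """Return the value a quantized candidate's metric is compared to."""
       if task in HIGHER_IS_BETTER:
           # Floor: the float metric reduced by the allowed relative drop.
           return float_metric * (1.0 - tolerance)
       if task in LOWER_IS_BETTER:
           # Ceiling: the float metric scaled by (1 + tolerance).
           return float_metric * (1.0 + tolerance)
       raise ValueError(f"unknown task type: {task!r}")


   def is_acceptable(task, float_metric, quant_metric, tolerance):
       """Return True if the quantized metric is within tolerance."""
       threshold = tolerance_threshold(task, float_metric, tolerance)
       if task in HIGHER_IS_BETTER:
           return quant_metric >= threshold  # inclusive bound is an assumption
       return quant_metric <= threshold


   if __name__ == "__main__":
       print(tolerance_threshold("classification", 0.90, 0.05))
       print(is_acceptable("forecasting", 12.0, 30.0, 2.0))

Keeping the direction of the metric in one place means the same tolerance
value cannot be applied with the wrong sign for a given task type.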
146+
147+
**Classification — autoquant_tolerance_classification (default: 0.05)**
148+
149+
Accuracy is higher-is-better. ``0.05`` means the quantized model's
150+
accuracy may drop by at most **5%** relative to the float model. For
151+
example, if the float model achieves 90% accuracy, the threshold is
152+
``90% × (1 − 0.05) = 85.5%``. Any candidate bit width that pushes
153+
accuracy below that threshold is rejected and the algorithm tries a
154+
higher bit width.
155+
156+
**Regression — autoquant_tolerance_regression (default: 0.05)**
157+
158+
R² is higher-is-better. ``0.05`` means the quantized model's R² may
159+
drop by at most **5%** relative to the float baseline. For example, a
160+
float R² of ``0.95`` sets a threshold of ``0.95 × (1 − 0.05) = 0.9025``.
161+
Regression metrics tend to be sensitive to quantization, so keeping
161+
this tolerance tight helps ensure the selected bit width preserves model
163+
quality.
164+
165+
**Forecasting — autoquant_tolerance_forecasting (default: 2.0)**
166+
167+
SMAPE is lower-is-better. The tolerance is added to ``1.0`` and the sum is used as a multiplier
168+
to form a ceiling: ``threshold = float_SMAPE × (1 + 2.0) = 3 × float_SMAPE``.
169+
So ``2.0`` means the quantized model's SMAPE may be **at most 3× the
170+
float baseline** before the bit width is rejected. SMAPE values depend
171+
strongly on the dataset, so a ceiling relative to the float baseline is
172+
more meaningful than a fixed cutoff. The float SMAPE is recorded at the end of float training
173+
and used as the reference.
174+
175+
**Anomaly Detection — autoquant_tolerance_anomaly (default: 2.0)**
176+
177+
MSE is lower-is-better. The same formula applies:
178+
``threshold = float_MSE × (1 + 2.0) = 3 × float_MSE``. So ``2.0``
179+
means the quantized model's reconstruction MSE may be **at most 3× the
180+
float baseline** before the bit width is rejected. The absolute MSE
181+
value is dataset-dependent, which is why a multiplier is used rather
182+
than a fixed threshold.
183+
126184
Configuration
127185
-------------
128186

@@ -158,6 +216,42 @@ to ``False``:
158216
parameter is ignored. Bit widths are assigned per-layer by the greedy
159217
algorithm, not set uniformly.
160218

219+
**Overriding tolerance thresholds**
220+
221+
The tolerance thresholds have defaults set in ``params.py`` but can be
222+
overridden in ``config.yaml`` under the ``training`` key. Only the keys
223+
relevant to your task type need to be specified:
224+
225+
.. code-block:: yaml
226+
227+
training:
228+
model_name: 'REGR_13k'
229+
training_epochs: 100
230+
quantization: 2
231+
auto_quantization: True
232+
# Tighten regression tolerance: allow at most 2% R² drop instead of 5%
233+
autoquant_tolerance_regression: 0.02
234+
235+
.. code-block:: yaml
236+
237+
training:
238+
model_name: 'AD_17K'
239+
training_epochs: 100
240+
quantization: 2
241+
auto_quantization: True
242+
# Relax anomaly tolerance: allow MSE up to 4× the float baseline (1 + 3.0)
243+
autoquant_tolerance_anomaly: 3.0
244+
245+
All four keys and their defaults are:
246+
247+
.. code-block:: yaml
248+
249+
training:
250+
autoquant_tolerance_classification: 0.05 # higher-is-better: max 5% accuracy drop vs float
251+
autoquant_tolerance_regression: 0.05 # higher-is-better: max 5% R² drop vs float
252+
autoquant_tolerance_forecasting: 2.0 # lower-is-better: SMAPE must stay below 3× float (1 + 2.0)
253+
autoquant_tolerance_anomaly: 2.0 # lower-is-better: MSE must stay below 3× float (1 + 2.0)
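
One way such overrides could be merged with the defaults is sketched below.
This is a hypothetical illustration: how ``params.py`` actually stores and
applies the defaults is not shown here, and the names ``DEFAULT_TOLERANCES``
and ``resolve_tolerances`` are assumptions. ``training_cfg`` stands for the
already-parsed ``training`` mapping from ``config.yaml``.

.. code-block:: python

   # Defaults as listed in the table of config keys above.
   DEFAULT_TOLERANCES = {
       "autoquant_tolerance_classification": 0.05,
       "autoquant_tolerance_regression": 0.05,
       "autoquant_tolerance_forecasting": 2.0,
       "autoquant_tolerance_anomaly": 2.0,
   }


   def resolve_tolerances(training_cfg):
       """Return the defaults with any overrides from training_cfg applied."""
       resolved = dict(DEFAULT_TOLERANCES)  # copy so defaults stay untouched
       for key in DEFAULT_TOLERANCES:
           if key in training_cfg:
               resolved[key] = float(training_cfg[key])
       return resolved


   if __name__ == "__main__":
       cfg = {"model_name": "REGR_13k", "autoquant_tolerance_regression": 0.02}
       print(resolve_tolerances(cfg))

Only the four tolerance keys are read, so unrelated ``training`` entries such
as ``model_name`` are ignored by this helper.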
254+
161255
Task-Specific Behaviour
162256
-----------------------
163257
