@@ -92,28 +92,33 @@ the pipeline determines it automatically using binary search:
9292
9393.. list-table::
9494 :header-rows: 1
95- :widths: 30 20 20 30
95+ :widths: 25 20 15 20 20
9696
9797 * - Task Type
9898 - Search Range
9999 - Metric
100- - Tolerance
100+ - Default Tolerance
101+ - Config Key
101102 * - Classification
102103 - ``[4, 8]``
103104 - Accuracy
104- - 5% drop
105- * - Anomaly Detection
106- - ``[4, 8]``
107- - MSE
108- - 2x increase
109- * - Forecasting
110- - ``[4, 32]``
111- - SMAPE
112- - 2x increase
105+ - 5% drop (``0.05``)
106+ - ``autoquant_tolerance_classification``
113107 * - Regression
114108 - ``[4, 12]``
115109 - R²
116- - 5% drop
110+ - 5% drop (``0.05``)
111+ - ``autoquant_tolerance_regression``
112+ * - Forecasting
113+ - ``[4, 32]``
114+ - SMAPE
115+ - 3× float baseline (``2.0``)
116+ - ``autoquant_tolerance_forecasting``
117+ * - Anomaly Detection
118+ - ``[4, 8]``
119+ - MSE
120+ - 3× float baseline (``2.0``)
121+ - ``autoquant_tolerance_anomaly``
117122
118123At each candidate average bit width, a fast calibration pass (no full
119124QAT retraining) is run and the metric is checked against the tolerance
@@ -123,6 +128,59 @@ The binary search typically converges in two to three iterations.
123128If the Hessian estimation fails for any reason, the pipeline falls back
124129to standard uniform 8-bit QAT automatically.
125130
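The search described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes integer candidate bit widths, assumes that a higher bit width never performs worse than a lower one, and uses a hypothetical ``passes`` callback to stand in for the fast calibration pass plus tolerance check.

```python
def search_bit_width(lo, hi, passes):
    """Return the smallest bit width in [lo, hi] accepted by `passes`.

    `passes(bits)` is a hypothetical callback: it should run a fast
    calibration pass at `bits` and return True when the metric is
    within tolerance. Returns None if no candidate is accepted.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if passes(mid):
            best = mid      # accepted: try an even lower bit width
            hi = mid - 1
        else:
            lo = mid + 1    # rejected: a higher bit width is needed
    return best


# Example: pretend the model only meets tolerance at 6 bits or more.
print(search_bit_width(4, 8, lambda bits: bits >= 6))
```

For the ``[4, 8]`` range this loop evaluates at most three candidates, which is consistent with the two to three iterations mentioned above.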
131+ Tolerance Thresholds
132+ --------------------
133+
134+ The tolerance thresholds control how much metric degradation versus the
135+ float baseline is acceptable during the binary-search calibration. They
136+ are set in ``params.py`` and can be overridden per run in ``config.yaml``
137+ under the ``training`` section.
138+
139+ For accuracy and R², **higher values are better**, so the tolerance is a
140+ fraction representing the maximum allowed *drop* from the float baseline.
141+ The quantized metric must stay above ``float_metric × (1 − tolerance)``.
142+
143+ For SMAPE and MSE, **lower values are better**, so the tolerance is a
144+ value added to ``1.0`` to form a ceiling multiplier. The quantized metric
145+ must stay below ``float_metric × (1 + tolerance)``.
146+
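The two rules above can be expressed in a few lines of Python. This is a sketch under stated assumptions: the function names and task labels are hypothetical (they are not taken from ``params.py``), and a metric exactly equal to the threshold is treated as acceptable, which the text does not specify.

```python
# Hypothetical task labels; the pipeline's own identifiers may differ.
HIGHER_IS_BETTER = {"classification", "regression"}   # accuracy, R²
LOWER_IS_BETTER = {"forecasting", "anomaly"}          # SMAPE, MSE


def tolerance_threshold(task, float_metric, tolerance):
    """Return the metric value a quantized candidate is compared against."""
    if task in HIGHER_IS_BETTER:
        return float_metric * (1.0 - tolerance)   # floor
    if task in LOWER_IS_BETTER:
        return float_metric * (1.0 + tolerance)   # ceiling
    raise ValueError(f"unknown task type: {task!r}")


def within_tolerance(task, float_metric, quant_metric, tolerance):
    """True when the quantized metric is acceptable (boundary included)."""
    threshold = tolerance_threshold(task, float_metric, tolerance)
    if task in HIGHER_IS_BETTER:
        return quant_metric >= threshold
    return quant_metric <= threshold


# Float accuracy 0.90, tolerance 0.05: floor is 0.90 * 0.95.
print(tolerance_threshold("classification", 0.90, 0.05))
# Float SMAPE 10.0, tolerance 2.0: ceiling is 10.0 * 3.0.
print(within_tolerance("forecasting", 10.0, 29.0, 2.0))
```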
147+ **Classification: autoquant_tolerance_classification (default: 0.05)**
148+
149+ Accuracy is higher-is-better. ``0.05`` means the quantized model's
150+ accuracy may drop by at most **5%** relative to the float model. For
151+ example, if the float model achieves 90% accuracy, the threshold is
152+ ``90% × (1 − 0.05) = 85.5%``. Any candidate bit width that pushes
153+ accuracy below that threshold is rejected and the algorithm tries a
154+ higher bit width.
155+
156+ **Regression: autoquant_tolerance_regression (default: 0.05)**
157+
158+ R² is higher-is-better. ``0.05`` means the quantized model's R² may
159+ drop by at most **5%** relative to the float baseline. For example, a
160+ float R² of ``0.95`` sets a threshold of ``0.95 × (1 − 0.05) = 0.9025``.
161+ Regression outputs are continuous and tend to be sensitive to
162+ quantization error, so a tight tolerance helps ensure that the selected
163+ bit width preserves model quality.
164+
165+ **Forecasting: autoquant_tolerance_forecasting (default: 2.0)**
166+
167+ SMAPE is lower-is-better. The tolerance is added to ``1.0`` to form a
168+ ceiling multiplier: ``threshold = float_SMAPE × (1 + 2.0) = 3 × float_SMAPE``.
169+ So ``2.0`` means the quantized model's SMAPE may be **at most 3× the
170+ float baseline** before the bit width is rejected. Typical SMAPE values
171+ vary widely between datasets, so a ceiling relative to the float baseline
172+ is more meaningful than a fixed absolute threshold. The float SMAPE is
173+ recorded at the end of float training and used as the reference.
174+
175+ **Anomaly Detection: autoquant_tolerance_anomaly (default: 2.0)**
176+
177+ MSE is lower-is-better. The same formula applies:
178+ ``threshold = float_MSE × (1 + 2.0) = 3 × float_MSE``. So ``2.0``
179+ means the quantized model's reconstruction MSE may be **at most 3× the
180+ float baseline** before the bit width is rejected. The absolute MSE
181+ value is dataset-dependent, which is why a multiplier is used rather
182+ than a fixed threshold.
183+
126184Configuration
127185-------------
128186
@@ -158,6 +216,42 @@ to ``False``:
158216 parameter is ignored. Bit widths are assigned per-layer by the greedy
159217 algorithm, not set uniformly.
160218
219+ **Overriding tolerance thresholds**
220+
221+ The tolerance thresholds have defaults set in ``params.py`` but can be
222+ overridden in ``config.yaml`` under the ``training`` key. Only the keys
223+ relevant to your task type need to be specified:
224+
225+ .. code-block:: yaml
226+
227+    training:
228+      model_name: 'REGR_13k'
229+      training_epochs: 100
230+      quantization: 2
231+      auto_quantization: True
232+      # Tighten regression tolerance: allow at most 2% R² drop instead of 5%
233+      autoquant_tolerance_regression: 0.02
234+
235+ .. code-block:: yaml
236+
237+    training:
238+      model_name: 'AD_17K'
239+      training_epochs: 100
240+      quantization: 2
241+      auto_quantization: True
242+      # Relax anomaly tolerance: allow MSE up to 4× the float baseline (1 + 3.0)
243+      autoquant_tolerance_anomaly: 3.0
244+
245+ All four keys and their defaults are:
246+
247+ .. code-block:: yaml
248+
249+    training:
250+      autoquant_tolerance_classification: 0.05  # higher-is-better: max 5% accuracy drop vs float
251+      autoquant_tolerance_regression: 0.05      # higher-is-better: max 5% R² drop vs float
252+      autoquant_tolerance_forecasting: 2.0      # lower-is-better: SMAPE must stay below 3× float (1 + 2.0)
253+      autoquant_tolerance_anomaly: 2.0          # lower-is-better: MSE must stay below 3× float (1 + 2.0)
254+
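How the overrides combine with the defaults can be pictured with a short Python sketch. It is illustrative only: ``resolve_tolerances`` is a hypothetical helper, the defaults dictionary simply mirrors the values listed above, and the ``training`` section is assumed to have already been loaded from ``config.yaml`` into a plain dictionary.

```python
# Defaults as documented above (assumed to mirror params.py).
DEFAULT_TOLERANCES = {
    "autoquant_tolerance_classification": 0.05,
    "autoquant_tolerance_regression": 0.05,
    "autoquant_tolerance_forecasting": 2.0,
    "autoquant_tolerance_anomaly": 2.0,
}


def resolve_tolerances(training_cfg):
    """Overlay any tolerance keys found in the training config on the defaults."""
    resolved = dict(DEFAULT_TOLERANCES)
    for key in DEFAULT_TOLERANCES:
        if key in training_cfg:
            resolved[key] = float(training_cfg[key])
    return resolved


# Only the regression key is overridden; the other three keep their defaults.
training_cfg = {"model_name": "REGR_13k", "autoquant_tolerance_regression": 0.02}
print(resolve_tolerances(training_cfg))
```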
161255 Task-Specific Behaviour
162256-----------------------
163257