Usage Reference

Config file

The most important part of the library is a user-defined config yaml file. It has five separate sections: training, pruning, quantization, finetuning, and fitcompress section, currently maintained by TensorFlow only, parameters. By default, the parameters in the config are the following:

Training parameters

The following table outlines the primary parameters used to configure the training process:

Field	Type	Default	Description
`epochs`	int	`200`	Total number of training epochs.
`fine_tuning_epochs`	int	`0`	Additional epochs for fine-tuning.
`pretraining_epochs`	int	`50`	Pretraining / warm-up epochs.
`rewind`	str	`"never"`	Weight rewinding policy.
`rounds`	int	`1`	Number of prune–fine-tune cycles.
`save_weights_epoch`	int	`-1`	Save checkpoint at this epoch (`-1` disables).

If you require additional parameters for the training or optimization loops, please define them directly in the config.yaml file.

Quantization parameters

Field	Type	Default	Description
`default_data_keep_negatives`	bool	`0`	Default k value for data quantization (0 = clamp negatives, 1 = keep).
`default_data_integer_bits`	int	`0`	Default integer bitwidth i for data quantization.
`default_data_fractional_bits`	int	`0`	Default fractional bitwidth f for data quantization.
`default_weight_keep_negatives`	bool	`0`	Default k value for weight quantization (0 or 1).
`default_weight_integer_bits`	int	`0`	Default integer bitwidth i for weight quantization.
`default_weight_fractional_bits`	int	`0`	Default fractional bitwidth f for weight quantization.
`quantize_input`	bool	`true`	Whether inputs to layers are quantized by default.
`quantize_output`	bool	`true`	Whether outputs of layers are quantized by default.
`enable_quantization`	bool	`true`	Global switch to enable or disable quantization.
`hgq_gamma`	float	`0.0`	HGQ regularization coefficient for bitwidth stability.
`hgq_beta`	float	`0.0`	HGQ loss coefficient scaling EBOPs.
`layer_specific`	dict	`{}`	Dictionary for per-layer quantization overrides.
`use_hgq`	bool	`false`	Enable or disable High Granularity Quantization (HGQ).
`use_real_tanh`	bool	`false`	Use a real `tanh` instead of hard/approximate `tanh`.
`overflow_mode_data`	str	`"SAT"`	Overflow handling mode for input and output quantizers(`SAT`, `SAT_SYM`, `WRAP`, `WRAP_SM`).
`overflow_mode_parameters`	str	`"SAT"`	Overflow handling mode for weight and biases quantizers(`SAT`, `SAT_SYM`, `WRAP`, `WRAP_SM`).
`round_mode`	str	`"RND"`	Rounding mode (`TRN`, `RND`, `RND_CONV`, `RND_ZERO`, etc.).
`use_relu_multiplier`	bool	`true`	Enable a learned bit-shift multiplier inside ReLU layers.

Fine-tuning parameters

Field	Type	Default	Description
`experiment_name`	str	`"experiment_1"`	Name of the study.
`model_name`	str	`"resnet18"`	Model architecture name.
`sampler`	str	`GridSampler`	Sampler selection for the search space.
`num_trials`	int	`0`	Number of trials.
`hyperparameter_search`	HyperparameterSearch	`{}`	Ranges for non-grid samplers.

Samplers

Field	Type	Default	Description
`type`	str	`"TPESampler"`	Sampler class name (e.g., `TPESampler`, `GridSampler`).
`params`	Dict[str, Any]	`{}`	Sampler-specific kwargs (e.g., `seed`, `search_space`).

More about samplers can be found in {optuna documentation}

HyperparameterSearch

Field	Type	Default	Description
`numerical`	Dict[str, List[Union[int, float]]]	`{}`	Numeric ranges `[low, high, step]`.
`categorical`	Optional[Dict[str, List[str]]]	`{}`	Categorical choices.

Pruning methods

PQuantML supports seven different pruning methods.

Method Overview

Method	Model
`cs`	`CSPruningModel`
`dst`	`DSTPruningModel`
`pdp`	`PDPPruningModel`
`wanda`	`WandaPruningModel`
`autosparse`	`AutoSparsePruningModel`
`activation_pruning`	`ActivationPruningModel`
`mdmm`	`MDMMPruningModel`

There are the parameters shared by all methods:

Field	Type	Default	Description
`disable_pruning_for_layers`	List[str]	`[]`	Layer names to exclude from pruning.
`enable_pruning`	bool	`true`	Master pruning on/off switch.
`threshold_decay`	float	`0.0`	Optional pruning threshold decay term.

Layer names in `disable_pruning_for_layers` field  must match your framework’s naming (e.g., Keras `layer.name`).

There are more details about every pruning method:

CS Pruning

Field	Type	Default	Description
`pruning_method`	str	`cs`	Selects this pruning schema.
`final_temp`	int	`200`	Target temperature at the end of the schedule.
`threshold_init`	int	`0`	Initial sparsification threshold.

DST Pruning

Field	Type	Default	Description
`pruning_method`	str	`dst`	Selects this pruning schema.
`alpha`	float	`5.0e-06`	Mask dynamics update coefficient.
`max_pruning_pct`	float	`0.99`	Upper bound on total pruning ratio.
`threshold_init`	float	`0.0`	Initial threshold value.
`threshold_type`	str	`"channelwise"`	Thresholding granularity.

PDP Pruning

Field	Type	Default	Description
`pruning_method`	str	`pdp`	Selects this pruning schema.
`epsilon`	float	`0.015`	Smoothing/regularization factor for gating.
`sparsity`	float	`0.8`	Target sparsity level (0–1).
`temperature`	float	`1.0e-05`	Annealing temperature.
`structured_pruning`	bool	`false`	Enable structured pruning.

Wanda Pruning

Field	Type	Default	Description
`pruning_method`	str	`wanda`	Selects this pruning schema.
`M`	Optional[int]	`null`	Optional grouping constant.
`N`	Optional[int]	`null`	Optional grouping constant.
`sparsity`	float	`0.9`	Target sparsity level (0–1).
`t_delta`	int	`100`	Window size / steps for stats collection.
`t_start_collecting_batch`	int	`100`	Warm-up steps before collecting statistics.
`calculate_pruning_budget`	bool	`true`	Auto-compute pruning budget from data.

Autosparse Pruning

Field	Type	Default	Description
`pruning_method`	str	`autosparse`	Selects this pruning schema.
`alpha`	float	`0.5`	Weight/penalty coefficient.
`alpha_reset_epoch`	int	`90`	Epoch at which `alpha` is reset/tuned.
`autotune_epochs`	int	`10`	Number of epochs in the tuning window.
`backward_sparsity`	bool	`false`	Apply sparsity in backward pass (if supported).
`threshold_init`	float	`-5.0`	Initial threshold (often in logit space).
`threshold_type`	str	`"channelwise"`	Thresholding granularity.

Activation Pruning

Field	Type	Default	Description
`pruning_method`	str	`activation_pruning`	Selects this pruning schema.
`threshold`	float	`0.3`	Activation magnitude cutoff.
`t_delta`	int	`50`	Steps used to aggregate statistics.
`t_start_collecting_batch`	int	`50`	Steps to skip before collecting statistics.

MDMM Pruning

Field	Type	Default	Description
`pruning_method`	str	`mdmm`	Selects this pruning schema.
`constraint_type`	ConstraintType	`"Equality"`	Constraint form: equality / ≤ / ≥.
`target_value`	float	`0.0`	Target value for the chosen metric.
`metric_type`	MetricType	`"UnstructuredSparsity"`	Quantity the constraint acts on — see MDMM metric types below.
`target_sparsity`	float	`0.9`	Target sparsity when constraining sparsity.
`rf`	int	`1`	Regularization / frequency parameter.
`epsilon`	float	`1.0e-03`	Feasibility tolerance.
`scale`	float	`10.0`	Penalty scaling for constraint violation.
`damping`	float	`1.0`	Damping term for numerical stability.
`use_grad`	bool	`false`	Use gradient information during updates.
`l0_mode`	`"coarse"` \| `"smooth"`	`"coarse"`	L0 approximation mode.
`scale_mode`	`"mean"` \| `"sum"`	`"mean"`	Aggregation mode for penalties.

MDMM metric types

The metric_type field selects which quantity the MDMM constraint drives. The first two are magnitude-based; the last two are hardware-aware and act on 4D convolution kernels.

metric_type	Constrains
`UnstructuredSparsity`	Element-wise (L0/L1) sparsity toward `target_sparsity`.
`StructuredSparsity`	Fraction of all-zero weight groups of size `rf`.
`FPGAAwareSparsity`	Fraction of zero DSP/BRAM weight groups, modelling FPGA resource packing.
`PACAPatternSparsity`	Mean distance of each conv kernel to a small set of dominant binary patterns.

FPGAAwareSparsity parameters (used only when metric_type: FPGAAwareSparsity):

Field	Type	Default	Description
`precision`	int	`16`	Weight bit-width used to derive BRAM packing.
`target_resource`	`"DSP"` \| `"BRAM"`	`"DSP"`	Hardware resource whose group-sparsity is measured.
`bram_width`	int	`36`	BRAM word width; sets how many DSP groups pack into one BRAM (`BRAM` only).

Weights are grouped into DSP blocks of size rf; for target_resource: BRAM, c = bram_width // precision (or 2*bram_width // precision when not divisible) consecutive DSP groups pack into one BRAM block. The metric reports the fraction of such groups whose L2 norm is below epsilon.

PACAPatternSparsity parameters (used only when metric_type: PACAPatternSparsity):

Field	Type	Default	Description
`num_patterns_to_keep`	int	`16`	Maximum number of dominant kernel patterns retained.
`beta`	float	`0.75`	Cumulative pattern-frequency coverage kept, in `[0, 1]`.
`distance_metric`	`"hamming"` \| `"valued_hamming"` \| `"cosine"`	`"valued_hamming"`	Distance from each kernel to its closest dominant pattern.

`PACAPatternSparsity` always pairs with an equality constraint at target `0` (driving every kernel onto a dominant pattern); the config model sets `constraint_type` and `target_value` for you. During fine-tuning the kernels are projected onto their closest dominant pattern.

The hardware-aware metrics operate on 4D convolution weights; for non-convolutional layers `PACAPatternSparsity` is a no-op. Ready-made configs are available via `mdmm_fpga_config()` and `mdmm_paca_config()`.

Optionally, there is also FITCompress method implemented for PyTorch:

FitCompress method

Field	Type	Default	Description
`enable_fitcompress`	bool	`false`	Master switch that enables or disables FITCompress.
`optimize_quantization`	bool	`true`	Whether FITCompress searches over quantization bit-width candidates.
`quantization_schedule`	List[float]	`[7., 4., 3., 2.]`	Candidate bit-widths evaluated during quantization search.
`pruning_schedule`	dict	`{start: 0, end: -3, steps: 40}`	Logarithmic pruning curve (base 10) with defined start, end, and step count.
`compression_goal`	float	`0.10`	Target compression ratio for the search procedure.
`optimize_pruning`	bool	`false`	Whether FITCompress searches over pruning ratios.
`greedy_astar`	bool	`true`	Disable fallback in A* search: once a candidate is selected, all others discarded.
`approximate`	bool	`true`	Use Fisher Trace approximations to speed up FIT score estimation.
`f_lambda`	float	`1`	Multiplicative factor λ in the distance function (g + λf).

Quantization layers in PQuantML

PQConv*D: Convolutional layers.
PQAvgPool*D: Average pooling layers.
PQBatchNorm*D: BatchNorm layers.
PQDense: Linear layer.
PQActivation: Activation layers (ReLU, Tanh)

Currently, PQuantML supports two quantization modes: layer-wise fixed-point quantization, where each tensor uses a single
bit-width configuration, and High-Granularity Quantization (HGQ).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Usage Reference

Config file

Training parameters

Quantization parameters

Fine-tuning parameters

Samplers

HyperparameterSearch

Pruning methods

Method Overview

CS Pruning

DST Pruning

PDP Pruning

Wanda Pruning

Autosparse Pruning

Activation Pruning

MDMM Pruning

MDMM metric types

FitCompress method

Quantization layers in PQuantML

Uh oh!

FilesExpand file tree

reference.md

Latest commit

History

reference.md

File metadata and controls

Usage Reference

Config file

Training parameters

Quantization parameters

Fine-tuning parameters

Samplers

HyperparameterSearch

Pruning methods

Method Overview

CS Pruning

DST Pruning

PDP Pruning

Wanda Pruning

Autosparse Pruning

Activation Pruning

MDMM Pruning

MDMM metric types

FitCompress method

Quantization layers in PQuantML