The most important part of the library is a user-defined YAML config file. It has five sections of parameters: training, pruning, quantization, fine-tuning, and FITCompress (currently maintained by TensorFlow only). By default, the config contains the following parameters:
Training parameters
The following table outlines the primary parameters used to configure the training process:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `epochs` | int | 200 | Total number of training epochs. |
| `fine_tuning_epochs` | int | 0 | Additional epochs for fine-tuning. |
| `pretraining_epochs` | int | 50 | Pretraining / warm-up epochs. |
| `rewind` | str | "never" | Weight rewinding policy. |
| `rounds` | int | 1 | Number of prune–fine-tune cycles. |
| `save_weights_epoch` | int | -1 | Save checkpoint at this epoch (-1 disables). |
If you require additional parameters for the training or optimization loops, please define them directly in the config.yaml file.
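As an illustration of how user-supplied values might sit alongside the documented defaults, here is a minimal sketch. The helper `resolve_training_params` and the `learning_rate` key are hypothetical and are not part of PQuantML; only the default values are taken from the table above.

```python
from copy import deepcopy

# Defaults copied from the training-parameters table.
TRAINING_DEFAULTS = {
    "epochs": 200,
    "fine_tuning_epochs": 0,
    "pretraining_epochs": 50,
    "rewind": "never",
    "rounds": 1,
    "save_weights_epoch": -1,
}


def resolve_training_params(overrides):
    """Hypothetical helper: overlay user values on the documented defaults.

    Returns (params, extras). Keys that are not documented training fields
    are collected in `extras` so a custom training loop can still read them.
    """
    params = deepcopy(TRAINING_DEFAULTS)
    extras = {}
    for key, value in overrides.items():
        if key in params:
            params[key] = value
        else:
            extras[key] = value
    return params, extras


# `learning_rate` stands in for an additional, user-defined parameter.
params, extras = resolve_training_params(
    {"epochs": 100, "rounds": 3, "learning_rate": 1e-3}
)
```

Keeping the extra keys separate is only one possible design; it makes it easy to see which values come from the documented schema and which are specific to your own loop.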
Quantization parameters
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `default_data_keep_negatives` | bool | 0 | Default k value for data quantization (0 = clamp negatives, 1 = keep). |
| `default_data_integer_bits` | int | 0 | Default integer bitwidth i for data quantization. |
| `default_data_fractional_bits` | int | 0 | Default fractional bitwidth f for data quantization. |
| `default_weight_keep_negatives` | bool | 0 | Default k value for weight quantization (0 or 1). |
| `default_weight_integer_bits` | int | 0 | Default integer bitwidth i for weight quantization. |
| `default_weight_fractional_bits` | int | 0 | Default fractional bitwidth f for weight quantization. |
| `quantize_input` | bool | true | Whether inputs to layers are quantized by default. |
| `quantize_output` | bool | true | Whether outputs of layers are quantized by default. |
| `enable_quantization` | bool | true | Global switch to enable or disable quantization. |
| `hgq_gamma` | float | 0.0 | HGQ regularization coefficient for bitwidth stability. |
| `hgq_beta` | float | 0.0 | HGQ loss coefficient scaling EBOPs. |
| `layer_specific` | dict | {} | Dictionary for per-layer quantization overrides. |
| `use_hgq` | bool | false | Enable or disable High Granularity Quantization (HGQ). |
| `use_real_tanh` | bool | false | Use a real tanh instead of hard/approximate tanh. |
| `overflow_mode_data` | str | "SAT" | Overflow handling mode for input and output quantizers (SAT, SAT_SYM, WRAP, WRAP_SM). |
| `overflow_mode_parameters` | str | "SAT" | Overflow handling mode for weight and bias quantizers (SAT, SAT_SYM, WRAP, WRAP_SM). |
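To make the k, i, f and overflow fields more concrete, the following is a minimal sketch of a fixed-point quantizer. It assumes a common convention in which the grid step is `2**-f` and the representable range is `[-2**i, 2**i - 2**-f]` when k = 1 and `[0, 2**i - 2**-f]` when k = 0; round-to-nearest is also an assumption. `fixed_point_quantize` is a hypothetical helper, not PQuantML's quantizer, and only the SAT and WRAP modes are sketched.

```python
import math


def fixed_point_quantize(x, k, i, f, overflow="SAT"):
    """Quantize a float onto a fixed-point grid (illustrative only).

    k: 1 keeps negative values, 0 clamps the range at zero.
    i: integer bits, f: fractional bits (grid step is 2**-f).
    """
    step = 2.0 ** -f
    lo = -(2.0 ** i) if k else 0.0
    hi = 2.0 ** i - step
    # Round to the nearest grid point (assumed rounding mode).
    q = math.floor(x / step + 0.5) * step
    if overflow == "SAT":
        # Saturate: clamp out-of-range values to the nearest bound.
        return min(max(q, lo), hi)
    if overflow == "WRAP":
        # Wrap around modulo the width of the representable range.
        span = hi - lo + step
        return lo + (q - lo) % span
    raise ValueError(f"overflow mode not covered by this sketch: {overflow}")


in_range = fixed_point_quantize(1.3, k=1, i=2, f=2)
saturated = fixed_point_quantize(10.0, k=1, i=2, f=2)
clamped = fixed_point_quantize(-1.0, k=0, i=2, f=2)
wrapped = fixed_point_quantize(4.0, k=1, i=2, f=2, overflow="WRAP")
```

With i = 2 and f = 2 the grid step is 0.25, so 1.3 lands on 1.25, while 10.0 saturates at the upper bound 3.75 under SAT.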
PQuantML supports seven different pruning methods.
Method Overview
| Method | Model |
| --- | --- |
| `cs` | CSPruningModel |
| `dst` | DSTPruningModel |
| `pdp` | PDPPruningModel |
| `wanda` | WandaPruningModel |
| `autosparse` | AutoSparsePruningModel |
| `activation_pruning` | ActivationPruningModel |
| `mdmm` | MDMMPruningModel |
The following parameters are shared by all methods:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `disable_pruning_for_layers` | List[str] | [] | Layer names to exclude from pruning. |
| `enable_pruning` | bool | true | Master pruning on/off switch. |
| `threshold_decay` | float | 0.0 | Optional pruning threshold decay term. |
Layer names in the `disable_pruning_for_layers` field must match your framework's naming (e.g., Keras `layer.name`).
The parameters specific to each pruning method are listed below:
CS Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | cs | Selects this pruning schema. |
| `final_temp` | int | 200 | Target temperature at the end of the schedule. |
| `threshold_init` | int | 0 | Initial sparsification threshold. |
DST Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | dst | Selects this pruning schema. |
| `alpha` | float | 5.0e-06 | Mask dynamics update coefficient. |
| `max_pruning_pct` | float | 0.99 | Upper bound on total pruning ratio. |
| `threshold_init` | float | 0.0 | Initial threshold value. |
| `threshold_type` | str | "channelwise" | Thresholding granularity. |
PDP Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | pdp | Selects this pruning schema. |
| `epsilon` | float | 0.015 | Smoothing/regularization factor for gating. |
| `sparsity` | float | 0.8 | Target sparsity level (0–1). |
| `temperature` | float | 1.0e-05 | Annealing temperature. |
| `structured_pruning` | bool | false | Enable structured pruning. |
Wanda Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | wanda | Selects this pruning schema. |
| `M` | Optional[int] | null | Optional grouping constant. |
| `N` | Optional[int] | null | Optional grouping constant. |
| `sparsity` | float | 0.9 | Target sparsity level (0–1). |
| `t_delta` | int | 100 | Window size / steps for stats collection. |
| `t_start_collecting_batch` | int | 100 | Warm-up steps before collecting statistics. |
| `calculate_pruning_budget` | bool | true | Auto-compute pruning budget from data. |
Autosparse Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | autosparse | Selects this pruning schema. |
| `alpha` | float | 0.5 | Weight/penalty coefficient. |
| `alpha_reset_epoch` | int | 90 | Epoch at which alpha is reset/tuned. |
| `autotune_epochs` | int | 10 | Number of epochs in the tuning window. |
| `backward_sparsity` | bool | false | Apply sparsity in backward pass (if supported). |
| `threshold_init` | float | -5.0 | Initial threshold (often in logit space). |
| `threshold_type` | str | "channelwise" | Thresholding granularity. |
Activation Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | activation_pruning | Selects this pruning schema. |
| `threshold` | float | 0.3 | Activation magnitude cutoff. |
| `t_delta` | int | 50 | Steps used to aggregate statistics. |
| `t_start_collecting_batch` | int | 50 | Steps to skip before collecting statistics. |
MDMM Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | mdmm | Selects this pruning schema. |
| `constraint_type` | ConstraintType | "Equality" | Constraint form: equality / ≤ / ≥. |
| `target_value` | float | 0.0 | Target value for the chosen metric. |
| `metric_type` | MetricType | "UnstructuredSparsity" | Quantity the constraint acts on; see MDMM metric types below. |
| `target_sparsity` | float | 0.9 | Target sparsity when constraining sparsity. |
| `rf` | int | 1 | Regularization / frequency parameter. |
| `epsilon` | float | 1.0e-03 | Feasibility tolerance. |
| `scale` | float | 10.0 | Penalty scaling for constraint violation. |
| `damping` | float | 1.0 | Damping term for numerical stability. |
| `use_grad` | bool | false | Use gradient information during updates. |
| `l0_mode` | "coarse" \| "smooth" | "coarse" | L0 approximation mode. |
| `scale_mode` | "mean" \| "sum" | "mean" | Aggregation mode for penalties. |
MDMM metric types
The metric_type field selects which quantity the MDMM constraint drives. The first two are magnitude-based; the last two are hardware-aware and act on 4D convolution kernels.
| Metric | Description |
| --- | --- |
| `FPGAAwareSparsity` | Fraction of zero DSP/BRAM weight groups, modelling FPGA resource packing. |
| `PACAPatternSparsity` | Mean distance of each conv kernel to a small set of dominant binary patterns. |
FPGAAwareSparsity parameters (used only when metric_type: FPGAAwareSparsity):
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `precision` | int | 16 | Weight bit-width used to derive BRAM packing. |
| `target_resource` | "DSP" \| "BRAM" | "DSP" | Hardware resource whose group-sparsity is measured. |
| `bram_width` | int | 36 | BRAM word width; sets how many DSP groups pack into one BRAM (BRAM only). |
Weights are grouped into DSP blocks of size `rf`. For `target_resource: BRAM`, `c = bram_width // precision` (or `2 * bram_width // precision` when `bram_width` is not divisible by `precision`) consecutive DSP groups are packed into one BRAM block. The metric reports the fraction of such groups whose L2 norm is below `epsilon`.
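The grouping rule above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: `groups_per_bram` and `group_sparsity` are hypothetical names, the weights are taken as a flat list already in hardware order, and a trailing partial group is counted as a group.

```python
import math


def groups_per_bram(bram_width, precision):
    """Number of DSP groups packed into one BRAM block (c in the text)."""
    if bram_width % precision == 0:
        return bram_width // precision
    # Not divisible: use 2 * bram_width, as described in the text.
    return (2 * bram_width) // precision


def group_sparsity(weights, rf, epsilon, target_resource="DSP",
                   bram_width=36, precision=16):
    """Fraction of weight groups whose L2 norm is below epsilon."""
    size = rf
    if target_resource == "BRAM":
        # One BRAM block spans c consecutive DSP groups of size rf.
        size = rf * groups_per_bram(bram_width, precision)
    groups = [weights[j:j + size] for j in range(0, len(weights), size)]
    if not groups:
        return 0.0
    zero = sum(
        1 for g in groups if math.sqrt(sum(w * w for w in g)) < epsilon
    )
    return zero / len(groups)


weights = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1e-5, 0.0]
dsp_sparsity = group_sparsity(weights, rf=2, epsilon=1e-3)
bram_sparsity = group_sparsity(
    weights, rf=2, epsilon=1e-3, target_resource="BRAM", precision=18
)
```

With the default `bram_width` of 36 and `precision` of 16, 36 is not divisible by 16, so the fallback gives `c = 72 // 16 = 4`.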
PACAPatternSparsity parameters (used only when metric_type: PACAPatternSparsity):
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_patterns_to_keep` | int | 16 | Maximum number of dominant kernel patterns retained. |
| `beta` | float | 0.75 | Cumulative pattern-frequency coverage kept, in [0, 1]. |
| `distance_metric` | "hamming" \| "valued_hamming" \| "cosine" | "valued_hamming" | Distance from each kernel to its closest dominant pattern. |
`PACAPatternSparsity` always pairs with an equality constraint at target `0` (driving every kernel onto a dominant pattern); the config model sets `constraint_type` and `target_value` for you. During fine-tuning the kernels are projected onto their closest dominant pattern.
The hardware-aware metrics operate on 4D convolution weights; for non-convolutional layers `PACAPatternSparsity` is a no-op. Ready-made configs are available via `mdmm_fpga_config()` and `mdmm_paca_config()`.
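For intuition about the pattern-based metric, the sketch below shows one way dominant binary patterns could be selected and a mean Hamming distance computed. It is an interpretation of the parameter descriptions, not PQuantML's code: the function names are hypothetical, kernels are given as flattened lists, a pattern is taken to be the zero/non-zero mask of a kernel, and only the plain `hamming` distance is covered.

```python
from collections import Counter


def kernel_mask(kernel):
    """Binary pattern of a kernel: 1 where the weight is non-zero."""
    return tuple(int(w != 0) for w in kernel)


def dominant_patterns(kernels, num_patterns_to_keep=16, beta=0.75):
    """Keep the most frequent patterns until they cover a fraction `beta`
    of all kernels or `num_patterns_to_keep` patterns have been kept."""
    masks = [kernel_mask(k) for k in kernels]
    kept, covered = [], 0
    for pattern, count in Counter(masks).most_common():
        if len(kept) >= num_patterns_to_keep or covered / len(masks) >= beta:
            break
        kept.append(pattern)
        covered += count
    return kept


def mean_hamming_distance(kernels, patterns):
    """Mean Hamming distance from each kernel mask to its closest pattern."""
    total = 0
    for kernel in kernels:
        mask = kernel_mask(kernel)
        total += min(sum(a != b for a, b in zip(mask, p)) for p in patterns)
    return total / len(kernels)


# Four flattened 2x2 kernels; three share the same zero/non-zero pattern.
kernels = [
    [1.0, 0.0, 0.0, 2.0],
    [3.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.2],
    [1.0, 1.0, 0.0, 0.0],
]
patterns = dominant_patterns(kernels, num_patterns_to_keep=16, beta=0.75)
distance = mean_hamming_distance(kernels, patterns)
```

In this example a single pattern already covers 75% of the kernels, so only that pattern is kept; the one kernel that does not match it differs in two positions, which gives a mean distance of 0.5.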
Optionally, the FITCompress method is also available for PyTorch:
FITCompress method
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enable_fitcompress` | bool | false | Master switch that enables or disables FITCompress. |
| `optimize_quantization` | bool | true | Whether FITCompress searches over quantization bit-width candidates. |
| `quantization_schedule` | List[float] | [7., 4., 3., 2.] | Candidate bit-widths evaluated during quantization search. |
| `pruning_schedule` | dict | {start: 0, end: -3, steps: 40} | Logarithmic pruning curve (base 10) with defined start, end, and step count. |
| `compression_goal` | float | 0.10 | Target compression ratio for the search procedure. |
| `optimize_pruning` | bool | false | Whether FITCompress searches over pruning ratios. |
| `greedy_astar` | bool | true | Disable fallback in A* search: once a candidate is selected, all others discarded. |
| `approximate` | bool | true | Use Fisher Trace approximations to speed up FIT score estimation. |
| `f_lambda` | float | 1 | Multiplicative factor λ in the distance function (g + λf). |
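The `pruning_schedule` entry describes a base-10 logarithmic curve given by a start exponent, an end exponent, and a step count. A minimal sketch of how such a curve could be expanded into values is shown below. It assumes both endpoints are included and evenly spaced in exponent (as `numpy.logspace` would do); `log_pruning_schedule` is a hypothetical helper, and how FITCompress interprets each value is not specified here.

```python
def log_pruning_schedule(start=0.0, end=-3.0, steps=40):
    """Return `steps` values 10**e, with e evenly spaced from start to end."""
    if steps < 2:
        return [10.0 ** start]
    delta = (end - start) / (steps - 1)
    return [10.0 ** (start + j * delta) for j in range(steps)]


# Default schedule from the table: {start: 0, end: -3, steps: 40}.
schedule = log_pruning_schedule(start=0, end=-3, steps=40)
```

With the defaults the curve starts at 1.0 and decreases towards 0.001 over 40 points.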
Quantization layers in PQuantML
- PQConv*D: Convolutional layers.
- PQAvgPool*D: Average pooling layers.
- PQBatchNorm*D: BatchNorm layers.
- PQDense: Linear layer.
- PQActivation: Activation layers (ReLU, Tanh).

Currently, PQuantML supports two quantization modes: layer-wise fixed-point quantization, where each tensor uses a single bit-width configuration, and High-Granularity Quantization (HGQ).