End-to-end post-training int8 quantization demo. The PyTorch script trains
a small XOR MLP in float32, observes per-tensor activation ranges over a
9 x 9 calibration grid, then emits a weights.hpp with int8 weights,
int32 biases, and per-tensor (scale, zero_point) metadata. The C++ side
(xor_quant.cpp) reconstructs the per-layer Requantizer and the
qsigmoid LUT host-side and runs a pure-integer forward pass over the
four XOR inputs.
This is the affine quantization counterpart to examples/pytorch/xor/,
which uses TinyMind's existing single-ValueType Q16.16 fixed-point path.
```
float32 PyTorch         int8 TinyMind
---------------         -------------
nn.Linear(2, 4)   -->   QDense<int8,int8,int32,int8>
nn.ReLU           -->   qreluBuffer (or fold into requant)
nn.Linear(4, 1)   -->   QDense<int8,int8,int32,int8>
nn.Sigmoid        -->   buildQSigmoidLUT + qApplyLUT
```
Calibration follows the TFLite / CMSIS-NN convention:

- Symmetric per-tensor weights (`zero_point = 0`, `qmax = 127`).
- Asymmetric activations (`zero_point` in `[-128, 127]`).
- int32 biases held at `bias_scale = input_scale * weight_scale`, so they add directly into the int32 accumulator.
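These conventions boil down to a few lines of arithmetic. The sketch below is illustrative; the helper names are not part of TinyMind's or the script's API:

```python
def quantize_weights_symmetric(w):
    """Symmetric per-tensor int8: zero_point = 0, max |w| mapped to qmax = 127."""
    scale = max(abs(v) for v in w) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in w]
    return q, scale

def quantize_activations_asymmetric(x_min, x_max):
    """Asymmetric int8: the observed [min, max] range is mapped onto [-128, 127]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # keep real 0 exactly representable
    scale = (x_max - x_min) / 255.0
    zero_point = round(-128 - x_min / scale)         # q such that dequant(q) == 0.0
    return scale, max(-128, min(127, zero_point))
```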
The Requantizer (multiplier, shift) pair is rebuilt at startup from the
float scales via tinymind::buildRequantizer. For an MCU deployment,
that step happens once on the host and the resulting integer constants
are embedded directly; the inference binary then needs neither
<cmath> nor float math.
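A minimal sketch of that host-side decomposition (TFLite-style; tinymind::buildRequantizer's exact output format may differ):

```python
import math

def build_requantizer(input_scale, weight_scale, output_scale):
    """Split the float rescale ratio into a Q0.31 integer multiplier and a
    right shift, so inference needs only an integer multiply + shift."""
    real = input_scale * weight_scale / output_scale
    mantissa, exponent = math.frexp(real)     # real = mantissa * 2**exponent, mantissa in [0.5, 1)
    multiplier = round(mantissa * (1 << 31))  # Q0.31 fixed point
    if multiplier == (1 << 31):               # rounding can overflow the mantissa
        multiplier >>= 1
        exponent += 1
    return multiplier, 31 - exponent          # total right-shift amount

def requantize(acc, multiplier, shift, zero_point):
    """Rescale an int32 accumulator to int8 with round-half-up, then re-center."""
    q = (acc * multiplier + (1 << (shift - 1))) >> shift
    return max(-128, min(127, q + zero_point))
```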
```sh
# (Optional) regenerate weights.hpp from PyTorch training. Requires torch.
python3 xor_quant.py

# Build and run the int8 inference pipeline.
make clean && make && make run
```

The committed weights.hpp ships an exact textbook 2-4-1 ReLU+Sigmoid
solver so the example runs without requiring PyTorch. Re-running
xor_quant.py overwrites the file with whatever the SGD trainer
converges to.
Expected output:

```
int8 XOR accuracy: 4/4
```
weights.hpp is a generated C++ header; see the file itself for the
field-by-field layout. Key constants:
| Constant | Meaning |
|---|---|
| `kInputScale`, `kInputZeroPoint` | Asymmetric calibration of the input tensor |
| `kHiddenScale`, `kHiddenZeroPoint` | Asymmetric calibration of the post-ReLU hidden tensor |
| `kLogitScale`, `kLogitZeroPoint` | Asymmetric calibration of the pre-sigmoid logit tensor |
| `kSigmoidOutScale`, `kSigmoidOutZeroPoint` | Output grid of the sigmoid LUT (1/256, -128 covers (0, 1)) |
| `kFc1WeightScale`, `kFc2WeightScale` | Symmetric weight scales |
| `kFc1Weights`, `kFc1Biases` | Row-major int8 weights and int32 biases for `nn.Linear(2, hidden)` |
| `kFc2Weights`, `kFc2Biases` | Row-major int8 weights and int32 biases for `nn.Linear(hidden, 1)` |
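The sigmoid LUT construction can be sketched host-side as follows. This mirrors the described behavior of buildQSigmoidLUT on the (1/256, -128) output grid, but the function name and exact rounding here are illustrative:

```python
import math

def build_qsigmoid_lut(logit_scale, logit_zero_point):
    """256-entry LUT mapping each possible int8 logit to an int8 sigmoid value
    on the (scale=1/256, zero_point=-128) grid, i.e. real = (q + 128) / 256."""
    lut = []
    for q in range(-128, 128):
        x = (q - logit_zero_point) * logit_scale  # dequantize the logit
        y = 1.0 / (1.0 + math.exp(-x))            # float sigmoid, host-side only
        qy = round(y * 256.0) - 128               # quantize onto the output grid
        lut.append(max(-128, min(127, qy)))
    return lut
```

At inference time the device indexes this table with `logit + 128`; no float math remains.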
The Python script uses a 9 x 9 grid over [0, 1]^2 plus the four exact
XOR corners. For larger / non-toy tasks replace the grid with a
representative sample of the deployment distribution; the rest of the
emit logic stays the same.
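The observation pass itself needs nothing framework-specific. A sketch of the grid and a running min/max tracker, with illustrative names rather than the script's actual helpers:

```python
import math

def calibration_inputs(grid_n=9):
    """The 9 x 9 grid over [0, 1]^2 plus the four exact XOR corners."""
    axis = [i / (grid_n - 1) for i in range(grid_n)]
    grid = [(a, b) for a in axis for b in axis]
    return grid + [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

class RangeObserver:
    """Running per-tensor min/max over all calibration batches; the final
    [lo, hi] feeds the asymmetric (scale, zero_point) computation."""
    def __init__(self):
        self.lo, self.hi = math.inf, -math.inf
    def observe(self, values):
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))
```

Swapping in a real dataset only changes `calibration_inputs`; the observer and the emit logic are unchanged.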
To go fully float-free on device: drop the qcalibration.hpp include and
replace the buildRequantizer and buildQSigmoidLUT calls with the integer
constants those functions emit on the host. The inference path then
compiles in the
TINYMIND_ENABLE_QUANTIZATION=1, TINYMIND_ENABLE_FLOAT=0, TINYMIND_ENABLE_STD=0 corner exercised by unit_test/embedded.