**Note:** This is a hand-crafted MNIST classifier (a proof of concept), not a production-trained model. This tiny MLP recognizes digits 0, 1, 4, and 7 using manually designed feature detectors.
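A manually designed feature detector can be pictured as a fixed weight pattern dotted with pixel intensities. The sketch below is purely illustrative; the weights, patch size, and threshold idea are made up for this example, not taken from the tutorial's model:

```python
# Illustrative only: a hand-set "vertical stroke" detector for a 3x3 patch.
# The weights favor a bright center column, as you might see in a '1'.
DETECTOR = [
    -1.0, 1.0, -1.0,
    -1.0, 1.0, -1.0,
    -1.0, 1.0, -1.0,
]

def stroke_score(patch):
    """Dot product of a flattened 3x3 patch with the fixed weights."""
    return sum(w * p for w, p in zip(DETECTOR, patch))

# A patch with a bright center column scores high...
vertical = [0, 1, 0,
            0, 1, 0,
            0, 1, 0]
# ...while a uniformly bright patch cancels out.
flat = [1] * 9

print(stroke_score(vertical))  # 3.0
print(stroke_score(flat))      # -3.0
```

A full hand-crafted classifier combines several such detectors and picks the digit whose detectors fire most strongly.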
### INT8 quantized model (CMSIS-NN accelerated)
- Use the [CMSIS-NN export script](https://github.com/pytorch/executorch/blob/main/examples/raspberry_pi/pico2/export_mlp_mnist_cmsis.py)
This uses the `CortexMQuantizer` to produce INT8 quantized ops that map to CMSIS-NN kernels on Cortex-M33. The model I/O stays float — quantize and dequantize nodes are inserted inside the graph.
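Conceptually, the inserted quantize/dequantize nodes implement plain affine INT8 quantization. The sketch below is a schematic illustration, not the `CortexMQuantizer` implementation; the scale and zero-point values are arbitrary assumptions:

```python
# Schematic affine INT8 quantization: float in, float out,
# with int8 values only inside the graph.
def quantize(x, scale, zero_point):
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))          # clamp to int8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = 0.05, 0                        # assumed example values
x = 1.234                                  # float model input
q = quantize(x, scale, zp)                 # int8 value seen by the kernels
y = dequantize(q, scale, zp)               # float model output
print(q, y)
```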
## Step 2: Build Firmware for Pico2
### FP32 build
```bash
# Generate model (Creates balanced_tiny_mlp_mnist.pte)
./examples/raspberry_pi/pico2/build_firmware_pico.sh --model=balanced_tiny_mlp_mnist.pte # This creates executorch_pico.uf2, a firmware image for Pico2
```
| Flag | Description |
|------|-------------|
| `--model=FILE` | Specify model file to embed (relative to pico2/) |
| `--cmsis` | Build with CMSIS-NN INT8 kernels for Cortex-M33 acceleration |
| `--clean` | Clean build directories and exit; run separately before building if needed |
**Note:** The '[build_firmware_pico.sh](https://github.com/pytorch/executorch/blob/main/examples/raspberry_pi/pico2/build_firmware_pico.sh)' script converts the given model `.pte` into a hex array and generates C code for it via this helper [script](https://github.com/pytorch/executorch/blob/main/examples/raspberry_pi/pico2/pte_to_array.py). That C code is then compiled into the final `.uf2` binary, which is flashed to the Pico2.
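The `.pte`-to-hex-array step can be sketched in a few lines. This is an illustrative approximation, not the actual `pte_to_array.py`; the symbol names and formatting it emits are assumptions:

```python
# Sketch: turn a binary model file into a C array definition so the
# model can be compiled directly into the firmware image.
# The variable naming is an assumption, not pte_to_array.py's output.
def bytes_to_c_array(data: bytes, name: str = "model_pte") -> str:
    hex_bytes = ", ".join(f"0x{b:02x}" for b in data)
    return (
        f"const unsigned char {name}[] = {{{hex_bytes}}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )

print(bytes_to_c_array(b"\x01\xff"))
```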
The Pico2 uses an RP2350 SoC with a Cortex-M33 core. The CMSIS-NN library provides optimized INT8 kernels that leverage the Cortex-M33's DSP instructions for faster inference compared to FP32 portable ops.
### How it works
1. `export_mlp_mnist_cmsis.py` uses `CortexMQuantizer` to quantize the model to INT8
2. The model I/O remains float — quantize/dequantize nodes are inserted inside the graph
3. `--cmsis` flag builds ExecuTorch with the Cortex-M backend and links CMSIS-NN kernels
4. At runtime, quantized linear ops dispatch to CMSIS-NN instead of portable kernels
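The steps above can be sketched end to end for a single linear op: int8 inputs and weights, a 32-bit accumulator, then a rescale back to float. This is a simplified illustration (per-tensor scales, float rescale); real CMSIS-NN kernels use fixed-point requantization and support per-channel scales:

```python
# Simplified INT8 linear: int8 x, int8 W, int32 accumulation, float rescale.
def quantized_linear(x_q, w_q, x_scale, w_scale):
    """x_q: quantized input vector; w_q: one row of int8 weights per output."""
    out = []
    for row in w_q:
        acc = sum(xi * wi for xi, wi in zip(x_q, row))  # int32 accumulator
        out.append(acc * x_scale * w_scale)             # rescale to float
    return out

x_q = [10, -5]                 # quantized input (assumed scale 0.1)
w_q = [[3, 4], [-2, 6]]        # quantized weights (assumed scale 0.5)
print(quantized_linear(x_q, w_q, 0.1, 0.5))
```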
### When to use CMSIS-NN
- Lower latency on supported ops (linear, conv2d)
- Smaller model size (INT8 weights vs FP32)
- Trade-off: slight accuracy loss from quantization
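The size saving is easy to quantify: INT8 stores one byte per weight versus four for FP32, so weight storage shrinks about 4x. The layer shape below is a made-up example, not the tutorial's model:

```python
# Rough weight-storage comparison for a hypothetical 784x32 linear layer.
weights = 784 * 32
fp32_bytes = weights * 4   # 4 bytes per float32 weight
int8_bytes = weights * 1   # 1 byte per int8 weight
print(fp32_bytes, int8_bytes, fp32_bytes / int8_bytes)  # 100352 25088 4.0
```

Biases and per-tensor quantization parameters add a small overhead not counted here.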
## Next Steps
### Scale up your deployment
- Use a real, production-trained model
- Optimize further → INT8 quantization with CMSIS-NN, pruning