Overview

This document outlines the steps required to run a simple MNIST digit recognition neural network on the Pico2 microcontroller using ExecuTorch.

Demo Model: Hand-crafted MNIST Classifier

The included export_mlp_mnist.py (in examples/raspberry_pi/pico2) creates a demonstration model with hand-crafted weights (not production-trained). This tiny MLP recognizes digits 0, 1, 4, and 7 using manually designed feature detectors. Note: This is a proof-of-concept. For production use, train your model on real MNIST data.
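To make "manually designed feature detectors" concrete, here is a toy sketch in plain Python. It is not the network or the weights from export_mlp_mnist.py: the 3x3 image size, the two stroke masks, and the two class names are assumptions chosen only to show how hand-set weights can act as detectors.

```python
def relu(x):
    """Rectified linear unit on a single value."""
    return max(0.0, x)


def linear(inputs, weights, biases):
    """Dense layer: one dot product plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical 3x3 binary images, flattened row-major (1.0 = lit pixel).
TOP_ROW = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # detector for a horizontal top stroke
MID_COL = [0, 1, 0, 0, 1, 0, 0, 1, 0]   # detector for a vertical centre stroke

HIDDEN_W = [TOP_ROW, MID_COL]
HIDDEN_B = [-2.0, -2.0]                 # fire only when all 3 stroke pixels are lit
OUT_W = [[1.0, -1.0], [-1.0, 1.0]]      # class 0 = "bar", class 1 = "stem"
OUT_B = [0.0, 0.0]


def classify(pixels):
    """Return (predicted class index, raw scores) for a flattened 3x3 image."""
    hidden = [relu(h) for h in linear(pixels, HIDDEN_W, HIDDEN_B)]
    scores = linear(hidden, OUT_W, OUT_B)
    return scores.index(max(scores)), scores
```

Each hidden unit responds to one stroke, and the output layer rewards the matching stroke while penalizing the other. A real 784-input MNIST model follows the same idea at a larger scale.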

Bring Your Own Model and Deploy

This demo shows how ExecuTorch lets you bring your own PyTorch model and deploy it to Pico2 with a single script. The complete pipeline goes from any PyTorch model to a runnable binary in three steps:

- Use the existing demo model (examples/raspberry_pi/pico2/export_mlp_mnist.py) or bring your own model
- Build the firmware with one command, passing the model file (.pte) as an argument
- Deploy directly to Pico2

Adapting to Other Baremetal Architectures

While this example targets the Pico2 board, the same pattern — embedding the .pte model as a C array, using BufferDataLoader, and statically allocating memory — can be adapted to other baremetal targets (e.g., RISC-V) by providing your own CMake toolchain file. The key requirement is a correct selective build (see below) so all operators your model needs are included.
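As an illustration of the "embed the .pte as a C array" step, the following sketch renders a model file as C source. The symbol name `model_pte`, the 16-byte alignment, and the GCC-style `__attribute__` are assumptions made for this example; the demo's own build script may generate the array differently.

```python
from pathlib import Path


def pte_to_c_array(data: bytes, symbol: str = "model_pte", align: int = 16) -> str:
    """Render raw .pte bytes as C source defining an aligned byte array."""
    lines = [
        "#include <stddef.h>",
        "#include <stdint.h>",
        "",
        # GCC/Clang attribute syntax; the alignment value is an assumption.
        f"const uint8_t {symbol}[] __attribute__((aligned({align}))) = {{",
    ]
    # Emit 12 bytes per row to keep the generated file readable.
    for offset in range(0, len(data), 12):
        chunk = data[offset:offset + 12]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const size_t {symbol}_len = {len(data)};")
    return "\n".join(lines) + "\n"


def convert_file(pte_path: str, out_path: str, symbol: str = "model_pte") -> None:
    """Read a .pte file and write the generated C source to out_path."""
    Path(out_path).write_text(pte_to_c_array(Path(pte_path).read_bytes(), symbol))
```

The resulting array and its length are what a buffer-based loader would be pointed at on the target, so no filesystem is needed at runtime.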

Important Caveats

- Memory constraints: Models must fit in 520KB SRAM (Pico2)
- Missing operators: If you get "Operator missing" (error 20) at runtime, your build is missing operators that the model needs. Use EXECUTORCH_SELECT_OPS_MODEL (see below) to auto-detect the required operators from your .pte file.
- Selective builds: Include only the operators your model uses to reduce binary size

Memory Constraints & Optimization

Critical: Pico2 has limited memory
- 520KB SRAM (on-chip static RAM)
- 4MB QSPI Flash (onboard storage)

Large models will not fit, so keep your .pte files small. If a model does not fit in Pico2 memory, apply optimization techniques such as:

- Quantization (INT8, INT4)
- Model pruning
- Operator fusion
- Selective builds (include only needed operators)
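A quick way to reason about the 520KB limit is to add up the known consumers before building. The sketch below is a conservative estimate that treats the model and both runtime pools as SRAM residents; whether the model bytes actually live in SRAM or flash depends on linker placement, and stack and other runtime data are not counted. The pool sizes match the sample log later in this document, while the 100,000-byte model size is a hypothetical value.

```python
SRAM_BYTES = 520 * 1024  # 532,480 bytes of on-chip SRAM on Pico2


def sram_budget(pte_bytes, method_pool_bytes, activation_pool_bytes,
                sram_bytes=SRAM_BYTES):
    """Sum the model and pool sizes and compare them with available SRAM."""
    used = pte_bytes + method_pool_bytes + activation_pool_bytes
    return {
        "used_bytes": used,
        "free_bytes": sram_bytes - used,
        "fits": used <= sram_bytes,
    }


# Hypothetical 100,000-byte model with a 131,072-byte method pool
# and a 4,096-byte activation pool.
report = sram_budget(pte_bytes=100_000,
                     method_pool_bytes=131_072,
                     activation_pool_bytes=4_096)
print(report)
```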

Selective Build: Choosing the Right Operators

When cross-compiling ExecuTorch for baremetal targets, you need to register the operators your model uses. There are three approaches:

EXECUTORCH_SELECT_OPS_MODEL (recommended) — Point to your .pte file and the build system auto-detects all required operators:
```
cmake ... -DEXECUTORCH_SELECT_OPS_MODEL=/path/to/model.pte
```
This is the most reliable approach because it reads the exact operators from the serialized model, including any operators introduced by compiler passes or edge IR lowering that may not be obvious from the original PyTorch model.
EXECUTORCH_SELECT_OPS_LIST — Manually specify operators by name:
```
cmake ... -DEXECUTORCH_SELECT_OPS_LIST="aten::addmm.out,aten::relu.out,..."
```
This requires you to know the exact operator names (including .out suffixes). If you miss any, you'll get "Operator missing" (error 20) at runtime.
All portable operators (no selective build) — Omit any EXECUTORCH_SELECT_OPS_* options when configuring CMake. This registers all portable operators, which is simple but produces larger binaries, an important consideration on memory-constrained targets.

The build_firmware_pico.sh script uses EXECUTORCH_SELECT_OPS_MODEL by default when a model file is provided.
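If you do maintain an operator list by hand, a small helper can keep the CMake argument well-formed. This sketch only formats the string: it deduplicates names and rejects entries without a namespace, but it cannot check that a name is a real operator or that the list is complete for your model. The function name is hypothetical.

```python
def select_ops_flag(op_names):
    """Build the -DEXECUTORCH_SELECT_OPS_LIST=... argument from operator names."""
    cleaned = []
    for name in op_names:
        name = name.strip()
        if "::" not in name:
            raise ValueError(f"operator {name!r} is missing its namespace, e.g. 'aten::'")
        if name not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(name)
    return "-DEXECUTORCH_SELECT_OPS_LIST=" + ",".join(cleaned)


# Operator names taken from the example above.
print(select_ops_flag(["aten::addmm.out", "aten::relu.out", "aten::addmm.out"]))
```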

For more details, refer to the following guides:

(Prerequisites) Prepare the Environment for Arm

Set up the ExecuTorch development environment. Also see the instructions for setting up the environment for Arm. Make sure the toolchain is configured correctly. Refer to this setup for more details.

which arm-none-eabi-gcc
# Should return: executorch/examples/arm/arm-scratch/arm-gnu-toolchain-13.3.rel1-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc

Build Pico2 Firmware with ExecuTorch

This involves two steps:

Generate your model:

FP32 model (default):

cd examples/raspberry_pi/pico2
python export_mlp_mnist.py # Creates balanced_tiny_mlp_mnist.pte

INT8 quantized model (CMSIS-NN accelerated):

cd examples/raspberry_pi/pico2
python export_mlp_mnist_cmsis.py # Creates balanced_tiny_mlp_mnist_cmsis.pte

Build firmware:

FP32 build:

# In the dir examples/raspberry_pi/pico2
./build_firmware_pico.sh --model=balanced_tiny_mlp_mnist.pte

INT8 CMSIS-NN build:

# In the dir examples/raspberry_pi/pico2
./build_firmware_pico.sh --cmsis --model=balanced_tiny_mlp_mnist_cmsis.pte

Script options:

| Flag | Description |
| --- | --- |
| `--model=FILE` | Specify model file to embed (relative to pico2/) |
| `--cmsis` | Build with CMSIS-NN INT8 kernels for Cortex-M33 acceleration |
| `--clean` | Clean build directories before building |

Flash Firmware

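Hold the BOOTSEL button on the Pico2 while connecting it to your computer. The board mounts as a USB mass-storage drive (named RP2350 on Pico2; RPI-RP2 is the name used by the original Pico). Copy executorch_pico.uf2 to this drive.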

Verify Execution

On successful execution, the Pico2 LED blinks 10 times at 500ms intervals. Over a serial terminal, you'll see output like this:

...
...
PREDICTED: 4 (Expected: 4) ✅ CORRECT!

==================================================

=== Digit 7 ===
############################
############################
                        ####
                       ####
                      ####
                     ####
                    ####
                   ####
                  ####
                 ####
                ####
               ####
              ####
             ####
            ####
           ####
          ####
         ####
        ####
       ####
      ####
     ####
    ####
   ####
  ####
 ####
####
###

Input stats: 159 white pixels out of 784 total
Running neural network inference...
⏱️  Inference time: 245 us
✅ Neural network results:
  Digit 0: 370.000
  Digit 1: 0.000
  Digit 2: -3.000
  Digit 3: -3.000
  Digit 4: 860.000
  Digit 5: -3.000
  Digit 6: -3.000
  Digit 7: 1640.000 ← PREDICTED
  Digit 8: -3.000
  Digit 9: -3.000

PREDICTED: 7 (Expected: 7) ✅ CORRECT!

==================================================

📊 Inference latency summary:
  Digit 0: 312 us
  Digit 1: 198 us
  Digit 4: 267 us
  Digit 7: 245 us
  Average: 255 us

📊 Memory usage after method load:
   Method allocator: 45632 / 131072 bytes used
   Activation pool: 4096 bytes allocated

🎉 All tests complete! ExecuTorch inference of neural network works on Pico2!
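When capturing this output to a file, a short script can confirm that every test digit was classified correctly without reading the log by eye. The sketch below assumes the log lines keep the exact `PREDICTED: N (Expected: N)` and `Inference time: N us` wording shown above; the function name is hypothetical.

```python
import re

# Matches the per-digit verdict line, not the "PREDICTED" arrow in the score table.
PREDICTION_RE = re.compile(r"PREDICTED:\s*(\d)\s*\(Expected:\s*(\d)\)")
LATENCY_RE = re.compile(r"Inference time:\s*(\d+)\s*us")


def summarize_log(text):
    """Count runs, correct predictions, and mean latency in a captured log."""
    predictions = [(int(p), int(e)) for p, e in PREDICTION_RE.findall(text)]
    latencies = [int(us) for us in LATENCY_RE.findall(text)]
    return {
        "runs": len(predictions),
        "correct": sum(1 for p, e in predictions if p == e),
        "mean_latency_us": sum(latencies) / len(latencies) if latencies else None,
    }


sample = (
    "Inference time: 245 us\n"
    "  Digit 7: 1640.000 <- PREDICTED\n"
    "PREDICTED: 7 (Expected: 7) CORRECT!\n"
)
print(summarize_log(sample))
```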

Debugging via Serial Terminal

On macOS/Linux:

screen /dev/tty.usbmodem1101 115200

Replace /dev/tty.usbmodem1101 with your device path (on Linux this is typically /dev/ttyACM0). If the LED blinks 10 times at 100ms intervals, an error occurred, so check the serial logs. If it blinks 10 times at 500ms intervals, execution succeeded.

CMSIS-NN INT8 Acceleration

The Pico2 uses an RP2350 SoC with a Cortex-M33 core. The CMSIS-NN library provides optimized INT8 kernels that leverage the Cortex-M33's DSP instructions for faster inference compared to FP32 portable ops.

How it works

- export_mlp_mnist_cmsis.py uses CortexMQuantizer to quantize the model to INT8
- The model I/O remains float; quantize/dequantize nodes are inserted inside the graph
- The --cmsis flag builds ExecuTorch with the Cortex-M backend and links CMSIS-NN kernels
- At runtime, quantized linear ops dispatch to CMSIS-NN instead of portable kernels
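The quantize/dequantize nodes mentioned above map floats to 8-bit integers and back. The following is a minimal sketch of generic per-tensor affine INT8 quantization, assuming a given scale and zero point; the exact scheme CortexMQuantizer applies (for example symmetric or per-channel) may differ.

```python
def quantize(x, scale, zero_point=0):
    """Map a float to INT8: round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point  # Python rounds exact ties to the nearest even integer
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point=0):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


scale = 0.5                   # hypothetical scale chosen for the example
q = quantize(1.3, scale)      # 1.3 / 0.5 = 2.6, which rounds to 3
approx = dequantize(q, scale)
print(q, approx)
```

For values inside the representable range, the round-trip error is at most half the scale. Values outside the range saturate at -128 or 127, which is one source of the accuracy loss from quantization.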

When to use CMSIS-NN

- Lower latency on supported ops (linear, conv2d)
- Smaller model size (INT8 weights vs FP32)
- Trade-off: slight accuracy loss from quantization
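The size benefit can be estimated with simple arithmetic. The sketch below counts parameter bytes for a fully connected network, assuming hypothetical layer sizes of 784, 32, and 10 (not necessarily those of the demo model) and assuming biases stay at 4 bytes in both cases. A real .pte file also carries graph and quantization metadata, so actual file sizes will differ.

```python
def mlp_param_bytes(layer_sizes, weight_bytes, bias_bytes=4):
    """Total parameter storage for a dense MLP with the given layer widths."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out * weight_bytes  # weight matrix
        total += fan_out * bias_bytes             # bias vector
    return total


layers = [784, 32, 10]                              # hypothetical layer widths
fp32_bytes = mlp_param_bytes(layers, weight_bytes=4)
int8_bytes = mlp_param_bytes(layers, weight_bytes=1)
print(fp32_bytes, int8_bytes, round(fp32_bytes / int8_bytes, 2))
```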

Result: A complete PyTorch → ExecuTorch → Pico2 demo MNIST deployment!
