diff --git a/docs/source/backends/arm-cortex-m/arm-cortex-m-overview.md b/docs/source/backends/arm-cortex-m/arm-cortex-m-overview.md
index 0258d013471..b1fcf7533ac 100644
--- a/docs/source/backends/arm-cortex-m/arm-cortex-m-overview.md
+++ b/docs/source/backends/arm-cortex-m/arm-cortex-m-overview.md
@@ -164,3 +164,64 @@
 backends/arm/scripts/run_fvp.sh --elf=build/arm_executor_runner --target=ethos-u
 ```
 For a complete end-to-end walkthrough including dataset setup, calibration, and result validation, see the [Cortex-M MobileNetV2 notebook](https://github.com/pytorch/executorch/blob/main/examples/arm/cortex_m_mv2_example.ipynb).
+
+## Testing with Bundled I/O
+
+The tutorial above produces a plain `.pte`. For programmatic testing,
+`aot_arm_compiler --bundleio` instead produces a bundled (`.bpte`) program
+that embeds reference inputs and expected outputs; the Cortex-M test runner
+loads the bundle via semihosting and self-checks its outputs against the
+embedded references, emitting `Test_result: PASS` or `Test_result: FAIL`
+on the UART.
+
+The driver for this flow is `examples/arm/run.sh`, which exports the model,
+builds the Cortex-M test runner, launches the Corstone-300 FVP with
+semihosting enabled, and checks the bundled output. Run it from the
+ExecuTorch repo root after `./install_executorch.sh`:
+
+```bash
+# One-time: install the Arm toolchain + FVP.
+examples/arm/setup.sh --i-agree-to-the-contained-eula
+source examples/arm/arm-scratch/setup_path.sh
+
+# Per model: export, build, and run on the FVP in one step.
+# (Quantization is the default for the cortex-m55+int8 target.)
+examples/arm/run.sh \
+  --model_name=<model_name> \
+  --target=cortex-m55+int8 \
+  --bundleio
+```
+
+Replace `<model_name>` with any of the validated-model names in the table
+below. Without `--calibration_data`, calibration falls back to the model's
+`get_example_inputs()` (random data) — enough for bundled-I/O numerical
+parity, but not for task-accuracy claims. On `Test_result: FAIL`, inspect
+the FVP UART log for the per-tensor diff; supplying a representative
+calibration dataset via `--calibration_data=<path>` often resolves
+mismatches caused by random-input calibration.
+
+:::{important}
+Bundled I/O checks INT8 **numerical parity** between the exported `.bpte`
+and the eager-mode quantized model on reference inputs; it does not
+validate task accuracy (VWW / KWS / ImageNet).
+:::
+
+## Validated Models
+
+The following models are exported, INT8 quantized, lowered, and validated
+end-to-end on the Corstone-300 FVP:
+
+| Model | Task | Input shape | Source | Test |
+|--------------------|------------------------------------|---------------|---------------------------------------------------|----------------------------------------------------------|
+| `mv2`              | Image classification               | `1x3x224x224` | `examples/models/mobilenet_v2/`                    | `backends/cortex_m/test/models/test_mobilenet_v2.py`     |
+| `mv3`              | Image classification               | `1x3x224x224` | `examples/models/mobilenet_v3/`                    | `backends/cortex_m/test/models/test_mobilenet_v3.py`     |
+| `ds_cnn`           | Keyword spotting (MLPerf Tiny)     | `1x1x49x10`   | `examples/models/mlperf_tiny/ds_cnn.py`            | `backends/cortex_m/test/models/test_ds_cnn.py`           |
+| `mobilenet_v1_025` | Visual Wake Words (MLPerf Tiny)    | `1x3x96x96`   | `examples/models/mlperf_tiny/mobilenet_v1_025.py`  | `backends/cortex_m/test/models/test_mobilenet_v1_025.py` |
+| `resnet8`          | Image classification (MLPerf Tiny) | `1x3x32x32`   | `examples/models/mlperf_tiny/resnet8.py`           | `backends/cortex_m/test/models/test_resnet8.py`          |
+| `deep_autoencoder` | Anomaly detection (MLPerf Tiny)    | `1x640`       | `examples/models/mlperf_tiny/deep_autoencoder.py`  | `backends/cortex_m/test/models/test_deep_autoencoder.py` |
+
+:::{note}
+`mobilenet_v1_025` is the MLPerf Tiny Visual Wake Words benchmark
+(MobileNetV1 with width multiplier 0.25) — the canonical person-detection
+reference model for TinyML.
+:::
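
The `Test_result:` line on the UART is what automation should key on. A minimal sketch of a CI-side check, assuming the UART output has been captured to a log file (the helper name and log path here are hypothetical; only the `Test_result: PASS` / `Test_result: FAIL` strings come from the test runner):

```shell
# Hypothetical CI helper: scan a captured FVP UART log for the verdict
# emitted by the Cortex-M test runner in bundled-I/O (.bpte) mode.
check_bundleio_result() {
  # $1: path to the UART log captured from the FVP run
  if grep -q "Test_result: PASS" "$1"; then
    echo "bundled I/O parity check: PASS"
  else
    # On failure, the same log also carries the per-tensor diff.
    echo "bundled I/O parity check: FAIL" >&2
    return 1
  fi
}
```

Wrapping the FVP invocation this way turns a bundled-I/O run into a plain pass/fail CI step, with no output parsing beyond the UART log.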