Commit d43fa35
Arm backend: Document validated TinyML models for Cortex-M
Adds a Validated Models section to the Cortex-M backend overview listing the six models exported, INT8 quantized, and run on the Corstone-300 FVP by CI: mv2 and ds_cnn on trunk; mv3, mobilenet_v1_025, resnet8, and deep_autoencoder nightly. For each model the table links the source file and the per-model dialect/implementation test. A short note calls out that mobilenet_v1_025 is the MLPerf Tiny Visual Wake Words reference model — the canonical TinyML person-detection benchmark — since that lineage is not obvious from the name.

The page also documents the bundled (.bpte) testing flow that CI uses: aot_arm_compiler --bundleio embeds reference inputs and expected outputs in the program, and examples/arm/run.sh drives the full export → build → FVP chain with Test_result PASS/FAIL self-checking, so a reader can reproduce what trunk and nightly do. An admonition clarifies that CI validates INT8 numerical parity between the exported .bpte and the eager-mode quantized model, not task accuracy (VWW / KWS / ImageNet).

This change was authored with Claude (claude-opus-4-7).
1 parent e6efe18 commit d43fa35

1 file changed

Lines changed: 61 additions & 0 deletions

File tree

docs/source/backends/arm-cortex-m/arm-cortex-m-overview.md

@@ -164,3 +164,64 @@ backends/arm/scripts/run_fvp.sh --elf=build/arm_executor_runner --target=ethos-u

For a complete end-to-end walkthrough including dataset setup, calibration, and result validation, see the [Cortex-M MobileNetV2 notebook](https://github.com/pytorch/executorch/blob/main/examples/arm/cortex_m_mv2_example.ipynb).
## Testing with Bundled I/O

The tutorial above produces a plain `.pte`. For programmatic testing,
`aot_arm_compiler --bundleio` instead produces a bundled (`.bpte`) program
that embeds reference inputs and expected outputs; the Cortex-M test runner
loads the bundle via semihosting and self-checks its outputs against the
embedded references, emitting `Test_result: PASS` or `Test_result: FAIL`
on the UART.
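
A CI harness can grade a run by scanning the FVP's UART output for that
marker. A minimal sketch — only the `Test_result:` line comes from the
runner; everything else in the example log is an assumption:

```python
import re


def grade_uart_log(log_text: str) -> bool:
    """Return True iff the bundled-I/O run self-reported success.

    Searches the captured UART output for the `Test_result: PASS` /
    `Test_result: FAIL` marker emitted by the Cortex-M test runner.
    """
    match = re.search(r"Test_result:\s*(PASS|FAIL)", log_text)
    if match is None:
        raise ValueError("no Test_result marker found in UART log")
    return match.group(1) == "PASS"
```

A harness would feed this the log captured from the FVP's UART and fail
the job when it returns `False` or raises.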
The driver for this flow is `examples/arm/run.sh`, which exports the model,
builds the Cortex-M test runner, launches the Corstone-300 FVP with
semihosting enabled, and checks the bundled output. Run it from the
ExecuTorch repo root after `./install_executorch.sh`:
```bash
# One-time: install the Arm toolchain + FVP.
examples/arm/setup.sh --i-agree-to-the-contained-eula
source examples/arm/arm-scratch/setup_path.sh

# Per model: export, build, and run on the FVP in one step.
# (Quantization is the default for the cortex-m55+int8 target.)
examples/arm/run.sh \
  --model_name=<model> \
  --target=cortex-m55+int8 \
  --bundleio
```
Replace `<model>` with any of the validated-model names in the table
below. Without `--calibration_data`, calibration falls back to the model's
`get_example_inputs()` (random data) — enough for bundled-I/O numerical
parity, but not for task-accuracy claims. On `Test_result: FAIL`, inspect
the FVP UART log for the per-tensor diff; supplying a representative
calibration dataset via `--calibration_data=<dir>` often resolves
mismatches caused by random-input calibration.
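
To reproduce the full trunk + nightly coverage locally, the same
invocation can be swept over all six validated models. The sketch below
only builds and prints the command lines; executing them (for example via
`subprocess.run`) is left to the caller:

```python
# The six validated model names from the table below.
VALIDATED_MODELS = [
    "mv2", "mv3", "ds_cnn",
    "mobilenet_v1_025", "resnet8", "deep_autoencoder",
]


def run_sh_command(model: str) -> list[str]:
    """Build the run.sh argument list for one model (dry run only)."""
    return [
        "examples/arm/run.sh",
        f"--model_name={model}",
        "--target=cortex-m55+int8",
        "--bundleio",
    ]


for model in VALIDATED_MODELS:
    print(" ".join(run_sh_command(model)))
```

Each printed line is exactly the per-model invocation shown above, so the
sweep stays in lockstep with the documented single-model flow.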
:::{important}
Bundled I/O checks INT8 **numerical parity** between the exported `.bpte`
and the eager-mode quantized model on reference inputs; it does not
validate task accuracy (VWW / KWS / ImageNet).
:::
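
"Numerical parity" here means elementwise agreement of the INT8 outputs.
A toy illustration of that kind of check — the flat-list layout and the
zero default tolerance are assumptions of this sketch, not the runner's
actual implementation:

```python
def int8_parity(device_out, reference_out, atol: int = 0):
    """Compare two flat INT8 output tensors elementwise.

    Returns (passed, max_abs_diff). atol=0 demands exact agreement;
    a small nonzero atol would tolerate off-by-one rounding.
    """
    assert len(device_out) == len(reference_out), "shape mismatch"
    max_diff = max(
        (abs(a - b) for a, b in zip(device_out, reference_out)),
        default=0,
    )
    return max_diff <= atol, max_diff
```

On a mismatch, the reported maximum absolute difference plays the same
role as the per-tensor diff printed on the FVP UART log.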
## Validated Models

The following models are exported, INT8 quantized, lowered, and validated
end-to-end on the Corstone-300 FVP:
| Model | Task | Input shape | Source | Test |
|--------------------|------------------------------------|---------------|---------------------------------------------------|----------------------------------------------------------|
| `mv2` | Image classification | `1x3x224x224` | `examples/models/mobilenet_v2/` | `backends/cortex_m/test/models/test_mobilenet_v2.py` |
| `mv3` | Image classification | `1x3x224x224` | `examples/models/mobilenet_v3/` | `backends/cortex_m/test/models/test_mobilenet_v3.py` |
| `ds_cnn` | Keyword spotting (MLPerf Tiny) | `1x1x49x10` | `examples/models/mlperf_tiny/ds_cnn.py` | `backends/cortex_m/test/models/test_ds_cnn.py` |
| `mobilenet_v1_025` | Visual Wake Words (MLPerf Tiny) | `1x3x96x96` | `examples/models/mlperf_tiny/mobilenet_v1_025.py` | `backends/cortex_m/test/models/test_mobilenet_v1_025.py` |
| `resnet8` | Image classification (MLPerf Tiny) | `1x3x32x32` | `examples/models/mlperf_tiny/resnet8.py` | `backends/cortex_m/test/models/test_resnet8.py` |
| `deep_autoencoder` | Anomaly detection (MLPerf Tiny) | `1x640` | `examples/models/mlperf_tiny/deep_autoencoder.py` | `backends/cortex_m/test/models/test_deep_autoencoder.py` |
:::{note}
`mobilenet_v1_025` is the MLPerf Tiny Visual Wake Words benchmark
(MobileNetV1 with width multiplier 0.25) — the canonical person-detection
reference model for TinyML.
:::
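
The input-shape column above is enough to size dummy reference inputs for
a quick smoke test. A small sketch — the shape strings are copied from the
table, while the parsing helpers are illustrative only:

```python
# Input shapes from the validated-models table (NxCxHxW, or flat N x D
# for the autoencoder), as `x`-separated strings.
INPUT_SHAPES = {
    "mv2": "1x3x224x224",
    "mv3": "1x3x224x224",
    "ds_cnn": "1x1x49x10",
    "mobilenet_v1_025": "1x3x96x96",
    "resnet8": "1x3x32x32",
    "deep_autoencoder": "1x640",
}


def parse_shape(spec: str) -> tuple[int, ...]:
    """Turn a `1x3x96x96`-style spec into a shape tuple."""
    return tuple(int(d) for d in spec.split("x"))


def numel(spec: str) -> int:
    """Number of elements a reference input of this shape holds."""
    n = 1
    for d in parse_shape(spec):
        n *= d
    return n
```

For example, `parse_shape` on the Visual Wake Words entry yields the
`(1, 3, 96, 96)` tensor shape a harness would allocate.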
