# Zephyr: MobileNetV2 on Alif Ensemble E8 with Ethos-U NPU

Run a quantized MobileNetV2 image classifier on the Alif Ensemble E8 DevKit
using ExecuTorch, Zephyr RTOS, and the Arm Ethos-U55 NPU. The same build flow
also works on the Arm Corstone-320 FVP for development without hardware.

## What You'll Build

- A quantized INT8 MobileNetV2 model fully delegated to the Ethos-U55 NPU
  (110 ops, ~19 ms inference on Alif E8)
- A Zephyr RTOS application that loads the `.pte` model, runs inference on a
  static test image, and prints the top-5 ImageNet predictions over UART

## Prerequisites

### Hardware (choose one)

| Target | Description |
|--------|-------------|
| **Alif Ensemble E8 DevKit** | Cortex-M55 HP core + Ethos-U55 (256 MACs), 4.5 MB HP SRAM, MRAM |
| **Corstone-320 FVP** | Virtual platform simulating Cortex-M85 + Ethos-U85 (no hardware needed, Linux only) |

### Software

- Linux x86_64 (the FVP and Arm toolchain are Linux-only; macOS can export
  models but cannot run the FVP or flash)
- Python 3.10+
- Alif SE Tools for flashing (Alif hardware only)

## Step 1: Set Up the Zephyr Workspace

Create a workspace, install `west`, and initialize the Zephyr tree:

```bash
mkdir ~/zephyr_workspace && cd ~/zephyr_workspace
python3 -m venv .venv && source .venv/bin/activate
pip install west "cmake<4.0.0" pyelftools ninja jsonschema
west init --manifest-rev v4.3.0
```

Install the Zephyr SDK (compiler toolchain):

```bash
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/zephyr-sdk-0.17.4_linux-x86_64.tar.xz
tar -xf zephyr-sdk-0.17.4_linux-x86_64.tar.xz && rm -f zephyr-sdk-0.17.4_linux-x86_64.tar.xz
./zephyr-sdk-0.17.4/setup.sh -c -t arm-zephyr-eabi
export ZEPHYR_SDK_INSTALL_DIR=$(realpath ./zephyr-sdk-0.17.4)
```

## Step 2: Add ExecuTorch as a Zephyr Module

Copy the submanifest, configure `west` to pull only the modules we need, and
update:

```bash
mkdir -p zephyr/submanifests
cat > zephyr/submanifests/executorch.yaml << 'EOF'
manifest:
  projects:
    - name: executorch
      url: https://github.com/pytorch/executorch
      revision: main
      path: modules/lib/executorch
EOF

west config manifest.project-filter -- -.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u
west update
```

For Alif boards, also add the Alif HAL:

```bash
west config manifest.project-filter -- -.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u,+hal_alif
west update
```
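
The filter string is a comma-separated list of `+`/`-` prefixed regular
expressions applied left to right: `-.*` first deactivates every project, then
each `+name` entry re-enables one by exact match. A rough Python sketch of that
resolution logic (a simplification of west's actual behavior, for illustration
only):

```python
import re

def project_active(name: str, filter_spec: str) -> bool:
    """Mimic west's manifest.project-filter: last matching +/- entry wins,
    and each pattern must match the whole project name."""
    active = True  # west's default for ordinary projects
    for entry in filter_spec.split(","):
        sign, pattern = entry[0], entry[1:]
        if re.fullmatch(pattern, name):
            active = (sign == "+")
    return active

alif_filter = "-.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u,+hal_alif"

for project in ("zephyr", "hal_alif", "hal_nordic", "trusted-firmware-m"):
    status = "fetched" if project_active(project, alif_filter) else "skipped"
    print(f"{project}: {status}")
```

Because `+cmsis` must match the whole name, it enables `cmsis` without
accidentally pulling in every project whose name merely starts with `cmsis`;
`cmsis_6` and `cmsis-nn` each need their own entry.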

## Step 3: Install ExecuTorch and Arm Tools

```bash
cd modules/lib/executorch
git submodule sync && git submodule update --init --recursive
./install_executorch.sh
cd ../../..
```

Install the Arm toolchain, Vela compiler, and Corstone FVPs:

```bash
modules/lib/executorch/examples/arm/setup.sh --i-agree-to-the-contained-eula
source modules/lib/executorch/examples/arm/arm-scratch/setup_path.sh
```

## Step 4: Export the MobileNetV2 Model

Export a quantized INT8 MobileNetV2 with Ethos-U delegation. Choose the target
that matches your hardware:

**For Alif E8 (Ethos-U55 with 256 MACs):**

```bash
python -m modules.lib.executorch.backends.arm.scripts.aot_arm_compiler \
  --model_name=mv2_untrained \
  --quantize --delegate \
  --target=ethos-u55-256 \
  --output=mv2_ethosu.pte
```

**For Corstone-320 FVP (Ethos-U85 with 256 MACs):**

```bash
python -m modules.lib.executorch.backends.arm.scripts.aot_arm_compiler \
  --model_name=mv2_untrained \
  --quantize --delegate \
  --target=ethos-u85-256 \
  --output=mv2_u85_256.pte
```

The `--delegate` flag routes all compatible ops through the Ethos-U backend.
The Vela compiler converts the TOSA intermediate representation into an
optimized command stream for the NPU. Use `mv2` instead of `mv2_untrained` for
meaningful predictions (requires torchvision pretrained weights).
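
The `--quantize` step maps each float tensor onto 8-bit integers using a scale
and zero point. The sketch below shows the generic affine INT8 mapping (toy
math, not ExecuTorch's actual quantizer), plus the size arithmetic behind the
sample's 224x224x3 test image:

```python
def quantize(x: float, scale: float, zero_point: int) -> int:
    """Affine INT8 quantization: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping back to float: x ≈ (q - zero_point) * scale."""
    return (q - zero_point) * scale

# Example: values spanning [-1.0, 1.0] mapped onto the INT8 range.
scale, zero_point = 2.0 / 255, 0
print(quantize(0.5, scale, zero_point))     # 64
print(dequantize(64, scale, zero_point))    # ≈0.502 — small rounding error

# Input-size arithmetic: one INT8 byte per element of a 224x224 RGB tensor.
print(224 * 224 * 3)                        # 150528
```

The 150528-byte figure matches the "static RGB image (150528 bytes)" line in
the boot log later in this guide.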

## Step 5: Build the Zephyr Application

**For Alif E8:**

```bash
west build -b alif_e8_dk/ae822fa0e5597xx0/rtss_hp \
  -S ethos-u55-enable \
  modules/lib/executorch/zephyr/samples/mv2-ethosu -- \
  -DET_PTE_FILE_PATH=mv2_ethosu.pte
```

**For Corstone-320 FVP:**

```bash
west build -b mps4/corstone320/fvp \
  modules/lib/executorch/zephyr/samples/mv2-ethosu -- \
  -DET_PTE_FILE_PATH=mv2_u85_256.pte
```

## Step 6a: Run on Corstone-320 FVP

Set up the FVP paths and run:

```bash
export FVP_ROOT=$PWD/modules/lib/executorch/examples/arm/arm-scratch/FVP-corstone320
export ARMFVP_BIN_PATH=${FVP_ROOT}/models/Linux64_GCC-9.3
export LD_LIBRARY_PATH=${FVP_ROOT}/python/lib:${ARMFVP_BIN_PATH}:${LD_LIBRARY_PATH}
export ARMFVP_EXTRA_FLAGS="-C mps4_board.uart0.shutdown_on_eot=1 -C mps4_board.subsystem.ethosu.num_macs=256"

west build -t run
```

The FVP simulates the system instruction by instruction rather than in real
time, so a full MV2 inference takes 10-20 minutes of wall clock. You should
see output like:

```
========================================
ExecuTorch MobileNetV2 Classification Demo
========================================
Ethos-U backend registered successfully
Model loaded, has 1 methods
Inference completed in <N> ms
--- Classification Results ---
Top-5 predictions:
  [1] class <id>: <score>
  ...
MobileNetV2 Demo Complete
========================================
```

## Step 6b: Flash and Run on Alif E8

### Flash with Alif SE Tools

Use the Alif SE Tools to program the binary into the E8's MRAM. Create a
`zephyr.json` in the build output directory:

```bash
cat > build/zephyr/zephyr.json << 'EOF'
{
  "HP_img_class": {
    "binary": "zephyr.bin",
    "version": "1.0.0",
    "mramAddress": "0x80008000",
    "cpu_id": "M55_HP",
    "flags": ["boot"],
    "signed": false
  },
  "DEVICE": {
    "disabled": false,
    "binary": "app-device-config.json",
    "version": "0.5.00",
    "signed": true
  }
}
EOF
```

> **Important:** Use `mramAddress: "0x80008000"` (FLASH_LOAD_OFFSET=0x8000),
> **not** the default `0x80200000`. The default offset does not leave enough
> MRAM for the ~3.5 MB MV2 model blob.
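
To see why the lower offset matters: moving the application base from
`0x80200000` down to `0x80008000` reclaims roughly 2 MB of MRAM ahead of the
image, which can be the difference between the ~3.5 MB model blob fitting or
not. A quick check of the arithmetic (sizes taken from `zephyr.json` above and
the boot log below):

```python
# MRAM load addresses from zephyr.json.
default_addr = 0x80200000   # FLASH_LOAD_OFFSET=0x200000 (the default)
lowered_addr = 0x80008000   # FLASH_LOAD_OFFSET=0x8000 (what this guide uses)

reclaimed = default_addr - lowered_addr
print(f"MRAM reclaimed: {reclaimed} bytes (~{reclaimed / 2**20:.2f} MiB)")

model_size = 3_490_912      # MV2 .pte size reported in the boot log
print(f"Model blob:     {model_size} bytes (~{model_size / 2**20:.2f} MiB)")
```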

Generate the table of contents and flash using the SE Tools:

```bash
cd build/zephyr
python <path-to-alif-se-tools>/app-gen-toc.py
python <path-to-alif-se-tools>/app-write-mram.py
cd ../..
```

Refer to the Alif SE Tools documentation for installation and detailed usage.
### Connect Serial Console

Connect to UART4 at 115200 baud. On Linux:

```bash
picocom -b 115200 /dev/ttyUSB0
```

Press the reset button on the E8 DevKit. You should see:

```
*** Booting Zephyr OS build ff8b8697c0f5 ***

========================================
ExecuTorch MobileNetV2 Classification Demo
========================================

I [executorch:main.cpp] Ethos-U backend registered successfully
I [executorch:main.cpp] Model PTE at 0x8004b290, Size: 3490912 bytes
I [executorch:main.cpp] Model loaded, has 1 methods
I [executorch:main.cpp] Running method: forward
I [executorch:main.cpp] Method allocator pool size: 1572864 bytes.
I [executorch:main.cpp] Setting up planned buffer 0, size 752640.
I [executorch:main.cpp] Loading method...
I [executorch:main.cpp] Method 'forward' loaded successfully
I [executorch:main.cpp] Preparing input: static RGB image (150528 bytes)
I [executorch:main.cpp]
--- Starting inference ---
I [executorch:main.cpp] Inference completed in 19 ms
I [executorch:main.cpp]
--- Classification Results ---
I [executorch:main.cpp] Top-5 predictions:
I [executorch:main.cpp]   [1] class 0: 0.0000
I [executorch:main.cpp]   [2] class 1: 0.0000
I [executorch:main.cpp]   [3] class 2: 0.0000
I [executorch:main.cpp]   [4] class 3: 0.0000
I [executorch:main.cpp]   [5] class 4: 0.0000
I [executorch:main.cpp]
========================================
I [executorch:main.cpp] MobileNetV2 Demo Complete
I [executorch:main.cpp] Model size: 3490912 bytes
I [executorch:main.cpp] Input: 224x224x3 RGB image (150528 bytes)
I [executorch:main.cpp] Output: 1000 ImageNet classes (top-5 shown)
I [executorch:main.cpp] Inference time: 19 ms
I [executorch:main.cpp] ========================================
```

All predictions show `0.0000` because `mv2_untrained` has random weights.
Use `mv2` (with torchvision pretrained weights) for meaningful class scores.
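
The top-5 list the demo prints is just a partial sort over the 1000-class
output vector. A minimal Python sketch of that post-processing (the sample app
does the equivalent in C++; the class indices and scores below are made up):

```python
def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    indexed = list(enumerate(scores))
    indexed.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return indexed[:k]

# Hypothetical 1000-class output: mostly zeros, a few classes with mass,
# mimicking what a trained mv2 would produce.
scores = [0.0] * 1000
scores[281] = 0.62
scores[285] = 0.21
scores[282] = 0.09

for rank, (cls, score) in enumerate(top_k(scores), start=1):
    print(f"  [{rank}] class {cls}: {score:.4f}")
```

With an all-zero output vector (the untrained model), the stable sort keeps
ties in index order, which is exactly why the log above lists classes 0
through 4 at 0.0000.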

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Linker: `region 'FLASH' overflowed` | Model PTE too large for ITCM | Use the DDR overlay (FVP) or verify `mramAddress` (Alif) |
| Linker: `region 'RAM' overflowed` | Pools + model copy exceed SRAM | Set `CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y` to skip the SRAM copy |
| FVP hangs after "Ethos-U backend registered" | MV2 simulation is slow | Wait 10-20 min, or use Corstone-320 (faster than Corstone-300) |
| No serial output on Alif | Wrong UART or baud rate | Use UART4 at 115200 baud |
| `app-write-mram.py` fails | Wrong `mramAddress` | Use `0x80008000`, not `0x80200000` |
| Runtime: method allocator OOM | Pool size too small | Increase `CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE` in the board conf |

## Memory Layout

| Region | Corstone-320 FVP | Alif E8 |
|--------|------------------|---------|
| Code + .rodata | ITCM (512 KB) | MRAM |
| .data + .bss + pools | ISRAM (4 MB) | HP SRAM (4.5 MB) |
| Model PTE (~3.5 MB) | DDR (16 MB, via overlay) | MRAM (DMA-accessible) |
| NPU delegation | Ethos-U85 (256 MACs) | Ethos-U55 (256 MACs) |
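
The table's sizes explain the `CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE` fix
from the troubleshooting section: copying the ~3.5 MB PTE into SRAM alongside
the allocator pools would overflow a 4 MB ISRAM, while executing it in place
from MRAM or DDR leaves the pools plenty of room. A back-of-envelope check
using the sizes from the boot log (it ignores code, `.bss`, and stack, so it
understates real usage):

```python
# Sizes from the boot log and the memory-layout table above.
isram_bytes = 4 * 1024 * 1024   # Corstone-320 ISRAM
model_pte   = 3_490_912         # MV2 .pte blob
method_pool = 1_572_864         # method allocator pool
planned_buf = 752_640           # planned buffer 0

with_copy    = model_pte + method_pool + planned_buf   # PTE copied into SRAM
without_copy = method_pool + planned_buf               # PTE executed in place

for label, used in (("with SRAM copy", with_copy), ("in place", without_copy)):
    verdict = "overflow" if used > isram_bytes else "fits"
    print(f"{label}: {used} / {isram_bytes} bytes -> {verdict}")
```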

## Using Claude Code with Zephyr

If you use [Claude Code](https://docs.anthropic.com/en/docs/claude-code), the
ExecuTorch repo ships a `/zephyr` skill that can help with:

- **Workspace setup** — scaffolds the Zephyr workspace, west manifests, and SDK install
- **Board bringup** — generates DTS overlays, board confs, and linker snippets for new boards
- **Memory debugging** — diagnoses linker overflow errors and runtime allocation failures,
  with the exact pool sizes your model needs

Type `/zephyr` in Claude Code while working in the ExecuTorch repo to activate
it. Related skills: `/export` for model conversion, `/cortex-m` for baremetal
Cortex-M builds, `/executorch-kb` for backend-specific debugging.

## Next Steps

- Swap `mv2_untrained` for `mv2` (with torchvision) to get real ImageNet predictions
- Try other models: `resnet18`, or bring your own `.py` model file
- Explore the [hello-executorch sample](https://github.com/pytorch/executorch/tree/main/zephyr/samples/hello-executorch) for a minimal starting point
- See the [Ethos-U Getting Started tutorial](backends/arm-ethos-u/tutorials/ethos-u-getting-started.md) for the baremetal (non-Zephyr) flow