| 1 | +# Zephyr: MobileNetV2 on Alif Ensemble E8 with Ethos-U NPU |
| 2 | + |
| 3 | +Run a quantized MobileNetV2 image classifier on the |
| 4 | +[Alif Ensemble E8 DevKit](https://alifsemiconductor.com/ensemble/) using |
| 5 | +ExecuTorch, Zephyr RTOS, and the Arm Ethos-U55 NPU. The same build flow also |
| 6 | +works on the Arm Corstone-320 FVP for development without hardware. |
| 7 | + |
| 8 | +## What You'll Build |
| 9 | + |
| 10 | +- A quantized INT8 MobileNetV2 model fully delegated to the Ethos-U55 NPU |
| 11 | + (110 ops, ~19 ms inference on Alif E8) |
| 12 | +- A Zephyr RTOS application that loads the `.pte` model, runs inference on a |
| 13 | + static test image, and prints the top-5 ImageNet predictions over UART |
| 14 | + |
| 15 | +## Prerequisites |
| 16 | + |
| 17 | +### Hardware (choose one) |
| 18 | + |
| 19 | +| Target | Description | |
| 20 | +|--------|-------------| |
| 21 | +| **Alif Ensemble E8 DevKit** | Cortex-M55 HP core + Ethos-U55 (256 MACs), 4.5 MB HP SRAM, MRAM | |
| 22 | +| **Corstone-320 FVP** | Virtual platform simulating Cortex-M85 + Ethos-U85 (no hardware needed, Linux only) | |
| 23 | + |
| 24 | +### Software |
| 25 | + |
| 26 | +- Linux x86_64 (FVP and Arm toolchain are Linux-only; macOS can export models |
| 27 | + but cannot run the FVP or flash) |
| 28 | +- Python 3.10+ |
| 29 | +- [Alif SE Tools](https://alifsemiconductor.com/support/kits/) for flashing |
| 30 | + (Alif hardware only) |
| 31 | + |
| 32 | +## Step 1: Set Up the Zephyr Workspace |
| 33 | + |
| 34 | +Create a workspace, install `west`, and initialize the Zephyr tree: |
| 35 | + |
| 36 | +```bash |
| 37 | +mkdir -p ~/zephyr_workspace && cd ~/zephyr_workspace |
| 38 | +python3 -m venv .venv && source .venv/bin/activate |
| 39 | +pip install west "cmake<4.0.0" pyelftools ninja jsonschema |
| 40 | +west init --manifest-rev v4.3.0 |
| 41 | +``` |
| 42 | + |
| 43 | +Install the Zephyr SDK (compiler toolchain): |
| 44 | + |
| 45 | +```bash |
| 46 | +wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/zephyr-sdk-0.17.4_linux-x86_64.tar.xz |
| 47 | +tar -xf zephyr-sdk-0.17.4_linux-x86_64.tar.xz && rm -f zephyr-sdk-0.17.4_linux-x86_64.tar.xz |
| 48 | +./zephyr-sdk-0.17.4/setup.sh -c -t arm-zephyr-eabi |
| 49 | +export ZEPHYR_SDK_INSTALL_DIR=$(realpath ./zephyr-sdk-0.17.4) |
| 50 | +``` |
| 51 | + |
| 52 | +## Step 2: Add ExecuTorch as a Zephyr Module |
| 53 | + |
| 54 | +Create the submanifest, configure `west` to pull only the modules we need, and |
| 55 | +update: |
| 56 | + |
| 57 | +```bash |
| 58 | +mkdir -p zephyr/submanifests |
| 59 | +cat > zephyr/submanifests/executorch.yaml << 'EOF' |
| 60 | +manifest: |
| 61 | + projects: |
| 62 | + - name: executorch |
| 63 | + url: https://github.com/pytorch/executorch |
| 64 | + revision: main |
| 65 | + path: modules/lib/executorch |
| 66 | +EOF |
| 67 | + |
| 68 | +west config manifest.project-filter -- -.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u |
| 69 | +west update |
| 70 | +``` |
| 71 | + |
| 72 | +For Alif boards, also add the Alif HAL: |
| 73 | + |
| 74 | +```bash |
| 75 | +west config manifest.project-filter -- -.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u,+hal_alif |
| 76 | +west update |
| 77 | +``` |
| 78 | + |
| 79 | +## Step 3: Install ExecuTorch and Arm Tools |
| 80 | + |
| 81 | +```bash |
| 82 | +cd modules/lib/executorch |
| 83 | +git submodule sync && git submodule update --init --recursive |
| 84 | +./install_executorch.sh |
| 85 | +cd ../../.. |
| 86 | +``` |
| 87 | + |
| 88 | +Install the Arm toolchain, Vela compiler, and Corstone FVPs: |
| 89 | + |
| 90 | +```bash |
| 91 | +modules/lib/executorch/examples/arm/setup.sh --i-agree-to-the-contained-eula |
| 92 | +source modules/lib/executorch/examples/arm/arm-scratch/setup_path.sh |
| 93 | +``` |
| 94 | + |
| 95 | +## Step 4: Export the MobileNetV2 Model |
| 96 | + |
| 97 | +Export a quantized INT8 MobileNetV2 with Ethos-U delegation. Choose the target |
| 98 | +that matches your hardware: |
| 99 | + |
| 100 | +**For Alif E8 (Ethos-U55 with 256 MACs):** |
| 101 | + |
| 102 | +```bash |
| 103 | +python -m modules.lib.executorch.backends.arm.scripts.aot_arm_compiler \ |
| 104 | + --model_name=mv2_untrained \ |
| 105 | + --quantize --delegate \ |
| 106 | + --target=ethos-u55-256 \ |
| 107 | + --output=mv2_ethosu.pte |
| 108 | +``` |
| 109 | + |
| 110 | +**For Corstone-320 FVP (Ethos-U85 with 256 MACs):** |
| 111 | + |
| 112 | +```bash |
| 113 | +python -m modules.lib.executorch.backends.arm.scripts.aot_arm_compiler \ |
| 114 | + --model_name=mv2_untrained \ |
| 115 | + --quantize --delegate \ |
| 116 | + --target=ethos-u85-256 \ |
| 117 | + --output=mv2_u85_256.pte |
| 118 | +``` |
| 119 | + |
| 120 | +The `--delegate` flag routes all compatible ops through the Ethos-U backend. |
| 121 | +The Vela compiler converts the TOSA intermediate representation into an |
| 122 | +optimized command stream for the NPU. Use `mv2` instead of `mv2_untrained` for |
| 123 | +meaningful predictions (requires torchvision pretrained weights). |
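The two export commands above differ only in `--target` and `--output`, so they can be generated from one place. The following is a minimal sketch, not part of ExecuTorch: `export_command` is a hypothetical helper, and it uses only the module path and flags shown in this step.

```python
import shlex

# Module path and flags are copied from the export commands in this step.
AOT_MODULE = "modules.lib.executorch.backends.arm.scripts.aot_arm_compiler"


def export_command(target: str, output: str, model: str = "mv2_untrained") -> list[str]:
    """Build the argv list for one export run (hypothetical helper)."""
    return [
        "python", "-m", AOT_MODULE,
        f"--model_name={model}",
        "--quantize", "--delegate",
        f"--target={target}",
        f"--output={output}",
    ]


# Print a copy-pasteable command line for each target used in this tutorial.
for target, output in [("ethos-u55-256", "mv2_ethosu.pte"),
                       ("ethos-u85-256", "mv2_u85_256.pte")]:
    print(shlex.join(export_command(target, output)))
```

To run an export rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` from the workspace root.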
| 124 | + |
| 125 | +## Step 5: Build the Zephyr Application |
| 126 | + |
| 127 | +**For Alif E8:** |
| 128 | + |
| 129 | +```bash |
| 130 | +west build -b alif_e8_dk/ae822fa0e5597xx0/rtss_hp \ |
| 131 | + -S ethos-u55-enable \ |
| 132 | + modules/lib/executorch/zephyr/samples/mv2-ethosu -- \ |
| 133 | + -DET_PTE_FILE_PATH=mv2_ethosu.pte |
| 134 | +``` |
| 135 | + |
| 136 | +**For Corstone-320 FVP:** |
| 137 | + |
| 138 | +```bash |
| 139 | +west build -b mps4/corstone320/fvp \ |
| 140 | + modules/lib/executorch/zephyr/samples/mv2-ethosu -- \ |
| 141 | + -DET_PTE_FILE_PATH=mv2_u85_256.pte |
| 142 | +``` |
| 143 | + |
| 144 | +## Step 6a: Run on Corstone-320 FVP |
| 145 | + |
| 146 | +Set up the FVP paths and run: |
| 147 | + |
| 148 | +```bash |
| 149 | +export FVP_ROOT=$PWD/modules/lib/executorch/examples/arm/arm-scratch/FVP-corstone320 |
| 150 | +export ARMFVP_BIN_PATH=${FVP_ROOT}/models/Linux64_GCC-9.3 |
| 151 | +export LD_LIBRARY_PATH=${FVP_ROOT}/python/lib:${ARMFVP_BIN_PATH}:${LD_LIBRARY_PATH} |
| 152 | +export ARMFVP_EXTRA_FLAGS="-C mps4_board.uart0.shutdown_on_eot=1 -C mps4_board.subsystem.ethosu.num_macs=256" |
| 153 | + |
| 154 | +west build -t run |
| 155 | +``` |
| 156 | + |
| 157 | +The FVP simulates MV2 inference cycle-accurately, so a run takes 10-20 minutes of |
| 158 | +wall-clock time. You should see output like: |
| 159 | + |
| 160 | +``` |
| 161 | +======================================== |
| 162 | +ExecuTorch MobileNetV2 Classification Demo |
| 163 | +======================================== |
| 164 | +Ethos-U backend registered successfully |
| 165 | +Model loaded, has 1 methods |
| 166 | +Inference completed in <N> ms |
| 167 | +--- Classification Results --- |
| 168 | +Top-5 predictions: |
| 169 | + [1] class <id>: <score> |
| 170 | + ... |
| 171 | +MobileNetV2 Demo Complete |
| 172 | +======================================== |
| 173 | +``` |
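If you capture the console output to a file, the inference time and predictions can be pulled out with a short script. This is a sketch under assumptions: the regular expressions are guessed from the sample output above (the real firmware may format numbers differently), `parse_log` is a hypothetical helper, and the sample values are made up for illustration.

```python
import re

# Assumed line formats, based on the sample output above; adjust the patterns
# if the firmware prints floats differently or adds extra fields.
TIME_RE = re.compile(r"Inference completed in (\d+(?:\.\d+)?) ms")
PRED_RE = re.compile(r"\[(\d+)\] class (\d+): (-?\d+(?:\.\d+)?)")


def parse_log(text: str):
    """Return (elapsed_ms or None, [(rank, class_id, score), ...]) from a console log."""
    time_match = TIME_RE.search(text)
    elapsed_ms = float(time_match.group(1)) if time_match else None
    preds = [(int(r), int(c), float(s)) for r, c, s in PRED_RE.findall(text)]
    return elapsed_ms, preds


# Made-up sample values, for illustration only.
sample = """Inference completed in 19 ms
Top-5 predictions:
  [1] class 207: 0.81
  [2] class 208: 0.07
"""
print(parse_log(sample))
```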
| 174 | + |
| 175 | +## Step 6b: Flash and Run on Alif E8 |
| 176 | + |
| 177 | +### Flash with Alif SE Tools |
| 178 | + |
| 179 | +The Alif SE Tools (`app-gen-toc.py` and `app-write-mram.py`) program the |
| 180 | +binary into the E8's MRAM. You need a `zephyr.json` configuration file. |
| 181 | + |
| 182 | +Create `zephyr.json` in the build output directory: |
| 183 | + |
| 184 | +```bash |
| 185 | +cat > build/zephyr.json << 'EOF' |
| 186 | +{ |
| 187 | + "binary": "zephyr.bin", |
| 188 | + "mramAddress": "0x80008000", |
| 189 | + "cpu": "M55_HP" |
| 190 | +} |
| 191 | +EOF |
| 192 | +``` |
| 193 | + |
| 194 | +> **Important:** Use `mramAddress: "0x80008000"` (FLASH_LOAD_OFFSET=0x8000), |
| 195 | +> **not** the default `0x80200000`. The default offset does not leave enough |
| 196 | +> MRAM for the ~3.5 MB MV2 model blob. |
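The relationship between the two addresses in the note above can be checked with a few lines of arithmetic. This is a minimal sketch: the MRAM base of `0x80000000` is inferred from the address and offset quoted above rather than taken from Alif documentation, and all names are illustrative.

```python
# MRAM base inferred from 0x80008000 - 0x8000 (assumption, not from Alif docs).
MRAM_BASE = 0x8000_0000
FLASH_LOAD_OFFSET = 0x8000       # offset used in this tutorial
DEFAULT_ADDRESS = 0x8020_0000    # default mramAddress quoted above


def mram_address(offset: int) -> str:
    """Format the absolute MRAM address for a load offset, as used in zephyr.json."""
    return f"0x{MRAM_BASE + offset:08X}"


# MRAM between the tutorial's load address and the default load address.
reclaimed = DEFAULT_ADDRESS - (MRAM_BASE + FLASH_LOAD_OFFSET)

print(mram_address(FLASH_LOAD_OFFSET))
print(f"{reclaimed} bytes ({reclaimed / 2**20:.2f} MiB) gained versus the default")
```

Loading at the lower address frees just under 2 MiB of MRAM compared with the default, which is the room the model blob needs.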
| 197 | + |
| 198 | +Generate the table of contents and flash: |
| 199 | + |
| 200 | +```bash |
| 201 | +cd build |
| 202 | +python /path/to/alif-se-tools/app-gen-toc.py |
| 203 | +python /path/to/alif-se-tools/app-write-mram.py |
| 204 | +cd .. |
| 205 | +``` |
| 206 | + |
| 207 | +### Connect Serial Console |
| 208 | + |
| 209 | +Connect to UART4 at 115200 baud. On Linux: |
| 210 | + |
| 211 | +```bash |
| 212 | +picocom -b 115200 /dev/ttyUSB0 |
| 213 | +``` |
| 214 | + |
| 215 | +Press the reset button on the E8 DevKit. You should see the classification |
| 216 | +output within a few seconds (~19 ms inference). |
| 217 | + |
| 218 | +## Troubleshooting |
| 219 | + |
| 220 | +| Symptom | Cause | Fix | |
| 221 | +|---------|-------|-----| |
| 222 | +| Linker: `region 'FLASH' overflowed` | Model PTE too large for ITCM | Use the DDR overlay (FVP) or verify mramAddress (Alif) | |
| 223 | +| Linker: `region 'RAM' overflowed` | Pools + model copy exceed SRAM | Set `CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y` to skip the SRAM copy | |
| 224 | +| FVP hangs after "Ethos-U backend registered" | Cycle-accurate MV2 simulation is slow | Wait 10-20 min, or use Corstone-320 (faster than 300) | |
| 225 | +| No serial output on Alif | Wrong UART or baud rate | Use UART4 at 115200 baud | |
| 226 | +| `app-write-mram.py` fails | Wrong mramAddress | Use `0x80008000`, not `0x80200000` | |
| 227 | +| Runtime: method allocator OOM | Pool size too small | Increase `CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE` in board conf | |
| 228 | + |
| 229 | +## Memory Layout |
| 230 | + |
| 231 | +| Region | Corstone-320 FVP | Alif E8 | |
| 232 | +|--------|-----------------|---------| |
| 233 | +| Code + .rodata | ITCM (512 KB) | MRAM | |
| 234 | +| .data + .bss + pools | ISRAM (4 MB) | HP SRAM (4.5 MB) | |
| 235 | +| Model PTE (~3.5 MB) | DDR (16 MB, via overlay) | MRAM (DMA-accessible) | |
| 236 | +| NPU delegation | Ethos-U85 (256 MACs) | Ethos-U55 (256 MACs) | |
| 237 | + |
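The table also explains the `region 'FLASH' overflowed` linker error: the model is several times larger than ITCM. The sketch below checks a model size against the FVP region sizes from the table. The helper names are hypothetical, and the check ignores everything else that shares a region (code, pools, `.bss`), so treat a "fits" result as necessary rather than sufficient.

```python
import os

KIB, MIB = 1024, 1024 * 1024

# Region sizes copied from the Memory Layout table (Corstone-320 FVP column).
FVP_REGIONS = {"ITCM": 512 * KIB, "ISRAM": 4 * MIB, "DDR": 16 * MIB}


def regions_that_fit(pte_size: int, regions: dict[str, int]) -> list[str]:
    """Return the names of regions large enough to hold pte_size bytes."""
    return [name for name, size in regions.items() if pte_size <= size]


def pte_size(path: str) -> int:
    """Size in bytes of an exported .pte file on disk."""
    return os.path.getsize(path)


# ~3.5 MB is the approximate MV2 blob size quoted in this tutorial.
approx_mv2 = int(3.5 * MIB)
print(regions_that_fit(approx_mv2, FVP_REGIONS))
```

For a real export, pass `pte_size("mv2_u85_256.pte")` in place of the approximate figure. ISRAM appears in the result only because the raw blob is under 4 MB; it must also hold `.data`, `.bss`, and the allocator pools, which is why the table places the model in DDR.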
| 238 | +## Next Steps |
| 239 | + |
| 240 | +- Swap `mv2_untrained` for `mv2` (with torchvision) to get real ImageNet predictions |
| 241 | +- Try other models such as `resnet18`, or bring your own `.py` model file |
| 242 | +- Explore the [hello-executorch sample](https://github.com/pytorch/executorch/tree/main/zephyr/samples/hello-executorch) for a minimal starting point |
| 243 | +- See the [Ethos-U Getting Started tutorial](backends/arm-ethos-u/tutorials/ethos-u-getting-started.md) for the baremetal (non-Zephyr) flow |