Add /zephyr skill for ExecuTorch on Zephyr embedded boards (#19208)

psiddh · claude · web-flow · commit d8da621c4761 · 2026-04-29T13:03:23.000-07:00
## Summary

Adds a `/zephyr` Claude Code skill for building and configuring
ExecuTorch as a Zephyr RTOS module on embedded boards.

Three files:
- **SKILL.md** — workspace setup, building with west, ET-specific Zephyr
concepts (model embedding, allocator pools, DMA accessibility, selective
ops), key files reference, supported boards table
- **board_bringup.md** — adding a new board: DTS overlays, Kconfig
confs, linker snippets for custom memory regions, pool sizing strategy,
memory budget formula
- **memory_debugging.md** — diagnosing link-time overflow (orphan
sections, region overflow) and runtime allocation failures, common
patterns and fixes, useful inspection commands

Complements existing skills: `/cortex-m` (bare-metal path), `/building`
(general C++ build), `/export` (model conversion).

## Test plan
- [x] Skill appears in Claude Code skill list after creation
- [x] Content derived from hands-on Zephyr workspace setup + MV2 build
debugging session
- [x] Follows existing skill structure (`/cortex-m`, `/qualcomm` as
templates)

Co-authored-by: Claude <noreply@anthropic.com>
diff --git a/.claude/skills/zephyr/SKILL.md b/.claude/skills/zephyr/SKILL.md
@@ -0,0 +1,201 @@
+---
+name: zephyr
+description: Build and configure ExecuTorch as a Zephyr RTOS module for embedded boards. Use when setting up a Zephyr workspace with ET, adding board support (overlays, confs, memory layout), building with west, or debugging linker memory overflow.
+---
+
+# ExecuTorch on Zephyr
+
+## When to use this skill
+
+- Setting up a Zephyr workspace with ExecuTorch as a module
+- Adding or configuring a board for an ET Zephyr sample
+- Debugging `west build` failures (linker overflow, section placement, CMake/Python issues)
+- Sizing allocator pools for a specific model + board combination
+
+## When to use a different skill
+
+| Need | Skill |
+|------|-------|
+| Export a model to .pte | `/export` |
+| Bare-metal Cortex-M (no RTOS) | `/cortex-m` |
+| General ET C++ build (not Zephyr) | `/building` |
+| Backend op support / known issues | `/executorch-kb` |
+
+## Advanced Topics
+
+| Topic | File | When to read |
+|-------|------|--------------|
+| Adding a new board | `board_bringup.md` | User wants to add overlay, conf, linker snippets for a new board |
+| Memory overflow debugging | `memory_debugging.md` | Build fails with region overflow, or runtime allocation failure |
+
+## Architecture
+
+ExecuTorch integrates as a **Zephyr external module** via `zephyr/module.yml`. The module exposes ET libraries (runtime, kernels, backends) as Zephyr CMake targets that applications link against.
+
+```
+zephyr_workspace/
+├── zephyr/                      # Zephyr kernel
+│   └── submanifests/
+│       └── executorch.yaml      # pulls ET as a west project
+├── modules/lib/executorch/      # ET source (or symlink for dev)
+│   └── zephyr/
+│       ├── module.yml           # declares ET as Zephyr module
+│       ├── CMakeLists.txt       # top-level Zephyr-aware build
+│       └── samples/
+│           ├── hello-executorch/
+│           └── mv2-ethosu/
+└── build/                       # west build output
+```
+
+## Setup
+
+### 1. Create Zephyr workspace
+
+```bash
+mkdir zephyr_workspace && cd zephyr_workspace
+python3 -m venv .venv && source .venv/bin/activate
+pip install west "cmake<4.0.0" pyelftools ninja jsonschema
+
+west init --manifest-rev v4.3.0
+```
+
+### 2. Add ExecuTorch as a module
+
+Create `zephyr/submanifests/executorch.yaml` with the manifest snippet (see
+`zephyr/README.md` in the ET repo for the canonical content), or copy it from
+an existing ET checkout:
+
+```bash
+# From an existing ET checkout (before west update):
+cp /path/to/your/executorch/zephyr/executorch.yaml zephyr/submanifests/
+```
+
+For local development, run `west update` first, then replace the fetched module with a symlink to your ET checkout:
+
+```bash
+west config manifest.project-filter -- '-.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u'
+west update
+rm -rf modules/lib/executorch
+ln -s /path/to/your/executorch modules/lib/executorch
+```
+
+### 3. Install ExecuTorch
+
+```bash
+cd modules/lib/executorch
+git submodule sync && git submodule update --init --recursive
+./install_executorch.sh
+cd ../../..
+```
+
+### 4. Install Zephyr SDK
+
+```bash
+wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/zephyr-sdk-0.17.4_linux-x86_64_minimal.tar.xz
+tar xf zephyr-sdk-0.17.4_linux-x86_64_minimal.tar.xz
+wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/toolchain_linux-x86_64_arm-zephyr-eabi.tar.xz
+tar xf toolchain_linux-x86_64_arm-zephyr-eabi.tar.xz -C zephyr-sdk-0.17.4/
+./zephyr-sdk-0.17.4/setup.sh -c -t arm-zephyr-eabi
+export ZEPHYR_SDK_INSTALL_DIR=$(realpath ./zephyr-sdk-0.17.4)
+```
+
+### 5. Install Ethos-U tools (if targeting NPU boards)
+
+```bash
+modules/lib/executorch/examples/arm/setup.sh --i-agree-to-the-contained-eula
+source modules/lib/executorch/examples/arm/arm-scratch/setup_path.sh
+```
+
+This installs the Vela compiler and the Corstone FVP binaries.
+
+## Building
+
+### Basic build
+
+```bash
+west build -b <board> modules/lib/executorch/zephyr/samples/<sample> -- \
+    -DET_PTE_FILE_PATH=<path/to/model.pte>
+```
+
+### Build and run on FVP
+
+```bash
+west build -b mps3/corstone300/fvp modules/lib/executorch/zephyr/samples/mv2-ethosu -t run -- \
+    -DET_PTE_FILE_PATH=mv2_u55_128.pte
+```
+
+### Force correct Python
+
+If CMake picks up the wrong Python (common on systems with multiple interpreters):
+
+```bash
+west build ... -- -DPython3_EXECUTABLE=$(which python3)
+```
+
+### Clean rebuild
+
+```bash
+rm -rf build && west build ...
+```
+
+## ET-Specific Zephyr Concepts
+
+### Model embedding
+
+`pte_to_header.py` converts a `.pte` file into a C header with the model bytes placed in a named section (default: `network_model_sec`). The section name is controlled by `ET_PTE_SECTION` in CMakeLists.txt.
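As a rough illustration of what placing model bytes in a named section means, the sketch below renders a byte string as a C array carrying a section attribute. It is not the real `pte_to_header.py`: the function name `bytes_to_c_header`, the symbol name `model_pte`, and the 16-byte alignment are assumptions made for this example only.

```python
def bytes_to_c_header(data: bytes,
                      section: str = "network_model_sec",
                      symbol: str = "model_pte") -> str:
    """Render `data` as a C array tagged with a linker section attribute.

    Hypothetical helper for illustration; the real script's output
    format may differ.
    """
    rows = []
    for offset in range(0, len(data), 12):
        chunk = data[offset:offset + 12]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f'__attribute__((section("{section}"), aligned(16)))\n'
        f"const unsigned char {symbol}[] = {{\n{body}\n}};\n"
    )


if __name__ == "__main__":
    # Three placeholder bytes stand in for a real .pte payload.
    print(bytes_to_c_header(b"\x00\x01\xff"))
```

Because the array lives in its own section, a linker snippet can route that section to a specific memory region without touching the rest of the image.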
+
+### Allocator pools
+
+ET uses up to three memory pools at runtime. The method and temp pools are sized via Zephyr Kconfig; the fast scratch pool, used only by some samples, is sized by a compile-time macro.
+
+| Pool | Setting | Purpose |
+|------|---------|---------|
+| Method allocator | `CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE` (Kconfig) | Planned buffers + input/output tensors |
+| Temp allocator | `CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE` (Kconfig) | Delegate scratch (e.g., Ethos-U scratch buffer) |
+| Fast scratch | `ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE` (compile macro) | Small Ethos-U fast memory |
+
+Defaults are sample- and model-dependent — check the specific sample's `prj.conf`, board `.conf`, and Kconfig definitions rather than assuming a single repo-wide default.
+
+Pool sizing depends on the model. To find required sizes:
+1. Build and run — runtime errors report exactly how many bytes were requested vs available
+2. The method pool must hold the largest planned buffer + all input tensors
+3. The temp pool must hold the delegate's scratch buffer (model-dependent)
+4. If used, fast scratch must cover the backend's small fast-memory requirement
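Rule 2 above amounts to simple arithmetic. The sketch below gives a first estimate only, assuming the planned-buffer and input-tensor sizes are already known. The function names and the planned-buffer size are hypothetical, and the byte count reported by a runtime allocation error remains the number to trust.

```python
KIB = 1024


def method_pool_estimate(planned_buffers, input_tensors):
    """Largest planned buffer plus all input tensors, in bytes."""
    return max(planned_buffers, default=0) + sum(input_tensors)


def round_up(size, multiple=KIB):
    """Round size up to a whole multiple (convenient for Kconfig values)."""
    return -(-size // multiple) * multiple


# A 1x3x224x224 float32 input tensor occupies 3 * 224 * 224 * 4 bytes.
image_input = 3 * 224 * 224 * 4
planned = [1_000_000]  # hypothetical planned-buffer size, not measured
estimate = round_up(method_pool_estimate(planned, [image_input]))

if __name__ == "__main__":
    print(f"CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE={estimate}")
```

Treat the result as a starting value for the board `.conf`, then adjust it from what the runtime reports.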
+
+### DMA accessibility
+
+NPU backends (Ethos-U) require model data and scratch buffers in DMA-accessible memory. Which regions are DMA-accessible depends on the board:
+
+| Board | DMA-accessible regions |
+|-------|----------------------|
+| Corstone-300 FVP | ISRAM (0x31000000), DDR (0x60000000+) |
+| Corstone-320 FVP | ISRAM (0x31000000), DDR (0x70000000+) |
+| Alif Ensemble | MRAM (model in-place), SRAM |
+
+When `CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y` is set in the board conf (which defines `ET_ARM_MODEL_PTE_DMA_ACCESSIBLE` for the C++ sources), the runtime skips copying the model blob to a writable SRAM buffer. Use this when the model already resides in DMA-accessible memory (DDR, MRAM). Note: this Kconfig symbol is defined per-sample (e.g., in `mv2-ethosu/Kconfig`), not globally.
+
+### Selective ops build
+
+`gen_oplist.py` reads the .pte file to determine which ops are needed and generates a selective kernel build. If the model is fully NPU-delegated, no portable ops are built. If fallback ops exist (e.g., `aten::` ops not handled by the delegate), only those specific kernels are compiled.
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `zephyr/module.yml` | Declares ET as a Zephyr module |
+| `zephyr/CMakeLists.txt` | Top-level Zephyr-aware CMake (builds ET libs as Zephyr targets) |
+| `zephyr/Kconfig` | Root Kconfig for ET module (build options, portable ops toggle) |
+| `zephyr/executorch.yaml` | West submanifest — pulls ET + dependencies |
+| `zephyr/samples/*/CMakeLists.txt` | Per-sample build (model embedding, op selection, pool sizing) |
+| `zephyr/samples/*/boards/*.overlay` | Board-specific DTS overlays (memory regions, chosen nodes) |
+| `zephyr/samples/*/boards/*.conf` | Board-specific Kconfig (drivers, pool sizes, DMA flags) |
+| `codegen/tools/gen_oplist.py` | Reads .pte to generate selective op list |
+| `examples/arm/executor_runner/pte_to_header.py` | Converts .pte to C header with section attribute |
+
+## Supported Boards
+
+| Board | NPU | Memory | Status |
+|-------|-----|--------|--------|
+| mps3/corstone300/fvp | Ethos-U55 | 512K ITCM + 2M ISRAM | hello-executorch (full), MV2 (build-only) |
+| mps4/corstone320/fvp | Ethos-U85 | 4M ISRAM + 2M SRAM | hello-executorch (full), MV2 (full) |
+| alif_e8_dk | Ethos-U55-256 | 4M MRAM + 2M SRAM | MV2 (model runs from MRAM in-place) |
diff --git a/.claude/skills/zephyr/board_bringup.md b/.claude/skills/zephyr/board_bringup.md
@@ -0,0 +1,151 @@
+# Adding a New Board
+
+## What you need
+
+1. A Zephyr BSP for the board (upstream or custom)
+2. The board's memory map (from datasheet or DTS)
+3. A .pte model file to embed
+
+## Files to create
+
+For a sample called `my-sample` targeting board `my_board`:
+
+```
+zephyr/samples/my-sample/
+├── boards/
+│   ├── my_board.overlay    # DTS: memory regions, chosen nodes
+│   └── my_board.conf       # Kconfig: drivers, pool sizes
+├── CMakeLists.txt           # Build: model embedding, ops, linking
+├── Kconfig                  # Sample-level config options
+├── prj.conf                 # Default config (all boards)
+└── src/main.cpp             # Application code
+```
+
+If adding a board to an **existing** sample (e.g., `mv2-ethosu`), you only need the `boards/` files.
+
+## Step 1: DTS overlay
+
+The overlay configures memory regions for your board. Key decisions:
+
+### Which `chosen` nodes to set
+
+```dts
+/ {
+    chosen {
+        zephyr,sram = &my_sram;   /* where .data/.bss land (allocator pools) */
+        /* zephyr,flash is usually set by the board DTS already */
+    };
+};
+```
+
+**Rule:** `zephyr,sram` must point to a contiguous RAM region large enough to hold
+the allocator pools. If the board's default region is too small, override it in the overlay.
+
+**Warning:** If `zephyr,flash` and `zephyr,sram` point to the same physical memory,
+Zephyr's non-XIP linker layout places .text and .data back to back in that region. This
+works, but code and data then share the region's capacity.
+
+### Adding a DDR / external memory region
+
+If the model blob is too large for on-chip memory, place it in external memory:
+
+```dts
+/ {
+    model_ddr: memory@70000000 {
+        compatible = "zephyr,memory-region", "mmio-sram";
+        reg = <0x70000000 DT_SIZE_M(16)>;
+        zephyr,memory-region = "MODEL_DDR";
+    };
+};
+```
+
+Then create a linker snippet (`model_section.ld.in`) to route the model section there.
+See `zephyr/samples/mv2-ethosu/model_section.ld.in` for the template.
+
+The CMakeLists.txt detects the DTS node and generates the linker snippet:
+
+```cmake
+dt_nodelabel(model_ddr_path NODELABEL "model_ddr")
+if(model_ddr_path)
+  configure_file(model_section.ld.in ${CMAKE_CURRENT_BINARY_DIR}/model_section.ld @ONLY)
+  zephyr_linker_sources(SECTIONS ${CMAKE_CURRENT_BINARY_DIR}/model_section.ld)
+endif()
+```
+
+### DMA accessibility check
+
+If using an NPU (Ethos-U), the model data and scratch buffers must be in
+DMA-accessible memory. Check the board's TRM (Technical Reference Manual) for
+which memory regions the NPU's DMA engine can reach.
+
+Common patterns:
+- **FVP boards**: ISRAM and DDR are DMA-accessible; DTCM is not
+- **Alif Ensemble**: MRAM and SRAM are DMA-accessible
+- **General rule**: tightly coupled memories (TCM) are usually NOT DMA-accessible
+
+## Step 2: Kconfig conf
+
+```kconfig
+# Enable NPU driver (if applicable)
+CONFIG_ETHOS_U=y
+
+# Skip SRAM model copy if model is in DMA-accessible memory.
+# This Kconfig symbol is defined per-sample (e.g., mv2-ethosu/Kconfig),
+# not globally. If your sample doesn't define it, add the Kconfig entry
+# or pass -DET_ARM_MODEL_PTE_DMA_ACCESSIBLE via CMake directly.
+CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y
+
+# Pool sizes — adjust based on model requirements
+# Run a build first, then tune from runtime error messages
+CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE=1572864
+CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE=1572864
+```
+
+### Pool sizing strategy
+
+1. Start with the sample's defaults (check `prj.conf` and board `.conf` — varies by sample)
+2. Build and run — if allocation fails, the error tells you exactly what was requested
+3. **Method pool**: must hold largest planned buffer + all input tensors
+4. **Temp pool**: must hold delegate scratch buffer (varies by model and backend)
+5. Total pools must fit in the `zephyr,sram` region minus ~112 KiB overhead (stack, heap, .data/.bss)
+
+### Memory budget formula
+
+```
+available_for_pools = zephyr_sram_size - code_if_shared - stack - heap - bss_overhead
+```
+
+Where:
+- `code_if_shared`: 0 if flash and sram are separate regions; .text size if they share a region
+- `stack`: `CONFIG_MAIN_STACK_SIZE` (default 16 KiB)
+- `heap`: `CONFIG_HEAP_MEM_POOL_SIZE` (default 64 KiB)
+- `bss_overhead`: ~30 KiB for ET runtime + Zephyr kernel
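The formula can be checked quickly before editing a board `.conf`. This is a minimal sketch that uses the default values listed above; the function names are made up for the example, and the real overhead should be read from the build's memory map rather than assumed.

```python
KIB = 1024
MIB = 1024 * KIB


def available_for_pools(sram_size, code_if_shared=0, stack=16 * KIB,
                        heap=64 * KIB, bss_overhead=30 * KIB):
    """Bytes left for allocator pools, following the budget formula."""
    return sram_size - code_if_shared - stack - heap - bss_overhead


def pools_fit(sram_size, method_pool, temp_pool, **overheads):
    """True if both Kconfig pools fit in the remaining budget."""
    return method_pool + temp_pool <= available_for_pools(sram_size, **overheads)


if __name__ == "__main__":
    # Two 1.5 MiB pools (1572864 bytes each), flash and sram separate.
    for size in (2 * MIB, 4 * MIB):
        print(size, available_for_pools(size),
              pools_fit(size, 1572864, 1572864))
```

With these defaults, two 1.5 MiB pools do not fit in a 2 MiB region but do fit in a 4 MiB one, which is the sort of mismatch worth catching before the linker reports an overflow.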
+
+## Step 3: Build and verify
+
+```bash
+west build -b my_board modules/lib/executorch/zephyr/samples/my-sample -- \
+    -DET_PTE_FILE_PATH=model.pte
+```
+
+Check the memory map in the build output:
+
+```
+Memory region         Used Size  Region Size  %age Used
+           FLASH:      459668 B       512 KB     87.67%
+             RAM:     1963160 B         2 MB     93.61%
+       MODEL_DDR:     3541440 B        16 MB     21.11%
+```
+
+**Green flags:** all regions under 95%, model in the expected region.
+**Red flags:** any region near 100%, orphan section warnings, overflow errors.
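The 95% check can be automated against the linker's summary table. The sketch below parses the table shown above with a regular expression; it assumes the output keeps this column layout, and the function names are hypothetical.

```python
import re

SAMPLE = """\
Memory region         Used Size  Region Size  %age Used
           FLASH:      459668 B       512 KB     87.67%
             RAM:     1963160 B         2 MB     93.61%
       MODEL_DDR:     3541440 B        16 MB     21.11%
"""

# One table row: name, used size + unit, region size + unit, percentage.
ROW = re.compile(
    r"^\s*(\w+):\s+(\d+)\s+(\w+)\s+(\d+)\s+(\w+)\s+([\d.]+)%\s*$"
)


def parse_memory_map(text):
    """Return {region_name: percent_used} for each row of the table."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            usage[match.group(1)] = float(match.group(6))
    return usage


def regions_over(usage, threshold=95.0):
    """Names of regions at or above the threshold, sorted."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


if __name__ == "__main__":
    usage = parse_memory_map(SAMPLE)
    print(usage)
    print("over 95%:", regions_over(usage))
```

The header line is skipped because it has no `NAME:` prefix, so the function can be fed the raw build output.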
+
+## Reference: existing board configs
+
+| Board | Overlay | Conf | Notes |
+|-------|---------|------|-------|
+| Corstone-300 | `mps3_corstone300_fvp.overlay` | `mps3_corstone300_fvp.conf` | ISRAM for sram, DDR for model, reduced pools |
+| Corstone-320 | `mps4_corstone320_fvp.overlay` | `mps4_corstone320_fvp.conf` | Shared ISRAM for flash+sram, DDR for model |
+| Alif E8 | (no overlay needed) | (no conf needed) | MRAM holds model in-place, defaults work |
+
+Use these as templates — copy the closest match and adapt the memory addresses and sizes.
diff --git a/.claude/skills/zephyr/memory_debugging.md b/.claude/skills/zephyr/memory_debugging.md