Commit d8da621

psiddh and claude authored
Add /zephyr skill for ExecuTorch on Zephyr embedded boards (#19208)
## Summary

Adds a `/zephyr` Claude Code skill for building and configuring ExecuTorch as a Zephyr RTOS module on embedded boards. Three files:

- **SKILL.md**: workspace setup, building with west, ET-specific Zephyr concepts (model embedding, allocator pools, DMA accessibility, selective ops), key files reference, supported boards table
- **board_bringup.md**: adding a new board (DTS overlays, Kconfig confs, linker snippets for custom memory regions, pool sizing strategy, memory budget formula)
- **memory_debugging.md**: diagnosing link-time overflow (orphan sections, region overflow) and runtime allocation failures, common patterns and fixes, useful inspection commands

Complements existing skills: `/cortex-m` (bare-metal path), `/building` (general C++ build), `/export` (model conversion).

## Test plan

- [x] Skill appears in Claude Code skill list after creation
- [x] Content derived from hands-on Zephyr workspace setup + MV2 build debugging session
- [x] Follows existing skill structure (`/cortex-m`, `/qualcomm` as templates)

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 38fa0ff commit d8da621

3 files changed

Lines changed: 521 additions & 0 deletions

.claude/skills/zephyr/SKILL.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
---
name: zephyr
description: Build and configure ExecuTorch as a Zephyr RTOS module for embedded boards. Use when setting up a Zephyr workspace with ET, adding board support (overlays, confs, memory layout), building with west, or debugging linker memory overflow.
---

# ExecuTorch on Zephyr

## When to use this skill

- Setting up a Zephyr workspace with ExecuTorch as a module
- Adding or configuring a board for an ET Zephyr sample
- Debugging `west build` failures (linker overflow, section placement, CMake/Python issues)
- Sizing allocator pools for a specific model + board combination

## When to use a different skill

| Need | Skill |
|------|-------|
| Export a model to .pte | `/export` |
| Bare-metal Cortex-M (no RTOS) | `/cortex-m` |
| General ET C++ build (not Zephyr) | `/building` |
| Backend op support / known issues | `/executorch-kb` |

## Advanced Topics

| Topic | File | When to read |
|-------|------|--------------|
| Adding a new board | `board_bringup.md` | User wants to add overlay, conf, linker snippets for a new board |
| Memory overflow debugging | `memory_debugging.md` | Build fails with region overflow, or runtime allocation failure |

## Architecture

ExecuTorch integrates as a **Zephyr external module** via `zephyr/module.yml`. The module exposes ET libraries (runtime, kernels, backends) as Zephyr CMake targets that applications link against.

```
zephyr_workspace/
├── zephyr/                          # Zephyr kernel
│   └── submanifests/
│       └── executorch.yaml          # pulls ET as a west project
├── modules/lib/executorch/          # ET source (or symlink for dev)
│   └── zephyr/
│       ├── module.yml               # declares ET as Zephyr module
│       ├── CMakeLists.txt           # top-level Zephyr-aware build
│       └── samples/
│           ├── hello-executorch/
│           └── mv2-ethosu/
└── build/                           # west build output
```

## Setup

### 1. Create Zephyr workspace

```bash
mkdir zephyr_workspace && cd zephyr_workspace
python3 -m venv .venv && source .venv/bin/activate
pip install west "cmake<4.0.0" pyelftools ninja jsonschema

west init --manifest-rev v4.3.0
```

### 2. Add ExecuTorch as a module

Create `zephyr/submanifests/executorch.yaml` with the manifest snippet (see
`zephyr/README.md` in the ET repo for the canonical content), or copy it from
an existing ET checkout:

```bash
# From an existing ET checkout (before west update):
cp /path/to/your/executorch/zephyr/executorch.yaml zephyr/submanifests/
```

For local development, symlink your ET checkout after `west update`:

```bash
west config manifest.project-filter -- '-.*,+zephyr,+executorch,+cmsis,+cmsis_6,+cmsis-nn,+hal_ethos_u'
west update
rm -rf modules/lib/executorch
ln -s /path/to/your/executorch modules/lib/executorch
```

### 3. Install ExecuTorch

```bash
cd modules/lib/executorch
git submodule sync && git submodule update --init --recursive
./install_executorch.sh
cd ../../..
```

### 4. Install Zephyr SDK

```bash
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/zephyr-sdk-0.17.4_linux-x86_64_minimal.tar.xz
tar xf zephyr-sdk-0.17.4_linux-x86_64_minimal.tar.xz
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.17.4/toolchain_linux-x86_64_arm-zephyr-eabi.tar.xz
tar xf toolchain_linux-x86_64_arm-zephyr-eabi.tar.xz -C zephyr-sdk-0.17.4/
./zephyr-sdk-0.17.4/setup.sh -c -t arm-zephyr-eabi
export ZEPHYR_SDK_INSTALL_DIR="$(realpath ./zephyr-sdk-0.17.4)"
```

### 5. Install Ethos-U tools (if targeting NPU boards)

```bash
modules/lib/executorch/examples/arm/setup.sh --i-agree-to-the-contained-eula
source modules/lib/executorch/examples/arm/arm-scratch/setup_path.sh
```

This installs the Vela compiler and the Corstone FVP binaries.

## Building

### Basic build

```bash
west build -b <board> modules/lib/executorch/zephyr/samples/<sample> -- \
    -DET_PTE_FILE_PATH=<path/to/model.pte>
```
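
When the same build is scripted for several boards or models, it can help to assemble the argument list programmatically. The following is a minimal sketch, assuming the workspace layout shown earlier; the helper name `west_build_command` and the `et_root` parameter are hypothetical and only wrap the documented command line.

```python
import shlex
from pathlib import PurePosixPath


def west_build_command(board: str, sample: str, pte_path: str,
                       et_root: str = "modules/lib/executorch") -> list[str]:
    """Assemble the argv for the basic `west build` invocation.

    Hypothetical helper: it only mirrors the documented command line.
    """
    sample_dir = PurePosixPath(et_root) / "zephyr" / "samples" / sample
    return [
        "west", "build",
        "-b", board,
        str(sample_dir),
        "--",  # arguments after this separator are forwarded to CMake
        f"-DET_PTE_FILE_PATH={pte_path}",
    ]


cmd = west_build_command("mps3/corstone300/fvp", "mv2-ethosu", "mv2_u55_128.pte")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can be handed to `subprocess.run(cmd, check=True)` from the workspace root, which avoids shell-quoting problems with model paths that contain spaces.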

### Build and run on FVP

```bash
west build -b mps3/corstone300/fvp modules/lib/executorch/zephyr/samples/mv2-ethosu -t run -- \
    -DET_PTE_FILE_PATH=mv2_u55_128.pte
```

### Force correct Python

If CMake picks up the wrong Python (common on systems with multiple interpreters), pass the interpreter explicitly:

```bash
west build ... -- -DPython3_EXECUTABLE="$(which python3)"
```

### Clean rebuild

```bash
rm -rf build && west build ...
```

## ET-Specific Zephyr Concepts

### Model embedding

`pte_to_header.py` converts a `.pte` file into a C header with the model bytes placed in a named section (default: `network_model_sec`). The section name is controlled by `ET_PTE_SECTION` in CMakeLists.txt.
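
To make the idea concrete, here is a minimal sketch of the general technique: emitting a byte array as a C definition tagged with a section attribute. It is not the output format of `pte_to_header.py`; the function name `render_model_header`, the symbol name `model_pte`, and the 16-byte alignment are assumptions chosen for illustration.

```python
def render_model_header(data: bytes, section: str = "network_model_sec",
                        symbol: str = "model_pte", align: int = 16) -> str:
    """Render bytes as a C array placed in a named linker section.

    Hypothetical helper: symbol name, alignment, and layout are illustrative
    assumptions, not the real output of pte_to_header.py.
    """
    rows = []
    for offset in range(0, len(data), 12):
        chunk = data[offset:offset + 12]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f'__attribute__((section("{section}"), aligned({align})))\n'
        f"const unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"const unsigned int {symbol}_len = {len(data)};\n"
    )


header = render_model_header(b"\x00\xff\x10\x20")
print(header)
```

Because the array carries a section name, a linker snippet can route that section to a specific memory region without touching the application code.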

### Allocator pools

ET uses up to three memory pools at runtime. The method and temp pools are sized via Zephyr Kconfig; fast scratch is a compile-time macro used only by some samples.

| Pool | Setting | Purpose |
|------|---------|---------|
| Method allocator | `CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE` (Kconfig) | Planned buffers + input/output tensors |
| Temp allocator | `CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE` (Kconfig) | Delegate scratch (e.g., Ethos-U scratch buffer) |
| Fast scratch | `ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE` (compile macro) | Small Ethos-U fast memory |

Defaults are sample- and model-dependent: check the specific sample's `prj.conf`, board `.conf`, and Kconfig definitions rather than assuming a single repo-wide default.

Pool sizing depends on the model. To find the required sizes:
1. Build and run: runtime errors report exactly how many bytes were requested vs. available
2. The method pool must hold the largest planned buffer + all input tensors
3. The temp pool must hold the delegate's scratch buffer (model-dependent)
4. If used, fast scratch must cover the backend's small fast-memory requirement
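
The method-pool rule (largest planned buffer plus all input tensors) can be turned into a quick lower-bound estimate before the first build. This is a sketch under stated assumptions: the buffer sizes are made-up numbers, the input is assumed to be a single int8 tensor of shape 1x3x224x224, and `min_method_pool` and `round_up` are hypothetical helper names.

```python
def round_up(value: int, multiple: int = 1024) -> int:
    """Round value up to the next multiple (ceiling division)."""
    return -(-value // multiple) * multiple


def min_method_pool(planned_buffers: list[int], input_tensors: list[int]) -> int:
    """Largest planned buffer plus the sum of all input tensors, in bytes."""
    return max(planned_buffers, default=0) + sum(input_tensors)


# Hypothetical model: one int8 input of shape 1x3x224x224 and two planned buffers.
input_bytes = 1 * 3 * 224 * 224          # 150528 bytes at 1 byte per element
planned = [1_000_000, 400_000]           # made-up planned-buffer sizes

lower_bound = min_method_pool(planned, [input_bytes])
print(lower_bound)                       # 1150528
print(round_up(lower_bound))             # 1150976, a candidate Kconfig value
```

The result is only a floor. The runtime error messages from an actual run remain the authoritative source for the final value.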

### DMA accessibility

NPU backends (Ethos-U) require model data and scratch buffers in DMA-accessible memory. Which regions are DMA-accessible depends on the board:

| Board | DMA-accessible regions |
|-------|------------------------|
| Corstone-300 FVP | ISRAM (0x31000000), DDR (0x60000000+) |
| Corstone-320 FVP | ISRAM (0x31000000), DDR (0x70000000+) |
| Alif Ensemble | MRAM (model in-place), SRAM |

When `CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y` is set in the board conf (which defines `ET_ARM_MODEL_PTE_DMA_ACCESSIBLE` for the C++ sources), the runtime skips copying the model blob to a writable SRAM buffer. Use this when the model already resides in DMA-accessible memory (DDR, MRAM). Note: this Kconfig symbol is defined per-sample (e.g., in `mv2-ethosu/Kconfig`), not globally.
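
A buffer is only usable by the NPU if it lies entirely inside one DMA-accessible region. The sketch below checks that for the Corstone-320 base addresses from the table above. The region sizes (4 MiB ISRAM, 16 MiB DDR window) are assumptions for illustration, and `dma_region_for` is a hypothetical helper, not part of ExecuTorch or Zephyr.

```python
from typing import Optional

MIB = 1024 * 1024

# (base, size) pairs. Base addresses come from the table above; the sizes are
# assumptions for illustration (4 MiB ISRAM, 16 MiB DDR window).
CORSTONE_320_DMA_REGIONS = {
    "ISRAM": (0x31000000, 4 * MIB),
    "DDR": (0x70000000, 16 * MIB),
}


def dma_region_for(addr: int, length: int,
                   regions: dict = CORSTONE_320_DMA_REGIONS) -> Optional[str]:
    """Return the name of the region that fully contains the buffer, or None."""
    for name, (base, size) in regions.items():
        if base <= addr and addr + length <= base + size:
            return name
    return None


print(dma_region_for(0x70000000, 3_541_440))   # model blob at the start of DDR
print(dma_region_for(0x20000000, 1024))        # address outside both regions
```

Addresses for a real build can be read from the linker map file and compared against the regions listed in the board's TRM.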

### Selective ops build

`gen_oplist.py` reads the .pte file to determine which ops the model needs and generates the op list that drives a selective kernel build. If the model is fully NPU-delegated, no portable ops are built. If fallback ops exist (e.g., `aten::` ops not handled by the delegate), only those specific kernels are compiled.
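
The selection logic amounts to a set difference: ops the model uses, minus ops the delegate handles. The sketch below illustrates only that idea with hand-written lists. It does not parse a .pte file and is not how `gen_oplist.py` is implemented; the op names and the `fallback_ops` helper are illustrative assumptions.

```python
def fallback_ops(model_ops: list[str], delegated: set[str]) -> list[str]:
    """Return the ops that still need portable kernels, de-duplicated and sorted."""
    return sorted(set(model_ops) - delegated)


# Illustrative op names only; a real list comes from the exported program.
model_ops = ["aten::convolution.out", "aten::add.out", "aten::_softmax.out",
             "aten::add.out"]

partially_delegated = fallback_ops(
    model_ops, {"aten::convolution.out", "aten::add.out"})
fully_delegated = fallback_ops(model_ops, set(model_ops))

print(partially_delegated)  # only the softmax kernel would be compiled
print(fully_delegated)      # empty list: no portable kernels needed
```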

## Key Files

| File | Purpose |
|------|---------|
| `zephyr/module.yml` | Declares ET as a Zephyr module |
| `zephyr/CMakeLists.txt` | Top-level Zephyr-aware CMake (builds ET libs as Zephyr targets) |
| `zephyr/Kconfig` | Root Kconfig for ET module (build options, portable ops toggle) |
| `zephyr/executorch.yaml` | West submanifest (pulls ET + dependencies) |
| `zephyr/samples/*/CMakeLists.txt` | Per-sample build (model embedding, op selection, pool sizing) |
| `zephyr/samples/*/boards/*.overlay` | Board-specific DTS overlays (memory regions, chosen nodes) |
| `zephyr/samples/*/boards/*.conf` | Board-specific Kconfig (drivers, pool sizes, DMA flags) |
| `codegen/tools/gen_oplist.py` | Reads .pte to generate selective op list |
| `examples/arm/executor_runner/pte_to_header.py` | Converts .pte to C header with section attribute |

## Supported Boards

| Board | NPU | Memory | Status |
|-------|-----|--------|--------|
| mps3/corstone300/fvp | Ethos-U55 | 512K ITCM + 2M ISRAM | hello-executorch (full), MV2 (build-only) |
| mps4/corstone320/fvp | Ethos-U85 | 4M ISRAM + 2M SRAM | hello-executorch (full), MV2 (full) |
| alif_e8_dk | Ethos-U55-256 | 4M MRAM + 2M SRAM | MV2 (model runs from MRAM in-place) |
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# Adding a New Board

## What you need

1. A Zephyr BSP for the board (upstream or custom)
2. The board's memory map (from datasheet or DTS)
3. A .pte model file to embed

## Files to create

For a sample called `my-sample` targeting board `my_board`:

```
zephyr/samples/my-sample/
├── boards/
│   ├── my_board.overlay    # DTS: memory regions, chosen nodes
│   └── my_board.conf       # Kconfig: drivers, pool sizes
├── CMakeLists.txt          # Build: model embedding, ops, linking
├── Kconfig                 # Sample-level config options
├── prj.conf                # Default config (all boards)
└── src/main.cpp            # Application code
```

If adding a board to an **existing** sample (e.g., `mv2-ethosu`), you only need the `boards/` files.

## Step 1: DTS overlay

The overlay configures memory regions for your board. Key decisions:

### Which `chosen` nodes to set

```dts
/ {
    chosen {
        zephyr,sram = &my_sram;  /* where .data/.bss land (allocator pools) */
        /* zephyr,flash is usually set by the board DTS already */
    };
};
```

**Rule:** `zephyr,sram` must point to a contiguous RAM region large enough to hold
the allocator pools, which is usually the largest one on the board. If the board
default is too small, override it in the overlay.

**Warning:** If `zephyr,flash` and `zephyr,sram` point to the same physical memory,
Zephyr's non-XIP linker layout places .text and .data sequentially. This works, but
code and data then share the region's capacity.

### Adding a DDR / external memory region

If the model blob is too large for on-chip memory, place it in external memory:

```dts
/ {
    model_ddr: memory@70000000 {
        compatible = "zephyr,memory-region", "mmio-sram";
        reg = <0x70000000 DT_SIZE_M(16)>;
        zephyr,memory-region = "MODEL_DDR";
    };
};
```

Then create a linker snippet (`model_section.ld.in`) to route the model section there.
See `zephyr/samples/mv2-ethosu/model_section.ld.in` for the template.

The CMakeLists.txt detects the DTS node and generates the linker snippet:

```cmake
# Resolve the `model_ddr` node label; the variable stays unset if the
# overlay does not define that node.
dt_nodelabel(model_ddr_path NODELABEL "model_ddr")
if(model_ddr_path)
  # Expand @VAR@ placeholders in the template, then add it to the link.
  configure_file(model_section.ld.in ${CMAKE_CURRENT_BINARY_DIR}/model_section.ld @ONLY)
  zephyr_linker_sources(SECTIONS ${CMAKE_CURRENT_BINARY_DIR}/model_section.ld)
endif()
```
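
When the same overlay node is needed for several boards, the unit address after `@` and the first cell of `reg` have to stay in sync by hand. The following sketch generates the node from one base address so the two cannot drift apart. The helper name `memory_region_node` is hypothetical, and the output mirrors the `model_ddr` overlay shown in this section.

```python
def memory_region_node(label: str, base: int, size_mib: int, region_name: str) -> str:
    """Render a `zephyr,memory-region` overlay node.

    Hypothetical helper: the unit address and the `reg` base are both derived
    from `base`, so they always agree.
    """
    return (
        "/ {\n"
        f"    {label}: memory@{base:x} {{\n"
        '        compatible = "zephyr,memory-region", "mmio-sram";\n'
        f"        reg = <0x{base:08x} DT_SIZE_M({size_mib})>;\n"
        f'        zephyr,memory-region = "{region_name}";\n'
        "    };\n"
        "};\n"
    )


node = memory_region_node("model_ddr", 0x70000000, 16, "MODEL_DDR")
print(node)
```

Writing the result to `boards/<board>.overlay` gives the same node as the hand-written version. The base address and size must still come from the board's memory map.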

### DMA accessibility check

If using an NPU (Ethos-U), the model data and scratch buffers must be in
DMA-accessible memory. Check the board's TRM (Technical Reference Manual) for
which memory regions the NPU's DMA engine can reach.

Common patterns:
- **FVP boards**: ISRAM and DDR are DMA-accessible; DTCM is not
- **Alif Ensemble**: MRAM and SRAM are DMA-accessible
- **General rule**: tightly coupled memories (TCM) are usually NOT DMA-accessible

## Step 2: Kconfig conf

```kconfig
# Enable NPU driver (if applicable)
CONFIG_ETHOS_U=y

# Skip SRAM model copy if model is in DMA-accessible memory.
# This Kconfig symbol is defined per-sample (e.g., mv2-ethosu/Kconfig),
# not globally. If your sample doesn't define it, add the Kconfig entry
# or pass -DET_ARM_MODEL_PTE_DMA_ACCESSIBLE via CMake directly.
CONFIG_ET_ARM_MODEL_PTE_DMA_ACCESSIBLE=y

# Pool sizes in bytes (1572864 = 1.5 MiB): adjust based on model requirements.
# Run a build first, then tune from runtime error messages.
CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE=1572864
CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE=1572864
```
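
The pool sizes are plain byte counts, which are easy to mistype. A small sketch like the one below converts KiB figures into the two Kconfig lines. The helper name `kconfig_pool_lines` is hypothetical; the symbol names are the ones used in the conf above.

```python
def kconfig_pool_lines(method_kib: int, temp_kib: int) -> str:
    """Render the two pool-size Kconfig lines from sizes given in KiB."""
    return (
        f"CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE={method_kib * 1024}\n"
        f"CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE={temp_kib * 1024}\n"
    )


# 1536 KiB = 1.5 MiB, matching the example values in the conf above.
lines = kconfig_pool_lines(1536, 1536)
print(lines)
```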

### Pool sizing strategy

1. Start with the sample's defaults (check `prj.conf` and board `.conf`; they vary by sample)
2. Build and run: if allocation fails, the error tells you exactly what was requested
3. **Method pool**: must hold largest planned buffer + all input tensors
4. **Temp pool**: must hold delegate scratch buffer (varies by model and backend)
5. Total pools must fit in the `zephyr,sram` region minus roughly 110 KiB of overhead (16 KiB stack + 64 KiB heap + ~30 KiB .data/.bss with the defaults used in the memory budget formula)

### Memory budget formula

```
available_for_pools = zephyr_sram_size - code_if_shared - stack - heap - bss_overhead
```

Where:
- `code_if_shared`: 0 if flash and sram are separate regions; .text size if they share a region
- `stack`: `CONFIG_MAIN_STACK_SIZE` (default 16 KiB)
- `heap`: `CONFIG_HEAP_MEM_POOL_SIZE` (default 64 KiB)
- `bss_overhead`: ~30 KiB for ET runtime + Zephyr kernel
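
As a worked example, the formula can be evaluated directly. The sketch below uses the defaults listed above (16 KiB stack, 64 KiB heap, ~30 KiB overhead). The 2 MiB and 4 MiB region sizes and the 459668-byte code size are illustrative inputs, not measurements of a particular board.

```python
KIB = 1024
MIB = 1024 * KIB


def available_for_pools(sram_size: int, code_if_shared: int = 0,
                        stack: int = 16 * KIB, heap: int = 64 * KIB,
                        bss_overhead: int = 30 * KIB) -> int:
    """Bytes left for the allocator pools, per the memory budget formula."""
    return sram_size - code_if_shared - stack - heap - bss_overhead


# Separate flash and sram, 2 MiB sram region.
separate = available_for_pools(2 * MIB)
# Shared 4 MiB region with an assumed 459668 bytes of code.
shared = available_for_pools(4 * MIB, code_if_shared=459668)

pools = 2 * 1572864  # two 1.5 MiB pools
print(separate, pools <= separate)  # 1984512 False
print(shared, pools <= shared)      # 3621996 True
```

With the default overheads, two 1.5 MiB pools do not fit in a 2 MiB region, so either the pools must shrink or `zephyr,sram` must point to a larger region.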

## Step 3: Build and verify

```bash
west build -b my_board modules/lib/executorch/zephyr/samples/my-sample -- \
    -DET_PTE_FILE_PATH=model.pte
```

Check the memory map in the build output:

```
Memory region         Used Size  Region Size  %age Used
           FLASH:      459668 B       512 KB     87.67%
             RAM:     1963160 B         2 MB     93.61%
       MODEL_DDR:     3541440 B        16 MB     21.11%
```

**Green flags:** all regions under 95%, model in the expected region.
**Red flags:** any region near 100%, orphan section warnings, overflow errors.
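
The 95% check can be automated by parsing the memory-usage table from the build log. This is a minimal sketch that assumes the table format shown above and 1024-based `KB`/`MB` units. The names `parse_memory_map` and `regions_over` are hypothetical helpers, not part of west or Zephyr.

```python
import re

UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
ROW = re.compile(
    r"^\s*(\w+):\s+(\d+)\s+(B|KB|MB|GB)\s+(\d+)\s+(B|KB|MB|GB)\s+[\d.]+%")


def parse_memory_map(text: str) -> dict[str, tuple[int, int]]:
    """Map region name -> (used_bytes, region_bytes); non-matching lines are skipped."""
    regions = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, used, used_unit, size, size_unit = match.groups()
            regions[name] = (int(used) * UNITS[used_unit],
                             int(size) * UNITS[size_unit])
    return regions


def regions_over(regions: dict[str, tuple[int, int]], threshold: float = 0.95) -> list[str]:
    """Names of regions whose usage exceeds the given fraction."""
    return [name for name, (used, size) in regions.items() if used / size > threshold]


SAMPLE = """\
Memory region         Used Size  Region Size  %age Used
           FLASH:      459668 B       512 KB     87.67%
             RAM:     1963160 B         2 MB     93.61%
       MODEL_DDR:     3541440 B        16 MB     21.11%
"""

regions = parse_memory_map(SAMPLE)
print(regions_over(regions))        # [] since every region is under 95%
print(regions_over(regions, 0.90))  # ['RAM']
```

A CI job could fail the build when `regions_over` returns a non-empty list, catching a creeping RAM budget before it becomes a link-time overflow.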

## Reference: existing board configs

| Board | Overlay | Conf | Notes |
|-------|---------|------|-------|
| Corstone-300 | `mps3_corstone300_fvp.overlay` | `mps3_corstone300_fvp.conf` | ISRAM for sram, DDR for model, reduced pools |
| Corstone-320 | `mps4_corstone320_fvp.overlay` | `mps4_corstone320_fvp.conf` | Shared ISRAM for flash+sram, DDR for model |
| Alif E8 | (no overlay needed) | (no conf needed) | MRAM holds model in-place, defaults work |

Use these as templates: copy the closest match and adapt the memory addresses and sizes.
