Place model PTE in DDR to fix FVP link-time memory overflow#19199
Place model PTE in DDR to fix FVP link-time memory overflow#19199psiddh merged 6 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19199
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 New Failure, 1 Cancelled Job, 4 Pending, 3 Unrelated FailuresAs of commit 03bf8c8 with merge base 4ac044b ( NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
Routes the MobileNetV2 model PTE blob into a new DDR memory region on Corstone-300/320 FVPs to avoid link-time overflow of ITCM/ISRAM, and updates the sample configuration to use the model blob in-place when DMA-accessible.
Changes:
- Add a Zephyr linker snippet to place
.network_model_secinto a devicetree-definedMODEL_DDRregion. - Add DTS overlays for Corstone-300/320 FVPs declaring a 16 MiB DDR region at
0x7000_0000. - Add a new Kconfig option and board configs to enable “DMA-accessible model” behavior and adjust allocator pool sizes.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
zephyr/samples/mv2-ethosu/model_section.ld |
New linker section placing model data into MODEL_DDR. |
zephyr/samples/mv2-ethosu/boards/mps4_corstone320_fvp.overlay |
Declares the MODEL_DDR memory region in DDR for Corstone-320 FVP. |
zephyr/samples/mv2-ethosu/boards/mps4_corstone320_fvp.conf |
Enables DMA-accessible model behavior and reduces allocator pools for Corstone-320. |
zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.overlay |
Adds MODEL_DDR memory region in DDR alongside existing SRAM routing override. |
zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.conf |
Enables DMA-accessible model behavior for Corstone-300 and keeps reduced pool sizes. |
zephyr/samples/mv2-ethosu/Kconfig |
Adds ET_ARM_MODEL_PTE_DMA_ACCESSIBLE option for skipping the SRAM model copy. |
zephyr/samples/mv2-ethosu/CMakeLists.txt |
Conditionally adds the linker snippet and compile definition when the new Kconfig is enabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fcfbe5c to
fb5cec0
Compare
The MV2 model PTE (~3.5 MB) overflows both the 512 KiB ITCM (FLASH) and 2 MiB ISRAM (RAM) regions on Corstone-300 FVP, and similarly on Corstone-320. Fix: declare a DDR memory region (0x7000_0000, 16 MiB) via DTS overlay on both FVP boards and route the network_model_sec linker section there via a Zephyr linker snippet. A new Kconfig option ET_ARM_MODEL_PTE_DMA_ACCESSIBLE (enabled for FVP boards) tells main.cpp to use the model blob in-place instead of memcpy-ing it into a second SRAM buffer, since the Ethos-U can DMA from DDR on the FVP. Also adds pool-size overrides for Corstone-320 (previously only had CONFIG_ETHOS_U=y). Co-authored-by: Claude <noreply@anthropic.com>
fb5cec0 to
b6366da
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Corstone-320: redirect zephyr,sram to the 4 MiB ISRAM so the ~3 MiB of allocator pools (method + Ethos-U scratch) fit alongside code. Bump pool sizes to the 1.5 MiB defaults — verified locally that the build links at FLASH 11% / RAM 78% / MODEL_DDR 23%. Corstone-300: 2 MiB ISRAM cannot hold the ~2.9 MiB of runtime pools MV2 needs regardless of pool tuning. Document this in the board conf and skip the MV2 ethos-u55 matrix entry in trunk CI. The hello-executorch sample already validates the Corstone-300 + Ethos-U55 pipeline end-to-end. Co-authored-by: Claude <noreply@anthropic.com>
pte_to_header.py generates __attribute__((section("network_model_sec")))
(no leading dot), but the linker snippet used *(.network_model_sec) which
doesn't match. The model ended up in MODEL_DDR anyway because the linker
placed the orphan into the identically-named output section, but this
produced a warning. Drop the leading dots so the pattern matches directly.
Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
MV2 cycle-accurate FVP simulation is too slow for the 120-minute CI timeout. Corstone-300 also lacks enough ISRAM for the runtime pools. Run export + build for both ethos-u55 and ethos-u85 so link/compile regressions are still caught without risking a timeout. Split the README build and run steps for both boards so CI can reference the build-only markers. Users can still run `west build -t run` manually. Co-authored-by: Claude <noreply@anthropic.com>
a9acb90 to
9d70310
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Claude <noreply@anthropic.com>
Summary
The MV2 model PTE (~3.5 MB) overflows both the 512 KiB ITCM (FLASH) and 2 MiB ISRAM (RAM) regions on Corstone-300 FVP, and similarly on Corstone-320.
Fix: declare a DDR memory region (0x7000_0000, 16 MiB) via DTS overlay on both FVP boards and route the network_model_sec linker section there via a Zephyr linker snippet. A new Kconfig option ET_ARM_MODEL_PTE_DMA_ACCESSIBLE (enabled for FVP boards) tells main.cpp to use the model blob in-place instead of memcpy-ing it into a second SRAM buffer, since the Ethos-U can DMA from DDR on the FVP.
Also adds pool-size overrides for Corstone-320 (previously only had CONFIG_ETHOS_U=y).
TestPlan
Before this fix:
ld.bfd: region
FLASH' overflowed by 3,476,864 bytes ld.bfd: regionRAM' overflowed by 3,128,920 bytesBuild fails — the 3.4 MB PTE blob can't fit in 512 KB FLASH + 2 MB ISRAM.
After this fix:
FLASH: 459,668 B / 512 KB (87.67%) ✓
RAM: 1,684,632 B / 2 MB (80.33%) ✓
MODEL_DDR: 3,541,440 B / 16 MB (21.11%) ← model blob here
Build succeeds. The FVP boots and the Ethos-U NPU initializes. Full inference would complete but takes 10-20 min of wall clock in
cycle-accurate simulation.