
Place model PTE in DDR to fix FVP link-time memory overflow #19199

Merged
psiddh merged 6 commits into pytorch:main from psiddh:fix-mv2-fvp-memory-overflow
Apr 29, 2026

psiddh commented Apr 28, 2026

Summary

The MV2 model PTE (~3.5 MB) overflows both the 512 KiB ITCM (FLASH) and 2 MiB ISRAM (RAM) regions on Corstone-300 FVP, and similarly on Corstone-320.

Fix: declare a DDR memory region (0x7000_0000, 16 MiB) via DTS overlay on both FVP boards and route the network_model_sec linker section there via a Zephyr linker snippet. A new Kconfig option ET_ARM_MODEL_PTE_DMA_ACCESSIBLE (enabled for FVP boards) tells main.cpp to use the model blob in-place instead of memcpy-ing it into a second SRAM buffer, since the Ethos-U can DMA from DDR on the FVP.

Also adds pool-size overrides for Corstone-320 (previously only had CONFIG_ETHOS_U=y).

Test Plan

Before this fix:

ld.bfd: region `FLASH' overflowed by 3,476,864 bytes
ld.bfd: region `RAM' overflowed by 3,128,920 bytes
Build fails: the ~3.4 MiB PTE blob cannot fit in the 512 KiB FLASH or the 2 MiB ISRAM.

After this fix:

FLASH: 459,668 B / 512 KB (87.67%) ✓
RAM: 1,684,632 B / 2 MB (80.33%) ✓
MODEL_DDR: 3,541,440 B / 16 MB (21.11%) ← model blob here
Build succeeds. The FVP boots and the Ethos-U NPU initializes. Full inference is expected to complete, but it takes 10-20 minutes of wall-clock time in cycle-accurate simulation.
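
The utilization figures in the report above can be re-derived from the byte counts. The following is a minimal sketch, not part of the PR: it assumes the region sizes are binary (512 KiB, 2 MiB, 16 MiB) and treats the MODEL_DDR usage as the approximate size of the model blob. The names `regions`, `usage_percent`, and `blob_fits` are illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB

# (bytes used, region size in bytes), copied from the post-fix linker report.
regions = {
    "FLASH": (459_668, 512 * KIB),
    "RAM": (1_684_632, 2 * MIB),
    "MODEL_DDR": (3_541_440, 16 * MIB),
}


def usage_percent(used: int, size: int) -> float:
    """Return region utilization as a percentage rounded to two decimals."""
    return round(100.0 * used / size, 2)


report = {name: usage_percent(used, size) for name, (used, size) in regions.items()}

# Assumption: the MODEL_DDR usage is (approximately) the model blob size.
blob_bytes = regions["MODEL_DDR"][0]

# Would the blob fit in each region if that region were otherwise empty?
blob_fits = {name: blob_bytes <= size for name, (_, size) in regions.items()}

for name, pct in report.items():
    print(f"{name}: {pct:.2f}% used, blob fits in empty region: {blob_fits[name]}")
```

Under these assumptions the blob alone is larger than either on-chip region, which is consistent with both pre-fix overflow errors.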

Copilot AI review requested due to automatic review settings April 28, 2026 22:26

pytorch-bot Bot commented Apr 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19199

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Cancelled Job, 4 Pending, 3 Unrelated Failures

As of commit 03bf8c8 with merge base 4ac044b:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label Apr 28, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Copilot AI left a comment

Pull request overview

Routes the MobileNetV2 model PTE blob into a new DDR memory region on Corstone-300/320 FVPs to avoid link-time overflow of ITCM/ISRAM, and updates the sample configuration to use the model blob in-place when DMA-accessible.

Changes:

  • Add a Zephyr linker snippet to place .network_model_sec into a devicetree-defined MODEL_DDR region.
  • Add DTS overlays for Corstone-300/320 FVPs declaring a 16 MiB DDR region at 0x7000_0000.
  • Add a new Kconfig option and board configs to enable “DMA-accessible model” behavior and adjust allocator pool sizes.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| zephyr/samples/mv2-ethosu/model_section.ld | New linker section placing model data into MODEL_DDR. |
| zephyr/samples/mv2-ethosu/boards/mps4_corstone320_fvp.overlay | Declares the MODEL_DDR memory region in DDR for Corstone-320 FVP. |
| zephyr/samples/mv2-ethosu/boards/mps4_corstone320_fvp.conf | Enables DMA-accessible model behavior and reduces allocator pools for Corstone-320. |
| zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.overlay | Adds MODEL_DDR memory region in DDR alongside existing SRAM routing override. |
| zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.conf | Enables DMA-accessible model behavior for Corstone-300 and keeps reduced pool sizes. |
| zephyr/samples/mv2-ethosu/Kconfig | Adds ET_ARM_MODEL_PTE_DMA_ACCESSIBLE option for skipping the SRAM model copy. |
| zephyr/samples/mv2-ethosu/CMakeLists.txt | Conditionally adds the linker snippet and compile definition when the new Kconfig is enabled. |

Comment thread zephyr/samples/mv2-ethosu/model_section.ld Outdated
Comment thread zephyr/samples/mv2-ethosu/CMakeLists.txt Outdated
psiddh force-pushed the fix-mv2-fvp-memory-overflow branch from fcfbe5c to fb5cec0 April 28, 2026 22:43
The MV2 model PTE (~3.5 MB) overflows both the 512 KiB ITCM (FLASH)
and 2 MiB ISRAM (RAM) regions on Corstone-300 FVP, and similarly on
Corstone-320.

Fix: declare a DDR memory region (0x7000_0000, 16 MiB) via DTS overlay
on both FVP boards and route the network_model_sec linker section there
via a Zephyr linker snippet.  A new Kconfig option
ET_ARM_MODEL_PTE_DMA_ACCESSIBLE (enabled for FVP boards) tells main.cpp
to use the model blob in-place instead of memcpy-ing it into a second
SRAM buffer, since the Ethos-U can DMA from DDR on the FVP.

Also adds pool-size overrides for Corstone-320 (previously only had
CONFIG_ETHOS_U=y).

Co-authored-by: Claude <noreply@anthropic.com>
psiddh force-pushed the fix-mv2-fvp-memory-overflow branch from fb5cec0 to b6366da April 28, 2026 23:15
Copilot AI review requested due to automatic review settings April 28, 2026 23:15

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Comment thread zephyr/samples/mv2-ethosu/model_section.ld.in Outdated
Corstone-320: redirect zephyr,sram to the 4 MiB ISRAM so the ~3 MiB of
allocator pools (method + Ethos-U scratch) fit alongside code.  Bump pool
sizes to the 1.5 MiB defaults — verified locally that the build links at
FLASH 11% / RAM 78% / MODEL_DDR 23%.

Corstone-300: 2 MiB ISRAM cannot hold the ~2.9 MiB of runtime pools MV2
needs regardless of pool tuning.  Document this in the board conf and skip
the MV2 ethos-u55 matrix entry in trunk CI.  The hello-executorch sample
already validates the Corstone-300 + Ethos-U55 pipeline end-to-end.

Co-authored-by: Claude <noreply@anthropic.com>
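
The ISRAM budget argument in this commit message can be checked with simple arithmetic. This is a rough sketch under stated assumptions, not code from the PR: it assumes the Corstone-320 configuration uses two pools (method and Ethos-U scratch) at 1.5 MiB each, and takes the ~2.9 MiB Corstone-300 requirement at face value. `isram_headroom` is a hypothetical helper name.

```python
MIB = 1024 * 1024


def isram_headroom(isram_bytes: int, pool_bytes: int) -> int:
    """Bytes left in ISRAM after the allocator pools (negative means they do not fit)."""
    return isram_bytes - pool_bytes


# Corstone-320: 4 MiB ISRAM; assumed two pools of 1.5 MiB each (about 3 MiB total).
c320_pools = 2 * (3 * MIB // 2)
c320_headroom = isram_headroom(4 * MIB, c320_pools)

# Corstone-300: 2 MiB ISRAM; about 2.9 MiB of runtime pools per the commit message.
c300_pools = int(2.9 * MIB)
c300_headroom = isram_headroom(2 * MIB, c300_pools)

print(f"Corstone-320 headroom: {c320_headroom / MIB:.2f} MiB")
print(f"Corstone-300 headroom: {c300_headroom / MIB:.2f} MiB")
```

A negative headroom for Corstone-300 means the pools exceed the ISRAM before any code or data is counted, which matches the statement that no amount of pool tuning makes MV2 fit there.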
@psiddh psiddh requested review from rascani and zingo April 29, 2026 05:34
pte_to_header.py generates __attribute__((section("network_model_sec")))
(no leading dot), but the linker snippet used *(.network_model_sec) which
doesn't match.  The model ended up in MODEL_DDR anyway because the linker
placed the orphan into the identically-named output section, but this
produced a warning.  Drop the leading dots so the pattern matches directly.

Co-authored-by: Claude <noreply@anthropic.com>
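
The mismatch described in this commit message comes down to a literal leading dot. The sketch below only illustrates the idea: it uses Python's `fnmatchcase` as a stand-in for the linker's glob-style input-section matching, which is an approximation, and `section_attribute` is a hypothetical helper rather than the actual pte_to_header.py code.

```python
from fnmatch import fnmatchcase


def section_attribute(name: str) -> str:
    """Format a GCC section attribute in the form quoted in the commit message."""
    return f'__attribute__((section("{name}")))'


# Section name emitted into the generated header (no leading dot).
emitted_section = "network_model_sec"

# Input-section patterns in the linker snippet, before and after the fix.
old_pattern = ".network_model_sec"
new_pattern = "network_model_sec"

# Approximation: glob matching treats the leading dot as a literal character.
old_matches = fnmatchcase(emitted_section, old_pattern)
new_matches = fnmatchcase(emitted_section, new_pattern)

print(section_attribute(emitted_section))
print(f"old pattern matches: {old_matches}, new pattern matches: {new_matches}")
```

With the dot removed, the pattern matches the emitted section name directly instead of relying on orphan-section placement.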
Copilot AI review requested due to automatic review settings April 29, 2026 05:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


Comment thread .github/workflows/trunk.yml Outdated
Comment thread .github/workflows/trunk.yml Outdated
MV2 cycle-accurate FVP simulation is too slow for the 120-minute CI
timeout.  Corstone-300 also lacks enough ISRAM for the runtime pools.
Run export + build for both ethos-u55 and ethos-u85 so link/compile
regressions are still caught without risking a timeout.

Split the README build and run steps for both boards so CI can reference
the build-only markers.  Users can still run `west build -t run` manually.

Co-authored-by: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 07:43
psiddh force-pushed the fix-mv2-fvp-memory-overflow branch from a9acb90 to 9d70310 April 29, 2026 07:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


Comment thread zephyr/samples/mv2-ethosu/model_section.ld.in Outdated
Co-authored-by: Claude <noreply@anthropic.com>
psiddh merged commit ddd8ac6 into pytorch:main Apr 29, 2026
564 of 580 checks passed

Labels

ciflow/trunk, CLA Signed


3 participants