Skip to content

Cortex-M backend: Make BundleIO ATOL/RTOL configurable per-test via runtime args#18739

Open
BryanBradfo wants to merge 1 commit intopytorch:mainfrom
BryanBradfo:bryanbradfo/configurable-bundleio-tolerances
Open

Cortex-M backend: Make BundleIO ATOL/RTOL configurable per-test via runtime args#18739
BryanBradfo wants to merge 1 commit intopytorch:mainfrom
BryanBradfo:bryanbradfo/configurable-bundleio-tolerances

Conversation

@BryanBradfo
Copy link
Copy Markdown

@BryanBradfo BryanBradfo commented Apr 7, 2026

Summary

Previously ATOL/RTOL were compile-time constants baked into the executor runner binary, applied uniformly to all models. This masked numerical regressions on models that could pass with tighter bounds.

  • Add -atol and -rtol CLI args to the semihosting runner for runtime override
  • Tighten defaults from 0.01 to 1e-3
  • Remove hardcoded -DET_ATOL=5.0 -DET_RTOL=1.0 from build_test_runner.sh
  • Add per-model tolerance lookup table in test_cortex_m_e2e.sh

Fixes #18424

Test plan

Ran bash .ci/scripts/test_cortex_m_e2e.sh mv2 on Corstone-300 FVP:

  • Confirmed runtime args are parsed: Setting atol to 5.000000 / Setting rtol to 2.500000
  • Confirmed Test_result: PASS with calibrated tolerances (ATOL=5.0, RTOL=2.5)
  • Confirmed Test_result: FAIL with zero tolerances (-atol 0.0 -rtol 0.0), proving runtime override works

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @psiddh @AdrianLundell, @rascani, @AdrianLundell

Copilot AI review requested due to automatic review settings April 7, 2026 15:48
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Apr 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18739

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 9 New Failures, 1 Cancelled Job, 5 Unrelated Failures

As of commit 45215f4 with merge base c11ba1b (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes BundleIO output-comparison tolerances (ATOL/RTOL) configurable per test run for the ARM semihosting executor_runner, enabling tighter defaults while allowing per-model overrides in Cortex-M end-to-end tests.

Changes:

  • Add runtime -atol / -rtol CLI args to the semihosting runner and tighten default tolerances to 1e-3.
  • Remove hardcoded tolerance compile definitions from the Cortex-M test runner build script.
  • Add a per-model tolerance lookup table in the Cortex-M e2e script and pass tolerances through the FVP semihosting command line.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
examples/arm/executor_runner/CMakeLists.txt Tightens default ET_ATOL/ET_RTOL CMake cache values and fixes RTOL help text.
examples/arm/executor_runner/arm_executor_runner.cpp Introduces runtime -atol/-rtol overrides for BundleIO comparisons and updates defaults/usage text.
backends/cortex_m/test/build_test_runner.sh Stops baking relaxed ATOL/RTOL into the runner binary at build time.
.ci/scripts/test_cortex_m_e2e.sh Adds per-model tolerances and forwards them to the runner via semihosting cmd line.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread examples/arm/executor_runner/arm_executor_runner.cpp Outdated
Comment thread examples/arm/executor_runner/arm_executor_runner.cpp
Comment thread .ci/scripts/test_cortex_m_e2e.sh
@BryanBradfo BryanBradfo force-pushed the bryanbradfo/configurable-bundleio-tolerances branch from 1c01def to 143c585 Compare April 7, 2026 15:55
Copilot AI review requested due to automatic review settings April 7, 2026 16:00
@BryanBradfo BryanBradfo force-pushed the bryanbradfo/configurable-bundleio-tolerances branch from 143c585 to 7abb0a6 Compare April 7, 2026 16:00
@BryanBradfo
Copy link
Copy Markdown
Author

@pytorchbot label "release notes: none"

@pytorch-bot pytorch-bot Bot added the release notes: none Do not include this in the release notes label Apr 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread examples/arm/executor_runner/CMakeLists.txt
Comment thread examples/arm/executor_runner/arm_executor_runner.cpp
@BryanBradfo BryanBradfo force-pushed the bryanbradfo/configurable-bundleio-tolerances branch from 7abb0a6 to d85f1a5 Compare April 7, 2026 19:50
@nil-is-all nil-is-all added module: arm Issues related to arm backend module: microcontrollers For embedded MCUs like Cortex-M, or RTOS like Zephyr, does not track NPU backend like Arm Ethos. labels Apr 7, 2026
@BryanBradfo
Copy link
Copy Markdown
Author

Hi @psiddh and @nil-is-all, I've addressed the feedback from the Copilot bot. The PR is ready for review! Could someone please trigger the CI workflows for this latest commit? Thanks!

@zingo zingo added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label Apr 9, 2026
@zingo zingo changed the title Make BundleIO ATOL/RTOL configurable per-test via runtime args Cortex-M backend: Make BundleIO ATOL/RTOL configurable per-test via runtime args Apr 9, 2026
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Apr 9, 2026

To add the ciflow label ciflow/trunk please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@zingo
Copy link
Copy Markdown
Collaborator

zingo commented Apr 9, 2026

Thanks for the PR :)
As this runner is used for Arm Backed Ethos-U testing and the defines are renamed lets make sure to test the full suite, I have added "trunk" table so more testing is done.

@nil-is-all
Copy link
Copy Markdown
Contributor

@zingo any relevant errors @BryanBradfo should take a look at?

Comment on lines +419 to +420

if(ET_ATOL)
if(DEFINED ET_ATOL)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of doing this way? This makes the intent clear: ATOL/RTOL only matter when BundleIO is enabled.

if(ET_BUNDLE_IO)
target_compile_definitions(arm_executor_runner PUBLIC -DET_BUNDLE_IO)
if(ET_ATOL)
target_compile_definitions(arm_executor_runner PUBLIC ET_ATOL=${ET_ATOL})
endif()
if(ET_RTOL)
target_compile_definitions(arm_executor_runner PUBLIC ET_RTOL=${ET_RTOL})
endif()
endif()

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, I'll nest the ATOL/RTOL definitions inside the if(ET_BUNDLE_IO) block!

* ET_ATOL - The atol used to compare the output and ref data
* when using ET_BUNDLE_IO ET_RTOL - The rtol used to compare the
* output and ref data when using ET_BUNDLE_IO
* ET_ATOL - The absolute tolerance used to compare output and
Copy link
Copy Markdown
Contributor

@psiddh psiddh Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Thinking aloud) If any test runner other than test_cortex_m_e2e.sh invokes this, without passing -atol/-rtol at runtime, it will silently use the new tight 1e-3 default instead of the old 5.0/1.0. We should keep an eye on it . hopefully there are no errors as a result of this change.

[EDIT:]

  1. .github/workflows/test-mcu-cortex-m.yml (line 44) — builds the binary, then runs pytest on backends/cortex_m/test/. Those pytest tests go through backends/arm/test/runner_utils.py → run_corstone()
    which does NOT pass -atol/-rtol flags. So these tests would hit the new tight 1e-3 default.
  2. .github/workflows/trunk.yml (line 1087) — builds the binary, then calls test_cortex_m_e2e.sh for mv2/mv3. This script does pass -atol/-rtol per the PR. This caller is fine.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the pytest path uses run_corstone() with plain .pte files, so BundleIO verification never runs there. BundleIO is also used by test_arm_baremetal.sh (via run.sh --bundleio), which doesn't pass -atol/-rtol and uses the compile-time default. CI confirms those tests pass with the tighter 1e-3 default.

-C "cpu0.semihosting-cwd=${WORK_DIR}" \
-C "ethosu.extra_args='--fast'" \
-C "cpu0.semihosting-cmd_line='executor_runner -m ${MODEL}.bpte -i dummy.bin -o out'" \
-C "cpu0.semihosting-cmd_line='executor_runner -m ${MODEL}.bpte -i dummy.bin -o out -atol ${ATOL} -rtol ${RTOL}'" \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why does the error bound enforcing logic has to live in the runner? Can the runner report the ATOL, RTOL, and other stats by comparing against the bundled output and then we can compare it outside?

Rationale is, (1) it will save cycles, and error reporting from the runtime on Cortex-M, (2) we can have it more flexible if we validate it outside. And support multiple PTEs for a same runner - through semihosting or something else.

These two doesn't have to be mutually exclusive, i.e. we support both, (1) reporting detailed comparison stats for outside comparison, (2) optionally allowing users to bake in thresholds.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to this we are already running a number of e2e tests on FVP in https://github.com/pytorch/executorch/tree/main/backends/cortex_m/test/models with individual tolerances, isn't that good enough to catch regressions?

@zingo
Copy link
Copy Markdown
Collaborator

zingo commented Apr 13, 2026

@zingo any relevant errors @BryanBradfo should take a look at?

Basically nothing named test-arm* is expected to fail, there are some low probabilety flakey tests that fail sometime but re-running the test is expected to clear it in that case. What we usualy do is re-run to prove a PR is not causing a 100% error. :)

This is the status on latest main for referece https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=test-arm%7Ccortex-m&useRegexFilter=true
(the red you see here is usually the flaky stuff)

Previously tolerances were compile-time constants (ET_ATOL=5.0, ET_RTOL=1.0) baked into the executor runner binary, applied uniformly to all models. This masked numerical regressions on precise models.

Add -atol and -rtol CLI args to the semihosting runner so tolerances can be set per-model at runtime. Tighten defaults to 1e-3 and move per-model overrides to a lookup table in test_cortex_m_e2e.sh.

Resolves pytorch#18424
Copilot AI review requested due to automatic review settings April 16, 2026 12:19
@BryanBradfo BryanBradfo force-pushed the bryanbradfo/configurable-bundleio-tolerances branch from d85f1a5 to 45215f4 Compare April 16, 2026 12:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread examples/arm/executor_runner/CMakeLists.txt
@BryanBradfo BryanBradfo requested a review from psiddh April 21, 2026 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: arm Issues related to arm backend module: microcontrollers For embedded MCUs like Cortex-M, or RTOS like Zephyr, does not track NPU backend like Arm Ethos. partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: none Do not include this in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ATOL and RTOL should probably be configurable per-test. Ideally, we also make the defaults as small as possible and override it on tests that need it.

7 participants