Cortex-M backend: Make BundleIO ATOL/RTOL configurable per-test via runtime args #18739
BryanBradfo wants to merge 1 commit into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18739
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — There is 1 currently active SEV. If your PR is affected, please view it below.
❌ 9 New Failures, 1 Cancelled Job, 5 Unrelated Failures as of commit 45215f4 with merge base c11ba1b.
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR makes BundleIO output-comparison tolerances (ATOL/RTOL) configurable per test run for the ARM semihosting executor_runner, enabling tighter defaults while allowing per-model overrides in Cortex-M end-to-end tests.
Changes:
- Add runtime `-atol`/`-rtol` CLI args to the semihosting runner and tighten default tolerances to `1e-3`.
- Remove hardcoded tolerance compile definitions from the Cortex-M test runner build script.
- Add a per-model tolerance lookup table in the Cortex-M e2e script and pass tolerances through the FVP semihosting command line.
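For illustration, the per-model lookup described above could be sketched as below. The function name and the fallback values are hypothetical; the mv2 numbers match the tolerances mentioned later in this PR's test plan, but the real table lives in `.ci/scripts/test_cortex_m_e2e.sh`.

```shell
#!/bin/sh
# Hypothetical sketch of a per-model tolerance table; the actual values
# are defined in .ci/scripts/test_cortex_m_e2e.sh.
get_tolerances() {
  case "$1" in
    mv2) echo "5.0 2.5" ;;   # relaxed per-model override
    *)   echo "1e-3 1e-3" ;; # tight runner default
  esac
}

# Split "ATOL RTOL" into the values forwarded on the FVP command line.
set -- $(get_tolerances "mv2")
ATOL=$1
RTOL=$2
printf '%s\n' "-atol ${ATOL} -rtol ${RTOL}"
```

The resulting `-atol`/`-rtol` pair would then be appended to the `cpu0.semihosting-cmd_line` string, as this PR does for the FVP invocation.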
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/arm/executor_runner/CMakeLists.txt | Tightens default ET_ATOL/ET_RTOL CMake cache values and fixes RTOL help text. |
| examples/arm/executor_runner/arm_executor_runner.cpp | Introduces runtime -atol/-rtol overrides for BundleIO comparisons and updates defaults/usage text. |
| backends/cortex_m/test/build_test_runner.sh | Stops baking relaxed ATOL/RTOL into the runner binary at build time. |
| .ci/scripts/test_cortex_m_e2e.sh | Adds per-model tolerances and forwards them to the runner via semihosting cmd line. |
@pytorchbot label "release notes: none"
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Hi @psiddh and @nil-is-all, I've addressed the feedback from the Copilot bot. The PR is ready for review! Could someone please trigger the CI workflows for this latest commit? Thanks!
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Thanks for the PR :)
@zingo any relevant errors @BryanBradfo should take a look at?
if(ET_ATOL)
if(DEFINED ET_ATOL)
What do you think of doing it this way? This makes the intent clear: ATOL/RTOL only matter when BundleIO is enabled.

```cmake
if(ET_BUNDLE_IO)
  target_compile_definitions(arm_executor_runner PUBLIC -DET_BUNDLE_IO)
  if(ET_ATOL)
    target_compile_definitions(arm_executor_runner PUBLIC ET_ATOL=${ET_ATOL})
  endif()
  if(ET_RTOL)
    target_compile_definitions(arm_executor_runner PUBLIC ET_RTOL=${ET_RTOL})
  endif()
endif()
```
Good suggestion, I'll nest the ATOL/RTOL definitions inside the if(ET_BUNDLE_IO) block!
* ET_ATOL - The atol used to compare the output and ref data
* when using ET_BUNDLE_IO ET_RTOL - The rtol used to compare the
* output and ref data when using ET_BUNDLE_IO
* ET_ATOL - The absolute tolerance used to compare output and
(Thinking aloud) If any test runner other than test_cortex_m_e2e.sh invokes this without passing -atol/-rtol at runtime, it will silently use the new tight 1e-3 default instead of the old 5.0/1.0. We should keep an eye on it; hopefully there are no errors as a result of this change.
[EDIT:]
- .github/workflows/test-mcu-cortex-m.yml (line 44) — builds the binary, then runs pytest on backends/cortex_m/test/. Those pytest tests go through backends/arm/test/runner_utils.py → run_corstone(), which does NOT pass -atol/-rtol flags, so these tests would hit the new tight 1e-3 default.
- .github/workflows/trunk.yml (line 1087) — builds the binary, then calls test_cortex_m_e2e.sh for mv2/mv3. This script does pass -atol/-rtol per the PR. This caller is fine.
Right, the pytest path uses run_corstone() with plain .pte files, so BundleIO verification never runs there. BundleIO is also used by test_arm_baremetal.sh (via run.sh --bundleio), which doesn't pass -atol/-rtol and uses the compile-time default. CI confirms those tests pass with the tighter 1e-3 default.
| -C "cpu0.semihosting-cwd=${WORK_DIR}" \ | ||
| -C "ethosu.extra_args='--fast'" \ | ||
| -C "cpu0.semihosting-cmd_line='executor_runner -m ${MODEL}.bpte -i dummy.bin -o out'" \ | ||
| -C "cpu0.semihosting-cmd_line='executor_runner -m ${MODEL}.bpte -i dummy.bin -o out -atol ${ATOL} -rtol ${RTOL}'" \ |
Curious: why does the error-bound enforcement logic have to live in the runner? Can the runner report the ATOL, RTOL, and other stats by comparing against the bundled output, and then we compare it outside?
The rationale is: (1) it saves cycles and error reporting from the runtime on Cortex-M; (2) validating outside is more flexible, and it could support multiple PTEs for the same runner, through semihosting or something else.
These two don't have to be mutually exclusive, i.e. we support both: (1) reporting detailed comparison stats for outside comparison, and (2) optionally allowing users to bake in thresholds.
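As a sketch of what such host-side validation could look like, the helper below applies an allclose-style criterion, |out − ref| ≤ atol + rtol·|ref|. Whether BundleIO uses exactly this formula is an assumption here, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical host-side tolerance check applying an allclose-style
# criterion (|out - ref| <= atol + rtol * |ref|) to values a runner
# could report back over semihosting. Exit status 0 means within bounds.
within_tolerance() {
  awk -v out="$1" -v ref="$2" -v atol="$3" -v rtol="$4" 'BEGIN {
    diff = out - ref; if (diff < 0) diff = -diff;
    mag  = ref;       if (mag  < 0) mag  = -mag;
    exit !(diff <= atol + rtol * mag)
  }'
}

within_tolerance 1.0005 1.0 0.001 0.001 && echo PASS  # within bounds
within_tolerance 1.0005 1.0 0.0   0.0   || echo FAIL  # zero tolerance rejects
```

Running the comparison on the host this way would leave the device doing only inference and output reporting.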
To add to this: we are already running a number of e2e tests on FVP in https://github.com/pytorch/executorch/tree/main/backends/cortex_m/test/models with individual tolerances; isn't that good enough to catch regressions?
Basically, nothing named test-arm* is expected to fail. There are some low-probability flaky tests that fail sometimes, but re-running the test is expected to clear it in that case. What we usually do is re-run to prove a PR is not causing a 100% error. :) This is the status on latest main for reference: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=test-arm%7Ccortex-m&useRegexFilter=true
Previously tolerances were compile-time constants (ET_ATOL=5.0, ET_RTOL=1.0) baked into the executor runner binary, applied uniformly to all models. This masked numerical regressions on precise models. Add -atol and -rtol CLI args to the semihosting runner so tolerances can be set per-model at runtime. Tighten defaults to 1e-3 and move per-model overrides to a lookup table in test_cortex_m_e2e.sh. Resolves pytorch#18424
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Summary
Previously ATOL/RTOL were compile-time constants baked into the executor runner binary, applied uniformly to all models. This masked numerical regressions on models that could pass with tighter bounds.
- Add `-atol` and `-rtol` CLI args to the semihosting runner for runtime override
- Remove `-DET_ATOL=5.0 -DET_RTOL=1.0` from build_test_runner.sh

Fixes #18424
Test plan
Ran `bash .ci/scripts/test_cortex_m_e2e.sh mv2` on Corstone-300 FVP:
- `Setting atol to 5.000000` / `Setting rtol to 2.500000`
- `Test_result: PASS` with calibrated tolerances (ATOL=5.0, RTOL=2.5)
- `Test_result: FAIL` with zero tolerances (`-atol 0.0 -rtol 0.0`), proving the runtime override works

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @psiddh @AdrianLundell @rascani