Qualcomm AI Engine Direct - AMD backend error by winskuo-quic · Pull Request #18098 · pytorch/executorch

winskuo-quic · 2026-03-11T11:22:48Z

Summary

We noticed that running inference on an AMD CPU crashes with `Floating point exception (core dumped)`.
This can be reproduced with the following code:

```python
import torch
import torch.nn as nn

w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)
w2_conv(x)  # crashes with SIGFPE on the affected AMD CPUs
```

A temporary workaround is to disable the oneDNN (MKLDNN) backend:

```python
torch.backends.mkldnn.enabled = False
```
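Because the failure is a hard crash (SIGFPE) rather than a Python exception, it cannot be caught with try/except, and it takes down whatever process runs it. A minimal sketch of one way to probe for it safely, assuming a POSIX host: run the reproducer in a child interpreter and check whether the child was killed by SIGFPE. The helper `crashes_with_sigfpe` and the `REPRO*` constants are hypothetical illustrations, not part of ExecuTorch or PyTorch.

```python
import signal
import subprocess
import sys

# The reproducer from the summary, kept as a string so it runs in a child process.
REPRO = """
import torch
import torch.nn as nn

w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)
w2_conv(x)
"""

# Same reproducer with the temporary workaround applied first.
REPRO_WITH_WORKAROUND = "import torch\ntorch.backends.mkldnn.enabled = False\n" + REPRO


def crashes_with_sigfpe(snippet: str, timeout: float = 300.0) -> bool:
    """Run `snippet` in a fresh interpreter; return True if SIGFPE killed it.

    On POSIX, subprocess reports termination by signal N as returncode == -N,
    so any other outcome (success, ImportError, other signals) returns False.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode == -signal.SIGFPE
```

On an affected machine, `crashes_with_sigfpe(REPRO)` would be expected to return True and `crashes_with_sigfpe(REPRO_WITH_WORKAROUND)` False; this has not been verified here. A non-crashing failure (for example, torch not being installed) returns False, since only termination by SIGFPE is counted.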

Test plan

NA

cc @cccclai @cbilgin

pytorch-bot · 2026-03-11T11:22:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18098


❌ 6 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit ae571ae with merge base c7f1d72:

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / cpu-test / test-aot / test-aot (gh)
backends/cadence/aot/tests/test_replace_ops_passes.py::TestReplaceOpsPasses::test_replace_transposed_conv_with_linear_4
Cadence Build & Test / cpu-test / test-ops / test-ops (gh)
examples/cadence/operators/test_g3_ops.py::ATenOpTestCases::test_g3_native_layer_norm_out_17
pull / android / run-emulator (gh)
The process '/usr/bin/sh' failed with exit code 1
pull / unittest / linux / linux-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t bc950a1d8f1b15a8794c20fd4ca255cdd8f114d8171465899cf44aa7ece09a3a /exec failed with exit code 1
trunk / unittest-release / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_linear_model

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest-buck / macos / macos-job (gh)
##[error]The operation was canceled.

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
trunk / unittest-release / linux / linux-job (gh) (trunk failure)
[ FAILED ] KernelTempMemoryAllocatorIntegrationTest.UsingTempMemoryAllocator
trunk / unittest-release / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-03-11T11:23:30Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

winskuo-quic · 2026-03-11T11:27:00Z

Hi @cccclai, @abhinaykukkadapu,
We have noticed that on AMD CPUs, the AOT flow crashes with `Floating point exception (core dumped)`. This happens during inference, including when running a plain nn.Module.
A reproducer is in the Summary section.
This PR is a quick workaround, but if this is an AMD or PyTorch issue, placing this logic under the QNN backend probably isn't the best option.
Please have a look.
Thanks

digantdesai

I assume this is during eager model runs?

digantdesai · 2026-03-11T15:53:02Z

Would you mind creating an issue on pytorch/pytorch?

winskuo-quic · 2026-03-12T04:14:14Z

I assume this is during eager model runs?

Hi @digantdesai,
This issue happens with both the eager model and the exported program when we calibrate the model.
I have filed an issue under pytorch/pytorch: pytorch/pytorch#177227

abhinaykukkadapu · 2026-03-30T22:50:34Z

@winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks

winskuo-quic · 2026-03-31T06:29:44Z

@winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks

@abhinaykukkadapu,
Thanks for helping monitor the CI. I have rebased.
Also, FYI: I added a new dependency to backends/qualcomm/scripts/build.sh: `pip install py-cpuinfo`.
So if a CI job doesn't run build.sh, Python may not be able to find the library.
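The pip package name (py-cpuinfo) differs from its import name (cpuinfo), so a job that skips build.sh only fails once the module is first imported. A setup or CI script could check for missing modules up front; a minimal sketch using the standard library's `importlib.util.find_spec`, where the helper name `missing_modules` is hypothetical and not part of ExecuTorch:

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import.

    Pass import names, not pip package names: py-cpuinfo is imported as
    `cpuinfo`. Modules are located but not executed.
    """
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # ImportError: parent package of a dotted name is missing.
            # ValueError: e.g. an empty module name.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

For example, a pre-flight step might call `missing_modules(["cpuinfo"])` and, if the result is non-empty, ask the user to install the backend's Python dependencies before running the Qualcomm AOT flow.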

abhinaykukkadapu · 2026-04-03T20:28:35Z

```diff
 import os

-from .scripts.download_qnn_sdk import install_qnn_sdk, is_linux_x86, QNN_ZIP_URL
+import cpuinfo
```


Move this to a lazy import within the `if "amd" in vendor:` block.

I have moved all of these into the try/except block.

abhinaykukkadapu · 2026-04-03T20:29:54Z

```diff
+
+info = cpuinfo.get_cpu_info()
+vendor = info.get("vendor_id_raw", "").lower()
+if "amd" in vendor:
```


Please also consider wrapping this in a try/except for ImportError, with logging that tells the user to install py-cpuinfo.

Added a try/except block.
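The review asks for a lazy import of cpuinfo plus a try/except that logs a hint when py-cpuinfo is missing. A minimal sketch of that shape, assuming the workaround is to flip `torch.backends.mkldnn.enabled` as in the summary; the function names `is_amd_vendor` and `maybe_disable_mkldnn_on_amd` are hypothetical, and this is not the code that was merged.

```python
import logging

logger = logging.getLogger(__name__)


def is_amd_vendor(vendor_id_raw: str) -> bool:
    """True if a raw CPUID vendor string (e.g. "AuthenticAMD") names AMD."""
    # Same substring test as the diff above: `"amd" in vendor`.
    return "amd" in vendor_id_raw.lower()


def maybe_disable_mkldnn_on_amd() -> bool:
    """Disable PyTorch's oneDNN (MKLDNN) backend if running on an AMD CPU.

    Returns True if the workaround was applied. Imports are lazy, so a
    missing py-cpuinfo only logs a warning instead of failing the import
    of the calling module.
    """
    try:
        import cpuinfo  # installed by the py-cpuinfo package
    except ImportError:
        logger.warning(
            "py-cpuinfo is not installed, so the CPU vendor cannot be detected. "
            "Install it with `pip install py-cpuinfo`."
        )
        return False

    vendor = cpuinfo.get_cpu_info().get("vendor_id_raw") or ""
    if not is_amd_vendor(vendor):
        return False

    import torch

    # Temporary workaround from the PR summary: avoid oneDNN kernels, which
    # crash with "Floating point exception (core dumped)" on affected AMD CPUs.
    torch.backends.mkldnn.enabled = False
    logger.warning("AMD CPU detected; set torch.backends.mkldnn.enabled = False.")
    return True
```

Because `torch.backends.mkldnn.enabled` is a process-wide flag, calling `maybe_disable_mkldnn_on_amd()` once before calibration would be enough to affect every later eager or exported-program run in that process.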

abhinaykukkadapu · 2026-04-03T20:36:23Z

@winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks

@abhinaykukkadapu, Thanks for helping monitor the CI. I have rebased. Also, FYI: I added a new dependency to backends/qualcomm/scripts/build.sh: `pip install py-cpuinfo`. So if a CI job doesn't run build.sh, Python may not be able to find the library.

Thanks @winskuo-quic, I provided a few suggestions. We would also want something like this:

executorch/.ci/scripts/setup-openvino.sh, line 47 in 28f3cf3:

```bash
pip install -r backends/openvino/requirements.txt
```

Could you add it in this PR? Otherwise, I can take it up.

winskuo-quic · 2026-04-07T01:59:45Z

Hi @abhinaykukkadapu,
Thanks for reviewing and for the valuable suggestions. I have added executorch/backends/qualcomm/requirements.txt and moved all Python dependencies into this file. Please take a look and let me know if anything is missing.

abhinaykukkadapu · 2026-04-07T06:25:37Z

Hi @abhinaykukkadapu, Thanks for reviewing and for the valuable suggestions. I have added executorch/backends/qualcomm/requirements.txt and moved all Python dependencies into this file. Please take a look and let me know if anything is missing.

Thanks @winskuo-quic for addressing the comments. I think we missed one more spot where the requirements need to be pip-installed:

```diff
--- a/.ci/scripts/test_wheel_package_qnn.sh
+++ b/.ci/scripts/test_wheel_package_qnn.sh
@@ -176,6 +176,9 @@ run_core_tests () {
   "$PIPBIN" install . --no-build-isolation
   popd > /dev/null
 
+  # Install qualcomm backend dependencies
+  "$PIPBIN" install -r "$REPO_ROOT/backends/qualcomm/requirements.txt"
+
   echo "=== [$LABEL] Import smoke tests ==="
   "$PYBIN" -c "import executorch; print('executorch imported successfully')"
   "$PYBIN" -c "import executorch.backends.qualcomm; print('executorch.backends.qualcomm imported successfully')"
```

meta-codesync · 2026-04-07T17:07:53Z

@abhinaykukkadapu has imported this pull request. If you are a Meta employee, you can view this in D99858579.

winskuo-quic requested review from abhinaykukkadapu and cccclai as code owners March 11, 2026 11:22

meta-cla Bot added the CLA Signed label Mar 11, 2026

digantdesai approved these changes Mar 11, 2026


digantdesai added the module: qnn label Mar 11, 2026

haowhsu-quic mentioned this pull request Mar 18, 2026

Qualcomm AI Engine Direct - calibration thread auto-tuning #18184

Merged

winskuo-quic force-pushed the dev1/winskuo/amd_cpu_fix branch from 6455008 to f54c655 March 31, 2026 06:05

abhinaykukkadapu reviewed Apr 3, 2026


winskuo-quic added 2 commits April 7, 2026 09:56

Temp fix on amd vendor

e6b0d04

Code Review

5e7314f

winskuo-quic force-pushed the dev1/winskuo/amd_cpu_fix branch from f54c655 to 5e7314f April 7, 2026 01:57

use requirments.txt for wheel qnn build

ae571ae

abhinaykukkadapu approved these changes Apr 7, 2026


abhinaykukkadapu merged commit 19f7ff2 into pytorch:main Apr 7, 2026
295 of 306 checks passed

jpiat pushed a commit to jpiat/executorch that referenced this pull request Apr 14, 2026

Qualcomm AI Engine Direct - AMD backend error (pytorch#18098)

a4f33a7
