Qualcomm AI Engine Direct - AMD backend error #18098

Merged
abhinaykukkadapu merged 3 commits into pytorch:main from CodeLinaro:dev1/winskuo/amd_cpu_fix
Apr 7, 2026

@winskuo-quic (Collaborator) commented Mar 11, 2026

Summary

We noticed that running inference on an AMD CPU fails with Floating point exception (core dumped).
This can be reproduced with the following lines of code:

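import torch
import torch.nn as nn

# 1x1 convolution with 1536 input channels and 32 output channels
w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)
w2_conv(x)  # fails with the floating point exception on the affected AMD CPU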

A temporary workaround is to set mkldnn.enabled to False:
torch.backends.mkldnn.enabled = False
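
A scoped variant of this workaround may be useful when only part of a script needs MKL-DNN turned off. The following is a minimal sketch, not the change made in this PR: mkldnn_disabled is a hypothetical helper name, and it assumes the object passed in exposes a readable and writable enabled attribute, as torch.backends.mkldnn does in the line above.

```python
import contextlib


@contextlib.contextmanager
def mkldnn_disabled(backend):
    """Set backend.enabled to False for the duration of the block.

    `backend` is assumed to expose a readable and writable `enabled`
    attribute (for example torch.backends.mkldnn). The previous value
    is restored on exit, even if the block raises.
    """
    previous = backend.enabled
    backend.enabled = False
    try:
        yield backend
    finally:
        backend.enabled = previous
```

With PyTorch installed, the reproduction could then be wrapped as `with mkldnn_disabled(torch.backends.mkldnn): w2_conv(x)`, which leaves the global setting unchanged for the rest of the process.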

Test plan

NA

cc @cccclai @cbilgin

@pytorch-bot Bot commented Mar 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18098

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit ae571ae with merge base c7f1d72 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@winskuo-quic (Collaborator, Author)

Hi @cccclai, @abhinaykukkadapu,
We have noticed that AOT runs on an AMD CPU fail with the error: Floating point exception (core dumped). This happens during inference, including with a plain nn.Module.
There is a sample in the Summary section to reproduce it.
This PR is a quick workaround for the issue, but if this turns out to be an AMD or Torch issue, placing this logic under QNN probably isn't the best option.
Please have a look.
Thanks

@digantdesai (Contributor) left a comment

I assume this is during eager model runs?

@digantdesai digantdesai added the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Mar 11, 2026
@digantdesai (Contributor)

Would you mind creating a ticket on pytorch/pytorch?

@winskuo-quic (Collaborator, Author)

> I assume this is during eager model runs?

Hi @digantdesai,
This issue occurs in both the eager model and the exported program when we calibrate the model.
I have created an issue under pytorch/pytorch: pytorch/pytorch#177227

@abhinaykukkadapu (Contributor)

@winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/amd_cpu_fix branch from 6455008 to f54c655 Compare March 31, 2026 06:05
@winskuo-quic (Collaborator, Author) commented Mar 31, 2026

> @winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks

@abhinaykukkadapu,
Thanks for helping monitor the CI. I have rebased.
Also, just FYI, I added a new library install to backends/qualcomm/scripts/build.sh: pip install py-cpuinfo.
So if a CI job doesn't run build.sh, Python might not be able to find the library.
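
A job that skips build.sh could detect the missing library up front instead of failing at import time. The check below is a rough sketch using only the standard library; missing_modules is a hypothetical helper, not part of this PR, and it assumes top-level module names such as cpuinfo, the import name used for py-cpuinfo in this PR.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found.

    Uses importlib.util.find_spec, which returns None for a top-level
    module that is not installed, without importing the module.
    """
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(["cpuinfo"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
```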

Comment thread backends/qualcomm/__init__.py Outdated
import os

from .scripts.download_qnn_sdk import install_qnn_sdk, is_linux_x86, QNN_ZIP_URL
import cpuinfo
@abhinaykukkadapu (Contributor) Apr 3, 2026

Move this to a lazy import within the if "amd" in vendor: block.

@winskuo-quic (Collaborator, Author)

I have moved all of this into the try/except statement.

Comment thread backends/qualcomm/__init__.py Outdated

info = cpuinfo.get_cpu_info()
vendor = info.get("vendor_id_raw", "").lower()
if "amd" in vendor:
Contributor

Please also consider wrapping this in try/except to catch any ImportError, with logging that tells the user to install py-cpuinfo.

@winskuo-quic (Collaborator, Author)

Added a try/except statement.
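
The two review suggestions in this thread (a lazy import and a try/except around it) could be combined roughly as follows. This is a sketch under stated assumptions, not the code that was merged: is_amd_vendor and maybe_disable_mkldnn are hypothetical names, the log message is illustrative, and torch is assumed to be installed whenever an AMD CPU is detected. Only cpuinfo.get_cpu_info(), the vendor_id_raw key, and torch.backends.mkldnn.enabled come from the PR itself.

```python
import logging

logger = logging.getLogger(__name__)


def is_amd_vendor(info):
    """Return True if the cpuinfo dictionary reports an AMD vendor string."""
    vendor = info.get("vendor_id_raw", "").lower()
    return "amd" in vendor


def maybe_disable_mkldnn():
    """Disable MKL-DNN on AMD CPUs; return True if the workaround was applied."""
    try:
        # Lazy import: py-cpuinfo is only needed for this check.
        import cpuinfo
    except ImportError:
        logger.warning(
            "py-cpuinfo is not installed, so the AMD CPU check is skipped. "
            "Install it with: pip install py-cpuinfo"
        )
        return False

    if not is_amd_vendor(cpuinfo.get_cpu_info()):
        return False

    # Assumption: torch is available in any environment that reaches this point.
    import torch

    torch.backends.mkldnn.enabled = False
    return True
```

Keeping the vendor check in its own function makes it possible to test the string matching without py-cpuinfo or an AMD machine.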

@abhinaykukkadapu (Contributor)

> > @winskuo-quic there are a bunch of failures. Can we rebase? I will monitor and push if the CI is green. Thanks
>
> @abhinaykukkadapu, Thanks for helping monitor the CI. I have rebased. Also, just FYI, I added a new library install to backends/qualcomm/scripts/build.sh: pip install py-cpuinfo. So if a CI job doesn't run build.sh, Python might not be able to find the library.

Thanks @winskuo-quic, I have provided a few suggestions. We would also want something like this:

pip install -r backends/openvino/requirements.txt

If you can add it in this PR, great; otherwise I can take it up.

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/amd_cpu_fix branch from f54c655 to 5e7314f Compare April 7, 2026 01:57
@winskuo-quic (Collaborator, Author)

Hi @abhinaykukkadapu,
Thanks for reviewing and providing valuable suggestions. I have added executorch/backends/qualcomm/requirements.txt and placed all Python libraries in this file. Please take a look and let me know if anything is missing.

@abhinaykukkadapu (Contributor)

> Hi @abhinaykukkadapu, Thanks for reviewing and providing valuable suggestions. I have added executorch/backends/qualcomm/requirements.txt and placed all Python libraries in this file. Please take a look and let me know if anything is missing.

Thanks @winskuo-quic for addressing the comments. I think we missed one more spot that needs to run the pip install of the requirements.

--- a/.ci/scripts/test_wheel_package_qnn.sh
+++ b/.ci/scripts/test_wheel_package_qnn.sh
@@ -176,6 +176,9 @@ run_core_tests () {
   "$PIPBIN" install . --no-build-isolation
   popd > /dev/null
 
+  # Install qualcomm backend dependencies
+  "$PIPBIN" install -r "$REPO_ROOT/backends/qualcomm/requirements.txt"
+
   echo "=== [$LABEL] Import smoke tests ==="
   "$PYBIN" -c "import executorch; print('executorch imported successfully')"
   "$PYBIN" -c "import executorch.backends.qualcomm; print('executorch.backends.qualcomm imported successfully')"

@meta-codesync Bot (Contributor) commented Apr 7, 2026

@abhinaykukkadapu has imported this pull request. If you are a Meta employee, you can view this in D99858579.

@abhinaykukkadapu abhinaykukkadapu merged commit 19f7ff2 into pytorch:main Apr 7, 2026
295 of 306 checks passed
jpiat pushed a commit to jpiat/executorch that referenced this pull request Apr 14, 2026
