Fix overflow and stride>1 fallback in cadence::quantized_conv1d HiFi kernels (#19193) #19193

Open
hsharma35 wants to merge 1 commit into pytorch:main from hsharma35:export-D102821209

Conversation

hsharma35 (Contributor) commented Apr 28, 2026

Fixes two correctness bugs in the HiFi kernels for cadence::quantized_conv1d_ncl.out and cadence::quantized_conv1d_nlc.out. First, the int8 path (xa_nn_conv2d_per_chan_sym8sxasym8s) produces incorrect results with stride > 1 on some backends (e.g., Artemis HiFi4), so that case is now redirected to the generic fallback. Second, the uint8 path overflowed WORD32 when computing out_multiplier32 if eff_scale >= 1.0 (i.e., output_scale > bias_scale); out_multiplier32 is now clamped to INT32_MAX.
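
For illustration only, here is a minimal sketch of the clamping idea, not the code from this PR. It assumes out_multiplier32 is eff_scale scaled by 2^31 (a Q31 multiplier), which is consistent with the overflow appearing at eff_scale >= 1.0. The helper name `q31_multiplier_saturating` is hypothetical, and it takes eff_scale directly rather than assuming how it is derived from the scales.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): convert a positive effective scale
// into a Q31 fixed-point multiplier, saturating at INT32_MAX.
// Assumption: the multiplier is eff_scale * 2^31.
inline int32_t q31_multiplier_saturating(double eff_scale) {
  constexpr double kQ31 = 2147483648.0;  // 2^31
  constexpr double kMax =
      static_cast<double>(std::numeric_limits<int32_t>::max());
  const double scaled = eff_scale * kQ31;
  // Converting an out-of-range floating-point value to int32_t is undefined
  // behavior in C++, so compare in floating point before casting.
  if (scaled >= kMax) {
    return std::numeric_limits<int32_t>::max();
  }
  return static_cast<int32_t>(scaled);
}
```

With this sketch, an eff_scale of 0.5 maps to 2^30, while any eff_scale of 1.0 or more saturates to INT32_MAX instead of wrapping. Saturation trades a small scaling error for a bounded result.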

Reviewed By: zonglinpeng

Differential Revision: D102821209
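
The stride > 1 redirect in the description above can be pictured as a small dispatch decision. The sketch below is an assumption-laden illustration rather than the kernel's real control flow: the enum and function names are hypothetical, and routing dtypes other than int8 and uint8 to the generic path is assumed here, not stated in the PR.

```cpp
#include <cstdint>

// Hypothetical names for illustration; none of these appear in the PR.
enum class QuantDtype { kInt8, kUint8, kOther };
enum class Conv1dPath { kHiFiInt8, kHiFiUint8, kGenericFallback };

// Pick a kernel path for quantized conv1d.
// int8 with stride > 1 avoids the optimized HiFi kernel, which the PR reports
// as producing incorrect results on some backends.
inline Conv1dPath select_conv1d_path(QuantDtype dtype, int32_t stride) {
  if (dtype == QuantDtype::kInt8) {
    return stride > 1 ? Conv1dPath::kGenericFallback : Conv1dPath::kHiFiInt8;
  }
  if (dtype == QuantDtype::kUint8) {
    // The uint8 path stays on the HiFi kernel; its fix is the multiplier clamp.
    return Conv1dPath::kHiFiUint8;
  }
  // Assumption: any other dtype uses the generic implementation.
  return Conv1dPath::kGenericFallback;
}
```

Keeping the decision in one pure function makes the fallback condition easy to unit test without hardware.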

pytorch-bot (Bot) commented Apr 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19193

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit de7fb48 with merge base d9688da (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync (Bot, Contributor) commented Apr 28, 2026

@hsharma35 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102821209.

meta-cla Bot added the CLA Signed label Apr 28, 2026
@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

hsharma35 requested a review from mcremon-meta April 28, 2026 18:51
hsharma35 added a commit to hsharma35/executorch that referenced this pull request Apr 28, 2026
…kernels

hsharma35 added a commit to hsharma35/executorch that referenced this pull request Apr 28, 2026
…kernels (pytorch#19193)

hsharma35 added a commit to hsharma35/executorch that referenced this pull request Apr 29, 2026
…kernels (pytorch#19193)

meta-codesync Bot changed the title from "Fix overflow and stride>1 fallback in cadence::quantized_conv1d HiFi kernels" to "Fix overflow and stride>1 fallback in cadence::quantized_conv1d HiFi kernels (#19193)" Apr 29, 2026

Labels

CLA Signed, fb-exported, meta-exported

3 participants