Fix overflow and stride>1 fallback in cadence::quantized_conv1d HiFi kernels (#19193)
hsharma35 wants to merge 1 commit into pytorch:main
@hsharma35 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102821209.
Fixes two correctness bugs in the HiFi kernels for cadence::quantized_conv1d_ncl.out and cadence::quantized_conv1d_nlc.out. The int8 path (xa_nn_conv2d_per_chan_sym8sxasym8s) produces incorrect results with stride > 1 on some backends (e.g., Artemis HiFi4); that case is now redirected to the generic fallback. The uint8 path overflowed WORD32 when computing out_multiplier32 if eff_scale >= 1.0 (i.e., output_scale > bias_scale); out_multiplier32 is now clamped to INT32_MAX.
Reviewed By: zonglinpeng
Differential Revision: D102821209
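
For readers unfamiliar with the overflow, here is a minimal sketch of the two fixes described above. It is not the code in this PR: the helper names `to_q31_multiplier` and `use_generic_fallback_int8` are hypothetical, `int32_t` stands in for WORD32, and the sketch assumes the multiplier is formed as eff_scale * 2^31. With that scaling, any eff_scale >= 1.0 yields a value of at least 2^31, which does not fit in a signed 32-bit integer, so the result is saturated to INT32_MAX.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): converts a floating-point effective
// scale into a Q31 fixed-point multiplier, saturating instead of overflowing.
// Assumption: the multiplier is eff_scale * 2^31, and int32_t models WORD32.
inline int32_t to_q31_multiplier(float eff_scale) {
  // Do the product in double so 2^31 and the intermediate are exact enough.
  const double scaled = static_cast<double>(eff_scale) * 2147483648.0;
  constexpr int32_t kMax = std::numeric_limits<int32_t>::max();

  // Assumption: scales are positive. Non-positive or NaN inputs map to 0
  // in this sketch so the cast below is never undefined behavior.
  if (!(scaled > 0.0)) {
    return 0;
  }
  // eff_scale >= 1.0 lands here: 2^31 exceeds INT32_MAX, so clamp.
  if (scaled >= static_cast<double>(kMax)) {
    return kMax;
  }
  return static_cast<int32_t>(scaled);
}

// Hypothetical dispatch predicate (not from the PR): the int8 path goes to
// the generic fallback whenever stride > 1, as described in the summary.
inline bool use_generic_fallback_int8(int stride) {
  return stride > 1;
}
```

Under these assumptions, a scale of 0.5 maps to 2^30 unchanged, while scales of 1.0 and above saturate to INT32_MAX rather than wrapping.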