Add a16w8 per-op test for gelu by christine-long-meta · Pull Request #19598 · pytorch/executorch

christine-long-meta · 2026-05-14T16:48:00Z

Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for aten.gelu on Ethos-U55 and Ethos-U85.

## Changes

- Add `a16w8_gelu_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors
- Add `test_gelu_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_gelu_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_gelu.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532359
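
For readers unfamiliar with the a16w8 naming, the following is a minimal, standard-library-only sketch of how 16-bit symmetric activation quantization compares with 8-bit on GELU outputs. The helper `quantize_symmetric`, the per-tensor scale choice, and the input range are illustrative assumptions; this is not the ExecuTorch Arm quantizer or the test added in this PR.

```python
import math

def gelu(x: float) -> float:
    """Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantize_symmetric(values, num_bits):
    """Hypothetical helper: symmetric per-tensor fake-quantization (zero point 0)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Quantize to integers, clamp, then dequantize back to floats.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [qi * scale for qi in q]

# Assumed input range [-4, 4] in steps of 0.1.
inputs = [i / 10.0 for i in range(-40, 41)]
reference = [gelu(x) for x in inputs]

def max_error(num_bits):
    """Largest absolute round-trip error over the reference outputs."""
    dequantized = quantize_symmetric(reference, num_bits)
    return max(abs(a - b) for a, b in zip(dequantized, reference))

err_int8 = max_error(8)
err_int16 = max_error(16)
print(f"int8 max error:  {err_int8:.2e}")
print(f"int16 max error: {err_int16:.2e}")
```

The round-trip error is bounded by half a quantization step, so the int16 activation path has a much finer grid than int8 for the same value range.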

Summary: Add int16 activation / int8 weight (a16w8) quantization tests for `aten.exp` on Ethos-U55 and Ethos-U85.

## Context

The `exp` op is part of the softmax decomposition (`softmax(x) = exp(x) / sum(exp(x))`), which is used in the attention mechanism of EMG2Pose Conformer models. This op was identified as the root cause of the U85 SNR regression investigated in SEV T267939669: without dedicated a16w8 per-op coverage, the numerics issue was only visible at the full-model level. Adding per-op tests allows us to catch int16 precision regressions at the operator granularity before they propagate to end-to-end model accuracy.

## Changes

- Add `a16w8_exp_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors
- Add `test_exp_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_exp_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_exp.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532358
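
As an illustration of why the precision of the `exp` intermediate matters for softmax, here is a small standard-library sketch. It quantizes `exp(x)` with a symmetric per-tensor scale at 8 and 16 bits and compares the resulting softmax against a float reference. The helper names and the logits are assumptions chosen for the example; it does not reproduce the regression described above.

```python
import math

def quantize_symmetric(values, num_bits):
    """Hypothetical helper: symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def softmax_from_quantized(q, scale):
    """Dequantize exp values and normalise: exp(x) / sum(exp(x))."""
    dequantized = [qi * scale for qi in q]
    total = sum(dequantized)
    return [d / total for d in dequantized]

logits = [0.0, -2.0, -4.0, -6.0]          # assumed example inputs
exps = [math.exp(x) for x in logits]
reference = [e / sum(exps) for e in exps]

q8, scale8 = quantize_symmetric(exps, 8)    # q8 == [127, 17, 2, 0]
q16, scale16 = quantize_symmetric(exps, 16)

err8 = max(abs(a - b) for a, b in zip(softmax_from_quantized(q8, scale8), reference))
err16 = max(abs(a - b) for a, b in zip(softmax_from_quantized(q16, scale16), reference))
print(q8, err8)
print(q16, err16)
```

At 8 bits the smallest exponential rounds to zero, so its softmax probability vanishes entirely; at 16 bits it is still represented.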

Summary: Add int16 activation / int8 weight (a16w8) quantization tests for `aten.reciprocal` on Ethos-U55 and Ethos-U85.

## Context

The `reciprocal` op is the second half of the softmax decomposition (`softmax(x) = exp(x) * reciprocal(sum(exp(x)))`), paired with `exp`. Together they form the attention mechanism in EMG2Pose Conformer models. Like `exp`, this op was implicated in the U85 SNR regression (SEV T267939669): the division-by-reciprocal path can amplify quantization error when the denominator is itself quantized at int16. Adding dedicated a16w8 coverage isolates reciprocal numerics from the rest of the softmax pipeline.

## Changes

- Add `a16w8_reciprocal_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors (all shifted by +0.1 to avoid division near zero)
- Add `test_reciprocal_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_reciprocal_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_reciprocal.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532357
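
The sketch below illustrates the two points made in this description: that multiplying by a reciprocal is equivalent to the softmax division, and that a fixed input perturbation produces a much larger error in `1/x` when `x` is close to zero, which is the motivation for the +0.1 shift. The perturbation size (one int16 step for a tensor with max magnitude 1) and the sample values 0.01 and 0.11 are assumptions for illustration, not values taken from the test file.

```python
import math

def softmax_div(xs):
    """softmax(x) = exp(x) / sum(exp(x))."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_recip(xs):
    """softmax(x) = exp(x) * reciprocal(sum(exp(x)))."""
    exps = [math.exp(x) for x in xs]
    inv_total = 1.0 / sum(exps)
    return [e * inv_total for e in exps]

def reciprocal_error(x, delta):
    """Absolute error in 1/x when x is perturbed by delta."""
    return abs(1.0 / (x + delta) - 1.0 / x)

# Assumed perturbation: one int16 step for a tensor whose max |x| is 1.
delta = 1.0 / 32767

err_near_zero = reciprocal_error(0.01, delta)  # input close to zero
err_shifted = reciprocal_error(0.11, delta)    # same input shifted by +0.1
print(err_near_zero, err_shifted, err_near_zero / err_shifted)
```

Because the sensitivity of `1/x` scales roughly with `1/x**2`, the unshifted input sees an error more than two orders of magnitude larger in this example.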

Summary: Add int16 activation / int8 weight (a16w8) quantization tests for `aten.mean.dim` on Ethos-U55 and Ethos-U85.

## Context

The `mean_dim` op is a core component of the LayerNorm decomposition (`LayerNorm = (x - mean) / sqrt(var + eps) * gamma + beta`). It is used across multiple EMG production models including CC, CASCADE, HW, WAKE, and BTD. Despite this wide usage, no a16w8 per-op coverage existed: the int16 quantization path was only exercised indirectly through end-to-end model tests, making it difficult to isolate mean-specific numerics issues from other LayerNorm components.

## Changes

- Add `a16w8_mean_test_parameters` dict with 11 test configurations covering keepdim/no-keepdim, positive/negative dims, dim=None, and ranks 1-4
- Add `test_mean_dim_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_mean_dim_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_mean_dim.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532361
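
To make the `keepdim` and negative-`dim` variations concrete, here is a plain-Python sketch of a mean reduction over a 2-D nested list, followed by the LayerNorm formula quoted above (with `gamma = 1` and `beta = 0`). The function names are hypothetical and the code only mimics the semantics; it is not the ATen implementation.

```python
import math

def mean_dim(rows, dim, keepdim=False):
    """Mean of a 2-D nested list along dim (0, 1, -1 or -2)."""
    n_rows, n_cols = len(rows), len(rows[0])
    dim = dim % 2  # normalise negative dims: -1 -> 1, -2 -> 0
    if dim == 0:
        out = [sum(r[c] for r in rows) / n_rows for c in range(n_cols)]
        return [out] if keepdim else out
    out = [sum(r) / n_cols for r in rows]
    return [[v] for v in out] if keepdim else out

def layer_norm(xs, eps=1e-5):
    """(x - mean) / sqrt(var + eps), i.e. LayerNorm with gamma=1, beta=0."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

print(mean_dim(x, 1))                 # [2.0, 5.0]
print(mean_dim(x, -1, keepdim=True))  # [[2.0], [5.0]]
print(mean_dim(x, 0))                 # [2.5, 3.5, 4.5]
print(layer_norm(x[0]))
```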

Summary: Add int16 activation / int8 weight (a16w8) quantization tests for `aten.var` on Ethos-U55 and Ethos-U85.

## Context

The `var` op is the second component of the LayerNorm decomposition, paired with `mean_dim`. Together they compute the normalization statistics used in every LayerNorm layer across EMG models including EMG2Pose. Variance computation is particularly sensitive to int16 quantization because it involves squaring differences: small quantization errors in the mean subtraction are amplified quadratically. Dedicated a16w8 coverage isolates variance numerics from the rest of the LayerNorm pipeline.

## Changes

- Add `a16w8_var_test_parameters` dict with 4 test configurations covering keepdim/no-keepdim and correction values 0, 0.5, and 1
- Add `test_var_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_var_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_var.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532362
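
The `correction` values mentioned above change only the divisor of the variance: the sum of squared deviations is divided by `N - correction`, so 0 gives the population variance and 1 gives the Bessel-corrected sample variance. A minimal plain-Python sketch, with a hypothetical `variance` helper and assumed sample data:

```python
def variance(xs, correction=1.0):
    """Sum of squared deviations from the mean, divided by (N - correction)."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - correction)

data = [1.0, 2.0, 3.0, 4.0]  # assumed example data; squared deviations sum to 5.0

for correction in (0, 0.5, 1):
    print(correction, variance(data, correction))
```

For this data the three settings divide 5.0 by 4, 3.5, and 3 respectively.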

Summary: Add int16 activation / int8 weight (a16w8) quantization tests for `aten.conv1d` on Ethos-U55 and Ethos-U85.

## Context

`conv1d` is the most critical op in the EMG stack: it is used by ALL 8 production EMG models (CC, CASCADE, HW, WAKE, BTD, AUTH, EMG2Pose, EMG2Touch) for temporal feature extraction from raw EMG signals. Despite this, only `conv2d` had a16w8 test coverage; `conv1d` was completely uncovered at the int16 activation precision. This gap meant that any Vela or quantizer regression affecting 1D convolutions at int16 IO would go undetected until full-model validation, making root-cause analysis significantly harder.

The test matrix is the largest in this stack because conv1d has the most configuration surface: kernel sizes (1, 3, 5), strides, padding, dilation, depthwise groups, and bias/no-bias variants are all crossed with per-channel vs. per-tensor quantization.

## Changes

- Add `a16w8_conv1d_test_parameters` dict with 14 test configurations (7 conv configs × {per_channel_quant=True, False}) covering kernel sizes 1/3/5, stride 1/2, dilation, depthwise, and no-bias variants
- Add `test_conv1d_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, per_channel_quantization=<varied>, qtol=128, epsilon=2**-16`
- Add `test_conv1d_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_conv1d.py` in `fbcode/` and `xplat/` `targets.bzl`

bypass-pytorch-oss-checks

Differential Revision: D104532360
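
A sketch of how a 7 × 2 matrix like the one described could be built, together with the standard conv1d output-length formula. The seven base configurations and their names below are hypothetical placeholders (the description does not list the actual ones), and the dict layout is an assumption rather than the structure used in `ops/test_conv1d.py`.

```python
from itertools import product

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula as the torch.nn.Conv1d docs)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Hypothetical base configs, chosen only to mirror the categories in the description.
base_configs = {
    "k1": dict(kernel_size=1),
    "k3": dict(kernel_size=3, padding=1),
    "k5": dict(kernel_size=5, padding=2),
    "k3_stride2": dict(kernel_size=3, stride=2, padding=1),
    "k3_dilation2": dict(kernel_size=3, dilation=2),
    "k3_depthwise": dict(kernel_size=3, padding=1, groups=4),
    "k3_no_bias": dict(kernel_size=3, padding=1, bias=False),
}

# Cross every conv config with per-channel and per-tensor quantization.
test_matrix = {
    f"{name}_{'per_channel' if per_channel else 'per_tensor'}": (cfg, per_channel)
    for (name, cfg), per_channel in product(base_configs.items(), (True, False))
}

print(len(test_matrix))  # 14
print(conv1d_out_len(32, kernel_size=5, stride=2, padding=2))
```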

pytorch-bot · 2026-05-14T16:48:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19598

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

❌ 3 New Failures, 40 Pending, 1 Unrelated Failure

As of commit 60842b0 with merge base 58b4f26:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-delegate-linux / linux-job (gh)
backends/qualcomm/tests/test_qnn_delegate.py::TestQNNFloatingPointOperator::test_qnn_backend_argmin
pull / unittest-editable / linux / linux-job (gh)
backends/xnnpack/test/ops/test_conv2d.py::TestConv2d::test_dq_conv2d_seq
trunk / test-arm-backend-ethos-u (test_pytest_ops_ethos_u85) / linux-job (gh)
RuntimeError: Command docker exec -t 216be2abdd0846d5d890ea475f748003586a9700f287349c9119606a47f88612 /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-05-14T16:48:19Z

~~Workflows were awaiting approval.~~ CI has now been triggered for the ciflow labels on this PR.

github-actions · 2026-05-14T16:48:44Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync · 2026-05-14T16:54:06Z

@christine-long-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104532359.

christine-long-meta added 5 commits May 12, 2026 18:09

christine-long-meta requested a review from digantdesai as a code owner May 14, 2026 16:48

github-actions Bot added the ciflow/trunk and module: arm labels May 14, 2026

meta-cla Bot added the CLA Signed label May 14, 2026

christine-long-meta force-pushed the export-D104532359 branch from d718c90 to 1c447dc Compare May 14, 2026 16:48

meta-codesync Bot added fb-exported meta-exported labels May 14, 2026

christine-long-meta force-pushed the export-D104532359 branch from 1c447dc to ea36e20 Compare May 14, 2026 16:48

christine-long-meta force-pushed the export-D104532359 branch from ea36e20 to 42c2abe Compare May 14, 2026 16:53

meta-codesync Bot changed the title ~~Add a16w8 per-op test for gelu~~ Add a16w8 per-op test for gelu (#19598) May 14, 2026

christine-long-meta force-pushed the export-D104532359 branch 2 times, most recently from 15ffc91 to 1371cae Compare May 14, 2026 16:53

christine-long-meta force-pushed the export-D104532359 branch from 1371cae to 60842b0 Compare May 14, 2026 20:53

meta-codesync Bot changed the title ~~Add a16w8 per-op test for gelu (#19598)~~ Add a16w8 per-op test for gelu May 14, 2026

christine-long-meta wants to merge 6 commits into pytorch:main from christine-long-meta:export-D104532359
