Skip to content

Replace AVG_POOL2D with REDUCE_SUM in DecomposeMeanDimPass (#19242)#19242

Merged
meta-codesync[bot] merged 1 commit intomainfrom
export-D101418199
May 2, 2026
Merged

Replace AVG_POOL2D with REDUCE_SUM in DecomposeMeanDimPass (#19242)#19242
meta-codesync[bot] merged 1 commit intomainfrom
export-D101418199

Conversation

@mcremon-meta
Copy link
Copy Markdown
Contributor

@mcremon-meta mcremon-meta commented May 1, 2026

Summary:

Replace the avg_pool2d decomposition path in DecomposeMeanDimPass with
REDUCE_SUM + MUL(1/N) for all mean.dim reductions.

AVG_POOL2D can only pool over spatial (H×W) axes in TOSA/NHWC layout,
which forces the compiler to insert TRANSPOSE ops when the reduction is
over channels (common in LayerNorm). REDUCE_SUM works on any axis without
layout constraints, avoiding those transposes entirely.

Reviewed By: 3l1

Differential Revision: D101418199

@mcremon-meta mcremon-meta requested a review from digantdesai as a code owner May 1, 2026 03:51
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19242

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 4 Unrelated Failures

As of commit 68c038b with merge base a3dd0fa (image):

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 1, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 1, 2026

@mcremon-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101418199.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@gggekov
Copy link
Copy Markdown
Collaborator

gggekov commented May 1, 2026

Thanks for the PR, Matthias! I see a few failing tests, could you take a look into them?
You are indeed reducing the number of permutes in a few cases, can you please update the lower number of expected permutes in backends/arm/test/misc/test_transpose_counts.py ?
I am also seeing a failure in backends/arm/test/ops/test_layer_norm.py for the test_native_layer_norm_16a8w_u85_INT suite and for backends/arm/test/ops/test_cond -k test_cond_u85_INT for the one_arg_two_outputs test case, wonder why is that?

@3l1 3l1 added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label May 1, 2026
@meta-codesync meta-codesync Bot changed the title Replace AVG_POOL2D with REDUCE_SUM in DecomposeMeanDimPass Replace AVG_POOL2D with REDUCE_SUM in DecomposeMeanDimPass (#19242) May 1, 2026
meta-codesync Bot pushed a commit that referenced this pull request May 1, 2026
Summary:

Replace the avg_pool2d decomposition path in DecomposeMeanDimPass with
REDUCE_SUM + MUL(1/N) for all mean.dim reductions.

AVG_POOL2D can only pool over spatial (H×W) axes in TOSA/NHWC layout,
which forces the compiler to insert TRANSPOSE ops when the reduction is
over channels (common in LayerNorm). REDUCE_SUM works on any axis without
layout constraints, avoiding those transposes entirely.

Reviewed By: 3l1

Differential Revision: D101418199
@meta-codesync meta-codesync Bot force-pushed the export-D101418199 branch from 8dff93f to b3fffa9 Compare May 1, 2026 21:14
@mcremon-meta
Copy link
Copy Markdown
Contributor Author

mcremon-meta commented May 1, 2026

Thanks for the PR, Matthias! I see a few failing tests, could you take a look into them? You are indeed reducing the number of permutes in a few cases, can you please update the lower number of expected permutes in backends/arm/test/misc/test_transpose_counts.py ? I am also seeing a failure in backends/arm/test/ops/test_layer_norm.py for the test_native_layer_norm_16a8w_u85_INT suite and for backends/arm/test/ops/test_cond -k test_cond_u85_INT for the one_arg_two_outputs test case, wonder why is that?

test_layer_norm.py: Removed randn_last_three_dims and randn_last_three_dims_no_bias from U85 16a8w xfails — the new sum-based decomposition improved accuracy enough that these tests now pass.

test_transpose_counts.py: Updated 5 expected TRANSPOSE counts (groupnorm 1→0, groupnorm_channels_last 3→2, model_2 11→9, model_4 5→3, model_5 6→4) — replacing avg_pool2d with sum+mul eliminates NHWC layout conversions.

test_cond.py: Xfailed one_arg_two_outputs in both TOSA INT and U85 INT — the new decomposition creates a full constant inside torch.cond branches which PyTorch's constant folder freezes into a parameter the branch submodule can't access.

@gggekov please note the last one. Seems like there is an issue with higher order ops and constant folding. I do think it's out of scope for this, so flagging it for later fix!

Edit: changed the op from mean to sum in the test_cond file, since it's not the point of the test to care about that op. No xfail added in the end

@3l1 3l1 self-requested a review May 1, 2026 22:11
Copy link
Copy Markdown
Contributor

@3l1 3l1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's review why we fail here before merging

tosa_int_xfails = {
    "one_arg_two_outputs": "mean decomposition creates frozen constant inside cond branch that breaks re-export",
}

meta-codesync Bot pushed a commit that referenced this pull request May 2, 2026
Summary:

Replace the avg_pool2d decomposition path in DecomposeMeanDimPass with
REDUCE_SUM + MUL(1/N) for all mean.dim reductions.

AVG_POOL2D can only pool over spatial (H×W) axes in TOSA/NHWC layout,
which forces the compiler to insert TRANSPOSE ops when the reduction is
over channels (common in LayerNorm). REDUCE_SUM works on any axis without
layout constraints, avoiding those transposes entirely.

Reviewed By: 3l1

Differential Revision: D101418199
@meta-codesync meta-codesync Bot force-pushed the export-D101418199 branch from b3fffa9 to c755445 Compare May 2, 2026 04:21
Summary:

Replace the avg_pool2d decomposition path in DecomposeMeanDimPass with
REDUCE_SUM + MUL(1/N) for all mean.dim reductions.

AVG_POOL2D can only pool over spatial (H×W) axes in TOSA/NHWC layout,
which forces the compiler to insert TRANSPOSE ops when the reduction is
over channels (common in LayerNorm). REDUCE_SUM works on any axis without
layout constraints, avoiding those transposes entirely.

Reviewed By: 3l1

Differential Revision: D101418199
@meta-codesync meta-codesync Bot force-pushed the export-D101418199 branch from c755445 to 68c038b Compare May 2, 2026 04:30
@mcremon-meta
Copy link
Copy Markdown
Contributor Author

let's review why we fail here before merging

tosa_int_xfails = {
    "one_arg_two_outputs": "mean decomposition creates frozen constant inside cond branch that breaks re-export",
}

Copied from above comment: changed the op from mean to sum in the test_cond file, since it's not the point of the test to care about that op. No xfail added in the end

@meta-codesync meta-codesync Bot merged commit d410aaf into main May 2, 2026
441 of 451 checks passed
@meta-codesync meta-codesync Bot deleted the export-D101418199 branch May 2, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported module: arm Issues related to arm backend partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants