
Extend ExecuTorch reinplace pass (#19482)

Merged: meta-codesync[bot] merged 1 commit into pytorch:main from metascroy:export-D104152427 on May 14, 2026

@metascroy (Contributor) commented May 11, 2026

Summary:

This diff makes the ExecuTorch reinplace pass configurable so that backends can extend it.

Over time, we can also add more in-place portable ops to ExecuTorch's default set.
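To make the configurability concrete, here is a minimal sketch of how a backend-extensible candidate table could be assembled. Everything in it is hypothetical: the keyword names ops_to_inplace and inplace_overrides are borrowed from the summary, but passing plain strings, the KNOWN_OPS registry, the default set, and the my_backend ops are assumptions. The in-place name is derived with PyTorch's trailing-underscore naming convention rather than from a real schema lookup.

```python
# Hypothetical sketch: KNOWN_OPS, DEFAULT_OPS, derive_inplace_name and
# build_candidates are illustrative names, not ExecuTorch's real API.
from typing import Dict, Iterable, Optional

# Stand-in for the operator registry: overload names assumed to exist.
KNOWN_OPS = {
    "aten::index_put", "aten::index_put_",
    "aten::add.Tensor", "aten::add_.Tensor",
    "my_backend::update_cache", "my_backend::update_cache_inplace",
}

# Assumed default candidate set (functional op names).
DEFAULT_OPS = {"aten::index_put"}


def derive_inplace_name(op: str) -> str:
    """Derive an in-place name via the trailing-underscore convention."""
    name, _, overload = op.partition(".")
    return f"{name}_.{overload}" if overload else f"{name}_"


def build_candidates(
    ops_to_inplace: Iterable[str] = (),
    inplace_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map functional op -> in-place op, validating every entry up front."""
    overrides = dict(inplace_overrides or {})
    candidates: Dict[str, str] = {}
    for op in sorted(DEFAULT_OPS | set(ops_to_inplace) | set(overrides)):
        if op not in KNOWN_OPS:
            raise ValueError(f"unknown op {op!r}")
        # An explicit override wins over the derived in-place name.
        target = overrides.get(op, derive_inplace_name(op))
        if target not in KNOWN_OPS:
            raise ValueError(f"no in-place variant {target!r} for {op!r}")
        candidates[op] = target
    return candidates
```

With these definitions, build_candidates(["aten::add.Tensor"], {"my_backend::update_cache": "my_backend::update_cache_inplace"}) returns a mapping for all three ops, while build_candidates(["my_backend::update_cache"]) raises ValueError immediately because the derived name my_backend::update_cache_ is not registered. Failing early like this mirrors the summary's point that registrations are validated up front.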

Note: if two tensors share the same TensorSpec, they are now deduplicated and emitted as a single value. Previously (with copy_ writeback), the synthetic out tensor was emitted as a separate value that carried the same allocation info as the self value.
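One way to picture this deduplication is an emitter that keys values on the spec object itself (identity) rather than on structurally equal contents. The sketch below is hypothetical: TensorSpec, Node, and assign_values are simplified stand-ins rather than ExecuTorch's emitter classes, and the copy_out node only imitates the old writeback behavior.

```python
# Hypothetical sketch: TensorSpec, Node and assign_values are stand-ins,
# not ExecuTorch's emitter classes.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(eq=False)  # eq=False keeps object identity for == and hash()
class TensorSpec:
    shape: Tuple[int, ...]
    mem_offset: int


@dataclass
class Node:
    name: str
    spec: TensorSpec


def assign_values(nodes: List[Node]) -> Dict[str, int]:
    """Give each node a value index; nodes sharing one spec object share it."""
    spec_to_value: Dict[TensorSpec, int] = {}
    node_to_value: Dict[str, int] = {}
    for node in nodes:
        if node.spec not in spec_to_value:
            spec_to_value[node.spec] = len(spec_to_value)
        node_to_value[node.name] = spec_to_value[node.spec]
    return node_to_value


cache = TensorSpec(shape=(4, 8), mem_offset=0)
nodes = [
    Node("cache", cache),
    Node("index_put_", cache),  # in-place result aliased onto its input spec
    Node("copy_out", TensorSpec(shape=(4, 8), mem_offset=0)),  # equal, not identical
]
print(assign_values(nodes))  # {'cache': 0, 'index_put_': 0, 'copy_out': 1}
```

Using identity rather than equality matters here: two specs with equal shapes and offsets can still describe distinct tensors, so only nodes that were explicitly given the same spec object collapse into one value.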

What's new

  • reinplace_pass now takes ops_to_inplace= and inplace_overrides= so backends can extend the candidate set. The in-place form is auto-derived from the op schema, and registrations are validated up front.
  • Mutated arg positions are derived from Tensor(a!) annotations, so ops with multiple mutated args are handled correctly. kwargs are also forwarded to the in-place op (e.g. accumulate= for index_put).
  • Fixed a bug where, after a successful rewrite, the new node's input nodes were not added to seen_nodes, so an upstream candidate that mutates one of those inputs could incorrectly appear safe to reinplace. Covered by a new index_put-only regression test.
  • Memory planner now aliases an in-place op's result TensorSpec onto its mutated input's spec (schema-driven via output_to_aliased_input_map), so they share an allocation slot.
  • Emitter dedupes by TensorSpec identity so two FX nodes sharing one spec share one Value in the lowered IR.

Reviewed By: JacobSzwejbka

Differential Revision: D104152427

@pytorch-bot (bot) commented May 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19482

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unclassified Failure

As of commit 27995cb with merge base 65c7ee2:

NEW FAILURE - The following job has failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla (bot) added the CLA Signed label May 11, 2026
@meta-codesync (bot) commented May 11, 2026

@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104152427.

@JacobSzwejbka (Contributor) left a comment

Review automatically exported from Phabricator review in Meta.

@github-actions commented

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can leave a comment for pytorchbot, for example:
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync (bot) changed the title from "Extend ExecuTorch reinplace pass" to "Extend ExecuTorch reinplace pass (#19482)" on May 13, 2026
metascroy added a commit to metascroy/executorch that referenced this pull request May 13, 2026
@metascroy force-pushed the export-D104152427 branch from 24b23f5 to a2cdaba on May 13, 2026 17:12
metascroy added a commit to metascroy/executorch that referenced this pull request May 13, 2026
@metascroy force-pushed the export-D104152427 branch from a2cdaba to b6a82ef on May 13, 2026 17:15
metascroy added a commit to metascroy/executorch that referenced this pull request May 13, 2026
@metascroy force-pushed the export-D104152427 branch 2 times, most recently from f4e0451 to 27995cb, on May 13, 2026 20:11
@meta-codesync (bot) merged commit 9c9ce9b into pytorch:main on May 14, 2026
175 of 177 checks passed

Labels: CLA Signed, fb-exported, meta-exported


2 participants