Extend ExecuTorch reinplace pass (#19482)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19482
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
❌ 1 New Failure, 1 Unclassified Failure
As of commit 27995cb with merge base 65c7ee2:
NEW FAILURE - The following job has failed:
UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104152427.
JacobSzwejbka left a comment
Review automatically exported from Phabricator review in Meta.
Summary:
This diff makes the ExecuTorch reinplace pass configurable so that backends can use it. Over time, we can also add more in-place portable ops to expand ExecuTorch's default set.

Note: if two tensors have the same spec, they are now deduped and emitted as the same value. Previously (with `copy_` writeback), the synthetic out got a new value, but with the same allocation info as the `self` value.

What's new
* `reinplace_pass` now takes `ops_to_inplace=` and `inplace_overrides=` so backends can extend the candidate set. The in-place form is auto-derived from the op schema, and registrations are validated up front.
* Mutated arg positions are derived from `Tensor(a!)` annotations, so ops with multiple mutated args are handled correctly. kwargs are also forwarded to the in-place op (e.g. `accumulate=` for `index_put`).
* Fixed a bug where, after a successful rewrite, the new node's input nodes were not added to `seen_nodes`, so an upstream candidate mutating one of those reads could incorrectly look safe to reinplace. Covered by a new `index_put`-only regression test.
* The memory planner now aliases an in-place op's result `TensorSpec` onto its mutated input's spec (schema-driven via `output_to_aliased_input_map`), so they share an allocation slot.
* The emitter dedupes by `TensorSpec` identity, so two FX nodes sharing one spec share one `Value` in the lowered IR.

Reviewed By: JacobSzwejbka

Differential Revision: D104152427
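As a rough illustration of the schema-driven parts of the first two bullets, the stdlib-only sketch below derives an in-place op name and the mutated argument positions from a schema string. It assumes the ATen conventions that the in-place variant of `op.overload` is `op_.overload` and that a `Tensor(x!)` annotation marks a written argument. The helpers `inplace_op_name` and `mutated_arg_positions` and the `my_op_` schema are hypothetical, not the pass's API. With PyTorch available, the same information is exposed as `alias_info.is_write` on each argument of an op overload's `_schema`, which is more robust than parsing strings.

```python
import re
from typing import Optional

# A schema argument whose type carries a write annotation, e.g. "Tensor(a!) self".
_MUTATED_ARG = re.compile(r"Tensor\([a-z]+!\)")


def inplace_op_name(op_name: str, overrides: Optional[dict[str, str]] = None) -> str:
    """Hypothetical: derive the in-place op name, e.g. "add.Tensor" -> "add_.Tensor".

    An explicit override wins over the naming convention.
    """
    if overrides and op_name in overrides:
        return overrides[op_name]
    base, _, overload = op_name.partition(".")
    return f"{base}_.{overload}" if overload else f"{base}_"


def _arguments_text(schema: str) -> str:
    """Return the text between the argument list's outer parentheses."""
    start = schema.index("(")
    depth = 0
    for i in range(start, len(schema)):
        if schema[i] == "(":
            depth += 1
        elif schema[i] == ")":
            depth -= 1
            if depth == 0:
                return schema[start + 1 : i]
    raise ValueError(f"unbalanced schema: {schema!r}")


def _split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside () or []."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def mutated_arg_positions(schema: str) -> list[int]:
    """Hypothetical: positions of arguments annotated Tensor(x!), skipping '*'."""
    args = [a for a in _split_top_level(_arguments_text(schema)) if a != "*"]
    return [i for i, arg in enumerate(args) if _MUTATED_ARG.match(arg)]


print(inplace_op_name("index_put.default"))  # index_put_.default
print(mutated_arg_positions(
    "index_put_(Tensor(a!) self, Tensor?[] indices, Tensor values, "
    "bool accumulate=False) -> Tensor(a!)"
))  # [0]
print(mutated_arg_positions(
    "my_op_(Tensor(a!) x, Tensor(b!) y, Tensor z) -> (Tensor(a!), Tensor(b!))"
))  # [0, 1]
```

Not every out-of-place op has an in-place twin that follows this naming rule, which is where an explicit override map helps; in this model, validating a registration up front would amount to checking that the derived name actually resolves to a real op.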
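The `seen_nodes` fix can be illustrated with a toy model of the safety check. This is an assumption about the general shape of such a check, not the actual `reinplace_pass` code: it walks a straight-line graph backwards, keeps the set of values still read by later nodes or returned as outputs, and rewrites a candidate only when its mutated input is not in that set. The `Node` class, `pick_reinplace_candidates`, and the `fixed` flag (which toggles whether a rewritten node's own inputs are recorded) are hypothetical; graph inputs, views, and multiple mutated args are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str         # value produced by this node
    op: str           # e.g. "index_put"
    args: list[str]   # values this node reads
    mutated: int = 0  # position the in-place variant would write to


def pick_reinplace_candidates(nodes, outputs, candidate_ops, fixed=True):
    """Return names of nodes judged safe to rewrite in place, latest first."""
    seen = set(outputs)  # values some later node (or the output) still reads
    chosen = []
    for node in reversed(nodes):
        target = node.args[node.mutated]
        if node.op in candidate_ops and target not in seen:
            chosen.append(node.name)
            if not fixed:
                # Buggy variant: forget that this node still reads its inputs.
                continue
        # Earlier nodes must not clobber anything this node reads.
        seen.update(node.args)
    return chosen


# x2 reads v as its values tensor; the upstream node would mutate v.
graph = [
    Node("v2", "index_put", ["v", "idx", "w"]),
    Node("x2", "index_put", ["x", "idx", "v"]),
]
print(pick_reinplace_candidates(graph, ["v2", "x2"], {"index_put"}))               # ['x2']
print(pick_reinplace_candidates(graph, ["v2", "x2"], {"index_put"}, fixed=False))  # ['x2', 'v2']
```

With the fix, `v2 = index_put(v, idx, w)` stays out of place. Without it, that node is also chosen and, once rewritten in place, would overwrite `v` before `x2` reads it.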
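The last two bullets both rely on `TensorSpec` object identity. The sketch below is a minimal model under stated assumptions: a hypothetical `Spec` class stands in for ExecuTorch's `TensorSpec`, `mutated_input_of` plays the role of the schema-derived output-to-aliased-input map, and a naive bump allocator replaces the real memory planner (which also reuses memory across non-overlapping lifetimes). Once an in-place result is pointed at its mutated input's spec, keying both allocation and value numbering on the spec object gives the two FX nodes one slot and one value.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps object identity for == and hash()
class Spec:
    """Hypothetical stand-in for ExecuTorch's TensorSpec."""
    nbytes: int


def alias_inplace_specs(specs: dict, mutated_input_of: dict) -> None:
    """Make each in-place result share the Spec object of the input it mutates."""
    for out_name, in_name in mutated_input_of.items():
        specs[out_name] = specs[in_name]


def plan_and_emit(specs: dict):
    """Give each distinct Spec object one offset and one value id.

    Nodes sharing a Spec (by identity, not by contents) therefore share both
    the allocation slot and the emitted value.
    """
    offset_of, value_of = {}, {}
    next_offset = 0
    for spec in specs.values():
        if spec not in value_of:  # identity-based lookup because of eq=False
            value_of[spec] = len(value_of)
            offset_of[spec] = next_offset
            next_offset += spec.nbytes  # naive bump allocation, no reuse
    offsets = {name: offset_of[spec] for name, spec in specs.items()}
    values = {name: value_of[spec] for name, spec in specs.items()}
    return offsets, values


specs = {"x": Spec(16), "idx": Spec(8), "x_updated": Spec(16)}
alias_inplace_specs(specs, {"x_updated": "x"})
offsets, values = plan_and_emit(specs)
print(offsets)  # {'x': 0, 'idx': 16, 'x_updated': 0}
print(values)   # {'x': 0, 'idx': 1, 'x_updated': 0}
```

Without the aliasing step, the two `Spec(16)` objects are distinct keys despite identical contents, so `x_updated` would get its own slot and its own value.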
24b23f5 to a2cdaba
a2cdaba to b6a82ef
f4e0451 to 27995cb