Convert contiguous select_copy to zero-copy view in ReplaceViewCopyWithViewPass (#19198)
JacobSzwejbka wants to merge 1 commit into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19198
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures
As of commit 6387dc4 with merge base e4ede92:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@JacobSzwejbka has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102396195.
…thViewPass (pytorch#19198)

Summary:
Extends the ReplaceViewCopyWithViewPass to convert `select_copy` ops to zero-copy `memory.select` views when the output is a contiguous sub-region of the base tensor. This is the same pattern used for `view_copy` -> `memory.view`, but for select operations.

The pass checks that the base is densely packed, non-constant, and static, and that the selected output forms a dense packing. It uses the base spec dim_order to compute actual memory strides, so this works for any contiguous layout (C-contiguous, channels-last, etc.).

For static memory-planned subviews, the emitter elides the op entirely (no runtime instruction) by serializing tensor metadata with `mem_offset = base_offset + byte_delta`. For dynamic shapes, a new `executorch_prim::et_select` runtime op sets the output data pointer to `self.data_ptr + offset`.

Changes:
- `exir/memory.py`: Added `memory.select` function
- `exir/passes/replace_view_copy_with_view_pass.py`: Extended `_ViewSpec` with `byte_offset`, `stride`, `dim_order` params; added contiguity check using `stride_from_dim_order`; added select_copy handling in the pass
- Pipeline integration: memory planner, to_out_var skiplist, emitter, serialization
- `kernels/prim_ops/et_select.{h,cpp}`: C++ runtime op for dynamic select views
- Tests: 5 new Python tests + 1 C++ test

Authored with Claude.

Differential Revision: D102396195
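To make the contiguity condition in this summary concrete, here is a minimal sketch of how dense strides can be derived from a dim order and how a select along one dimension can be tested for dense packing. The helpers `dense_strides` and `select_is_dense` are hypothetical names for illustration only; they are not the `stride_from_dim_order` helper or the `_ViewSpec` logic from this PR, and the rule used (every dimension laid out before the selected one has size 1, or the selected dimension itself has size 1) is an assumption about what "forms a dense packing" means.

```python
from typing import List, Sequence


def dense_strides(sizes: Sequence[int], dim_order: Sequence[int]) -> List[int]:
    """Element strides of a densely packed tensor (hypothetical helper).

    dim_order lists dimensions from outermost to innermost in memory,
    e.g. (0, 1, 2, 3) for C-contiguous NCHW and (0, 2, 3, 1) for channels-last.
    """
    strides = [0] * len(sizes)
    running = 1
    # Walk from the innermost dimension outwards, accumulating the extent.
    for dim in reversed(dim_order):
        strides[dim] = running
        running *= sizes[dim]
    return strides


def select_is_dense(sizes: Sequence[int], dim_order: Sequence[int], dim: int) -> bool:
    """Whether selecting one index along `dim` yields one dense block of memory.

    Assumed rule for this sketch: the selection is dense when `dim` has size 1
    (the result covers the whole base) or when every dimension stored outside
    `dim` in memory has size 1 (the result is a single uninterrupted chunk).
    """
    if sizes[dim] == 1:
        return True
    position = list(dim_order).index(dim)
    outer_dims = dim_order[:position]
    return all(sizes[d] == 1 for d in outer_dims)


# Logical NCHW sizes stored in channels-last order.
nchw = (1, 3, 4, 5)
channels_last = (0, 2, 3, 1)
print(dense_strides(nchw, channels_last))            # [60, 1, 15, 3]
print(select_is_dense(nchw, channels_last, dim=2))   # True: one H row is W*C dense elements
print(select_is_dense(nchw, channels_last, dim=1))   # False: one channel is strided by C
```

Working from the dim order rather than from the logical shape is what lets the same check cover both C-contiguous and channels-last bases.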
6aeff0a to 340c71b
340c71b to 2e6d5ea
2e6d5ea to ed21ca2
ed21ca2 to fa78ec0
fa78ec0 to 199c9fe
7177f19 to d546d84
d546d84 to 2b185e0
2b185e0 to 6387dc4
Summary:
Extends the ReplaceViewCopyWithViewPass to convert `select_copy` ops to zero-copy `memory.select` views when the output is a contiguous sub-region of the base tensor. This is the same pattern used for `view_copy` -> `memory.view`, but for select operations.

The pass checks that the base is densely packed, non-constant, and static, and that the selected output forms a dense packing.

For static memory-planned subviews, the emitter elides the op entirely (no runtime instruction) by serializing tensor metadata with `mem_offset = base_offset + byte_delta`. For dynamic shapes, a new `executorch_prim::et_select` runtime op sets the output data pointer to `self.data_ptr + offset`.

Changes:
- `exir/memory.py`: Added `memory.select` function
- `exir/passes/replace_view_copy_with_view_pass.py`: Extended `_ViewSpec` with `byte_offset`, `stride`, `dim_order` params; added contiguity check using `stride_from_dim_order`; added select_copy handling in the pass
- `kernels/prim_ops/et_select.{h,cpp}`: C++ runtime op for dynamic select views

Authored with Claude.

Reviewed By: metascroy

Differential Revision: D102396195
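As a rough illustration of the `mem_offset = base_offset + byte_delta` arithmetic described in the summary, the sketch below computes the byte delta of a select and then aliases the selected region in place using only the Python standard library. The function `select_byte_delta`, the `base_offset` value of 256, and the use of `array`/`memoryview` as a stand-in for a planned memory arena are all assumptions made for this example; none of them come from the emitter or from the `et_select` kernel in this PR.

```python
from array import array
from typing import Sequence


def select_byte_delta(
    sizes: Sequence[int],
    strides: Sequence[int],
    dim: int,
    index: int,
    itemsize: int,
) -> int:
    """Byte distance from the start of the base to the selected sub-region.

    Hypothetical helper: strides are in elements, itemsize is bytes per element.
    """
    if index < 0:
        index += sizes[dim]  # normalize a negative index, as select does
    if not 0 <= index < sizes[dim]:
        raise IndexError(f"index out of range for dim {dim} with size {sizes[dim]}")
    return index * strides[dim] * itemsize


# A C-contiguous float32 tensor of shape (2, 3, 4), stored as 24 flat elements.
base = array("f", range(24))
sizes, strides = (2, 3, 4), (12, 4, 1)

byte_delta = select_byte_delta(sizes, strides, dim=0, index=1, itemsize=base.itemsize)

# Static case: the view's planned offset is the base's offset plus the delta.
base_offset = 256  # hypothetical planned offset of the base tensor
mem_offset = base_offset + byte_delta

# Zero-copy view: alias the 3 * 4 selected elements instead of copying them.
nbytes = 3 * 4 * base.itemsize
view = memoryview(base).cast("B")[byte_delta:byte_delta + nbytes].cast("f")

base[12] = 100.0  # a write through the base is visible through the view
print(byte_delta, mem_offset, view[0])
```

Because the view shares storage with the base, no data moves at all; the only work is the offset computation, which is why a statically planned subview needs no runtime instruction.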