Convert contiguous select_copy to zero-copy view in ReplaceViewCopyWithViewPass (#19198)

Open
JacobSzwejbka wants to merge 1 commit into pytorch:main from JacobSzwejbka:export-D102396195

JacobSzwejbka commented Apr 28, 2026

Summary:

Extends the ReplaceViewCopyWithViewPass to convert select_copy ops to zero-copy memory.select views when the output is a contiguous sub-region of the base tensor. This is the same pattern used for view_copy -> memory.view, but for select operations.
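
To illustrate what "zero-copy" means here, the following is a minimal sketch in plain Python, not ExecuTorch code. It models a row-major (3, 4) tensor as a flat `array` and contrasts a copying select with a view that aliases the base buffer. The helper names `select_copy_row` and `select_view_row` are hypothetical and exist only for this illustration.

```python
from array import array


def select_copy_row(buf, index, cols):
    """Copy semantics: return a new array holding row `index`."""
    start = index * cols
    return array(buf.typecode, buf[start:start + cols])


def select_view_row(buf, index, cols):
    """View semantics: return a memoryview that aliases row `index`."""
    start = index * cols
    return memoryview(buf)[start:start + cols]


# Logical shape (3, 4), stored row-major in one flat buffer.
base = array("i", range(12))
copied = select_copy_row(base, 1, 4)
viewed = select_view_row(base, 1, 4)

# Writing to the base is visible through the view but not through the copy.
base[4] = 100
print(copied[0], viewed[0])  # prints: 4 100
```

Selecting along the outermost dimension of a row-major buffer yields one unbroken run of elements, which is why it can be expressed as an offset into the base rather than a copy.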

The pass checks that the base is densely packed, non-constant, and static, and that the selected output forms a dense packing.
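
The contiguity condition can be sketched roughly as follows. This is an illustrative approximation in plain Python under stated assumptions, not the code in the pass: `strides_from_dim_order` is a stand-in written for this example (the PR mentions a helper named `stride_from_dim_order`, whose exact signature is not shown here), all sizes are assumed non-zero, and `select_is_zero_copy` is a hypothetical name.

```python
def strides_from_dim_order(sizes, dim_order):
    """Element strides of a densely packed tensor.

    `dim_order` lists dimensions from outermost to innermost in memory.
    Assumes every size is non-zero.
    """
    strides = [0] * len(sizes)
    running = 1
    for d in reversed(dim_order):
        strides[d] = running
        running *= sizes[d]
    return strides


def is_densely_packed(sizes, strides):
    """True if the elements occupy one gap-free block of memory."""
    # Size-1 dimensions never advance the address, so their strides are ignored.
    dims = sorted(
        (i for i in range(len(sizes)) if sizes[i] != 1),
        key=lambda i: strides[i],
    )
    expected = 1
    for i in dims:
        if strides[i] != expected:
            return False
        expected *= sizes[i]
    return True


def select_is_zero_copy(sizes, dim_order, dim):
    """Would selecting one index along `dim` leave a dense sub-region?"""
    strides = strides_from_dim_order(sizes, dim_order)
    out_sizes = [s for i, s in enumerate(sizes) if i != dim]
    out_strides = [s for i, s in enumerate(strides) if i != dim]
    return is_densely_packed(out_sizes, out_strides)
```

Under this sketch, selecting along the outermost memory dimension of a (4, 3, 2) row-major tensor is accepted, while selecting along dimension 1 is rejected because the remaining elements are separated by gaps. Selecting along dimension 1 becomes acceptable again when every dimension outside it in memory has size 1.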

For static memory-planned subviews, the emitter elides the op entirely (no runtime instruction) by serializing tensor metadata with mem_offset = base_offset + byte_delta. For dynamic shapes, a new executorch_prim::et_select runtime op sets the output data pointer to self.data_ptr + offset.
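
The offset arithmetic described above can be sketched as below. This is a hedged illustration, not the emitter's code: the function names and the negative-index handling are assumptions made for the example, and `byte_delta` is taken to be the selected index times the element stride of the selected dimension times the element size.

```python
def select_byte_delta(sizes, strides, dim, index, itemsize):
    """Byte distance from the start of the base to the selected sub-region."""
    if index < 0:
        index += sizes[dim]  # normalize negative indices, as select does
    if not 0 <= index < sizes[dim]:
        raise IndexError(f"index out of range for dim {dim}")
    return index * strides[dim] * itemsize


def planned_view_offset(base_offset, sizes, strides, dim, index, itemsize):
    """Static case: mem_offset = base_offset + byte_delta."""
    return base_offset + select_byte_delta(sizes, strides, dim, index, itemsize)


# float32 tensor of shape (4, 3, 2), row-major, planned at byte offset 128.
sizes, strides = [4, 3, 2], [6, 2, 1]
print(planned_view_offset(128, sizes, strides, dim=0, index=2, itemsize=4))  # prints: 176
```

In the dynamic case the same delta would be added to the base tensor's data pointer at runtime instead of being folded into the serialized metadata.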

Changes:

  • exir/memory.py: Added memory.select function
  • exir/passes/replace_view_copy_with_view_pass.py: Extended _ViewSpec with byte_offset, stride, dim_order params; added contiguity check using stride_from_dim_order; added select_copy handling in the pass
  • Pipeline integration: memory planner, to_out_var skiplist, emitter, serialization
  • kernels/prim_ops/et_select.{h,cpp}: C++ runtime op for dynamic select views
  • Tests: 5 new Python tests + 1 C++ test

Authored with Claude.

Reviewed By: metascroy

Differential Revision: D102396195

pytorch-bot Bot commented Apr 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19198

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label Apr 28, 2026

meta-codesync Bot commented Apr 28, 2026

@JacobSzwejbka has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102396195.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync Bot changed the title from "Convert contiguous select_copy to zero-copy view in ReplaceViewCopyWithViewPass" to "Convert contiguous select_copy to zero-copy view in ReplaceViewCopyWithViewPass (#19198)" Apr 29, 2026
JacobSzwejbka added a commit to JacobSzwejbka/executorch-1 that referenced this pull request Apr 29, 2026
…thViewPass (pytorch#19198)

Summary:

Extends the ReplaceViewCopyWithViewPass to convert `select_copy` ops to zero-copy `memory.select` views when the output is a contiguous sub-region of the base tensor. This is the same pattern used for `view_copy` -> `memory.view`, but for select operations.

The pass checks that the base is densely packed, non-constant, and static, and that the selected output forms a dense packing. It uses the base spec dim_order to compute actual memory strides, so this works for any contiguous layout (C-contiguous, channels-last, etc.).

For static memory-planned subviews, the emitter elides the op entirely (no runtime instruction) by serializing tensor metadata with `mem_offset = base_offset + byte_delta`. For dynamic shapes, a new `executorch_prim::et_select` runtime op sets the output data pointer to `self.data_ptr + offset`.

Changes:
- `exir/memory.py`: Added `memory.select` function
- `exir/passes/replace_view_copy_with_view_pass.py`: Extended `_ViewSpec` with `byte_offset`, `stride`, `dim_order` params; added contiguity check using `stride_from_dim_order`; added select_copy handling in the pass
- Pipeline integration: memory planner, to_out_var skiplist, emitter, serialization
- `kernels/prim_ops/et_select.{h,cpp}`: C++ runtime op for dynamic select views
- Tests: 5 new Python tests + 1 C++ test

Authored with Claude.

Differential Revision: D102396195
JacobSzwejbka force-pushed the export-D102396195 branch 2 times, most recently from 7177f19 to d546d84 April 29, 2026 21:11

Labels

CLA Signed, fb-exported, meta-exported
