[ET-VK][ez] Add AOT support for PackedInt8_4C1W dtype #17389

Merged

meta-codesync[bot] merged 8 commits into gh/SS-JIA/420/base on Feb 13, 2026
Conversation
This adds end-to-end support for the PackedInt8_4C1W memory layout throughout the serialization and AOT pipeline. The 4C1W layout packs 4 channels into a single texel with width-major ordering, which is the natural output layout for convolutions that produce channel-packed results (sketched below).

- Adds PACKED_INT8_4C1W = 8 to the FlatBuffers schema and Python schema class
- Adds deserialization mapping in VulkanBackend.cpp
- Updates quantize/dequantize per-tensor op registrations to accept any PackedInt8 layout (not just 4W4C), enabling the layout propagation pass to choose the optimal layout
- Adds new TensorRepSet constants: PACKED_INT8_BUFFER (all quantized layouts), PACKED_INT8_4C1W_BUFFER, and PACKED_INT8_CHANNELS_PACKED_BUFFER (4W4C + 4C1W)

Differential Revision: [D93000167](https://our.internmc.facebook.com/intern/diff/D93000167/)
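To make the layout description concrete, here is a minimal NumPy sketch of what "4 channels per texel, width-major" means. The function name `pack_4c1w`, the `(C, H, W)` shape convention, and the zero-padding behavior are illustrative assumptions, not the ExecuTorch implementation:

```python
import numpy as np

def pack_4c1w(t: np.ndarray) -> np.ndarray:
    """Pack an int8 (C, H, W) tensor so each texel holds 4 consecutive channels."""
    C, H, W = t.shape
    groups = (C + 3) // 4                       # channel groups, padded to a multiple of 4
    padded = np.zeros((groups * 4, H, W), dtype=np.int8)
    padded[:C] = t                              # zero-pad any trailing channels
    # Result shape (groups, H, W, 4): texels[g, h, w] is the texel holding
    # channels 4g..4g+3 at spatial position (h, w). Among texels, w varies
    # fastest in memory, which is the "width-major" ordering in the name.
    return np.ascontiguousarray(padded.reshape(groups, 4, H, W).transpose(0, 2, 3, 1))

t = np.arange(2 * 3 * 5, dtype=np.int8).reshape(2, 3, 5)  # C=2, H=3, W=5
print(pack_4c1w(t).shape)  # (1, 3, 5, 4)
```

The pre-existing 4W4C layout, as its name suggests, instead tiles blocks spanning both width and channels; its exact definition lives in the Vulkan backend sources.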
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17389

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit c139846 with merge base dcfd12d.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Feb 11, 2026
manuelcandales approved these changes on Feb 12, 2026

3 commits were added on February 12, 2026 at 15:34
meta-codesync[bot] merged commit 18bd1b3 into gh/SS-JIA/420/base

195 of 197 checks passed
SS-JIA pushed a commit that referenced this pull request on Feb 13, 2026
Pull Request resolved: #17389

This adds end-to-end support for the PackedInt8_4C1W memory layout throughout the serialization and AOT pipeline. The 4C1W layout packs 4 channels into a single texel with width-major ordering, which is the natural output layout for convolutions that produce channel-packed results.

- Adds PACKED_INT8_4C1W = 8 to the FlatBuffers schema and Python schema class
- Adds deserialization mapping in VulkanBackend.cpp
- Updates quantize/dequantize per-tensor op registrations to accept any PackedInt8 layout (not just 4W4C), enabling the layout propagation pass to choose the optimal layout
- Adds new TensorRepSet constants: PACKED_INT8_BUFFER (all quantized layouts), PACKED_INT8_4C1W_BUFFER, and PACKED_INT8_CHANNELS_PACKED_BUFFER (4W4C + 4C1W), sketched below

ghstack-source-id: 340983070
@exported-using-ghexport

Differential Revision: [D93000167](https://our.internmc.facebook.com/intern/diff/D93000167/)
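The rep-set constants in the last bullet can be thought of as sets of acceptable (storage, layout) combinations for an op registration. The sketch below is a hypothetical approximation: TensorRepSet's actual API, the 4W4C enum value, and the BUFFER tag are assumptions made for illustration; only PACKED_INT8_4C1W = 8 comes from this PR.

```python
from enum import IntEnum

# Hypothetical model of the rep-set constants named above; the real
# TensorRepSet API is not shown on this PR page, so a rep set is
# approximated here as a frozenset of (storage, layout) pairs.
class VkMemoryLayout(IntEnum):
    PACKED_INT8_4W4C = 7   # value assumed for illustration
    PACKED_INT8_4C1W = 8   # value added by this PR

BUFFER = "buffer"

PACKED_INT8_4W4C_BUFFER = frozenset({(BUFFER, VkMemoryLayout.PACKED_INT8_4W4C)})
PACKED_INT8_4C1W_BUFFER = frozenset({(BUFFER, VkMemoryLayout.PACKED_INT8_4C1W)})

# "Channels packed" accepts either packed-int8 layout (4W4C + 4C1W), letting
# the layout propagation pass pick whichever suits the producing op.
PACKED_INT8_CHANNELS_PACKED_BUFFER = (
    PACKED_INT8_4W4C_BUFFER | PACKED_INT8_4C1W_BUFFER
)

# PACKED_INT8_BUFFER covers all quantized int8 layouts; with only two packed
# layouts modeled here, it coincides with the channels-packed set.
PACKED_INT8_BUFFER = PACKED_INT8_CHANNELS_PACKED_BUFFER
```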
chizkiyahu pushed a commit to chizkiyahu/executorch that referenced this pull request on Feb 23, 2026
Pull Request resolved: pytorch#17389
Stack from ghstack (oldest at bottom):
This adds end-to-end support for the PackedInt8_4C1W memory layout throughout the serialization and AOT pipeline. The 4C1W layout packs 4 channels into a single texel with width-major ordering, which is the natural output layout for convolutions that produce channel-packed results.
Differential Revision: D93000167
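For context on the quantize/dequantize per-tensor ops whose registrations this PR relaxes: the underlying math is the standard affine per-tensor scheme. The PR changes which memory layouts the op registrations accept, not this arithmetic. A minimal sketch:

```python
def quantize_per_tensor(x: float, scale: float, zero_point: int,
                        qmin: int = -128, qmax: int = 127) -> int:
    """Standard affine int8 quantization: q = clamp(round(x / scale) + zp)."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize_per_tensor(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping back to float: x ~ (q - zp) * scale."""
    return (q - zero_point) * scale

print(quantize_per_tensor(0.5, scale=0.02, zero_point=0))   # 25
print(dequantize_per_tensor(25, scale=0.02, zero_point=0))  # 0.5
```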