[ET-VK][ez] Make q8ta_conv2d use 4C1W layout #17390
Merged
meta-codesync[bot] merged 8 commits into gh/SS-JIA/421/base on Feb 13, 2026
This change switches the input layout of the q8ta_conv2d and q8ta_conv2d_dw operators from PackedInt8_4W4C to PackedInt8_4C1W in the op registry. The 4C1W layout matches the natural output format of channel-packed convolutions, which avoids unnecessary layout conversions between consecutive conv layers.

It also adds explicit `outputs_storage` declarations (PACKED_INT8_CHANNELS_PACKED_BUFFER) to both the PW and general q8ta_conv2d op registrations, so that the layout propagation pass can correctly determine output layouts.

Differential Revision: [D93000165](https://our.internmc.facebook.com/intern/diff/D93000165/)
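To illustrate why matching the input layout to the output layout matters, here is a minimal toy model, not the ExecuTorch op registry or its layout propagation pass. `OpLayouts` and `count_conversions` are hypothetical names introduced only for this sketch, and it assumes the conv op produces 4C1W output both before and after the change, which the description suggests but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpLayouts:
    """Hypothetical stand-in for a registry entry (not the ExecuTorch API)."""
    name: str
    input_layout: str   # layout the op expects for its input tensor
    output_layout: str  # layout the op produces for its output tensor


def count_conversions(chain, initial_layout):
    """Count layout conversions needed to run the ops in `chain` back to back."""
    conversions = 0
    current = initial_layout
    for op in chain:
        if current != op.input_layout:
            conversions += 1  # a repacking step would be inserted before this op
        current = op.output_layout
    return conversions


# Assumption: the op emits 4C1W output in both configurations.
before = [OpLayouts("q8ta_conv2d", "4W4C", "4C1W")] * 3
after = [OpLayouts("q8ta_conv2d", "4C1W", "4C1W")] * 3

conversions_before = count_conversions(before, "4W4C")
conversions_after = count_conversions(after, "4C1W")
print(conversions_before, conversions_after)
```

In this model, a chain of three convs needs a repack before every conv after the first when the input layout is 4W4C, and none when the input layout is 4C1W.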
Dr. CI: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17390. No failures as of commit a575c84 with merge base dcfd12d.
This was referenced Feb 11, 2026
manuelcandales approved these changes on Feb 12, 2026
added 3 commits
February 12, 2026 15:34
Commit 27d0e53 merged into gh/SS-JIA/421/base. 195 of 197 checks passed.
SS-JIA pushed a commit that referenced this pull request on Feb 13, 2026
Pull Request resolved: #17390 This changes the q8ta_conv2d and q8ta_conv2d_dw operators' input layout from PackedInt8_4W4C to PackedInt8_4C1W in the op registry. The 4C1W layout aligns with the natural output format of channel-packed convolutions, avoiding unnecessary layout conversions between consecutive conv layers. Also adds explicit `outputs_storage` declarations (PACKED_INT8_CHANNELS_PACKED_BUFFER) to both the PW and general q8ta_conv2d op registrations, ensuring the layout propagation pass can correctly determine output layouts. ghstack-source-id: 340983082 @exported-using-ghexport Differential Revision: [D93000165](https://our.internmc.facebook.com/intern/diff/D93000165/)
chizkiyahu pushed a commit to chizkiyahu/executorch that referenced this pull request on Feb 23, 2026
Pull Request resolved: pytorch#17390 This changes the q8ta_conv2d and q8ta_conv2d_dw operators' input layout from PackedInt8_4W4C to PackedInt8_4C1W in the op registry. The 4C1W layout aligns with the natural output format of channel-packed convolutions, avoiding unnecessary layout conversions between consecutive conv layers. Also adds explicit `outputs_storage` declarations (PACKED_INT8_CHANNELS_PACKED_BUFFER) to both the PW and general q8ta_conv2d op registrations, ensuring the layout propagation pass can correctly determine output layouts. ghstack-source-id: 340983082 @exported-using-ghexport Differential Revision: [D93000165](https://our.internmc.facebook.com/intern/diff/D93000165/)