
Implement PackDepthwiseConvMatrix in NEON + deprecate aarch64 compat layers (#5779) #5779

Closed
Nicoshev wants to merge 1 commit into pytorch:main from Nicoshev:export-D106137964

Conversation

@Nicoshev
Contributor

@Nicoshev Nicoshev commented May 23, 2026

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2709

Add a NEON-based aarch64 implementation of the PackedDepthWiseConvMatrix constructor in PackDepthwiseConvMatrix.cc, alongside the existing AVX2 x86 implementation. The constructor packs depthwise convolution weight matrices into a SIMD-friendly interleaved layout.
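To illustrate what "packing into a SIMD-friendly interleaved layout" can mean, here is a minimal scalar sketch. It assumes weights stored as `[channel][kernel_pos]` and a block width of 16 `int8` lanes (one 128-bit NEON register). The function name `pack_interleaved`, the constant `kLanes`, and the `[block][kernel_pos][lane]` layout are hypothetical and chosen for illustration; they are not taken from FBGEMM, whose actual layout may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block width: 16 int8 lanes fill one 128-bit NEON register.
constexpr std::size_t kLanes = 16;

// Repack weights from [channel][kernel_pos] into [block][kernel_pos][lane],
// where block = channel / kLanes and lane = channel % kLanes.
// The last block is zero-padded when channels is not a multiple of kLanes.
// Illustrative layout only; not FBGEMM's actual packed format.
std::vector<std::int8_t> pack_interleaved(
    const std::int8_t* src, std::size_t channels, std::size_t kernel_prod) {
  const std::size_t blocks = (channels + kLanes - 1) / kLanes;
  std::vector<std::int8_t> packed(blocks * kernel_prod * kLanes, 0);
  for (std::size_t c = 0; c < channels; ++c) {
    const std::size_t block = c / kLanes;
    const std::size_t lane = c % kLanes;
    for (std::size_t k = 0; k < kernel_prod; ++k) {
      packed[(block * kernel_prod + k) * kLanes + lane] =
          src[c * kernel_prod + k];
    }
  }
  return packed;
}
```

With this layout, a kernel can load the weights of 16 consecutive channels for one kernel position with a single contiguous 16-byte read, which is the general motivation for interleaving ahead of time.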

Rename the depthwise-convolution source files, since they now hold both NEON and AVX2 implementations.

Stop compiling the AVX2 source files for aarch64 targets and remove the remaining uses of the aarch64 compat layers.

Reviewed By: q10, YifanYuan3

Differential Revision: D106137964

@meta-codesync
Contributor

meta-codesync Bot commented May 23, 2026

@Nicoshev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106137964.

@meta-cla meta-cla Bot added the cla signed label May 23, 2026
Nicoshev added a commit to Nicoshev/FBGEMM that referenced this pull request May 25, 2026
…layers (pytorch#5779)

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2709

Pull Request resolved: pytorch#5779

Add a NEON-based aarch64 implementation of the `PackedDepthWiseConvMatrix` constructor in `PackDepthwiseConvMatrix.cc`, alongside the existing AVX2 x86 implementation. The constructor packs depthwise convolution weight matrices into a SIMD-friendly interleaved layout.

Rename the depthwise-convolution source files, since they now hold both NEON and AVX2 implementations.

Stop compiling the AVX2 source files for aarch64 targets and remove the remaining uses of the aarch64 compat layers.

Differential Revision: D106137964
@meta-codesync meta-codesync Bot changed the title Implement PackDepthwiseConvMatrix in NEON + deprecate aarch64 compat layers Implement PackDepthwiseConvMatrix in NEON + deprecate aarch64 compat layers (#5779) May 25, 2026
@Nicoshev Nicoshev force-pushed the export-D106137964 branch from 3cf57eb to 32e1b26 Compare May 25, 2026 13:57
Nicoshev added a commit to Nicoshev/FBGEMM that referenced this pull request May 25, 2026
@Nicoshev Nicoshev force-pushed the export-D106137964 branch 2 times, most recently from bcc69a5 to 513f790 Compare May 29, 2026 18:54
Nicoshev added a commit to Nicoshev/FBGEMM that referenced this pull request May 29, 2026
Nicoshev added a commit to Nicoshev/FBGEMM that referenced this pull request May 29, 2026
@Nicoshev Nicoshev force-pushed the export-D106137964 branch from 513f790 to bad96b5 Compare May 29, 2026 19:00
@meta-codesync
Contributor

meta-codesync Bot commented May 30, 2026

This pull request has been merged in 07767a8.
