[CPU/CUDA ep] Improve DeformConv op performance #27824
Merged: tianleiwu merged 54 commits into microsoft:main from ShirasawaSama:feature/improve-deform-conv-pref on Apr 9, 2026.
Commits (54)
Author of all commits: ShirasawaSama

0ec9bb0 Remove DeformConvCopyGemmOutputRowMajorToNCHW
3b5087c Adjust parallel cost for DeformableIm2col
dffbe68 Refactor deform conv bilinear with plan
d07e843 Simplify DeformConv im2col plan paths and fix mask indexing bug
7d5125d Refactor deform conv im2col to use a unified tiled path with context …
435bada Refactor DeformConv sampling plan to AoSoA layout and use Eigen for b…
c963bd9 Refine DeformConv naming clarity and avoid redundant workspace size r…
5bcb402 Optimize DeformConv by removing streaming plan logic and making bilin…
0c2a602 Refactor Deformconv cpu op
3f2fee9 Harden DeformConv integer bounds checks and streamline hot-path casts…
7ae47a4 Refactor DeformConv bounds validation
7c0d414 Add compute-time bounds checks with size_t-safe indexing
02f9e0c Optimize CPU DeformConv plan generation with kernel meta precompute
083b33c Refactor DeformConv kernel meta setup into a params-based cached
afe2dd1 Refactor CPU DeformConv bias add to avoid div/mod and extract DeformC…
d61d36c Annotate DeformConv CPU bias/col paths with ORT_CPU_RESTRICT and forc…
baf51ac CPU DeformConv bilinear sampling uses fast floor and inverted bounds …
43730c2 Flatten CPU DeformConv bilinear sampling plan build tasks across spat…
47bb183 Optimize CPU DeformConv sampling and bias parallelism with flattened …
3520625 Add detailed comments for DeformConv CPU implementation
052507e Reformat codes
b92e8c8 Optimize DeformConv CPU kernel by removing mutex and heap allocations
a9e5cc7 Optimize CUDA DeformConv kernel with static mask branching and tuned …
6359225 CUDA DeformConv reduce 64 bit index pressure in im2col hot path
47ab139 Increase InlinedVector capacity in DeformConv for 7x7 kernels
f590281 Optimize DeformConv bias indexing with int32/int64 dispatch and clean…
b29da63 Optimize CUDA DeformConv bias add with 2D launch fast path and int32/…
2c5a52c Optimize CUDA DeformConv by using 32-bit index arithmetic when safe a…
4579077 Refactor path indexing
cf21200 optimize deformconv bilinear sampling with interior fast path
97f2598 Rduce deformconv address math in dynamic im2col path
226d3ad Tune deform conv im2col addressing and bilinear sampling
bdb90bc Cuda deform conv replace 5x5 im2col launch specialization with 7x7
af3639e Pick chunk size by min rounds then balanced ceil
f223c69 Fix CUDA DeformConv im2col mask stride unused-variable warning
9632a20 Document and tidy CUDA DeformConv
16e990c Make deform conv bilinear sampling branchless with masked safe loads
e0558b4 Improve comments and code styles
d6ebfb9 Improve deform conv im2col load balance for offset_group=1
3831979 Harden DeformConv index-width guard and align mask test comment
bc4f9c8 Optimize BilinearInterpolate with one-sided bounds and float mask selp
2068b1a Make deform_conv_attributes.h self-contained for numeric_limits
8c6ed74 Clarify bilinear index int32 safety comments
db3d449 Fix CeilDiv signed overflow in CUDA DeformConv chunk sizing
f9c1d8c Rename offset_byte_offset to offset_elem_offset in CUDA DeformConv im…
1024172 Document heuristic threshold for DeformConv CUDA bias-add 2D launch path
6fb5f4f Document CPU DeformConv sampling-plan tail invariants
394d676 Add test cases
8fea660 Add pointer restrict annotations to DeformConv CPU and CUDA
76dfdba Fix DeformConv CUDA tail chunk col stride and add regression test
2574ee2 Document DeformConv aliasing assumptions for input and output buffers
cdb979c Fix DeformConv CUDA grouped tail chunk col-buffer strides and add tai…
ad977c6 Clarify DeformConv CUDA tail-chunk stride comment for grouped GEMM
a767525 Reuse validated common dims for GetNParallelImgs to keep overflow che…
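Two of the techniques named in the commit titles, an overflow-safe ceiling division (cf. "Fix CeilDiv signed overflow in CUDA DeformConv chunk sizing") and bilinear sampling at a fractional offset, can be illustrated in plain C++. This is a minimal sketch under stated assumptions, not the code merged in this PR: the function names and signatures are hypothetical, and the out-of-bounds behavior assumes the zero-padding convention used by common DeformConv reference implementations such as torchvision's `deform_conv2d`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: ceiling division for a >= 0 and b > 0.
// The naive form (a + b - 1) / b can overflow a signed type when a is close
// to INT64_MAX; this form never computes a value larger than a.
inline int64_t CeilDiv(int64_t a, int64_t b) {
  assert(a >= 0 && b > 0);
  return a / b + (a % b != 0 ? 1 : 0);
}

// Hypothetical helper: bilinear sample of one row-major H x W plane at a
// fractional position (h, w). Assumed zero-padding semantics: corners outside
// the plane contribute 0, and positions with h <= -1, h >= height, w <= -1 or
// w >= width return 0 outright.
inline float BilinearSample(const float* plane, int64_t height, int64_t width,
                            float h, float w) {
  if (h <= -1.0f || h >= static_cast<float>(height) ||
      w <= -1.0f || w >= static_cast<float>(width)) {
    return 0.0f;
  }

  // Integer corners surrounding the sample point.
  const int64_t h_low = static_cast<int64_t>(std::floor(h));
  const int64_t w_low = static_cast<int64_t>(std::floor(w));
  const int64_t h_high = h_low + 1;
  const int64_t w_high = w_low + 1;

  // Fractional distances from the low corner and their complements.
  const float lh = h - static_cast<float>(h_low);
  const float lw = w - static_cast<float>(w_low);
  const float hh = 1.0f - lh;
  const float hw = 1.0f - lw;

  // Per-corner bounds check: out-of-range corners read as 0.
  auto at = [&](int64_t y, int64_t x) -> float {
    if (y < 0 || y >= height || x < 0 || x >= width) return 0.0f;
    return plane[y * width + x];
  };

  // Weighted sum of the four corners.
  return hh * hw * at(h_low, w_low) + hh * lw * at(h_low, w_high) +
         lh * hw * at(h_high, w_low) + lh * lw * at(h_high, w_high);
}
```

For example, on the 2 x 2 plane `{1, 2, 3, 4}`, sampling at (0.5, 0.5) averages all four values to 2.5, and sampling at (-0.5, 0) blends the in-bounds value 1 with a zero-padded row to give 0.5. The per-corner check in `at` is the simple, branchy baseline; commits such as "optimize deformconv bilinear sampling with interior fast path" and "Make deform conv bilinear sampling branchless with masked safe loads" describe faster variants of this step.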