Optimising sortperm!#90
Open
shreyas-omkar wants to merge 3 commits into
Open
Conversation
…er at large n) merge_sortperm_lowmem! carries a comparator that dereferences v[ix] and v[iy] from global memory on every binary-search step inside the merge pass, making the effective traffic O(n log²n). merge_sortperm! instead copies the keys into shared memory alongside the indices so all comparisons stay in L1/shared memory. Benchmarks on RTX 5080 (CUDA 13.2, Julia 1.12): n=2^18: 0.541 ms → 0.286 ms (1.9×) n=2^20: 2.185 ms → 0.490 ms (4.5×) n=2^22: 10.668 ms → 2.847 ms (3.7×) n=2^24: 53.453 ms → 11.900 ms (4.5×) sortperm! is now within 1.3× of plain sort! across all tested sizes. The public temp kwarg is preserved: it maps to temp_ix in merge_sortperm! (same semantics — a pre-allocated index swap buffer). Tests: extend sortperm testset with full permutation-validity checks, 6 new element types (Int16/UInt16/Int64/UInt64/Float64/UInt8), edge sizes (n=1..2049), data-distribution coverage, comparator options, temp-reuse, exact Base.sortperm match, and a merge sort stability check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without hoisting, the by(elem) transform fires inside every binary-search comparison step across all O(n log²n) merge operations. With hoisting, we broadcast by.(v) once to build a key array, then delegate to merge_sort_by_key! which keeps keys in shared memory alongside values. Benchmarks on RTX 5080 (Float32, n=2^22): by=abs: 2.197 ms → 1.912 ms (-13%) by=x->x^2: was worse → 1.920 ms rev=true: unchanged (no by, not hoisted) identity: unchanged (guarded by by !== identity check) The temp kwarg maps to temp_values in merge_sort_by_key! preserving the existing API contract. All paths (sort!, merge_sort!, merge_sort_by_key!) now benefit automatically for any non-identity by= function. Tests: add sort_by_transform testset with exact Base.sort output matching for Float32/Float64/Int32, edge sizes (n=1,2,513,1025), temp kwarg forwarding, type-changing by= (Float32→Bool), and identity/rev=true non-regression checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member
|
KA has |
Member
Author
Yea.. I checked it now. Thanks for giving up heads up. I will change the commit. |
2b78b0b to
b5f4620
Compare
b5f4620 to
270e578
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
sortperm!Throughput (Float32)sort!withby=Transform (Float32,identity(No-op / Baseline)by=absby=x->x^2