perf: Fix near-miss penalty in _morton_order with hybrid ceiling+argsort strategy
For shapes just above a power-of-2 (e.g. (33,33,33)), the ceiling-only
approach generates n_z=262,144 Morton codes for only 35,937 valid
coordinates (7.3× overgeneration). The floor+scalar approach is even
worse, since its scalar loop iterates n_z-n_floor times (229,376 for
(33,33,33)) rather than n_total-n_floor (3,169).
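The counts above can be reproduced with a short sketch. It assumes that
n_z is the product of each dimension rounded up to a power of 2 and that
n_floor is the product of each dimension rounded down to a power of 2;
the helper name `morton_counts` is hypothetical and not part of the
patched module.

```python
import math


def morton_counts(shape):
    """Return (n_total, n_z, n_floor) for a chunk shape (hypothetical helper).

    Assumes every dimension is >= 1, that n_z is the product of the
    per-dimension ceiling powers of 2, and that n_floor is the product
    of the per-dimension floor powers of 2.
    """
    n_total = math.prod(shape)
    # Smallest power of 2 that is >= s, per dimension.
    n_z = math.prod(1 << (s - 1).bit_length() for s in shape)
    # Largest power of 2 that is <= s, per dimension.
    n_floor = math.prod(1 << (s.bit_length() - 1) for s in shape)
    return n_total, n_z, n_floor


n_total, n_z, n_floor = morton_counts((33, 33, 33))
print(n_total, n_z, n_floor)       # 35937 262144 32768
print(round(n_z / n_total, 2))     # 7.29
print(n_z - n_floor)               # 229376
```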
The fix: when n_z > 4*n_total, use an argsort strategy that enumerates
all n_total valid coordinates via meshgrid, encodes each to a Morton code
using vectorized bit manipulation, then sorts by Morton code. This avoids
the large overgeneration while remaining fully vectorized.
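A minimal sketch of the argsort strategy, not the exact code in this
commit. It assumes that bits are interleaved with dimension 0 in the
least significant position and that dimensions with fewer bits drop out
of the interleave once exhausted; the function name
`morton_order_argsort` is hypothetical.

```python
import numpy as np


def morton_order_argsort(shape):
    """Return all valid coordinates of `shape` sorted by Morton code.

    Illustrative sketch: assumes len(shape) >= 1, every dimension >= 1,
    and that dimension 0 supplies the least significant interleaved bit.
    """
    ndim = len(shape)
    # Enumerate exactly n_total valid coordinates, with no overgeneration.
    grids = np.meshgrid(
        *[np.arange(s, dtype=np.uint64) for s in shape], indexing="ij"
    )
    coords = np.stack([g.ravel() for g in grids], axis=-1)  # (n_total, ndim)

    # Number of bits needed to represent indices 0..s-1 in each dimension.
    bits = [(s - 1).bit_length() for s in shape]

    # Vectorized encode: interleave coordinate bits into one code per row.
    codes = np.zeros(coords.shape[0], dtype=np.uint64)
    out_bit = 0
    for b in range(max(bits)):
        for d in range(ndim):
            if b < bits[d]:
                bit = (coords[:, d] >> np.uint64(b)) & np.uint64(1)
                codes |= bit << np.uint64(out_bit)
                out_bit += 1

    # Sorting by code yields Morton (Z-order) traversal of valid coordinates.
    order = np.argsort(codes, kind="stable")
    return coords[order]


print(morton_order_argsort((2, 2)).tolist())
# [[0, 0], [1, 0], [0, 1], [1, 1]]
```

The sort costs O(n_total log n_total), which is why this path is only
worthwhile when n_z is much larger than n_total.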
Results for test_morton_order_iter:
(30,30,30): 24ms (ceiling, ratio=1.21)
(32,32,32): 28ms (ceiling, ratio=1.00)
(33,33,33): 32ms (argsort, ratio=7.3; down from ~820ms with the scalar loop)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>