
Commit 0305437

Phase 13.14.GB: Vectorize _build_dense_lookup (3.6s → <0.05s)
Profile showed the _build_dense_lookup Python loop was 58% of total time. Replaced it with vectorized numpy: shifted * strides, sum, fancy indexing.

Combined with the previous bincount + prange optimizations:

  Step 1 (bin mapping):    0.5s - unchanged
  Step 2 (per-bin stats):  0.6s - bincount (was 2.3s)
  Step 3 (window accum):   0.5s - prange (was 6s)
  _build_dense_lookup:     <0.05s - vectorized (was 3.6s)
  Total per TF:            ~1.5s (was 7-8s, originally 440s)

494 passed, 4 pre-existing, 0 regressions
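The replacement described above (shifted * strides, sum, fancy indexing) can be checked against the old nested loop on a toy grid. This is a minimal sketch, not the repository code: the helper names `row_major_strides`, `fill_lookup_loop`, and `fill_lookup_vectorized` are hypothetical, the grid shape is passed in directly instead of being derived from `bounds`, and the comparison assumes every row of `bin_coords` is unique (with repeated coordinates the two versions are not guaranteed to agree).

```python
import numpy as np


def row_major_strides(grid_shape):
    """Row-major (C-order) strides, built the same way as in the diff context."""
    strides = np.ones(len(grid_shape), dtype=np.int64)
    for d in range(len(grid_shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * grid_shape[d + 1]
    return strides


def fill_lookup_loop(bin_coords, mins, strides, size):
    """Old approach: one Python iteration per bin and per dimension."""
    lookup = np.full(size, -1, dtype=np.int32)
    for bi in range(bin_coords.shape[0]):
        flat_idx = 0
        for d in range(bin_coords.shape[1]):
            flat_idx += (int(bin_coords[bi, d]) - int(mins[d])) * int(strides[d])
        lookup[flat_idx] = bi
    return lookup


def fill_lookup_vectorized(bin_coords, mins, strides, size):
    """New approach: broadcasted shift, multiply, row sum, then fancy indexing."""
    lookup = np.full(size, -1, dtype=np.int32)
    shifted = bin_coords - mins[np.newaxis, :]                     # (n_bins, n_dims)
    flat_indices = (shifted * strides[np.newaxis, :]).sum(axis=1)  # (n_bins,)
    lookup[flat_indices] = np.arange(bin_coords.shape[0], dtype=np.int32)
    return lookup


# Toy 3 x 4 grid whose coordinates start at (10, -2); four occupied bins.
mins = np.array([10, -2], dtype=np.int64)
grid_shape = np.array([3, 4], dtype=np.int64)
bin_coords = np.array([[10, -2], [11, 0], [12, 1], [10, 1]], dtype=np.int64)

strides = row_major_strides(grid_shape)
size = int(grid_shape.prod())

loop_result = fill_lookup_loop(bin_coords, mins, strides, size)
vec_result = fill_lookup_vectorized(bin_coords, mins, strides, size)

assert np.array_equal(loop_result, vec_result)
print(vec_result.tolist())
```

Both versions write bin index `bi` at the flat offset of its shifted coordinates, so the arrays should match element for element; the vectorized form just moves the per-bin arithmetic out of the Python interpreter.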
1 parent 209e7c2 commit 0305437

1 file changed

Lines changed: 6 additions & 6 deletions


UTILS/dfextensions/groupby_regression/groupby_regression_sliding_window.py

@@ -4015,7 +4015,8 @@ def make_sliding_window_fit_parallel(
 def _build_dense_lookup(bin_coords: np.ndarray, bounds: dict, gb_columns: list):
     """Build a dense N-D array mapping grid coordinates to compact bin indices.
 
-    Returns (lookup, grid_shape, mins) where lookup[shifted_coords] = bin_index (-1 = empty).
+    Returns (lookup, grid_shape, mins, strides) where lookup[flat_idx] = bin_index (-1 = empty).
+    Vectorized — no Python loop over bins.
     """
     n_bins, n_dims = bin_coords.shape
     mins = np.array([bounds[dim][0] for dim in gb_columns], dtype=np.int64)
@@ -4029,11 +4030,10 @@ def _build_dense_lookup(bin_coords: np.ndarray, bounds: dict, gb_columns: list):
     for d in range(n_dims - 2, -1, -1):
         strides[d] = strides[d + 1] * grid_shape[d + 1]
 
-    for bi in range(n_bins):
-        flat_idx = 0
-        for d in range(n_dims):
-            flat_idx += (int(bin_coords[bi, d]) - int(mins[d])) * int(strides[d])
-        lookup[flat_idx] = bi
+    # Vectorized flat index computation — no loop over bins
+    shifted = bin_coords - mins[np.newaxis, :]  # (n_bins, n_dims)
+    flat_indices = (shifted * strides[np.newaxis, :]).sum(axis=1)  # (n_bins,)
+    lookup[flat_indices] = np.arange(n_bins, dtype=np.int32)
 
     return lookup, grid_shape, mins, strides
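Since the function now also returns `strides`, a caller can turn grid coordinates into flat offsets without recomputing them. The sketch below shows one way that might look; `lookup_bins` is a hypothetical helper that does not appear in this commit, and the toy `lookup`, `grid_shape`, `mins`, and `strides` values are made up for illustration. It also cross-checks the flat offsets against `np.ravel_multi_index`, which computes the same row-major index.

```python
import numpy as np


def lookup_bins(coords, lookup, grid_shape, mins, strides):
    """Map grid coordinates to compact bin indices (-1 = empty or outside the grid).

    Hypothetical consumer of the (lookup, grid_shape, mins, strides) tuple.
    """
    coords = np.atleast_2d(np.asarray(coords, dtype=np.int64))
    shifted = coords - mins[np.newaxis, :]
    # Guard against coordinates outside the dense grid before indexing.
    inside = np.all((shifted >= 0) & (shifted < grid_shape[np.newaxis, :]), axis=1)
    result = np.full(coords.shape[0], -1, dtype=np.int32)
    flat = (shifted[inside] * strides[np.newaxis, :]).sum(axis=1)
    result[inside] = lookup[flat]
    return result


# Toy values: a 3 x 4 grid starting at (10, -2) with four occupied cells.
mins = np.array([10, -2], dtype=np.int64)
grid_shape = np.array([3, 4], dtype=np.int64)
strides = np.array([4, 1], dtype=np.int64)
lookup = np.array([0, -1, -1, 3, -1, -1, 1, -1, -1, -1, -1, 2], dtype=np.int32)

# Occupied cell, empty cell, out-of-grid cell, occupied cell.
queries = np.array([[11, 0], [10, -1], [13, 0], [12, 1]], dtype=np.int64)
found = lookup_bins(queries, lookup, grid_shape, mins, strides)
print(found.tolist())

# Cross-check: the stride arithmetic matches NumPy's own row-major flattening.
in_grid = queries[[0, 1, 3]] - mins
manual_flat = (in_grid * strides).sum(axis=1)
numpy_flat = np.ravel_multi_index(tuple(in_grid.T), tuple(int(s) for s in grid_shape))
assert np.array_equal(manual_flat, numpy_flat)
```

The bounds check matters because a shifted coordinate that falls outside one axis can still produce a flat offset inside the array and silently return the wrong bin.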