Phase 13.14.GB: Vectorize _build_dense_lookup (3.6s → <0.05s)

miranov25 · miranov25 · commit 030543737c2f · 2026-03-24T10:17:43.000+01:00
Profiling showed that the Python loop in _build_dense_lookup took 58% of total time.
Replaced it with vectorized NumPy: shift coords by mins, multiply by strides, sum per row, then assign bin indices via fancy indexing.
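The old and new flat-index computations can be compared in isolation. This is a minimal sketch that assumes integer coordinates of shape (n_bins, n_dims) and the same row-major strides the diff builds. The helper names below are hypothetical and are not part of groupby_regression_sliding_window.py. The `astype(np.int64)` mirrors the `int()` casts in the removed loop, in case `bin_coords` arrives as a float array. That is an assumption; the committed code subtracts `mins` directly.

```python
import numpy as np


def row_major_strides(grid_shape):
    """C-order strides in elements: the last dimension varies fastest."""
    n_dims = len(grid_shape)
    strides = np.ones(n_dims, dtype=np.int64)
    for d in range(n_dims - 2, -1, -1):
        strides[d] = strides[d + 1] * grid_shape[d + 1]
    return strides


def flat_indices_loop(bin_coords, mins, strides):
    """Old approach: Python loop over bins and dimensions."""
    n_bins, n_dims = bin_coords.shape
    out = np.empty(n_bins, dtype=np.int64)
    for bi in range(n_bins):
        flat_idx = 0
        for d in range(n_dims):
            flat_idx += (int(bin_coords[bi, d]) - int(mins[d])) * int(strides[d])
        out[bi] = flat_idx
    return out


def flat_indices_vectorized(bin_coords, mins, strides):
    """New approach: broadcast the shift, scale by strides, sum each row."""
    shifted = bin_coords.astype(np.int64) - mins[np.newaxis, :]  # (n_bins, n_dims)
    return (shifted * strides[np.newaxis, :]).sum(axis=1)       # (n_bins,)


# Random occupied cells in a 4 x 5 x 6 grid whose lower corner is mins.
rng = np.random.default_rng(0)
grid_shape = (4, 5, 6)
mins = np.array([-2, 0, 3], dtype=np.int64)
coords = np.stack(
    [rng.integers(m, m + s, size=100) for m, s in zip(mins, grid_shape)], axis=1
)
strides = row_major_strides(grid_shape)

loop_result = flat_indices_loop(coords, mins, strides)
vec_result = flat_indices_vectorized(coords, mins, strides)
assert np.array_equal(loop_result, vec_result)
# NumPy's ravel_multi_index computes the same C-order flat index.
assert np.array_equal(vec_result, np.ravel_multi_index(tuple((coords - mins).T), grid_shape))
```

The vectorized form replaces n_bins × n_dims interpreted Python operations with a handful of array operations. That accounts for the drop from seconds to milliseconds.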

Combined with previous bincount + prange optimizations:
  Step 1 (bin mapping):     0.5s  — unchanged
  Step 2 (per-bin stats):   0.6s  — bincount (was 2.3s)
  Step 3 (window accum):    0.5s  — prange (was 6s)
  _build_dense_lookup:     <0.05s — vectorized (was 3.6s)
  Total per TF:            ~1.5s  (was 7-8s, originally 440s)
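The Step 2 bincount rewrite came from an earlier commit and is not shown in this diff. The sketch below illustrates that general technique only. It assumes each row already carries a compact bin index from Step 1. Under that assumption, per-bin counts, means and variances reduce to three weighted `np.bincount` passes instead of a Python loop over bins. `per_bin_stats` and its arguments are hypothetical names, and the statistics Step 2 actually computes may differ.

```python
import numpy as np


def per_bin_stats(bin_idx, values, n_bins):
    """Per-bin count, mean and population variance via np.bincount.

    bin_idx: int array (n_rows,) of compact bin indices in [0, n_bins).
    values:  float array (n_rows,).
    """
    counts = np.bincount(bin_idx, minlength=n_bins)
    sums = np.bincount(bin_idx, weights=values, minlength=n_bins)
    sumsq = np.bincount(bin_idx, weights=values * values, minlength=n_bins)
    # Empty bins give 0/0; silence the warning and leave NaN there.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts
        var = sumsq / counts - mean**2
    return counts, mean, var


# Five rows spread over bins 0 and 2; bin 1 is empty.
counts, mean, var = per_bin_stats(
    np.array([0, 0, 2, 2, 2]), np.array([1.0, 3.0, 2.0, 4.0, 6.0]), 3
)
```

The one-pass E[x²] − E[x]² variance is cheap. However, it can lose precision when the mean is large relative to the spread. Centring the values first is a common safeguard.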

494 passed, 4 pre-existing failures, 0 regressions
diff --git a/UTILS/dfextensions/groupby_regression/groupby_regression_sliding_window.py b/UTILS/dfextensions/groupby_regression/groupby_regression_sliding_window.py
@@ -4015,7 +4015,8 @@ def make_sliding_window_fit_parallel(
 def _build_dense_lookup(bin_coords: np.ndarray, bounds: dict, gb_columns: list):
     """Build a dense N-D array mapping grid coordinates to compact bin indices.
 
-    Returns (lookup, grid_shape, mins) where lookup[shifted_coords] = bin_index (-1 = empty).
+    Returns (lookup, grid_shape, mins, strides) where lookup[flat_idx] = bin_index (-1 = empty).
+    Vectorized — no Python loop over bins.
     """
     n_bins, n_dims = bin_coords.shape
     mins = np.array([bounds[dim][0] for dim in gb_columns], dtype=np.int64)
@@ -4029,11 +4030,10 @@ def _build_dense_lookup(bin_coords: np.ndarray, bounds: dict, gb_columns: list):
     for d in range(n_dims - 2, -1, -1):
         strides[d] = strides[d + 1] * grid_shape[d + 1]
 
-    for bi in range(n_bins):
-        flat_idx = 0
-        for d in range(n_dims):
-            flat_idx += (int(bin_coords[bi, d]) - int(mins[d])) * int(strides[d])
-        lookup[flat_idx] = bi
+    # Vectorized flat index computation — no loop over bins
+    shifted = bin_coords - mins[np.newaxis, :]  # (n_bins, n_dims)
+    flat_indices = (shifted * strides[np.newaxis, :]).sum(axis=1)  # (n_bins,)
+    lookup[flat_indices] = np.arange(n_bins, dtype=np.int32)
 
     return lookup, grid_shape, mins, strides
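This diff does not show how callers use the returned `(lookup, grid_shape, mins, strides)` tuple. The sketch below shows one plausible pattern. It assumes `lookup` is a flat int32 array of length prod(grid_shape) initialised to -1; the actual allocation is hidden between the two hunks. Arbitrary integer coordinates, such as a bin's neighbours inside a sliding window, map back to compact bin indices with one multiply-and-sum against `strides`. `build_lookup` and `query_bins` are hypothetical helpers, not functions from groupby_regression_sliding_window.py.

```python
import numpy as np


def build_lookup(bin_coords, mins, grid_shape):
    """Dense flat lookup: cell -> compact bin index, -1 for empty cells."""
    strides = np.ones(len(grid_shape), dtype=np.int64)
    for d in range(len(grid_shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * grid_shape[d + 1]
    lookup = np.full(int(np.prod(grid_shape)), -1, dtype=np.int32)
    flat = ((bin_coords - mins[np.newaxis, :]) * strides[np.newaxis, :]).sum(axis=1)
    lookup[flat] = np.arange(bin_coords.shape[0], dtype=np.int32)
    return lookup, strides


def query_bins(lookup, grid_shape, mins, strides, coords):
    """Map integer grid coords (n, n_dims) to compact bin indices; -1 = empty or outside."""
    shifted = np.asarray(coords, dtype=np.int64) - mins[np.newaxis, :]
    upper = np.asarray(grid_shape, dtype=np.int64)[np.newaxis, :]
    # Out-of-grid coordinates would alias onto other cells, so mask them first.
    inside = np.all((shifted >= 0) & (shifted < upper), axis=1)
    result = np.full(shifted.shape[0], -1, dtype=np.int32)
    flat = (shifted[inside] * strides[np.newaxis, :]).sum(axis=1)
    result[inside] = lookup[flat]
    return result


# Three occupied cells in a 2 x 3 grid.
bin_coords = np.array([[0, 0], [0, 2], [1, 1]], dtype=np.int64)
mins = np.array([0, 0], dtype=np.int64)
grid_shape = (2, 3)
lookup, strides = build_lookup(bin_coords, mins, grid_shape)
neighbours = query_bins(lookup, grid_shape, mins, strides, [[0, 2], [1, 1], [1, 0], [5, 5], [-1, 0]])
print(neighbours)  # [ 1  2 -1 -1 -1]
```

Masking out-of-range coordinates before computing the flat index matters. In the 2 x 3 grid above, (0, 3) would silently map to cell (1, 0), and (1, 3) would index past the end of `lookup`.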