perf: remove implicit ListViewArray rebuild during take and filter operations (#8048)
## Summary
Removes implicit rebuilds from `ListViewArray` `take` and `filter`
compute kernels, adds density-introspection methods to support deciding
when to rebuild explicitly at materialization boundaries, and defers
rebuilding until array export to duckdb and arrow.
**Motivation.** The previous code rebuilt the elements buffer eagerly on
every `take` or `filter` whose row-fraction dropped below
`REBUILD_DENSITY_THRESHOLD`. In an execution tree like `take → take →
...`, an eager mid-pipeline rebuild costs an allocation and a full copy
of referenced ranges that the next operator may immediately sparsify
away.
The row-fraction heuristic was also inaccurate: it doesn't account for
per-row size variance, unreferenced elements, or duplicate references.
`ListViewArray::estimate_density` instead uses the sum of `sizes`
relative to the elements length. This overestimates density when
references overlap, but erring toward not compacting is typically
preferable.
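To illustrate the difference between the cheap estimate and an exact density, here is a minimal, self-contained sketch. It does not use the real `ListViewArray` type: `ListViewSketch`, its fields, and the method bodies are hypothetical stand-ins, and the actual signatures in `vortex-array` may differ. It only assumes that row `i` references `elements[offsets[i]..offsets[i] + sizes[i]]`.

```rust
/// Hypothetical stand-in for a list view: row `i` references
/// `elements[offsets[i]..offsets[i] + sizes[i]]`.
struct ListViewSketch {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
    elements_len: usize,
}

impl ListViewSketch {
    /// Cheap estimate: sum of `sizes` over the elements length.
    /// Overlapping ranges are counted once per reference, so this can
    /// overestimate (and even exceed 1.0).
    fn estimate_density(&self) -> f64 {
        if self.elements_len == 0 {
            return 1.0; // assumption: an empty buffer is treated as dense
        }
        let total: usize = self.sizes.iter().sum();
        total as f64 / self.elements_len as f64
    }

    /// Marks every element that at least one row references.
    fn referenced_mask(&self) -> Vec<bool> {
        let mut mask = vec![false; self.elements_len];
        for (&offset, &size) in self.offsets.iter().zip(&self.sizes) {
            for slot in &mut mask[offset..offset + size] {
                *slot = true;
            }
        }
        mask
    }

    /// Exact density: distinct referenced elements over the elements length.
    fn compute_density(&self) -> f64 {
        if self.elements_len == 0 {
            return 1.0;
        }
        let referenced = self.referenced_mask().iter().filter(|&&b| b).count();
        referenced as f64 / self.elements_len as f64
    }
}
```

With two rows covering `0..4` and `2..6` of a 10-element buffer, the estimate counts 8 referenced elements while the exact mask finds 6, so the estimate is the higher of the two and biases against compaction.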
**Changes:**
- **Drop implicit rebuild from `TakeReduce::take` and
`TakeExecute::take`**
- **Drop implicit rebuild from `filter_listview`**
- **Add methods to calculate reference density**:
`compute_referenced_elements_mask`, `compute_density`, and
`estimate_density`
- **Estimate density and conditionally rebuild on export boundaries for
duckdb and arrow**
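The export-boundary behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this change: `ExportSketch`, `prepare_for_export`, and `HYPOTHETICAL_DENSITY_THRESHOLD` (including its value) are invented for the example, and the real DuckDB and Arrow export paths are not shown.

```rust
/// Hypothetical threshold; the value used by the real code is not stated here.
const HYPOTHETICAL_DENSITY_THRESHOLD: f64 = 0.5;

/// Hypothetical stand-in for a list view over `i32` elements.
#[derive(Debug, PartialEq)]
struct ExportSketch {
    elements: Vec<i32>,
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

impl ExportSketch {
    /// Sum of `sizes` over the elements length (may overestimate on overlap).
    fn estimate_density(&self) -> f64 {
        if self.elements.is_empty() {
            return 1.0; // assumption: nothing to compact
        }
        let total: usize = self.sizes.iter().sum();
        total as f64 / self.elements.len() as f64
    }

    /// Copies only the referenced ranges into a fresh, contiguous buffer.
    fn rebuild(&self) -> ExportSketch {
        let total: usize = self.sizes.iter().sum();
        let mut elements = Vec::with_capacity(total);
        let mut offsets = Vec::with_capacity(self.offsets.len());
        for (&offset, &size) in self.offsets.iter().zip(&self.sizes) {
            offsets.push(elements.len());
            elements.extend_from_slice(&self.elements[offset..offset + size]);
        }
        ExportSketch { elements, offsets, sizes: self.sizes.clone() }
    }

    /// Rebuilds once at the export boundary, and only if the view looks sparse.
    fn prepare_for_export(self) -> ExportSketch {
        if self.estimate_density() < HYPOTHETICAL_DENSITY_THRESHOLD {
            self.rebuild()
        } else {
            self
        }
    }
}
```

The point of the design is that a chain of `take` and `filter` calls pays for at most one copy, at the moment the data leaves the engine, rather than one per operator.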
## API Changes
Adds `ListViewArray::estimate_density`,
`ListViewArray::compute_density`, and
`ListViewArray::compute_referenced_elements_mask`.
## Testing
- New `vortex-array/src/arrays/listview/tests/density.rs`
---------
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Co-authored-by: Matt Katz <mattkatz@Matts-MacBook-Pro.local>
Co-authored-by: Matt Katz <mattkatz@Matts-MBP.localdomain>
Co-authored-by: Matthew Katz <katz@spiraldb.com>