Commit 5310f36
feat: support IVF partitions multi-split (#6423)
## Feature
### What is the new feature?
This PR allows a single `optimize_indices` call on the v3 IVF
incremental optimize path to split multiple oversized IVF partitions in
one pass.
### Why do we need this feature?
Previously, optimize could split at most one oversized partition per
run. After large appends, several partitions can exceed the split
threshold at the same time, which forced repeated optimize cycles to
bring the index back into a healthy partition layout.
### How does it work?
- `check_partition_adjustment` now collects all split candidates from
the current snapshot and keeps the existing single-partition join
fallback.
- The multi-split path preserves existing partition ids and appends one
new partition per split partition.
- When any split happens, optimize continues to merge all existing delta
indices in the same round, preserving the existing merge semantics.
- Candidate rows from overlapping reassign partitions are resolved
globally so the same row is moved at most once, choosing the best
destination by distance.
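
The candidate collection and id assignment described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SplitPlan`, `plan_splits`, the strict `>` comparison against the threshold, and the order in which new ids are handed out are all assumptions made for the example.

```rust
/// Hypothetical plan entry (not a type from the Lance codebase): which
/// existing partition is split and which newly appended id it gains.
#[derive(Debug, PartialEq, Eq)]
pub struct SplitPlan {
    /// Existing partition id; it stays unchanged after the split.
    pub source: usize,
    /// New partition id, appended after all existing ids.
    pub new_partition: usize,
}

/// Collect every partition whose row count exceeds `split_threshold`
/// (assumed comparison) and pair it with one fresh partition id.
pub fn plan_splits(partition_sizes: &[usize], split_threshold: usize) -> Vec<SplitPlan> {
    // New ids start right after the last existing partition id.
    let mut next_id = partition_sizes.len();
    let mut plans = Vec::new();
    for (source, &size) in partition_sizes.iter().enumerate() {
        if size > split_threshold {
            plans.push(SplitPlan { source, new_partition: next_id });
            next_id += 1;
        }
    }
    plans
}
```

With sizes `[10, 500, 20, 900]` and a threshold of `100`, this sketch plans two splits in one pass: partitions 1 and 3 keep their ids, and ids 4 and 5 are appended.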
## Performance Improvement
### What is the performance issue or bottleneck?
The initial multi-split implementation removed the functional
limitation, but overlapping reassign partitions still had avoidable
overhead:
- split planning was done sequentially
- split plans retained full raw vector payloads longer than necessary
- a reused candidate partition recomputed its baseline distance to the
original centroid for every overlapping split request
### How does this PR improve performance?
- split plans are now built with bounded parallelism across compute CPUs
- split plans no longer retain raw partition vectors after producing the
original-partition assign ops
- each reused candidate partition is loaded once and computes its
baseline distance once, then reuses that result across overlapping split
requests
- best candidate moves are updated in-place per row id instead of
materializing intermediate move vectors per request
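
The in-place "best move per row id" update, with the baseline distance computed once and reused, might look like the sketch below. It is only an illustration under stated assumptions: `Move`, `update_best_moves`, and `l2_sq` are hypothetical names, squared L2 is an assumed metric, and the real implementation operates on Lance's own partition and vector types.

```rust
use std::collections::HashMap;

/// Hypothetical record of the best destination found so far for one row.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Move {
    pub dest_partition: usize,
    pub distance: f32,
}

/// Squared L2 distance (assumed metric for this sketch).
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Evaluate one candidate partition against the new centroids of a split
/// request and update `best` in place. `baseline[i]` is row i's distance to
/// its original centroid; the caller computes it once and passes the same
/// slice for every overlapping split request.
pub fn update_best_moves(
    best: &mut HashMap<u64, Move>,
    row_ids: &[u64],
    vectors: &[Vec<f32>],
    baseline: &[f32],
    new_centroids: &[(usize, Vec<f32>)],
) {
    for ((&row_id, vector), &base) in row_ids.iter().zip(vectors).zip(baseline) {
        for (dest, centroid) in new_centroids {
            let d = l2_sq(vector, centroid);
            // A row only moves if the new centroid beats its original one.
            if d >= base {
                continue;
            }
            let candidate = Move { dest_partition: *dest, distance: d };
            best.entry(row_id)
                // Keep whichever destination is closer; no per-request move list.
                .and_modify(|m| {
                    if d < m.distance {
                        *m = candidate;
                    }
                })
                .or_insert(candidate);
        }
    }
}
```

Calling `update_best_moves` once per overlapping split request with the same `best` map means each row id ends up with at most one entry, so a row is moved at most once and always to the closest destination seen.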
## Testing
- `cargo test -p lance compute_reassign_candidate_moves_vectors_to_new_centroids`
- `cargo test -p lance test_partition_split_on_append_multivec`
- `cargo test -p lance test_split_multiple_partitions_in_one_optimize`
- `cargo test -p lance test_join_partition_on_delete_multivec`

1 parent 87ef5e2 commit 5310f36
2 files changed
Lines changed: 589 additions & 293 deletions