Commit 5310f36

feat: support IVF partitions multi-split (#6423)
## Feature

### What is the new feature?

This PR allows a single `optimize_indices` call on the v3 IVF incremental optimize path to split multiple oversized IVF partitions in one pass.

### Why do we need this feature?

Previously, optimize could split at most one oversized partition per run. After large appends, several partitions can exceed the split threshold at the same time, which forced repeated optimize cycles to bring the index back into a healthy partition layout.

### How does it work?

- `check_partition_adjustment` now collects all split candidates from the current snapshot and keeps the existing single-partition join fallback.
- The multi-split path preserves existing partition ids and appends one new partition per split partition.
- When any split happens, optimize continues to merge all existing delta indices in the same round, preserving the existing merge semantics.
- Candidate rows from overlapping reassign partitions are resolved globally so the same row is moved at most once, choosing the best destination by distance.

## Performance Improvement

### What is the performance issue or bottleneck?

The initial multi-split implementation removed the functional limitation, but overlapping reassign partitions still had avoidable overhead:

- split planning was done sequentially
- split plans retained full raw vector payloads longer than necessary
- a reused candidate partition recomputed its baseline distance to the original centroid for every overlapping split request

### How does this PR improve performance?

- split plans are now built with bounded parallelism across compute CPUs
- split plans no longer retain raw partition vectors after producing the original-partition assign ops
- each reused candidate partition is loaded once and computes its baseline distance once, then reuses that result across overlapping split requests
- best candidate moves are updated in-place per row id instead of materializing intermediate move vectors per request

## Testing

- `cargo test -p lance compute_reassign_candidate_moves_vectors_to_new_centroids`
- `cargo test -p lance test_partition_split_on_append_multivec`
- `cargo test -p lance test_split_multiple_partitions_in_one_optimize`
- `cargo test -p lance test_join_partition_on_delete_multivec`
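
The global resolution of overlapping reassign candidates described above (each row moved at most once, to the closest destination, with the best move updated in place per row id) could be sketched roughly as follows. This is an illustration only, not the code from this commit: `CandidateMove`, `record_best_move`, and `resolve_moves` are hypothetical names, and the sketch assumes each split request yields a list of `(row id, destination partition, distance)` proposals.

```rust
use std::collections::HashMap;

/// Hypothetical record of one proposed move; not a type from the lance codebase.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CandidateMove {
    row_id: u64,
    dest_partition: u32,
    /// Distance from the row's vector to the destination centroid (assumed metric: smaller is better).
    distance: f32,
}

/// Update the best known move for a row in place.
/// A later candidate replaces the stored one only if it is strictly closer.
fn record_best_move(best: &mut HashMap<u64, CandidateMove>, candidate: CandidateMove) {
    best.entry(candidate.row_id)
        .and_modify(|current| {
            if candidate.distance < current.distance {
                *current = candidate;
            }
        })
        .or_insert(candidate);
}

/// Fold the candidates of every split request into one map keyed by row id,
/// so no intermediate per-request move vectors need to be kept around.
fn resolve_moves(requests: &[Vec<CandidateMove>]) -> HashMap<u64, CandidateMove> {
    let mut best = HashMap::new();
    for request in requests {
        for &candidate in request {
            record_best_move(&mut best, candidate);
        }
    }
    best
}
```

Keying the map by row id means a row that appears in several overlapping split requests ends up with exactly one entry, so it can be moved at most once. Ties keep the first candidate seen, which is an arbitrary choice made for this sketch.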
1 parent 87ef5e2 commit 5310f36

2 files changed

Lines changed: 589 additions & 293 deletions
