feat: route find-or-create merge_insert through the v2 write path#6473
Open
wombatu-kun wants to merge 1 commit intolance-format:mainfrom
Open
feat: route find-or-create merge_insert through the v2 write path#6473wombatu-kun wants to merge 1 commit intolance-format:mainfrom
wombatu-kun wants to merge 1 commit intolance-format:mainfrom
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
f98ecea to
7a823c9
Compare
Find-or-create merge_insert jobs (`WhenMatched::DoNothing` + `WhenNotMatched::InsertAll`, the default `MergeInsertBuilder` config) used to fall back to the legacy v1 `Merger` path even though every other piece of the v2 pipeline was already prepared to handle them: - `merge_insert_action` already emits `Action::Nothing` for `DoNothing` matches. - `FullSchemaMergeInsertExec::process_row_action` already handles `Action::Nothing` by dropping the row without counting it as a delete or update. - `MergeInsertPlanner::is_delete_only` already routes `DoNothing` to `FullSchemaMergeInsertExec` (only `WhenMatched::Delete` takes the delete-only exec). - The `no_upsert` eligibility check inside `can_use_create_plan` itself already accepted `DoNothing` for the partial-schema (key-only) branch. The only gap was the outer `matches!(self.params.when_matched, ...)` gate in `can_use_create_plan`, which listed `UpdateAll | UpdateIf | Fail | Delete` but was missing `DoNothing`, so the fast path was never attempted. Add `DoNothing` to that match. Consequences: - `explain_plan()` now works for find-or-create configurations (previously returned `Error::NotSupported`). - Bloom-filter conflict detection (`inserted_rows_filter`) is now populated for find-or-create operations on unenforced primary keys, because non-matching rows take the existing `Action::Insert` branch that populates the filter. v1 always returned `None` here, so two concurrent find-or-create jobs inserting the same fresh key could previously silently race; they now fail fast with `TooMuchWriteContention`. - The doc comment on `can_use_create_plan` is refreshed to list all supported `WhenMatched` modes (it had already drifted out of sync with the `Delete` addition). Tests added in `rust/lance/src/dataset/write/merge_insert.rs`: - `test_fast_path_find_or_create` — plan-shape assertion mirroring `test_fast_path_update_only`; proves a default find-or-create job now produces a v2 plan with a Right join and would fail on main. - `test_explain_plan_find_or_create` — asserts `explain_plan()` now succeeds for the default find-or-create config. - `test_concurrent_find_or_create_same_new_key` — bloom-filter conflict-detection regression mirroring `test_concurrent_insert_same_new_key` with `WhenMatched::DoNothing`. The existing `test_basic_merge` find-or-create coverage continues to pass unchanged on both `LanceFileVersion::Legacy` and `LanceFileVersion::V2_0`, satisfying the issue's "Existing DoNothing tests pass unchanged" acceptance criterion. Closes lance-format#6441 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
af49d1d to
cf1f95b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
WhenMatched::DoNothingto the outermatches!gate incan_use_create_planso find-or-create merge_insert jobs (the defaultMergeInsertBuilderconfig:DoNothing+InsertAll) take the v2FullSchemaMergeInsertExecpath instead of falling back to the legacy v1Merger. The downstream pipeline (merge_insert_action,FullSchemaMergeInsertExec::process_row_action,MergeInsertPlanner::is_delete_only) was already prepared to handleDoNothing; the outer gate was the only gap.explain_plan()now works for find-or-create configurations (previously returnedError::NotSupported), and bloom-filter conflict detection (inserted_rows_filter) is now populated for find-or-create on unenforced primary keys — two concurrent jobs inserting the same fresh key now fail fast withTooMuchWriteContentioninstead of silently racing.can_use_create_planto list all supportedWhenMatchedmodes.Closes #6441. This is the
DoNothing-shaped counterpart of #6442 (commit 46639be).Test plan
cargo fmt --allcargo clippy -p lance --tests --benches -- -D warningscargo test -p lance --lib dataset::write::merge_insert— all 127 tests pass, including the three new tests below and unchanged coverage (test_basic_merge,test_fast_path_*,test_explain_plan*,test_concurrent_*,test_when_matched_delete_*).cargo test -p lance --lib dataset::write— all 190 tests pass across merge_insert, update, and write paths.test_fast_path_find_or_create(new) — plan-shape assertion proving a default find-or-create job produces a v2 plan with a Right join. Would fail onmain.test_explain_plan_find_or_create(new) — assertsexplain_plan()now succeeds for the default find-or-create config, covering the issue'sexplain_planacceptance criterion.test_concurrent_find_or_create_same_new_key(new) — bloom-filter conflict-detection regression mirroringtest_concurrent_insert_same_new_keywithWhenMatched::DoNothing, covering the issue's Bloom-filter acceptance criterion.Acceptance criteria (from the issue)
WhenMatched::DoNothingoperations use the v2 path (verified viaexplain_planintest_explain_plan_find_or_createand via plan assertion intest_fast_path_find_or_create).DoNothingtests pass unchanged (test_basic_mergefind-or-create case still green on bothLegacyandV2_0).explain_planwithDoNothingconfig to confirm v2 is used (test_explain_plan_find_or_create).DoNothingoperations (test_concurrent_find_or_create_same_new_key).🤖 Generated with Claude Code