fix: sync structured primary key fields after metadata update#6326
fix: sync structured primary key fields after metadata update#6326wombatu-kun wants to merge 5 commits intolance-format:mainfrom
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| self.unenforced_primary_key_position = self | ||
| .metadata | ||
| .get(LANCE_UNENFORCED_PRIMARY_KEY_POSITION) | ||
| .and_then(|s| s.parse::<u32>().ok()) |
There was a problem hiding this comment.
I think this will silently swallow invalid values. Since this is publicly exposed, it should probably return a Result to propagate the parse errors. We could add a few tests the validate this.
There was a problem hiding this comment.
Fixed and added tests.
And CI failed for some other reason, i suppose.
(FAILED python/tests/test_scalar_index.py::test_create_inverted_index_progress_callback - assert 75 == 150)
a230aba to
550fc69
Compare
hamersaw
left a comment
There was a problem hiding this comment.
Thanks for the PR (and patience on review)! Overall it looks great! I think the testing is a bit excessive and not validating the fix proposed in this PR. I'm wondering if we can have one test for set + sync and then one for merge insert. IMO that would adequately cover this.
| unenforced_primary_key_position: None, | ||
| }; | ||
| result.sync_embedded_metadata()?; |
There was a problem hiding this comment.
I would prefer a refactoring where we don't create a type with None and then immediately update it, but not a blocker.
There was a problem hiding this comment.
Refactored in c4c3d9d — extracted a parse_unenforced_primary_key_position helper so TryFrom<&ArrowField> now computes the value before the struct literal, no more None + sync_embedded_metadata() update.
|
Thanks for the review! Trimmed the test suite in dataset::metadata to the two you suggested — test_update_field_metadata_sets_unenforced_primary_key (set + sync: covers both the legacy boolean flag path and explicit :position in one test) and test_update_field_metadata_primary_key_used_by_merge_insert (end-to-end via MergeInsert). Dropped the three parse-error tests. |
- Keep only "set + sync" and "merge insert" tests in dataset::metadata per review; parse-error coverage lives in lance-core field unit tests. - Extract parse_unenforced_primary_key_position helper so TryFrom<&ArrowField> computes the value before the struct literal instead of the None+update pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6bd7759 to
828b97c
Compare
Oh, it seems you don't have write access to this repo. Do you know somebody who has it? Could you tag them please to help with merging? |
Summary
Fixes #6324
UpdateConfig.fieldMetadataUpdates()correctly persistslance-schema:unenforced-primary-keyandlance-schema:unenforced-primary-key:positionin the field's metadata HashMap, but the corresponding structured fieldunenforced_primary_key_position(used for protobuf serialization andis_unenforced_primary_key()) was never updated. This caused primary key information set via metadata updates to be lost on the next dataset reload.Root cause
apply_update_map()intransaction.rsonly modifiesfield.metadata(a HashMap<String, String>) without syncing the structuredField.unenforced_primary_key_positionfield that drives protobuf serialization and theis_unenforced_primary_key()API.Fix
Field::sync_embedded_metadata()that re-parses well-known metadata keys (lance-schema:unenforced-primary-key,lance-schema:unenforced-primary-key:position) and updates the corresponding structured fields.sync_embedded_metadata()after everyapply_update_map()on field metadata inTransaction::build_manifest().TryFrom<&ArrowField>for Field to reusesync_embedded_metadata()instead of duplicating the parsing logic.