Skip to content

fix: sync structured primary key fields after metadata update#6326

Open
wombatu-kun wants to merge 5 commits intolance-format:mainfrom
wombatu-kun:fix-sync-unenforced-pk-fields-with-metadata
Open

fix: sync structured primary key fields after metadata update#6326
wombatu-kun wants to merge 5 commits intolance-format:mainfrom
wombatu-kun:fix-sync-unenforced-pk-fields-with-metadata

Conversation

@wombatu-kun
Copy link
Copy Markdown
Contributor

Summary
Fixes #6324

UpdateConfig.fieldMetadataUpdates() correctly persists lance-schema:unenforced-primary-key and lance-schema:unenforced-primary-key:position in the field's metadata HashMap, but the corresponding structured field unenforced_primary_key_position (used for protobuf serialization and is_unenforced_primary_key()) was never updated. This caused primary key information set via metadata updates to be lost on the next dataset reload.

Root cause
apply_update_map() in transaction.rs only modifies field.metadata (a HashMap<String, String>) without syncing the structured Field.unenforced_primary_key_position field that drives protobuf serialization and the is_unenforced_primary_key() API.

Fix

  • Added Field::sync_embedded_metadata() that re-parses well-known metadata keys (lance-schema:unenforced-primary-key, lance-schema:unenforced-primary-key:position) and updates the corresponding structured fields.
  • Call sync_embedded_metadata() after every apply_update_map() on field metadata in Transaction::build_manifest().
  • Refactored TryFrom<&ArrowField> for Field to reuse sync_embedded_metadata() instead of duplicating the parsing logic.

@github-actions github-actions bot added the bug Something isn't working label Mar 28, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 97.74436% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-core/src/datatypes/field.rs 93.33% 0 Missing and 2 partials ⚠️
rust/lance/src/dataset/transaction.rs 0.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Comment thread rust/lance-core/src/datatypes/field.rs Outdated
self.unenforced_primary_key_position = self
.metadata
.get(LANCE_UNENFORCED_PRIMARY_KEY_POSITION)
.and_then(|s| s.parse::<u32>().ok())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will silently swallow invalid values. Since this is publicly exposed, it should probably return a Result to propagate the parse errors. We could add a few tests the validate this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added tests.
And CI failed for some other reason, i suppose.
(FAILED python/tests/test_scalar_index.py::test_create_inverted_index_progress_callback - assert 75 == 150)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamersaw all tests pass!

@wombatu-kun wombatu-kun force-pushed the fix-sync-unenforced-pk-fields-with-metadata branch 2 times, most recently from a230aba to 550fc69 Compare April 7, 2026 04:42
@wombatu-kun wombatu-kun requested a review from hamersaw April 7, 2026 05:11
Copy link
Copy Markdown
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR (and patience on review)! Overall it looks great! I think the testing is a bit excessive and not validating the fix proposed in this PR. I'm wondering if we can have one test for set + sync and then one for merge insert. IMO that would adequately cover this.

Comment thread rust/lance-core/src/datatypes/field.rs Outdated
Comment on lines +1139 to +1141
unenforced_primary_key_position: None,
};
result.sync_embedded_metadata()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a refactoring where we don't create a type with None and then immediately update it, but not a blocker.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored in c4c3d9d — extracted a parse_unenforced_primary_key_position helper so TryFrom<&ArrowField> now computes the value before the struct literal, no more None + sync_embedded_metadata() update.

@wombatu-kun
Copy link
Copy Markdown
Contributor Author

Thanks for the review! Trimmed the test suite in dataset::metadata to the two you suggested — test_update_field_metadata_sets_unenforced_primary_key (set + sync: covers both the legacy boolean flag path and explicit :position in one test) and test_update_field_metadata_primary_key_used_by_merge_insert (end-to-end via MergeInsert). Dropped the three parse-error tests.

@wombatu-kun wombatu-kun requested a review from hamersaw April 13, 2026 04:53
Copy link
Copy Markdown
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks.

Vova Kolmakov and others added 3 commits April 18, 2026 15:30
- Keep only "set + sync" and "merge insert" tests in dataset::metadata per review; parse-error coverage lives in lance-core field unit tests.
- Extract parse_unenforced_primary_key_position helper so TryFrom<&ArrowField> computes the value before the struct literal instead of the None+update pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wombatu-kun wombatu-kun force-pushed the fix-sync-unenforced-pk-fields-with-metadata branch from 6bd7759 to 828b97c Compare April 18, 2026 08:51
@wombatu-kun
Copy link
Copy Markdown
Contributor Author

Looks great! Thanks.

Oh, it seems you don't have write access to this repo. Do you know somebody who has it? Could you tag them please to help with merging?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UpdateConfig.fieldMetadataUpdates does not set protobuf 'unenforced_primary_key' field

2 participants