…lient (#1854)
## What changes are proposed in this pull request?
Split the single UCCommitsClient trait into two focused traits:
- UCCommitClient: only commit(), used by UCCommitter
- UCGetCommitsClient: only get_commits(), used by UCCatalog

This allows consumers like the UC-Committer to depend only on the commit capability without pulling in the get_commits interface. UCCommitsRestClient and InMemoryCommitsClient implement both traits.

## How was this change tested?
Existing unit tests are sufficient for this refactoring change.
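For readers unfamiliar with the pattern, the trait split can be sketched roughly as below. The trait names come from the description above; the method signatures, the `String` error type, and the `InMemoryClient` and `commit_next` helpers are simplified assumptions for illustration, not the real API.

```rust
use std::sync::Mutex;

/// Commit-only capability (what a committer would depend on).
/// Hypothetical, simplified signature.
pub trait UCCommitClient {
    fn commit(&self, table_id: &str, version: u64) -> Result<(), String>;
}

/// Read-only capability (what a catalog would depend on).
/// Hypothetical, simplified signature.
pub trait UCGetCommitsClient {
    fn get_commits(&self, table_id: &str) -> Result<Vec<u64>, String>;
}

/// Toy in-memory client that implements both traits.
#[derive(Default)]
pub struct InMemoryClient {
    commits: Mutex<Vec<(String, u64)>>,
}

impl UCCommitClient for InMemoryClient {
    fn commit(&self, table_id: &str, version: u64) -> Result<(), String> {
        let mut commits = self.commits.lock().map_err(|e| e.to_string())?;
        // Reject a version that was already committed for this table.
        if commits.iter().any(|(t, v)| t == table_id && *v == version) {
            return Err(format!("version {version} already committed"));
        }
        commits.push((table_id.to_string(), version));
        Ok(())
    }
}

impl UCGetCommitsClient for InMemoryClient {
    fn get_commits(&self, table_id: &str) -> Result<Vec<u64>, String> {
        let commits = self.commits.lock().map_err(|e| e.to_string())?;
        Ok(commits
            .iter()
            .filter(|(t, _)| t == table_id)
            .map(|(_, v)| *v)
            .collect())
    }
}

/// A consumer that only needs to commit is generic over the narrow trait,
/// so it never sees `get_commits`.
pub fn commit_next<C: UCCommitClient>(
    client: &C,
    table_id: &str,
    version: u64,
) -> Result<(), String> {
    client.commit(table_id, version)
}
```

A committer written against `UCCommitClient` alone can be tested with any type that implements just that one method.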
## What changes are proposed in this pull request? Instead of adding the value in the loop, we push the operation down to Arrow which is much faster. ## How was this change tested?
## What changes are proposed in this pull request? Update Cargo.toml to avoid resolving broken native-tls 0.2.17 * See rust-native-tls/rust-native-tls#370 ## How was this change tested? CI passes now.
) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671) to review incremental changes. - [stack/1684-read-config](#1758) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1758/files)] - [**stack/1682-table-info**](#1759) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671)] - [stack/1681-read-spec](#1760) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359)] - [stack/1807-workload-spec-models](#1816) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1816/files/75d88fb3300b8b35ac1da5b814fd63fce6f93359..7b140ffdf8f9eb64fcb56bbb26addff61a704579)] - [stack/1793-read-metadata-runner](#1826) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1826/files/3bf783e6c5d75325c2a08f5cc2b8e42243a2c031..1f047b8b5efbbf4cd4b9f5213eb13de9177debfd)] - [stack/workload-loading](#1867) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1867/files/e6483d21687058267890110c6784692fa9cfebed..fe148876ccd411ed3814a6c37c862f6d1e3d3108)] - [stack/criterion-benchmarking](#1871) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1871/files/fe148876ccd411ed3814a6c37c862f6d1e3d3108..d1201313f4308a3d79a271231dcc82c7cbf63b50)] --------- ## What changes are proposed in this pull request? <!-- **Uncomment** this section if there are any changes affecting public APIs. Else, **delete** this section. ### This PR affects the following public APIs If there are breaking changes, please ensure the `breaking-changes` label gets added by CI, and describe why the changes are needed. Note that _new_ public APIs are not considered breaking. 
--> fix: #1682 (explained in ticket details) ## How was this change tested? Tests added to verify correct deserialization behavior of TableInfo for the following cases: - All fields present - No name - No description - Extra fields
) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359) to review incremental changes. - [stack/1684-read-config](#1758) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1758/files)] - [stack/1682-table-info](#1759) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671)] - [**stack/1681-read-spec**](#1760) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359)] - [stack/1807-workload-spec-models](#1816) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1816/files/75d88fb3300b8b35ac1da5b814fd63fce6f93359..7b140ffdf8f9eb64fcb56bbb26addff61a704579)] - [stack/1793-read-metadata-runner](#1826) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1826/files/3bf783e6c5d75325c2a08f5cc2b8e42243a2c031..1f047b8b5efbbf4cd4b9f5213eb13de9177debfd)] - [stack/workload-loading](#1867) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1867/files/e6483d21687058267890110c6784692fa9cfebed..fe148876ccd411ed3814a6c37c862f6d1e3d3108)] - [stack/criterion-benchmarking](#1871) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1871/files/fe148876ccd411ed3814a6c37c862f6d1e3d3108..d1201313f4308a3d79a271231dcc82c7cbf63b50)] --------- ## What changes are proposed in this pull request? <!-- **Uncomment** this section if there are any changes affecting public APIs. Else, **delete** this section. ### This PR affects the following public APIs If there are breaking changes, please ensure the `breaking-changes` label gets added by CI, and describe why the changes are needed. Note that _new_ public APIs are not considered breaking. 
--> fix: #1681 (explained in ticket details) ## How was this change tested? Tests added to verify correct deserialization behavior for the following cases: - Version included - Version not included - Missing type - Unsupported type - Extra fields
…ime API safety (#1842)
## What changes are proposed in this pull request?
```
Replace the CreateTableTransaction newtype wrapper with a type-state pattern using Transaction<S> generic over marker types ExistingTable and CreateTable. This provides compile-time prevention of invalid operations on create-table transactions (e.g., file removal, blind append, DV updates) instead of deferring to commit-time validation.

What's a no-op (code moves / reorganization; can skim)

The bulk of the diff is pure code movement with no logic changes:
create_table.rs → builder/create_table.rs (~858 lines): CreateTableTransactionBuilder and all its validation helpers moved to a new transaction/builder/ submodule. Zero logic changes, just a different file.
CreateTableTransaction type alias + docs moved from mod.rs → create_table.rs (re-exported, so all import paths still work).
try_new_create_table constructor moved from mod.rs → create_table.rs.
Create-table unit tests moved from mod.rs → kernel/tests/create_table.rs (integration tests).
Removed test_table_setup internal helper from kernel/src/utils.rs (no longer needed; integration tests use test_utils crate).

> Tip: Review commit-by-commit. The first commit (c30210c) contains the actual type-state refactor. The second commit (5f8f1d3) is purely file reorganization with no functional changes.

What needs detailed review

All of the actual logic changes are in kernel/src/transaction/mod.rs (first commit):
Type-state pattern on Transaction<S>: Transaction gains a PhantomData<S> marker. S is either ExistingTable (default, full API) or CreateTable (restricted API).
impl block split: Methods are now split by which transaction types they apply to:
  impl<S> Transaction<S> (shared methods): commit(), with_domain_metadata(), with_engine_info(), with_data_change(), add_files(), get_write_context(), stats_columns(), stats_schema(), all validation and action generation.
  impl Transaction, i.e. Transaction<ExistingTable> (existing-table-only): with_blind_append(), with_operation(), with_transaction_id(), with_domain_metadata_removed(), remove_files(), update_deletion_vectors(), with_committer().
Type-state flows through return types: CommitResult<S>, ConflictedTransaction<S>, and RetryableTransaction<S> are now generic, preserving the type-state marker across commit retries.
Runtime defense-in-depth checks preserved: The existing is_create_table() runtime checks in validate_blind_append_semantics(), validate_domain_metadata(), etc. are kept as defensive fallbacks. The test_validate_blind_append_rejects_create_table unit test verifies these still work by directly setting private fields.
```
Closes #1768

## How was this change tested?
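The type-state idea described above can be illustrated with a minimal sketch. The marker names `ExistingTable` and `CreateTable` and the `PhantomData<S>` field follow the description; the fields, constructors, and the `add_file`, `remove_file`, and `action_count` methods are hypothetical stand-ins, not the kernel's actual API.

```rust
use std::marker::PhantomData;

/// Marker types (names from the PR description; everything else is illustrative).
pub struct ExistingTable;
pub struct CreateTable;

/// `S` defaults to `ExistingTable`, so a bare `Transaction` means the full API.
pub struct Transaction<S = ExistingTable> {
    added_files: Vec<String>,
    removed_files: Vec<String>,
    _state: PhantomData<S>,
}

/// Shared methods: available for every transaction kind.
impl<S> Transaction<S> {
    pub fn add_file(mut self, path: &str) -> Self {
        self.added_files.push(path.to_string());
        self
    }

    pub fn action_count(&self) -> usize {
        self.added_files.len() + self.removed_files.len()
    }
}

/// Existing-table-only methods.
impl Transaction<ExistingTable> {
    pub fn for_existing_table() -> Self {
        Transaction { added_files: Vec::new(), removed_files: Vec::new(), _state: PhantomData }
    }

    pub fn remove_file(mut self, path: &str) -> Self {
        self.removed_files.push(path.to_string());
        self
    }
}

/// Create-table transactions get a constructor but no `remove_file`.
impl Transaction<CreateTable> {
    pub fn for_create_table() -> Self {
        Transaction { added_files: Vec::new(), removed_files: Vec::new(), _state: PhantomData }
    }
}
```

With this shape, `Transaction::<CreateTable>::for_create_table().remove_file("x")` fails to compile because `remove_file` only exists on `Transaction<ExistingTable>`, which is the compile-time guarantee the PR is after.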
## What changes are proposed in this pull request?
There's a semantic gap in strum's `EnumString`: the parsing API it derives is always fallible (impls of `FromStr` and `TryFrom<&str>`, both with strum's parse error), but enums with a default variant have infallible parsing (because any unknown value goes into the default variant).

Define and use a new trait, `IntoTableFeature`, which mimics `Into`. We can't "just" `impl From<&str> for TableFeature` because the blanket `impl<T, U: Into<T>> TryFrom<U> for T` conflicts with the `impl TryFrom<&str> for TableFeature` that strum emits.

Result: Code that works with table features no longer needs to parse+unwrap, and there's now a narrow waist to update some day if/when this gets fixed upstream (just remove the extension trait and `impl From<&str> for TableFeature` instead).

See also
* Peternator7/strum#430
* Peternator7/strum#432

## How was this change tested?
Updated unit tests.
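A rough sketch of the extension-trait workaround follows. To stay self-contained it hand-writes the fallible `FromStr` impl that strum would normally derive; the variant set and the `into_table_feature` method name are assumptions for illustration only.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TableFeature {
    DeletionVectors,
    ColumnMapping,
    /// Default variant: absorbs any unrecognized feature name.
    Unknown(String),
}

/// Stand-in for the fallible parsing a derive would generate. An error type
/// exists in the signature, but with a default variant it is never produced.
impl FromStr for TableFeature {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "deletionVectors" => TableFeature::DeletionVectors,
            "columnMapping" => TableFeature::ColumnMapping,
            other => TableFeature::Unknown(other.to_string()),
        })
    }
}

/// Extension trait mimicking `Into<TableFeature>` (method name is hypothetical).
pub trait IntoTableFeature {
    fn into_table_feature(self) -> TableFeature;
}

impl IntoTableFeature for &str {
    fn into_table_feature(self) -> TableFeature {
        // Infallible in practice; the fallback keeps the signature honest
        // without an `unwrap` at every call site.
        self.parse::<TableFeature>()
            .unwrap_or_else(|_| TableFeature::Unknown(self.to_string()))
    }
}
```

Call sites then read `"deletionVectors".into_table_feature()` instead of parsing and unwrapping.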
## What changes are proposed in this pull request? A bunch of feature-related unit tests use illegal (bogus or incomplete) protocol and metadata combinations that kernel isn't smart enough to detect and reject yet. Fix them. ## How was this change tested? Test-only change.
## Summary This PR adds nextest support to the miri test job in CI, providing consistency with other test jobs and better test output. ## Changes - Install nextest action in miri job - Switch from `cargo miri test` to `cargo miri nextest run` - Maintains same feature flags (`default-engine-rustls`) and environment variables (`MIRIFLAGS=-Zmiri-disable-isolation`) ## Benefits - Consistent test runner across all CI jobs - Better test output and reporting - Improved test execution insights ## Testing The changes will be validated when CI runs on this PR. Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: chiinlquah <chiinlquah@gmail.com>
## What changes are proposed in this pull request? Unpin the Miri nightly toolchain version that was pinned in #1845. The upstream issue (rust-lang/miri#4855) has been fixed, so we no longer need to pin to `nightly-2026-02-05`. ## How was this change tested? CI should pass with the unpinned nightly toolchain.
## What changes are proposed in this pull request?
This extends visitors so they can visit REE (run-end encoded) columns. The initial need is to visit the `_filename` metadata column, which is string-only. I added support for all primitive types to avoid possible future surprises (one place of extension could be to use REE with literal transform expressions).

## How was this change tested?
Added unit tests.

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
…commit (#1811) Co-authored-by: Louis Shawn <louis.shawn@qq.com> Co-authored-by: Drake Lin <drakelin18@gmail.com> Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
Co-authored-by: Louis Shawn <louis.shawn@qq.com> Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1836/files) to review incremental changes.
- [**stack/use-physical-stats**](#1836) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1836/files)]
- [stack/write-ctx-cm](#1837) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1837/files/b168896d4907e4c8ceffa0a528c5faa59d8e2ea4..30afa922e9716d016acdfb002eb71477d66e3143)]
- [stack/correct-transform](#1862) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files/30afa922e9716d016acdfb002eb71477d66e3143..cca68bd1278db0b4df42e4e6c309a784c918a268)]
- [stack/cm-partition](#1870) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/cca68bd1278db0b4df42e4e6c309a784c918a268..68e8407a7599dd006d540a8c8e3e6bcbd02341bd)]
- [stack/support-cm-write](#1863) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/68e8407a7599dd006d540a8c8e3e6bcbd02341bd..dfcb7beb7f39a50d0ba6439e27b161030b1794f5)]
- [stack/support-CM-with-flag](#1910) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/cca68bd1278db0b4df42e4e6c309a784c918a268..eb1c6ce93ac2e97825e0d60be387a81fded683c6)]

---------

## What changes are proposed in this pull request?
Previously, when generating `WriteContext` in `Transaction`, we used logical column names for stats columns. Because `WriteContext` is used for writes, it should use physical names. This PR switches it to physical column names.

## How was this change tested?
Added tests.
…ng and `materializePartitionColumns` (#1837)
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1837/files) to review incremental changes.
- [**stack/write-ctx-cm**](#1837) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1837/files)]
- [stack/correct-transform](#1862) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files/105c22d3f8eb7ddf273eaf94666e58fbd7079d7f..9d2156a008912b9578309c20919e4f9c56951e98)]
- [stack/cm-partition](#1870) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/9d2156a008912b9578309c20919e4f9c56951e98..0879a373beac7290a93e2d5fe008c58bac31f238)]
- [stack/support-cm-write](#1863) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/0879a373beac7290a93e2d5fe008c58bac31f238..8c19a6d03ef12ebaba35eaa12dfc0ac3f8e0ebd9)]
- [stack/support-CM-with-flag](#1910) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/9d2156a008912b9578309c20919e4f9c56951e98..b591037ece8a8cf30e25f7fb5a0ff609d824baf0)]

---------

## What changes are proposed in this pull request?
Previously, when generating `WriteContext` in Transaction, the `physical_schema` field didn't account for column mapping and `materializePartitionColumns`. This PR generates the `physical_schema` with respect to column mapping and `materializePartitionColumns`.

## How was this change tested?
Added tests.
## What changes are proposed in this pull request?
Move `Protocol::is_catalog_managed` to `TableConfiguration::is_catalog_managed`, alongside other similar methods.

Many call sites also invoke `table_configuration.protocol().partition_columns()`. Add and use `TableConfiguration::partition_columns` instead, for simpler code. While we're at it, make the method return `&[String]` instead of `&Vec<String>` (more idiomatic).

Both changes reinforce `TableConfiguration` as the preferred way to interact with a table's metadata, rather than dipping into raw `Protocol` or `Metadata`.

## How was this change tested?
Compilation suffices.
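As a small illustration of the `&[String]` versus `&Vec<String>` point, the sketch below uses a toy `TableConfiguration` with a single field; the struct layout, the `new` constructor, and the `is_partitioned_by` helper are assumptions, not the kernel's definitions.

```rust
/// Toy stand-in for the real type; only the accessor shape matters here.
pub struct TableConfiguration {
    partition_columns: Vec<String>,
}

impl TableConfiguration {
    pub fn new(partition_columns: Vec<String>) -> Self {
        Self { partition_columns }
    }

    /// Returning a slice hides the backing container from callers;
    /// `&Vec<String>` coerces to `&[String]` automatically.
    pub fn partition_columns(&self) -> &[String] {
        &self.partition_columns
    }
}

/// Callers that accept `&[String]` work with a Vec, an array, or a sub-slice.
pub fn is_partitioned_by(columns: &[String], name: &str) -> bool {
    columns.iter().any(|c| c == name)
}
```

Callers then write `is_partitioned_by(config.partition_columns(), "date")` without caring how the columns are stored.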
## What changes are proposed in this pull request?
Uses [install-and-cache-homebrew-tools](https://github.com/marketplace/actions/install-and-cache-homebrew-tools) on OSX. Note that this removes an explicit error if we somehow run on something other than Linux or OSX (which shouldn't happen, since we specify that this should only run there). Regardless, if arrow-glib is not installed, the tests themselves will fail to compile, so we'll know something went wrong.

## How was this change tested?
Checked the CI runs. After one run it hits the cache and greatly reduces runtime:
<img width="2601" height="650" alt="image" src="https://github.com/user-attachments/assets/878be5a1-c0f8-4d06-8f9c-7572f0460282" />
[cache miss run](https://github.com/delta-io/delta-kernel-rs/actions/runs/22208011288/job/64236127126?pr=1909)
[cache hit run](https://github.com/delta-io/delta-kernel-rs/actions/runs/22208205504/job/64236686596?pr=1909)
## What changes are proposed in this pull request? This PR improves memory allocation efficiency by pre-allocating `Vec` and `HashSet` collections when the size is known or can be reasonably estimated. This avoids repeated reallocations as collections grow. ## How was this change tested? - Existing test suite passes - Changes are allocation-only optimizations with no behavioral changes --------- Co-authored-by: emkornfield <emkornfield@gmail.com>
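The kind of change described here looks roughly like the sketch below, which sizes both collections up front from a known upper bound. The `unique_paths` function is a made-up example, not code from the PR.

```rust
use std::collections::HashSet;

/// Deduplicate paths while preserving first-seen order.
/// Both collections are sized once, using the input length as an upper bound,
/// so they never reallocate while being filled.
pub fn unique_paths(paths: &[&str]) -> (Vec<String>, HashSet<String>) {
    let mut ordered = Vec::with_capacity(paths.len());
    let mut seen = HashSet::with_capacity(paths.len());
    for &path in paths {
        // `insert` returns true only the first time a value is seen.
        if seen.insert(path.to_string()) {
            ordered.push(path.to_string());
        }
    }
    (ordered, seen)
}
```

When the input length overestimates the final size, the trade-off is some unused capacity in exchange for skipping the grow-and-copy steps.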
…1907)
## What changes are proposed in this pull request?
Fold `Protocol::validate_table_features` into `Protocol::try_new` and update all tests to actually use the constructor. Also define convenience constructors `Protocol::try_new_legacy` and `Protocol::try_new_modern` and use them where appropriate.

## How was this change tested?
Mostly a test change. Prod call sites verified by compiler and unit tests.

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
…es (#1901)
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1901/files) to review incremental changes.
- [**stack/stats-validation-types**](#1901) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1901/files)]
- [stack/getdata-then-simplify](#1918) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1918/files/9318ff760c960332e35048f9e62d4d2d6639b022..ce4221d92b355f983f3eed3607a066f90647b706)]
- [stack/stats-validation](#1667) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1667/files/ce4221d92b355f983f3eed3607a066f90647b706..54aa9ea9d212ca263bbc41c171b9f5efae878c5c)]

---------

## What changes are proposed in this pull request?
Implement `GetData` for types that didn't have it.

## How was this change tested?
Added unit tests for the types that didn't have any.
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1918/files) to review incremental changes. - [**stack/getdata-then-simplify**](#1918) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1918/files)] - [stack/stats-validation](#1667) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1667/files/301f01205582cff6560384ef49a042872b66418d..2529aea0f508e1511222fee83bb401ecf5de6000)] --------- ## What changes are proposed in this pull request? quick followup to previous pr ## How was this change tested? Existing unit tests
## What changes are proposed in this pull request? Reader protocol 3 and writer protocol 7 are special, because they introduce the concept of table features (which require feature lists where lower protocols forbid them). But they are _also_ special because they're currently the highest protocol version numbers that kernel supports. Replace a bunch of magic values in the code with proper constants to make clearer what's going on. If/when the Delta spec bumps the max protocol version number, we'll hopefully have fewer ambiguous `3` and `7` values that need careful auditing. ## How was this change tested? Mechanical replacement of literals with same-valued constants. Compilation and unit tests cover it.
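A sketch of what naming these values might look like is shown below. The constant and function names are hypothetical; only the values 3 and 7 and their two roles come from the description above.

```rust
/// Lowest protocol versions at which table-feature lists are used.
/// (Names are illustrative, not the kernel's.)
pub const TABLE_FEATURES_MIN_READER_VERSION: i32 = 3;
pub const TABLE_FEATURES_MIN_WRITER_VERSION: i32 = 7;

/// Highest protocol versions supported. Today these coincide with the
/// constants above, but they answer a different question and may diverge.
pub const MAX_VALID_READER_VERSION: i32 = 3;
pub const MAX_VALID_WRITER_VERSION: i32 = 7;

/// Does this reader version require an explicit reader-feature list?
pub fn reader_uses_feature_list(min_reader_version: i32) -> bool {
    min_reader_version >= TABLE_FEATURES_MIN_READER_VERSION
}

/// Does this writer version require an explicit writer-feature list?
pub fn writer_uses_feature_list(min_writer_version: i32) -> bool {
    min_writer_version >= TABLE_FEATURES_MIN_WRITER_VERSION
}

/// Are both versions within the supported range?
pub fn is_supported(min_reader_version: i32, min_writer_version: i32) -> bool {
    (1..=MAX_VALID_READER_VERSION).contains(&min_reader_version)
        && (1..=MAX_VALID_WRITER_VERSION).contains(&min_writer_version)
}
```

Keeping two pairs of constants, even though they are equal today, is what makes a future max-version bump a one-line change rather than an audit of every `3` and `7`.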
…1818) ## What changes are proposed in this pull request? Before this change, there was an assumption that columns would be extracted in the order they appear in the schema. This is an easy prerequisite to mess up and this code is typically not on the hot path, so we can afford to be more robust. According to the [trait docs](https://github.com/delta-io/delta-kernel-rs/blob/main/kernel/src/engine_data.rs#L294): ``` /// The names and types of leaf fields this visitor accesses. The `EngineData` being visited /// validates these types when extracting column getters, and [`RowVisitor::visit`] will receive /// one getter for each selected field, in the requested order. ``` ## How was this change tested? Added additional unit test to demonstrate the issue.
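The "requested order" contract quoted above can be illustrated with a tiny name-to-index resolver. This is a toy model over plain string slices, with a hypothetical `resolve_getters` function; the real code works on `EngineData` and typed getters.

```rust
/// For each requested column, find its index in the schema.
/// The output follows the *requested* order, not the schema order,
/// and a missing column is reported as an error.
pub fn resolve_getters(schema: &[&str], requested: &[&str]) -> Result<Vec<usize>, String> {
    requested
        .iter()
        .map(|name| {
            schema
                .iter()
                .position(|col| col == name)
                .ok_or_else(|| format!("column `{name}` not found in schema"))
        })
        .collect()
}
```

Looking each name up explicitly costs a little more than a single pass over the schema, which matches the PR's reasoning that this code is typically not on the hot path.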
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1862/files) to review incremental changes.
- [**stack/correct-transform**](#1862) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files)]
- [stack/cm-partition](#1870) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/6b36ecc5022a5be295d128fe6b66192b74d39614..76691de21b81671453a8e52a1fd58b58fdcbf1fe)]
- [stack/support-cm-write](#1863) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/76691de21b81671453a8e52a1fd58b58fdcbf1fe..1e938eb9de2f5f8a52943ae794984585867f31a5)]
- [stack/support-CM-with-flag](#1910) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/6b36ecc5022a5be295d128fe6b66192b74d39614..7e184b835db7ac1142cb878524561847007dc297)]

---------

## What changes are proposed in this pull request?
Add `with_dropped_field_if_exists` to `Transform` for optionally dropping fields that may not exist in the input, and use it in `generate_logical_to_physical` to drop partition columns without erroring when the input data doesn't contain them.

`WriteContext::logical_to_physical` now correctly renames nested struct fields to their physical names under column mapping.

## How was this change tested?
Added tests to verify that the renaming succeeds and that `with_dropped_field_if_exists` works as expected.
…code refactor. (#1872)
## What changes are proposed in this pull request?
Closes #1657

Currently, there is no schema modification method for inserting, removing, or replacing a field in a schema. Schema modification is left to the caller to construct, which creates boilerplate code across the codebase.

This PR introduces schema modification methods for inserting, removing, and replacing a field in a schema. In addition, these methods are integrated into stats_transform to reduce boilerplate code when creating schemas.

Changes included:
- New StructType methods for schema modification:
  - `pub fn with_field_inserted_after(mut self, after: Option<&str>, new_field: StructField) -> Self`
  - `pub fn with_field_removed(mut self, name: &str) -> Self`
  - `pub fn with_field_replaced(mut self, name: &str, new_field: StructField)`
  - `pub fn with_field_inserted_before(mut self, before: Option<&str>, new_field: StructField) -> Self`
- Unit tests for StructType to verify the correct behaviour of the schema modification methods: `schema::tests::test_with_field_`
- Updated the stats_transform.rs functions `transform_add_schema`, `build_add_output_schema`, and `add_stats_parsed_to_add_schema`

## How was this change tested?
- Unit tests for the schema modification methods: `cargo test --package delta_kernel --lib -- schema::tests::test_with_field_`
- Unit tests for stats_transform: `cargo test --package delta_kernel --lib -- checkpoint::stats_transform::test`

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
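To show how builder-style schema modification reads at a call site, here is a toy model. The method names mirror the list above, but `StructField` and `StructType` are simplified stand-ins, and the behaviour for `None` or an unknown anchor name (append at the end, or leave unchanged) is an assumption rather than the kernel's documented semantics.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct StructField {
    pub name: String,
    pub data_type: String,
}

impl StructField {
    pub fn new(name: &str, data_type: &str) -> Self {
        Self { name: name.to_string(), data_type: data_type.to_string() }
    }
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct StructType {
    fields: Vec<StructField>,
}

impl StructType {
    pub fn new(fields: Vec<StructField>) -> Self {
        Self { fields }
    }

    pub fn field_names(&self) -> Vec<&str> {
        self.fields.iter().map(|f| f.name.as_str()).collect()
    }

    /// Assumption: `None` or an unknown anchor appends at the end.
    pub fn with_field_inserted_after(mut self, after: Option<&str>, new_field: StructField) -> Self {
        let index = after
            .and_then(|name| self.fields.iter().position(|f| f.name == name))
            .map_or(self.fields.len(), |i| i + 1);
        self.fields.insert(index, new_field);
        self
    }

    /// Assumption: removing an unknown field is a no-op.
    pub fn with_field_removed(mut self, name: &str) -> Self {
        self.fields.retain(|f| f.name != name);
        self
    }

    /// Assumption: replacing an unknown field is a no-op.
    pub fn with_field_replaced(mut self, name: &str, new_field: StructField) -> Self {
        if let Some(slot) = self.fields.iter_mut().find(|f| f.name == name) {
            *slot = new_field;
        }
        self
    }
}
```

Because each method consumes and returns `self`, edits chain without intermediate clones, which is where the boilerplate reduction comes from.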
## What changes are proposed in this pull request?
This PR adds documentation for new tags added to the workloads tarball.

Current plan:
- If users want to fully understand the tables that exist and/or test their features with new tables quickly, they should run benchmarking locally.
- We've added more tags for CI to be more useful in the short term; the README explains generally what each tag is used for (along with the names of the tables that currently have those tags), and users can decide for themselves which tags are relevant for their PRs. After initially running benchmarking with these new tags, users who want more detail on which tables were run can run benchmarking locally to inspect the tables more closely.
- I opted not to add any more detail in the README about the specific tables that the tags correspond to (beyond the table names) because at that point it would just be reiterating the table info. This can also get out of date from the tarball quickly.

When these new tags are approved, the tarball will be updated.

## How was this change tested?
Tarball sent to @OussamaSaoudi; it can also be sent to anyone else who wants to inspect the tables (but the README changes describe which tables have the updated tags).
## What changes are proposed in this pull request? Added a line mentioning that the timing in CI isn't fully accurate and there isn't a quick fix for that, but benchmarking is still useful to see performance changes - see this ticket: #2301 ## How was this change tested?
…n columns (#2142)
## What changes are proposed in this pull request?
Previously, we computed the physical schema without partition columns repeatedly. This PR centralizes that computation in `TableConfiguration`.

## How was this change tested?
Existing tests.
## What changes are proposed in this pull request?
Currently, there is no method to read a Snapshot's timestamp according to the Delta protocol. This PR adds a method that gets the Snapshot's timestamp based on the in-commit timestamp (ICT). If the ICT table property is enabled on the table, the Snapshot's timestamp is extracted from the in-commit timestamp via `snapshot::get_in_commit_timestamp`. If the ICT property is disabled, the snapshot reads the latest file modification timestamp from the commit log.

Here are the reference implementations from the Delta protocol and the Java kernel:
- Delta protocol: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#recommendations-for-readers-of-tables-with-in-commit-timestamps
- Java kernel reference implementation: https://github.com/delta-io/delta/blob/fc7d92cf740c96a955537a7b262ce578ce47c40e/kernel/kernel-api/src/main/java/io/delta/kernel/internal/SnapshotImpl.java#L172

In the future, `Snapshot::get_timestamp` will be exposed as an FFI function, allowing engines to use it.

## How was this change tested?
- New unit test was added.
- `cargo test --package delta_kernel --lib --all-features -- snapshot::tests::test_get_timestamp`
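The selection logic described above boils down to a two-way branch, sketched here under simplifying assumptions: `CommitInfo` is a made-up struct holding just the two candidate timestamps, and `snapshot_timestamp` is a hypothetical free function rather than the actual `Snapshot` method.

```rust
/// Simplified view of the latest commit: the in-commit timestamp (if recorded)
/// and the commit file's modification time, both in epoch milliseconds.
pub struct CommitInfo {
    pub in_commit_timestamp: Option<i64>,
    pub file_modification_time: i64,
}

/// Pick the snapshot timestamp:
/// - ICT enabled: use the in-commit timestamp, and treat its absence as an error.
/// - ICT disabled: fall back to the commit file's modification time.
pub fn snapshot_timestamp(ict_enabled: bool, latest: &CommitInfo) -> Result<i64, String> {
    if ict_enabled {
        latest
            .in_commit_timestamp
            .ok_or_else(|| "ICT is enabled but the commit has no in-commit timestamp".to_string())
    } else {
        Ok(latest.file_modification_time)
    }
}
```

Treating a missing in-commit timestamp as an error (rather than silently falling back) is a choice made for this sketch, not something the description states.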
…2281) ## What changes are proposed in this pull request? - `remove.tags`, `remove.partitionValues`, `checkpointMetadata.tags`, `cdc.tags`, and `sidecar.tags` all allow null values in map. These null values will be ignored when read by kernel. - Extend the `#[allow_null_container_values]` derive macro attribute to support `Option<HashMap<...>>` fields The Delta protocol allows null values in partitionValues maps (a null partition value means the partition column is null for that file) and in tags maps. The Add action schema already handled this correctly, but Remove and other actions did not, causing Arrow validation errors when reading actions with null map values. ## How was this change tested? - New parameterized test read_actions_with_null_map_values covers null map values across all action types (remove, add, cdc, sidecar, checkpointMetadata). - NOTE: commitInfo.operationParameters and metadata.configuration are left as future work. - verified `#[allow_null_container_values]` works on Option<HashMap<...>> and fails on other non-hashmap Optionals. - Existing schema tests updated to expect nullable map values
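The "null values are ignored" behaviour can be pictured as a filter over nullable map entries. The sketch below models the raw map as a list of `(key, Option<value>)` pairs; `drop_null_values` is a hypothetical helper, and the real change operates on Arrow map columns through the derive macro.

```rust
use std::collections::HashMap;

/// Build a non-nullable map from entries whose values may be null,
/// dropping every entry with a null value.
pub fn drop_null_values(raw: Vec<(String, Option<String>)>) -> HashMap<String, String> {
    raw.into_iter()
        .filter_map(|(key, value)| value.map(|v| (key, v)))
        .collect()
}
```

For `partitionValues`, a dropped key then simply reads as "no value", consistent with a null partition value meaning the column is null for that file.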
## What changes are proposed in this pull request? Add support for benchmarking remote Delta tables via Unity Catalog credential vending and direct S3 access. - Add `CatalogInfo` to `TableInfo` for UC-managed tables. This is mutually exclusive with table_path. - `resolve_snapshot_strategy` handles UC credential vending, S3 engine setup from env vars, or local filesystem -- returning an engine + `SnapshotStrategy` pair used by both runners. The SnapshotStrategy enum determines how snapshots are loaded: - Standard: uses `Snapshot::builder_for` (local, S3, or UC non-catalog-managed tables) - CatalogManaged: uses `UCKernelClient::load_snapshot` (This is detected via `delta.feature.catalogManaged` in UC's get_table response properties) - Add `KERNEL_BENCH_WORKLOAD_DIR` env var to load workloads from an external directory (for remote tables without local data). - Runners now accept Arc<Runtime> instead of Arc<dyn Engine>, constructing per-table engines internally. ## How was this change tested? - Unit tests for CatalogInfo deserialization, TimeTravel::as_version, unsupported URL schemes, and snapshot construction with time travel. - existing workloads are unaffected. - Validated end-to-end on a live UC catalog-managed table confirming SnapshotStrategy::CatalogManaged is correctly selected and UCKernelClient::load_snapshot succeeds.
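The strategy selection can be sketched as a small enum plus a detection function. The enum name, its two variants, and the property key come from the description above; `select_strategy`, its signature, and checking only for key presence (ignoring the value) are assumptions.

```rust
use std::collections::HashMap;

/// How a snapshot should be loaded for a benchmarked table.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotStrategy {
    /// Local, S3, or UC tables that are not catalog-managed.
    Standard,
    /// UC catalog-managed tables.
    CatalogManaged,
}

/// Property key named in the PR description.
const CATALOG_MANAGED_KEY: &str = "delta.feature.catalogManaged";

/// `uc_properties` is `None` for tables that are not resolved through UC.
/// Assumption: the presence of the key is enough to select `CatalogManaged`.
pub fn select_strategy(uc_properties: Option<&HashMap<String, String>>) -> SnapshotStrategy {
    match uc_properties {
        Some(props) if props.contains_key(CATALOG_MANAGED_KEY) => SnapshotStrategy::CatalogManaged,
        _ => SnapshotStrategy::Standard,
    }
}
```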
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2293/files) to review incremental changes. - [**stack/kernel-catalog-managed-create**](#2293) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)] - [stack/ccv2-create](#2203) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)] - [stack/ccv2-create-pt2](#2247) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)] - [stack/create-table-utils-pt3](#2250) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)] - [stack/create-table-utils-pt4](#2254) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)] --------- ## What changes are proposed in this pull request? Enables creating catalog-managed Delta tables by adding `CatalogManaged` to `ALLOWED_DELTA_FEATURES` in the `create_table` builder. This allows connectors to pass `delta.feature.catalogManaged = "supported"` as a table property during table creation. The feature is gated behind the `catalog-managed` feature flag. ## How was this change tested? `test_catalog_managed_feature_signal_accepted` - verifies the feature signal is accepted, removed from properties, and added to both reader and writer features
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2) to review incremental changes.
- [stack/kernel-catalog-managed-create](#2293) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)]
- [**stack/ccv2-create**](#2203) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)]
- [stack/ccv2-create-pt2](#2247) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)]
- [stack/create-table-utils-pt3](#2250) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)]
- [stack/create-table-utils-pt4](#2254) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)]

---------

## What changes are proposed in this pull request?
Adds a `create_utils` module to `delta-kernel-unity-catalog` with two public functions for building the required table properties during UC catalog-managed table creation:
1. `get_required_properties_for_disk(uc_table_id)` -- returns properties that must be written to disk in `000.json` (catalogManaged, vacuumProtocolCheck, tableId, ICT)
2. `get_final_required_properties_for_uc(snapshot, engine)` -- extracts post-commit properties to send to UC (protocol versions, feature signals, metadata config, clustering columns, ICT timestamp)

Also adds public getters on `Snapshot` for protocol/metadata inspection: `min_reader_version`, `min_writer_version`, `reader_features`, `writer_features`, `metadata_configuration`, and `get_clustering_columns`.

Other changes:
- Adds `physical_to_logical_column_name` to `column_mapping.rs` for converting physical clustering column names back to logical names
- Adds `CatalogManaged` to `ALLOWED_DELTA_FEATURES` in `create_table` builder
- Extracts shared UC constants to `constants.rs` module

## How was this change tested?
1. `test_get_required_properties_for_disk` - verifies all 4 disk properties
2. `test_get_final_required_properties_for_uc` - round-trip test: creates table, loads snapshot, extracts UC properties, verifies protocol versions, feature signals, metadata config, ICT timestamp, and version
3. `test_get_final_required_properties_for_uc_with_clustering` - same with clustering columns, verifies JSON serialization
4. `test_public_protocol_getters` - tests Snapshot protocol getters against fixture
5. `test_metadata_configuration` - tests Snapshot metadata config getter
6. `test_catalog_managed_feature_signal_accepted` - verifies CatalogManaged in ALLOWED_DELTA_FEATURES
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a) to review incremental changes. - [stack/kernel-catalog-managed-create](#2293) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)] - [stack/ccv2-create](#2203) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)] - [**stack/ccv2-create-pt2**](#2247) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)] - [stack/create-table-utils-pt3](#2250) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)] - [stack/create-table-utils-pt4](#2254) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)] --------- ## What changes are proposed in this pull request? UCCommitter previously rejected version 0 commits with an unsupported error. This adds support for table creation by writing the version 0 commit file directly to the published path, bypassing the staged commit + UC ratify flow (which requires the table to already exist in UC). The connector is responsible for finalizing the table via the UC create table API after the commit succeeds. ## How was this change tested? 1. `commit_version_0_writes_published_commit`: verifies the commit file is written to the published path and exists on disk 2. `commit_version_0_conflict_when_file_exists`: verifies conflict response when the version 0 file already exists 3. existing tests pass
…uator (#2160) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2160/files) to review incremental changes. - [**stack/skip-name-based-val**](#2160) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2160/files)] --------- ## What changes are proposed in this pull request? When reading `stats_parsed` from checkpoints with column mapping enabled, struct column data has physical names but the expected output schema has logical names. The `validate_array_type` check in `evaluate_expression` does name-based matching via `ensure_data_types`, which rejects the structurally correct data due to the name mismatch. Skip name-based validation for `Column` expressions with `Struct` result type. The downstream ordinal-based `apply_schema` transformation already validates types and handles renaming. ## How was this change tested? New test
…2314)

## What changes are proposed in this pull request?
Fixes #2263.

This PR updates the `checkpoint` method to return a tuple of `(CheckpointWriteResult, SnapshotRef)` instead of just `SnapshotRef`, so callers can detect whether a checkpoint was newly written or already existed. Previously, an existing checkpoint was silently overwritten; with this change, the method exits early instead of rewriting it.

### This PR affects the following public APIs
This is a breaking change to the `Snapshot.checkpoint` API because its return type changes.

## How was this change tested?
Integration test in `kernel/tests/maintenance_ops.rs`.
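To make the new return shape concrete, here is a minimal sketch of a caller-visible `checkpoint` that exits early when a checkpoint already exists. It does not use the kernel crate: `CheckpointStore`, the stand-in `Snapshot`, and the variant names `Written` and `AlreadyExists` are assumptions for illustration, since the PR text only names the `CheckpointWriteResult` type and the tuple.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for the kernel's `CheckpointWriteResult`.
/// The variant names are assumed, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckpointWriteResult {
    Written,
    AlreadyExists,
}

/// Minimal stand-in for a snapshot; only the version matters here.
#[derive(Debug)]
pub struct Snapshot {
    pub version: u64,
}

pub type SnapshotRef = Arc<Snapshot>;

/// Toy store (hypothetical) that remembers which versions have a checkpoint.
#[derive(Default)]
pub struct CheckpointStore {
    checkpointed: HashSet<u64>,
}

impl CheckpointStore {
    /// Reports whether the checkpoint was newly written, and never
    /// rewrites one that already exists.
    pub fn checkpoint(&mut self, snapshot: &SnapshotRef) -> (CheckpointWriteResult, SnapshotRef) {
        // `HashSet::insert` returns false when the value was already present.
        let result = if self.checkpointed.insert(snapshot.version) {
            CheckpointWriteResult::Written
        } else {
            CheckpointWriteResult::AlreadyExists
        };
        (result, Arc::clone(snapshot))
    }
}
```

A caller can match on the first tuple element to skip follow-up work, such as logging or cleanup, when nothing new was written.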
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2270/files) to review incremental changes.
- [**stack/cache-output-schema**](#2270) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2270/files)]
- [stack/sidecar-iter](#2271) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2271/files/36ae1abfe90c9c15aec5e24208c8f2454440fd7b..80094b977ef0472f671b07a58ba32e4f1e77c558)]
- [stack/engine-update](#2287) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2287/files/80094b977ef0472f671b07a58ba32e4f1e77c558..1f66fc9dcc47fc34f98a33e1512fbeb0115c6b8f)]
- [stack/extract-schemas](#2313) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2313/files/1f66fc9dcc47fc34f98a33e1512fbeb0115c6b8f..fecb21e445e8312c48c6a6cfd91c38b86e55d57d)]
- [stack/support-sidecar](#2304) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2304/files/fecb21e445e8312c48c6a6cfd91c38b86e55d57d..4a23712205cec1467354acc986ee641a25ebbcb8)]

---------

## What changes are proposed in this pull request?
Caches the checkpoint output schema in `CheckpointWriter` and adds utilities to support this. Follow-up PRs will use the cached output schema to perform v2 checkpoint sidecar writes.

## How was this change tested?
Existing tests plus newly added tests.
…t reuse (#2255)

# Summary
This PR refactors the snapshot FFI to use a builder pattern, while also adding an optimization option that allows passing in an old snapshot.

## Changes
The existing flat functions are augmented by a builder-pattern API:

**Existing:**
``` c++
// snapshot at latest version
snapshot(table_path_slice, engine)
// snapshot at specific version
snapshot_at_version(table_path_slice, engine, version)
// with log tail (catalog-managed)
snapshot_with_log_tail(table_path_slice, engine, log_tail)
snapshot_at_version_with_log_tail(table_path_slice, engine, version, log_tail)
```

**Builder:**
``` c++
// fresh snapshot from table path
HandleMutableFfiSnapshotBuilder builder = get_snapshot_builder(table_path_slice, engine).ok;
// incremental update reusing an existing snapshot (avoids re-reading the log)
HandleMutableFfiSnapshotBuilder builder = get_snapshot_builder_from(old_snapshot, engine).ok;
// optional: pin a version
snapshot_builder_set_version(&builder, version);
// optional: set log tail (catalog-managed feature)
snapshot_builder_set_log_tail(&builder, log_tail);
// produce the snapshot
SharedSnapshot* snapshot = snapshot_builder_build(builder).ok;
// or discard without building
free_snapshot_builder(&builder);
```

For backwards compatibility, `snapshot()` and `snapshot_with_log_tail()` functions are kept around for now.

Also: fix existing `test_snapshot_log_tail` test.

---------

Co-authored-by: Sam Ansmink <samansmink@hotmail.com>
Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
Co-authored-by: chiinlquah <259692855+chiinlquah@users.noreply.github.com>
…ching (#2250)

## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2250/files) to review incremental changes.
- [**stack/create-table-utils-pt3**](#2250) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files)]
- [stack/create-table-utils-pt4](#2254) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/c71e28b2c5147bed800bf672f1d9920fff456882..e553d73653b343e65473fdba3833823d6cce74e1)]
- [stack/remove-catalog-managed-flag](#2310) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files/e553d73653b343e65473fdba3833823d6cce74e1..21b747eb65a5d99133cf1d6dfabf417564ff92b9)]

---------

## What changes are proposed in this pull request?
Expands `CommitMetadata` to carry protocol/metadata state and commit change detection:
- `protocol`/`metadata`: read snapshot state
- `new_protocol`/`new_metadata`: present when the commit changes P&M
- `domain_metadata_changes`: domain metadata actions in this commit
- Public getters: `has_protocol_change()`, `has_metadata_change()`, `has_domain_metadata_change()`

Also adds committer/table type validations: a catalog committer can only commit to catalog-managed tables, and a non-catalog committer can only commit to non-catalog-managed tables.

## How was this change tested?
1. `disallow_catalog_committer_for_non_catalog_managed_table` -- an existing non-catalog-managed table with a catalog committer returns an error
2. `disallow_catalog_committer_for_non_catalog_managed_create_table` -- create-table without catalogManaged with a catalog committer returns an error
3. Updated `test_commit_metadata` and `FileSystemCommitter` tests for new `CommitMetadata::new` signature
4. Existing tests pass
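The committer/table pairing rule and the change-detection getters can be pictured with the sketch below. It is illustrative only: `CommitMetadataSketch`, its `String` fields, and the `validate_committer` function are hypothetical stand-ins, and the getter bodies assume that a change is signalled by `new_protocol` or `new_metadata` being present, as the description above suggests.

```rust
/// Hypothetical, simplified stand-in for the expanded `CommitMetadata`.
/// Real protocol and metadata types are replaced by `String` here.
pub struct CommitMetadataSketch {
    pub new_protocol: Option<String>,
    pub new_metadata: Option<String>,
    pub domain_metadata_changes: Vec<String>,
}

impl CommitMetadataSketch {
    /// Assumed semantics: a protocol change is present iff `new_protocol` is set.
    pub fn has_protocol_change(&self) -> bool {
        self.new_protocol.is_some()
    }

    /// Assumed semantics: a metadata change is present iff `new_metadata` is set.
    pub fn has_metadata_change(&self) -> bool {
        self.new_metadata.is_some()
    }

    /// Assumed semantics: any domain metadata action counts as a change.
    pub fn has_domain_metadata_change(&self) -> bool {
        !self.domain_metadata_changes.is_empty()
    }
}

/// Returns `Ok(())` only when the committer kind matches the table kind.
pub fn validate_committer(
    committer_is_catalog: bool,
    table_is_catalog_managed: bool,
) -> Result<(), String> {
    match (committer_is_catalog, table_is_catalog_managed) {
        (true, false) => {
            Err("catalog committer cannot commit to a non-catalog-managed table".to_string())
        }
        (false, true) => {
            Err("non-catalog committer cannot commit to a catalog-managed table".to_string())
        }
        _ => Ok(()),
    }
}
```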
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2254/files) to review incremental changes. - [**stack/create-table-utils-pt4**](#2254) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files)] - [stack/remove-catalog-managed-flag](#2310) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files/9adf0e37b1a76b8c401e34bd4dd3abdcce8a6dd3..bbe546c9476d0f2fa5189b35818a923819599127)] --------- ## What changes are proposed in this pull request? Adds UCCommitter validation to enforce catalog-managed table invariants on every commit. The committer validates required protocol features (catalogManaged, vacuumProtocolCheck, inCommitTimestamp), metadata properties (table ID match, ICT enabled), and rejects ALTER TABLE operations (protocol/metadata/clustering changes) and upgrade/downgrade attempts. This PR also introduces a `require!` macro and centralized `errors.rs` module for validation logic. ## How was this change tested? 9 new tests added, existing tests pass.
## What changes are proposed in this pull request? Extracts the in-memory test table setup logic from `scan.rs` into a shared `setup_snapshot` helper in `ffi_test_utils.rs`. This removes the duplicated `setup_test_table` function and makes the helper available to other FFI test modules (e.g. transaction tests). ## How was this change tested? Existing `scan_builder_tests` pass unchanged -- they now call the shared helper instead of a local copy.
…ection (#2170)

## What changes are proposed in this pull request?
Addresses #2165

When a predicate references a column added after the checkpoint was written, the parquet projection mask selects no leaf columns inside stats structs like `minValues`/`maxValues`, causing the parquet reader to omit them -- but `get_indices` still created `Nested` reorder entries that indexed into the missing columns.

To fix this, we add a guard that creates a `Missing` entry instead of a `Nested` entry if none of the struct's children match.

Example:
1. V0: create table with schema `[id: long, name: string]`, `writeStatsAsStruct=true`
2. V1: write data, create checkpoint -- `stats_parsed` covers `{numRecords, nullCount: {id, name}, minValues: {id, name}, maxValues: {id, name}}`
3. V2: schema evolves to `[id: long, name: string, age: long]`
4. V3: write more data
5. Scan with predicate `age > 30` -- **panics**

## How was this change tested?
## What changes are proposed in this pull request?
Adds the ability to read and write file histograms using lazy CRCs. Adds two fields to `FileStatsDelta` (Added Histogram and Removed Histogram), which track the histograms of added and removed files, respectively, for each transaction. If a CRC with a set of histogram buckets existed before this commit, those buckets are used; otherwise the default buckets are used.

Consolidation of the histogram happens in the CRC Delta apply and is simply Base Histogram + Added Histogram - Removed Histogram. If we run into histogram combination issues, such as negative histograms, we drop the histogram from the CRC altogether (it is an optional field).

## How was this change tested?
Unit tests and integration tests
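The consolidation rule (Base + Added - Removed, dropping the histogram on any inconsistency) can be sketched as follows. The `Histogram` struct, its field names, and `apply_delta` are hypothetical simplifications that only track a file count per bucket; they are not the kernel's actual types.

```rust
/// Hypothetical, simplified file histogram: one file count per bucket.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Histogram {
    /// Lower bound of each bucket in bytes; must match across operands.
    pub boundaries: Vec<u64>,
    /// Number of files falling into each bucket.
    pub file_counts: Vec<i64>,
}

/// Computes base + added - removed per bucket.
/// Returns `None` (the histogram is dropped) when the buckets do not line up,
/// when arithmetic overflows, or when any bucket would go negative.
pub fn apply_delta(base: &Histogram, added: &Histogram, removed: &Histogram) -> Option<Histogram> {
    if base.boundaries != added.boundaries || base.boundaries != removed.boundaries {
        return None;
    }
    let n = base.boundaries.len();
    if base.file_counts.len() != n || added.file_counts.len() != n || removed.file_counts.len() != n {
        return None;
    }
    let mut counts = Vec::with_capacity(n);
    for i in 0..n {
        let merged = base.file_counts[i]
            .checked_add(added.file_counts[i])?
            .checked_sub(removed.file_counts[i])?;
        if merged < 0 {
            // A negative bucket means the inputs are inconsistent; drop the histogram.
            return None;
        }
        counts.push(merged);
    }
    Some(Histogram {
        boundaries: base.boundaries.clone(),
        file_counts: counts,
    })
}
```

Returning `Option` mirrors the point that the histogram is an optional CRC field: a failed merge loses the histogram but does not fail the commit.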
## What changes are proposed in this pull request?
In #2266, `Snapshot::get_timestamp` was added, providing the ability to read the timestamp of a Snapshot. This PR adds an FFI function, `snapshot_timestamp`, that lets other languages call `Snapshot::get_timestamp`.

## How was this change tested?
A new unit test was added.
…constants (#1717) (#1774) ## What changes are proposed in this pull request? - Replace separate *_INDEX + string field-name constants in ActionReconciliationVisitor with GetterColumn { index, name } constants. - Update call sites to use .index when selecting getters[...] and .name when calling get_*. - Keep the column order aligned with selected_column_names_and_types(). ## How was this change tested? - cargo test -p delta_kernel - cargo clippy -p delta_kernel --all-features --tests --benches -- -D warnings Fixes #1717 --------- Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
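A rough sketch of the `GetterColumn { index, name }` idea follows. The field types, the two example constants, and the toy `get_str` helper are assumptions for illustration; the real visitor works with kernel getter traits rather than slices of optional strings.

```rust
/// Pairs a column's position with its field name so the two cannot drift apart.
/// Field types are assumed for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct GetterColumn {
    pub index: usize,
    pub name: &'static str,
}

// Hypothetical example columns, listed in selection order.
pub const ADD_PATH: GetterColumn = GetterColumn { index: 0, name: "add.path" };
pub const REMOVE_PATH: GetterColumn = GetterColumn { index: 1, name: "remove.path" };

pub const SELECTED_COLUMNS: [GetterColumn; 2] = [ADD_PATH, REMOVE_PATH];

/// Checks that each constant's index matches its position in the selected column list.
pub fn columns_are_aligned(columns: &[GetterColumn]) -> bool {
    columns.iter().enumerate().all(|(pos, col)| col.index == pos)
}

/// Toy getter lookup: `.index` selects the getter, `.name` labels the error.
pub fn get_str<'a>(getters: &'a [Option<&'a str>], col: GetterColumn) -> Result<&'a str, String> {
    getters
        .get(col.index)
        .copied()
        .flatten()
        .ok_or_else(|| format!("missing value for column {}", col.name))
}
```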
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2296/files) to review incremental changes. - [**stack/ffi/create-table**](#2296) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2296/files)] - [stack/ffi/remove-files](#2297) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2297/files/fa37f1bcc100c8c9cabf2586c801760f828a7a7f..4d52a5ebfaedb7f0993fee391bd611209166082f)] --------- ## What changes are proposed in this pull request? Adds FFI support for creating Delta tables via a builder pattern (following the convention from #2255): - `get_create_table_builder()` -- creates an `FfiCreateTableBuilder` from a path, schema, and engine info - `create_table_builder_with_table_property()` -- adds a single table property (consumes and returns the builder handle) - `create_table_builder_with_committer()` -- sets a custom committer (e.g. Unity Catalog); defaults to `FileSystemCommitter` if not called - `create_table_builder_build()` -- builds the transaction and returns a standard `ExclusiveTransaction` handle, enabling the caller to optionally stage initial data with `add_files()` before committing with `commit()` (CTAS support) - `free_create_table_builder()` -- frees the builder without building The create-table flow reuses the existing transaction FFI functions (`add_files`, `set_data_change`, `commit`) rather than bundling build+commit into a single call. At the FFI boundary, `Transaction<CreateTable>` is transmuted to `Transaction<ExistingTable>` since they have identical layouts (only `PhantomData<S>` differs) and runtime create-table behavior is driven by the snapshot version, not the type parameter. Also extracts `commit_result_to_version` as a shared helper used by both `commit` and the create-table path, replacing the inline match in `commit`. 
### This PR affects the following public APIs New FFI functions: `get_create_table_builder`, `create_table_builder_with_table_property`, `create_table_builder_with_committer`, `create_table_builder_build`, `free_create_table_builder`. New FFI types: `ExclusiveCreateTableBuilder`, `FfiCreateTableBuilder`. ## How was this change tested? Unit tests covering: - Basic table creation with schema verification (build + commit two-step) - Table creation with custom properties - Table creation with a custom committer handle - Error when creating a table that already exists - Error on empty schema - Builder free without commit (no leak/panic)
## What changes are proposed in this pull request? ### This PR affects the following public APIs Removes the backwards-compatible snapshot FFI convenience functions that were kept around in #2255 when the builder API was introduced: - `snapshot(path, engine)` - `snapshot_at_version(path, engine, version)` - `snapshot_with_log_tail(path, engine, log_tail)` (catalog-managed) - `snapshot_at_version_with_log_tail(path, engine, version, log_tail)` (catalog-managed) Callers should use the builder API instead: - `get_snapshot_builder` / `get_snapshot_builder_from` - `snapshot_builder_set_version` - `snapshot_builder_set_log_tail` - `snapshot_builder_build` All internal test call sites are updated to use the builder API. The C example comment referencing the old functions is also updated. ## How was this change tested? All existing FFI tests updated and passing: `cargo test -p delta_kernel_ffi --all-features` (51 passed) `cargo clippy -p delta_kernel_ffi --all-features --tests -- -D warnings` (clean)
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2310/files) to review incremental changes.
- [**stack/remove-catalog-managed-flag**](#2310) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files)]

---------

## What changes are proposed in this pull request?
Removes the `catalog-managed` Cargo feature flag, making catalog-managed table support always compiled in. The feature was already enabled by all consumers (UC crate, FFI, kernel dev-deps) and runtime behavior is gated by whether the table has `catalogManaged` in its protocol.

This is a breaking change because the feature flag no longer exists in `kernel/Cargo.toml` or `ffi/Cargo.toml`, so any downstream `Cargo.toml` referencing it will fail to compile.

`test_feature_signal_accepted` now includes `CatalogManaged` as a case (merged from the standalone `test_feature_signal_accepted_catalog_managed`).

## How was this change tested?
1. All tests previously gated by `#[cfg(feature = "catalog-managed")]` now run unconditionally and pass
2. Full workspace `cargo nextest run --all-features` passes
## What changes are proposed in this pull request?
Adds an FFI function for removing files from a Delta table transaction:
- `remove_files` -- removes files using raw engine data and a selection vector. The selection vector indicates which rows in the data represent files to remove (true = selected). A null/empty selection vector selects all rows.

This function enables C/C++ consumers to implement DELETE/MERGE operations by scanning a table, identifying files to rewrite, and removing the originals.

### This PR affects the following public APIs
New FFI function: `remove_files`

## How was this change tested?
Unit tests covering:
- `remove_files` with selection vector: same end-to-end flow using raw engine data
- Null selection vector creates empty `FilteredEngineData` (all rows selected)
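The selection-vector semantics described above (true = selected, null or empty selects everything) can be sketched with a plain helper. `selected_rows` is a hypothetical function, not part of the FFI, and treating rows beyond the end of a short vector as selected is an assumption of this sketch that also makes the empty case fall out naturally.

```rust
/// Returns the indices of the rows that a selection vector marks for removal.
/// `None` models a null selection vector. Rows past the end of a short vector
/// are treated as selected (an assumption of this sketch), so an empty vector
/// selects every row.
pub fn selected_rows(num_rows: usize, selection: Option<&[bool]>) -> Vec<usize> {
    match selection {
        None => (0..num_rows).collect(),
        Some(sv) => (0..num_rows)
            .filter(|&i| sv.get(i).copied().unwrap_or(true))
            .collect(),
    }
}
```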
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/2324/files) to review incremental changes. - [**stack/refactor-helpers**](#2324) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/2324/files)] - [stack/row-group-1](#1893) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1893/files/89b4729dc0e45dbd22dd5b80dfbcdecad9aa4610..1d1a8d46072ead0fb2abc3a0c54e9311632e84c2)] - [stack/row-group-2](#1931) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1931/files/1d1a8d46072ead0fb2abc3a0c54e9311632e84c2..6c11c9f3d6b6a38ace81d38acef096b3af5ba52e)] --------- ## What changes are proposed in this pull request? Extracts stat extraction helpers from `RowGroupFilter` methods into shared free functions, preparing for reuse by `CheckpointRowGroupFilter` ## How was this change tested? Existing tests pass unchanged (test impls updated to wrap return values in `Some()`).
## What changes are proposed in this pull request? Adds `SharedProtocol` and `SharedMetadata` as first-class opaque FFI handle types, giving FFI callers structured access to snapshot protocol and metadata fields. Previously there was no way for FFI callers to access protocol versions, table features, or core metadata fields (id, name, description, format provider, created time). This PR adds the full surface via two new handle types and visitor-based accessors. **New FFI functions:** - `snapshot_get_protocol` / `free_protocol` -- acquire and release a protocol handle - `visit_protocol(handle, ctx, visit_versions, visit_feature)` -- single-call visitor that reports `(minReaderVersion, minWriterVersion)` and iterates reader/writer feature names (skipped for legacy protocols) - `snapshot_get_metadata` / `free_metadata` -- acquire and release a metadata handle - `visit_metadata(handle, allocate_fn, ctx, visitor)` -- single-call visitor that reports id, name, description, format provider, and created time (strings allocated via caller's `allocate_fn`) **Kernel change:** Adds `Metadata::format_provider() -> &str` (gated behind `#[internal_api]`) since `Metadata.format` is a private field with no existing accessor. ## How was this change tested? 4 new `#[tokio::test]` cases in `ffi/src/lib.rs`: - `test_visit_protocol_legacy` -- legacy protocol (minRV=1, minWV=2): verifies versions, confirms feature visitor is never called - `test_visit_protocol_with_features` -- modern protocol (minRV=3, minWV=7) with columnMapping + rowTracking: verifies both reader and writer feature callbacks - `test_visit_metadata_default` -- default metadata: id, null name/description, format provider, createdTime - `test_visit_metadata_with_name` -- metadata with name set: id, name, createdTime
## What changes are proposed in this pull request? ### This PR affects the following public APIs Two new FFI functions are added (not breaking -- purely additive): - `with_domain_metadata(txn, domain, configuration, engine)` -- add a `domainMetadata` action to a transaction - `with_domain_metadata_removed(txn, domain, engine)` -- add a `domainMetadata` removal tombstone to a transaction Both follow the existing consume-and-return handle pattern used by `with_engine_info`. ## How was this change tested? Three new FFI tests: - `test_domain_metadata_add_and_remove` -- happy path: adds domain metadata, commits, verifies JSON in commit log; then removes it in a second transaction and verifies the tombstone - `test_domain_metadata_system_domain_rejected_at_commit` -- error path: system `delta.*` domain is rejected at commit - `test_domain_metadata_duplicate_domain_rejected_at_commit` -- error path: duplicate domain in a single transaction is rejected at commit
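The two commit-time rejections exercised by these tests (system `delta.*` domains and duplicate domains within one transaction) amount to a check like the one below. `validate_domains` and its error strings are hypothetical; the kernel's actual validation and error types are not shown in this PR description.

```rust
use std::collections::HashSet;

/// Validates the domains touched by a single transaction.
/// Rejects system domains (prefix `delta.`) and any domain that appears twice.
pub fn validate_domains(domains: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for domain in domains {
        if domain.starts_with("delta.") {
            return Err(format!("system domain not allowed: {domain}"));
        }
        // `HashSet::insert` returns false when the domain was already seen.
        if !seen.insert(*domain) {
            return Err(format!("duplicate domain in transaction: {domain}"));
        }
    }
    Ok(())
}
```

Running the check once over the whole transaction, rather than on each `with_domain_metadata` call, matches the tests' wording that both errors surface at commit.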