
Sync with base repo#1

Closed
lizardoluis wants to merge 457 commits intosplitgraph:mainfrom
delta-io:main

Conversation

@lizardoluis

What changes are proposed in this pull request?

How was this change tested?

chiinlquah and others added 30 commits February 17, 2026 17:37
…lient (#1854)

## What changes are proposed in this pull request?

Split the single UCCommitsClient trait into two focused traits:

- UCCommitClient: only commit() — used by UCCommitter
- UCGetCommitsClient: only get_commits() — used by UCCatalog

This allows consumers like the UC-Committer to depend only on the commit
capability without pulling in the get_commits interface.

UCCommitsRestClient and InMemoryCommitsClient implement both traits.
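
The split described above can be sketched roughly as follows. This is a minimal, synchronous illustration: the `Commit` struct, the method signatures, the error type, and the `commit_version` helper are assumptions made for the example and are not taken from the actual crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical commit record; the real types carry far more detail.
#[derive(Clone, Debug, PartialEq)]
pub struct Commit {
    pub version: u64,
}

/// Write-side capability only (signature is an assumption).
pub trait UCCommitClient {
    fn commit(&mut self, commit: Commit) -> Result<(), String>;
}

/// Read-side capability only (signature is an assumption).
pub trait UCGetCommitsClient {
    fn get_commits(&self, start_version: u64) -> Vec<Commit>;
}

/// Toy in-memory client that implements both traits.
#[derive(Default)]
pub struct InMemoryCommitsClient {
    commits: BTreeMap<u64, Commit>,
}

impl UCCommitClient for InMemoryCommitsClient {
    fn commit(&mut self, commit: Commit) -> Result<(), String> {
        if self.commits.contains_key(&commit.version) {
            return Err(format!("version {} already committed", commit.version));
        }
        self.commits.insert(commit.version, commit);
        Ok(())
    }
}

impl UCGetCommitsClient for InMemoryCommitsClient {
    fn get_commits(&self, start_version: u64) -> Vec<Commit> {
        self.commits
            .range(start_version..)
            .map(|(_, c)| c.clone())
            .collect()
    }
}

/// A committer-style helper that depends only on the commit capability.
pub fn commit_version<C: UCCommitClient>(client: &mut C, version: u64) -> Result<(), String> {
    client.commit(Commit { version })
}
```

Because `commit_version` is bounded only by `UCCommitClient`, it compiles against any client that can commit, whether or not that client can also list commits.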

<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
Existing unit tests are sufficient for this refactoring change.
## What changes are proposed in this pull request?

Instead of adding the value in the loop, we push the operation down to
Arrow which is much faster.

## How was this change tested?
## What changes are proposed in this pull request?

Update Cargo.toml to avoid resolving broken native-tls 0.2.17
* See rust-native-tls/rust-native-tls#370

## How was this change tested?

CI passes now.
)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671)
to review incremental changes.
-
[stack/1684-read-config](#1758)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1758/files)]
-
[**stack/1682-table-info**](#1759)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671)]
-
[stack/1681-read-spec](#1760)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359)]
-
[stack/1807-workload-spec-models](#1816)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1816/files/75d88fb3300b8b35ac1da5b814fd63fce6f93359..7b140ffdf8f9eb64fcb56bbb26addff61a704579)]
-
[stack/1793-read-metadata-runner](#1826)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1826/files/3bf783e6c5d75325c2a08f5cc2b8e42243a2c031..1f047b8b5efbbf4cd4b9f5213eb13de9177debfd)]
-
[stack/workload-loading](#1867)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1867/files/e6483d21687058267890110c6784692fa9cfebed..fe148876ccd411ed3814a6c37c862f6d1e3d3108)]
-
[stack/criterion-benchmarking](#1871)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1871/files/fe148876ccd411ed3814a6c37c862f6d1e3d3108..d1201313f4308a3d79a271231dcc82c7cbf63b50)]

---------
## What changes are proposed in this pull request?

<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->
fix: #1682  (explained in ticket details)

## How was this change tested?
Tests added to verify correct deserialization behavior of TableInfo for
the following cases:
- All fields present
- No name
- No description
- Extra fields
)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359)
to review incremental changes.
-
[stack/1684-read-config](#1758)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1758/files)]
-
[stack/1682-table-info](#1759)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1759/files/378c06c6a8b934bd4b212e71e666fa53e244e6b8..da2e2bb13219eb868e7a64c8e2a73677d68ab671)]
-
[**stack/1681-read-spec**](#1760)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1760/files/db121ff255a42c8f0b8784dbbaf7cff9e982fefc..75d88fb3300b8b35ac1da5b814fd63fce6f93359)]
-
[stack/1807-workload-spec-models](#1816)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1816/files/75d88fb3300b8b35ac1da5b814fd63fce6f93359..7b140ffdf8f9eb64fcb56bbb26addff61a704579)]
-
[stack/1793-read-metadata-runner](#1826)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1826/files/3bf783e6c5d75325c2a08f5cc2b8e42243a2c031..1f047b8b5efbbf4cd4b9f5213eb13de9177debfd)]
-
[stack/workload-loading](#1867)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1867/files/e6483d21687058267890110c6784692fa9cfebed..fe148876ccd411ed3814a6c37c862f6d1e3d3108)]
-
[stack/criterion-benchmarking](#1871)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1871/files/fe148876ccd411ed3814a6c37c862f6d1e3d3108..d1201313f4308a3d79a271231dcc82c7cbf63b50)]

---------
## What changes are proposed in this pull request?

<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->
fix: #1681 (explained in ticket details)

## How was this change tested?
Tests added to verify correct deserialization behavior for the following
cases:
- Version included
- Version not included
- Missing type
- Unsupported type
- Extra fields
…ime API safety (#1842)

## What changes are proposed in this pull request?

```
Replace the CreateTableTransaction newtype wrapper with a type-state pattern using Transaction<S> generic over marker types ExistingTable and CreateTable. This provides compile-time prevention of invalid operations on create-table transactions (e.g., file removal, blind append, DV updates) instead of deferring to commit-time validation.

What's a no-op (code moves / reorganization — can skim)
The bulk of the diff is pure code movement with no logic changes:
create_table.rs → builder/create_table.rs (~858 lines): CreateTableTransactionBuilder and all its validation helpers moved to a new transaction/builder/ submodule. Zero logic changes — just a different file.
CreateTableTransaction type alias + docs moved from mod.rs → create_table.rs (re-exported, so all import paths still work).
try_new_create_table constructor moved from mod.rs → create_table.rs.
Create-table unit tests moved from mod.rs → kernel/tests/create_table.rs (integration tests).
Removed test_table_setup internal helper from kernel/src/utils.rs (no longer needed; integration tests use test_utils crate).

> Tip: Review commit-by-commit. The first commit (c30210c) contains the actual type-state refactor. The second commit (5f8f1d3) is purely file reorganization with no functional changes.
What needs detailed review
All of the actual logic changes are in kernel/src/transaction/mod.rs (first commit):
Type-state pattern on Transaction<S> — Transaction gains a PhantomData<S> marker. S is either ExistingTable (default, full API) or CreateTable (restricted API).
impl block split — Methods are now split by which transaction types they apply to:
impl<S> Transaction<S> — shared methods: commit(), with_domain_metadata(), with_engine_info(), with_data_change(), add_files(), get_write_context(), stats_columns(), stats_schema(), all validation and action generation.
impl Transaction (i.e. Transaction<ExistingTable>) — existing-table-only: with_blind_append(), with_operation(), with_transaction_id(), with_domain_metadata_removed(), remove_files(), update_deletion_vectors(), with_committer().
Type-state flows through return types — CommitResult<S>, ConflictedTransaction<S>, and RetryableTransaction<S> are now generic, preserving the type-state marker across commit retries.
Runtime defense-in-depth checks preserved — The existing is_create_table() runtime checks in validate_blind_append_semantics(), validate_domain_metadata(), etc. are kept as defensive fallbacks. The test_validate_blind_append_rejects_create_table unit test verifies these still work by directly setting private fields.
```
Closes #1768
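
For readers unfamiliar with the type-state pattern, here is a heavily simplified sketch of the idea. The marker type names follow the description above, but the fields, constructors, and the `add_file`/`remove_file`/`commit` methods are stand-ins invented for this example, not the kernel's real API.

```rust
use std::marker::PhantomData;

/// Marker types for the two transaction states.
pub struct ExistingTable;
pub struct CreateTable;

/// Simplified stand-in for the kernel's transaction type.
pub struct Transaction<S = ExistingTable> {
    added: Vec<String>,
    removed: Vec<String>,
    _state: PhantomData<S>,
}

impl<S> Transaction<S> {
    /// Shared by both states.
    pub fn add_file(mut self, path: &str) -> Self {
        self.added.push(path.to_string());
        self
    }

    /// Returns (files added, files removed), purely for illustration.
    pub fn commit(self) -> (usize, usize) {
        (self.added.len(), self.removed.len())
    }
}

impl Transaction<ExistingTable> {
    pub fn for_existing_table() -> Self {
        Transaction { added: Vec::new(), removed: Vec::new(), _state: PhantomData }
    }

    /// Only defined for existing-table transactions.
    pub fn remove_file(mut self, path: &str) -> Self {
        self.removed.push(path.to_string());
        self
    }
}

impl Transaction<CreateTable> {
    pub fn for_new_table() -> Self {
        Transaction { added: Vec::new(), removed: Vec::new(), _state: PhantomData }
    }
}
```

In this sketch, calling `remove_file` on the value returned by `Transaction::<CreateTable>::for_new_table()` is rejected by the compiler, because the method exists only in the `impl Transaction<ExistingTable>` block.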

<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
## What changes are proposed in this pull request?

There's a semantic gap in strum's `EnumString`: The parsing API it
derives is always fallible (impl `FromStr` and `TryFrom`, both with
strum parsing error), but enums with a default variant have infallible
parsing (because any unknown value goes into the default).

Define and use a new trait, `IntoTableFeature`, which mimics `Into`. We
can't "just" impl From for TableFeature because the blanket `impl<T:
From> TryFrom for T` conflicts with the `impl TryFrom for TableFeature`
strum emits.

Result: Code that works with table features no longer needs to
parse+unwrap, and there's now a narrow waist to update some day if/when
this gets fixed upstream (just remove the extension trait and `impl From
for TableFeature` instead).
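
A rough sketch of the extension-trait idea, under stated assumptions: the hand-written `FromStr` impl below stands in for what a derive would emit, the enum variants are a small illustrative subset, and the method name `into_table_feature` is hypothetical.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
pub enum TableFeature {
    DeletionVectors,
    ColumnMapping,
    /// Catch-all for names this sketch does not recognize.
    Unknown(String),
}

/// Stand-in for the fallible API a derive would emit; it never actually fails here.
impl FromStr for TableFeature {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "deletionVectors" => TableFeature::DeletionVectors,
            "columnMapping" => TableFeature::ColumnMapping,
            other => TableFeature::Unknown(other.to_string()),
        })
    }
}

/// Infallible conversion, mirroring `Into` (method name is an assumption).
pub trait IntoTableFeature {
    fn into_table_feature(self) -> TableFeature;
}

impl IntoTableFeature for &str {
    fn into_table_feature(self) -> TableFeature {
        // The single place where the "parsing cannot fail" assumption lives.
        self.parse()
            .unwrap_or_else(|_| TableFeature::Unknown(self.to_string()))
    }
}
```

Call sites can then write `"deletionVectors".into_table_feature()` instead of parsing and unwrapping, and only this one impl needs to change if the upstream derive gains an infallible path.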

See also
* Peternator7/strum#430
* Peternator7/strum#432

## How was this change tested?

Updated unit tests.
## What changes are proposed in this pull request?

A bunch of feature-related unit tests use illegal (bogus or incomplete)
protocol and metadata combinations that kernel isn't smart enough to
detect and reject yet. Fix them.

## How was this change tested?

Test-only change.
## Summary
This PR adds nextest support to the miri test job in CI, providing
consistency with other test jobs and better test output.

## Changes
- Install nextest action in miri job
- Switch from `cargo miri test` to `cargo miri nextest run`
- Maintains same feature flags (`default-engine-rustls`) and environment
variables (`MIRIFLAGS=-Zmiri-disable-isolation`)

## Benefits
- Consistent test runner across all CI jobs
- Better test output and reporting
- Improved test execution insights

## Testing
The changes will be validated when CI runs on this PR.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: chiinlquah <chiinlquah@gmail.com>
## What changes are proposed in this pull request?

Unpin the Miri nightly toolchain version that was pinned in #1845. The
upstream issue (rust-lang/miri#4855) has been fixed, so we no longer
need to pin to `nightly-2026-02-05`.

## How was this change tested?

CI should pass with the unpinned nightly toolchain.
## What changes are proposed in this pull request?

This extends visitors so they can visit REE (run-end encoded) columns. The
initial need is to visit the `_filename` metadata column, which is string
only. I added support for all primitive types to avoid possible future
surprises (one place of extension could be to use REE with literal transform
expressions).

## How was this change tested?

Added unit tests.

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
…commit (#1811)

Co-authored-by: Louis Shawn <louis.shawn@qq.com>
Co-authored-by: Drake Lin <drakelin18@gmail.com>
Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
Co-authored-by: Louis Shawn <louis.shawn@qq.com>
Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1836/files) to
review incremental changes.
-
[**stack/use-physical-stats**](#1836)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1836/files)]
-
[stack/write-ctx-cm](#1837)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1837/files/b168896d4907e4c8ceffa0a528c5faa59d8e2ea4..30afa922e9716d016acdfb002eb71477d66e3143)]
-
[stack/correct-transform](#1862)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files/30afa922e9716d016acdfb002eb71477d66e3143..cca68bd1278db0b4df42e4e6c309a784c918a268)]
-
[stack/cm-partition](#1870)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/cca68bd1278db0b4df42e4e6c309a784c918a268..68e8407a7599dd006d540a8c8e3e6bcbd02341bd)]
-
[stack/support-cm-write](#1863)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/68e8407a7599dd006d540a8c8e3e6bcbd02341bd..dfcb7beb7f39a50d0ba6439e27b161030b1794f5)]
-
[stack/support-CM-with-flag](#1910)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/cca68bd1278db0b4df42e4e6c309a784c918a268..eb1c6ce93ac2e97825e0d60be387a81fded683c6)]

---------
## What changes are proposed in this pull request?
Previously, when generating `WriteContext` in `Transaction`, we used
logical column names for stats columns. Because `WriteContext` is used for
writes, it should use physical names. This PR switches it to physical
column names.
<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
Added tests
…ng and `materializePartitionColumns` (#1837)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1837/files) to
review incremental changes.
-
[**stack/write-ctx-cm**](#1837)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1837/files)]
-
[stack/correct-transform](#1862)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files/105c22d3f8eb7ddf273eaf94666e58fbd7079d7f..9d2156a008912b9578309c20919e4f9c56951e98)]
-
[stack/cm-partition](#1870)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/9d2156a008912b9578309c20919e4f9c56951e98..0879a373beac7290a93e2d5fe008c58bac31f238)]
-
[stack/support-cm-write](#1863)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/0879a373beac7290a93e2d5fe008c58bac31f238..8c19a6d03ef12ebaba35eaa12dfc0ac3f8e0ebd9)]
-
[stack/support-CM-with-flag](#1910)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/9d2156a008912b9578309c20919e4f9c56951e98..b591037ece8a8cf30e25f7fb5a0ff609d824baf0)]

---------
## What changes are proposed in this pull request?
Previously, when generating `WriteContext` in `Transaction`, the
`physical_schema` field didn't account for column mapping and
`materializePartitionColumns`. This PR generates the
`physical_schema` with respect to column mapping and
`materializePartitionColumns`.
<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
Added tests.
## What changes are proposed in this pull request?

Move `Protocol::is_catalog_managed` to
`TableConfiguration::is_catalog_managed`, alongside other similar
methods.

Many call sites also invoke
`table_configuration.protocol().partition_columns()`. Add and use
`TableConfiguration::partition_columns` instead, for simpler code. While
we're at it, make the method return `&[String]` instead of `&Vec<String>`
(more idiomatic).

Both changes reinforce `TableConfiguration` as the preferred way to
interact with a table's metadata, rather than dipping into raw
`Protocol` or `Metadata`.

## How was this change tested?

Compilation suffices.
## What changes are proposed in this pull request?

Use
[install-and-cache-homebrew-tools](https://github.com/marketplace/actions/install-and-cache-homebrew-tools)
on macOS.

Note that this removes the explicit error raised if we somehow run on
something other than Linux or macOS (which shouldn't happen, since we specify
that this job only runs there). Regardless, if arrow-glib is not installed,
the tests themselves will fail to compile, so we'll know something went wrong.

## How was this change tested?

Checked the CI runs. After one run it hits the cache, which greatly
reduces runtime:
<img width="2601" height="650" alt="image"
src="https://github.com/user-attachments/assets/878be5a1-c0f8-4d06-8f9c-7572f0460282"
/>

[cache miss
run](https://github.com/delta-io/delta-kernel-rs/actions/runs/22208011288/job/64236127126?pr=1909)
[cache hit
run](https://github.com/delta-io/delta-kernel-rs/actions/runs/22208205504/job/64236686596?pr=1909)
## What changes are proposed in this pull request?
This PR improves memory allocation efficiency by pre-allocating `Vec`
and `HashSet` collections when the size is known or can be reasonably
estimated. This avoids repeated reallocations as collections grow.
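
As a small illustration of the technique (the function names and inputs here are made up for the example and do not come from the kernel code), pre-allocation looks like this:

```rust
use std::collections::HashSet;

/// Collects the unique paths from `paths`, reserving space up front.
pub fn unique_paths(paths: &[&str]) -> HashSet<String> {
    // Upper bound: at most one entry per input path, so no rehash while filling.
    let mut seen = HashSet::with_capacity(paths.len());
    for p in paths {
        seen.insert(p.to_string());
    }
    seen
}

/// Computes one length per path; the output size is known exactly.
pub fn lengths(paths: &[&str]) -> Vec<usize> {
    let mut out = Vec::with_capacity(paths.len());
    for p in paths {
        out.push(p.len());
    }
    out
}
```

The results are identical to the non-reserving versions; only the number of intermediate reallocations changes.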

## How was this change tested?
- Existing test suite passes
- Changes are allocation-only optimizations with no behavioral changes

---------

Co-authored-by: emkornfield <emkornfield@gmail.com>
…1907)

## What changes are proposed in this pull request?

Fold `Protocol::validate_table_features` into `Protocol::try_new` and
update all tests to actually use the constructor.

Also define convenience constructors `Protocol::try_new_legacy` and
`Protocol::try_new_modern` and use them where appropriate.
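
A minimal sketch of a validating constructor with convenience wrappers follows. It models only the reader side, and the field layout, error type, constant name, and exact validation rules are assumptions for illustration rather than the kernel's actual implementation.

```rust
/// Assumed constant: the reader version that introduces table features.
const TABLE_FEATURES_MIN_READER_VERSION: u32 = 3;

#[derive(Debug, PartialEq)]
pub struct Protocol {
    min_reader_version: u32,
    reader_features: Option<Vec<String>>,
}

impl Protocol {
    /// Validation lives in the constructor, so an invalid `Protocol` cannot be built.
    pub fn try_new(
        min_reader_version: u32,
        reader_features: Option<Vec<String>>,
    ) -> Result<Self, String> {
        if min_reader_version > TABLE_FEATURES_MIN_READER_VERSION {
            return Err(format!("unsupported reader version {min_reader_version}"));
        }
        // A feature list is required at the table-features version and forbidden below it.
        let needs_features = min_reader_version == TABLE_FEATURES_MIN_READER_VERSION;
        let has_features = reader_features.is_some();
        if needs_features != has_features {
            return Err(format!(
                "reader version {min_reader_version}: feature list required = {needs_features}, present = {has_features}"
            ));
        }
        Ok(Self { min_reader_version, reader_features })
    }

    /// Legacy protocols carry no feature list.
    pub fn try_new_legacy(min_reader_version: u32) -> Result<Self, String> {
        Self::try_new(min_reader_version, None)
    }

    /// Modern protocols use the table-features version and an explicit list.
    pub fn try_new_modern(reader_features: Vec<String>) -> Result<Self, String> {
        Self::try_new(TABLE_FEATURES_MIN_READER_VERSION, Some(reader_features))
    }

    pub fn min_reader_version(&self) -> u32 {
        self.min_reader_version
    }

    pub fn reader_features(&self) -> Option<&[String]> {
        self.reader_features.as_deref()
    }
}
```

With validation folded into `try_new`, tests that go through the constructors can no longer build the bogus protocol combinations that a separate validation step would have had to catch later.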

## How was this change tested?

Mostly a test change. Prod call sites verified by compiler and unit
tests.

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
…es (#1901)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1901/files) to
review incremental changes.
-
[**stack/stats-validation-types**](#1901)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1901/files)]
-
[stack/getdata-then-simplify](#1918)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1918/files/9318ff760c960332e35048f9e62d4d2d6639b022..ce4221d92b355f983f3eed3607a066f90647b706)]
-
[stack/stats-validation](#1667)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1667/files/ce4221d92b355f983f3eed3607a066f90647b706..54aa9ea9d212ca263bbc41c171b9f5efae878c5c)]

---------
## What changes are proposed in this pull request?

Implement `GetData` for types that didn't have it.
## How was this change tested?
Added unit tests for the implementations that didn't have them.
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1918/files) to
review incremental changes.
-
[**stack/getdata-then-simplify**](#1918)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1918/files)]
-
[stack/stats-validation](#1667)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1667/files/301f01205582cff6560384ef49a042872b66418d..2529aea0f508e1511222fee83bb401ecf5de6000)]

---------
## What changes are proposed in this pull request?

Quick follow-up to the previous PR.

## How was this change tested?
Existing unit tests
## What changes are proposed in this pull request?

Reader protocol 3 and writer protocol 7 are special, because they
introduce the concept of table features (which require feature lists
where lower protocols forbid them). But they are _also_ special because
they're currently the highest protocol version numbers that kernel
supports.

Replace a bunch of magic values in the code with proper constants to
make clearer what's going on. If/when the Delta spec bumps the max
protocol version number, we'll hopefully have fewer ambiguous `3` and
`7` values that need careful auditing.

## How was this change tested?

Mechanical replacement of literals with same-valued constants.
Compilation and unit tests cover it.
…1818)

## What changes are proposed in this pull request?

Before this change, there was an assumption that columns would be
extracted in the order they appear in the schema. This is an easy
prerequisite to mess up and this code is typically not on the hot path,
so we can afford to be more robust.

According to the [trait
docs](https://github.com/delta-io/delta-kernel-rs/blob/main/kernel/src/engine_data.rs#L294):

```
/// The names and types of leaf fields this visitor accesses. The `EngineData` being visited
/// validates these types when extracting column getters, and [`RowVisitor::visit`] will receive
/// one getter for each selected field, in the requested order.
```
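
The contract quoted above (one getter per selected field, in the requested order) can be illustrated with a tiny name-based lookup. This is only a sketch over plain string slices; the function name and the flat schema representation are assumptions, and the real code works with nested schemas and typed getters.

```rust
/// Returns, for each requested column name, its index in `schema`,
/// preserving the order of `requested` rather than schema order.
pub fn getter_indices(schema: &[&str], requested: &[&str]) -> Result<Vec<usize>, String> {
    requested
        .iter()
        .map(|name| {
            schema
                .iter()
                .position(|col| col == name)
                .ok_or_else(|| format!("column {name} not found"))
        })
        .collect()
}
```

Resolving each requested name independently means the result no longer depends on callers listing columns in schema order.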

## How was this change tested?

Added additional unit test to demonstrate the issue.
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1862/files) to
review incremental changes.
-
[**stack/correct-transform**](#1862)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1862/files)]
-
[stack/cm-partition](#1870)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1870/files/6b36ecc5022a5be295d128fe6b66192b74d39614..76691de21b81671453a8e52a1fd58b58fdcbf1fe)]
-
[stack/support-cm-write](#1863)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1863/files/76691de21b81671453a8e52a1fd58b58fdcbf1fe..1e938eb9de2f5f8a52943ae794984585867f31a5)]
-
[stack/support-CM-with-flag](#1910)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1910/files/6b36ecc5022a5be295d128fe6b66192b74d39614..7e184b835db7ac1142cb878524561847007dc297)]

---------
## What changes are proposed in this pull request?
Add with_dropped_field_if_exists to Transform for optionally dropping
fields that may not exist in the input, and use it in
generate_logical_to_physical to drop partition columns without erroring
when the input data doesn't contain them.

`WriteContext::logical_to_physical` now correctly renames
nested struct fields to their physical names under column mapping.


<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
Added tests to verify that the renaming succeeds and that
`with_dropped_field_if_exists` works as expected.
…code refactor. (#1872)

## What changes are proposed in this pull request?

Closes #1657

Currently, there is no schema modification method for inserting,
removing, or replacing a field in a schema. Callers have to construct the
modified schema themselves, which creates boilerplate code across the
codebase. This PR introduces schema modification methods for inserting,
removing, and replacing a field. In addition, the new methods are
integrated into `stats_transform` to reduce boilerplate when creating
schemas.

Changes included:
- New `StructType` methods for schema modification:
    - `pub fn with_field_inserted_after(mut self, after: Option<&str>,
new_field: StructField) -> Self`
    - `pub fn with_field_removed(mut self, name: &str) -> Self`
    - `pub fn with_field_replaced(mut self, name: &str, new_field:
StructField)`
    - `pub fn with_field_inserted_before(mut self, before: Option<&str>,
new_field: StructField) -> Self`
- Unit tests for `StructType` that verify the behaviour of the schema
modification methods (`schema::tests::test_with_field_`)
- Updated the `stats_transform.rs` functions `transform_add_schema`,
`build_add_output_schema`, and `add_stats_parsed_to_add_schema`
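
To make the builder-style methods concrete, here is a toy model over a flat list of fields. `Schema` and `Field` are simplified stand-ins for `StructType` and `StructField`, and the behaviour chosen for `None` or unknown names (append at the end, or do nothing) is an assumption of this sketch, not a statement about the real methods.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

impl Field {
    pub fn new(name: &str, data_type: &str) -> Self {
        Field { name: name.to_string(), data_type: data_type.to_string() }
    }
}

/// Simplified stand-in for `StructType`: an ordered list of fields.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    fn position(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f.name == name)
    }

    /// Inserts after `after`; `None` or an unknown name appends (sketch assumption).
    pub fn with_field_inserted_after(mut self, after: Option<&str>, new_field: Field) -> Self {
        let idx = after
            .and_then(|n| self.position(n))
            .map_or(self.fields.len(), |i| i + 1);
        self.fields.insert(idx, new_field);
        self
    }

    /// Removes the named field if present.
    pub fn with_field_removed(mut self, name: &str) -> Self {
        self.fields.retain(|f| f.name != name);
        self
    }

    /// Replaces the named field in place if present.
    pub fn with_field_replaced(mut self, name: &str, new_field: Field) -> Self {
        if let Some(i) = self.position(name) {
            self.fields[i] = new_field;
        }
        self
    }

    pub fn names(&self) -> Vec<&str> {
        self.fields.iter().map(|f| f.name.as_str()).collect()
    }
}
```

Because each method takes and returns `self`, schema edits chain into a single expression instead of hand-built field vectors at every call site.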

## How was this change tested?
- Unit tests for the schema modification methods: `cargo test --package
delta_kernel --lib -- schema::tests::test_with_field_`
- Unit tests for `stats_transform`: `cargo test --package delta_kernel --lib
-- checkpoint::stats_transform::test`

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
lorenarosati and others added 27 commits April 1, 2026 01:45
## What changes are proposed in this pull request?

This PR adds documentation for new tags added to the workloads tarball.

Current plan:
- If users want to fully understand the tables that exist and/or test
their features with new tables quickly, they should run benchmarking
locally
- We've added more tags for CI to be more useful in the short-term; the
README explains generally what each tag is used for (along with the
names of the tables that currently have those tags) and the user can
decide themselves which tags they think are relevant for their PRs.
After initially running benchmarking with these new tags, if the user
wants more detail on which tables were run, they can run benchmarking
locally to inspect the tables more closely
- I opted to not add any more detail in the README about the specific
tables that the tags correspond to (beyond the table names) because at
that point, that would be just reiterating the table info. This can also
get out of date from the tarball quickly.

When these new tags are approved, the tarball will be updated.

## How was this change tested?
Tarball sent to @OussamaSaoudi, can also be sent to anyone else if they
want to inspect the tables (but the README changes describe which tables
have the updated tags)
## What changes are proposed in this pull request?

Added a line mentioning that the timing in CI isn't fully accurate and
there isn't a quick fix for that, but benchmarking is still useful to
see performance changes - see this ticket:
#2301

## How was this change tested?
…n columns (#2142)

## What changes are proposed in this pull request?
Previously, we repeatedly computed the physical schema without partition
columns. This PR centralizes that computation in `TableConfiguration`.
<!--
**Uncomment** this section if there are any changes affecting public
APIs. Else, **delete** this section.
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes`
label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->

## How was this change tested?
Existing
## What changes are proposed in this pull request?

Currently, there is no method to read a Snapshot's timestamp according
to the Delta protocol. This PR adds a method that gets the Snapshot's
timestamp based on the in-commit timestamp (ICT). If the ICT table
property is enabled on the table, the Snapshot's timestamp is extracted
from the in-commit timestamp via `snapshot::get_in_commit_timestamp`. If
the ICT property is disabled, the snapshot reads the latest file
modification timestamp from the commit logs. Here are the reference
implementations from the Delta protocol and the Java kernel:

- Delta protocol:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#recommendations-for-readers-of-tables-with-in-commit-timestamps
- Java kernel reference implementation:
https://github.com/delta-io/delta/blob/fc7d92cf740c96a955537a7b262ce578ce47c40e/kernel/kernel-api/src/main/java/io/delta/kernel/internal/SnapshotImpl.java#L172

In the future, `Snapshot::get_timestamp` will be exposed through an FFI
function, allowing engines to use it.
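
The selection logic described above can be sketched as a small pure function. `SnapshotInfo` and its fields are hypothetical and simply bundle the three inputs the description mentions; the real method reads them from the snapshot and the commit log, and its error handling may differ.

```rust
/// Hypothetical, simplified view of what the snapshot knows.
pub struct SnapshotInfo {
    /// Whether the in-commit-timestamp table property is enabled.
    pub ict_enabled: bool,
    /// In-commit timestamp of the latest commit, if one was recorded.
    pub in_commit_timestamp: Option<i64>,
    /// Modification time of the latest commit file, if known.
    pub latest_commit_file_mtime: Option<i64>,
}

/// Picks the snapshot timestamp: ICT when enabled, file modification time otherwise.
pub fn get_timestamp(s: &SnapshotInfo) -> Result<i64, String> {
    if s.ict_enabled {
        s.in_commit_timestamp
            .ok_or_else(|| "ICT is enabled but no in-commit timestamp was found".to_string())
    } else {
        s.latest_commit_file_mtime
            .ok_or_else(|| "no commit file to read a modification time from".to_string())
    }
}
```

Note that in this sketch an ICT-enabled table with a missing in-commit timestamp is an error rather than a silent fallback to the file modification time.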

## How was this change tested?
- New unit test was added.
- `cargo test --package delta_kernel --lib --all-features --
snapshot::tests::test_get_timestamp`
…2281)

## What changes are proposed in this pull request?

- `remove.tags`, `remove.partitionValues`, `checkpointMetadata.tags`,
`cdc.tags`, and `sidecar.tags` all allow null values in their maps. These
null values will be ignored when read by kernel.
- Extend the `#[allow_null_container_values]` derive macro attribute to
support `Option<HashMap<...>>` fields

The Delta protocol allows null values in partitionValues maps (a null
partition value means the partition column is null for that file) and in
tags maps. The Add action schema already handled this correctly, but
Remove and other actions did not, causing Arrow validation errors when
reading actions with null map values.
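
As a rough illustration of "null values are ignored on read", the following sketch drops null entries from a map. The function name and the plain `HashMap` representation are assumptions; the kernel performs this while extracting data from Arrow arrays rather than on materialized maps.

```rust
use std::collections::HashMap;

/// Drops entries whose value is null (`None`), keeping only concrete values.
pub fn drop_null_values(raw: HashMap<String, Option<String>>) -> HashMap<String, String> {
    raw.into_iter()
        .filter_map(|(key, value)| value.map(|v| (key, v)))
        .collect()
}
```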


## How was this change tested?
- New parameterized test `read_actions_with_null_map_values` covers null
map values across all action types (remove, add, cdc, sidecar,
checkpointMetadata).
- NOTE: `commitInfo.operationParameters` and `metadata.configuration` are
left as future work.
- Verified `#[allow_null_container_values]` works on
`Option<HashMap<...>>` and fails on other non-hashmap Optionals.
- Existing schema tests updated to expect nullable map values
## What changes are proposed in this pull request?

Add support for benchmarking remote Delta tables via Unity Catalog
credential vending and direct S3 access.

- Add `CatalogInfo` to `TableInfo` for UC-managed tables. This is
mutually exclusive with `table_path`.
- `resolve_snapshot_strategy` handles UC credential vending, S3 engine
setup from env vars, or local filesystem -- returning an engine +
`SnapshotStrategy` pair used by both runners. The `SnapshotStrategy` enum
determines how snapshots are loaded:
    - Standard: uses `Snapshot::builder_for` (local, S3, or UC
non-catalog-managed tables)
    - CatalogManaged: uses `UCKernelClient::load_snapshot` (this is
detected via `delta.feature.catalogManaged` in UC's get_table response
properties)
- Add `KERNEL_BENCH_WORKLOAD_DIR` env var to load workloads from an
external directory (for remote tables without local data).
- Runners now accept `Arc<Runtime>` instead of `Arc<dyn Engine>`,
constructing per-table engines internally.
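
The strategy selection can be pictured with the following sketch. The enum variants follow the description above, while `pick_strategy`, its signature, and the use of a plain property map are assumptions for the example; the real `resolve_snapshot_strategy` also builds the engine and handles credentials.

```rust
use std::collections::HashMap;

/// How a snapshot should be loaded for a benchmarked table.
#[derive(Debug, PartialEq)]
pub enum SnapshotStrategy {
    /// Local, S3, or UC tables that are not catalog-managed.
    Standard,
    /// UC tables whose properties advertise the catalogManaged feature.
    CatalogManaged,
}

/// Chooses a strategy from the (optional) properties returned by the catalog.
pub fn pick_strategy(uc_properties: Option<&HashMap<String, String>>) -> SnapshotStrategy {
    match uc_properties {
        Some(props) if props.contains_key("delta.feature.catalogManaged") => {
            SnapshotStrategy::CatalogManaged
        }
        _ => SnapshotStrategy::Standard,
    }
}
```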

## How was this change tested?

- Unit tests for `CatalogInfo` deserialization, `TimeTravel::as_version`,
unsupported URL schemes, and snapshot construction with time travel.
- Existing workloads are unaffected.
- Validated end-to-end on a live UC catalog-managed table, confirming
`SnapshotStrategy::CatalogManaged` is correctly selected and
`UCKernelClient::load_snapshot` succeeds.
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2293/files) to
review incremental changes.
-
[**stack/kernel-catalog-managed-create**](#2293)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)]
-
[stack/ccv2-create](#2203)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)]
-
[stack/ccv2-create-pt2](#2247)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)]
-
[stack/create-table-utils-pt3](#2250)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)]
-
[stack/create-table-utils-pt4](#2254)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)]

---------
## What changes are proposed in this pull request?

Enables creating catalog-managed Delta tables by adding `CatalogManaged`
to `ALLOWED_DELTA_FEATURES` in the `create_table` builder. This allows
connectors to pass `delta.feature.catalogManaged = "supported"` as a
table property during table creation. The feature is gated behind the
`catalog-managed` feature flag.

## How was this change tested?

`test_catalog_managed_feature_signal_accepted` - verifies the feature
signal is accepted, removed from properties, and added to both reader
and writer features
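
The behaviour that test checks (signal accepted, removed from properties, surfaced as a feature) can be sketched as below. The function name, the single returned feature list, the error type, and the contents of the allow-list are assumptions for illustration; the real builder tracks reader and writer features separately and supports more features.

```rust
use std::collections::HashMap;

const FEATURE_PREFIX: &str = "delta.feature.";
/// Illustrative allow-list; the real list contains more entries.
const ALLOWED_DELTA_FEATURES: &[&str] = &["catalogManaged"];

/// Splits `delta.feature.<name> = supported` signals out of the table properties.
/// Returns the remaining properties and the sorted list of requested features.
pub fn extract_feature_signals(
    mut props: HashMap<String, String>,
) -> Result<(HashMap<String, String>, Vec<String>), String> {
    let signal_keys: Vec<String> = props
        .keys()
        .filter(|k| k.starts_with(FEATURE_PREFIX))
        .cloned()
        .collect();

    let mut features = Vec::with_capacity(signal_keys.len());
    for key in signal_keys {
        let name = &key[FEATURE_PREFIX.len()..];
        if !ALLOWED_DELTA_FEATURES.contains(&name) {
            return Err(format!("feature {name} is not allowed at table creation"));
        }
        if props.get(&key).map(String::as_str) != Some("supported") {
            return Err(format!("feature signal {key} must be set to \"supported\""));
        }
        features.push(name.to_string());
        // The signal is consumed: it does not remain a regular table property.
        props.remove(&key);
    }
    features.sort();
    Ok((props, features))
}
```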
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)
to review incremental changes.
-
[stack/kernel-catalog-managed-create](#2293)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)]
-
[**stack/ccv2-create**](#2203)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)]
-
[stack/ccv2-create-pt2](#2247)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)]
-
[stack/create-table-utils-pt3](#2250)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)]
-
[stack/create-table-utils-pt4](#2254)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)]

---------
## What changes are proposed in this pull request?

Adds `create_utils` module to `delta-kernel-unity-catalog` with two
public functions for building the required table properties during UC
catalog-managed table creation:
1. `get_required_properties_for_disk(uc_table_id)` -- returns properties
that must be written to disk in `000.json` (catalogManaged,
vacuumProtocolCheck, tableId, ICT)
2. `get_final_required_properties_for_uc(snapshot, engine)` -- extracts
post-commit properties to send to UC (protocol versions, feature
signals, metadata config, clustering columns, ICT timestamp)

Also adds public getters on `Snapshot` for protocol/metadata
inspection: `min_reader_version`, `min_writer_version`,
`reader_features`, `writer_features`, `metadata_configuration`, and
`get_clustering_columns`.

Other changes:
- Adds `physical_to_logical_column_name` to `column_mapping.rs` for
converting physical clustering column names back to logical names
- Adds `CatalogManaged` to `ALLOWED_DELTA_FEATURES` in `create_table`
builder
- Extracts shared UC constants to `constants.rs` module

## How was this change tested?
1. `test_get_required_properties_for_disk` - verifies all 4 disk
properties
2. `test_get_final_required_properties_for_uc` - round-trip test:
creates table, loads snapshot, extracts UC properties, verifies protocol
versions, feature signals, metadata config, ICT timestamp, and version
3. `test_get_final_required_properties_for_uc_with_clustering` - same
with clustering columns, verifies JSON serialization
4. `test_public_protocol_getters` - tests Snapshot protocol getters
against fixture
5. `test_metadata_configuration` - tests Snapshot metadata config getter
6. `test_catalog_managed_feature_signal_accepted` - verifies
CatalogManaged in ALLOWED_DELTA_FEATURES
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)
to review incremental changes.
-
[stack/kernel-catalog-managed-create](#2293)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2293/files)]
-
[stack/ccv2-create](#2203)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2203/files/8cf518a944cd62f87f35cad6db205195b224c513..9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2)]
-
[**stack/ccv2-create-pt2**](#2247)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2247/files/9c6826254bddecce6b2fe428aa9a1b6a1af1b6d2..6847bd06fd174c05674b34e4970671d5cf073d5a)]
-
[stack/create-table-utils-pt3](#2250)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files/6847bd06fd174c05674b34e4970671d5cf073d5a..e1035044d9e0c419eaee5d4ebb1284adb8646bf0)]
-
[stack/create-table-utils-pt4](#2254)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/e1035044d9e0c419eaee5d4ebb1284adb8646bf0..ba618b1a85366699b2312ea3430d6adec8aefc4d)]

---------
## What changes are proposed in this pull request?

UCCommitter previously rejected version 0 commits with an unsupported
error. This adds support for table creation by writing the version 0
commit file directly to the published path, bypassing the staged commit
+ UC ratify flow (which requires the table to already exist in UC). The
connector is responsible for finalizing the table via the UC create
table API after the commit succeeds.
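
The create-if-absent write can be sketched as follows, assuming the local filesystem as a stand-in for the object store the committer actually talks to. `write_version_zero` and `CreateOutcome` are hypothetical names; only the 20-digit zero-padded commit file name under `_delta_log/` comes from the Delta log layout.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Outcome of attempting to write the version 0 commit (hypothetical type).
#[derive(Debug, PartialEq)]
enum CreateOutcome {
    Committed(PathBuf),
    Conflict,
}

/// Writes the version 0 commit straight to its published path under
/// `_delta_log/`, reporting `Conflict` if the file already exists.
fn write_version_zero(table_root: &Path, actions_json: &str) -> io::Result<CreateOutcome> {
    let log_dir = table_root.join("_delta_log");
    fs::create_dir_all(&log_dir)?;
    // Published commits are named with a 20-digit zero-padded version.
    let path = log_dir.join(format!("{:020}.json", 0));
    // `create_new` fails if the file exists, so an existing commit is never overwritten.
    match OpenOptions::new().write(true).create_new(true).open(&path) {
        Ok(mut file) => {
            file.write_all(actions_json.as_bytes())?;
            Ok(CreateOutcome::Committed(path))
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(CreateOutcome::Conflict),
        Err(e) => Err(e),
    }
}
```

In this sketch, registering the table with the catalog is left to the caller, mirroring the description above that the connector finalizes the table after the commit succeeds.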

## How was this change tested?

1. `commit_version_0_writes_published_commit`: verifies the commit file
is written to the published path and exists on disk
2. `commit_version_0_conflict_when_file_exists`: verifies conflict
response when the version 0 file already exists
3. existing tests pass
…uator (#2160)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2160/files) to
review incremental changes.
-
[**stack/skip-name-based-val**](#2160)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2160/files)]

---------
## What changes are proposed in this pull request?

When reading `stats_parsed` from checkpoints with column mapping
enabled, struct column data has physical names but the expected output
schema has logical names. The `validate_array_type` check in
`evaluate_expression` does name-based matching via `ensure_data_types`,
which rejects the structurally correct data due to the name mismatch.

Skip name-based validation for `Column` expressions with `Struct` result
type. The downstream ordinal-based `apply_schema` transformation already
validates types and handles renaming.
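
To illustrate why the name-based check rejects structurally correct data, here is a small sketch with a toy type model. `Ty`, `matches_by_name`, and `matches_by_ordinal` are hypothetical and do not correspond to the kernel's `DataType` or `ensure_data_types`; the physical column names are made up.

```rust
/// Minimal type model for illustration (not the kernel's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Long,
    Str,
    Struct(Vec<(String, Ty)>),
}

/// Name-based match: field names and types must both agree.
fn matches_by_name(expected: &Ty, actual: &Ty) -> bool {
    expected == actual
}

/// Ordinal-based match: compares fields position by position and ignores names.
fn matches_by_ordinal(expected: &Ty, actual: &Ty) -> bool {
    match (expected, actual) {
        (Ty::Struct(e), Ty::Struct(a)) => {
            e.len() == a.len()
                && e.iter()
                    .zip(a)
                    .all(|((_, et), (_, at))| matches_by_ordinal(et, at))
        }
        _ => expected == actual,
    }
}
```

With column mapping, a logical struct `{id: long, name: string}` and its physically named counterpart fail the first check but pass the second, which is the situation the fix addresses.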

## How was this change tested?
New test
…2314)

## What changes are proposed in this pull request?
Fixes #2263. This PR updates the `checkpoint` method to return a tuple
of `(CheckpointWriteResult, SnapshotRef)` instead of just `SnapshotRef`.
This way, callers can easily detect whether a checkpoint was newly
written or already existed.

Previously, an existing checkpoint was silently overwritten. With this
change, `checkpoint` exits early instead of rewriting the existing
checkpoint.

### This PR affects the following public APIs
This is a backwards-incompatible change to the `Snapshot.checkpoint` API
because its return type changes.

## How was this change tested?
Integration test inside of `kernel/tests/maintenance_ops.rs`
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2270/files) to
review incremental changes.
-
[**stack/cache-output-schema**](#2270)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2270/files)]
-
[stack/sidecar-iter](#2271)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2271/files/36ae1abfe90c9c15aec5e24208c8f2454440fd7b..80094b977ef0472f671b07a58ba32e4f1e77c558)]
-
[stack/engine-update](#2287)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2287/files/80094b977ef0472f671b07a58ba32e4f1e77c558..1f66fc9dcc47fc34f98a33e1512fbeb0115c6b8f)]
-
[stack/extract-schemas](#2313)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2313/files/1f66fc9dcc47fc34f98a33e1512fbeb0115c6b8f..fecb21e445e8312c48c6a6cfd91c38b86e55d57d)]
-
[stack/support-sidecar](#2304)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2304/files/fecb21e445e8312c48c6a6cfd91c38b86e55d57d..4a23712205cec1467354acc986ee641a25ebbcb8)]

---------
## What changes are proposed in this pull request?
Caches the checkpoint output schema in `CheckpointWriter` and adds the
utilities needed to do so. Follow-up PRs will use the cached output
schema to perform v2 checkpoint sidecar writes.

## How was this change tested?
Existing + added
…t reuse (#2255)

# Summary

This PR refactors the snapshot FFI to use a builder pattern, while also
adding an optimization option that allows passing in an old snapshot.

## Changes

The existing flat functions are augmented by a builder-pattern API:

**Existing:**

``` c++
// snapshot at latest version
snapshot(table_path_slice, engine)

// snapshot at specific version
snapshot_at_version(table_path_slice, engine, version)

// with log tail (catalog-managed)
snapshot_with_log_tail(table_path_slice, engine, log_tail)
snapshot_at_version_with_log_tail(table_path_slice, engine, version, log_tail)
```

**Builder:**

``` c++
// fresh snapshot from table path
HandleMutableFfiSnapshotBuilder builder = get_snapshot_builder(table_path_slice, engine).ok;

// incremental update reusing an existing snapshot (avoids re-reading the log)
HandleMutableFfiSnapshotBuilder builder = get_snapshot_builder_from(old_snapshot, engine).ok;

// optional: pin a version
snapshot_builder_set_version(&builder, version);

// optional: set log tail (catalog-managed feature)
snapshot_builder_set_log_tail(&builder, log_tail);

// produce the snapshot
SharedSnapshot* snapshot = snapshot_builder_build(builder).ok;

// or discard without building
free_snapshot_builder(&builder);
```

For backwards compatibility, `snapshot()` and `snapshot_with_log_tail()`
functions are kept around for now.

Also: fix existing `test_snapshot_log_tail` test.

---------

Co-authored-by: Sam Ansmink <samansmink@hotmail.com>
Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
Co-authored-by: chiinlquah <259692855+chiinlquah@users.noreply.github.com>
…ching (#2250)

## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2250/files) to
review incremental changes.
-
[**stack/create-table-utils-pt3**](#2250)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2250/files)]
-
[stack/create-table-utils-pt4](#2254)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files/c71e28b2c5147bed800bf672f1d9920fff456882..e553d73653b343e65473fdba3833823d6cce74e1)]
-
[stack/remove-catalog-managed-flag](#2310)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files/e553d73653b343e65473fdba3833823d6cce74e1..21b747eb65a5d99133cf1d6dfabf417564ff92b9)]

---------
## What changes are proposed in this pull request?

Expands `CommitMetadata` to carry protocol/metadata state and commit
change detection:
- `protocol`/`metadata`: read snapshot state
- `new_protocol`/`new_metadata`: present when the commit changes P&M
- `domain_metadata_changes`: domain metadata actions in this commit
- Public getters: `has_protocol_change()`, `has_metadata_change()`,
`has_domain_metadata_change()`

Also adds committer/table type validations: a catalog committer can only
commit to catalog-managed tables, and a non-catalog committer can only
commit to non-catalog-managed tables.
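
A rough sketch of the shape described above, using simplified stand-in types. The field names follow the list in this description, but `Protocol`, `Metadata`, `effective_protocol`, and `validate_committer` are hypothetical, and checking the post-commit protocol is an assumption about how the validation is applied.

```rust
/// Simplified stand-ins for the protocol and metadata actions (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Protocol {
    catalog_managed: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    table_id: String,
}

/// Sketch of the expanded commit metadata.
struct CommitMetadata {
    protocol: Protocol,                   // state of the read snapshot
    metadata: Metadata,                   // state of the read snapshot
    new_protocol: Option<Protocol>,       // Some only if this commit changes it
    new_metadata: Option<Metadata>,       // Some only if this commit changes it
    domain_metadata_changes: Vec<String>, // domains touched by this commit
}

impl CommitMetadata {
    fn has_protocol_change(&self) -> bool {
        self.new_protocol.is_some()
    }
    fn has_metadata_change(&self) -> bool {
        self.new_metadata.is_some()
    }
    fn has_domain_metadata_change(&self) -> bool {
        !self.domain_metadata_changes.is_empty()
    }
    /// Protocol in effect once the commit lands (assumption of this sketch).
    fn effective_protocol(&self) -> &Protocol {
        self.new_protocol.as_ref().unwrap_or(&self.protocol)
    }
}

/// A catalog committer may only commit to catalog-managed tables, and vice versa.
fn validate_committer(is_catalog_committer: bool, commit: &CommitMetadata) -> Result<(), String> {
    match (is_catalog_committer, commit.effective_protocol().catalog_managed) {
        (true, false) => Err("catalog committer used on a non-catalog-managed table".into()),
        (false, true) => Err("non-catalog committer used on a catalog-managed table".into()),
        _ => Ok(()),
    }
}
```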

## How was this change tested?

1. `disallow_catalog_committer_for_non_catalog_managed_table` --
committing to an existing non-catalog-managed table with a catalog
committer returns an error
2. `disallow_catalog_committer_for_non_catalog_managed_create_table` --
create-table without catalogManaged with a catalog committer returns an
error
3. Updated `test_commit_metadata` and `FileSystemCommitter` tests for
new `CommitMetadata::new` signature
4. Existing tests pass
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2254/files) to
review incremental changes.
-
[**stack/create-table-utils-pt4**](#2254)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2254/files)]
-
[stack/remove-catalog-managed-flag](#2310)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files/9adf0e37b1a76b8c401e34bd4dd3abdcce8a6dd3..bbe546c9476d0f2fa5189b35818a923819599127)]

---------
## What changes are proposed in this pull request?

Adds UCCommitter validation to enforce catalog-managed table invariants
on every commit. The committer validates required protocol features
(catalogManaged, vacuumProtocolCheck, inCommitTimestamp), metadata
properties (table ID match, ICT enabled), and rejects ALTER TABLE
operations (protocol/metadata/clustering changes) and upgrade/downgrade
attempts. This PR also introduces a `require!` macro and centralized
`errors.rs` module for validation logic.
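
The PR text does not show the macro itself, so the following is only a guess at what a `require!`-style early-return macro could look like. `ValidationError` and `validate_commit` are hypothetical; the three feature names are the ones listed above.

```rust
/// Hypothetical error type standing in for the centralized `errors.rs` module.
#[derive(Debug, PartialEq)]
enum ValidationError {
    MissingFeature(&'static str),
    TableIdMismatch { expected: String, actual: String },
}

/// Returns early with the given error when the condition does not hold.
macro_rules! require {
    ($cond:expr, $err:expr) => {
        if !$cond {
            return Err($err);
        }
    };
}

/// Checks a subset of the invariants: required features and table ID match.
fn validate_commit(
    writer_features: &[&str],
    expected_id: &str,
    actual_id: &str,
) -> Result<(), ValidationError> {
    for feature in ["catalogManaged", "vacuumProtocolCheck", "inCommitTimestamp"] {
        require!(
            writer_features.contains(&feature),
            ValidationError::MissingFeature(feature)
        );
    }
    require!(
        expected_id == actual_id,
        ValidationError::TableIdMismatch {
            expected: expected_id.to_string(),
            actual: actual_id.to_string(),
        }
    );
    Ok(())
}
```

Keeping each invariant on one `require!` line makes the list of checks easy to scan, which is presumably the motivation for such a macro.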

## How was this change tested?
9 new tests added, existing tests pass.
## What changes are proposed in this pull request?

Extracts the in-memory test table setup logic from `scan.rs` into a
shared `setup_snapshot` helper in `ffi_test_utils.rs`. This removes the
duplicated `setup_test_table` function and makes the helper available to
other FFI test modules (e.g. transaction tests).

## How was this change tested?

Existing `scan_builder_tests` pass unchanged -- they now call the shared
helper instead of a local copy.
…ection (#2170)

## What changes are proposed in this pull request?
Addresses #2165

When a predicate references a column added after the checkpoint was
written, the parquet projection mask selects no leaf columns inside
stats structs like `minValues`/`maxValues`, causing the parquet reader
to omit them -- but `get_indices` still created `Nested` reorder entries
that indexed into the missing columns.

To fix this, we add a guard that creates a `Missing` entry instead of a
`Nested` entry if none of the struct's children match.

Example:
1. V0: create table with schema `[id: long, name: string]`,
`writeStatsAsStruct=true`
2. V1: write data, create checkpoint -- `stats_parsed` covers
`{numRecords, nullCount: {id, name}, minValues: {id, name}, maxValues:
{id, name}}`
3. V2: schema evolves to `[id: long, name: string, age: long]`
4. V3: write more data
5. Scan with predicate `age > 30` -- **panics**

## How was this change tested?
## What changes are proposed in this pull request?

Adds the ability to read and write file histograms using lazy CRCs. Adds
two fields to `FileStatsDelta` (Added Histogram and Removed Histogram),
which track the file histograms added and removed by each transaction.
If a CRC with a set of histogram buckets existed before this commit,
those buckets are used; otherwise the default buckets are used. The
histograms are consolidated when the CRC delta is applied, as Base
Histogram + Added Histogram - Removed Histogram. If combining the
histograms fails (for example, a bucket would become negative), the
histogram is dropped from the CRC altogether (it is an optional
field).
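
The consolidation rule can be sketched as below, assuming a simplified histogram that tracks only per-bucket file counts. `Histogram`, `combine`, and `apply_delta` are hypothetical names; returning `None` models dropping the optional histogram when buckets disagree or a count would go negative.

```rust
/// Simplified file histogram: bucket boundaries plus per-bucket file counts.
#[derive(Debug, Clone, PartialEq)]
struct Histogram {
    boundaries: Vec<u64>,
    file_counts: Vec<u64>,
}

/// Computes base + added - removed per bucket; `None` on overflow or underflow.
fn combine(base: &[u64], added: &[u64], removed: &[u64]) -> Option<Vec<u64>> {
    if base.len() != added.len() || base.len() != removed.len() {
        return None;
    }
    base.iter()
        .zip(added)
        .zip(removed)
        .map(|((b, a), r)| b.checked_add(*a)?.checked_sub(*r))
        .collect()
}

/// Applies a transaction's added/removed histograms to the base histogram.
/// Returns `None` (drop the histogram) if the buckets differ or a count goes negative.
fn apply_delta(base: &Histogram, added: &Histogram, removed: &Histogram) -> Option<Histogram> {
    if base.boundaries != added.boundaries || base.boundaries != removed.boundaries {
        return None;
    }
    let file_counts = combine(&base.file_counts, &added.file_counts, &removed.file_counts)?;
    Some(Histogram {
        boundaries: base.boundaries.clone(),
        file_counts,
    })
}
```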

## How was this change tested?

Unit tests and integration tests
## What changes are proposed in this pull request?

#2266 added `Snapshot::get_timestamp`, which reads the timestamp of a
snapshot. This PR adds an FFI function, `snapshot_timestamp`, so that
other languages can call `Snapshot::get_timestamp`.

## How was this change tested?

New unit test was added.
…constants (#1717) (#1774)

## What changes are proposed in this pull request?

- Replace separate *_INDEX + string field-name constants in
ActionReconciliationVisitor with GetterColumn { index, name } constants.
- Update call sites to use .index when selecting getters[...] and .name
when calling get_*.
- Keep the column order aligned with selected_column_names_and_types().
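
A small sketch of the idea, assuming two made-up columns. The `GetterColumn { index, name }` shape comes from the description above; the specific constants, `selected_column_names`, and `columns_are_aligned` are hypothetical and only illustrate how an index and a name can be kept together and checked against the selection order.

```rust
/// Pairs a getter's position with the column name it reads.
#[derive(Debug, Clone, Copy)]
struct GetterColumn {
    index: usize,
    name: &'static str,
}

// Hypothetical columns; the real visitor's column list is not shown in the PR text.
const ADD_PATH: GetterColumn = GetterColumn { index: 0, name: "add.path" };
const REMOVE_PATH: GetterColumn = GetterColumn { index: 1, name: "remove.path" };
const ALL_COLUMNS: [GetterColumn; 2] = [ADD_PATH, REMOVE_PATH];

/// Stand-in for `selected_column_names_and_types()`, reduced to names only.
fn selected_column_names() -> Vec<&'static str> {
    vec!["add.path", "remove.path"]
}

/// Checks that every constant's index points at its own name in the selection.
fn columns_are_aligned(columns: &[GetterColumn], selected: &[&str]) -> bool {
    columns
        .iter()
        .all(|c| selected.get(c.index) == Some(&c.name))
}
```

Because the index and name live in one constant, a call site cannot pair the index of one column with the name of another, which is the class of mistake separate constants allow.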

## How was this change tested?
- cargo test -p delta_kernel
- cargo clippy -p delta_kernel --all-features --tests --benches -- -D
warnings

Fixes #1717

---------

Co-authored-by: Nick Lanham <nicklan@users.noreply.github.com>
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2296/files) to
review incremental changes.
-
[**stack/ffi/create-table**](#2296)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2296/files)]
-
[stack/ffi/remove-files](#2297)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2297/files/fa37f1bcc100c8c9cabf2586c801760f828a7a7f..4d52a5ebfaedb7f0993fee391bd611209166082f)]

---------
## What changes are proposed in this pull request?

Adds FFI support for creating Delta tables via a builder pattern
(following the convention from #2255):

- `get_create_table_builder()` -- creates an `FfiCreateTableBuilder`
from a path, schema, and engine info
- `create_table_builder_with_table_property()` -- adds a single table
property (consumes and returns the builder handle)
- `create_table_builder_with_committer()` -- sets a custom committer
(e.g. Unity Catalog); defaults to `FileSystemCommitter` if not called
- `create_table_builder_build()` -- builds the transaction and returns a
standard `ExclusiveTransaction` handle, enabling the caller to
optionally stage initial data with `add_files()` before committing with
`commit()` (CTAS support)
- `free_create_table_builder()` -- frees the builder without building

The create-table flow reuses the existing transaction FFI functions
(`add_files`, `set_data_change`, `commit`) rather than bundling
build+commit into a single call. At the FFI boundary,
`Transaction<CreateTable>` is transmuted to `Transaction<ExistingTable>`
since they have identical layouts (only `PhantomData<S>` differs) and
runtime create-table behavior is driven by the snapshot version, not the
type parameter.

Also extracts `commit_result_to_version` as a shared helper used by both
`commit` and the create-table path, replacing the inline match in
`commit`.

### This PR affects the following public APIs

New FFI functions: `get_create_table_builder`,
`create_table_builder_with_table_property`,
`create_table_builder_with_committer`, `create_table_builder_build`,
`free_create_table_builder`. New FFI types:
`ExclusiveCreateTableBuilder`, `FfiCreateTableBuilder`.

## How was this change tested?

Unit tests covering:
- Basic table creation with schema verification (build + commit
two-step)
- Table creation with custom properties
- Table creation with a custom committer handle
- Error when creating a table that already exists
- Error on empty schema
- Builder free without commit (no leak/panic)
## What changes are proposed in this pull request?

### This PR affects the following public APIs

Removes the backwards-compatible snapshot FFI convenience functions that
were kept around
in #2255 when the builder API was introduced:

- `snapshot(path, engine)`
- `snapshot_at_version(path, engine, version)`
- `snapshot_with_log_tail(path, engine, log_tail)` (catalog-managed)
- `snapshot_at_version_with_log_tail(path, engine, version, log_tail)`
(catalog-managed)

Callers should use the builder API instead:
- `get_snapshot_builder` / `get_snapshot_builder_from`
- `snapshot_builder_set_version`
- `snapshot_builder_set_log_tail`
- `snapshot_builder_build`

All internal test call sites are updated to use the builder API. The C
example comment
referencing the old functions is also updated.

## How was this change tested?

All existing FFI tests updated and passing:
`cargo test -p delta_kernel_ffi --all-features` (51 passed)
`cargo clippy -p delta_kernel_ffi --all-features --tests -- -D warnings`
(clean)
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2310/files) to
review incremental changes.
-
[**stack/remove-catalog-managed-flag**](#2310)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2310/files)]

---------
## What changes are proposed in this pull request?

Removes the `catalog-managed` Cargo feature flag, making catalog-managed
table support always compiled in. The feature was already enabled by all
consumers (UC crate, FFI, kernel dev-deps) and runtime behavior is gated
by whether the table has `catalogManaged` in its protocol. This is a
breaking change because the feature flag no longer exists in
`kernel/Cargo.toml` or `ffi/Cargo.toml`, so any downstream `Cargo.toml`
referencing it will fail to compile.

`test_feature_signal_accepted` now includes `CatalogManaged` as a case
(merged from the standalone
`test_feature_signal_accepted_catalog_managed`)

## How was this change tested?

1. All tests previously gated by `#[cfg(feature = "catalog-managed")]`
now run unconditionally and pass
2. Full workspace `cargo nextest run --all-features` passes
## What changes are proposed in this pull request?

Adds an FFI function for removing files from a Delta table transaction:

- `remove_files` -- removes files using raw engine data and a selection
vector. The selection vector indicates which rows in the data represent
files to remove (true = selected). A null/empty selection vector selects
all rows.

This function enables C/C++ consumers to implement DELETE/MERGE
operations by scanning a table, identifying files to rewrite, and
removing the originals.
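
The selection-vector semantics can be sketched as follows. `selected_rows` is a hypothetical helper, not part of the FFI; treating rows beyond the end of a short vector as selected is an assumption of this sketch, chosen so that the empty vector naturally selects every row.

```rust
/// Returns the indices of rows that a selection vector marks for removal.
/// `None` or an empty slice selects all rows (true = selected).
fn selected_rows(num_rows: usize, selection: Option<&[bool]>) -> Vec<usize> {
    let sel = selection.unwrap_or(&[]);
    (0..num_rows)
        // Assumption: a row with no entry in the vector counts as selected.
        .filter(|&i| sel.get(i).copied().unwrap_or(true))
        .collect()
}
```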

### This PR affects the following public APIs

New FFI function: `remove_files`

## How was this change tested?

Unit tests covering:
- `remove_files` with selection vector: same end-to-end flow using raw
engine data
- Null selection vector creates empty `FilteredEngineData` (all rows
selected)
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2324/files) to
review incremental changes.
-
[**stack/refactor-helpers**](#2324)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2324/files)]
-
[stack/row-group-1](#1893)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1893/files/89b4729dc0e45dbd22dd5b80dfbcdecad9aa4610..1d1a8d46072ead0fb2abc3a0c54e9311632e84c2)]
-
[stack/row-group-2](#1931)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1931/files/1d1a8d46072ead0fb2abc3a0c54e9311632e84c2..6c11c9f3d6b6a38ace81d38acef096b3af5ba52e)]

---------
## What changes are proposed in this pull request?

Extracts stat extraction helpers from `RowGroupFilter` methods into
shared free functions, preparing for reuse by `CheckpointRowGroupFilter`.

## How was this change tested?

Existing tests pass unchanged (test impls updated to wrap return values
in `Some()`).
## What changes are proposed in this pull request?

Adds `SharedProtocol` and `SharedMetadata` as first-class opaque FFI
handle types, giving
FFI callers structured access to snapshot protocol and metadata fields.

Previously there was no way for FFI callers to access protocol versions,
table features, or
core metadata fields (id, name, description, format provider, created
time). This PR adds
the full surface via two new handle types and visitor-based accessors.

**New FFI functions:**

- `snapshot_get_protocol` / `free_protocol` -- acquire and release a
protocol handle
- `visit_protocol(handle, ctx, visit_versions, visit_feature)` --
single-call visitor that
reports `(minReaderVersion, minWriterVersion)` and iterates
reader/writer feature names
  (skipped for legacy protocols)
- `snapshot_get_metadata` / `free_metadata` -- acquire and release a
metadata handle
- `visit_metadata(handle, allocate_fn, ctx, visitor)` -- single-call
visitor that reports
id, name, description, format provider, and created time (strings
allocated via caller's
  `allocate_fn`)
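
The single-call visitor shape can be sketched in plain Rust, leaving out handles, the context pointer, and string allocation. The `Protocol` struct and the callback signatures here are hypothetical simplifications (in particular the `is_reader` flag); the sketch only shows versions being reported once and features being skipped for legacy protocols.

```rust
/// Simplified protocol; feature lists are absent for legacy protocols.
struct Protocol {
    min_reader_version: i32,
    min_writer_version: i32,
    reader_features: Option<Vec<String>>,
    writer_features: Option<Vec<String>>,
}

/// Reports the versions once, then each reader and writer feature name.
/// For legacy protocols (no feature lists) `visit_feature` is never called.
fn visit_protocol(
    protocol: &Protocol,
    mut visit_versions: impl FnMut(i32, i32),
    mut visit_feature: impl FnMut(bool, &str),
) {
    visit_versions(protocol.min_reader_version, protocol.min_writer_version);
    for name in protocol.reader_features.iter().flatten() {
        visit_feature(true, name.as_str());
    }
    for name in protocol.writer_features.iter().flatten() {
        visit_feature(false, name.as_str());
    }
}
```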

**Kernel change:**

Adds `Metadata::format_provider() -> &str` (gated behind
`#[internal_api]`) since
`Metadata.format` is a private field with no existing accessor.

## How was this change tested?

4 new `#[tokio::test]` cases in `ffi/src/lib.rs`:
- `test_visit_protocol_legacy` -- legacy protocol (minRV=1, minWV=2):
verifies versions,
  confirms feature visitor is never called
- `test_visit_protocol_with_features` -- modern protocol (minRV=3,
minWV=7) with
columnMapping + rowTracking: verifies both reader and writer feature
callbacks
- `test_visit_metadata_default` -- default metadata: id, null
name/description, format
  provider, createdTime
- `test_visit_metadata_with_name` -- metadata with name set: id, name,
createdTime
## What changes are proposed in this pull request?

### This PR affects the following public APIs

Two new FFI functions are added (not breaking -- purely additive):

- `with_domain_metadata(txn, domain, configuration, engine)` -- add a
`domainMetadata` action to a transaction
- `with_domain_metadata_removed(txn, domain, engine)` -- add a
`domainMetadata` removal tombstone to a transaction

Both follow the existing consume-and-return handle pattern used by
`with_engine_info`.

## How was this change tested?

Three new FFI tests:
- `test_domain_metadata_add_and_remove` -- happy path: adds domain
metadata, commits, verifies JSON in commit log; then removes it in a
second transaction and verifies the tombstone
- `test_domain_metadata_system_domain_rejected_at_commit` -- error path:
system `delta.*` domain is rejected at commit
- `test_domain_metadata_duplicate_domain_rejected_at_commit` -- error
path: duplicate domain in a single transaction is rejected at commit
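
The two error paths exercised by these tests can be sketched as a commit-time check. `DomainError` and `validate_domains` are hypothetical and are not the kernel's validation code; they only model rejecting system `delta.*` domains and duplicate domains within one transaction.

```rust
use std::collections::HashSet;

/// Hypothetical error type for domain metadata validation.
#[derive(Debug, PartialEq)]
enum DomainError {
    SystemDomain(String),
    Duplicate(String),
}

/// Validates the domains touched by a single transaction at commit time.
fn validate_domains(domains: &[&str]) -> Result<(), DomainError> {
    let mut seen = HashSet::new();
    for &domain in domains {
        // System-controlled domains may not be modified by users.
        if domain.starts_with("delta.") {
            return Err(DomainError::SystemDomain(domain.to_string()));
        }
        // Each domain may appear at most once per transaction.
        if !seen.insert(domain) {
            return Err(DomainError::Duplicate(domain.to_string()));
        }
    }
    Ok(())
}
```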
@lizardoluis lizardoluis marked this pull request as ready for review April 8, 2026 13:34
@lizardoluis lizardoluis closed this Apr 8, 2026