Commit f6c0a41
feat: write partitionValues_parsed in checkpoints (delta-io#1932)
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1932/files) to review incremental changes.
- [**stack/write-parsed-partition-2**](delta-io#1932) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1932/files)]

---------
## What changes are proposed in this pull request?
When `delta.checkpoint.writeStatsAsStruct=true` and the table is
partitioned, checkpoint writes now populate the `partitionValues_parsed`
field on Add actions.
The transform uses a COALESCE expression:
```
partitionValues_parsed = COALESCE(
    partitionValues_parsed,
    MAP_TO_STRUCT(partitionValues)
)
```
- For rows from **commits** (no existing `partitionValues_parsed`):
`MAP_TO_STRUCT` converts the string-valued map into a native typed
struct, using the output schema to determine field names and target
types.
- For rows from **old checkpoints**: preserves the existing
`partitionValues_parsed` via COALESCE.
- For **non-partitioned tables**: `partitionValues_parsed` is not added
to the schema at all.
- When `writeStatsAsStruct=false`: `partitionValues_parsed` is dropped.

Also renames `stats_transform.rs` → `checkpoint_transform.rs` since it
now handles both stats and partition value transforms.
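
To illustrate the COALESCE semantics described in the bullets above, here is a minimal std-only sketch. It is not the kernel's implementation: `FieldType`, `Value`, `ParsedStruct`, `map_to_struct`, and `coalesce_parsed` are hypothetical names invented for this example, and mapping unparseable or missing values to null is an assumption rather than documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical target types for partition columns (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Integer,
    String,
}

/// Hypothetical typed partition value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    String(String),
    Null,
}

/// A "struct" modeled as an ordered list of (field name, typed value).
type ParsedStruct = Vec<(String, Value)>;

/// Sketch of MAP_TO_STRUCT: the output schema supplies the field names and
/// target types; the string-valued map supplies the raw values.
/// Assumption: a missing key or a failed parse becomes Null.
fn map_to_struct(
    partition_values: &HashMap<String, String>,
    schema: &[(&str, FieldType)],
) -> ParsedStruct {
    schema
        .iter()
        .map(|(name, ty)| {
            let value = match (partition_values.get(*name), ty) {
                (Some(raw), FieldType::Integer) => raw
                    .parse::<i64>()
                    .map(Value::Integer)
                    .unwrap_or(Value::Null),
                (Some(raw), FieldType::String) => Value::String(raw.clone()),
                (None, _) => Value::Null,
            };
            (name.to_string(), value)
        })
        .collect()
}

/// Sketch of COALESCE(partitionValues_parsed, MAP_TO_STRUCT(partitionValues)):
/// keep an existing parsed struct (row from an old checkpoint), otherwise
/// derive one from the string map (row from a commit).
fn coalesce_parsed(
    existing: Option<ParsedStruct>,
    partition_values: &HashMap<String, String>,
    schema: &[(&str, FieldType)],
) -> ParsedStruct {
    existing.unwrap_or_else(|| map_to_struct(partition_values, schema))
}
```

In this model, calling `coalesce_parsed(None, ...)` corresponds to a row read from a commit, while `coalesce_parsed(Some(parsed), ...)` corresponds to a row read from an old checkpoint, where the existing value wins and the map is never parsed.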
## How was this change tested?
- Unit tests cover all combinations of `writeStatsAsStruct` ×
partitioned/non-partitioned for both the transform expression and output
schema.
- Integration tests cover all 16 combinations of `(json1, struct1) →
(json2, struct2)` config changes for both non-partitioned and
partitioned tables. Each test writes real parquet data through the
transaction API, creates a checkpoint, changes the stats config, creates
a second checkpoint (exercising COALESCE across checkpoint + commit
sources), and reads all data back to verify correctness.

1 parent e4d92f7 commit f6c0a41
6 files changed
Lines changed: 1000 additions & 60 deletions
File tree
- kernel
- src
- checkpoint
- engine/arrow_expression
- tests