Record sort order when writing Parquet with WITH ORDER (#19595)
## Which issue does this PR close?
Part of #19433
## Rationale for this change
When writing data to a table created with `CREATE EXTERNAL TABLE ...
WITH ORDER`, the sorting columns should be recorded in the Parquet
file's row group metadata. This allows downstream readers to know the
data is sorted and potentially skip sorting operations.
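As a rough illustration of how a reader might use this metadata, the sketch below checks whether a required ordering is a prefix of the recorded sorting columns. The `SortingColumn` struct here is a local stand-in that mirrors the fields of Parquet's `SortingColumn` (`column_idx`, `descending`, `nulls_first`), and `satisfies_required_order` is a hypothetical helper, not something this PR adds. It assumes the reader trusts the metadata and has separately established that the ordering holds across all row groups and files it scans.

```rust
/// Local stand-in mirroring the fields of Parquet's `SortingColumn`.
/// (Assumption: defined here only so the sketch is self-contained.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical helper: returns true when `required` is a leading prefix of
/// `recorded`, i.e. data sorted by `recorded` is also sorted by `required`.
/// An empty requirement is trivially satisfied.
pub fn satisfies_required_order(
    recorded: &[SortingColumn],
    required: &[SortingColumn],
) -> bool {
    recorded.starts_with(required)
}
```

A row group recorded as sorted by `(a ASC NULLS FIRST, b DESC NULLS LAST)` would satisfy a requirement on `a ASC NULLS FIRST` alone, but not one on `b` alone or on `a DESC`.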
## What changes are included in this PR?
- Add `sort_expr_to_sorting_column()` and
`lex_ordering_to_sorting_columns()` functions in `metadata.rs` to
convert DataFusion ordering to Parquet `SortingColumn`
- Add `sorting_columns` field to `ParquetSink` with
`with_sorting_columns()` builder method
- Update `create_writer_physical_plan()` to pass order requirements to
`ParquetSink`
- Update `create_writer_props()` to set sorting columns on
`WriterProperties`
- Add test verifying `sorting_columns` metadata is written correctly
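The conversion step in the first bullet can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in `metadata.rs`: `SortKey` is a hypothetical stand-in for a DataFusion sort expression, `SortingColumn` is a local struct mirroring the fields of Parquet's `SortingColumn`, and the schema is reduced to a slice of column names. It assumes every sort key is a plain column reference that can be resolved by name.

```rust
use std::convert::TryFrom;

/// Local stand-in mirroring the fields of Parquet's `SortingColumn`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical, simplified sort key: a column name plus sort options.
#[derive(Debug, Clone)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Convert one sort key into a `SortingColumn` by resolving the column name
/// to its position in the schema. Fails if the column is not in the schema.
pub fn sort_key_to_sorting_column(
    key: &SortKey,
    schema: &[&str],
) -> Result<SortingColumn, String> {
    let idx = schema
        .iter()
        .position(|name| *name == key.column.as_str())
        .ok_or_else(|| format!("column '{}' not found in schema", key.column))?;
    let column_idx = i32::try_from(idx)
        .map_err(|_| format!("column index {} does not fit in i32", idx))?;
    Ok(SortingColumn {
        column_idx,
        descending: key.descending,
        nulls_first: key.nulls_first,
    })
}

/// Convert a full ordering, preserving key order. The first error aborts
/// the conversion so that no partial ordering is ever recorded.
pub fn ordering_to_sorting_columns(
    ordering: &[SortKey],
    schema: &[&str],
) -> Result<Vec<SortingColumn>, String> {
    ordering
        .iter()
        .map(|key| sort_key_to_sorting_column(key, schema))
        .collect()
}
```

Failing on the first unresolvable key, instead of skipping it, is a deliberate choice in this sketch: dropping a key from the middle of an ordering would record a sort order the data does not actually have.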
## Are these changes tested?
Yes, added `test_create_table_with_order_writes_sorting_columns` that:
1. Creates an external table with `WITH ORDER (a ASC NULLS FIRST, b DESC
NULLS LAST)`
2. Inserts data
3. Reads the Parquet file and verifies the `sorting_columns` metadata
matches the expected order
## Are there any user-facing changes?
No user-facing API changes. Parquet files written via `INSERT INTO` or
`COPY` to tables declared `WITH ORDER` now record `sorting_columns`
in their row group metadata.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>