Commit c8b784a

adriangb and claude authored
refactor(parquet-datasource): split sink and schema_coercion out of file_format.rs (#22347)
## Which issue does this PR close?

Relates to the discussion in #22024 about the Parquet datasource crate becoming hard to navigate. Split out of #22156, which bundled several code-motion moves into one PR; this is one of three smaller, independently reviewable PRs that replace it.

## Rationale for this change

`file_format.rs` had grown to ~2,000 LOC, bundling several distinct responsibilities into one file. That makes it hard to read and hard to review changes in isolation.

This PR is **pure code motion**: no behavior change and no public API change.

## What changes are included in this PR?

Extracts two responsibilities from `file_format.rs` into focused modules (`file_format.rs` drops to ~660 LOC):

- `sink.rs`: `ParquetSink` and the parallel-write machinery (`column_serializer_task`, `spawn_column_parallel_row_group_writer`, `output_single_parquet_file_parallelized`, `concatenate_parallel_row_groups`, etc.).
- `schema_coercion.rs`: the Arrow-schema coercion utilities (`apply_file_schema_type_coercions`, `coerce_int96_to_resolution`, `coerce_file_schema_to_view_type`, `coerce_file_schema_to_string_type`, `transform_schema_to_view`, `transform_binary_to_string`, `field_with_new_type`) and their tests.

Every previously public item is still reachable at the same path: the crate root re-exports `sink::ParquetSink` and the `schema_coercion::*` functions, and the historical `file_format::ParquetSink` path is preserved via `pub use` (datafusion-proto depends on it).

## Are these changes tested?

Yes, covered by existing tests (the `coerce_int96_to_resolution_*` tests moved with the function to `schema_coercion.rs`). `cargo test -p datafusion-datasource-parquet --all-features` (122 passing) and `cargo clippy -p datafusion-datasource-parquet --all-targets --all-features -- -D warnings` both pass. `datafusion-proto` (a downstream `ParquetSink` consumer) builds clean.

## Are there any user-facing changes?

No.
Public API is unchanged: every previously public item is still reachable at the same crate-root path. The only difference is the file organization inside the crate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
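The path-preservation scheme described in the commit message (move items into new modules, then re-export them with `pub use`) can be sketched in a single Rust file. This is a minimal illustration under stated assumptions, not DataFusion's code. Inline `mod` blocks stand in for the separate `sink.rs`, `schema_coercion.rs`, and `file_format.rs` files. The `ParquetSink` struct (with its `rows_written` field) and the `to_view_type_name` helper are hypothetical placeholders for the real items.

```rust
// Single-file stand-in for the crate layout after the split. In the real
// crate these modules are separate files; all item bodies here are
// hypothetical placeholders, not DataFusion's implementation.

mod sink {
    /// Hypothetical placeholder for the real `ParquetSink`.
    pub struct ParquetSink {
        pub rows_written: u64,
    }
}

mod schema_coercion {
    /// Hypothetical helper standing in for functions such as
    /// `transform_schema_to_view`; the real ones operate on Arrow schemas.
    pub fn to_view_type_name(type_name: &str) -> String {
        match type_name {
            "Utf8" => "Utf8View".to_string(),
            "Binary" => "BinaryView".to_string(),
            other => other.to_string(),
        }
    }
}

mod file_format {
    // Keep the historical `file_format::ParquetSink` path working for
    // downstream crates after the type moved to `sink`.
    pub use crate::sink::ParquetSink;
}

// Crate-root re-exports: callers keep using `crate::ParquetSink` and the
// coercion helpers without knowing which file defines them.
pub use schema_coercion::*;
pub use sink::ParquetSink;

fn main() {
    // All three paths name the same type, so a value moves between them
    // without any conversion.
    let moved: sink::ParquetSink = sink::ParquetSink { rows_written: 0 };
    let at_root: ParquetSink = moved;
    let legacy: file_format::ParquetSink = at_root;
    println!("rows_written = {}", legacy.rows_written);

    // The glob re-export makes the helper callable from the crate root.
    println!("{}", to_view_type_name("Utf8"));
}
```

Because `pub use` creates an alias rather than a new type, code written against `file_format::ParquetSink` (the path the commit message says datafusion-proto depends on) keeps compiling unchanged after the move.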
1 parent b4739e5 commit c8b784a

4 files changed

Lines changed: 1473 additions & 1392 deletions
