Commit 6412c3a

Add end-to-end Parquet tests for List and LargeList struct schema evolution (#20840)
## Which issue does this PR close?

* Part of #20835

## Rationale for this change

While the core fixes for nested struct schema evolution have landed in #20907, existing coverage is primarily at the unit/helper level. This PR adds end-to-end Parquet-based integration tests to validate that `List<Struct>` and `LargeList<Struct>` schema evolution behaves correctly through the full execution pipeline (planning, scanning, and projection).

This ensures that real-world query paths such as `SELECT *` and nested field projection behave consistently and that previous repro cases are no longer failing.

## What changes are included in this PR?

### 1. End-to-end Rust integration tests

Added comprehensive tests in:

* `datafusion/core/tests/parquet/expr_adapter.rs`

These tests:

* Generate old/new Parquet files with differing nested struct schemas
* Cover both `List<Struct<...>>` and `LargeList<Struct<...>>`
* Validate:
  * `SELECT *` correctness
  * Nested field projection via `get_field(...)`
  * NULL backfilling for missing nullable fields
  * Ignoring extra source-only fields

### 2. Error-path coverage

Added failure tests for both `List` and `LargeList`:

* Non-nullable missing field → error
* Incompatible nested field type → error

Ensures parity across both list encodings and prevents partial regressions.

### 3. Test utilities and refactoring

Introduced reusable helpers to simplify nested test setup:

* `NestedListKind` abstraction for List vs LargeList
* `NestedMessageRow` test fixture struct
* Batch builders and schema helpers
* Macro `test_struct_schema_evolution_pair!` to generate paired tests

These reduce duplication and make it easier to extend the test matrix.

### 4. End-user API coverage via `.slt`

Added:

* `datafusion/sqllogictest/test_files/schema_evolution_nested.slt`

This validates behavior through SQL-only workflows:

* Uses `COPY ... TO PARQUET` to generate test files
* Uses `CREATE EXTERNAL TABLE` to query them

Covers:

* Mixed-schema reads
* Nested projection queries
* Both `List` and `LargeList`

---

## Are these changes tested?

Yes. This PR adds both:

1. **Rust integration tests**
   * End-to-end Parquet scan behavior
   * Success and failure scenarios
   * Covers both `List` and `LargeList`
2. **sqllogictest (`.slt`) tests**
   * Validates behavior through end-user SQL interface
   * Uses generated Parquet fixtures (no checked-in binaries)

All tests pass locally, including:

* `test_list_struct_schema_evolution_end_to_end`
* `test_large_list_struct_schema_evolution_end_to_end`
* Error-path variants for both list encodings

## Are there any user-facing changes?

No direct user-facing changes. This PR improves correctness guarantees and test coverage for nested schema evolution, ensuring more predictable behavior for users working with evolving Parquet schemas.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
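
## Illustrative sketch of the validated behavior

The four behaviors the tests above check (NULL backfilling for missing nullable fields, ignoring extra source-only fields, and errors for a missing non-nullable field or an incompatible field type) can be pictured with a small standalone model. This is only a sketch under stated assumptions: it uses the Rust standard library alone, and every name in it (`ToyType`, `ToyValue`, `ToyField`, `ToyStruct`, `adapt_struct`, `adapt_list`) is hypothetical. It is not DataFusion's adapter code and does not use Arrow or Parquet types.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a nested field type (not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
}

/// Hypothetical stand-in for a single value stored in a struct field.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Int64(i64),
    Utf8(String),
    Null,
}

impl ToyValue {
    /// True when the value can be stored in a field of type `ty`.
    /// Nullability is checked separately in `adapt_struct`.
    fn matches(&self, ty: &ToyType) -> bool {
        matches!(
            (self, ty),
            (ToyValue::Int64(_), ToyType::Int64)
                | (ToyValue::Utf8(_), ToyType::Utf8)
                | (ToyValue::Null, _)
        )
    }
}

/// One field of the target (table) struct schema.
struct ToyField {
    name: &'static str,
    ty: ToyType,
    nullable: bool,
}

/// A source struct as read from one file: field name -> value.
type ToyStruct = BTreeMap<&'static str, ToyValue>;

/// Adapt one source struct to the target schema, in target field order.
fn adapt_struct(source: &ToyStruct, target: &[ToyField]) -> Result<Vec<ToyValue>, String> {
    target
        .iter()
        .map(|field| match source.get(field.name) {
            // A null value cannot be placed in a non-nullable target field.
            Some(ToyValue::Null) if !field.nullable => {
                Err(format!("null in non-nullable field '{}'", field.name))
            }
            // Field present with a compatible type: pass it through.
            Some(v) if v.matches(&field.ty) => Ok(v.clone()),
            // Field present but with an incompatible type: error.
            Some(v) => Err(format!(
                "incompatible type for field '{}': {:?}",
                field.name, v
            )),
            // Field missing in the source but nullable in the target: backfill NULL.
            None if field.nullable => Ok(ToyValue::Null),
            // Field missing and non-nullable in the target: error.
            None => Err(format!("missing non-nullable field '{}'", field.name)),
        })
        .collect()
    // Source-only fields are never looked up, so they are ignored.
}

/// Adapt every struct element of a list; the first failing element fails the list.
fn adapt_list(source: &[ToyStruct], target: &[ToyField]) -> Result<Vec<Vec<ToyValue>>, String> {
    source.iter().map(|s| adapt_struct(s, target)).collect()
}
```

In this model, an "old" struct that lacks a nullable target field comes back with `ToyValue::Null` in that position, a "new" struct with an additional source-only field has that field dropped, and the two error arms correspond to the failure tests listed under "Error-path coverage". The model makes no distinction between `List` and `LargeList`, which is a simplification of this sketch rather than a statement about the real implementation.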
1 parent 9c1e7ab commit 6412c3a

2 files changed: +646 -2 lines changed