Commit 6412c3a
Add end-to-end Parquet tests for List and LargeList struct schema evolution (#20840)
## Which issue does this PR close?
* Part of #20835
## Rationale for this change
While the core fixes for nested struct schema evolution have landed in
#20907, existing coverage is primarily at the unit/helper level. This PR
adds end-to-end Parquet-based integration tests to validate that
`List<Struct>` and `LargeList<Struct>` schema evolution behaves correctly
through the full execution pipeline (planning, scanning, and
projection).
This ensures that real-world query paths such as `SELECT *` and nested
field projection behave consistently, and that the previously reported
repro cases no longer fail.
## What changes are included in this PR?
### 1. End-to-end Rust integration tests
Added comprehensive tests in:
* `datafusion/core/tests/parquet/expr_adapter.rs`
These tests:
* Generate old/new Parquet files with differing nested struct schemas
* Cover both `List<Struct<...>>` and `LargeList<Struct<...>>`
* Validate:
  * `SELECT *` correctness
  * Nested field projection via `get_field(...)`
  * NULL backfilling for missing nullable fields
  * Ignoring extra source-only fields
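The success-path rules listed above can be sketched without Arrow as a projection of each list element's struct onto the table's field order. Two rules are covered: NULL backfilling for missing nullable fields, and ignoring source-only fields. This is a minimal, std-only sketch under stated assumptions. `Value`, `StructRow`, `adapt_struct`, and `adapt_list` are hypothetical names. Struct rows are modeled as hash maps rather than Arrow arrays, and field types and nullability checks are left out. It is not DataFusion's adapter implementation.

```rust
use std::collections::HashMap;

/// Hypothetical scalar type standing in for Arrow array values in this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// One struct element of a `List<Struct>` value: field name -> value,
/// where `None` models SQL NULL. (Sketch only; not an Arrow type.)
type StructRow = HashMap<String, Option<Value>>;

/// Project one struct element from an older file onto the table's field order.
/// Fields missing from the source become NULL (this sketch assumes they are
/// nullable); source-only fields are dropped because only target names are looked up.
fn adapt_struct(source: &StructRow, target_fields: &[&str]) -> Vec<Option<Value>> {
    target_fields
        .iter()
        .map(|name| source.get(*name).cloned().flatten())
        .collect()
}

/// A `List<Struct>` value is a sequence of struct elements; adapt each one.
fn adapt_list(source: &[StructRow], target_fields: &[&str]) -> Vec<Vec<Option<Value>>> {
    source
        .iter()
        .map(|row| adapt_struct(row, target_fields))
        .collect()
}
```

Consider an old element with `id`, `body`, and a source-only `legacy_flag`, read against a table that expects `id`, `body`, `priority`. It adapts to `[Some(Int(1)), Some(Str("hi")), None]`: `priority` is backfilled with NULL and `legacy_flag` is dropped.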
### 2. Error-path coverage
Added failure tests for both `List` and `LargeList`:
* Non-nullable missing field → error
* Incompatible nested field type → error
This ensures parity between the two list encodings and guards against
regressions that affect only one of them.
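The error-path rules can likewise be sketched as a schema-level check that runs once per list encoding. This is a hypothetical sketch. `FieldType`, `Field`, `ListKind`, `AdaptError`, and `check_list_struct` are illustrative names, not DataFusion APIs. For simplicity, any type difference counts as incompatible, whereas a real adapter may allow some casts.

```rust
/// Hypothetical, simplified type model (not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int64,
    Utf8,
}

/// Hypothetical struct field description used only in this sketch.
struct Field {
    name: &'static str,
    ty: FieldType,
    nullable: bool,
}

/// The two list encodings under test; they differ only in offset width.
#[derive(Debug, Clone, Copy)]
enum ListKind {
    List,      // i32 offsets
    LargeList, // i64 offsets
}

#[derive(Debug, PartialEq)]
enum AdaptError {
    MissingNonNullable(&'static str),
    IncompatibleType { field: &'static str, source: FieldType, target: FieldType },
}

/// Check whether a file's list-element struct can be read as the table's struct.
/// The list kind is deliberately ignored: parity means both encodings must
/// accept and reject exactly the same schemas.
fn check_list_struct(_kind: ListKind, source: &[Field], target: &[Field]) -> Result<(), AdaptError> {
    for t in target {
        match source.iter().find(|s| s.name == t.name) {
            None if t.nullable => {} // would be backfilled with NULL
            None => return Err(AdaptError::MissingNonNullable(t.name)),
            Some(s) if s.ty != t.ty => {
                // Simplification: any type mismatch is treated as an error.
                return Err(AdaptError::IncompatibleType {
                    field: t.name,
                    source: s.ty.clone(),
                    target: t.ty.clone(),
                });
            }
            Some(_) => {}
        }
    }
    // Source-only fields are never consulted, so they are ignored.
    Ok(())
}
```

The list kind never influences the decision. A test that loops over both kinds with the same inputs therefore fails as soon as one encoding diverges, which is the parity property the error-path tests target.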
### 3. Test utilities and refactoring
Introduced reusable helpers to simplify nested test setup:
* `NestedListKind` abstraction for List vs LargeList
* `NestedMessageRow` test fixture struct
* Batch builders and schema helpers
* Macro `test_struct_schema_evolution_pair!` to generate paired tests
These reduce duplication and make it easier to extend the test matrix.
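The pairing idea behind these helpers can be sketched as a parameterized test body plus a `macro_rules!` macro that expands into one test per list encoding. This is an illustrative sketch only. The variants and methods of `NestedListKind` shown here are assumptions. `MessageRow`, `run_schema_evolution_case`, and `pair_tests!` are hypothetical stand-ins for `NestedMessageRow` and `test_struct_schema_evolution_pair!`, whose real fields and signatures may differ.

```rust
/// Hypothetical stand-in for the PR's `NestedListKind` (variants assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NestedListKind {
    List,
    LargeList,
}

impl NestedListKind {
    /// Display name of the list encoding.
    fn type_name(self) -> &'static str {
        match self {
            NestedListKind::List => "List",
            NestedListKind::LargeList => "LargeList",
        }
    }

    /// Offset width in bytes: `List` uses i32 offsets, `LargeList` uses i64.
    fn offset_width(self) -> usize {
        match self {
            NestedListKind::List => 4,
            NestedListKind::LargeList => 8,
        }
    }
}

/// Hypothetical fixture row (stand-in for `NestedMessageRow`).
struct MessageRow {
    id: i64,
    tags: Vec<&'static str>,
}

/// Shared test body parameterized by list kind. In the real tests this step
/// would write old/new Parquet files and scan them; here it only summarizes.
fn run_schema_evolution_case(kind: NestedListKind) -> String {
    let rows = [MessageRow { id: 1, tags: vec!["a", "b"] }];
    let total_tags: usize = rows.iter().map(|r| r.tags.len()).sum();
    assert_eq!(total_tags, 2);
    format!("{}<Struct>: first id={}, tags={}", kind.type_name(), rows[0].id, total_tags)
}

/// Hypothetical pairing macro: one invocation expands to two `#[test]`
/// functions, one per list encoding, that share the same body.
macro_rules! pair_tests {
    ($list_test:ident, $large_list_test:ident, $body:ident) => {
        #[test]
        fn $list_test() {
            $body(NestedListKind::List);
        }

        #[test]
        fn $large_list_test() {
            $body(NestedListKind::LargeList);
        }
    };
}

pair_tests!(
    list_struct_evolution_sketch,
    large_list_struct_evolution_sketch,
    run_schema_evolution_case
);
```

With this shape, a new scenario costs one shared body function and one macro invocation, and neither encoding can be silently left out.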
### 4. End-user API coverage via `.slt`
Added:
* `datafusion/sqllogictest/test_files/schema_evolution_nested.slt`
This validates behavior through SQL-only workflows:
* Uses `COPY ... TO PARQUET` to generate test files
* Uses `CREATE EXTERNAL TABLE` to query them
Covers:
* Mixed-schema reads
* Nested projection queries
* Both `List` and `LargeList`
---
## Are these changes tested?
Yes.
This PR adds both:
1. **Rust integration tests**
   * End-to-end Parquet scan behavior
   * Success and failure scenarios
   * Coverage of both `List` and `LargeList`
2. **sqllogictest (`.slt`) tests**
   * Validate behavior through the end-user SQL interface
   * Use generated Parquet fixtures (no checked-in binaries)
All tests pass locally, including:
* `test_list_struct_schema_evolution_end_to_end`
* `test_large_list_struct_schema_evolution_end_to_end`
* Error-path variants for both list encodings
## Are there any user-facing changes?
No direct user-facing changes.
This PR improves correctness guarantees and test coverage for nested
schema evolution, ensuring more predictable behavior for users working
with evolving Parquet schemas.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.

Parent commit: 9c1e7ab
2 files changed (+646, -2 lines):
* `datafusion/core/tests/parquet`
* `datafusion/sqllogictest/test_files`