Add reusable plan-time schema alignment helper and apply to RecursiveQueryExec (#21912)
## Which issue does this PR close?
* Closes #21910.
---
## Rationale for this change
Physical plans in DataFusion can expose schemas that differ from their
declared output schema, particularly when combining independently
planned branches such as in recursive CTEs. This mismatch can lead to
inconsistencies observed by downstream operators or consumers that rely
on field names or schema metadata.
Previously, schema alignment for recursive queries was handled at the
`RecordBatch` level during execution, which can obscure contract
violations in the physical plan and make behavior harder to audit.
This change introduces a reusable execution-layer helper to align
schemas at plan construction time, ensuring that child plans conform to
the expected schema before execution.
---
## What changes are included in this PR?
* Introduce `project_plan_to_schema` in
  `datafusion/physical-plan/src/common.rs`:
  * Returns the input plan unchanged when schemas match.
  * Applies a `ProjectionExec` to align field names when schemas are
    positionally compatible.
  * Validates column count, data types, nullability, and metadata before
    applying projection.
  * Produces clear errors when alignment is not possible.
* Update `RecursiveQueryExec`:
  * Apply `project_plan_to_schema` to the recursive term during
    construction.
  * Remove batch-level schema rebinding logic from `RecursiveQueryStream`.
* Adjust tests and expected plans to reflect consistent field naming:
  * Updated recursive CTE tests and explain output expectations.
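
The decision logic described for `project_plan_to_schema` can be sketched
roughly as below. This is a simplified, standard-library-only illustration,
not the code in this PR: `Field`, `Schema`, `Alignment`, `field`, and
`align_to_schema` are hypothetical stand-ins for the Arrow/DataFusion types,
and the order of the checks is an assumption. Where this sketch returns a
list of renames, the helper described above wraps the plan in a
`ProjectionExec`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an Arrow field (not the real type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

// Hypothetical stand-in for an Arrow schema (not the real type).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

// Outcome of the alignment check in this sketch.
#[derive(Debug, PartialEq)]
enum Alignment {
    // Schemas already match: the input plan would be returned unchanged.
    Unchanged,
    // Positionally compatible: alias column `index` to the given name.
    Rename(Vec<(usize, String)>),
}

// Convenience constructor for a field without metadata (hypothetical).
fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
        metadata: BTreeMap::new(),
    }
}

// Decide how `input` can be aligned to `expected`, or explain why it cannot.
fn align_to_schema(input: &Schema, expected: &Schema) -> Result<Alignment, String> {
    if input == expected {
        return Ok(Alignment::Unchanged);
    }
    if input.fields.len() != expected.fields.len() {
        return Err(format!(
            "column count mismatch: expected {}, got {}",
            expected.fields.len(),
            input.fields.len()
        ));
    }
    if input.metadata != expected.metadata {
        return Err("schema metadata mismatch".to_string());
    }
    let mut renames = Vec::with_capacity(expected.fields.len());
    for (i, (inp, exp)) in input.fields.iter().zip(&expected.fields).enumerate() {
        if inp.data_type != exp.data_type {
            return Err(format!("type mismatch at column {i}"));
        }
        if inp.nullable != exp.nullable {
            return Err(format!("nullability mismatch at column {i}"));
        }
        if inp.metadata != exp.metadata {
            return Err(format!("field metadata mismatch at column {i}"));
        }
        // Only the name may differ, so the column is aliased by position.
        renames.push((i, exp.name.clone()));
    }
    Ok(Alignment::Rename(renames))
}
```

In this sketch, aligning an input with a single non-nullable `Int64` column
`id` to an expected schema whose column is named `n` yields a one-entry
rename, while a nullability difference on the same column is rejected with
an error rather than silently coerced.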
---
## Are these changes tested?
Yes. The following tests are included:
* In `common.rs`:
  * `project_plan_to_schema_returns_input_when_schema_matches`
  * `project_plan_to_schema_aliases_field_names_with_projection_exec`
  * `project_plan_to_schema_preserves_matching_metadata_while_renaming`
  * `project_plan_to_schema_errors_on_column_count_mismatch`
  * `project_plan_to_schema_errors_on_type_mismatch`
  * `project_plan_to_schema_errors_on_nullability_mismatch`
  * `project_plan_to_schema_errors_on_field_metadata_mismatch`
  * `project_plan_to_schema_errors_on_schema_metadata_mismatch`
* In `recursive_query.rs`:
  * `recursive_query_exec_projects_recursive_term_to_reconciled_schema`
  * `recursive_query_exec_rejects_nullability_mismatch`
* Updates to existing sqllogictest cases in `cte.slt` and explain plan
  expectations.
---
## Are there any user-facing changes?
No direct user-facing API changes are introduced.
However, physical plans for recursive queries now consistently expose
the declared schema at plan time, which may result in more consistent
field names in explain plans and for downstream consumers.
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.
---------
Co-authored-by: Copilot <copilot@github.com>