Summary
#21912 introduced reusable execution-layer schema alignment helpers and updated recursive CTE execution to avoid ad hoc schema rebinding.
However, it also rewrote an existing SLT query in order for the test to pass:
diff --git a/datafusion/sqllogictest/test_files/cte.slt b/datafusion/sqllogictest/test_files/cte.slt
index e9c1c0245..d13e0d4f0 100644
--- a/datafusion/sqllogictest/test_files/cte.slt
+++ b/datafusion/sqllogictest/test_files/cte.slt
@@ -699,7 +699,7 @@ WITH RECURSIVE region_sales AS (
SELECT
s.salesperson_id AS salesperson_id,
SUM(s.sale_amount) AS amount,
- 0 as level
+ SUM(0) as level
FROM
sales s
GROUP BY
The rewrite changes a naturally non-null expression:
into a nullable aggregate expression:
This masks a recursive CTE nullability issue rather than fixing it.
Recursive CTE handling should not require rewriting valid SQL solely to satisfy schema alignment requirements.
Problem
Recursive self-reference/work-table columns need to be treated conservatively as nullable during recursive-term planning and optimization.
Without this, optimizer rules may incorrectly rely on anchor-only non-nullability assumptions.
For example, this query should execute correctly:
WITH RECURSIVE t(a, b) AS (
SELECT 0 AS a, 0 AS b
UNION ALL
SELECT b AS a, CAST(NULL AS INT) AS b FROM t WHERE a IS NOT NULL
)
SELECT * FROM t;
If recursive self-reference columns are incorrectly planned as non-nullable, optimizer rules may incorrectly eliminate:
leading to unsound recursive behavior.
Expected Behavior
Recursive CTE handling should:
- allow valid SQL such as
0 AS level without requiring rewrites like SUM(0);
- plan recursive self-reference/work-table columns conservatively as nullable;
- preserve the declared/logical recursive CTE schema;
- derive final recursive CTE output nullability correctly from both static and recursive branches;
- avoid unnecessary schema-alignment casts and projections.
Proposed Behavior
Recursive self-reference/work-table schemas should be nullable before recursive-term planning:
let work_table_schema = nullable_schema(static_plan.schema().inner());
After recursive-term planning, recursive CTE output nullability should be derived from both branches, for example:
static_field.is_nullable() || recursive_field.is_nullable()
rather than forcing all recursive CTE output fields to become nullable.
This preserves correctness while avoiding unnecessary logical/physical plan churn and allowing existing valid SQL queries to continue working unchanged.
Summary
#21912 introduced reusable execution-layer schema alignment helpers and updated recursive CTE execution to avoid ad hoc schema rebinding.
However, it also rewrote an existing SLT query in order for the test to pass:
The rewrite changes a naturally non-null expression:
into a nullable aggregate expression:
This masks a recursive CTE nullability issue rather than fixing it.
Recursive CTE handling should not require rewriting valid SQL solely to satisfy schema alignment requirements.
Problem
Recursive self-reference/work-table columns need to be treated conservatively as nullable during recursive-term planning and optimization.
Without this, optimizer rules may incorrectly rely on anchor-only non-nullability assumptions.
For example, this query should execute correctly:
If recursive self-reference columns are incorrectly planned as non-nullable, optimizer rules may incorrectly eliminate:
a IS NOT NULLleading to unsound recursive behavior.
Expected Behavior
Recursive CTE handling should:
0 AS levelwithout requiring rewrites likeSUM(0);Proposed Behavior
Recursive self-reference/work-table schemas should be nullable before recursive-term planning:
After recursive-term planning, recursive CTE output nullability should be derived from both branches, for example:
rather than forcing all recursive CTE output fields to become nullable.
This preserves correctness while avoiding unnecessary logical/physical plan churn and allowing existing valid SQL queries to continue working unchanged.