Skip to content

Commit cdddd76

Browse files
fix: preserve subquery structure when unparsing SubqueryAlias over Ag… (#21099)
When the SQL unparser encountered a SubqueryAlias node whose direct child was an Aggregate (or other clause-building plan like Window, Sort, Limit, Union), it would flatten the subquery into a simple table alias, losing the aggregate entirely. For example, a plan representing: SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m would unparse to: SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m dropping the MAX aggregate and the subquery. Root cause: the SubqueryAlias handler in select_to_sql_recursively would call subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only handles TableScan/SubqueryAlias/Projection). When both returned nothing useful for an Aggregate child, the code recursed directly into the Aggregate, merging its GROUP BY into the outer SELECT instead of wrapping it in a derived subquery. The fix adds an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union), emit it as a derived subquery via self.derive() with the alias always attached, rather than falling through to the recursive path that would flatten it. ## Which issue does this PR close? - Closes #21098 ## Rationale for this change The SQL unparser silently drops subquery structure when a SubqueryAlias node directly wraps an Aggregate (or Window, Sort, Limit, Union). For example, a plan representing ```sql SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ... ``` unparses to ```sql SELECT ... FROM j1 JOIN j2 AS b ... ``` losing the aggregate entirely. This produces semantically incorrect SQL. ## What changes are included in this PR? In the SubqueryAlias handler within select_to_sql_recursively (`datafusion/sql/src/unparser/plan.rs`): - Added an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union) and cannot be reduced to a table scan, emit it as a derived subquery (SELECT ...) AS alias via self.derive() instead of recursing into the child and flattening it. - Added a helper requires_derived_subquery() that identifies plan types requiring this treatment. ## Are these changes tested? Yes. A new test test_unparse_manual_join_with_subquery_aggregate is added that constructs a SubqueryAlias > Aggregate plan (without an intermediate Projection) and asserts the unparsed SQL preserves the MAX() aggregate function call. This test fails without the fix. All current unparser tests succeed without modification ## Are there any user-facing changes? No. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 603bfb4 commit cdddd76

File tree

2 files changed

+130
-1
lines changed

2 files changed

+130
-1
lines changed

datafusion/sql/src/unparser/plan.rs

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -828,6 +828,50 @@ impl Unparser<'_> {
828828
Some(plan_alias.alias.clone()),
829829
select.already_projected(),
830830
)?;
831+
832+
// If the (possibly rewritten) inner plan builds its own
833+
// SELECT clauses (e.g. Aggregate adds GROUP BY, Window adds
834+
// OVER, etc.) and unparse_table_scan_pushdown couldn't reduce it,
835+
// we must emit a derived subquery: (SELECT ...) AS alias.
836+
// Without this, the recursive handler would merge those clauses
837+
// into the outer SELECT, losing the subquery structure entirely.
838+
if unparsed_table_scan.is_none() && Self::requires_derived_subquery(plan)
839+
{
840+
// When the dialect does not support column aliases in
841+
// table aliases (e.g. SQLite), inject the aliases into
842+
// the inner projection before wrapping as a derived
843+
// subquery.
844+
if !columns.is_empty()
845+
&& !self.dialect.supports_column_alias_in_table_alias()
846+
{
847+
let Ok(rewritten_plan) =
848+
inject_column_aliases_into_subquery(plan.clone(), columns)
849+
else {
850+
return internal_err!(
851+
"Failed to transform SubqueryAlias plan"
852+
);
853+
};
854+
return self.derive(
855+
&rewritten_plan,
856+
relation,
857+
Some(self.new_table_alias(
858+
plan_alias.alias.table().to_string(),
859+
vec![],
860+
)),
861+
false,
862+
);
863+
}
864+
return self.derive(
865+
plan,
866+
relation,
867+
Some(self.new_table_alias(
868+
plan_alias.alias.table().to_string(),
869+
columns,
870+
)),
871+
false,
872+
);
873+
}
874+
831875
// if the child plan is a TableScan with pushdown operations, we don't need to
832876
// create an additional subquery for it
833877
if !select.already_projected() && unparsed_table_scan.is_none() {
@@ -1060,6 +1104,22 @@ impl Unparser<'_> {
10601104
scan.projection.is_some() || !scan.filters.is_empty() || scan.fetch.is_some()
10611105
}
10621106

1107+
/// Returns true if a plan, when used as the direct child of a SubqueryAlias,
1108+
/// must be emitted as a derived subquery `(SELECT ...) AS alias`.
1109+
///
1110+
/// Plans like Aggregate or Window build their own SELECT clauses (GROUP BY,
1111+
/// window functions).
1112+
fn requires_derived_subquery(plan: &LogicalPlan) -> bool {
1113+
matches!(
1114+
plan,
1115+
LogicalPlan::Aggregate(_)
1116+
| LogicalPlan::Window(_)
1117+
| LogicalPlan::Sort(_)
1118+
| LogicalPlan::Limit(_)
1119+
| LogicalPlan::Union(_)
1120+
)
1121+
}
1122+
10631123
/// Try to unparse a table scan with pushdown operations into a new subquery plan.
10641124
/// If the table scan is without any pushdown operations, return None.
10651125
fn unparse_table_scan_pushdown(

datafusion/sql/tests/cases/plan_to_sql.rs

Lines changed: 70 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ use datafusion_common::{
2323
};
2424
use datafusion_expr::expr::{WindowFunction, WindowFunctionParams};
2525
use datafusion_expr::test::function_stub::{
26-
count_udaf, max_udaf, min_udaf, sum, sum_udaf,
26+
count_udaf, max, max_udaf, min_udaf, sum, sum_udaf,
2727
};
2828
use datafusion_expr::{
2929
EmptyRelation, Expr, Extension, LogicalPlan, LogicalPlanBuilder, Union,
@@ -2904,3 +2904,72 @@ fn test_json_access_3() {
29042904
@r#"SELECT (j1.j1_string : 'field.inner1[''inner2'']') FROM j1"#
29052905
);
29062906
}
2907+
2908+
/// Roundtrip test for a subquery aggregate with column aliases.
2909+
/// Ensures that `subquery_alias_inner_query_and_columns` unwrapping
2910+
/// a Projection -> Aggregate still triggers the derived-subquery path.
2911+
#[test]
2912+
fn roundtrip_subquery_aggregate_with_column_alias() -> Result<(), DataFusionError> {
2913+
roundtrip_statement_with_dialect_helper!(
2914+
sql: "SELECT id FROM (SELECT max(j1_id) FROM j1) AS c(id)",
2915+
parser_dialect: GenericDialect {},
2916+
unparser_dialect: UnparserDefaultDialect {},
2917+
expected: @"SELECT c.id FROM (SELECT max(j1.j1_id) FROM j1) AS c (id)",
2918+
);
2919+
Ok(())
2920+
}
2921+
2922+
/// Test that unparsing a manually constructed join with a subquery aggregate
2923+
/// preserves the MAX aggregate function.
2924+
///
2925+
/// Builds the equivalent of:
2926+
/// SELECT j1.j1_string FROM j1
2927+
/// JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b
2928+
/// ON j1.j1_id = b.max_id
2929+
#[test]
2930+
fn test_unparse_manual_join_with_subquery_aggregate() -> Result<()> {
2931+
let context = MockContextProvider {
2932+
state: MockSessionState::default(),
2933+
};
2934+
let j1_schema = context
2935+
.get_table_source(TableReference::bare("j1"))?
2936+
.schema();
2937+
let j2_schema = context
2938+
.get_table_source(TableReference::bare("j2"))?
2939+
.schema();
2940+
2941+
// Build the right side: SELECT max(j2_id) AS max_id FROM j2
2942+
let right_scan = table_scan(Some("j2"), &j2_schema, None)?.build()?;
2943+
let right_agg = LogicalPlanBuilder::from(right_scan)
2944+
.aggregate(
2945+
vec![] as Vec<Expr>,
2946+
vec![max(col("j2.j2_id")).alias("max_id")],
2947+
)?
2948+
.build()?;
2949+
let right_subquery = subquery_alias(right_agg, "b")?;
2950+
2951+
// Build the full plan: SELECT j1.j1_string FROM j1 JOIN (...) AS b ON j1.j1_id = b.max_id
2952+
let left_scan = table_scan(Some("j1"), &j1_schema, None)?.build()?;
2953+
let plan = LogicalPlanBuilder::from(left_scan)
2954+
.join(
2955+
right_subquery,
2956+
datafusion_expr::JoinType::Inner,
2957+
(
2958+
vec![Column::from_qualified_name("j1.j1_id")],
2959+
vec![Column::from_qualified_name("b.max_id")],
2960+
),
2961+
None,
2962+
)?
2963+
.project(vec![col("j1.j1_string")])?
2964+
.build()?;
2965+
2966+
let unparser = Unparser::default();
2967+
let sql = unparser.plan_to_sql(&plan)?.to_string();
2968+
let sql_upper = sql.to_uppercase();
2969+
assert!(
2970+
sql_upper.contains("MAX("),
2971+
"Unparsed SQL should preserve the MAX aggregate function call, got: {sql}"
2972+
);
2973+
2974+
Ok(())
2975+
}

0 commit comments

Comments
 (0)