Skip to content

Commit d4e629f

Browse files
authored
fix: Preserve quoted mixed-case identifiers in the pivot_unpivot example (#21432)
## Summary This PR fixes a bug in the `datafusion-examples/examples/relation_planner/pivot_unpivot.rs` example implementation. The example planner rewrites `PIVOT` into a `GROUP BY` plus `CASE` expressions. During that rewrite, it rebuilt column references using unquoted `col("...")` expressions. That loses the original identifier quoting and case-sensitivity, which breaks queries that pivot on quoted mixed-case columns such as `"pointNumber"`. This PR preserves the original parsed column expression instead of reconstructing it from a plain string. ## Problem Consider a query like: ```sql SELECT * FROM point_stats PIVOT ( MAX(max_value) FOR "pointNumber" IN ('16951' AS p16951, '16952' AS p16952) ) ORDER BY ts ``` Before this change, the example planner: 1. Parsed `"pointNumber"` correctly as a quoted, case-sensitive identifier. 2. Extracted its name as `pointNumber`. 3. Reconstructed new expressions with `col(&pivot_col_name)` and `col(...)` for `GROUP BY`. That reconstruction treated the identifier as an unquoted column reference, which could be normalized differently from the original schema field. In practice, this means the planner could end up looking for `pointnumber` while the schema still contained `"pointNumber"`. ## Root Cause Two places in the example rewrite logic rebuilt column references from bare strings: 1. The generated `CASE` expression for the pivot column 2. The inferred `GROUP BY` expressions That is fine for simple lowercase identifiers, but it is not correct for quoted identifiers or qualified fields because the reconstructed expression no longer carries the original identifier semantics. ## Fix The fix is minimal: 1. Reuse the already planned `pivot_col` expression when building the `CASE` expression. 2. Build `GROUP BY` expressions directly from the input schema via `Expr::from(...)` rather than re-creating them with `col(field_name)`. This preserves: - quoted mixed-case identifiers - qualifiers - original field resolution semantics ## Code Changes File changed: - `datafusion-examples/examples/relation_planner/pivot_unpivot.rs` Main changes: - Replace: ```rust case(col(&pivot_col_name)) ``` with: ```rust case(pivot_col.clone()) ``` - Replace string-based `GROUP BY` reconstruction: ```rust schema .fields() .iter() .map(|f| f.name().as_str()) ... .map(col) ``` with schema-derived expressions: ```rust schema .iter() .filter(...) .map(Expr::from) ``` ## Example Coverage This PR also adds an additional example scenario to the same file: - a `point_stats` input table - a `PIVOT` query using quoted mixed-case column `"pointNumber"` - a snapshot asserting the expected output This makes the bug and the fix visible directly in the example itself.
1 parent 374806c commit d4e629f

File tree

1 file changed

+53
-5
lines changed

1 file changed

+53
-5
lines changed

datafusion-examples/examples/relation_planner/pivot_unpivot.rs

Lines changed: 53 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -217,6 +217,25 @@ async fn run_examples(ctx: &SessionContext) -> Result<()> {
217217
+--------+
218218
");
219219

220+
// Example 7: PIVOT on a quoted mixed-case column
221+
// Reuses the parsed column expression so quoted identifiers keep their case.
222+
let results = run_example(
223+
ctx,
224+
"Example 7: PIVOT with quoted mixed-case column",
225+
r#"SELECT * FROM point_stats
226+
PIVOT (MAX(max_value) FOR "pointNumber" IN ('16951' AS p16951, '16952' AS p16952)) AS p
227+
ORDER BY ts"#,
228+
)
229+
.await?;
230+
assert_snapshot!(results, @r"
231+
+----------------------+------+--------+--------+
232+
| ts | port | p16951 | p16952 |
233+
+----------------------+------+--------+--------+
234+
| 2024-09-01T10:00:00Z | 2411 | 10 | 20 |
235+
| 2024-09-01T10:01:00Z | 2411 | 30 | 40 |
236+
+----------------------+------+--------+--------+
237+
");
238+
220239
Ok(())
221240
}
222241

@@ -288,6 +307,34 @@ fn register_sample_data(ctx: &SessionContext) -> Result<()> {
288307
])?,
289308
)?;
290309

310+
// point_stats: grouped data with a quoted mixed-case pivot column.
311+
ctx.register_batch(
312+
"point_stats",
313+
RecordBatch::try_from_iter(vec![
314+
(
315+
"ts",
316+
Arc::new(StringArray::from(vec![
317+
"2024-09-01T10:00:00Z",
318+
"2024-09-01T10:00:00Z",
319+
"2024-09-01T10:01:00Z",
320+
"2024-09-01T10:01:00Z",
321+
])) as ArrayRef,
322+
),
323+
(
324+
"pointNumber",
325+
Arc::new(StringArray::from(vec!["16951", "16952", "16951", "16952"])),
326+
),
327+
(
328+
"port",
329+
Arc::new(StringArray::from(vec!["2411", "2411", "2411", "2411"])),
330+
),
331+
(
332+
"max_value",
333+
Arc::new(Int64Array::from(vec![10, 20, 30, 40])),
334+
),
335+
])?,
336+
)?;
337+
291338
Ok(())
292339
}
293340

@@ -415,11 +462,12 @@ fn plan_pivot(
415462
.collect();
416463

417464
let group_by_cols: Vec<Expr> = schema
418-
.fields()
419465
.iter()
420-
.map(|f| f.name().as_str())
421-
.filter(|name| *name != pivot_col_name.as_str() && !agg_input_cols.contains(name))
422-
.map(col)
466+
.filter(|(_, field)| {
467+
let name = field.name();
468+
name != pivot_col_name.as_str() && !agg_input_cols.contains(&name.as_str())
469+
})
470+
.map(Expr::from)
423471
.collect();
424472

425473
// Build CASE expressions for each (aggregate, pivot_value) pair
@@ -434,7 +482,7 @@ fn plan_pivot(
434482

435483
for (value_alias, pivot_value) in &pivot_values {
436484
// CASE pivot_col WHEN pivot_value THEN agg_input END
437-
let case_expr = case(col(&pivot_col_name))
485+
let case_expr = case(pivot_col.clone())
438486
.when(pivot_value.clone(), agg_input.clone())
439487
.end()?;
440488

0 commit comments

Comments
 (0)