Commit aadae6b
authored
## Which issue does this PR close?
- Closes #6543.
## Rationale for this change
We're building a SQL engine on top of DataFusion and hit this while
running TPC-DS benchmarks — Q39 fails during planning with:
```
Projections require unique expression names but the expression
"CAST(inv1.cov AS Decimal128(30, 10))" at position 4 and "inv1.cov"
at position 10 have the same name.
```
The underlying issue is that `CAST` is transparent to `schema_name()`,
so both expressions resolve to `inv1.cov`. But this also affects simpler
cases like `SELECT 1, 1` or `SELECT x, x FROM t` — all of which
PostgreSQL, Trino, and SQLite handle without errors.
Looking at the issue discussion, @alamb suggested adding auto-aliases in
the SQL planner:
> "I think that is actually a pretty neat idea -- specifically add the
aliases in the SQL planner. I would be happy to review such a PR"
That's what this PR does.
### TPC-DS Q39 reproduction
The query joins two CTEs that both produce columns named `cov`, `mean`,
etc. When the planner applies implicit casts during type coercion, the
cast-wrapped and original expressions end up with the same schema name:
```sql
WITH inv1 AS (
SELECT w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy,
stdev, mean, (CASE mean WHEN 0 THEN NULL ELSE stdev/mean END) AS cov
FROM (SELECT w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy,
stddev_samp(inv_quantity_on_hand) AS stdev,
avg(inv_quantity_on_hand) AS mean
FROM inventory, item, warehouse, date_dim
WHERE inv_item_sk = i_item_sk AND inv_warehouse_sk = w_warehouse_sk
AND inv_date_sk = d_date_sk AND d_year = 2001
GROUP BY w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy) foo
WHERE CASE mean WHEN 0 THEN 0 ELSE stdev/mean END > 1
),
inv2 AS (
SELECT w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy,
stdev, mean, (CASE mean WHEN 0 THEN NULL ELSE stdev/mean END) AS cov
FROM (SELECT w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy,
stddev_samp(inv_quantity_on_hand) AS stdev,
avg(inv_quantity_on_hand) AS mean
FROM inventory, item, warehouse, date_dim
WHERE inv_item_sk = i_item_sk AND inv_warehouse_sk = w_warehouse_sk
AND inv_date_sk = d_date_sk AND d_year = 2001 AND d_moy = 2
GROUP BY w_warehouse_name, w_warehouse_sk, i_item_sk, d_moy) foo
WHERE CASE mean WHEN 0 THEN 0 ELSE stdev/mean END > 1
)
SELECT inv1.w_warehouse_sk, inv1.i_item_sk, inv1.d_moy, inv1.mean, inv1.cov,
inv2.w_warehouse_sk, inv2.i_item_sk, inv2.d_moy, inv2.mean, inv2.cov
FROM inv1 JOIN inv2
ON inv1.i_item_sk = inv2.i_item_sk AND inv1.w_warehouse_sk = inv2.w_warehouse_sk
ORDER BY inv1.w_warehouse_sk, inv1.i_item_sk, inv1.d_moy, inv1.mean, inv1.cov,
inv2.d_moy, inv2.mean, inv2.cov;
```
## What changes are included in this PR?
A dedup pass in `SqlToRel` that runs right after
`prepare_select_exprs()` and before `self.project()`. It detects
duplicate `schema_name()` values and wraps the second (and subsequent)
occurrences in an `Alias` with a `:{N}` suffix:
```sql
SELECT x AS c1, y AS c1 FROM t;
-- produces columns: c1, c1:1
```
The actual code is small (~45 lines of logic across 2 files):
- `datafusion/sql/src/utils.rs` — new `deduplicate_select_expr_names()`
function
- `datafusion/sql/src/select.rs` — one call site between
`prepare_select_exprs()` and `self.project()`
I intentionally kept this scoped to the SQL planner only:
- `validate_unique_names("Projections")` in `builder.rs` is untouched,
so the Rust API (`LogicalPlanBuilder::project`) still rejects duplicates
- No changes to the optimizer, physical planner, or DFSchema
- `validate_unique_names("Windows")` is unchanged
**Known limitation:** `SELECT *, x FROM t` still errors when `x`
overlaps with `*`, because wildcard expansion happens after this dedup
pass (inside `project_with_validation`). Happy to address that in a
follow-up if desired.
## Are these changes tested?
New sqllogictest file (`duplicate_column_alias.slt`) with 13 test cases
covering:
- Basic duplicate aliases, literals, and same-column-twice
- Subquery with duplicate names
- ORDER BY resolving to first occurrence
- CTE join (TPC-DS Q39 pattern)
- Three-way duplicates
- CAST producing same schema_name as original column
- GROUP BY and aggregates with duplicates
- ORDER BY positional reference to the renamed column
- `iszero(0.0), iszero(-0.0)` (reported in the issue by @jatin510)
- UNION with duplicate column names
- Wildcard limitation documented as explicit `query error` test
Updated existing tests in `sql_integration.rs` (5 tests),
`aggregate.slt`, and `unnest.slt` that previously asserted the
"Projections require unique" error.
## Are there any user-facing changes?
Yes, this is a behavior change:
- SQL queries with duplicate expression names now succeed instead of
erroring
- Duplicate columns get a `:{N}` suffix in the output (e.g., `cov`,
`cov:1`)
- First occurrence keeps its original name, so ORDER BY / HAVING
references still work
- The programmatic Rust API is unchanged
1 parent 1a0af76 commit aadae6b
File tree
6 files changed
+265
-35
lines changed- datafusion
- sqllogictest/test_files
- sql
- src
- tests
6 files changed
+265
-35
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
27 | | - | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
109 | 110 | | |
110 | 111 | | |
111 | 112 | | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
112 | 117 | | |
113 | 118 | | |
114 | 119 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| 36 | + | |
36 | 37 | | |
37 | 38 | | |
38 | 39 | | |
| |||
633 | 634 | | |
634 | 635 | | |
635 | 636 | | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
| 667 | + | |
| 668 | + | |
636 | 669 | | |
637 | 670 | | |
638 | 671 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
789 | 789 | | |
790 | 790 | | |
791 | 791 | | |
792 | | - | |
793 | | - | |
| 792 | + | |
794 | 793 | | |
795 | | - | |
796 | | - | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
797 | 799 | | |
798 | 800 | | |
799 | 801 | | |
| |||
1531 | 1533 | | |
1532 | 1534 | | |
1533 | 1535 | | |
1534 | | - | |
1535 | | - | |
| 1536 | + | |
1536 | 1537 | | |
1537 | | - | |
1538 | | - | |
| 1538 | + | |
| 1539 | + | |
| 1540 | + | |
| 1541 | + | |
| 1542 | + | |
| 1543 | + | |
1539 | 1544 | | |
1540 | 1545 | | |
1541 | 1546 | | |
| |||
1584 | 1589 | | |
1585 | 1590 | | |
1586 | 1591 | | |
1587 | | - | |
1588 | | - | |
| 1592 | + | |
1589 | 1593 | | |
1590 | | - | |
1591 | | - | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
1592 | 1600 | | |
1593 | 1601 | | |
1594 | 1602 | | |
| |||
1625 | 1633 | | |
1626 | 1634 | | |
1627 | 1635 | | |
1628 | | - | |
1629 | | - | |
| 1636 | + | |
1630 | 1637 | | |
1631 | | - | |
1632 | | - | |
| 1638 | + | |
| 1639 | + | |
| 1640 | + | |
| 1641 | + | |
| 1642 | + | |
| 1643 | + | |
1633 | 1644 | | |
1634 | 1645 | | |
1635 | 1646 | | |
| |||
1750 | 1761 | | |
1751 | 1762 | | |
1752 | 1763 | | |
1753 | | - | |
1754 | | - | |
| 1764 | + | |
1755 | 1765 | | |
1756 | | - | |
1757 | | - | |
| 1766 | + | |
| 1767 | + | |
| 1768 | + | |
| 1769 | + | |
| 1770 | + | |
| 1771 | + | |
1758 | 1772 | | |
1759 | 1773 | | |
1760 | 1774 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7941 | 7941 | | |
7942 | 7942 | | |
7943 | 7943 | | |
7944 | | - | |
7945 | | - | |
| 7944 | + | |
| 7945 | + | |
7946 | 7946 | | |
| 7947 | + | |
| 7948 | + | |
7947 | 7949 | | |
7948 | | - | |
7949 | | - | |
7950 | | - | |
7951 | | - | |
7952 | | - | |
7953 | | - | |
| 7950 | + | |
7954 | 7951 | | |
7955 | | - | |
7956 | | - | |
7957 | | - | |
7958 | | - | |
| 7952 | + | |
| 7953 | + | |
7959 | 7954 | | |
7960 | 7955 | | |
7961 | 7956 | | |
| |||
Lines changed: 175 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
547 | 547 | | |
548 | 548 | | |
549 | 549 | | |
550 | | - | |
| 550 | + | |
551 | 551 | | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
552 | 560 | | |
553 | 561 | | |
554 | 562 | | |
| |||
0 commit comments