Commit 6d2840a
[SPARK-56333][SQL] Use multiset intersection for NATURAL JOIN column matching
### What changes were proposed in this pull request?
Replace `distinct.filter(resolver)` with a multiset intersection (`canonicalizedIntersect`) when computing common columns for NATURAL JOIN. This preserves duplicate column multiplicity (each name appears `min(left count, right count)` times) instead of deduplicating.
### Why are the changes needed?
`distinct` drops duplicate column names, so `NATURAL JOIN` between relations with repeated column names (e.g., c1, c1, c2) silently loses join conditions. The multiset approach matches `Seq.intersect` semantics while still respecting `spark.sql.caseSensitive`.
With this change, `NATURAL JOIN` behaves as it did before #54858, except that it now respects the `spark.sql.caseSensitive` configuration.
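
To illustrate the difference described above, here is a minimal standalone sketch, not the code from this patch. The names `NaturalJoinColumns`, `distinctCommon`, and `multisetCommon` are hypothetical, and `Resolver` is assumed to be a plain `(String, String) => Boolean` name comparator standing in for the analyzer's resolver. The real `canonicalizedIntersect` may be implemented differently.

```scala
object NaturalJoinColumns {
  // Assumption: a resolver is just a name-equality predicate.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical approximation of the old approach: deduplicate the left
  // names, then keep those that resolve against some right name.
  def distinctCommon(
      left: Seq[String],
      right: Seq[String],
      resolver: Resolver): Seq[String] =
    left.distinct.filter(l => right.exists(r => resolver(l, r)))

  // Multiset intersection: each left name consumes at most one matching
  // right name, so a name survives min(left count, right count) times.
  def multisetCommon(
      left: Seq[String],
      right: Seq[String],
      resolver: Resolver): Seq[String] = {
    val remaining = right.toBuffer // mutable copy of the right-side names
    left.filter { l =>
      val idx = remaining.indexWhere(r => resolver(l, r))
      if (idx >= 0) {
        remaining.remove(idx) // consume the matched right-side name
        true
      } else {
        false
      }
    }
  }
}
```

Under these assumptions, for left columns `c1, c1, c2` and right columns `c1, c1, c3`, `distinctCommon` yields a single `c1`, while `multisetCommon` yields `c1, c1`, the same result as `Seq.intersect` under case-sensitive comparison. Passing `caseInsensitive` lets `c1` match `C1` while still counting multiplicity.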
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New golden file tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #55163 from stefankandic/unionMultiplicity.
Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 66a5a89 commit 6d2840a
6 files changed
Lines changed: 80 additions & 4 deletions
File tree
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis
    - resolver
  - core/src/test/resources/sql-tests
    - analyzer-results
    - inputs
    - results
Lines changed: 2 additions & 3 deletions
Lines changed: 26 additions & 0 deletions
Lines changed: 4 additions & 1 deletion
Lines changed: 28 additions & 0 deletions
Lines changed: 4 additions & 0 deletions
Lines changed: 16 additions & 0 deletions