Commit ccef264
[SPARK-57777][SQL][CONNECT] Distinguish explicit collation when rendering string literals to SQL
### What changes were proposed in this pull request?
This PR makes `Literal.sql` render an explicit `collate` clause for any string literal whose
type carries an explicit collation, **including an explicit `UTF8_BINARY`**, while still rendering
the default (un-collated) `StringType` without a clause.
`Literal.sql` previously had two arms for string literals:
```scala
case (v: UTF8String, StringType) => // matched any UTF8_BINARY StringType by value equality
  "'" + escaped + "'"
case (v: UTF8String, st: StringType) => // only reached for non-UTF8_BINARY collations
  "'" + escaped + "'" + st.typeName.substring(6)
```
The first arm matches via the `StringType` case object's `equals` (which compares `collationId`
and `constraint`), so it collapsed both the *default* `StringType` and an *explicitly collated*
`UTF8_BINARY` `StringType` into the same clause-less output. The two arms are now merged into one
that decides the clause via `DataTypeUtils.isDefaultStringCharOrVarcharType` (the same
singleton-identity check used elsewhere, e.g. by `ApplyDefaultCollation` and `SHOW CREATE TABLE`):
```scala
case (v: UTF8String, st: StringType) =>
  val collateClause =
    if (DataTypeUtils.isDefaultStringCharOrVarcharType(st)) "" else s" collate ${st.collationName}"
  "'" + escaped + "'" + collateClause
```
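The core of the bug is how a Scala stable-identifier pattern matches: `case (_, StringType) =>` succeeds when `StringType == st`, which is value equality, not identity. The sketch below reproduces the old and new arms without Spark. `StringTypeModel`, `DefaultString` and `LiteralSqlModel` are hypothetical stand-ins (equality compares only a collation id, and quote escaping is omitted), and `eq` plays the role of the singleton-identity check that the PR attributes to `DataTypeUtils.isDefaultStringCharOrVarcharType`.

```scala
// Hypothetical stand-in for Spark's StringType: two distinct instances with the
// same collation id compare equal, mirroring value-based equals.
class StringTypeModel(val collationId: Int, val collationName: String) {
  override def equals(other: Any): Boolean = other match {
    case s: StringTypeModel => s.collationId == collationId
    case _                  => false
  }
  override def hashCode(): Int = collationId
}

// The default (undetermined) string type is a singleton, like the StringType object.
object DefaultString extends StringTypeModel(0, "UTF8_BINARY")

object LiteralSqlModel {
  // Old shape: the first arm is a stable-identifier pattern, so it matches by `==`
  // and also swallows an explicitly collated UTF8_BINARY type.
  def oldSql(v: String, st: StringTypeModel): String = st match {
    case DefaultString => "'" + v + "'"
    case other         => "'" + v + "' collate " + other.collationName
  }

  // New shape: one arm, deciding the clause by reference identity with the singleton.
  def newSql(v: String, st: StringTypeModel): String = {
    val collateClause = if (st eq DefaultString) "" else s" collate ${st.collationName}"
    "'" + v + "'" + collateClause
  }
}
```

With `val explicitBinary = new StringTypeModel(0, "UTF8_BINARY")`, `oldSql("x", explicitBinary)` yields `'x'`, while `newSql("x", explicitBinary)` yields `'x' collate UTF8_BINARY`; the default singleton still renders as `'x'` in both.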
For non-default collations the produced string is byte-for-byte identical to the previous
`st.typeName.substring(6)` output (`typeName` is `s"string collate $collationName"`).
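The byte-for-byte claim follows from the length of the prefix: `"string"` is 6 characters, so `substring(6)` keeps everything from the space before `collate`. The check below assumes only the `typeName` shape stated above; `CollateSuffix` is a hypothetical helper, not Spark code.

```scala
object CollateSuffix {
  // Old suffix: drop the 6-character "string" prefix from the type name,
  // whose shape is s"string collate $collationName".
  def oldSuffix(collationName: String): String =
    s"string collate $collationName".substring(6)

  // New suffix: build the clause directly from the collation name.
  def newSuffix(collationName: String): String = s" collate $collationName"
}
```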
This PR also removes a test-only normalization in `PlanGenerationTestSuite` that stamped
`UTF8_BINARY` onto every string-type proto whose `collation` field was empty before writing the
golden files. That shim made the generated `query-tests` golden artifacts (`.proto.bin` / `.json`,
and the downstream `.explain` files) misrepresent the real wire format: a default `StringType` is
serialized with an **empty** `collation` field (the "undetermined / default collation" sentinel),
not `UTF8_BINARY`. The affected golden files were regenerated so they now reflect what a real
client actually sends.
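Why the shim was harmful can be shown with a tiny model of the wire field: once empty collations are filled in, a default string and an explicit `UTF8_BINARY` string become indistinguishable in the golden files. `ProtoStringType` and `ProtoModel` are hypothetical and model only the `collation` string of the proto string type; `removedShim` paraphrases the deleted normalization as described in this PR, not its actual code.

```scala
// Hypothetical model of the proto string type: only the `collation` field is kept.
final case class ProtoStringType(collation: String)

object ProtoModel {
  // Default (undetermined) strings are sent with an empty collation;
  // explicitly collated strings carry their collation name.
  def toProto(explicitCollation: Option[String]): ProtoStringType =
    ProtoStringType(explicitCollation.getOrElse(""))

  // Paraphrase of the removed test-only shim: stamp UTF8_BINARY onto any
  // string type whose collation field is empty before writing golden files.
  def removedShim(p: ProtoStringType): ProtoStringType =
    if (p.collation.isEmpty) p.copy(collation = "UTF8_BINARY") else p
}
```

Under the shim, `removedShim(toProto(None))` equals `toProto(Some("UTF8_BINARY"))`, which is exactly the distinction the golden files lost.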
### Why are the changes needed?
A default `StringType` and an explicitly-`UTF8_BINARY` `StringType` are semantically different:
the former is "undetermined" and is eligible to inherit a default collation during analysis (e.g.
`CREATE TABLE ... DEFAULT COLLATION UTF8_LCASE AS SELECT 'x'`), while the latter is explicitly
pinned and must not inherit. `Literal.sql` is used in view text, `SHOW CREATE TABLE`, error
messages, etc.; rendering an explicitly-collated `UTF8_BINARY` literal without a `collate` clause
loses that distinction and is not faithful on re-parse. This aligns string-literal SQL rendering
with how the rest of the engine already distinguishes explicit collation.
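The re-parse problem can be sketched as a round trip: render a literal, parse it back, then resolve it against a table default collation. This is a deliberately simplified model, assuming `None` means "default/undetermined" and `Some(name)` means "explicitly collated"; `DefaultCollationModel` and its toy parser are hypothetical and only approximate what analysis rules such as `ApplyDefaultCollation` do.

```scala
object DefaultCollationModel {
  // An undetermined literal inherits the table default; an explicit one keeps its own.
  def resolve(literalCollation: Option[String], tableDefault: String): String =
    literalCollation.getOrElse(tableDefault)

  // Old rendering dropped the clause for an explicit UTF8_BINARY.
  def renderOld(c: Option[String]): String = c match {
    case None | Some("UTF8_BINARY") => "'x'"
    case Some(n)                    => s"'x' collate $n"
  }

  // New rendering emits the clause for every explicit collation.
  def renderNew(c: Option[String]): String = c.fold("'x'")(n => s"'x' collate $n")

  // Toy parser for the two shapes produced above.
  def parse(sql: String): Option[String] = {
    val marker = " collate "
    val i = sql.indexOf(marker)
    if (i < 0) None else Some(sql.substring(i + marker.length))
  }
}
```

With a table default of `UTF8_LCASE`, an explicit `UTF8_BINARY` literal that goes through `renderOld` and `parse` resolves to `UTF8_LCASE`, while the `renderNew` round trip keeps `UTF8_BINARY`.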
The test-normalization removal is a correctness fix for the Spark Connect golden files: they are
meant to be a faithful record of the protocol, and they were showing `collate UTF8_BINARY` on
default strings where the actual proto omits the collation.
### Does this PR introduce _any_ user-facing change?
Yes. `Literal.sql` now appends ` collate UTF8_BINARY` when a string literal's type is an explicit
(non-default) `UTF8_BINARY` `StringType`. A plain default string literal is unchanged (no clause),
and literals with other explicit collations are unchanged. This affects SQL text generated from
literals (e.g. view definitions, `SHOW CREATE TABLE`, error messages).
### How was this patch tested?
Existing suites, all passing:
- `catalyst/testOnly org.apache.spark.sql.catalyst.expressions.LiteralExpressionSuite` (49)
- `connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite` (727; golden files regenerated and reviewed)
- `connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite` (732; explain golden regenerated)
- `sql/testOnly org.apache.spark.sql.SQLQueryTestSuite` for the collation inputs (`collations-basic`, `view-with-default-collation`, `collations-padding-trim`, `collations-string-functions`, `collations-aliases`, `listagg-collations`) (12; no golden changes)
- `sql/testOnly org.apache.spark.sql.execution.command.{v1,v2}.ShowCreateTableSuite` (47)
The regenerated golden diff is limited to dropping the test-stamped `collation: "UTF8_BINARY"`
from default-string proto types and the corresponding `CAST(NULL AS STRING COLLATE UTF8_BINARY)`
-> `CAST(NULL AS STRING)` in one explain file.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
Closes #56892 from cloud-fan/cloud-fan/string-literal-default-collation.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a947296)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent 343e576, commit ccef264
12 files changed (26 additions, 38 deletions)
File tree:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions
- sql/connect/client/jvm/src/test/scala/org/apache/spark/sql
- sql/connect/common/src/test/resources/query-tests/explain-results
- sql/connect/common/src/test/resources/query-tests/queries