Skip to content

Commit 997cd2b

Browse files
committed
Fix charset annotation in plan strings via saffron.properties
The getDefaultCharset() override (introduced to fix non-ASCII PPL string literals) caused Calcite to annotate all VARCHAR/CHAR types and literals with CHARACTER SET "UTF-8" and _UTF-8 prefix in plan strings, breaking dozens of unit tests that compare logical plan representations. Root cause: Calcite suppresses the charset annotation and _charset prefix only when the charset matches CalciteSystemProperty.DEFAULT_CHARSET (which defaults to ISO-8859-1). Overriding getDefaultCharset() to UTF-8 set the charset on all types, but the suppression checks still compared against ISO-8859-1, making the annotations appear everywhere. Fix: add saffron.properties to core/src/main/resources with: calcite.default.charset=UTF-8 calcite.default.collation.name=UTF-8$en_US Calcite reads this file at CalciteSystemProperty class-load time (before any DEFAULT_CHARSET or SqlCollation.IMPLICIT static field is initialized), shifting the entire "default charset" universe to UTF-8. Both suppression checks now compare against UTF-8, so plan strings are identical to before while non-ASCII string literals continue to work correctly. The now-redundant getDefaultCharset() override is removed; the inherited SqlTypeFactoryImpl path already returns UTF-8 via Util.getDefaultCharset() which reads from CalciteSystemProperty.DEFAULT_CHARSET = "UTF-8". Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
1 parent a4db157 commit 997cd2b

3 files changed

Lines changed: 9 additions & 9 deletions

File tree

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -131,9 +131,8 @@ public RexNode visitLiteral(Literal node, CalcitePlanContext context) {
131131
case NULL:
132132
return rexBuilder.makeNullLiteral(typeFactory.createSqlType(SqlTypeName.NULL));
133133
case STRING:
134-
// UTF-8 charset is applied globally by OpenSearchTypeFactory.createSqlType() for
135-
// VARCHAR/CHAR, so non-ASCII characters (e.g. Chinese, Arabic) are handled correctly
136-
// and literal types stay compatible with column types for string operations.
134+
// saffron.properties sets calcite.default.charset=UTF-8 so non-ASCII characters
135+
// (e.g. Chinese, Arabic) are accepted and literal types stay compatible with column types.
137136
if (value.toString().length() == 1) {
138137
// To align Spark/PostgreSQL, Char(1) is useful, such as cast('1' to boolean) should
139138
// return true

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,6 @@
3232

3333
import java.lang.reflect.Type;
3434
import java.nio.charset.Charset;
35-
import java.nio.charset.StandardCharsets;
3635
import java.util.ArrayList;
3736
import java.util.LinkedHashMap;
3837
import java.util.List;
@@ -96,11 +95,6 @@ public RelDataType createTypeWithNullability(RelDataType type, boolean nullable)
9695
return super.createTypeWithNullability(type, nullable);
9796
}
9897

99-
@Override
100-
public Charset getDefaultCharset() {
101-
return StandardCharsets.UTF_8;
102-
}
103-
10498
@Override
10599
public RelDataType createTypeWithCharsetAndCollation(
106100
RelDataType type, Charset charset, SqlCollation collation) {
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# Shift Calcite's default charset from ISO-8859-1 to UTF-8 so that:
2+
# 1. Non-ASCII PPL string literals (Chinese, Arabic, etc.) are accepted without error.
3+
# 2. Plan-string representations (used in unit-test assertions) are unchanged,
4+
# because Calcite suppresses the CHARACTER SET annotation and _charset prefix
5+
# whenever the charset matches this "default" value.
6+
calcite.default.charset=UTF-8
7+
calcite.default.collation.name=UTF-8$en_US

0 commit comments

Comments
 (0)