Commit d4f5c04

upmerge
2 parents 37a020c + 79b83d8 commit d4f5c04

14 files changed

Lines changed: 1149 additions & 445 deletions

.github/workflows/pr_build_linux.yml

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ jobs:
 org.apache.comet.exec.CometAsyncShuffleSuite
 org.apache.comet.exec.DisableAQECometShuffleSuite
 org.apache.comet.exec.DisableAQECometAsyncShuffleSuite
+org.apache.spark.shuffle.sort.SpillSorterSuite
 - name: "parquet"
 value: |
 org.apache.comet.parquet.CometParquetWriterSuite

.github/workflows/pr_build_macos.yml

Lines changed: 1 addition & 0 deletions
@@ -85,6 +85,7 @@ jobs:
 org.apache.comet.exec.CometAsyncShuffleSuite
 org.apache.comet.exec.DisableAQECometShuffleSuite
 org.apache.comet.exec.DisableAQECometAsyncShuffleSuite
+org.apache.spark.shuffle.sort.SpillSorterSuite
 - name: "parquet"
 value: |
 org.apache.comet.parquet.CometParquetWriterSuite

dev/ensure-jars-have-correct-contents.sh

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ allowed_expr+="|^org/apache/spark/shuffle/$"
 allowed_expr+="|^org/apache/spark/shuffle/sort/$"
 allowed_expr+="|^org/apache/spark/shuffle/sort/CometShuffleExternalSorter.*$"
 allowed_expr+="|^org/apache/spark/shuffle/sort/RowPartition.class$"
+allowed_expr+="|^org/apache/spark/shuffle/sort/SpillSorter.*$"
 allowed_expr+="|^org/apache/spark/shuffle/comet/.*$"
 allowed_expr+="|^org/apache/spark/sql/$"
 # allow ExplainPlanGenerator trait since it may not be available in older Spark versions
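
The hunk above extends an allow-list built as a single regular-expression alternation. As a rough illustration of what the new `SpillSorter.*` entry permits, here is a minimal Python sketch. It assumes the script matches each jar entry path against the combined expression; how the script actually applies `allowed_expr` is not shown in this diff. The helper name `is_allowed` and the sample entry paths are hypothetical.

```python
import re

# Patterns copied from the hunk above. The shell script appends each one to
# `allowed_expr` with a leading "|", which yields one alternation.
ALLOWED_PARTS = [
    r"^org/apache/spark/shuffle/sort/$",
    r"^org/apache/spark/shuffle/sort/CometShuffleExternalSorter.*$",
    r"^org/apache/spark/shuffle/sort/RowPartition.class$",
    r"^org/apache/spark/shuffle/sort/SpillSorter.*$",  # added by this commit
    r"^org/apache/spark/shuffle/comet/.*$",
]
ALLOWED = re.compile("|".join(ALLOWED_PARTS))


def is_allowed(jar_entry: str) -> bool:
    """Hypothetical helper: True if the jar entry path matches the allow-list."""
    return ALLOWED.search(jar_entry) is not None
```

Because the new pattern ends in `.*`, it would also cover any nested or anonymous classes compiled alongside `SpillSorter`, in the same way as the existing `CometShuffleExternalSorter.*` entry.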

docs/source/user-guide/latest/compatibility.md

Lines changed: 95 additions & 99 deletions
@@ -73,122 +73,118 @@ should not be used in production. The feature will be enabled in a future releas
 
 Cast operations in Comet fall into three levels of support:
 
-- **Compatible**: The results match Apache Spark
-- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
+- **C (Compatible)**: The results match Apache Spark
+- **I (Incompatible)**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
 will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting
 `spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not
 recommended for production use.
-- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to
+- **U (Unsupported)**: Comet does not provide a native version of this cast expression and the query stage will fall back to
 Spark.
+- **N/A**: Spark does not support this cast.
 
-### Compatible Casts
+### Legacy Mode
 
-The following cast operations are generally compatible with Spark except for the differences noted here.
+<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
+
+<!--BEGIN:CAST_LEGACY_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
+<!-- prettier-ignore-end -->
+
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_LEGACY_TABLE-->
+
+### Try Mode
 
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 
-<!--BEGIN:COMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_TRY_TABLE-->
 <!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| boolean | byte | |
-| boolean | short | |
-| boolean | integer | |
-| boolean | long | |
-| boolean | float | |
-| boolean | double | |
-| boolean | string | |
-| byte | boolean | |
-| byte | short | |
-| byte | integer | |
-| byte | long | |
-| byte | float | |
-| byte | double | |
-| byte | decimal | |
-| byte | string | |
-| short | boolean | |
-| short | byte | |
-| short | integer | |
-| short | long | |
-| short | float | |
-| short | double | |
-| short | decimal | |
-| short | string | |
-| integer | boolean | |
-| integer | byte | |
-| integer | short | |
-| integer | long | |
-| integer | float | |
-| integer | double | |
-| integer | decimal | |
-| integer | string | |
-| long | boolean | |
-| long | byte | |
-| long | short | |
-| long | integer | |
-| long | float | |
-| long | double | |
-| long | decimal | |
-| long | string | |
-| float | boolean | |
-| float | byte | |
-| float | short | |
-| float | integer | |
-| float | long | |
-| float | double | |
-| float | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| double | boolean | |
-| double | byte | |
-| double | short | |
-| double | integer | |
-| double | long | |
-| double | float | |
-| double | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| decimal | boolean | |
-| decimal | byte | |
-| decimal | short | |
-| decimal | integer | |
-| decimal | long | |
-| decimal | float | |
-| decimal | double | |
-| decimal | decimal | |
-| decimal | string | There can be formatting differences in some case due to Spark using scientific notation where Comet does not |
-| string | boolean | |
-| string | byte | |
-| string | short | |
-| string | integer | |
-| string | long | |
-| string | float | |
-| string | double | |
-| string | binary | |
-| string | date | Only supports years between 262143 BC and 262142 AD |
-| binary | string | |
-| date | string | |
-| timestamp | long | |
-| timestamp | string | |
-| timestamp | date | |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
 <!-- prettier-ignore-end -->
-<!--END:COMPAT_CAST_TABLE-->
 
-### Incompatible Casts
+**Notes:**
 
-The following cast operations are not compatible with Spark for all inputs and are disabled by default.
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_TRY_TABLE-->
+
+### ANSI Mode
 
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 
-<!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_ANSI_TABLE-->
 <!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| float | decimal | There can be rounding differences |
-| double | decimal | There can be rounding differences |
-| string | decimal | Does not support fullwidth unicode digits (e.g \\uFF10)
-or strings containing null bytes (e.g \\u0000) |
-| string | timestamp | Not all valid formats are supported |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
 <!-- prettier-ignore-end -->
-<!--END:INCOMPAT_CAST_TABLE-->
 
-### Unsupported Casts
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: ANSI mode not supported
+<!--END:CAST_ANSI_TABLE-->
 
-Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the
-[tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
+See the [tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
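
The new documentation replaces the per-pair lists with a from-type by to-type matrix, read as row = source type, column = target type. The following Python sketch shows one way to read such a matrix and apply the legend. It is an illustration only: the function names `parse_matrix`, `support_level`, and `runs_natively` are hypothetical and not part of Comet, and the sample embeds only three rows copied from the Legacy Mode table.

```python
# Header and three rows copied from the Legacy Mode table in the diff above.
SAMPLE = """\
| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
| string | C | C | C | C | I | C | C | C | C | C | - | I |
| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
"""


def parse_matrix(markdown: str) -> dict:
    """Parse the markdown matrix into {from_type: {to_type: level}}."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the |---|---| separator row, whose cells contain only dashes.
        if all(c and set(c) <= {"-"} for c in cells):
            continue
        rows.append(cells)
    to_types = rows[0][1:]  # first header cell is the empty corner
    return {row[0]: dict(zip(to_types, row[1:])) for row in rows[1:]}


def support_level(matrix: dict, from_type: str, to_type: str) -> str:
    """Return C, I, U, N/A, or '-' (same type) for a cast."""
    return matrix[from_type][to_type]


def runs_natively(level: str, allow_incompatible: bool = False) -> bool:
    """Apply the legend: C runs natively; I only when
    spark.comet.expression.Cast.allowIncompatible=true; otherwise Spark handles it."""
    return level == "C" or (level == "I" and allow_incompatible)


matrix = parse_matrix(SAMPLE)
```

For instance, `support_level(matrix, "string", "decimal")` returns the cell from the `string` row and `decimal` column, and `runs_natively` then indicates whether that cast would stay in Comet under the default configuration according to the legend.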
