docs: recommend SQL file tests for new expressions (#3598)
Update the "Adding Spark-side Tests" section in the contributor guide
to recommend the SQL file test framework as the preferred way to add
test coverage for new expressions, with a link to the full SQL file
tests documentation. The Scala test approach is preserved as an
alternative for cases requiring programmatic setup.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
docs/source/contributor-guide/adding_a_new_expression.md: 53 additions & 15 deletions
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging and understanding why a
 
 #### Adding Spark-side Tests for the New Expression
 
-It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected spark behavior. To do this, you can add a set of test cases in the `CometExpressionSuite`, and use the `checkSparkAnswerAndOperator` method to compare the results of the new expression with the expected Spark results and that Comet's native execution engine is able to execute the expression.
+It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases.
+
+##### Writing a SQL test file
+
+Create a `.sql` file under the appropriate subdirectory in `spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`, `array/`). The file should create a table with test data, then run queries that exercise the expression. Here is an example for the `unhex` expression:
+Each `query` block automatically runs the SQL through both Spark and Comet and compares results, and also verifies that Comet executes the expression natively (not falling back to Spark).
+
+Run the test with:
+
+```shell
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
+```
+
+For full documentation on the test file format — including directives like `ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling known bugs with `ignore(...)`, and tips for writing thorough tests — see the [SQL File Tests](sql-file-tests.md) guide.
+
+##### Tips
 
-For example, this is the test case for the `unhex` expression:
+- **Cover both column references and literals.** Comet often uses different code paths for each. The SQL file test suite automatically disables constant folding, so all-literal queries are evaluated natively.
+- **Include edge cases** such as `NULL`, empty strings, boundary values, `NaN`, and multibyte UTF-8 characters.
+- **Keep one file per expression** to make failures easy to locate.
+
+##### Scala tests (alternative)
+
+For cases that require programmatic setup or custom assertions beyond what SQL files support, you can also add Scala test cases in `CometExpressionSuite` using the `checkSparkAnswerAndOperator` method:
 
 ```scala
 test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
 }
 ```
 
-#### Testing with Literal Values
-
-When writing tests that use literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time rather than execution time. This means your Comet implementation might not actually be exercised during the test.
-
-To ensure literal expressions are executed by Comet, disable the constant folding optimizer:
+When writing Scala tests with literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time, bypassing Comet. To prevent this, disable constant folding:
 
 ```scala
 test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
 }
 ```
 
-This is particularly important for:
-
-- Edge case tests using specific literal values (e.g., null handling, overflow conditions)
-- Tests verifying behavior with special input values
-- Any test where the expression inputs are entirely literal
-
-When possible, prefer testing with column references from tables (as shown in the `unhex` example above), which naturally avoids the constant folding issue.
-
 ### Adding the Expression To the Protobuf Definition
 
 Once you have the expression implemented in Scala, you might need to update the protobuf definition to include the new expression. You may not need to do this if the expression is already covered by the existing protobuf definition (e.g. you're adding a new scalar function that uses the `ScalarFunc` message).