Skip to content

Commit 84df1ce

Browse files
andygroveclaude
andauthored
docs: recommend SQL file tests for new expressions (#3598)
Update the "Adding Spark-side Tests" section in the contributor guide to recommend the SQL file test framework as the preferred way to add test coverage for new expressions, with a link to the full SQL file tests documentation. The Scala test approach is preserved as an alternative for cases requiring programmatic setup. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d0aa1ff commit 84df1ce

1 file changed

Lines changed: 53 additions & 15 deletions

File tree

docs/source/contributor-guide/adding_a_new_expression.md

Lines changed: 53 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging and understanding why a
210210

211211
#### Adding Spark-side Tests for the New Expression
212212

213-
It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected spark behavior. To do this, you can add a set of test cases in the `CometExpressionSuite`, and use the `checkSparkAnswerAndOperator` method to compare the results of the new expression with the expected Spark results and that Comet's native execution engine is able to execute the expression.
213+
It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases.
214+
215+
##### Writing a SQL test file
216+
217+
Create a `.sql` file under the appropriate subdirectory in `spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`, `array/`). The file should create a table with test data, then run queries that exercise the expression. Here is an example for the `unhex` expression:
218+
219+
```sql
220+
-- ConfigMatrix: parquet.enable.dictionary=false,true
221+
222+
statement
223+
CREATE TABLE test_unhex(col string) USING parquet
224+
225+
statement
226+
INSERT INTO test_unhex VALUES
227+
('537061726B2053514C'),
228+
('737472696E67'),
229+
('\0'),
230+
(''),
231+
('###'),
232+
('G123'),
233+
('hello'),
234+
('A1B'),
235+
('0A1B'),
236+
(NULL)
237+
238+
-- column argument
239+
query
240+
SELECT unhex(col) FROM test_unhex
241+
242+
-- literal arguments
243+
query
244+
SELECT unhex('48656C6C6F'), unhex(''), unhex(NULL)
245+
```
246+
247+
Each `query` block automatically runs the SQL through both Spark and Comet and compares results, and also verifies that Comet executes the expression natively (not falling back to Spark).
248+
249+
Run the test with:
250+
251+
```shell
252+
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
253+
```
254+
255+
For full documentation on the test file format — including directives like `ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling known bugs with `ignore(...)`, and tips for writing thorough tests — see the [SQL File Tests](sql-file-tests.md) guide.
256+
257+
##### Tips
214258

215-
For example, this is the test case for the `unhex` expression:
259+
- **Cover both column references and literals.** Comet often uses different code paths for each. The SQL file test suite automatically disables constant folding, so all-literal queries are evaluated natively.
260+
- **Include edge cases** such as `NULL`, empty strings, boundary values, `NaN`, and multibyte UTF-8 characters.
261+
- **Keep one file per expression** to make failures easy to locate.
262+
263+
##### Scala tests (alternative)
264+
265+
For cases that require programmatic setup or custom assertions beyond what SQL files support, you can also add Scala test cases in `CometExpressionSuite` using the `checkSparkAnswerAndOperator` method:
216266

217267
```scala
218268
test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
236286
}
237287
```
238288

239-
#### Testing with Literal Values
240-
241-
When writing tests that use literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time rather than execution time. This means your Comet implementation might not actually be exercised during the test.
242-
243-
To ensure literal expressions are executed by Comet, disable the constant folding optimizer:
289+
When writing Scala tests with literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time, bypassing Comet. To prevent this, disable constant folding:
244290

245291
```scala
246292
test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
251297
}
252298
```
253299

254-
This is particularly important for:
255-
256-
- Edge case tests using specific literal values (e.g., null handling, overflow conditions)
257-
- Tests verifying behavior with special input values
258-
- Any test where the expression inputs are entirely literal
259-
260-
When possible, prefer testing with column references from tables (as shown in the `unhex` example above), which naturally avoids the constant folding issue.
261-
262300
### Adding the Expression To the Protobuf Definition
263301

264302
Once you have the expression implemented in Scala, you might need to update the protobuf definition to include the new expression. You may not need to do this if the expression is already covered by the existing protobuf definition (e.g. you're adding a new scalar function that uses the `ScalarFunc` message).

0 commit comments

Comments
 (0)