Commit b4ac134

test: add SQL test coverage for spark.sql.legacy.timeParserPolicy (#4183)
1 parent 3990ac3 commit b4ac134

30 files changed

Lines changed: 1137 additions & 0 deletions

docs/source/contributor-guide/index.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ Adding a New Operator <adding_a_new_operator>
 Adding a New Expression <adding_a_new_expression>
 Adding a New Spark Version <adding_a_new_spark_version>
 Supported Spark Expressions <spark_expressions_support>
+Supported Spark Configurations <spark_configs_support>
 Tracing <tracing>
 Profiling <profiling>
 Comet SQL Tests <sql-file-tests.md>
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Supported Spark Configurations

This document tracks Spark SQL configurations that affect Comet's behavior. For each
configuration we record which Comet expressions or operators are influenced, what
verification has been performed, and any known gaps.

## How to Read This Document

The **Status** field of each audited configuration uses these values:

- **Supported** -- Comet runs the affected expressions natively under every value of
  the config and produces results matching Spark.
- **Partial** -- Comet runs natively for some values of the config but falls back to
  Spark for others, or runs natively but with documented incompatibilities.
- **Falls back** -- Comet does not run the affected expressions natively under this
  config and always defers to Spark.
- **Unaudited** -- the config's interaction with Comet has not yet been verified.

## Audited Configurations

- `spark.sql.legacy.timeParserPolicy`
  - Default: `EXCEPTION`
  - Status: Falls back (see notes)
  - Affected expressions: `date_format`, `from_unixtime`, `unix_timestamp`, `to_unix_timestamp`, `to_timestamp`, `to_timestamp_ntz`, `to_date`, `try_to_timestamp` (Spark 4+)
  - Spark versions checked: 3.4.3, 3.5.8, 4.0.1
  - Date: 2026-05-02

## Audit Notes

### `spark.sql.legacy.timeParserPolicy`

**Source.** `SQLConf.LEGACY_TIME_PARSER_POLICY` selects the formatter used by
`TimestampFormatter` and `DateFormatter`:

- `LEGACY` -- `java.text.SimpleDateFormat` / `FastDateFormat`. Lenient parsing.
- `CORRECTED` -- `java.time.DateTimeFormatter` via `Iso8601TimestampFormatter`. Strict parsing.
- `EXCEPTION` (default) -- same parser as `CORRECTED`, plus
  `DateTimeFormatterHelper.checkParsedDiff` raises `SparkUpgradeException`
  (`INCONSISTENT_BEHAVIOR_CROSS_VERSION`) when the new parser fails on input that the
  legacy parser would have accepted. Pattern validation also raises
  `SparkUpgradeException` when a pattern is recognized only by the legacy formatter
  (this check applies under both `CORRECTED` and `EXCEPTION`).
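The parse-time difference between the three policies can be modeled with plain JDK classes. The sketch below is a simplified, hypothetical approximation and not Spark's implementation. It makes the following assumptions:

- A default (lenient) `SimpleDateFormat` stands in for the legacy path.
- `DateTimeFormatter.ofPattern` with its default resolver stands in for the new path. Spark wraps both in its own formatter classes, which add extra pattern handling.
- `UpgradeException` is a made-up stand-in for `SparkUpgradeException`.
- Failed parses return `null`, as in non-ANSI mode.

Pattern validation at formatter creation is not modeled.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.TimeZone;

public class TimeParserPolicySketch {
    public enum Policy { LEGACY, CORRECTED, EXCEPTION }

    /** Hypothetical stand-in for Spark's SparkUpgradeException. */
    public static class UpgradeException extends RuntimeException {
        public UpgradeException(String message) { super(message); }
    }

    // Legacy path: SimpleDateFormat is lenient by default (out-of-range fields roll
    // over) and stops once the pattern is matched, ignoring trailing text.
    static String parseLegacy(String pattern, String input) throws ParseException {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.US);
        in.setTimeZone(utc);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        out.setTimeZone(utc);
        return out.format(in.parse(input));
    }

    // New path: java.time requires two digits for MM and dd, validates field ranges,
    // and rejects unparsed trailing text.
    static String parseNew(String pattern, String input) {
        return LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern, Locale.US)).toString();
    }

    /** Parses input as a date; returns yyyy-MM-dd, or null on failure (non-ANSI mode). */
    public static String parse(Policy policy, String pattern, String input) {
        if (policy == Policy.LEGACY) {
            try {
                return parseLegacy(pattern, input);
            } catch (ParseException e) {
                return null;
            }
        }
        try {
            return parseNew(pattern, input);
        } catch (DateTimeParseException e) {
            if (policy == Policy.CORRECTED) {
                return null;
            }
            // EXCEPTION: re-parse with the legacy parser to detect a behavior change.
            try {
                parseLegacy(pattern, input);
            } catch (ParseException legacyAlsoFailed) {
                return null; // both parsers reject the input
            }
            throw new UpgradeException(
                "Fail to parse '" + input + "' in the new parser; the legacy parser accepts it");
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"2024-06-15", "2024-6-5", "2024-13-45", "2024-06-15abc", "not a date"};
        for (String input : inputs) {
            for (Policy policy : Policy.values()) {
                String result;
                try {
                    result = String.valueOf(parse(policy, "yyyy-MM-dd", input));
                } catch (UpgradeException e) {
                    result = "UpgradeException";
                }
                System.out.printf("%-14s %-9s -> %s%n", input, policy, result);
            }
        }
    }
}
```

Running `main` prints one line per input and policy. The single-digit, out-of-range, and trailing-text inputs split three ways: a value under LEGACY, null under CORRECTED, and an error under EXCEPTION. `not a date` stays null under every policy because neither parser accepts it.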

**Affected expressions.** Determined by tracing `TimestampFormatterHelper`,
`TimestampFormatter(...)`, and `DateFormatter(...)` usage in
`sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala`
across Spark 3.4, 3.5, 4.0, and 4.1. Three expression classes mix in
`TimestampFormatterHelper`:

- `DateFormatClass` -- `date_format`
- `FromUnixTime` -- `from_unixtime`
- `ToTimestamp` (abstract) -- `UnixTimestamp` (`unix_timestamp`),
  `ToUnixTimestamp` (`to_unix_timestamp`), `GetTimestamp` (used by
  `ParseToTimestamp` for `to_timestamp` / `to_timestamp_ntz`, `ParseToDate` for
  `to_date`, and Spark 4's `try_to_timestamp`)

`Cast` between strings and date / timestamp also reads the policy via the default
formatters but is tested separately by `CometCastSuite` and is out of scope here.

**Comet status.** None of the listed expressions consults `legacyTimeParserPolicy` in
its Comet serde. The native implementations of `date_format`, `from_unixtime`, and
`unix_timestamp` use a fixed strftime-style mapping that does not vary with policy;
the remaining four (`to_unix_timestamp`, `to_timestamp`, `to_date`,
`try_to_timestamp`) have no native implementation and fall back to Spark. Comet still
matches Spark today because:

- `date_format` is `Compatible` only for a small whitelist of formats under UTC; the
  whitelisted formats happen to produce identical output under all three policies.
- `from_unixtime` is marked `Incompatible` and falls back unless
  `spark.comet.expression.FromUnixTime.allowIncompatible=true` is set.
- `unix_timestamp(<timestamp_or_date>)` does not call the formatter at all; the
  string-input overload falls back.
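The claim that whitelisted formats are policy-independent can be spot-checked for formatting with plain JDK classes. This is a minimal sketch under these assumptions:

- The patterns exercised by the convergent `date_format` and `from_unixtime` test files in this commit are representative of Comet's whitelist. The actual whitelist lives in Comet's serde and is not reproduced here.
- `SimpleDateFormat` approximates the LEGACY formatter.
- `DateTimeFormatter` approximates the CORRECTED / EXCEPTION formatter.
- Both formatters run in UTC.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ConvergentFormatSketch {
    // Patterns taken from the convergent test files; assumed representative of the whitelist.
    public static final String[] PATTERNS = {
        "yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss", "HH:mm:ss", "yyyyMMdd", "yyyyMM"
    };
    // Epoch seconds used as inputs by the convergent from_unixtime test file.
    public static final long[] EPOCH_SECONDS = {0L, 1718451045L, -1L, 2147483647L};

    // Approximation of the LEGACY formatter.
    public static String formatLegacy(String pattern, long epochSeconds) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(epochSeconds * 1000L));
    }

    // Approximation of the CORRECTED / EXCEPTION formatter.
    public static String formatNew(String pattern, long epochSeconds) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US)
            .withZone(ZoneOffset.UTC)
            .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        for (String pattern : PATTERNS) {
            for (long seconds : EPOCH_SECONDS) {
                String legacy = formatLegacy(pattern, seconds);
                String modern = formatNew(pattern, seconds);
                System.out.printf("%-20s %11d legacy=%s new=%s %s%n", pattern, seconds,
                    legacy, modern, legacy.equals(modern) ? "same" : "DIFFERENT");
            }
        }
    }
}
```

A check like this is a cheap first signal when a pattern is proposed for the whitelist. It does not replace running the pattern through Spark under each policy, since Spark's formatters add their own pattern handling on top of the JDK classes.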

If a Comet contributor adds native string-format parsing or extends the `date_format`
whitelist, this audit should be revisited and the policy must be honored explicitly.

**Test coverage.** The tests live in `spark/src/test/resources/sql-tests/expressions/datetime/`:

- One ConfigMatrix file per expression (`*_time_parser_policy.sql`) covering convergent
  inputs under `LEGACY,CORRECTED,EXCEPTION`.
- Per-policy files locking in divergent behavior:
  - `_legacy.sql` -- lenient inputs (single-digit fields, out-of-range values,
    trailing characters) and legacy-only pattern tokens (`'aaaa'`).
  - `_corrected.sql` -- the same inputs return null; legacy-only tokens raise
    `INCONSISTENT_BEHAVIOR_CROSS_VERSION.DATETIME_PATTERN_RECOGNITION` at formatter
    creation.
  - `_exception.sql` -- the same inputs raise
    `INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER` at parse time.
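The header directives in the SQL files of this commit (`-- Config:` and `-- ConfigMatrix:`) suggest how one convergent file becomes one run per policy value. The following is a hypothetical model of that expansion, not Comet's actual test harness. It assumes at most one `ConfigMatrix` key per file with comma-separated values, and that every `Config` entry applies to every run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigMatrixSketch {
    private static final String CONFIG = "-- Config:";
    private static final String MATRIX = "-- ConfigMatrix:";

    /** Hypothetical expansion of header directives into one config map per run. */
    public static List<Map<String, String>> expand(List<String> lines) {
        Map<String, String> fixed = new LinkedHashMap<>();
        String matrixKey = null;
        String[] matrixValues = new String[0];
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(MATRIX)) {
                String[] kv = line.substring(MATRIX.length()).trim().split("=", 2);
                if (kv.length == 2) {
                    matrixKey = kv[0].trim();
                    matrixValues = kv[1].split(",");
                }
            } else if (line.startsWith(CONFIG)) {
                String[] kv = line.substring(CONFIG.length()).trim().split("=", 2);
                if (kv.length == 2) {
                    fixed.put(kv[0].trim(), kv[1].trim());
                }
            }
        }
        List<Map<String, String>> runs = new ArrayList<>();
        if (matrixKey == null) {
            runs.add(fixed); // no matrix: a single run with the fixed configs
            return runs;
        }
        for (String value : matrixValues) {
            Map<String, String> run = new LinkedHashMap<>(fixed);
            run.put(matrixKey, value.trim()); // one run per matrix value
            runs.add(run);
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> header = List.of(
            "-- ConfigMatrix: spark.sql.legacy.timeParserPolicy=LEGACY,CORRECTED,EXCEPTION",
            "-- Config: spark.sql.session.timeZone=UTC");
        for (Map<String, String> run : expand(header)) {
            System.out.println(run);
        }
    }
}
```

Under this model, each convergent file runs three times, while each per-policy file runs once with its fixed `Config` value.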

**Findings.** All 42 generated test cases pass on Spark 3.4.3, 3.5.8, and 4.0.1. The
audit uncovered no Comet bugs. The tests use `query spark_answer_only` so that result
correctness is enforced regardless of whether Comet runs the expression natively or
falls back.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- Convergent date_format() behavior across all three timeParserPolicy values.
-- Patterns here produce identical output under LEGACY, CORRECTED, and EXCEPTION.
-- ConfigMatrix: spark.sql.legacy.timeParserPolicy=LEGACY,CORRECTED,EXCEPTION
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_date_format_policy(ts timestamp) USING parquet

statement
INSERT INTO test_date_format_policy VALUES (timestamp('2024-06-15 10:30:45')), (timestamp('1970-01-01 00:00:00')), (NULL)

query spark_answer_only
SELECT date_format(ts, 'yyyy-MM-dd') FROM test_date_format_policy

query spark_answer_only
SELECT date_format(ts, 'yyyy-MM-dd HH:mm:ss') FROM test_date_format_policy

query spark_answer_only
SELECT date_format(ts, 'HH:mm:ss') FROM test_date_format_policy

query spark_answer_only
SELECT date_format(ts, 'yyyyMMdd') FROM test_date_format_policy

query spark_answer_only
SELECT date_format(ts, 'yyyyMM') FROM test_date_format_policy

-- literal arguments
query spark_answer_only
SELECT date_format(timestamp('2024-06-15 10:30:45'), 'yyyy-MM-dd')

query spark_answer_only
SELECT date_format(NULL, 'yyyy-MM-dd')
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- date_format() under CORRECTED timeParserPolicy.
-- Patterns recognized only by the legacy formatter raise SparkUpgradeException at
-- formatter creation, even under CORRECTED, because validatePatternString is called
-- with checkLegacy=true.
-- Config: spark.sql.legacy.timeParserPolicy=CORRECTED
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_date_format_corrected(ts timestamp) USING parquet

statement
INSERT INTO test_date_format_corrected VALUES (timestamp('2024-06-15 10:30:45'))

-- 4-char am/pm marker: legacy accepts, new rejects, validation throws SparkUpgradeException.
query expect_error(INCONSISTENT_BEHAVIOR_CROSS_VERSION)
SELECT date_format(ts, 'yyyy-MM-dd aaaa') FROM test_date_format_corrected
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- date_format() under EXCEPTION timeParserPolicy (the default).
-- Patterns rejected by the new formatter but accepted by legacy raise
-- SparkUpgradeException at formatter creation.
-- Config: spark.sql.legacy.timeParserPolicy=EXCEPTION
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_date_format_exception(ts timestamp) USING parquet

statement
INSERT INTO test_date_format_exception VALUES (timestamp('2024-06-15 10:30:45'))

query expect_error(INCONSISTENT_BEHAVIOR_CROSS_VERSION)
SELECT date_format(ts, 'yyyy-MM-dd aaaa') FROM test_date_format_exception
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- date_format() under LEGACY timeParserPolicy.
-- Legacy SimpleDateFormat accepts patterns that the new java.time formatter rejects.
-- Config: spark.sql.legacy.timeParserPolicy=LEGACY
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_date_format_legacy(ts timestamp) USING parquet

statement
INSERT INTO test_date_format_legacy VALUES (timestamp('2024-06-15 10:30:45')), (timestamp('1970-01-01 00:00:00')), (NULL)

-- Legacy-only token: 4-char am/pm marker is invalid in the new formatter.
query spark_answer_only
SELECT date_format(ts, 'yyyy-MM-dd aaaa') FROM test_date_format_legacy
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- Convergent from_unixtime() behavior across all three timeParserPolicy values.
-- Patterns here produce identical output under LEGACY, CORRECTED, and EXCEPTION.
-- ConfigMatrix: spark.sql.legacy.timeParserPolicy=LEGACY,CORRECTED,EXCEPTION
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_from_unix_time_policy(t long) USING parquet

statement
INSERT INTO test_from_unix_time_policy VALUES (0), (1718451045), (-1), (NULL), (2147483647)

query spark_answer_only
SELECT from_unixtime(t) FROM test_from_unix_time_policy

query spark_answer_only
SELECT from_unixtime(t, 'yyyy-MM-dd') FROM test_from_unix_time_policy

query spark_answer_only
SELECT from_unixtime(t, 'yyyy-MM-dd HH:mm:ss') FROM test_from_unix_time_policy

query spark_answer_only
SELECT from_unixtime(t, 'HH:mm:ss') FROM test_from_unix_time_policy

-- literal arguments
query spark_answer_only
SELECT from_unixtime(0)

query spark_answer_only
SELECT from_unixtime(1718451045, 'yyyy-MM-dd')

query spark_answer_only
SELECT from_unixtime(NULL, 'yyyy-MM-dd')
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- from_unixtime() under CORRECTED timeParserPolicy.
-- Patterns recognized only by the legacy formatter raise SparkUpgradeException
-- at formatter creation, even under CORRECTED.
-- Config: spark.sql.legacy.timeParserPolicy=CORRECTED
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_from_unix_time_corrected(t long) USING parquet

statement
INSERT INTO test_from_unix_time_corrected VALUES (1718451045)

query expect_error(INCONSISTENT_BEHAVIOR_CROSS_VERSION)
SELECT from_unixtime(t, 'yyyy-MM-dd aaaa') FROM test_from_unix_time_corrected
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

-- from_unixtime() under EXCEPTION timeParserPolicy (the default).
-- Patterns rejected by the new formatter but accepted by legacy raise
-- SparkUpgradeException at formatter creation.
-- Config: spark.sql.legacy.timeParserPolicy=EXCEPTION
-- Config: spark.sql.session.timeZone=UTC

statement
CREATE TABLE test_from_unix_time_exception(t long) USING parquet

statement
INSERT INTO test_from_unix_time_exception VALUES (1718451045)

query expect_error(INCONSISTENT_BEHAVIOR_CROSS_VERSION)
SELECT from_unixtime(t, 'yyyy-MM-dd aaaa') FROM test_from_unix_time_exception
