@@ -58,6 +58,21 @@ Expressions that are not 100% Spark-compatible will fall back to Spark by defaul
 `spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See
 the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting.
 
+## Date and Time Functions
+
+Comet's native implementation of date and time functions may produce different results than Spark for dates
+far in the future (approximately beyond year 2100). This is because Comet uses the chrono-tz library for
+timezone calculations, which has limited support for Daylight Saving Time (DST) rules beyond the IANA
+time zone database's explicit transitions.
+
+For dates within a reasonable range (approximately 1970-2100), Comet's date and time functions are compatible
+with Spark. For dates beyond this range, functions that involve timezone-aware calculations (such as
+`date_trunc` with timezone-aware timestamps) may produce results with incorrect DST offsets.
+
+If you need to process dates far in the future with accurate timezone handling, consider:
+- Using timezone-naive types (`timestamp_ntz`) when timezone conversion is not required
+- Falling back to Spark for these specific operations
+
 ## Regular Expressions
 
 Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's
@@ -106,15 +121,14 @@ Cast operations in Comet fall into three levels of support:
 <!-- prettier-ignore-end -->
 
 **Notes:**
-
 - **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
 - **double -> decimal**: There can be rounding differences
 - **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **float -> decimal**: There can be rounding differences
 - **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **string -> date**: Only supports years between 262143 BC and 262142 AD
 - **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
- or strings containing null bytes (e.g \\u0000)
+ or strings containing null bytes (e.g \\u0000)
 - **string -> timestamp**: Not all valid formats are supported
 <!-- END:CAST_LEGACY_TABLE-->
 
@@ -142,15 +156,14 @@ Cast operations in Comet fall into three levels of support:
 <!-- prettier-ignore-end -->
 
 **Notes:**
-
 - **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
 - **double -> decimal**: There can be rounding differences
 - **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **float -> decimal**: There can be rounding differences
 - **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **string -> date**: Only supports years between 262143 BC and 262142 AD
 - **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
- or strings containing null bytes (e.g \\u0000)
+ or strings containing null bytes (e.g \\u0000)
 - **string -> timestamp**: Not all valid formats are supported
 <!-- END:CAST_TRY_TABLE-->
 
@@ -178,15 +191,14 @@ Cast operations in Comet fall into three levels of support:
 <!-- prettier-ignore-end -->
 
 **Notes:**
-
 - **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
 - **double -> decimal**: There can be rounding differences
 - **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **float -> decimal**: There can be rounding differences
 - **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
 - **string -> date**: Only supports years between 262143 BC and 262142 AD
 - **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
- or strings containing null bytes (e.g \\u0000)
+ or strings containing null bytes (e.g \\u0000)
 - **string -> timestamp**: ANSI mode not supported
 <!-- END:CAST_ANSI_TABLE-->
 
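
The Date and Time Functions note added in the first hunk describes an approximate compatibility range (1970-2100) for timezone-aware calculations. One way to act on that guidance is to separate values that fall outside the range before deciding which ones to route to Spark. The following is a minimal Python sketch of such a check, not part of Comet or Spark: the names `SAFE_START`, `SAFE_END`, `outside_safe_range`, and `partition_by_range` are hypothetical, and the hard bounds are an assumption, since the note only gives an approximate range.

```python
from datetime import datetime, timezone

# Assumption: the note's "approximately 1970-2100" range, treated here as
# hard bounds purely for illustration.
SAFE_START = datetime(1970, 1, 1, tzinfo=timezone.utc)
SAFE_END = datetime(2100, 1, 1, tzinfo=timezone.utc)


def outside_safe_range(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp lies outside the assumed range."""
    if ts.tzinfo is None:
        # Naive values do not involve timezone conversion, so they are
        # outside the scope of this check.
        raise ValueError("expected a timezone-aware datetime")
    return not (SAFE_START <= ts < SAFE_END)


def partition_by_range(timestamps):
    """Split timestamps into (safe, risky) lists, preserving input order."""
    safe, risky = [], []
    for ts in timestamps:
        (risky if outside_safe_range(ts) else safe).append(ts)
    return safe, risky
```

A caller could use the `risky` list to decide whether a workload contains far-future values that warrant one of the two options listed in the note. How a specific operation is actually routed back to Spark is configuration-dependent and is not shown here.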