This page is the complete reference for how Apache Comet handles each Spark built-in
expression. Comet accelerates expressions either with a native (Rust) implementation or by
dispatching to a Spark-compatible codegen path. When an expression is not supported, Comet
transparently falls back to Spark for that part of the plan; results are unaffected.
Expressions marked ✅ Supported are enabled by default and produce Spark-compatible results.
Some ✅ Supported expressions have specific incompatible cases that fall back to Spark by
default. Those cases must be opted into per expression with
spark.comet.expression.EXPRNAME.allowIncompatible=true (where EXPRNAME is the Spark
expression class name, for example Cast). There is no global opt-in.
Most expressions can also be disabled with spark.comet.expression.EXPRNAME.enabled=false, where
EXPRNAME is the Spark expression class name (for example Length or StartsWith). See the
Comet Configuration Guide for the full list.
| Status |
Meaning |
| ✅ Supported |
Comet produces Spark-compatible results by default. Some inputs or forms may fall back to Spark, and any incompatible behavior is opt-in (off by default). |
| 🔜 Planned |
Intended; tracked by an open issue or pull request. |
Comet focuses acceleration on mainstream relational, string, datetime, math, and collection
expressions. The following function families are not currently planned for native acceleration (they are not on the 1.0 roadmap): specialized functionality with narrow real-world analytics use and high implementation cost. They fall back to Spark and may be reconsidered based on demand:
- Probabilistic sketches and approximate top-k (
kll_sketch_*, hll_*, theta_*, count_min_sketch, bitmap_*, approx_top_k*): specialized data structures with exact-correctness traps.
- Geospatial (
st_*): brand-new Spark 4.1 functionality, specialized.
- Avro / Protobuf codecs (
from_avro, to_avro, from_protobuf, to_protobuf, schema_of_avro): format conversion belongs at the IO layer, not expression evaluation.
- JVM reflection (
java_method, reflect): niche, and they invoke arbitrary JVM methods (a security concern).
- UTF-8 validation (
is_valid_utf8, make_valid_utf8, validate_utf8, try_validate_utf8): niche Spark 4.x string-validation helpers.
- Miscellaneous niche (
histogram_numeric, version, sentences, quote): low-value or specialized functions with little benefit from native acceleration.
The file-metadata functions input_file_name, input_file_block_start, and input_file_block_length depend on scan-internal per-row file information rather than the expression layer; their support status is covered in the scan compatibility guide.
Note that approx_count_distinct, median, and mode are planned: they are mainstream (median and mode are exact aggregates). approx_percentile / percentile_approx are not currently planned because their approximate results cannot be made bit-identical to Spark.
The tables below list every Spark built-in expression with its current status.
| Function |
Status |
Notes |
any |
✅ |
|
any_value |
✅ |
|
approx_count_distinct |
🔜 |
tracking #4098 |
array_agg |
🔜 |
Array aggregate (related to collect_list, #2524) |
avg |
✅ |
Interval types fall back |
bit_and |
✅ |
|
bit_or |
✅ |
|
bit_xor |
✅ |
|
bool_and |
✅ |
|
bool_or |
✅ |
|
collect_list |
🔜 |
#2524 |
collect_set |
✅ |
|
corr |
✅ |
|
count |
✅ |
|
count_if |
✅ |
|
covar_pop |
✅ |
|
covar_samp |
✅ |
|
every |
✅ |
|
first |
✅ |
|
first_value |
✅ |
|
grouping |
🔜 |
Grouping indicator for ROLLUP/CUBE/GROUPING SETS |
grouping_id |
🔜 |
Grouping indicator for ROLLUP/CUBE/GROUPING SETS |
kurtosis |
🔜 |
tracking #4098 |
last |
✅ |
|
last_value |
✅ |
|
listagg |
🔜 |
String aggregation |
max |
✅ |
|
max_by |
🔜 |
#3841 |
mean |
✅ |
|
median |
🔜 |
tracking #4098 |
min |
✅ |
|
min_by |
🔜 |
#3841 |
mode |
🔜 |
#3970 |
percentile |
🔜 |
#4542 |
percentile_cont |
🔜 |
Percentile aggregate |
percentile_disc |
🔜 |
Percentile aggregate |
regr_avgx |
✅ |
Native: Spark rewrites to Average (tests in #4551) |
regr_avgy |
✅ |
Native: Spark rewrites to Average (tests in #4551) |
regr_count |
✅ |
Native: Spark rewrites to Count (tests in #4551) |
regr_intercept |
🔜 |
Falls back; can reuse covar_pop/var_pop accumulators (#4552) |
regr_r2 |
🔜 |
Falls back; can reuse the corr accumulator (#4552) |
regr_slope |
🔜 |
Falls back; can reuse covar_pop/var_pop accumulators (#4552) |
regr_sxx |
🔜 |
Falls back; can reuse var_pop accumulator (#4552) |
regr_sxy |
🔜 |
Falls back; can reuse covar_pop accumulator (#4552) |
regr_syy |
🔜 |
Falls back; can reuse var_pop accumulator (#4552) |
skewness |
🔜 |
tracking #4098 |
some |
✅ |
|
std |
✅ |
|
stddev |
✅ |
|
stddev_pop |
✅ |
|
stddev_samp |
✅ |
|
string_agg |
🔜 |
String aggregation (alias of listagg) |
sum |
✅ |
|
try_avg |
🔜 |
tracking #4098 |
try_sum |
🔜 |
tracking #4098 |
var_pop |
✅ |
|
var_samp |
✅ |
|
variance |
✅ |
|
| Function |
Status |
Notes |
array |
✅ |
|
array_append |
✅ |
|
array_compact |
✅ |
|
array_contains |
✅ |
NaN/signed-zero handling may differ (details) |
array_distinct |
✅ |
NaN/signed-zero handling may differ (details) |
array_except |
✅ |
Incompatible; falls back by default (details) |
array_insert |
✅ |
|
array_intersect |
✅ |
Incompatible; falls back by default (details) |
array_join |
✅ |
Incompatible; falls back by default (details) |
array_max |
✅ |
NaN ordering may differ (details) |
array_min |
✅ |
NaN ordering may differ (details) |
array_position |
✅ |
Binary/struct/map/null elements fall back |
array_prepend |
🔜 |
Sibling of array_append |
array_remove |
✅ |
|
array_repeat |
✅ |
|
array_union |
✅ |
NaN/signed-zero handling may differ (details) |
arrays_overlap |
✅ |
|
arrays_zip |
✅ |
|
element_at |
✅ |
MapType input falls back |
flatten |
✅ |
Binary/struct/map elements fall back |
get |
✅ |
|
sequence |
✅ |
|
shuffle |
🔜 |
Random array shuffle |
slice |
✅ |
Native (#4149) |
sort_array |
✅ |
Nested struct/null arrays fall back |
| Function |
Status |
Notes |
& |
✅ |
|
<< |
✅ |
|
>> |
✅ |
|
>>> |
✅ |
Operator alias for shiftrightunsigned (Spark 4.0+) |
^ |
✅ |
|
bit_count |
✅ |
|
bit_get |
✅ |
|
getbit |
✅ |
|
shiftright |
✅ |
|
shiftrightunsigned |
✅ |
|
| |
✅ |
|
~ |
✅ |
|
| Function |
Status |
Notes |
array_size |
✅ |
|
cardinality |
✅ |
MapType input falls back |
concat |
✅ |
Binary/array children fall back |
reverse |
✅ |
Binary-element arrays fall back (Incompatible) (details) |
size |
✅ |
MapType input falls back |
| Function |
Status |
Notes |
coalesce |
✅ |
|
if |
✅ |
|
ifnull |
✅ |
|
nanvl |
✅ |
|
nullif |
✅ |
|
nullifzero |
✅ |
Lowers to if/= (Spark 4.0+) |
nvl |
✅ |
|
nvl2 |
✅ |
|
when |
✅ |
|
zeroifnull |
✅ |
Lowers to coalesce (Spark 4.0+) |
The type-name conversion functions (bigint, binary, boolean, date, decimal, double, float, int, smallint, string, timestamp, tinyint) are SQL aliases for CAST(... AS <type>) and share the support and caveats of cast.
| Function |
Status |
Notes |
cast |
✅ |
Some casts fall back; float-to-decimal is opt-in (details) |
| Function |
Status |
Notes |
from_csv |
✅ |
|
schema_of_csv |
✅ |
|
to_csv |
✅ |
|
| Function |
Status |
Notes |
add_months |
✅ |
|
convert_timezone |
✅ |
|
curdate |
✅ |
Constant-folded to a literal (alias of current_date) |
current_date |
✅ |
Constant-folded to a literal before Comet sees the plan |
current_time |
🔜 |
Blocked on Spark 4.1 TIME type support (#4288) |
current_timestamp |
✅ |
Constant-folded to a literal before Comet sees the plan |
current_timezone |
✅ |
|
date_add |
✅ |
|
date_diff |
✅ |
|
date_format |
✅ |
|
date_from_unix_date |
✅ |
|
date_part |
✅ |
|
date_sub |
✅ |
|
date_trunc |
✅ |
|
dateadd |
✅ |
|
datediff |
✅ |
|
datepart |
✅ |
|
day |
✅ |
|
dayname |
✅ |
Abbreviated day name (Spark 4.0+) |
dayofmonth |
✅ |
|
dayofweek |
✅ |
|
dayofyear |
✅ |
|
extract |
✅ |
|
from_unixtime |
✅ |
|
from_utc_timestamp |
✅ |
Legacy zone forms fall back (Incompatible) (details) |
hour |
✅ |
|
last_day |
✅ |
|
localtimestamp |
✅ |
|
make_date |
✅ |
|
make_dt_interval |
🔜 |
#4541 |
make_interval |
🔜 |
Produces legacy CalendarInterval; tracked by #4540 |
make_time |
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
make_timestamp |
✅ |
|
make_timestamp_ltz |
✅ |
2-arg TIME form falls back |
make_timestamp_ntz |
✅ |
2-arg TIME form falls back |
make_ym_interval |
🔜 |
#4541 |
minute |
✅ |
|
month |
✅ |
|
monthname |
✅ |
Abbreviated month name (Spark 4.0+) |
months_between |
✅ |
|
next_day |
✅ |
|
now |
✅ |
Constant-folded to a literal (alias of current_timestamp) |
quarter |
✅ |
|
second |
✅ |
|
session_window |
🔜 |
Time-window grouping; tracked by #4553 |
time_diff |
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
time_trunc |
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
timestamp_micros |
✅ |
|
timestamp_millis |
✅ |
|
timestamp_seconds |
✅ |
|
to_date |
✅ |
Rewrites to Cast (or Cast(GetTimestamp) with a format) before Comet sees the plan |
to_time |
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
to_timestamp |
✅ |
Rewrites to Cast (or GetTimestamp with a format) before Comet sees the plan |
to_timestamp_ltz |
✅ |
Rewrites to to_timestamp (TimestampType) |
to_timestamp_ntz |
✅ |
Rewrites to to_timestamp (TimestampNTZType) |
to_unix_timestamp |
✅ |
|
to_utc_timestamp |
✅ |
Legacy zone forms fall back (Incompatible) (details) |
trunc |
✅ |
|
try_make_interval |
🔜 |
Produces legacy CalendarInterval; tracked by #4540 |
try_make_timestamp |
✅ |
|
try_to_date |
🔜 |
Rewrites to Cast/GetTimestamp but currently falls back; tracked by #4556 |
try_to_time |
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
try_to_timestamp |
🔜 |
Rewrites to Cast/GetTimestamp but currently falls back; tracked by #4556 |
unix_date |
✅ |
|
unix_micros |
✅ |
|
unix_millis |
✅ |
|
unix_seconds |
✅ |
|
unix_timestamp |
✅ |
|
weekday |
✅ |
|
weekofyear |
✅ |
|
window |
🔜 |
Time-window grouping; tracked by #4553 |
window_time |
🔜 |
Time-window grouping; tracked by #4553 |
year |
✅ |
|
explode and posexplode are supported via CometExplodeExec (operator-level, not
expression-level). The outer variants are wired but marked Incompatible; they require
spark.comet.exec.explode.enabled=true and allowIncompatible.
| Function |
Status |
Notes |
explode |
✅ |
via CometExplodeExec |
explode_outer |
✅ |
outer=true falls back (Incompatible) (audit) |
inline |
🔜 |
Operator-level generator (like explode) |
inline_outer |
🔜 |
Operator-level generator (like explode) |
posexplode |
✅ |
via CometExplodeExec |
posexplode_outer |
✅ |
outer=true falls back (Incompatible) (audit) |
stack |
🔜 |
Operator-level generator |
| Function |
Status |
Notes |
crc32 |
✅ |
|
hash |
✅ |
|
md5 |
✅ |
|
sha |
✅ |
|
sha1 |
✅ |
|
sha2 |
✅ |
|
xxhash64 |
✅ |
|
| Function |
Status |
Notes |
from_json |
✅ |
Falls back by default; opt-in via allowIncompatible (audit) |
get_json_object |
✅ |
Some inputs need allowIncompatible (audit) |
json_array_length |
✅ |
Single-quoted/trailing JSON needs allowIncompatible (audit) |
json_object_keys |
✅ |
|
json_tuple |
🔜 |
#3160 |
schema_of_json |
✅ |
|
to_json |
✅ |
Options and map/array inputs fall back (audit) |
| Function |
Status |
Notes |
aggregate |
✅ |
|
array_sort |
✅ |
|
exists |
✅ |
|
filter |
🔜 |
General lambda not yet wired; the array_compact form is supported (#4224) |
forall |
✅ |
|
map_filter |
✅ |
|
map_zip_with |
✅ |
|
reduce |
✅ |
|
transform |
✅ |
|
transform_keys |
✅ |
|
transform_values |
✅ |
|
zip_with |
✅ |
|
| Function |
Status |
Notes |
element_at |
✅ |
MapType input falls back |
map |
🔜 |
Constructs a map |
map_concat |
✅ |
|
map_contains_key |
✅ |
|
map_entries |
✅ |
|
map_from_arrays |
✅ |
|
map_from_entries |
✅ |
BinaryType key/value falls back (Incompatible) (details) |
map_keys |
✅ |
|
map_values |
✅ |
|
str_to_map |
✅ |
|
try_element_at |
✅ |
Lowers to element_at; array input (MapType falls back) |
| Function |
Status |
Notes |
% |
✅ |
|
* |
✅ |
Interval multiplication falls back |
+ |
✅ |
|
- |
✅ |
|
/ |
✅ |
|
abs |
✅ |
Interval types fall back |
acos |
✅ |
|
acosh |
✅ |
|
asin |
✅ |
|
asinh |
✅ |
|
atan |
✅ |
|
atan2 |
✅ |
|
atanh |
✅ |
|
bin |
✅ |
|
bround |
✅ |
|
cbrt |
✅ |
|
ceil |
✅ |
Two-arg form falls back |
ceiling |
✅ |
|
conv |
✅ |
|
cos |
✅ |
|
cosh |
✅ |
|
cot |
✅ |
|
csc |
✅ |
|
degrees |
✅ |
|
div |
✅ |
|
e |
✅ |
Folds to a literal (like pi) |
exp |
✅ |
|
expm1 |
✅ |
|
factorial |
✅ |
|
floor |
✅ |
Two-arg form falls back |
greatest |
✅ |
|
hex |
✅ |
|
hypot |
✅ |
|
least |
✅ |
|
ln |
✅ |
|
log |
✅ |
|
log10 |
✅ |
|
log1p |
✅ |
|
log2 |
✅ |
|
mod |
✅ |
|
negative |
✅ |
|
pi |
✅ |
|
pmod |
✅ |
|
positive |
✅ |
|
pow |
✅ |
|
power |
✅ |
|
radians |
✅ |
|
rand |
✅ |
|
randn |
✅ |
|
random |
✅ |
Alias for rand (Spark 4.0+); seed must be a literal |
randstr |
🔜 |
Random string (Spark 4.0+) |
rint |
✅ |
|
round |
✅ |
Float/double inputs fall back |
sec |
✅ |
|
shiftleft |
✅ |
|
sign |
✅ |
|
signum |
✅ |
|
sin |
✅ |
|
sinh |
✅ |
|
sqrt |
✅ |
|
tan |
✅ |
|
tanh |
✅ |
|
try_add |
✅ |
Datetime/interval form falls back |
try_divide |
✅ |
|
try_mod |
✅ |
|
try_multiply |
✅ |
|
try_subtract |
✅ |
|
unhex |
✅ |
|
uniform |
✅ |
Constant-folded; literal arguments only (Spark 4.0+) |
width_bucket |
✅ |
|
| Function |
Status |
Notes |
aes_decrypt |
✅ |
Routed through the JVM codegen dispatcher |
aes_encrypt |
✅ |
Routed through the JVM codegen dispatcher; nondeterministic IV by default |
assert_true |
🔜 |
Lowers to RaiseError, which falls back |
current_catalog |
✅ |
Resolved to a literal by the analyzer (ReplaceCurrentLike) |
current_database |
✅ |
Resolved to a literal by the analyzer (ReplaceCurrentLike) |
current_schema |
✅ |
Alias of current_database; resolved to a literal by the analyzer |
current_user |
✅ |
Resolved to a literal by the analyzer; same as user |
equal_null |
✅ |
Lowers to <=> (EqualNullSafe) |
is_variant_null |
🔜 |
tracking #4098 |
monotonically_increasing_id |
✅ |
|
parse_json |
🔜 |
tracking #4098 |
raise_error |
🔜 |
Raises a runtime error |
rand |
✅ |
Seed must be a literal |
randn |
✅ |
Seed must be a literal |
schema_of_variant |
🔜 |
tracking #4098 |
schema_of_variant_agg |
🔜 |
tracking #4098 |
session_user |
✅ |
Alias of current_user; resolved to a literal by the analyzer |
spark_partition_id |
✅ |
|
to_variant_object |
🔜 |
tracking #4098 |
try_aes_decrypt |
✅ |
Routed through the JVM codegen dispatcher |
try_parse_json |
🔜 |
tracking #4098 |
try_variant_get |
🔜 |
tracking #4098 |
typeof |
✅ |
Foldable; resolved to a literal before Comet sees the plan |
user |
✅ |
Resolved to a literal by the Spark analyzer before reaching Comet |
uuid |
🔜 |
Nondeterministic random UUID |
variant_get |
🔜 |
tracking #4098 |
| Function |
Status |
Notes |
! |
✅ |
|
< |
✅ |
|
<= |
✅ |
|
<=> |
✅ |
|
= |
✅ |
|
== |
✅ |
|
> |
✅ |
|
>= |
✅ |
|
and |
✅ |
|
between |
✅ |
|
ilike |
✅ |
|
in |
✅ |
|
isnan |
✅ |
|
isnotnull |
✅ |
|
isnull |
✅ |
|
like |
✅ |
|
not |
✅ |
|
or |
✅ |
|
regexp |
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
regexp_like |
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
rlike |
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
| Function |
Status |
Notes |
ascii |
✅ |
|
base64 |
🔜 |
Lowers to StaticInvoke(encode) (not allowlisted); falls back |
bit_length |
✅ |
|
btrim |
✅ |
|
char |
✅ |
|
char_length |
✅ |
|
character_length |
✅ |
|
chr |
✅ |
|
collate |
🔜 |
Spark collation (umbrella #2190) |
collation |
✅ |
Constant-folded to a literal (Spark 4.0+) |
concat_ws |
✅ |
|
contains |
✅ |
|
decode |
✅ |
|
elt |
✅ |
|
encode |
🔜 |
Lowers to StaticInvoke(encode) (not allowlisted); falls back |
endswith |
✅ |
|
find_in_set |
✅ |
|
format_number |
✅ |
|
format_string |
✅ |
|
initcap |
✅ |
|
instr |
✅ |
|
lcase |
✅ |
|
left |
✅ |
|
len |
✅ |
|
length |
✅ |
|
levenshtein |
✅ |
|
locate |
✅ |
|
lower |
✅ |
|
lpad |
✅ |
|
ltrim |
✅ |
|
luhn_check |
✅ |
Native via StaticInvoke (tests: luhn_check.sql) |
mask |
🔜 |
Data masking |
octet_length |
✅ |
|
overlay |
✅ |
|
position |
✅ |
|
printf |
✅ |
|
regexp_count |
🔜 |
tracking #4098 |
regexp_extract |
✅ |
Routed through the JVM codegen dispatcher |
regexp_extract_all |
✅ |
Routed through the JVM codegen dispatcher |
regexp_instr |
✅ |
Routed through the JVM codegen dispatcher |
regexp_replace |
✅ |
|
regexp_substr |
🔜 |
tracking #4098 |
repeat |
✅ |
|
replace |
✅ |
|
right |
✅ |
|
rpad |
✅ |
|
rtrim |
✅ |
|
soundex |
✅ |
|
space |
✅ |
|
split |
✅ |
|
split_part |
🔜 |
Lowers to element_at(StringSplitSQL(...)); StringSplitSQL falls back (#4561) |
startswith |
✅ |
|
substr |
✅ |
|
substring |
✅ |
|
substring_index |
✅ |
|
to_binary |
✅ |
Hex form accelerated; other formats fall back |
to_char |
✅ |
|
to_number |
✅ |
|
to_varchar |
✅ |
|
translate |
✅ |
Falls back by default; opt-in via allowIncompatible (#4463) |
trim |
✅ |
|
try_to_binary |
🔜 |
Lowers to TryEval(...), which falls back |
try_to_number |
🔜 |
TRY variant of to_number |
ucase |
✅ |
|
unbase64 |
✅ |
|
upper |
✅ |
|
| Function |
Status |
Notes |
named_struct |
✅ |
Duplicate field names fall back |
struct |
✅ |
|
| Function |
Status |
Notes |
parse_url |
✅ |
|
try_url_decode |
✅ |
|
url_decode |
✅ |
|
url_encode |
✅ |
|
Window functions run via CometWindowExec. Window support is disabled by default due to known
correctness issues (tracking #2721).
When enabled, lag and lead are explicitly wired; aggregate window functions (count, min,
max, sum) are also supported. Ranking functions (rank, dense_rank, row_number,
ntile, percent_rank, cume_dist, nth_value) are not yet wired in the window serde and
fall back to Spark.
| Function |
Status |
Notes |
cume_dist |
🔜 |
Window function; tracked by #2721 |
dense_rank |
🔜 |
Window function; tracked by #2721 |
lag |
✅ |
via CometWindowExec |
lead |
✅ |
via CometWindowExec |
nth_value |
🔜 |
Window function; tracked by #2721 |
ntile |
🔜 |
Window function; tracked by #2721 |
percent_rank |
🔜 |
Window function; tracked by #2721 |
rank |
🔜 |
Window function; tracked by #2721 |
row_number |
🔜 |
Window function; tracked by #2721 |
| Function |
Status |
Notes |
from_xml |
✅ |
Spark 4.0+ |
schema_of_xml |
✅ |
Spark 4.0+ |
to_xml |
✅ |
Spark 4.0+ |
xpath |
✅ |
|
xpath_boolean |
✅ |
|
xpath_double |
✅ |
|
xpath_float |
✅ |
|
xpath_int |
✅ |
|
xpath_long |
✅ |
|
xpath_number |
✅ |
Alias of xpath_double |
xpath_short |
✅ |
|
xpath_string |
✅ |
|
Comet also accelerates a number of Catalyst expressions that have no Spark SQL function name and therefore do not appear in the tables above. These arise from the DataFrame API, from SQL syntax other than function calls, or from the query optimizer. They include:
- Operator and optimizer-injected expressions: runtime bloom-filter join probes (
BloomFilterMightContain, BloomFilterAggregate), optimized IN sets (InSet), scalar subqueries (ScalarSubquery), and floating-point normalization (KnownFloatingPointNormalized).
- Accessor expressions (subscript and field access, not functions): struct field access (
col.field), array element access (arr[i]), and map value access (map[key]).
- Internal decimal arithmetic:
CheckOverflow, MakeDecimal, and UnscaledValue, which the analyzer inserts around decimal operations.
- User-defined functions: Scala UDFs registered through the DataFrame or SQL API.
- Structural expressions: aliases, attribute references, literals, sort orders, and
CASE WHEN.
This list is illustrative, not exhaustive: the per-function tables are not the complete set of expressions Comet can accelerate.