|
106 | 106 | - [ ] percentile_approx |
107 | 107 | - [ ] percentile_cont |
108 | 108 | - [ ] percentile_disc |
109 | | -- [ ] regr_avgx |
110 | | -- [ ] regr_avgy |
111 | | -- [ ] regr_count |
| 109 | +- [x] regr_avgx |
| 110 | +- [x] regr_avgy |
| 111 | +- [x] regr_count |
112 | 112 | - [ ] regr_intercept |
113 | 113 | - [ ] regr_r2 |
114 | 114 | - [ ] regr_slope |
|
243 | 243 | - Spark 4.1.1 (audited 2026-05-27): `inputTypes` tightened to `Seq(ArrayType, IntegralType)` (analysis-time only); runtime unchanged. |
244 | 244 | - [ ] sequence |
245 | 245 | - [ ] shuffle |
246 | | -- [ ] slice |
| 246 | +- [x] slice |
247 | 247 | - [x] sort_array |
248 | 248 | - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8. |
249 | 249 | - Spark 3.5.8 (audited 2026-05-27): baseline. `SortArray(base, ascendingOrder) extends BinaryExpression with ArraySortLike`; the second arg must be a `Literal(_: Boolean, BooleanType)`. Comet `CometSortArray` flags `Incompatible` under strict floating-point and falls back for nested arrays whose innermost element is `Struct` or `Null`. |
|
267 | 267 | - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3. |
268 | 268 | - Spark 4.0.1 (audited 2026-05-27): `>>` parses to `ShiftRight`. Comet `CometShiftRight` mirrors the same operand-cast logic as `CometShiftLeft`. |
269 | 269 | - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1. |
270 | | -- [ ] `>>>` |
| 270 | +- [x] `>>>` |
271 | 271 | - [x] `^` |
272 | 272 | - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8. |
273 | 273 | - Spark 3.5.8 (audited 2026-05-27): baseline. `BitwiseXor(left, right) extends BinaryArithmetic` over `IntegralType`. Comet routes via `CometBitwiseXor` to the proto's `bitwise_xor` binary expression. |
|
311 | 311 |
|
312 | 312 | ### collection_funcs |
313 | 313 |
|
314 | | -- [ ] array_size |
315 | | -- [ ] cardinality |
| 314 | +- [x] array_size |
| 315 | + - Native via `size`; returns -1 instead of NULL for NULL input ([#4560](https://github.com/apache/datafusion-comet/issues/4560)). |
| 316 | +- [x] cardinality |
316 | 317 | - [x] concat |
317 | 318 | - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8. |
318 | 319 | - Spark 3.5.8 (audited 2026-05-27): baseline. `Concat(children) extends ComplexTypeMergingExpression with QueryErrorsBase`; `allowedTypes = Seq(StringType, BinaryType, ArrayType)`; result type is the merged child type. Empty children is allowed and returns the empty string of the result type. |
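The `array_size` note above describes a null-handling divergence. A minimal Python model of the two behaviours, assuming plain lists stand in for array columns and `None` for SQL NULL, might look like this. The function names `array_size` and `size_legacy` are hypothetical and exist only for illustration; this is not Comet or Spark code.

```python
from typing import Optional, Sequence


def array_size(arr: Optional[Sequence]) -> Optional[int]:
    """Expected `array_size` behaviour: NULL input yields NULL."""
    return None if arr is None else len(arr)


def size_legacy(arr: Optional[Sequence]) -> int:
    """Behaviour described in the note for the `size` path: NULL input yields -1."""
    return -1 if arr is None else len(arr)


if __name__ == "__main__":
    for value in ([1, 2, 3], [], None):
        print(value, array_size(value), size_legacy(value))
```

The two functions agree on every non-NULL input, so the difference only shows up when the array column itself is NULL.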
|
355 | 356 | - Spark 3.5.8 (audited 2026-05-27): `NullIf(left, right, replacement) extends RuntimeReplaceable with InheritAnalysisRules`; the analyzer rewrites to `If(EqualTo(left, right), Literal(null, left.dataType), left)`. Comet handles via `CometIf` plus `CometEqualTo`. |
356 | 357 | - Spark 4.0.1 (audited 2026-05-27): identical to 3.5.8. |
357 | 358 | - Spark 4.1.1 (audited 2026-05-27): identical to 3.5.8. |
358 | | -- [ ] nullifzero |
| 359 | +- [x] nullifzero |
359 | 360 | - [x] nvl |
360 | 361 | - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8. |
361 | 362 | - Spark 3.5.8 (audited 2026-05-27): `Nvl(left, right, replacement) extends RuntimeReplaceable`; analyzer rewrites to `Coalesce(Seq(left, right))`. Comet handles via `CometCoalesce`. |
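The `NullIf` and `Nvl` rewrites described above can be sketched in Python under the assumption that `None` models SQL NULL. The helper names below are hypothetical; the sketch only mirrors the rewritten expression shapes (`If(EqualTo(...), NULL, left)` and `Coalesce(Seq(left, right))`), not the Spark or Comet implementations.

```python
from typing import Any, Optional


def equal_to(left: Any, right: Any) -> Optional[bool]:
    """SQL `=`: the result is NULL (None) if either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right


def nullif(left: Any, right: Any) -> Any:
    """If(EqualTo(left, right), NULL, left); a NULL condition takes the else branch."""
    return None if equal_to(left, right) is True else left


def coalesce(*values: Any) -> Any:
    """Return the first non-NULL argument, or NULL if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


def nvl(left: Any, right: Any) -> Any:
    """Coalesce(Seq(left, right))."""
    return coalesce(left, right)
```

Because the condition is three-valued, `nullif(1, None)` returns `1` rather than NULL: the comparison evaluates to NULL and `If` falls through to its else branch.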
|
371 | 372 | - Spark 3.5.8 (audited 2026-05-27): the `CASE WHEN ... THEN ...` SQL form lowers to `CaseWhen(branches: Seq[(Expression, Expression)], elseValue: Option[Expression])`. Spark evaluates left-to-right with short-circuit; result type is the merged branch type. Comet routes via `CometCaseWhen` to the native `CaseWhen` proto. |
372 | 373 | - Spark 4.0.1 (audited 2026-05-27): adds the `withNewAlwaysEvaluatedInputs` optimizer hook; semantics unchanged. |
373 | 374 | - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1. |
374 | | -- [ ] zeroifnull |
| 375 | +- [x] zeroifnull |
375 | 376 |
|
376 | 377 | ### conversion_funcs |
377 | 378 |
|
|
486 | 487 | - Known divergence: Comet's native timezone parser does not accept Spark's legacy zone forms (`GMT+1`, `UTC+1`, three-letter abbreviations like `PST`). Such timezones throw a native parse error at execution. |
487 | 488 | - [x] trunc |
488 | 489 | - [ ] try_make_interval |
489 | | -- [ ] try_make_timestamp |
| 490 | +- [x] try_make_timestamp |
| 491 | + - Native for valid inputs; returns wrong values for invalid inputs instead of NULL ([#4554](https://github.com/apache/datafusion-comet/issues/4554)). |
490 | 492 | - [ ] try_to_date |
491 | 493 | - [ ] try_to_time |
492 | 494 | - [ ] try_to_timestamp |
|
505 | 507 |
|
506 | 508 | - [x] explode |
507 | 509 | - Handled at the operator level as a `GenerateExec` (`CometExplodeExec`), not via the expression serde maps, so it is not auto-detected by the function-registry checkbox logic. Compatible for array inputs; map inputs fall back ([#2837](https://github.com/apache/datafusion-comet/issues/2837)). |
508 | | -- [ ] explode_outer |
| 510 | +- [x] explode_outer |
509 | 511 | - Same `CometExplodeExec` path as `explode`, but the `outer=true` case is `Incompatible` (empty arrays are not preserved as null outputs) and falls back unless `spark.comet.expr.allowIncompatible=true` ([datafusion#19053](https://github.com/apache/datafusion/issues/19053)). |
510 | 512 | - [ ] inline |
511 | 513 | - [ ] inline_outer |
512 | 514 | - [x] posexplode |
513 | 515 | - Handled at the operator level as a `GenerateExec` (`CometExplodeExec`), like `explode`. Compatible for array inputs; map inputs fall back ([#2837](https://github.com/apache/datafusion-comet/issues/2837)). |
514 | | -- [ ] posexplode_outer |
| 516 | +- [x] posexplode_outer |
515 | 517 | - Same `CometExplodeExec` path as `posexplode`, but the `outer=true` case is `Incompatible` and falls back unless `spark.comet.expr.allowIncompatible=true` ([datafusion#19053](https://github.com/apache/datafusion/issues/19053)). |
516 | 518 | - [ ] stack |
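The `outer=true` incompatibility noted for `explode_outer` and `posexplode_outer` concerns rows whose array is empty or NULL. A small Python model of the expected Spark semantics, assuming rows are `(id, array)` tuples and `None` stands for NULL, is sketched below. It is an illustration only, not the `CometExplodeExec` code path.

```python
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, Optional[List[int]]]  # (id, array column)


def explode(rows: List[Row]) -> Iterator[Tuple[int, Optional[int]]]:
    """Emit one output row per element; empty or NULL arrays produce no rows."""
    for row_id, arr in rows:
        for item in arr or []:
            yield (row_id, item)


def explode_outer(rows: List[Row]) -> Iterator[Tuple[int, Optional[int]]]:
    """Like explode, but an empty or NULL array still produces one row with NULL."""
    for row_id, arr in rows:
        if not arr:
            yield (row_id, None)
        else:
            for item in arr:
                yield (row_id, item)
```

For input rows `(1, [10, 20])`, `(2, [])`, and `(3, None)`, `explode` keeps only row 1, while `explode_outer` also emits `(2, None)` and `(3, None)`. Those extra rows are the ones the note says are not preserved natively.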
517 | 519 |
|
|
556 | 558 |
|
557 | 559 | ### json_funcs |
558 | 560 |
|
559 | | -- [ ] from_json |
| 561 | +- [x] from_json |
| 562 | + - Partial native support, marked `Incompatible` (requires explicit schema). |
560 | 563 | - [x] get_json_object |
561 | 564 | - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8. |
562 | 565 | - Spark 3.5.8 (audited 2026-05-27): baseline. `BinaryExpression with ExpectsInputTypes with CodegenFallback`; `inputTypes = Seq(StringType, StringType) -> StringType`. Eval is inline and uses Jackson with `RawStyle` output. Foldable paths are parsed once. Returns NULL for invalid JSON, missing paths, or `JsonProcessingException`. |
|
567 | 570 | - [ ] json_object_keys |
568 | 571 | - [ ] json_tuple |
569 | 572 | - [ ] schema_of_json |
570 | | -- [ ] to_json |
| 573 | +- [x] to_json |
| 574 | + - Partial native support; options and map/array inputs fall back. |
571 | 575 |
|
572 | 576 | ### lambda_funcs |
573 | 577 |
|
|
629 | 633 | - Spark 3.5.8 (audited 2026-05-27): baseline. `StringToMap(text, pairDelim, keyValueDelim) extends TernaryExpression`; splits `text` on `pairDelim`, then each pair on `keyValueDelim` (default `","` and `":"`). Uses `ArrayBasedMapBuilder` for duplicate-key handling. Wired as `CometScalarFunction("str_to_map")`. |
630 | 634 | - Spark 4.0.1 (audited 2026-05-27): `inputTypes` widened to `StringTypeNonCSAICollation`; uses `CollationAwareUTF8String.splitSQL` with a `collationId`. Runtime unchanged for `UTF8_BINARY`. |
631 | 635 | - Spark 4.1.1 (audited 2026-05-27): adds the `legacySplitTruncate` flag (driven by `spark.sql.legacy.truncateForEmptyRegexSplit`) to both `splitSQL` calls ([#4477](https://github.com/apache/datafusion-comet/issues/4477)). The Comet native impl does not honour this flag; behaviour matches the non-legacy default. |
632 | | -- [ ] try_element_at |
| 636 | +- [x] try_element_at |
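The two-stage split described for `StringToMap` (first on `pairDelim`, then each pair on `keyValueDelim`) can be sketched roughly as follows. This is a simplified model under several assumptions: it uses Python's `re` module, whose regex dialect differs from Java's; the `last_win` parameter is a hypothetical stand-in for a duplicate-key policy; and a pair without a delimiter is assumed to map its key to NULL.

```python
import re
from typing import Dict, Optional


def str_to_map(
    text: str,
    pair_delim: str = ",",
    key_value_delim: str = ":",
    last_win: bool = False,  # hypothetical switch for duplicate-key handling
) -> Dict[str, Optional[str]]:
    """Split `text` into pairs, then split each pair into key and value."""
    result: Dict[str, Optional[str]] = {}
    for pair in re.split(pair_delim, text):
        # Split at most once so the value may itself contain the delimiter.
        parts = re.split(key_value_delim, pair, maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) == 2 else None
        if key in result and not last_win:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result
```

For example, `str_to_map("a:1,b:2,c")` yields keys `a`, `b`, and `c`, with `c` mapped to `None` in this model.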
633 | 637 |
|
634 | 638 | ### math_funcs |
635 | 639 |
|
|
681 | 685 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): `UnaryMathExpression(math.toDegrees, "DEGREES")` unchanged across versions. |
682 | 686 | - [x] div |
683 | 687 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): `IntegralDivide(left, right, evalMode)`. Non-decimal operands are cast to `DecimalType(19, 0)`; result is recomputed per `IntegralDivide.resultDecimalType`, wrapped in `CheckOverflow`, then cast to `Long`. ANSI overflow for `Long.MinValue div -1` and decimal-overflow ANSI cases are covered by existing tests. |
684 | | -- [ ] e |
| 688 | +- [x] e |
| 689 | + - Foldable; rewritten to a literal by `ConstantFolding` (like `pi`). |
685 | 690 | - [x] exp |
686 | 691 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): `UnaryMathExpression(StrictMath.exp, "EXP")` unchanged. ULP-level differences vs DataFusion `exp` are possible but unflagged. |
687 | 692 | - [x] expm1 |
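The `div` entry above mentions the ANSI overflow case `Long.MinValue div -1`. A minimal Python sketch of 64-bit integral division in ANSI mode, assuming truncation toward zero and an error on overflow or on a zero divisor, is shown below. The function name and exception types are illustrative choices; the sketch does not reproduce the `DecimalType(19, 0)` cast path.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def integral_divide_ansi(left: int, right: int) -> int:
    """Divide two 64-bit integers, truncating toward zero; raise on overflow."""
    if right == 0:
        raise ZeroDivisionError("division by zero")
    # Python's // floors, so divide magnitudes and restore the sign afterwards.
    quotient = abs(left) // abs(right)
    if (left < 0) != (right < 0):
        quotient = -quotient
    if not LONG_MIN <= quotient <= LONG_MAX:
        raise OverflowError("result does not fit in a 64-bit long")
    return quotient
```

The overflow arises because the mathematical result of `LONG_MIN / -1` is `2**63`, one more than `LONG_MAX`.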
|
729 | 734 | - See `misc_funcs / rand`. |
730 | 735 | - [x] randn |
731 | 736 | - See `misc_funcs / randn`. |
732 | | -- [ ] random |
| 737 | +- [x] random |
733 | 738 | - [ ] randstr |
734 | 739 | - [x] rint |
735 | 740 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): `UnaryMathExpression(math.rint, "ROUND")` with `funcName = "rint"`. Passthrough to DataFusion `rint` (round-half-to-even). |
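The round-half-to-even behaviour noted for `rint` can be illustrated with a short Python sketch. It relies on the fact that Python's built-in `round` also rounds ties to the even neighbour; the NaN, infinity, and signed-zero handling is added here as an assumption about how an IEEE-style `rint` behaves, not taken from the Comet source.

```python
import math


def rint(x: float) -> float:
    """Round to the nearest integral value, with ties going to the even neighbour."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaN and infinities pass through unchanged
    # copysign keeps the sign of the input when the result is zero (e.g. -0.5 -> -0.0)
    return math.copysign(float(round(x)), x)
```

Ties such as `0.5`, `2.5`, and `-2.5` round to `0.0`, `2.0`, and `-2.0` respectively, while `1.5` rounds to `2.0`.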
|
764 | 769 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): rewrites to `Subtract(.., EvalMode.TRY)`. Integer path uses `checked_sub`; decimal uses `WideDecimalBinaryExpr` as needed. |
765 | 770 | - [x] unhex |
766 | 771 | - Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (audited 2026-05-27): `Unhex(child, failOnError)`. Spark 4.x widens input to `StringTypeWithCollation` and wraps the inner call in try/catch; Comet `CometUnhex` forwards `failOnError` to native `spark_unhex` but does not gate on collation. |
767 | | -- [ ] uniform |
| 772 | +- [x] uniform |
768 | 773 | - [x] width_bucket |
769 | 774 | - Spark 3.5.8 (audited 2026-05-27): introduced; not available in 3.4.3. |
770 | 775 | - Spark 4.0.1, 4.1.1 (audited 2026-05-27): same semantics; `NullIntolerant` -> `nullIntolerant: Boolean` refactor. |
|
782 | 787 | - [ ] bitmap_construct_agg |
783 | 788 | - [ ] bitmap_count |
784 | 789 | - [ ] bitmap_or_agg |
785 | | -- [ ] current_catalog |
786 | | -- [ ] current_database |
787 | | -- [ ] current_schema |
788 | | -- [ ] current_user |
789 | | -- [ ] equal_null |
| 790 | +- [x] current_catalog |
| 791 | + - Resolved to a literal by the analyzer (`ReplaceCurrentLike`). |
| 792 | +- [x] current_database |
| 793 | + - Resolved to a literal by the analyzer (`ReplaceCurrentLike`). |
| 794 | +- [x] current_schema |
| 795 | + - Alias of `current_database`; resolved to a literal by the analyzer. |
| 796 | +- [x] current_user |
| 797 | + - Resolved to a literal by the analyzer; same as `user`. |
| 798 | +- [x] equal_null |
790 | 799 | - [ ] from_avro |
791 | 800 | - [ ] from_protobuf |
792 | 801 | - [ ] hll_sketch_estimate |
|
819 | 828 | - [ ] schema_of_avro |
820 | 829 | - [ ] schema_of_variant |
821 | 830 | - [ ] schema_of_variant_agg |
822 | | -- [ ] session_user |
| 831 | +- [x] session_user |
| 832 | + - Alias of `current_user`; resolved to a literal by the analyzer. |
823 | 833 | - [x] spark_partition_id |
824 | 834 | - Spark 3.4.3 (audited 2026-05-27): byte-for-byte identical to 4.1.1. `SparkPartitionID() extends LeafExpression with Nondeterministic`; returns the integer index of the partition being processed. Comet emits an empty `SparkPartitionId` proto. |
825 | 835 | - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3. |
|
841 | 851 | - [ ] try_parse_json |
842 | 852 | - [ ] try_reflect |
843 | 853 | - [ ] try_variant_get |
844 | | -- [ ] typeof |
| 854 | +- [x] typeof |
| 855 | + - Foldable; resolved to a literal before Comet sees the plan. |
845 | 856 | - [x] user |
846 | 857 | - Spark 3.4.3 (audited 2026-05-27): `CurrentUser() extends LeafExpression with Unevaluable`; the `ReplaceCurrentLike` rule replaces it with a `StringType` literal of the current user name before Comet sees the plan. No Comet serde needed; the literal flows through `CometLiteral`. |
847 | 858 | - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3. |
|
940 | 951 | - Spark 3.5.8 (audited 2026-05-27): baseline. `Or(left, right) extends BinaryOperator with Predicate`; short-circuit left-to-right. Comet routes via `CometOr` to the proto's `or` binary expression. |
941 | 952 | - Spark 4.0.1 (audited 2026-05-27): semantics unchanged. |
942 | 953 | - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1. |
943 | | -- [ ] regexp |
944 | | -- [ ] regexp_like |
| 954 | +- [x] regexp |
| 955 | +- [x] regexp_like |
945 | 956 | - [x] rlike |
946 | 957 | - See `string_funcs / regexp_replace` and the `CometRLike` notes (audited in PR #4461). Uses the Rust `regex` crate, which differs from Java's `Pattern` engine; requires `spark.comet.expression.regexp.allowIncompatible=true`. |
947 | 958 |
|
|
956 | 967 | - [x] character_length |
957 | 968 | - [x] chr |
958 | 969 | - [ ] collate |
959 | | -- [ ] collation |
| 970 | +- [x] collation |
960 | 971 | - [x] concat_ws |
961 | 972 | - [x] contains |
962 | 973 | - [x] decode |
|
978 | 989 | - [x] lower |
979 | 990 | - [x] lpad |
980 | 991 | - [x] ltrim |
981 | | -- [ ] luhn_check |
| 992 | +- [x] luhn_check |
982 | 993 | - [ ] make_valid_utf8 |
983 | 994 | - [ ] mask |
984 | 995 | - [x] octet_length |
|
1006 | 1017 | - [x] substr |
1007 | 1018 | - [x] substring |
1008 | 1019 | - [x] substring_index |
1009 | | -- [ ] to_binary |
| 1020 | +- [x] to_binary |
1010 | 1021 | - [ ] to_char |
1011 | 1022 | - [ ] to_number |
1012 | 1023 | - [ ] to_varchar |
|
1052 | 1063 |
|
1053 | 1064 | - [ ] cume_dist |
1054 | 1065 | - [ ] dense_rank |
1055 | | -- [ ] lag |
1056 | | -- [ ] lead |
| 1066 | +- [x] lag |
| 1067 | + - Supported via `CometWindowExec` (operator level), not the expression serde. |
| 1068 | +- [x] lead |
| 1069 | + - Supported via `CometWindowExec` (operator level), not the expression serde. |
1057 | 1070 | - [ ] nth_value |
1058 | 1071 | - [ ] ntile |
1059 | 1072 | - [ ] percent_rank |
|