Commit 6b7d521

chore(audit): audit json expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 (apache#4470)
1 parent bedd6b7 commit 6b7d521

2 files changed

Lines changed: 10 additions & 6 deletions

docs/source/contributor-guide/spark_expressions_support.md

Lines changed: 5 additions & 0 deletions
@@ -511,6 +511,11 @@
 
 - [ ] from_json
 - [x] get_json_object
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `BinaryExpression with ExpectsInputTypes with CodegenFallback`; `inputTypes = Seq(StringType, StringType) -> StringType`. Eval is inline and uses Jackson with `RawStyle` output. Foldable paths are parsed once. Returns NULL for invalid JSON, missing paths, or `JsonProcessingException`.
+  - Spark 4.0.1 (audited 2026-05-27): the eval is extracted into a `GetJsonObjectEvaluator` helper (no behaviour change). The trait set now mixes in `DefaultStringProducingExpression`, and `inputTypes` is widened to `StringTypeWithCollation(supportsTrimCollation = true)` for both arguments.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
+  - Known incompatibility: Spark accepts single-quoted JSON and unescaped control characters; Comet's native parser (built on `serde_json`) rejects both, so those inputs require `spark.comet.expression.GetJsonObject.allowIncompatible=true` and may still produce different results. Non-default Spark 4.0 string collations are not propagated (https://github.com/apache/datafusion-comet/issues/2190).
 - [ ] json_array_length
 - [ ] json_object_keys
 - [ ] json_tuple
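
To make the known-incompatibility note concrete, here is a minimal sketch of a pre-check that flags JSON text using either of the two lenient constructs the note names: single-quoted strings and unescaped control characters inside strings. `LenientJsonCheck` and `findLenientConstruct` are hypothetical names invented for this illustration; they are not part of Spark or Comet, and the scan is a heuristic rather than a full JSON parser.

```scala
// Hypothetical helper (not a Spark or Comet API): reports the first lenient
// construct found in a JSON string, or None if neither construct appears.
object LenientJsonCheck {
  sealed trait Finding
  case object SingleQuote extends Finding
  case object UnescapedControlChar extends Finding

  def findLenientConstruct(json: String): Option[Finding] = {
    var inString = false // currently inside a double-quoted string
    var escaped = false  // previous character was a backslash inside a string
    var i = 0
    while (i < json.length) {
      val c = json.charAt(i)
      if (inString) {
        // Strict JSON forbids raw characters below U+0020 inside strings.
        if (c < 0x20) return Some(UnescapedControlChar)
        else if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else {
        if (c == '"') inString = true
        // Outside a double-quoted string, an apostrophe can only be a
        // single-quoted string delimiter, which strict JSON does not allow.
        else if (c == '\'') return Some(SingleQuote)
      }
      i += 1
    }
    None
  }
}
```

A check like this only identifies inputs that fall into the documented incompatible class; it says nothing about what either engine returns for them.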

spark/src/main/scala/org/apache/comet/serde/strings.scala

Lines changed: 5 additions & 6 deletions
@@ -443,15 +443,14 @@ object CometStringSplit extends CometExpressionSerde[StringSplit] {
 
 object CometGetJsonObject extends CometExpressionSerde[GetJsonObject] {
 
-  override def getIncompatibleReasons(): Seq[String] = Seq(
+  private val incompatReason =
     "Spark allows single-quoted JSON and unescaped control characters which Comet does not" +
-      " support")
+      " support"
+
+  override def getIncompatibleReasons(): Seq[String] = Seq(incompatReason)
 
   override def getSupportLevel(expr: GetJsonObject): SupportLevel =
-    Incompatible(
-      Some(
-        "Spark allows single-quoted JSON and unescaped control characters " +
-          "which Comet does not support"))
+    Incompatible(Some(incompatReason))
 
   override def convert(
       expr: GetJsonObject,
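
The Scala change replaces two copies of the same message with one shared value. Below is a self-contained sketch of that pattern. `SupportLevel`, `Incompatible`, and `ExpressionSerdeSketch` are simplified stand-ins declared here so the snippet compiles on its own; they are assumptions for illustration, not Comet's real definitions, and `getSupportLevel` drops its expression parameter for brevity.

```scala
// Stand-in types (assumed shapes, not Comet's actual classes).
sealed trait SupportLevel
final case class Incompatible(notes: Option[String] = None) extends SupportLevel

trait ExpressionSerdeSketch {
  def getIncompatibleReasons(): Seq[String] = Seq.empty
  def getSupportLevel(): SupportLevel
}

object GetJsonObjectSketch extends ExpressionSerdeSketch {
  // Single source of truth for the message, as in the diff above.
  private val incompatReason =
    "Spark allows single-quoted JSON and unescaped control characters which Comet does not" +
      " support"

  // Both overrides reuse the same value, so the two messages cannot drift apart.
  override def getIncompatibleReasons(): Seq[String] = Seq(incompatReason)

  override def getSupportLevel(): SupportLevel =
    Incompatible(Some(incompatReason))
}
```

With the literal defined once, any later rewording of the reason is picked up by both the documented reasons and the reported support level.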
