docs/source/contributor-guide/expression-audits/json_funcs.md (6 additions, 0 deletions)
@@ -33,6 +33,12 @@
 - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - Known incompatibility: Spark accepts single-quoted JSON and unescaped control characters; Comet's native parser (built on `serde_json`) rejects both, so those inputs require `spark.comet.expression.GetJsonObject.allowIncompatible=true` and may still produce different results. Non-default Spark 4.0 string collations are not propagated (https://github.com/apache/datafusion-comet/issues/2190).
 
+## json_array_length
+
+- `LengthOfJsonArray`: `UnaryExpression with ExpectsInputTypes with CodegenFallback`; `inputTypes = Seq(StringType) -> IntegerType`. Returns NULL for NULL input, invalid JSON, or non-array JSON; otherwise the number of top-level array elements.
+- Runs through the codegen dispatcher by default for byte-exact Spark compatibility.
+- Known incompatibility: the native path (built on `serde_json`) requires strict JSON, so single-quoted JSON, unescaped control characters, and trailing content require `spark.comet.expression.LengthOfJsonArray.allowIncompatible=true` and may still produce different results.
+
 ## to_json
 
 - Partial native support; options and map/array inputs fall back.
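The `json_array_length` behaviour described in the hunk above can be illustrated with a small sketch. It is not Comet's or Spark's implementation: it uses Python's standard `json` module as a stand-in for a strict parser, and the function name `strict_json_array_length` is hypothetical. Python's parser is not identical to `serde_json` (for example, it accepts `NaN`), so treat this only as a model of the documented NULL rules and of the three strict-JSON gaps.

```python
import json
from typing import Optional


def strict_json_array_length(text: Optional[str]) -> Optional[int]:
    """Model of the documented semantics, with None standing in for SQL NULL.

    Assumption: Python's json module plays the role of a strict JSON parser.
    """
    if text is None:
        return None  # NULL input -> NULL
    try:
        value = json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a ValueError subclass; invalid JSON -> NULL
        return None
    if not isinstance(value, list):
        return None  # valid JSON that is not an array -> NULL
    return len(value)  # only top-level elements are counted


# Well-formed inputs
print(strict_json_array_length('[1, [2, 3], {"a": [4, 5]}]'))  # nested items count once
print(strict_json_array_length('{"a": 1}'))                    # non-array JSON

# Inputs in the documented compatibility gaps: a strict parser rejects them
print(strict_json_array_length("['a', 'b']"))      # single-quoted JSON
print(strict_json_array_length('["a\tb"]'))        # unescaped control character (raw tab)
print(strict_json_array_length('[1, 2] trailing')) # trailing content
```

In this sketch all three gap inputs come back as `None`, which is the kind of divergence from a more lenient parser that the `allowIncompatible` flag asks users to accept.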
 | `get_json_object` | Supported, with gaps on single-quoted JSON and unescaped control characters | `spark.comet.expression.GetJsonObject.allowIncompatible` |
 | `to_json` | Supported for struct inputs only, no options | `spark.comet.expression.StructsToJson.allowIncompatible` |
+| `json_array_length` | Supported, with gaps on single-quoted JSON, unescaped control characters, and trailing content | `spark.comet.expression.LengthOfJsonArray.allowIncompatible` |
+
+When the native path is enabled but an expression or input case has no native
+implementation (for example `to_json` with map or array inputs, or `from_json`
+with an unsupported schema), Comet falls back to the codegen dispatcher for that
+case.
+
+## When to use the native path
+
+- You want the faster native path and your inputs avoid the known compatibility
+  gaps above.
+- Enable it per expression, for example
+  `spark.comet.expression.GetJsonObject.allowIncompatible=true`. Cases the native path
+  does not cover still fall back to the codegen dispatcher.
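As a rough illustration of the per-expression opt-in, the sketch below turns the three config keys from the table into `--conf key=value` arguments for `spark-submit`. The helper names `native_opt_in_conf` and `as_spark_submit_args` are hypothetical and exist only for this example; the keys themselves are copied from the table, and nothing here is part of Comet.

```python
from typing import Dict, Iterable, List

# Config keys copied from the table above, indexed by SQL function name.
ALLOW_INCOMPATIBLE_KEYS: Dict[str, str] = {
    "get_json_object": "spark.comet.expression.GetJsonObject.allowIncompatible",
    "to_json": "spark.comet.expression.StructsToJson.allowIncompatible",
    "json_array_length": "spark.comet.expression.LengthOfJsonArray.allowIncompatible",
}


def native_opt_in_conf(functions: Iterable[str]) -> Dict[str, str]:
    """Return the settings that opt the named functions into the native path."""
    return {ALLOW_INCOMPATIBLE_KEYS[name]: "true" for name in functions}


def as_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a settings dict into spark-submit style `--conf key=value` pairs."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


conf = native_opt_in_conf(["get_json_object", "json_array_length"])
print(" ".join(as_spark_submit_args(conf)))
```

Keeping the opt-in list explicit per function mirrors the guidance above: only expressions whose inputs avoid the known gaps get the flag, and everything else stays on the default path.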