[SPARK-57641] Add variant_insert expression#56703
Conversation
variant_insert expressionvariant_insert expression
harshmotw-db
left a comment
There was a problem hiding this comment.
Thanks for the feature! I have gone through the scala implementation and it looks good to me from an implementation PoV. So far, my only point of contention is whether we want to implicitly cast structs/maps to variant using a to_variant_object transformation which is implicitly lossy. Tagging @srielau to think about this.
Alternatives I can think of are:
- Force users to pass variant as the input argument. They could cast it themselves
- Allow users to pass any data that doesn't contain structs/maps and if they want to use the
to_variant_objecttransformation, they could do that manually and pass variant as the inserted argument instead.
I will go over the Python code and the tests tomorrow
| result | ||
| } else if (value.dataType == NullType) { | ||
| TypeCheckResult.TypeCheckSuccess | ||
| } else if (!VariantGet.checkDataType(value.dataType, allowStructsAndMaps = true)) { |
There was a problem hiding this comment.
It seems we are implicitly allowing struct/map -> variant cast here (for which we use to_variant_object in regular SQL)? If we are sure about this, we should document that structs and maps are transformed into variant using the to_variant_object transformation.
There was a problem hiding this comment.
This is a good point, out of the two proposals I am fine with the second one (I wouldn't force users to use variant input since it would require an explicit cast even for plain scalars - e.g. variant_insert(v, '$.a', cast(2 AS VARIANT)), which is an unnecessary degradation of user experience).
I defined value parameter as "any value castable to variant". If the consensus is to consider value castable only if it satisfies cast operator constraints (link) I am happy to make the change. Otherwise, documenting the struct/map behaviour makes sense.
There was a problem hiding this comment.
Let's stick to orthodoxy here... that object problem will need to be solved eventually. But maybe variant_insert is not where we want to start.
Let's go with castable-only.
What changes were proposed in this pull request?
Adds the SQL function
variant_insert(v, path, val), which inserts a value into a Variant value at a single JSONPath location.Details:
$.b): adds a new field; errors withVARIANT_DUPLICATE_KEYif the key already exists;
$[N]): inserts at indexN, shifting later elements right; anindex past the end pads the gap with variant nulls;
VARIANT_PATH_TYPE_MISMATCHis raised when a path segmentis applied to an incompatible value (e.g. an object key on an array); the root
path
$is rejected withINVALID_VARIANT_PATH, and results exceeding the sizelimit raise
VARIANT_SIZE_LIMIT.Why are the changes needed?
There is no way to add a field or element to a variant; today the only option is to convert it to another datatype (e.g. map), mutate, and convert back.
Does this PR introduce any user-facing change?
Yes, a new SQL function (and Scala/Python
functionsAPI) plus a new error conditionVARIANT_PATH_TYPE_MISMATCH.How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code with Claude Opus 4.8