Skip to content

[SPARK-57641] Add variant_insert expression#56703

Open
bojana-db wants to merge 7 commits into
apache:masterfrom
bojana-db:variant-insert
Open

[SPARK-57641] Add variant_insert expression#56703
bojana-db wants to merge 7 commits into
apache:masterfrom
bojana-db:variant-insert

Conversation

@bojana-db

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Adds the SQL function variant_insert(v, path, val), which inserts a value into a Variant value at a single JSONPath location.

Details:

  • Object path (e.g. $.b): adds a new field; errors with VARIANT_DUPLICATE_KEY
    if the key already exists;
  • Array path (e.g. $[N]): inserts at index N, shifting later elements right; an
    index past the end pads the gap with variant nulls;
  • Missing intermediate keys/indices along the path are created;
  • Any NULL argument returns NULL;
  • The value may be any expression castable to variant;
  • A new error condition VARIANT_PATH_TYPE_MISMATCH is raised when a path segment
    is applied to an incompatible value (e.g. an object key on an array); the root
    path $ is rejected with INVALID_VARIANT_PATH, and results exceeding the size
    limit raise VARIANT_SIZE_LIMIT.

Why are the changes needed?

There is no way to add a field or element to a variant; today the only option is to convert it to another datatype (e.g. map), mutate, and convert back.

Does this PR introduce any user-facing change?

Yes, a new SQL function (and Scala/Python functions API) plus a new error condition VARIANT_PATH_TYPE_MISMATCH.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code with Claude Opus 4.8

@bojana-db bojana-db changed the title init [SPARK-57417] Add variant_insert expression Jun 23, 2026
@bojana-db bojana-db changed the title [SPARK-57417] Add variant_insert expression [SPARK-57641] Add variant_insert expression Jun 23, 2026

@harshmotw-db harshmotw-db left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature! I have gone through the scala implementation and it looks good to me from an implementation PoV. So far, my only point of contention is whether we want to implicitly cast structs/maps to variant using a to_variant_object transformation which is implicitly lossy. Tagging @srielau to think about this.
Alternatives I can think of are:

  1. Force users to pass variant as the input argument. They could cast it themselves
  2. Allow users to pass any data that doesn't contain structs/maps and if they want to use the to_variant_object transformation, they could do that manually and pass variant as the inserted argument instead.

I will go over the Python code and the tests tomorrow

result
} else if (value.dataType == NullType) {
TypeCheckResult.TypeCheckSuccess
} else if (!VariantGet.checkDataType(value.dataType, allowStructsAndMaps = true)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we are implicitly allowing struct/map -> variant cast here (for which we use to_variant_object in regular SQL)? If we are sure about this, we should document that structs and maps are transformed into variant using the to_variant_object transformation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, out of the two proposals I am fine with the second one (I wouldn't force users to use variant input since it would require an explicit cast even for plain scalars - e.g. variant_insert(v, '$.a', cast(2 AS VARIANT)), which is an unnecessary degradation of user experience).

I defined value parameter as "any value castable to variant". If the consensus is to consider value castable only if it satisfies cast operator constraints (link) I am happy to make the change. Otherwise, documenting the struct/map behaviour makes sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick to orthodoxy here... that object problem will need to be solved eventually. But maybe variant_insert is not where we want to start.
Let's go with castable-only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants