Skip to content

[SPARK-57624][SQL] from_xml to variant should honor the parse mode#56681

Open
mzhang wants to merge 4 commits into
apache:masterfrom
mzhang:oss-from-xml-variant-mode
Open

[SPARK-57624][SQL] from_xml to variant should honor the parse mode#56681
mzhang wants to merge 4 commits into
apache:masterfrom
mzhang:oss-from-xml-variant-mode

Conversation

@mzhang

@mzhang mzhang commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

from_xml(..., 'variant', map('mode', ...)) ignored the parse mode. XmlToStructsEvaluator.evaluate called StaxXmlParser.parseVariant directly for variant output, bypassing the FailureSafeParser the struct path uses, so a malformed record aborted the whole job regardless of mode.

This routes variant output through FailureSafeParser like the struct path: a new StaxXmlParser.parseVariantColumn wraps parse failures in BadRecordException, and the evaluator parses into a single value: variant row and unwraps it. PERMISSIVE rescues the row to null, FAILFAST rethrows.

It also fixes from_xml to variant under whole-stage codegen: XmlToStructs.doGenCode hardcast the result to InternalRow (a compile error for variant output) and did not null-check the result (NPE on a PERMISSIVE-rescued null). It now casts to the actual output type and null-checks.

Why are the changes needed?

from_xml to variant is unusable on input that contains any malformed record: the documented mode option has no effect for variant output, so PERMISSIVE cannot rescue bad records and a single bad row fails the whole query (including streaming jobs). Separately, from_xml to variant fails to compile under whole-stage codegen.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit test in XmlFunctionsSuite ("from_xml variant output honors the parse mode"): a malformed record (an illegal XML control char) plus a well-formed record, asserting PERMISSIVE rescues the bad record to null and parses the good one, and FAILFAST throws. Run under both interpreted execution and whole-stage codegen.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

`from_xml(..., 'variant', map('mode', ...))` ignored the parse mode.
`XmlToStructsEvaluator.evaluate` called `StaxXmlParser.parseVariant`
directly for variant output, bypassing the `FailureSafeParser` the struct
path uses, so a malformed record (e.g. an illegal XML control char) aborted
the whole job even under PERMISSIVE.

Route variant output through `FailureSafeParser` like the struct path: a new
`StaxXmlParser.parseVariantColumn` wraps parse failures in
`BadRecordException`, the evaluator parses into a single `value: variant` row
and unwraps it. PERMISSIVE rescues the row to null, FAILFAST rethrows.

Also fix `from_xml`-to-variant under whole-stage codegen: `XmlToStructs.
doGenCode` hardcast the result to `InternalRow` (compile error for variant
output) and did not null-check the result (NPE on a PERMISSIVE-rescued null).
Cast to the actual output type and null-check.

Co-authored-by: Isaac
@mzhang mzhang changed the title [WIP][SQL] from_xml to variant should honor the parse mode [SPARK-57624][SQL] from_xml to variant should honor the parse mode Jun 22, 2026

@xiaonanyang-db xiaonanyang-db left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, lgtm

Comment thread sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala Outdated
Per review: assert the FAILFAST failure with checkError on
MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION instead of matching the
exception message text, so the test does not depend on message wording.

Co-authored-by: Isaac
@mzhang

mzhang commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Hey @uros-b ! Thanks for the suggestion, I applied it. Are there any concerns before merging this PR?

…iantType

Per review: drop the separate variant branch in XmlToStructsEvaluator and
route variant output through StaxXmlParser.doParseColumn as well, gated by a
new internal XmlOptions.rootVariantType flag. Removes parseVariantColumn and
its duplicated bad-record handling, and gives variant output XSD validation.

Co-authored-by: Isaac
Comment thread build/sbt-launch-1.12.11.jar.part Outdated
Per review: add a from_xml -> variant test exercising rowValidationXSDPath -
a record that fails the XSD is rescued to null under PERMISSIVE, a valid one
parses. Also remove an accidentally committed build/sbt-launch jar .part file.

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants