[SPARK-57624][SQL] from_xml to variant should honor the parse mode#56681
Open
mzhang wants to merge 4 commits into
Open
[SPARK-57624][SQL] from_xml to variant should honor the parse mode#56681mzhang wants to merge 4 commits into
mzhang wants to merge 4 commits into
Conversation
`from_xml(..., 'variant', map('mode', ...))` ignored the parse mode.
`XmlToStructsEvaluator.evaluate` called `StaxXmlParser.parseVariant`
directly for variant output, bypassing the `FailureSafeParser` the struct
path uses, so a malformed record (e.g. an illegal XML control char) aborted
the whole job even under PERMISSIVE.
Route variant output through `FailureSafeParser` like the struct path: a new
`StaxXmlParser.parseVariantColumn` wraps parse failures in
`BadRecordException`, the evaluator parses into a single `value: variant` row
and unwraps it. PERMISSIVE rescues the row to null, FAILFAST rethrows.
Also fix `from_xml`-to-variant under whole-stage codegen: `XmlToStructs.
doGenCode` hardcast the result to `InternalRow` (compile error for variant
output) and did not null-check the result (NPE on a PERMISSIVE-rescued null).
Cast to the actual output type and null-check.
Co-authored-by: Isaac
HyukjinKwon
approved these changes
Jun 23, 2026
uros-b
reviewed
Jun 23, 2026
Per review: assert the FAILFAST failure with checkError on MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION instead of matching the exception message text, so the test does not depend on message wording. Co-authored-by: Isaac
Contributor
Author
|
Hey @uros-b ! Thanks for the suggestion, I applied it. Are there any concerns before merging this PR? |
sandip-db
suggested changes
Jun 30, 2026
…iantType Per review: drop the separate variant branch in XmlToStructsEvaluator and route variant output through StaxXmlParser.doParseColumn as well, gated by a new internal XmlOptions.rootVariantType flag. Removes parseVariantColumn and its duplicated bad-record handling, and gives variant output XSD validation. Co-authored-by: Isaac
sandip-db
approved these changes
Jun 30, 2026
sandip-db
reviewed
Jun 30, 2026
Per review: add a from_xml -> variant test exercising rowValidationXSDPath - a record that fails the XSD is rescued to null under PERMISSIVE, a valid one parses. Also remove an accidentally committed build/sbt-launch jar .part file. Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
from_xml(..., 'variant', map('mode', ...))ignored the parsemode.XmlToStructsEvaluator.evaluatecalledStaxXmlParser.parseVariantdirectly for variant output, bypassing theFailureSafeParserthe struct path uses, so a malformed record aborted the whole job regardless ofmode.This routes variant output through
FailureSafeParserlike the struct path: a newStaxXmlParser.parseVariantColumnwraps parse failures inBadRecordException, and the evaluator parses into a singlevalue: variantrow and unwraps it. PERMISSIVE rescues the row tonull, FAILFAST rethrows.It also fixes
from_xmlto variant under whole-stage codegen:XmlToStructs.doGenCodehardcast the result toInternalRow(a compile error for variant output) and did not null-check the result (NPE on a PERMISSIVE-rescuednull). It now casts to the actual output type and null-checks.Why are the changes needed?
from_xmlto variant is unusable on input that contains any malformed record: the documentedmodeoption has no effect for variant output, so PERMISSIVE cannot rescue bad records and a single bad row fails the whole query (including streaming jobs). Separately,from_xmlto variant fails to compile under whole-stage codegen.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New unit test in
XmlFunctionsSuite("from_xml variant output honors the parse mode"): a malformed record (an illegal XML control char) plus a well-formed record, asserting PERMISSIVE rescues the bad record tonulland parses the good one, and FAILFAST throws. Run under both interpreted execution and whole-stage codegen.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)