SNOW-3375781: Preserve 1-pass read for SCOS XML user schema performance #4185

Merged
sfc-gh-mayliu merged 2 commits into main from SNOW-3375781-SCOS-XML-custom-schema-perf
Apr 20, 2026
@sfc-gh-mayliu sfc-gh-mayliu commented Apr 17, 2026

Collaborator
  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-3375781

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

Adds an internal _XML_SKIP_INFERENCE option that preserves the 1-pass read for SCOS XML reads with a user-supplied schema.
Snowpark Python now supports XML inferSchema=False, which produces StringType columns for all leaf tags while preserving nested structures (to be released in v1.50.0). Because the SCOS custom-schema path turns the inferSchema flag off, it would trigger a 2-pass read in Snowpark once that release ships.

This PR preserves 1-pass read performance in SCOS XML custom-schema scenarios, in keeping with Spark behavior. It makes no behavioral changes and is purely a performance improvement. The private _XML_SKIP_INFERENCE option used by SCOS is an internal contract that exists only to preserve this performance.
Relevant SCOS PR: https://github.com/snowflake-eng/sas/pull/3687
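
For illustration only, the pass-count reasoning described above could be sketched as follows. This is not the code from this PR: the helper names (count_read_passes, _as_bool) and the way option values are parsed are hypothetical, and the sketch assumes that any schema discovery step costs exactly one extra pass over the XML. Only the option name _XML_SKIP_INFERENCE and the inferSchema behavior come from the description.

```python
from typing import Any, Mapping

# Option name taken from the PR description; everything else here is hypothetical.
SKIP_INFERENCE_OPTION = "_XML_SKIP_INFERENCE"


def _as_bool(value: Any) -> bool:
    """Interpret a flag value; string handling is a simplification for this sketch."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def count_read_passes(options: Mapping[str, Any]) -> int:
    """Return how many passes over the XML a reader following this sketch would make."""
    if _as_bool(options.get(SKIP_INFERENCE_OPTION, False)):
        # The caller (for example SCOS with a user schema) already knows the
        # schema, so the discovery pass is skipped and only the read remains.
        return 1
    # Assumption: without the skip flag, a discovery pass runs before the read.
    # With inferSchema=True it infers types; with inferSchema=False it only
    # discovers structure and leaves leaf tags as StringType.
    return 2


# A custom-schema read that turns inferSchema off, with and without the flag.
without_flag = count_read_passes({"inferSchema": False})
with_flag = count_read_passes({"inferSchema": False, SKIP_INFERENCE_OPTION: True})
print(without_flag, with_flag)
```

Under these assumptions, the flag is the only thing that separates the two cases, which is why it can stay private: it changes how much work is done, not what rows come back.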

@github-actions

github-actions Bot commented Apr 17, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@sfc-gh-mayliu

Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@sfc-gh-mayliu sfc-gh-mayliu added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Apr 17, 2026
Comment thread tests/unit/test_xml_schema_inference.py Outdated
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.42%. Comparing base (7c853e1) to head (f36a7d4).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4185   +/-   ##
=======================================
  Coverage   95.42%   95.42%           
=======================================
  Files         171      171           
  Lines       43615    43684   +69     
  Branches     7459     7475   +16     
=======================================
+ Hits        41620    41686   +66     
- Misses       1220     1221    +1     
- Partials      775      777    +2     

☔ View full report in Codecov by Sentry.
@sfc-gh-mayliu sfc-gh-mayliu merged commit 788ad8b into main Apr 20, 2026
29 of 30 checks passed
@sfc-gh-mayliu sfc-gh-mayliu deleted the SNOW-3375781-SCOS-XML-custom-schema-perf branch April 20, 2026 20:33
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 20, 2026
Labels

NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md

4 participants