Skip to content

Modify SqlStreamingXml XmlWriter to internally use direct XmlReader parsing#3974

Open
jimhblythe wants to merge 9 commits into
dotnet:mainfrom
jimhblythe:issue-1877
Open

Modify SqlStreamingXml XmlWriter to internally use direct XmlReader parsing#3974
jimhblythe wants to merge 9 commits into
dotnet:mainfrom
jimhblythe:issue-1877

Conversation

@jimhblythe

Copy link
Copy Markdown

Modify SqlStreamingXml XmlWriter to internally use a MemoryStream instead of a StringBuilder.

Note: UTF8Encoding(false) addition in s_writerSettings is consistent with prior default used within StringWriter/StringBuilder

Issues

Fixes #1877 to be O(n)

Testing

Added 2 new Manual tests to ensure linear behavior for single large node, and secondary validation for multiple nodes
image

…tead of a StringBuilder. (dotnet#1877)

Note: UTF8Encoding(false) addition in s_writerSettings is consistent with prior default used within StringWriter/StringBuilder
@jimhblythe

Copy link
Copy Markdown
Author

@dotnet-policy-service agree

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
@mdaigle mdaigle moved this from To triage to Needs Response in SqlClient Board Feb 24, 2026
…ple elements

Enhance comments within SqlStreamingXml
Extend Manual tests to fully cover GetChars
WriteXmlElement includes uncovered paths not accessible for SQL XML column types which normalize Whitespace, CDATA, EntityReference, XmlDeclaration, ProcessingInstruction, DocumentType, and Comment node types
@paulmedynski

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@paulmedynski paulmedynski self-assigned this Feb 27, 2026
@paulmedynski paulmedynski moved this from Needs Response to In review in SqlClient Board Feb 27, 2026

@paulmedynski paulmedynski left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - thanks for this optimization and expanded test coverage! Just one question.

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
@paulmedynski

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

paulmedynski
paulmedynski previously approved these changes Feb 27, 2026
@paulmedynski paulmedynski added this to the 7.0.0 milestone Feb 27, 2026
@codecov

codecov Bot commented Feb 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.57576% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.82%. Comparing base (a68e00f) to head (9e62ba5).
⚠️ Report is 65 commits behind head on main.

Files with missing lines Patch % Lines
...qlClient/src/Microsoft/Data/SqlClient/SqlStream.cs 97.57% 4 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (a68e00f) and HEAD (9e62ba5). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (a68e00f) HEAD (9e62ba5)
netfx 2 0
netcore 2 0
addons 2 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3974      +/-   ##
==========================================
- Coverage   75.22%   65.82%   -9.40%     
==========================================
  Files         266      275       +9     
  Lines       42932    65896   +22964     
==========================================
+ Hits        32294    43379   +11085     
- Misses      10638    22517   +11879     
Flag Coverage Δ
PR-SqlClient-Project 65.82% <97.57%> (?)
addons ?
netcore ?
netfx ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mdaigle added a commit that referenced this pull request Mar 5, 2026
Add tests that verify SqlDataReader.GetChars correctly handles non-ASCII
characters when reading XML columns with SequentialAccess:

- GetChars_NonAsciiContent: Latin accented characters (2-byte UTF-8)
- GetChars_NonAsciiContent_BulkRead: Bulk read path with accented chars
- GetChars_CjkContent: CJK characters (3-byte UTF-8)

These tests establish a baseline for correct behavior on main before
PR #3974 (issue #1877) refactors SqlStreamingXml internals.
@mdaigle

mdaigle commented Mar 5, 2026

Copy link
Copy Markdown
Contributor

@jimhblythe can you take a look at the tests in #4005 and #4008? They seem to suggest that this introduces a regression for non-ascii characters. You can check the pipeline results to see the test outcomes. I'd like to see some tests like that included in this PR as well, please 🙏

@jimhblythe

Copy link
Copy Markdown
Author

@jimhblythe can you take a look at the tests in #4005 and #4008? They seem to suggest that this introduces a regression for non-ascii characters. You can check the pipeline results to see the test outcomes. I'd like to see some tests like that included in this PR as well, please 🙏

I will additionally including a fourth test to verify surrogate pairs:

        [ConditionalFact(typeof(DataTestUtility), nameof(DataTestUtility.AreConnStringsSetup))]
        public static void GetChars_SurrogatePairContent()
        {
            SqlConnection connection = new(DataTestUtility.TCPConnectionString);
            // Surrogate Pair characters: 4 bytes each in UTF-8
            string xml = "<data>\U0001F600\U0001F525\U0001F680</data>";

Actual: "<data>😀🔥🚀</data>"

The test shows these worked with prior code.

…reamingXml, reconstructing XML fragments as strings and streaming them char-by-char. Improves efficiency, reduces allocations, and fixes non-ASCII and surrogate pairs.

Add comprehensive unit tests for XML edge cases (non-ASCII, surrogate pairs, comments, CDATA, attributes, namespaces, etc.). Refactor existing tests for clarity and better handling of disposables.
(AI assist using ChatGPT to better consider edge cases)
@jimhblythe

jimhblythe commented Mar 8, 2026

Copy link
Copy Markdown
Author

I had to use an alternate as MemoryStream was not the right choice. I actually optimized further with bypass of XmlWriter which used an additional buffer; more memory spared with latest commit.

Significantly more tests added :
image

Is exact replication of prior logic desired or more consistent with how SSMS would show an XML column?
Prior logic would occasionally convert < back to &lt; - e.g. within CData
Prior logic would also expand \n to \r\n

Coverage complete except for null reference bullet-proofing that cannot be replicated through SqlReader which ensures values are provide:
image

@paulmedynski paulmedynski modified the milestones: 7.0.0, 7.0.1, 7.1.0-preview1 Mar 9, 2026
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@paulmedynski

Copy link
Copy Markdown
Contributor

Hey @jimhblythe - Nothing further at the moment. We're just tied up with other tasks. I have tentatively put this into the next preview release, but it looks like a substantial change to review. It may get bumped to preview 2.

@benrr101 benrr101 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the submission - this would be a great performance improvement to take!

Definitely want to see the tests rewritten as unit tests, and I'd really like to avoid rewriting xml entities escaping code. But feel free to push back if I misunderstood something - just be prepared to document why it's being done that way :)

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
@github-project-automation github-project-automation Bot moved this from In review to In progress in SqlClient Board May 12, 2026
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Outdated

@priyankatiwari08 priyankatiwari08 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the perf work on #1877 — algorithmic fix seems to be in the right direction. Agree with @benrr101's asks; nothing material to add on top of his review, just two small nits below.

@jimhblythe

Copy link
Copy Markdown
Author

@benrr101 @priyankatiwari08 Thanks for the reviews and feedback.

I did not mark any conversations as resolved unless I addressed them with a code modification. For the remaining open items, could you please review and either reply with any additional considerations or resolve if you feel the current explanation is sufficient?

@jimhblythe jimhblythe requested a review from benrr101 May 15, 2026 23:54
@benrr101

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

benrr101
benrr101 previously approved these changes May 20, 2026

@benrr101 benrr101 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explanations make sense, and the changes look good. I'm ok to take these. Thank you again for you contribution!

@paulmedynski

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@benrr101

Copy link
Copy Markdown
Contributor

@jimhblythe I took the liberty to fix the issue with the manual test project file that was causing the pipeline failures. Hopefully we'll get it moving

@benrr101

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

O(N^2) performance when reading XML with SequentialAccess

7 participants