Modify SqlStreamingXml XmlWriter to internally use direct XmlReader parsing#3974
Modify SqlStreamingXml XmlWriter to internally use direct XmlReader parsing#3974jimhblythe wants to merge 9 commits into
Conversation
…tead of a StringBuilder. (dotnet#1877) Note: UTF8Encoding(false) addition in s_writerSettings is consistent with prior default used within StringWriter/StringBuilder
|
@dotnet-policy-service agree |
…ple elements Enhance comments within SqlStreamingXml Extend Manual tests to fully cover GetChars WriteXmlElement includes uncovered paths not accessible for SQL XML column types which normalize Whitespace, CDATA, EntityReference, XmlDeclaration, ProcessingInstruction, DocumentType, and Comment node types
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
paulmedynski
left a comment
There was a problem hiding this comment.
Looks great - thanks for this optimization and expanded test coverage! Just one question.
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3974 +/- ##
==========================================
- Coverage 75.22% 65.82% -9.40%
==========================================
Files 266 275 +9
Lines 42932 65896 +22964
==========================================
+ Hits 32294 43379 +11085
- Misses 10638 22517 +11879
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add tests that verify SqlDataReader.GetChars correctly handles non-ASCII characters when reading XML columns with SequentialAccess: - GetChars_NonAsciiContent: Latin accented characters (2-byte UTF-8) - GetChars_NonAsciiContent_BulkRead: Bulk read path with accented chars - GetChars_CjkContent: CJK characters (3-byte UTF-8) These tests establish a baseline for correct behavior on main before PR #3974 (issue #1877) refactors SqlStreamingXml internals.
|
@jimhblythe can you take a look at the tests in #4005 and #4008? They seem to suggest that this introduces a regression for non-ascii characters. You can check the pipeline results to see the test outcomes. I'd like to see some tests like that included in this PR as well, please 🙏 |
I will additionally including a fourth test to verify surrogate pairs:
The test shows these worked with prior code. |
…reamingXml, reconstructing XML fragments as strings and streaming them char-by-char. Improves efficiency, reduces allocations, and fixes non-ASCII and surrogate pairs. Add comprehensive unit tests for XML edge cases (non-ASCII, surrogate pairs, comments, CDATA, attributes, namespaces, etc.). Refactor existing tests for clarity and better handling of disposables. (AI assist using ChatGPT to better consider edge cases)
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
Hey @jimhblythe - Nothing further at the moment. We're just tied up with other tasks. I have tentatively put this into the next preview release, but it looks like a substantial change to review. It may get bumped to preview 2. |
benrr101
left a comment
There was a problem hiding this comment.
Thanks for the submission - this would be a great performance improvement to take!
Definitely want to see the tests rewritten as unit tests, and I'd really like to avoid rewriting xml entities escaping code. But feel free to push back if I misunderstood something - just be prepared to document why it's being done that way :)
|
@benrr101 @priyankatiwari08 Thanks for the reviews and feedback. I did not mark any conversations as resolved unless I addressed them with a code modification. For the remaining open items, could you please review and either reply with any additional considerations or resolve if you feel the current explanation is sufficient? |
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
benrr101
left a comment
There was a problem hiding this comment.
I think the explanations make sense, and the changes look good. I'm ok to take these. Thank you again for you contribution!
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
@jimhblythe I took the liberty to fix the issue with the manual test project file that was causing the pipeline failures. Hopefully we'll get it moving |
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |


Modify SqlStreamingXml XmlWriter to internally use a MemoryStream instead of a StringBuilder.
Note: UTF8Encoding(false) addition in s_writerSettings is consistent with prior default used within StringWriter/StringBuilder
Issues
Fixes #1877 to be O(n)
Testing
Added 2 new Manual tests to ensure linear behavior for single large node, and secondary validation for multiple nodes
