add opt-in XML character sanitization for package save#2064
add opt-in XML character sanitization for package save#2064SimonCropp wants to merge 2 commits intodotnet:mainfrom
Conversation
|
IMO this should be the default behaviour. |
twsouthwick
left a comment
There was a problem hiding this comment.
Thanks for the implementation!
| } | ||
|
|
||
| public XmlWriter Create(Stream stream, XmlWriterSettings? settings) | ||
| => new SanitizingXmlWriter(DefaultXmlWriterFactoryFeature.Instance.Create(stream, settings)); |
There was a problem hiding this comment.
this shouldn't just use the default - it should pass in the existing and wrap it
|
|
||
| internal static class XmlCharacterSanitizer | ||
| { | ||
| public static bool IsValidXmlChar(char c) |
There was a problem hiding this comment.
is there a reason to not use XmlConvert.IsXmlChar here?
|
|
||
| if (package.Features.Get<IXmlWriterFactoryFeature>() is not SanitizingXmlWriterFactoryFeature) | ||
| { | ||
| package.Features.Set<IXmlWriterFactoryFeature>(SanitizingXmlWriterFactoryFeature.Instance); |
| _xmlWriter = ResolveFactory(openXmlPart).Create(partStream, xmlWriterSettings); | ||
| } | ||
|
|
||
| private static IXmlWriterFactoryFeature ResolveFactory(OpenXmlPart openXmlPart) |
There was a problem hiding this comment.
I'd prefer to just use GetRequiredFeature - there's a default one already so we don't need it here as well
| _writer = Create(stream); | ||
| } | ||
|
|
||
| public XmlDOMTextWriter(Stream stream, IFeatureCollection? features) |
There was a problem hiding this comment.
pass in a IXmlWriterFactoryFeature here instead of a full collection
There was a problem hiding this comment.
Pull request overview
Adds an opt-in feature to sanitize invalid XML 1.0 characters during package/part serialization by centralizing XmlWriter creation behind a new IXmlWriterFactoryFeature, with a built-in sanitizing implementation that decorates the writer.
Changes:
- Introduces
IXmlWriterFactoryFeatureplus default and sanitizing factory implementations, andOpenXmlPackage.UseXmlTextSanitization()to enable it. - Routes key SDK serialization paths (
OpenXmlPartWriter,OpenXmlPartRootElement.Save,XmlDOMTextWritercall sites) through the factory. - Adds unit and round-trip tests for sanitizer behavior across Word/Excel/PowerPoint, and updates public API + docs.
Show a summary per file
| File | Description |
|---|---|
| test/DocumentFormat.OpenXml.Framework.Tests/Features/XmlTextSanitizationTests.cs | Adds unit tests for sanitizer + sanitizing writer behavior. |
| test/DocumentFormat.OpenXml.Framework.Tests/Features/XmlTextSanitizationTests.Word.cs | Word round-trip tests validating opt-in sanitization behavior. |
| test/DocumentFormat.OpenXml.Framework.Tests/Features/XmlTextSanitizationTests.Excel.cs | Excel round-trip tests validating opt-in sanitization behavior. |
| test/DocumentFormat.OpenXml.Framework.Tests/Features/XmlTextSanitizationTests.Powerpoint.cs | PowerPoint round-trip tests validating opt-in sanitization behavior. |
| src/DocumentFormat.OpenXml.Linq/Feature/OpenXmlPartRootXElementFeature.cs | Uses XmlDOMTextWriter overload that can consume part features (factory). |
| src/DocumentFormat.OpenXml.Framework/XmlDOMTextWriter.cs | Adds feature-aware constructors and resolves writer factory from IFeatureCollection. |
| src/DocumentFormat.OpenXml.Framework/PublicAPI/PublicAPI.Unshipped.txt | Declares new public surface area (IXmlWriterFactoryFeature, extension). |
| src/DocumentFormat.OpenXml.Framework/OpenXmlPartWriter.cs | Switches part-based writer creation to use the resolved writer factory. |
| src/DocumentFormat.OpenXml.Framework/OpenXmlPartRootElement.cs | Routes Save(Stream) writer creation through XmlDOMTextWriter + features. |
| src/DocumentFormat.OpenXml.Framework/OpenXmlLeafTextElement.cs | Routes InnerXml writing through feature-aware XmlDOMTextWriter. |
| src/DocumentFormat.OpenXml.Framework/OpenXmlElement.cs | Routes InnerXml/OuterXml writing through feature-aware XmlDOMTextWriter. |
| src/DocumentFormat.OpenXml.Framework/OpenXmlCompositeElement.cs | Routes InnerXml wrapper writing through feature-aware XmlDOMTextWriter. |
| src/DocumentFormat.OpenXml.Framework/Features/XmlTextSanitizationExtensions.cs | Adds UseXmlTextSanitization() opt-in extension for packages. |
| src/DocumentFormat.OpenXml.Framework/Features/XmlCharacterSanitizer.cs | Implements core XML 1.0 character filtering logic. |
| src/DocumentFormat.OpenXml.Framework/Features/SanitizingXmlWriterFactoryFeature.cs | Factory feature that wraps writers with SanitizingXmlWriter. |
| src/DocumentFormat.OpenXml.Framework/Features/SanitizingXmlWriter.cs | XmlWriter decorator that sanitizes text-ish writes while leaving WriteRaw untouched. |
| src/DocumentFormat.OpenXml.Framework/Features/IXmlWriterFactoryFeature.cs | New public feature interface for centralized writer creation. |
| src/DocumentFormat.OpenXml.Framework/Features/DefaultXmlWriterFactoryFeature.cs | Default factory mapping to System.Xml.XmlWriter.Create. |
| src/DocumentFormat.OpenXml.Framework/Features/DefaultFeatures.cs | Registers IXmlWriterFactoryFeature in the default feature set. |
| docs/Features.md | Documents the new feature and the opt-in sanitization behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
src/DocumentFormat.OpenXml.Framework/Features/SanitizingXmlWriter.cs:204
WriteCharsAsynchas the same parameter-validation gap as the sync path: it only checksbuffer != nullbefore iterating/sanitizing. Validateindex/count(andindex + countoverflow) up front so invalid arguments produce consistentXmlWriter-style exceptions instead ofIndexOutOfRangeException.
public override Task WriteCharsAsync(char[] buffer, int index, int count)
{
if (buffer is null)
{
throw new ArgumentNullException(nameof(buffer));
}
- Files reviewed: 20/20 changed files
- Comments generated: 1
| { | ||
| throw new ArgumentNullException(nameof(buffer)); | ||
| } | ||
|
|
There was a problem hiding this comment.
WriteChars validates only buffer != null. If index/count are negative or out of range, the sanitizer loop will throw IndexOutOfRangeException (or worse on overflow) rather than the ArgumentOutOfRangeException/ArgumentException behavior callers expect from XmlWriter. Add explicit bounds validation (including index + count overflow) before calling TrySanitizeSpan/iterating, and mirror the inner writer’s exception types.
This issue also appears on line 199 of the same file.
| if (index < 0) | |
| { | |
| throw new ArgumentOutOfRangeException(nameof(index)); | |
| } | |
| if (count < 0) | |
| { | |
| throw new ArgumentOutOfRangeException(nameof(count)); | |
| } | |
| if (index > buffer.Length - count) | |
| { | |
| throw new ArgumentException("The sum of index and count is larger than the buffer length."); | |
| } |
Introduces
IXmlWriterFactoryFeatureand theUseXmlTextSanitization()extension onOpenXmlPackage. When enabled, invalid XML 1.0 characters (control chars, orphaned surrogates, non-characters) are silently removed from text content, attribute values, CDATA, comments, and PI bodies before the underlyingXmlWritersees them. Element/attribute names andWriteRawpayloads are untouched.Previously, saving a document with text like
"Hello\u0002World"— common when exporting data straight from legacy databases — threw deep insideSystem.Xml.XmlUtf8RawTextWriter. The feature is pay-for-play: packages that don't opt in pay nothing.Plumbing centralizes
XmlWritercreation behind the feature so thatOpenXmlPartWriter,OpenXmlPartRootElement.Save, and everyXmlDOMTextWritercall site (includingInnerXml/OuterXmlpaths) route through it, inherited transparently from package → part → element.Covers Word, Excel, and PowerPoint round-trip tests plus unit tests for the sanitizer, the decorator, and the feature registration.
fixes #1532