Commit 0c89e9b

Ensure fully deterministic zip output (#45)
* Ensure fully deterministic ZIP output across net48 and net10.0
  - Replace framework ZipArchive writing with raw ZipStorer to guarantee compression method 0 (Stored) on all frameworks (net48 ignores CompressionLevel.NoCompression)
  - Write raw zlib stored blocks in PngNormalizer to avoid DEFLATE differences between framework ZLibStream implementations
  - Sort ZIP entries by name using ordinal comparison
  - Sort [Content_Types].xml elements deterministically (ContentTypesPatcher)
  - Renumber all relationship IDs in .rels files to DeterministicId{n} and remap corresponding r:id references in content XML
  - Add verification tests for stored entries, deterministic relationship IDs, and sorted content types
* .
* .
* .
1 parent 3debc5a commit 0c89e9b
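
The commit message mentions writing raw zlib stored blocks in `PngNormalizer`, but that file is not visible in this view. As a rough illustration of the format only (a zlib header, DEFLATE stored blocks, then a big-endian Adler-32), here is a minimal sketch. `ZlibStore` is a hypothetical helper name, not the repository's code, and the round-trip check assumes a runtime that ships `ZLibStream` (.NET 6 or later).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical helper (not the repository's PngNormalizer): wraps data in a zlib
// stream built only from DEFLATE stored blocks, so no compressor is involved.
static byte[] ZlibStore(byte[] data)
{
    using var output = new MemoryStream();
    output.WriteByte(0x78); // CMF: deflate method, 32K window
    output.WriteByte(0x01); // FLG: no preset dictionary; 0x7801 is a multiple of 31

    var offset = 0;
    do
    {
        // A single stored block carries at most 65535 bytes.
        var length = Math.Min(data.Length - offset, 65535);
        var isFinal = offset + length == data.Length;
        output.WriteByte((byte)(isFinal ? 1 : 0));        // BFINAL bit, BTYPE = 00 (stored)
        output.WriteByte((byte)(length & 0xFF));          // LEN, little-endian
        output.WriteByte((byte)((length >> 8) & 0xFF));
        output.WriteByte((byte)(~length & 0xFF));         // NLEN = one's complement of LEN
        output.WriteByte((byte)((~length >> 8) & 0xFF));
        output.Write(data, offset, length);
        offset += length;
    } while (offset < data.Length);

    // Adler-32 of the uncompressed data, written big-endian.
    uint a = 1, b = 0;
    foreach (var value in data)
    {
        a = (a + value) % 65521;
        b = (b + a) % 65521;
    }
    var adler = (b << 16) | a;
    output.WriteByte((byte)((adler >> 24) & 0xFF));
    output.WriteByte((byte)((adler >> 16) & 0xFF));
    output.WriteByte((byte)((adler >> 8) & 0xFF));
    output.WriteByte((byte)(adler & 0xFF));
    return output.ToArray();
}

var payload = Encoding.ASCII.GetBytes("hello");
var stored = ZlibStore(payload);

// Round-trip through the framework decoder to check the stream is well-formed.
using var decoder = new ZLibStream(new MemoryStream(stored), CompressionMode.Decompress);
using var roundTrip = new MemoryStream();
decoder.CopyTo(roundTrip);
Console.WriteLine(roundTrip.ToArray().SequenceEqual(payload));
```

Because every byte is produced by explicit writes rather than by a compressor, the output depends only on the input, which is presumably the property the commit is after.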

258 files changed

Lines changed: 311 additions & 2145 deletions

claude.md

Lines changed: 34 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -43,12 +43,37 @@ The codebase uses a patcher pattern for normalizing OOXML content:
 - Each patcher implements `IPatcher` interface
 - `IsMatch(Entry entry)` - determines which files the patcher applies to (e.g., "word/document.xml")
 - `PatchXml(XDocument xml)` - modifies the XML in-place
-- Register patchers in `DeterministicPackage.cs` patchers list
-- **Order matters** - patchers run in sequence
+- Register patchers via `CreatePatchers()` factory in `DeterministicPackage.cs` (fresh instance per conversion)
+- **Order matters** - relationship patchers must run before their content patchers (e.g., `WorkbookRelationshipPatcher` before `WorkbookPatcher`)
+
+#### Paired Patchers
+
+Some patchers work in pairs: a relationship patcher renumbers IDs in `.rels` files and stores the mapping, then a content patcher remaps `r:id` references in the corresponding XML:
+
+- `WorkbookRelationshipPatcher` → `WorkbookPatcher` (xl/_rels/workbook.xml.rels → xl/workbook.xml)
+- `DocumentRelationshipPatcher` → `DocumentPatcher` (word/_rels/document.xml.rels → word/document.xml)
+
+The content patcher receives the relationship patcher via constructor injection.
+
+#### Relationship ID Renumbering
+
+`RelationshipRenumber` (in IPatcher.cs) provides shared helpers:
+- `RenumberAndSort(XDocument)` — sorts relationships by Type+Target, renumbers to `DeterministicId{n}`, returns old→new mapping
+- `RemapIds(XDocument, mapping)` — replaces `r:id` attribute values in content XML using the mapping
+
+#### Content Types Sorting
+
+`ContentTypesPatcher` sorts `[Content_Types].xml` elements by local name, then Extension, then PartName to ensure deterministic order across frameworks.
+
+### ZIP Output
+
+- `ZipStorer` rewrites ZIP archives with all entries using compression method 0 (Stored), bypassing net48's `ZipArchive` which ignores `CompressionLevel.NoCompression`
+- Entries are sorted by `FullName` using `StringComparer.Ordinal`
+- `PngNormalizer` writes raw zlib stored blocks (CMF+FLG + DEFLATE stored blocks + Adler-32) instead of using `ZLibStream`, which produces different output on net48 vs net10.0

 Example patcher structure:
 ```csharp
-class DocumentPatcher : IPatcher
+class DocumentPatcher(DocumentRelationshipPatcher relsPatcher) : IPatcher
 {
     static XNamespace wp = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";
     static XNamespace pic = "http://schemas.openxmlformats.org/drawingml/2006/picture";
@@ -58,9 +83,12 @@ class DocumentPatcher : IPatcher

     public void PatchXml(XDocument xml)
     {
-        var root = xml.Root!;
-        var elements = root.Descendants(wp + "docPr").ToList();
-        // Normalize IDs...
+        // Normalize drawing IDs...
+        // Then remap relationship IDs
+        if (relsPatcher.IdMapping.Count > 0)
+        {
+            RelationshipRenumber.RemapIds(xml, relsPatcher.IdMapping);
+        }
     }
 }
 ```
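
The claude.md notes above describe `RenumberAndSort` and `RemapIds`, but the `RelationshipRenumber` implementation is not shown in this view. The following is a minimal sketch of how such a pair could behave, written as a standalone top-level program. The function bodies, the ordinal comparers, and the sample XML (including the `urn:example:w` namespace and the `example.test` relationship types) are assumptions for illustration, not the repository's code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Namespace that the r: prefix is bound to in OOXML content parts.
const string RelationshipsNs = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

// Assumed behaviour: sort by Type then Target, renumber, return old -> new IDs.
static Dictionary<string, string> RenumberAndSort(XDocument xml)
{
    var root = xml.Root!;
    var sorted = root.Elements()
        .OrderBy(e => (string?)e.Attribute("Type") ?? "", StringComparer.Ordinal)
        .ThenBy(e => (string?)e.Attribute("Target") ?? "", StringComparer.Ordinal)
        .ToList();

    var mapping = new Dictionary<string, string>();
    for (var i = 0; i < sorted.Count; i++)
    {
        var id = sorted[i].Attribute("Id")!;
        var newId = $"DeterministicId{i + 1}";
        mapping[id.Value] = newId;
        id.Value = newId;
    }

    root.ReplaceAll(sorted);
    return mapping;
}

// Assumed behaviour: rewrite every r:id attribute that has an entry in the mapping.
static void RemapIds(XDocument xml, IReadOnlyDictionary<string, string> mapping)
{
    var name = XNamespace.Get(RelationshipsNs) + "id";
    foreach (var attribute in xml.Descendants().Attributes(name))
    {
        if (mapping.TryGetValue(attribute.Value, out var newId))
        {
            attribute.Value = newId;
        }
    }
}

var rels = XDocument.Parse("""
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId7" Type="http://example.test/styles" Target="styles.xml" />
      <Relationship Id="rId2" Type="http://example.test/image" Target="media/image1.png" />
    </Relationships>
    """);
var content = XDocument.Parse("""
    <w:document xmlns:w="urn:example:w" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
      <w:pict r:id="rId2" />
    </w:document>
    """);

// The .rels part is processed first so the mapping exists when the content part is patched.
var mapping = RenumberAndSort(rels);
RemapIds(content, mapping);

var remapped = content.Root!.Elements().Single().Attribute(XNamespace.Get(RelationshipsNs) + "id")!.Value;
Console.WriteLine(remapped);
```

The sketch also shows why ordering matters for paired patchers: if the content part were patched before its `.rels` part, the mapping would still be empty and no `r:id` would be rewritten.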

readme.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ Example file formats that leverage System.IO.Packaging
 using var sourceStream = File.OpenRead(packagePath);
 await DeterministicPackage.ConvertAsync(sourceStream, targetStream);
 ```
-<sup><a href='/src/Tests/Tests.cs#L174-L179' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConvertAsync' title='Start of snippet'>anchor</a></sup>
+<sup><a href='/src/Tests/Tests.cs#L226-L231' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConvertAsync' title='Start of snippet'>anchor</a></sup>
 <!-- endSnippet -->


@@ -56,7 +56,7 @@ await DeterministicPackage.ConvertAsync(sourceStream, targetStream);
 using var sourceStream = File.OpenRead(packagePath);
 await DeterministicPackage.ConvertAsync(sourceStream, targetStream);
 ```
-<sup><a href='/src/Tests/Tests.cs#L174-L179' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConvertAsync' title='Start of snippet'>anchor</a></sup>
+<sup><a href='/src/Tests/Tests.cs#L226-L231' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConvertAsync' title='Start of snippet'>anchor</a></sup>
 <!-- endSnippet -->


src/DeterministicIoPackaging/DeterministicPackage.cs

Lines changed: 22 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,18 +5,24 @@ public static partial class DeterministicPackage
     public static DateTime StableDate { get; } = new(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);
     public static DateTimeOffset StableDateOffset { get; } = new(StableDate);

-    static IReadOnlyList<IPatcher> patchers =
-    [
-        new RelationshipPatcher(),
-        new SheetPatcher(),
-        new WorkbookPatcher(),
-        new WorkbookRelationshipPatcher(),
-        new CorePatcher(),
-        new SheetRelationshipPatcher(),
-        new DocumentRelationshipPatcher(),
-        new DocumentPatcher(),
-        new NumberingPatcher()
-    ];
+    static IReadOnlyList<IPatcher> CreatePatchers()
+    {
+        var workbookRelsPatcher = new WorkbookRelationshipPatcher();
+        var documentRelsPatcher = new DocumentRelationshipPatcher();
+        return
+        [
+            new ContentTypesPatcher(),
+            new RelationshipPatcher(),
+            new SheetPatcher(),
+            workbookRelsPatcher,
+            new WorkbookPatcher(workbookRelsPatcher),
+            new CorePatcher(),
+            new SheetRelationshipPatcher(),
+            documentRelsPatcher,
+            new DocumentPatcher(documentRelsPatcher),
+            new NumberingPatcher()
+        ];
+    }

     static Archive CreateArchive(Stream target) => new(target, ZipArchiveMode.Create, leaveOpen: true);

@@ -30,7 +36,7 @@ static Archive ReadArchive(Stream source)
         return new(source, ZipArchiveMode.Read, leaveOpen: true);
     }

-    static void DuplicateEntry(Entry sourceEntry, Archive targetArchive)
+    static void DuplicateEntry(Entry sourceEntry, Archive targetArchive, IReadOnlyList<IPatcher> currentPatchers)
     {
         if (IsPsmdcp(sourceEntry))
         {
@@ -41,7 +47,7 @@ static void DuplicateEntry(Entry sourceEntry, Archive targetArchive)
         var targetEntry = CreateEntry(sourceEntry, targetArchive);
         using var targetStream = targetEntry.Open();

-        foreach (var patcher in patchers)
+        foreach (var patcher in currentPatchers)
         {
             if (!patcher.IsMatch(sourceEntry))
             {
@@ -63,7 +69,7 @@ static void DuplicateEntry(Entry sourceEntry, Archive targetArchive)
         sourceStream.CopyTo(targetStream);
     }

-    static async Task DuplicateEntryAsync(Entry sourceEntry, Archive targetArchive, Cancel cancel)
+    static async Task DuplicateEntryAsync(Entry sourceEntry, Archive targetArchive, IReadOnlyList<IPatcher> currentPatchers, Cancel cancel)
     {
         if (IsPsmdcp(sourceEntry))
         {
@@ -73,7 +79,7 @@ static async Task DuplicateEntryAsync(Entry sourceEntry, Archive targetArchive,
         using var sourceStream = await sourceEntry.OpenAsync(cancel);
         var targetEntry = CreateEntry(sourceEntry, targetArchive);
         using var targetStream = await targetEntry.OpenAsync(cancel);
-        foreach (var patcher in patchers)
+        foreach (var patcher in currentPatchers)
         {
             if (!patcher.IsMatch(sourceEntry))
             {

src/DeterministicIoPackaging/DeterministicPackage_Convert.cs

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,14 @@ public static async Task<MemoryStream> ConvertAsync(Stream source)

     public static void Convert(Stream source, Stream target)
     {
+        var patchers = CreatePatchers();
         var intermediate = new MemoryStream();
         using (var sourceArchive = ReadArchive(source))
         using (var targetArchive = CreateArchive(intermediate))
         {
             foreach (var sourceEntry in sourceArchive.OrderedEntries())
             {
-                DuplicateEntry(sourceEntry, targetArchive);
+                DuplicateEntry(sourceEntry, targetArchive, patchers);
             }
         }

@@ -35,13 +36,14 @@ public static void Convert(Stream source, Stream target)

     public static async Task ConvertAsync(Stream source, Stream target, Cancel token = default)
     {
+        var patchers = CreatePatchers();
         var intermediate = new MemoryStream();
         using (var sourceArchive = ReadArchive(source))
         using (var targetArchive = CreateArchive(intermediate))
         {
             foreach (var sourceEntry in OrderedEntries(sourceArchive))
             {
-                await DuplicateEntryAsync(sourceEntry, targetArchive, token);
+                await DuplicateEntryAsync(sourceEntry, targetArchive, patchers, token);
             }
         }
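
Both conversion methods iterate entries through `OrderedEntries`, which the claude.md notes describe as sorting by `FullName` with `StringComparer.Ordinal`. That helper is not part of the visible diff, so the sketch below only illustrates ordinal ordering on plain strings. The entry names are hypothetical samples; `Word/media/image1.png` is included purely to show that ordinal comparison is case-sensitive.

```csharp
using System;
using System.Linq;

// Hypothetical entry names standing in for ZipArchiveEntry.FullName values.
var entryNames = new[]
{
    "word/document.xml",
    "_rels/.rels",
    "docProps/app.xml",
    "[Content_Types].xml",
    "Word/media/image1.png"
};

// Ordinal comparison orders by UTF-16 code unit, so the result does not
// depend on the current culture or on the runtime's globalization data.
var ordered = entryNames
    .OrderBy(name => name, StringComparer.Ordinal)
    .ToArray();

foreach (var name in ordered)
{
    Console.WriteLine(name);
}
```

With ordinal comparison, uppercase letters sort before `[` and `_`, which in turn sort before lowercase letters. A culture-sensitive sort may place these names differently depending on the runtime, which is the variation an ordinal sort avoids.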

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
-global using System.Globalization;
 global using System.IO.Compression;
-global using System.Xml.Linq;
+global using System.IO.Hashing;
+global using System.Xml.Linq;
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
+class ContentTypesPatcher : IPatcher
+{
+    public bool IsMatch(Entry entry) =>
+        entry.FullName is "[Content_Types].xml";
+
+    public void PatchXml(XDocument xml)
+    {
+        var root = xml.Root!;
+        var elements = root.Elements()
+            .OrderBy(_ => _.Name.LocalName)
+            .ThenBy(_ => (string?)_.Attribute("Extension") ?? "")
+            .ThenBy(_ => (string?)_.Attribute("PartName") ?? "")
+            .ToList();
+
+        root.ReplaceAll(elements);
+    }
+}
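
To see what the three-level sort in `ContentTypesPatcher` does to a small `[Content_Types].xml`, here is a standalone sketch that applies the same keys to sample input. One deliberate difference, and an assumption on my part: the sketch passes `StringComparer.Ordinal` explicitly, whereas the patcher above relies on the default string comparer, which is culture-sensitive. The sample content types are illustrative.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Xml.Linq;

var xml = XDocument.Parse("""
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
      <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" />
      <Default Extension="xml" ContentType="application/xml" />
      <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" />
    </Types>
    """);

var root = xml.Root!;

// Same keys as the patcher: element local name, then Extension, then PartName.
// The explicit ordinal comparer is this sketch's choice, not the patcher's.
var sorted = root.Elements()
    .OrderBy(e => e.Name.LocalName, StringComparer.Ordinal)
    .ThenBy(e => (string?)e.Attribute("Extension") ?? "", StringComparer.Ordinal)
    .ThenBy(e => (string?)e.Attribute("PartName") ?? "", StringComparer.Ordinal)
    .ToList();
root.ReplaceAll(sorted);

// Summarize the resulting order by each element's distinguishing attribute.
var order = root.Elements()
    .Select(e => (string?)e.Attribute("Extension") ?? (string?)e.Attribute("PartName"))
    .ToList();
Console.WriteLine(string.Join(", ", order));
```

`Default` elements sort ahead of `Override` elements, and within each group the `Extension` or `PartName` value decides the order. Missing attributes fall back to the empty string, so they never throw and always sort first within their group.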

src/DeterministicIoPackaging/Patching/DocumentPatcher.cs

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
-class DocumentPatcher : IPatcher
+class DocumentPatcher(DocumentRelationshipPatcher relsPatcher) : IPatcher
 {
     static XNamespace wp = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";
     static XNamespace pic = "http://schemas.openxmlformats.org/drawingml/2006/picture";
@@ -23,5 +23,10 @@ public void PatchXml(XDocument xml)
             // Use index + 1 for 1-based numbering (common in Office Open XML)
             elementsWithIds[i].Attribute("id")!.Value = (i + 1).ToString();
         }
+
+        if (relsPatcher.IdMapping.Count > 0)
+        {
+            RelationshipRenumber.RemapIds(xml, relsPatcher.IdMapping);
+        }
     }
 }
Lines changed: 4 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,10 @@
 class DocumentRelationshipPatcher : IPatcher
 {
+    internal Dictionary<string, string> IdMapping { get; private set; } = [];
+
     public bool IsMatch(Entry entry) =>
         entry.FullName is "word/_rels/document.xml.rels";

-    public void PatchXml(XDocument xml)
-    {
-        var root = xml.Root!;
-        var relationships = root.Elements()
-            .OrderBy(_ => _.Attribute("Type")!.Value)
-            .ThenBy(_ => _.Attribute("Target")!.Value)
-            .ToList();
-        root.ReplaceAll(relationships);
-    }
+    public void PatchXml(XDocument xml) =>
+        IdMapping = RelationshipRenumber.RenumberAndSort(xml);
 }
Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
-interface IPatcher
+interface IPatcher
 {
     public void PatchXml(XDocument xml);
     public bool IsMatch(Entry entry);
-}
+}

src/DeterministicIoPackaging/Patching/RelationshipPatcher.cs

Lines changed: 4 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -6,28 +6,18 @@ public bool IsMatch(Entry entry) =>
     public void PatchXml(XDocument xml)
     {
         var root = xml.Root!;
-        var relationships = root.Elements()
-            .OrderBy(_ => _.Attribute("Type")!.Value)
-            .ThenBy(_ => _.Attribute("Target")!.Value)
-            .ToList();

-        foreach (var element in relationships.Where(IsPsmdcpElement).ToList())
+        foreach (var element in root.Elements().Where(IsPsmdcpElement).ToList())
         {
-            relationships.Remove(element);
+            element.Remove();
         }

-        for (var index = 0; index < relationships.Count; index++)
-        {
-            var relationship = relationships[index];
-            relationship.Attribute("Id")!.SetValue($"DeterministicId{index + 1}");
-        }
-
-        root.ReplaceAll(relationships);
+        RelationshipRenumber.RenumberAndSort(xml);
     }

     static bool IsPsmdcpElement(XElement element)
     {
         var target = element.Attribute("Target")!;
         return target.Value.EndsWith(".psmdcp");
     }
-}
+}
