fix(converter): escape XML entities in OPC attribute values on export (SD-2888) #3180

Merged
tupizz merged 2 commits into main from tadeu/sd-2888-bug-unescaped-ampersand-in-docx-export-causes-unreadable
May 6, 2026

Conversation


@tupizz tupizz commented May 6, 2026

Summary

  • Fix unescaped & in word/_rels/document.xml.rels that caused Word to show "unreadable content" on opening exported DOCX files. Linear: SD-2888
  • Add a shared serializeOpcXml helper that escapes &, <, > in OPC attribute values so xml-js's js2xml round-trips entities correctly
  • Route both reconcile-document-relationships and sync-package-metadata through it (closes the same latent bug for _rels/.rels and [Content_Types].xml)

Root cause

xml-js's xml2js decodes &amp; to & in attribute values, but js2xml does not re-escape on output. When reconcileDocumentRelationships rewrites word/_rels/document.xml.rels (e.g. to add the managed numbering relationship), every existing hyperlink Target containing &amp; is emitted with a bare &, producing malformed XML. Word's repair-on-open then applies default fonts (Times New Roman → Aptos), table cell widths, and paragraph spacing, which look like unrelated regressions but are downstream effects of the same root cause.
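
A minimal simulation of that failure mode, for illustration only. It does not call xml-js; `decodeAttribute` and `serializeWithoutEscaping` are hypothetical stand-ins for the parse and serialize steps, and the URL is made up:

```javascript
// Stand-in for the parse step: entity references in an attribute value are decoded.
// &amp; is decoded last so that "&amp;lt;" does not get decoded twice.
function decodeAttribute(raw) {
  return raw
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');
}

// Stand-in for a serializer that escapes only double quotes, not &, <, or >.
function serializeWithoutEscaping(name, value) {
  return `${name}="${value.replace(/"/g, '&quot;')}"`;
}

const original = 'https://example.com/view?docId=1&amp;companyId=2'; // as stored in the .rels file
const decoded = decodeAttribute(original);
const emitted = serializeWithoutEscaping('Target', decoded);

console.log(emitted);
// Target="https://example.com/view?docId=1&companyId=2"  (bare &, not well-formed XML)
```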

The trick in the helper: xml-js calls attributeValueFn with the value after pre-escaping " to &quot;. A naïve & → &amp; replacement would double-escape that token into &amp;quot;. The helper pivots &quot; through a placeholder, escapes &, <, and >, then restores &quot;.
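
A sketch of the placeholder pivot, assuming the value arrives with " already replaced by &quot; as described above. The function name `escapeOpcAttributeValue` and the placeholder string are hypothetical, not the names used in the actual helper:

```javascript
// Hypothetical placeholder. NUL is not a legal XML character, so it is assumed
// never to occur in a real attribute value.
const QUOT_PLACEHOLDER = '\u0000QUOT\u0000';

// Escapes &, <, > while leaving the already-escaped &quot; tokens untouched.
function escapeOpcAttributeValue(value) {
  return String(value)
    .split('&quot;').join(QUOT_PLACEHOLDER) // 1. park the pre-escaped quotes
    .replace(/&/g, '&amp;')                 // 2. escape ampersands first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .split(QUOT_PLACEHOLDER).join('&quot;'); // 3. restore the quotes
}

// The naive version, shown only to demonstrate the double-escape.
function naiveEscape(value) {
  return String(value).replace(/&/g, '&amp;');
}

console.log(escapeOpcAttributeValue('https://example.com/?a=1&companyId=2'));
// https://example.com/?a=1&amp;companyId=2
console.log(escapeOpcAttributeValue('say &quot;hi&quot; & <b>'));
// say &quot;hi&quot; &amp; &lt;b&gt;
console.log(naiveEscape('say &quot;hi&quot;'));
// say &amp;quot;hi&amp;quot;
```

A function of this shape would be passed as the attributeValueFn option to js2xml. One limitation of the sketch: it cannot tell a pre-escaped quote from attribute text that literally contained the characters &quot; before serialization.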

Files changed

  • new packages/super-editor/src/editors/v1/core/opc/xml-serialization.js (helper)
  • new packages/super-editor/src/editors/v1/core/opc/xml-serialization.test.js (5 unit tests)
  • reconcile-document-relationships.js — call site swap
  • reconcile-document-relationships.test.js — 2 SD-2888 regression tests
  • sync-package-metadata.js — call site swap

Test plan

  • pnpm --filter super-editor test (all 12,367 tests pass; 7 new)
  • OPC suite: npx vitest run src/editors/v1/core/opc (43/43 pass)
  • End-to-end browser repro: upload SD-2888 input.docx → editor.exportDocx() → unzip → verify word/_rels/document.xml.rels contains &amp;companyId (3×, matching rId8/9/10 in the bug) and zero bare &companyId
  • Validate every XML file in the exported package (16 files) parses as well-formed
  • Open the exported file in Microsoft Word and confirm no "unreadable content" repair prompt appears
  • Confirm font (Times New Roman), table widths, and 1.0 line spacing are preserved when opening in Word
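
The bare-ampersand check in the end-to-end step could be scripted roughly as follows. This is a sketch, not part of the PR: the helper names and the sample Relationship strings are hypothetical, and in practice the input would be the unzipped word/_rels/document.xml.rels content:

```javascript
// Matches an & that does not begin a predefined, decimal, or hex entity reference.
const BARE_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)/g;

function countBareAmpersands(xml) {
  return (xml.match(BARE_AMPERSAND) || []).length;
}

function countOccurrences(xml, needle) {
  return xml.split(needle).length - 1;
}

// Hypothetical sample content; the real file would be read from the exported package.
const goodRels =
  '<Relationship Id="rId8" Target="https://example.com/?a=1&amp;companyId=2" TargetMode="External"/>';
const badRels = goodRels.replace('&amp;', '&');

console.log(countOccurrences(goodRels, '&amp;companyId')); // 1
console.log(countBareAmpersands(goodRels)); // 0
console.log(countBareAmpersands(badRels)); // 1
```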

linear Bot commented May 6, 2026

@tupizz tupizz self-assigned this May 6, 2026
… (SD-2888)

xml-js's js2xml does not re-escape &, <, > in attribute values, so a hyperlink
Target containing &amp; was decoded on parse and written back as a bare &
during rels reconciliation. The malformed word/_rels/document.xml.rels caused
Word to show an "unreadable content" repair prompt and apply default formatting
(font, table widths, line spacing) on top of the otherwise-correct document.

Adds a serializeOpcXml helper that escapes attribute values via
attributeValueFn (using a placeholder pivot to avoid double-escaping the &quot;
xml-js pre-injects), and routes both reconcile-document-relationships and
sync-package-metadata through it. The latter closes the same latent bug for
_rels/.rels and [Content_Types].xml.
@tupizz tupizz force-pushed the tadeu/sd-2888-bug-unescaped-ampersand-in-docx-export-causes-unreadable branch from 45ba9da to db110d3 May 6, 2026 11:46
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


@tupizz tupizz marked this pull request as ready for review May 6, 2026 14:34
@tupizz tupizz requested a review from a team as a code owner May 6, 2026 14:34
@tupizz tupizz requested a review from luccas-harbour May 6, 2026 14:35
Word-authored .docx with `&` in a hyperlink Target now exercised through
zero-edit export. Catches the bare-`&` malformed-rels regression at the
integration level; existing unit tests cover the OPC helper directly.

@caio-pizzol caio-pizzol left a comment

hey @tupizz! good call centralizing this in serializeOpcXml :)

i pushed a behavior test that exercises the export path against a Word-authored fixture - catches the bare-& regression at the integration level (the unit tests cover the helper directly).

lgtm.

@tupizz tupizz merged commit b5490c3 into main May 6, 2026
71 of 72 checks passed
@tupizz tupizz deleted the tadeu/sd-2888-bug-unescaped-ampersand-in-docx-export-causes-unreadable branch May 6, 2026 18:41
@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.63

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in @superdoc-dev/react v1.2.0-next.105

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in superdoc-cli v0.8.0-next.79

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in vscode-ext v2.3.0-next.107

@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in superdoc v1.30.0-next.61

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 6, 2026

🎉 This PR is included in superdoc-sdk v1.8.0-next.61

@superdoc-bot

superdoc-bot Bot commented May 7, 2026

🎉 This PR is included in superdoc-cli v0.9.0

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 7, 2026

🎉 This PR is included in superdoc v1.32.0

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 7, 2026

🎉 This PR is included in @superdoc-dev/mcp v0.4.0

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 7, 2026

🎉 This PR is included in @superdoc-dev/react v1.3.0

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 7, 2026

🎉 This PR is included in vscode-ext v2.4.0
