fix(docx): avoid panic on save, make AnyDocument.save() extension-aware #88

Merged: pratyush618 merged 2 commits into main from fix/docx-save-panic-and-extension-aware-save on May 25, 2026

@pratyush618
Collaborator

Summary

  • DOCX save panic. DocxDocument::save_to_bytes round-tripped the document through docx-rs's writer, whose BuildXML impl for RunChild panics on RunChild::InstrTextString: that variant is produced only by the reader, and the writer's match arm for it is unreachable!(). As a result, doc.save(...) crashed on any DOCX containing PAGE, NUMPAGES, PAGEREF, TOC, or HYPERLINK fields. Upstream issue: bokuweb/docx-rs#750, still open; introduced by PR #528. Fix: return the stashed input bytes. paperjam-docx exposes no DOCX mutation API, so rebuilding was both lossy and panic-prone for no benefit.
  • AnyDocument.save(path) ignored the target extension. doc.save("out.pdf") on a DOCX wrote DOCX bytes into a .pdf file. save() now routes through convert_to(ext) when the target extension differs from self.format; otherwise it writes the original bytes unchanged.
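
The two behaviours above can be sketched in a few lines of Python. This is an illustration only, not the paperjam implementation: SketchDocument, its fields, and the placeholder convert_to body are hypothetical, and the fallback to original bytes when the path has no extension is an assumption the summary does not spell out.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchDocument:
    """Hypothetical stand-in for AnyDocument; not the real paperjam class."""

    format: str            # e.g. "docx" or "pdf"
    original_bytes: bytes  # input bytes stashed when the document was opened

    def save_to_bytes(self) -> bytes:
        # Same-format save: return the stashed input untouched instead of
        # rebuilding the document through a writer.
        return self.original_bytes

    def convert_to(self, target_format: str) -> bytes:
        # Placeholder conversion; a real implementation would render the
        # document into the target format.
        return f"<{target_format} rendering>".encode()

    def save(self, path: str) -> None:
        # Derive the target format from the file extension, e.g. "pdf".
        target = Path(path).suffix.lstrip(".").lower()
        if target and target != self.format:
            data = self.convert_to(target)   # cross-format: convert first
        else:
            data = self.save_to_bytes()      # same format (or no extension, by assumption)
        Path(path).write_bytes(data)
```

With this shape, saving a "docx" document to out.docx writes the original bytes, while saving it to out.pdf writes whatever convert_to("pdf") returns.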

Repro (panic + wrong-format file) from the bug report:

import paperjam as pj
doc = pj.open("CurriculumGuideGibbsHighSchool.docx")
doc.save("CurriculumGuideGibbsHighSchool.pdf")  # before: panic; even past the panic, output was DOCX bytes

After this PR: produces a valid 62-page PDF (%PDF-1.7).
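
One quick way to tell the old wrong-format output from the new one is to look at the leading bytes of the saved file: a PDF starts with %PDF-, while a DOCX is a ZIP container and starts with PK\x03\x04. The helper below is a minimal sketch; sniff_format is a hypothetical name and not part of paperjam.

```python
def sniff_format(data: bytes) -> str:
    """Classify a file by its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # DOCX files are ZIP archives, so DOCX bytes written into a
        # .pdf file would be reported here.
        return "zip"
    return "unknown"
```

Reading the first few bytes of the saved .pdf and passing them to sniff_format should give "pdf" after this PR; before it, a run that got past the panic would have given "zip".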

Test plan

  • uv run maturin develop --release — builds clean
  • Re-ran the reported repro on CurriculumGuideGibbsHighSchool.docx — no panic, output opens as a valid PDF with page_count == 62 and extractable text
  • Pre-commit (clippy, cargo fmt, ruff, mypy) — passes on both commits
  • Existing cargo test --workspace / pytest tests/python/ should remain green (no behavior change for same-format saves; only adds cross-format routing)
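
To pick further regression inputs, it may help to find DOCX files that carry field instructions, since those are the documents that hit the panic. The sketch below assumes that field codes appear as w:instrText elements in the XML parts under word/ (including headers and footers) and that the conventional w: namespace prefix is used. It is a plain byte search rather than a real XML parse, and docx_has_field_instructions is a hypothetical helper.

```python
import io
import zipfile


def docx_has_field_instructions(data: bytes) -> bool:
    """Return True if any XML part under word/ contains a w:instrText element."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.startswith("word/") and name.endswith(".xml"):
                # Byte search only; assumes the usual "w:" prefix.
                if b"<w:instrText" in archive.read(name):
                    return True
    return False


def make_zip(parts: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP from name -> payload pairs (for demonstration)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in parts.items():
            archive.writestr(name, payload)
    return buffer.getvalue()
```

Simple fields written as w:fldSimple with a w:instr attribute would not be caught by this check; whether those also reach the affected writer arm is not stated in this PR.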

Notes / follow-ups

  • The DOCX fix should be revisited if/when paperjam grows a DOCX editing API and upstream docx-rs#750 lands — then switching back to inner.build().pack() becomes meaningful.
  • The same save(path) UX gap exists on the PDF Document.save() path (writes PDF regardless of extension). Out of scope here; can address in a follow-up if desired.

github-actions bot added the rust and python labels on May 25, 2026
pratyush618 self-assigned this on May 25, 2026
pratyush618 merged commit 21c8fe0 into main on May 25, 2026
14 checks passed
pratyush618 deleted the fix/docx-save-panic-and-extension-aware-save branch on May 25, 2026 02:53