Security hardening: ZIP entry caps + MCP path sandbox#69
Merged
Conversation
…nput EPUB, PPTX, and DOCX are ZIP archives, and the format crates were reading every entry with unbounded `read_to_string` / `read_to_end`. A crafted archive could declare a tiny compressed size that expands to multi-GB, exhausting memory. Each zip-based crate now has a `safe_read` helper (or inline equivalent in DOCX's single metadata reader) that: - rejects entries whose declared size exceeds a per-entry cap (100 MB for EPUB/PPTX, 16 MB for DOCX's tiny `core.xml`) - uses `Read::take(cap + 1)` as defense in depth against archives that lie about their declared size XLSX and PPTX also sized `Vec::with_capacity` directly from attacker-controlled counts (workbook sheet list, archive slide listing); both now cap the preallocation at a sane ceiling. Tests cover the normal path and the oversized-entry rejection in paperjam-epub.
`resolve_path` previously accepted any absolute path and did no `..` normalisation, so an MCP client could read or write anywhere the server process had access. The server is typically launched by a local assistant with the caller's full user privileges, so this was an open-ended filesystem primitive. resolve_path now: - canonicalises both the working directory and the candidate path (falling back to canonicalising the nearest existing ancestor so that save operations to new files still work) - rejects any resolved path that is not contained within the working directory, with a structured `McpError::PathEscapesSandbox` error The CLI gains `--allow-absolute-paths` for users who relied on the old permissive behaviour, paired with a new `ServerConfig` / `PaperjamServer::with_config` constructor so callers can make the same choice programmatically. Defaults are safe; opting out is explicit. Tests cover the allowed, rejected, traversal, non-existent-child, and opt-out cases.
6 tasks
pratyush618
added a commit
that referenced
this pull request
Apr 24, 2026
* chore: add rust-toolchain and justfile for consistent dev tooling rust-toolchain.toml pins every contributor and CI invocation to the same stable toolchain with rustfmt, clippy, and the wasm32-unknown-unknown target. Previously CI used dtolnay/rust-toolchain@stable while contributors installed their own; minor version drift between them could produce clippy lint discrepancies at merge time. justfile captures the common build / test / lint commands documented in CLAUDE.md as executable recipes. `just` (no args) prints the full list, and the common flows (build, test, check, fmt, clean-all) are one step each so local iteration matches the pre-commit chain. * chore(async): stop force-enabling signatures/validation on core paperjam-async currently only reaches into paperjam_core::render, yet its manifest force-enabled the signatures and validation features on paperjam-core for every consumer. Downstream crates that need those features (paperjam-py does, explicitly) keep working unchanged; lightweight async consumers no longer drag in the x509-parser / cms / rsa / p256 / sha1 / pkcs8 / spki / ureq / rustls / roxmltree tree. * docs: crate-level rustdoc across the workspace Every library crate now has a `//!` summary describing its scope, its entry points, and how it fits into the broader paperjam ecosystem. Uniform style: plain prose, no intra-doc links in crate-level summaries (simpler to maintain, no rustdoc link warnings to manage). Also fixes two pre-existing rustdoc warnings uncovered along the way: an `[OPTIONAL]` literal in signature/tsa.rs that rustdoc was parsing as an intra-doc link, and a bare URL in model/annotations.rs flagged for auto-linking. The PyO3 `PyDocument` and `PyPage` classes get class-level docs that clarify they are the native layer beneath the pure-Python `paperjam.Document` / `paperjam.Page` wrappers. After this commit `cargo doc --workspace --no-deps` produces zero warnings. * chore(ci): run docs workflow on PRs and install wasm-opt The docs workflow previously fired only on pushes to main, so docs regressions (broken wasm builds, Docusaurus compile errors, bad links) were invisible until after merge. Now PRs with matching paths run the full build (without deploying) so problems surface in the PR check run. Also installs binaryen, whose wasm-opt binary wasm-pack invokes automatically when present on PATH. Release-mode WASM bundles shrink by 20-30% with no code changes. Concurrency group is keyed on ref so PR builds and deploy builds don't cancel each other; the deploy job is skipped on pull_request events to preserve production pages behaviour. * docs(changelog): record [Unreleased] entries since 0.2.0 Document the audit-driven work that has landed on main but hasn't been cut into a release yet: the ZIP-entry and MCP sandbox security hardening (#69), the panic-surface cleanup in the PDF engine (#70), the form-bindings stub sync and metadata / docs refresh (#68), plus the tooling, docs, and paperjam-async feature adjustments from this polish branch. * fix(ci): install pinned binaryen release instead of apt binaryen Ubuntu's apt-shipped binaryen is ~v108, which predates the default enablement of bulk-memory and sign-extension instructions in rustc output. The result is wasm-pack invoking /usr/bin/wasm-opt on a valid modern wasm module and wasm-opt rejecting it with "[wasm-validator error] Bulk memory operation (bulk memory is disabled)" — observed on the PR #71 run. Download and install a pinned binaryen release tarball from the upstream GitHub releases page. version_119 is known-good against the current rustc and supports all default features. Future bumps change one env var. * chore(ci): verify binaryen tarball checksum and cache across runs Harden the binaryen install step that landed in the previous commit: - SHA256-pin the downloaded tarball (value verified against a local download of version_119). Guards against upstream tampering or an accidental silent swap. - Split the version-check into a dedicated Verify step so the log shows the installed wasm-opt version unambiguously. - Wrap the install in actions/cache keyed on the pinned version so subsequent runs skip the download. Saves ~3-5s per run. * fix(wasm): tell wasm-pack to enable bulk-memory and sign-ext in wasm-opt rustc 1.82+ emits bulk-memory and sign-extension instructions in its default wasm output. wasm-pack's baseline wasm-opt invocation ("-O") does not pass --enable-bulk-memory / --enable-sign-ext, so even a modern binaryen rejects the module with "Bulk memory operations require bulk memory [--enable-bulk-memory]" during validation. Configure the flags in paperjam-wasm's Cargo.toml metadata block so wasm-pack invokes wasm-opt with the right feature set. This is what was blocking CI #71 even after installing a modern binaryen. * fix(wasm): extend wasm-opt feature set to the full rustc default list Rust 1.87 / LLVM 20 enabled bulk-memory and nontrapping-fptoint in the default wasm32-unknown-unknown feature set, alongside the previously-defaulted multivalue, mutable-globals, reference-types, and sign-ext. wasm-pack's baseline "-O" invocation of wasm-opt does not pass any of them, so the optimiser rejects a perfectly valid rustc-emitted module. The previous commit only enabled bulk-memory and sign-ext, which exposed a follow-on validator error on `i32.trunc_sat_f64_s` (nontrapping-fptoint). Rather than re-play whack-a-mole for each feature, pass the full list that matches the rustc default set documented in the wasm32-unknown-unknown platform-support page. Ref: https://doc.rust-lang.org/rustc/platform-support/wasm32-unknown-unknown.html
This was referenced Apr 24, 2026
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Second batch from the codebase audit. Closes the top-priority security items.
read_to_string/read_to_endon every archive entry with no size check, so a zip-bomb-style archive could declare tiny compressed sizes and inflate to multi-GB. Newsafe_readhelpers reject entries exceeding a per-entry cap (100 MB for EPUB/PPTX, 16 MB for DOCX metadata), andRead::take(cap + 1)protects against archives that lie about their declared size. XLSX and PPTX also cappedVec::with_capacityon attacker-controlled counts.resolve_pathnow canonicalises and enforces containment within the configuredworking_dir. Prior behavior accepted any absolute path and did no..normalisation — an MCP client could read or write anywhere the server process could. NewServerConfig { working_dir, allow_absolute_paths }+--allow-absolute-pathsCLI flag preserves opt-out for users who need the old behavior.Behavior change
The MCP sandbox is a breaking change for clients passing absolute paths outside
working_dir. The error message guides users to the--allow-absolute-pathsopt-out. Default is safe; opt-out is explicit.What's still outstanding from the audit
f64::partial_cmp().unwrap()in table detection;get_object_mut().unwrap()/as_dict_mut().unwrap()in stamp/watermark/validation) — 15+ sites, going out in a dedicated branch.paperjam-epub→paperjam-html) — needs your call on amend-CLAUDE.md vs. refactor.rust-toolchain.toml, CHANGELOG[Unreleased], crate-level//!rustdoc, emptyrag/dirs,paperjam-asyncfeature drag) — candidates for a follow-up cleanup branch.Test plan
cargo test --workspace— new tests: 2 epub + 5 mcp (+ existing 4 xlsx)cargo clippy --workspace --all-targets -- -D warningscargo fmt --all --checkuv run pytest tests/python/— 88 passed, 4 skippedpre-commit run --all-files— all hooks passNotes on scope
The full SafeZip design (total-bytes budget, entry-count cap, compression-ratio cap) is deferred to a future change — this PR handles the dominant vector (single oversized entry) and lays the foundation with the
safe_readmodules.