Skip to content

Security hardening: ZIP entry caps + MCP path sandbox#69

Merged
pratyush618 merged 3 commits into
mainfrom
fix/security-hardening
Apr 24, 2026
Merged

Security hardening: ZIP entry caps + MCP path sandbox#69
pratyush618 merged 3 commits into
mainfrom
fix/security-hardening

Conversation

@pratyush618

Copy link
Copy Markdown
Collaborator

Summary

Second batch from the codebase audit. Closes the top-priority security items.

  • fix(formats): bound ZIP entry reads and preallocations on untrusted input. EPUB/PPTX/DOCX parsers previously called read_to_string / read_to_end on every archive entry with no size check, so a zip-bomb-style archive could declare tiny compressed sizes and inflate to multi-GB. New safe_read helpers reject entries exceeding a per-entry cap (100 MB for EPUB/PPTX, 16 MB for DOCX metadata), and Read::take(cap + 1) protects against archives that lie about their declared size. XLSX and PPTX also capped Vec::with_capacity on attacker-controlled counts.
  • feat(mcp): resolve_path now canonicalises and enforces containment within the configured working_dir. Prior behavior accepted any absolute path and did no .. normalisation — an MCP client could read or write anywhere the server process could. New ServerConfig { working_dir, allow_absolute_paths } + --allow-absolute-paths CLI flag preserves opt-out for users who need the old behavior.

Behavior change

The MCP sandbox is a breaking change for clients passing absolute paths outside working_dir. The error message guides users to the --allow-absolute-paths opt-out. Default is safe; opt-out is explicit.

What's still outstanding from the audit

  • Panic surfaces on untrusted PDFs (f64::partial_cmp().unwrap() in table detection; get_object_mut().unwrap() / as_dict_mut().unwrap() in stamp/watermark/validation) — 15+ sites, going out in a dedicated branch.
  • Layer discipline (paperjam-epubpaperjam-html) — needs your call on amend-CLAUDE.md vs. refactor.
  • Rust test gap — 13/15 crates still have no tests.
  • Easy polish items (rust-toolchain.toml, CHANGELOG [Unreleased], crate-level //! rustdoc, empty rag/ dirs, paperjam-async feature drag) — candidates for a follow-up cleanup branch.

Test plan

  • cargo test --workspace — new tests: 2 epub + 5 mcp (+ existing 4 xlsx)
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all --check
  • uv run pytest tests/python/ — 88 passed, 4 skipped
  • pre-commit run --all-files — all hooks pass

Notes on scope

The full SafeZip design (total-bytes budget, entry-count cap, compression-ratio cap) is deferred to a future change — this PR handles the dominant vector (single oversized entry) and lays the foundation with the safe_read modules.

…nput

EPUB, PPTX, and DOCX are ZIP archives, and the format crates were
reading every entry with unbounded `read_to_string` / `read_to_end`.
A crafted archive could declare a tiny compressed size that expands
to multi-GB, exhausting memory.

Each zip-based crate now has a `safe_read` helper (or inline
equivalent in DOCX's single metadata reader) that:

- rejects entries whose declared size exceeds a per-entry cap
  (100 MB for EPUB/PPTX, 16 MB for DOCX's tiny `core.xml`)
- uses `Read::take(cap + 1)` as defense in depth against archives
  that lie about their declared size

XLSX and PPTX also sized `Vec::with_capacity` directly from
attacker-controlled counts (workbook sheet list, archive slide
listing); both now cap the preallocation at a sane ceiling.

Tests cover the normal path and the oversized-entry rejection in
paperjam-epub.
`resolve_path` previously accepted any absolute path and did no
`..` normalisation, so an MCP client could read or write anywhere
the server process had access. The server is typically launched by
a local assistant with the caller's full user privileges, so this
was an open-ended filesystem primitive.

resolve_path now:

- canonicalises both the working directory and the candidate path
  (falling back to canonicalising the nearest existing ancestor so
  that save operations to new files still work)
- rejects any resolved path that is not contained within the
  working directory, with a structured
  `McpError::PathEscapesSandbox` error

The CLI gains `--allow-absolute-paths` for users who relied on the
old permissive behaviour, paired with a new `ServerConfig` /
`PaperjamServer::with_config` constructor so callers can make the
same choice programmatically. Defaults are safe; opting out is
explicit.

Tests cover the allowed, rejected, traversal, non-existent-child,
and opt-out cases.
@github-actions github-actions Bot added rust Pull requests that update rust code mcp MCP server / paperjam-mcp crate labels Apr 24, 2026
@pratyush618 pratyush618 merged commit cb96b63 into main Apr 24, 2026
13 checks passed
@pratyush618 pratyush618 deleted the fix/security-hardening branch April 24, 2026 10:06
@pratyush618 pratyush618 restored the fix/security-hardening branch April 24, 2026 10:06
pratyush618 added a commit that referenced this pull request Apr 24, 2026
* chore: add rust-toolchain and justfile for consistent dev tooling

rust-toolchain.toml pins every contributor and CI invocation to the
same stable toolchain with rustfmt, clippy, and the
wasm32-unknown-unknown target. Previously CI used
dtolnay/rust-toolchain@stable while contributors installed their own;
minor version drift between them could produce clippy lint
discrepancies at merge time.

justfile captures the common build / test / lint commands documented
in CLAUDE.md as executable recipes. `just` (no args) prints the full
list, and the common flows (build, test, check, fmt, clean-all) are
one step each so local iteration matches the pre-commit chain.

* chore(async): stop force-enabling signatures/validation on core

paperjam-async currently only reaches into paperjam_core::render, yet
its manifest force-enabled the signatures and validation features on
paperjam-core for every consumer. Downstream crates that need those
features (paperjam-py does, explicitly) keep working unchanged;
lightweight async consumers no longer drag in the x509-parser / cms /
rsa / p256 / sha1 / pkcs8 / spki / ureq / rustls / roxmltree tree.

* docs: crate-level rustdoc across the workspace

Every library crate now has a `//!` summary describing its scope,
its entry points, and how it fits into the broader paperjam
ecosystem. Uniform style: plain prose, no intra-doc links in
crate-level summaries (simpler to maintain, no rustdoc link
warnings to manage).

Also fixes two pre-existing rustdoc warnings uncovered along the
way: an `[OPTIONAL]` literal in signature/tsa.rs that rustdoc was
parsing as an intra-doc link, and a bare URL in model/annotations.rs
flagged for auto-linking. The PyO3 `PyDocument` and `PyPage` classes
get class-level docs that clarify they are the native layer beneath
the pure-Python `paperjam.Document` / `paperjam.Page` wrappers.

After this commit `cargo doc --workspace --no-deps` produces zero
warnings.

* chore(ci): run docs workflow on PRs and install wasm-opt

The docs workflow previously fired only on pushes to main, so docs
regressions (broken wasm builds, Docusaurus compile errors, bad
links) were invisible until after merge. Now PRs with matching
paths run the full build (without deploying) so problems surface in
the PR check run.

Also installs binaryen, whose wasm-opt binary wasm-pack invokes
automatically when present on PATH. Release-mode WASM bundles
shrink by 20-30% with no code changes.

Concurrency group is keyed on ref so PR builds and deploy builds
don't cancel each other; the deploy job is skipped on pull_request
events to preserve production pages behaviour.

* docs(changelog): record [Unreleased] entries since 0.2.0

Document the audit-driven work that has landed on main but hasn't
been cut into a release yet: the ZIP-entry and MCP sandbox security
hardening (#69), the panic-surface cleanup in the PDF engine (#70),
the form-bindings stub sync and metadata / docs refresh (#68), plus
the tooling, docs, and paperjam-async feature adjustments from this
polish branch.

* fix(ci): install pinned binaryen release instead of apt binaryen

Ubuntu's apt-shipped binaryen is ~v108, which predates the default
enablement of bulk-memory and sign-extension instructions in rustc
output. The result is wasm-pack invoking /usr/bin/wasm-opt on a
valid modern wasm module and wasm-opt rejecting it with
"[wasm-validator error] Bulk memory operation (bulk memory is
disabled)" — observed on the PR #71 run.

Download and install a pinned binaryen release tarball from the
upstream GitHub releases page. version_119 is known-good against
the current rustc and supports all default features. Future bumps
change one env var.

* chore(ci): verify binaryen tarball checksum and cache across runs

Harden the binaryen install step that landed in the previous commit:

- SHA256-pin the downloaded tarball (value verified against a local
  download of version_119). Guards against upstream tampering or an
  accidental silent swap.
- Split the version-check into a dedicated Verify step so the log
  shows the installed wasm-opt version unambiguously.
- Wrap the install in actions/cache keyed on the pinned version so
  subsequent runs skip the download. Saves ~3-5s per run.

* fix(wasm): tell wasm-pack to enable bulk-memory and sign-ext in wasm-opt

rustc 1.82+ emits bulk-memory and sign-extension instructions in its
default wasm output. wasm-pack's baseline wasm-opt invocation ("-O")
does not pass --enable-bulk-memory / --enable-sign-ext, so even a
modern binaryen rejects the module with "Bulk memory operations
require bulk memory [--enable-bulk-memory]" during validation.

Configure the flags in paperjam-wasm's Cargo.toml metadata block so
wasm-pack invokes wasm-opt with the right feature set. This is what
was blocking CI #71 even after installing a modern binaryen.

* fix(wasm): extend wasm-opt feature set to the full rustc default list

Rust 1.87 / LLVM 20 enabled bulk-memory and nontrapping-fptoint in
the default wasm32-unknown-unknown feature set, alongside the
previously-defaulted multivalue, mutable-globals, reference-types,
and sign-ext. wasm-pack's baseline "-O" invocation of wasm-opt does
not pass any of them, so the optimiser rejects a perfectly valid
rustc-emitted module.

The previous commit only enabled bulk-memory and sign-ext, which
exposed a follow-on validator error on `i32.trunc_sat_f64_s`
(nontrapping-fptoint). Rather than re-play whack-a-mole for each
feature, pass the full list that matches the rustc default set
documented in the wasm32-unknown-unknown platform-support page.

Ref: https://doc.rust-lang.org/rustc/platform-support/wasm32-unknown-unknown.html
@pratyush618 pratyush618 deleted the fix/security-hardening branch May 25, 2026 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mcp MCP server / paperjam-mcp crate rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant