Guard hi-res PDF rendering against oversized pages #4344

Closed
CyMule wants to merge 4 commits into main from guard-hi-res-pdf-render-pixels

@CyMule (Contributor) commented Apr 25, 2026

Adds a preflight check for hi-res PDF rendering that estimates per-page rendered pixels before rasterization and rejects pages that exceed the configured safe limit.


Note

Medium Risk
Changes hi-res PDF partitioning to fail early with UnprocessableEntityError when a page would render beyond a new per-page pixel cap, which may reject documents that previously processed (or OOMed) depending on DPI/page size and env configuration.

Overview
Adds a hi-res PDF render preflight guard that estimates per-page rasterized pixel counts (from cropbox/mediabox at the configured DPI) and raises UnprocessableEntityError before any rendering when a page exceeds a configurable maximum.
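As a rough illustration of the estimate described above, here is a minimal sketch, not the PR's actual code. It assumes page boxes are given as `(x0, y0, x1, y1)` tuples in PDF points (72 per inch), that the cropbox is preferred over the mediabox when present, and that `UserUnit` scaling is ignored. The name `estimate_page_pixels` is hypothetical.

```python
import math
from typing import Optional, Tuple

# Hypothetical representation: (x0, y0, x1, y1) in PDF points.
Box = Tuple[float, float, float, float]

POINTS_PER_INCH = 72.0  # default PDF user-space unit


def estimate_page_pixels(mediabox: Box, cropbox: Optional[Box], dpi: int) -> int:
    """Estimate how many pixels a page occupies when rasterized at `dpi`.

    Assumption: the cropbox wins when present, otherwise the mediabox is used.
    Rotation is irrelevant here because it does not change the pixel count.
    """
    x0, y0, x1, y1 = cropbox if cropbox is not None else mediabox
    width_px = math.ceil(abs(x1 - x0) / POINTS_PER_INCH * dpi)
    height_px = math.ceil(abs(y1 - y0) / POINTS_PER_INCH * dpi)
    return width_px * height_px


letter = (0.0, 0.0, 612.0, 792.0)      # US Letter, 8.5 x 11 in
huge = (0.0, 0.0, 14400.0, 14400.0)    # 200 x 200 in

print(estimate_page_pixels(letter, None, 200))  # 3740000
print(estimate_page_pixels(huge, None, 200))    # 1600000000
```

At 200 DPI an ordinary Letter page stays far below a 1,000,000,000-pixel cap, while the 200-inch square page exceeds it, so only the latter would be rejected before rendering.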

Introduces new env config PDF_RENDER_MAX_PIXELS_PER_PAGE (default 1_000_000_000, 0 disables), wires the check into partition_pdf_or_image for non-image hi-res PDFs, and adds unit tests covering limit exceeded/allowed, disablement, and file-cursor restoration. Also bumps version to 0.22.24 and documents the fix in the changelog.
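The configuration and cursor behavior described above could look roughly like the following sketch. Only the variable name, the default, and the "0 disables" rule come from the PR description. The helpers `max_pixels_per_page`, `exceeds_limit`, and `preserve_cursor` are hypothetical, and treating a page exactly at the limit as allowed is an assumption.

```python
import io
import os
from contextlib import contextmanager
from typing import BinaryIO, Iterator, Mapping

ENV_VAR = "PDF_RENDER_MAX_PIXELS_PER_PAGE"
DEFAULT_MAX_PIXELS = 1_000_000_000  # default stated in the PR description


def max_pixels_per_page(env: Mapping[str, str] = os.environ) -> int:
    """Read the per-page pixel cap from the environment, or use the default."""
    raw = env.get(ENV_VAR)
    return int(raw) if raw not in (None, "") else DEFAULT_MAX_PIXELS


def exceeds_limit(page_pixels: int, max_pixels: int) -> bool:
    """Return True when the page should be rejected; a cap of 0 disables the check."""
    if max_pixels == 0:
        return False
    # Assumption: a page exactly at the cap is still allowed.
    return page_pixels > max_pixels


@contextmanager
def preserve_cursor(file: BinaryIO) -> Iterator[BinaryIO]:
    """Restore the file position after a preflight read, even if it raises."""
    position = file.tell()
    try:
        yield file
    finally:
        file.seek(position)


# Usage: inspect a stream without disturbing the caller's read position.
stream = io.BytesIO(b"%PDF-1.7 example bytes")
stream.seek(5)
with preserve_cursor(stream) as f:
    f.read()  # stand-in for parsing page boxes
print(stream.tell())  # 5
```

Restoring the cursor in a `finally` clause matters because the same file object is handed to the renderer afterwards, and a preflight failure should not leave it at an arbitrary offset.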

Reviewed by Cursor Bugbot for commit 37f65d2. Bugbot is set up for automated code reviews on this repo.

@CyMule CyMule marked this pull request as ready for review April 25, 2026 12:36
@CyMule CyMule marked this pull request as draft April 25, 2026 12:46

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@property
def PDF_RENDER_MAX_PIXELS_PER_PAGE(self) -> int:
    """Maximum rendered pixels allowed for a single PDF page."""
    return self._get_int("PDF_RENDER_MAX_PIXELS_PER_PAGE", 1_000_000_000)
Default pixel limit exceeds PIL safety limit

Medium Severity

The default PDF_RENDER_MAX_PIXELS_PER_PAGE of 1,000,000,000 (1 billion) is twice PILImage.MAX_IMAGE_PIXELS, which is set to 5e8 (500 million) at module level. With default settings, any page that renders to between 500M and 1B pixels passes the preflight check but still triggers a PIL DecompressionBombError during rasterization, after the resource-intensive poppler rendering step has already allocated memory. The purpose of the preflight check is to reject oversized pages before rasterization, but the default value makes it ineffective for exactly the range PIL would catch.
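The gap the comment describes can be made concrete with plain arithmetic. The sketch below uses only the two numbers quoted in the comment and does not import Pillow. How Pillow actually reacts to sizes in this band (warning versus error) depends on its version and warning configuration and is not verified here. `classify` and `effective_limit` are hypothetical helpers, and clamping the cap to the downstream limit is one possible remedy, not something this PR adopted.

```python
PREFLIGHT_DEFAULT = 1_000_000_000   # default cap from this PR
DOWNSTREAM_LIMIT = 500_000_000      # 5e8, the PIL limit quoted in the comment


def classify(page_pixels: int,
             preflight_limit: int = PREFLIGHT_DEFAULT,
             downstream_limit: int = DOWNSTREAM_LIMIT) -> str:
    """Say which limit, if any, a page of `page_pixels` pixels runs into."""
    if preflight_limit != 0 and page_pixels > preflight_limit:
        return "rejected_by_preflight"
    if page_pixels > downstream_limit:
        return "passes_preflight_over_downstream_limit"
    return "within_both_limits"


def effective_limit(preflight_limit: int, downstream_limit: int) -> int:
    """Hypothetical remedy: never let the preflight cap exceed the downstream one."""
    if preflight_limit == 0:  # 0 means the preflight check is disabled
        return downstream_limit
    return min(preflight_limit, downstream_limit)


print(classify(3_740_000))       # within_both_limits
print(classify(700_000_000))     # passes_preflight_over_downstream_limit
print(classify(1_600_000_000))   # rejected_by_preflight
print(effective_limit(PREFLIGHT_DEFAULT, DOWNSTREAM_LIMIT))  # 500000000
```

With the clamped limit, the 700,000,000-pixel page would be rejected by the preflight check instead of reaching the renderer.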

@CyMule CyMule closed this Apr 25, 2026