Fix `to_image()` omitting filled AcroForm widget values (closes #1367) by Cyberfilo · Pull Request #1372 · jsvine/pdfplumber

Cyberfilo · 2026-05-22T10:03:27Z

Summary

`page.to_image()` (which goes through `pdfplumber.display.get_page_image`) currently opens a `pypdfium2.PdfDocument` and loads the page without initializing PDFium's form environment. Filled AcroForm field text is rendered via PDFium's form-rendering layer (`FPDF_FFLDraw`), which only runs when a form environment exists. Result: filled form-field text is missing from the PIL bitmap, even though every standard PDF viewer shows it.

This is purely a PDFium rasterization gap — pdfminer text extraction (`page.chars`, `extract_text()`) is unaffected.

Reproduction

import pdfplumber

with pdfplumber.open("filled_form.pdf") as pdf:
    im = pdf.pages[0].to_image(resolution=150).original
    im.save("out.png")  # filled field values absent

(Reproducer + sample PDF in #1367.)

Fix

One-line addition in `pdfplumber/display.py::get_page_image`:

pdfium_doc = pypdfium2.PdfDocument(src, password=password)
...
pdfium_doc.init_forms()  # <-- new
pdfium_page = pdfium_doc.get_page(page_ix)

Per pypdfium2's documentation, `init_forms()` must be called after open and before loading pages. On a document without a form it is a no-op, so this is safe for the common non-form case.
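
For context, the call order described above can be sketched directly against pypdfium2, outside pdfplumber. This is a minimal illustration, not the code in `display.py`: the helper names `scale_for_resolution` and `render_page_with_forms` are hypothetical, and the sketch assumes a pypdfium2 version that provides `PdfDocument.init_forms()` and the `may_draw_forms` render flag.

```python
def scale_for_resolution(resolution: float) -> float:
    """Convert DPI to a render scale (PDF user space is 72 units per inch)."""
    return resolution / 72


def render_page_with_forms(path, page_ix=0, resolution=72, password=None):
    """Render one page to a PIL image, including filled form-field values.

    Hypothetical helper for illustration; not part of pdfplumber.
    """
    import pypdfium2  # imported lazily so this module loads without it

    doc = pypdfium2.PdfDocument(path, password=password)
    try:
        # Must run after opening the document and before loading any page.
        doc.init_forms()
        page = doc.get_page(page_ix)
        try:
            bitmap = page.render(
                scale=scale_for_resolution(resolution),
                may_draw_forms=True,  # assumed flag; draws the form layer
            )
            # convert() returns a copy, so the image outlives the bitmap.
            return bitmap.to_pil().convert("RGB")
        finally:
            page.close()
    finally:
        doc.close()
```

Calling `render_page_with_forms("filled_form.pdf", resolution=150)` should then yield a bitmap that includes the filled values, provided the form environment was initialized before the page was loaded.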

Test plan

No new test is added. The bug lives in the PDFium rendering path, whose output is a bitmap, and the existing test suite already covers `to_image()` for non-form PDFs; those tests continue to pass with the no-op `init_forms()` call. The existing tests cannot catch the missing field text in form PDFs, because the rasterized bitmap differs only in a few hundred pixels per filled field.

Happy to add a pixel-comparison or perceptual-hash test against the sample PDF in #1367 if maintainers want — let me know.
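
One possible shape for such a pixel-comparison test is sketched below. It is only an assumption about how the check could look: both fixture paths are hypothetical (a filled form and a blank copy of the same form), as are the helper names, and the check simply asserts that the two renders differ in at least one pixel.

```python
def changed_pixel_count(rgb_a: bytes, rgb_b: bytes) -> int:
    """Count pixels that differ between two equally sized RGB byte buffers."""
    if len(rgb_a) != len(rgb_b) or len(rgb_a) % 3:
        raise ValueError("buffers must match in size and hold whole RGB pixels")
    return sum(
        rgb_a[i : i + 3] != rgb_b[i : i + 3] for i in range(0, len(rgb_a), 3)
    )


def render_rgb_bytes(pdf_path, resolution=72):
    """Rasterize the first page with pdfplumber and return its raw RGB bytes."""
    import pdfplumber  # imported lazily so the pure helper stays standalone

    with pdfplumber.open(pdf_path) as pdf:
        im = pdf.pages[0].to_image(resolution=resolution).original
        return im.convert("RGB").tobytes()


def check_filled_values_are_rendered():
    # Hypothetical fixtures: the same form, once filled and once blank.
    filled = render_rgb_bytes("tests/pdfs/filled_form.pdf")
    blank = render_rgb_bytes("tests/pdfs/blank_form.pdf")
    # If the form layer is drawn, the filled values change some pixels.
    assert changed_pixel_count(filled, blank) > 0
```

Comparing against a blank copy avoids storing a reference PNG, which could drift across PDFium versions. A stricter variant could crop to a field's bounding box before counting.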

CHANGELOG

Added under `## Unreleased`.

Closes #1367.

`get_page_image` opens a `pypdfium2.PdfDocument` and immediately loads the page without initializing PDFium's form environment. Filled AcroForm field text is drawn via PDFium's form-rendering layer (`FPDF_FFLDraw`), which only runs when a form environment exists. The fix: call `pdfium_doc.init_forms()` after open and before `get_page(page_ix)`, matching pypdfium2's documented order. `init_forms()` is a no-op on documents without a form, so the change is safe for non-form PDFs. Closes jsvine#1367.

Cyberfilo added 2 commits May 22, 2026 12:03

docs(changelog): note AcroForm to_image fix

9c3290d

Cyberfilo wants to merge 2 commits into jsvine:develop from Cyberfilo:fix/1367-init-forms-for-to-image
