fix: initialize form rendering for fillable PDFs (#240) #241

Merged
rstrahan merged 3 commits into develop from fix/fillable-pdf
Mar 17, 2026

@rstrahan
Contributor

Summary

Fixes #240 — Fillable PDF form fields (text inputs, checkboxes, radio buttons, dropdowns) were not rendered in page images, causing empty extraction results.

Root Cause

pypdfium2's page.render(may_draw_forms=True) (the default) requires PdfDocument.init_forms() to be called first to initialize the form rendering engine. Without this call, AcroForm content is silently omitted from rendered images — so OCR/extraction only sees the blank form template.
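The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `render_pages_with_forms` and the `scale` default are assumptions, and the function accepts any object exposing `init_forms()`, `len()`, indexing, and pages with a `render()` method, which is the shape of a pypdfium2 `PdfDocument`.

```python
from typing import Any, List


def render_pages_with_forms(pdf_document: Any, scale: float = 2.0) -> List[Any]:
    """Render every page of an opened PDF with form fields drawn.

    Hypothetical helper: `pdf_document` is expected to behave like a
    pypdfium2 PdfDocument (init_forms(), len(), indexing by page number).
    """
    # Initialize the form environment right after opening the document and
    # before touching any page. Skipping this call leaves AcroForm widgets
    # out of the rendered output.
    pdf_document.init_forms()

    bitmaps = []
    for index in range(len(pdf_document)):
        page = pdf_document[index]
        # may_draw_forms=True is already the default; it is passed
        # explicitly here only to make the intent visible.
        bitmaps.append(page.render(scale=scale, may_draw_forms=True))
    return bitmaps


# With pypdfium2 installed, usage might look like:
#   import pypdfium2 as pdfium
#   pdf_document = pdfium.PdfDocument(file_content)
#   images = [bitmap.to_pil() for bitmap in render_pages_with_forms(pdf_document)]
```

Calling `init_forms()` immediately after constructing the document, rather than lazily before each render, keeps the call ahead of any page access.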

Changes

  • lib/idp_common_pkg/idp_common/ocr/service.py — Added pdf_document.init_forms() after pdfium.PdfDocument(file_content) in process_document() (Pattern 2 pipeline OCR)
  • patterns/unified/src/bda_processresults_function/index.py — Added pdf_document.init_forms() after pdfium.PdfDocument(pdf_content) in create_pdf_page_images() (Pattern 1 BDA thumbnails)
  • lib/idp_common_pkg/tests/unit/ocr/test_ocr_service.py — Added regression test test_process_document_calls_init_forms_for_fillable_pdfs
  • CHANGELOG.md — Added fix entry under [Unreleased]
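A regression check for this kind of fix can assert that `init_forms()` runs before any page is rendered. The sketch below is not the test added in this PR; `render_all_pages`, `open_document`, and `record_call_order` are hypothetical names, and the document is a `MagicMock` stand-in so the example runs without pypdfium2.

```python
from unittest.mock import MagicMock


def render_all_pages(open_document, file_content):
    """Hypothetical stand-in for the code under test."""
    pdf_document = open_document(file_content)
    pdf_document.init_forms()  # the call this kind of fix adds
    return [
        pdf_document[index].render(may_draw_forms=True)
        for index in range(len(pdf_document))
    ]


def record_call_order(page_count=2):
    """Run render_all_pages against mocks and return the observed call order."""
    events = []

    # Every page lookup returns the same mock page, which logs its renders.
    page = MagicMock(name="PdfPage")
    page.render.side_effect = lambda **kwargs: events.append("render")

    pdf_document = MagicMock(name="PdfDocument")
    pdf_document.__len__.return_value = page_count
    pdf_document.__getitem__.return_value = page
    pdf_document.init_forms.side_effect = lambda: events.append("init_forms")

    open_document = MagicMock(return_value=pdf_document)
    render_all_pages(open_document, b"%PDF-placeholder")

    # The form environment must be initialized exactly once.
    pdf_document.init_forms.assert_called_once_with()
    return events
```

In a real test suite the returned list would be compared against the expected order, with `init_forms` first and one `render` entry per page.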

Risk Assessment

Very low risk: init_forms() is a no-op for non-fillable PDFs. The may_draw_forms=True default was already set; this fix simply enables the feature that was intended to work.

Testing

  • All 666 unit tests pass (make test-cicd)
  • Ruff lint clean on all changed files
  • Verified fix works with fillable VA Form 21-22a from the issue report

@rstrahan rstrahan merged commit 7c1d21d into develop Mar 17, 2026
1 check failed