Initial Checks
Description
Hi, I need some help with the above error, which I'm hitting while parsing my documents. A few of my PDFs fail to parse, and I can't tell why. The file in question is a medical invoice PDF, from which I'm trying to extract the text contents along with their bounding box coordinates.
Example Code
import openparse
basic_doc_path = "/home/sanjayr/Workspace/30-claims/42969914.pdf"
parser = openparse.DocumentParser()
parsed_basic_doc = parser.parse(basic_doc_path)
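To narrow down which of the PDFs fail and capture the full traceback for each, a small helper along these lines might be useful. This is only a sketch: `find_failing_pdfs` and its parameters are hypothetical names, not part of open-parse, and it assumes only that the parse call raises an exception on the problematic files.

```python
import traceback
from typing import Callable, Dict, Iterable


def find_failing_pdfs(paths: Iterable, parse_fn: Callable) -> Dict[str, str]:
    """Call parse_fn on each path and collect tracebacks for the ones that raise.

    Hypothetical helper (not an open-parse API). Returns a dict mapping
    the failing path (as a string) to its formatted traceback.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            parse_fn(path)
        except Exception:
            # Keep the whole traceback so it can be pasted into the report.
            failures[str(path)] = traceback.format_exc()
    return failures
```

With the setup above, `parser.parse` would be passed as `parse_fn`, together with the list of PDF paths to check, for example the files under `/home/sanjayr/Workspace/30-claims`.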
Python, open-parse & OS Version
python_version: 3.8.20
operating_system: Linux
os_version: 5.15.0-1074-azure
open-parse version: 0.7.0
install path: /home/sanjayr/.conda/envs/be-env/lib/python3.8/site-packages/openparse
python version: 3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]
platform: Linux-5.15.0-1074-azure-x86_64-with-glibc2.17
related packages: PyMuPDF-1.24.11 pydantic-2.10.4 tokenizers-0.20.3 transformers-4.46.3 torch-2.4.1 torchvision-0.19.1