
Update pdfminer_utils.py #3974

Merged
cragwolfe merged 4 commits into Unstructured-IO:main from Nathan-GoSupply:patch-1
Apr 8, 2025

Conversation

@Nathan-GoSupply
Contributor

Fix for the `PSSyntaxError` import error:
"cannot import name 'PSSyntaxError' from 'pdfminer.pdfparser'"

The latest pdfminer-six no longer imports `PSSyntaxError` into `pdfminer.pdfparser`. It must now be imported directly from its source (`pdfminer.psexceptions`).

Contributor

@cragwolfe cragwolfe left a comment

Thanks for the patch. However, I think the pdfminer version should be checked to determine whether to use the older or the newer import path, for backwards compatibility.
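
For illustration, one way to keep both import paths working without naming a specific pdfminer version is to try each candidate module in turn. This is a minimal sketch and not code from this PR: the helper name `import_first_available` is hypothetical, and it swaps the version check suggested above for a try-each-path fallback.

```python
import importlib


def import_first_available(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Hypothetical helper for illustration only; it is not part of
    unstructured or pdfminer.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module is missing in this environment; try the next candidate.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"cannot import name {name!r} from any of {module_paths}")


# With pdfminer.six installed, the call would look like this (assumption:
# both module paths are the ones named in this thread):
#     PSSyntaxError = import_first_available(
#         "PSSyntaxError", ["pdfminer.psexceptions", "pdfminer.pdfparser"]
#     )
```

The candidate list is ordered, so the defining module can be listed first and the older re-exporting module kept only as a fallback.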

@Nathan-GoSupply
Contributor Author

This change also works on the older version. In the older version, `pdfminer.pdfparser` imports `PSSyntaxError` from `pdfminer.psexceptions`.

However, they have since removed the `PSSyntaxError` import from `pdfminer.pdfparser`.

Therefore, for the new pdfminer version we must import directly from `pdfminer.psexceptions`.
So instead of
pdfminer_utils.py -> pdfminer.pdfparser -> pdfminer.psexceptions

we can do
pdfminer_utils.py -> pdfminer.psexceptions

`PSSyntaxError` is defined in `pdfminer.psexceptions` in both the old and new versions of pdfminer, so we still get backward compatibility.
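
The difference between a re-exporting module and a defining module can be checked at runtime through a class's `__module__` attribute. The sketch below is illustrative only: `defining_module` is a hypothetical helper, and it is demonstrated on the standard library's `json` package (which re-exports `JSONDecodeError` from `json.decoder`) so that it runs without pdfminer installed.

```python
import importlib

# The change discussed in this PR amounts to replacing
#     from pdfminer.pdfparser import PSSyntaxError
# with
#     from pdfminer.psexceptions import PSSyntaxError


def defining_module(module_path, name):
    """Return the name of the module in which `module_path.name` is defined.

    Hypothetical helper for illustration; works for classes and functions,
    which carry a `__module__` attribute.
    """
    obj = getattr(importlib.import_module(module_path), name)
    return obj.__module__


# Standard-library analogue: `json` re-exports a class defined in `json.decoder`.
print(defining_module("json", "JSONDecodeError"))  # prints "json.decoder"
```

Based on the description in this thread, calling `defining_module("pdfminer.psexceptions", "PSSyntaxError")` with pdfminer.six installed should report `pdfminer.psexceptions` on both old and new versions, which is why importing from that module does not depend on the re-export.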

Here is the commit for the change on pdfminer.

@cragwolfe
Contributor

Please add a bullet under Fixes in CHANGELOG.md.

Thanks for the contribution, @Nathan-GoSupply! The reference to the pdfminer commit is also appreciated.

Nathan-GoSupply added a commit to Nathan-GoSupply/unstructured that referenced this pull request Apr 8, 2025
@cragwolfe cragwolfe merged commit 27f503c into Unstructured-IO:main Apr 8, 2025
43 checks passed
