feat: integrate Docling for high-fidelity PDF ingestion (#80) #146
DhruvGarg111 wants to merge 3 commits into
Implements DoclingPDFProcessor with support for tables, LaTeX formulas, code snippets, and charts. Updates DocumentProcessorFactory with fallback support.
Adds Docling features to README and applies black formatting to maintain CI compliance.
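The fallback behaviour described in these commits can be sketched roughly as follows. This is only an illustration, not the PR's actual code: the two classes are empty stand-ins, and `create_pdf_processor`, `docling_installed`, and the `prefer_docling` and `available` parameters are hypothetical names chosen for the example.

```python
import importlib.util


class BasicPDFProcessor:
    """Hypothetical stand-in for the project's existing PDF processor."""

    name = "basic"


class DoclingPDFProcessor:
    """Hypothetical stand-in; the real class would wrap Docling's converter."""

    name = "docling"


def docling_installed() -> bool:
    # find_spec returns None when the top-level package cannot be found.
    return importlib.util.find_spec("docling") is not None


def create_pdf_processor(prefer_docling: bool = True, available=docling_installed):
    """Return the Docling processor when it can be used, else the basic one."""
    if prefer_docling and available():
        return DoclingPDFProcessor()
    # Fallback path: Docling is missing or the caller opted out.
    return BasicPDFProcessor()
```

Passing the availability check as a parameter keeps the fallback path easy to exercise without uninstalling anything.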
Hello, thanks for the contribution! It looks great. I'll test it soon and integrate it if it works fine.
Sure, let me know if any changes are needed.
Maybe you can pin the docling version. I had some errors with the minimum version, 2.15.1, but it works with the latest PyPI version. The same goes for the huggingface version: it only works if I install a newer one, but then there are some conflicts with other libraries.
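One way to surface this kind of mismatch early is a version check at import time. The sketch below is an assumption about how that could look, using only the standard library; `parse_version`, `installed_version`, and `meets_minimum` are hypothetical helper names, and the minimum passed in is whatever version the project ends up requiring (the thread does not say which release first works).

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    # Keep the first three numeric components, e.g. "2.15.1" -> (2, 15, 1).
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def installed_version(package: str):
    # Returns the installed version string, or None if the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: str) -> bool:
    found = installed_version(package)
    return found is not None and parse_version(found) >= parse_version(minimum)
```

A caller could then warn or fall back when `meets_minimum("docling", required)` is false, where `required` is the pinned version.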
For the code, it's good for me 👍
Regarding the error: since I was on the latest version, I did not notice it. Now I can think of 2 solutions.
Let me know if you want me to make either of these changes. If you have a better solution, I welcome it.
I think it's better to fix and not only bump the docling version to avoid errors.
Implemented the compatibility fix requested in review. This update makes Docling PDF processing robust across Docling versions by safely enabling advanced pipeline options.
I have tested it in a venv and it works. Let me know if any changes are needed.
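For readers following along, "safely enabling" options across versions usually means setting an option only when the installed release exposes it. The sketch below shows that pattern under stated assumptions: `enable_if_supported` is a hypothetical helper, and `OldOptions` and `NewOptions` are stand-ins for an options object from an older and a newer release, not Docling's real classes. The attribute names are illustrative only.

```python
def enable_if_supported(options, wanted: dict) -> list:
    """Set each wanted option only if the options object already has it.

    Returns the names that were actually applied, so the caller can log
    which features were skipped on an older release.
    """
    applied = []
    for name, value in wanted.items():
        if hasattr(options, name):
            setattr(options, name, value)
            applied.append(name)
    return applied


class OldOptions:
    """Stand-in for an older release that only knows about tables."""

    do_table_structure = False


class NewOptions(OldOptions):
    """Stand-in for a newer release with extra enrichment switches."""

    do_formula_enrichment = False
    do_code_enrichment = False
```

With this approach an older release silently keeps its defaults instead of raising on an unknown attribute.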
Hey @Bessouat40, is anything else required for this?
This PR implements high-fidelity PDF ingestion using IBM's Docling, as discussed in issue #80.
Key Changes: