Small quality checker for Docling JSON output — feedback welcome #3437
MMoney1988
started this conversation in
Show and tell
Hi,
I have been using Docling for PDF parsing and built a small downstream checker for the JSON output.
The idea is simple: Docling does the parsing, and this small Python project checks the parsed JSON before I use it for later steps such as Markdown export, manual review, or retrieval preparation.
Repo:
https://github.com/MMoney1988/pdf-quality-report
Right now it checks things like:
id, type, page_number, and bbox
It produces:
normalized_blocks.json with provenance preserved
What it does not do:
It is meant as a small review layer after Docling, not as a parser replacement.
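To give a rough idea of what a check like this can look like, here is a minimal sketch. It is not the code from the repo. It assumes each block is a flat dict carrying the four fields named above (id, type, page_number, bbox) and that bbox is a list of four numbers; the function names check_block and check_blocks are made up for illustration, and the block shape may not match Docling's native JSON.

```python
from __future__ import annotations

from typing import Any

# Field names taken from the post; the flat block layout is an assumption.
REQUIRED_FIELDS = ("id", "type", "page_number", "bbox")


def check_block(block: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in one block (hypothetical helper)."""
    problems = []
    for field in REQUIRED_FIELDS:
        # A field counts as missing if it is absent or explicitly null.
        if block.get(field) is None:
            problems.append(f"missing {field}")

    bbox = block.get("bbox")
    if bbox is not None:
        # Assumed bbox shape: four numeric coordinates.
        is_four_numbers = (
            isinstance(bbox, (list, tuple))
            and len(bbox) == 4
            and all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in bbox
            )
        )
        if not is_four_numbers:
            problems.append("bbox is not a list of four numbers")
    return problems


def check_blocks(blocks: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map block id (or list position if the id is missing) to its problems."""
    report = {}
    for index, block in enumerate(blocks):
        problems = check_block(block)
        if problems:
            block_id = block.get("id")
            key = str(block_id) if block_id is not None else f"index {index}"
            report[key] = problems
    return report
```

Blocks that pass every check are left out of the report, so an empty dict means nothing was flagged.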
I would be curious how other Docling users handle this step:
Any practical feedback is welcome.