About Licenses when using markitdown-ocr #1654
When I installed markitdown-ocr, "pymupdf" was also installed at the same time.
Replies: 2 comments
Hey @k-kondohtks,
Your concern is valid and technically correct. Here's the full picture.
The core issue: AGPL is a strong copyleft license
You cannot deploy PyMuPDF (AGPL) as part of a server-based application or service without disclosing your own application's full source code under AGPL to any users interacting with it. If you can't meet those requirements, a commercial license is required. (Artifex)
So yes: even though markitdown-ocr itself is MIT, it pulls in pymupdf as a dependency, which means your entire commercial service becomes subject to AGPL obligations when you use the OCR feature.
MIT + AGPL dependency = the MIT license doesn't protect you
AGPL imposes strong copyleft obligations that may require open-sourcing the entire application, which runs counter to the permissiveness of the MIT license. (GitHub) The MIT license of markitdown-ocr only covers markitdown-ocr's own code. It cannot override pymupdf's AGPL terms.
Your two options for commercial use
Option 1: Purchase a commercial PyMuPDF license from Artifex.
Option 2: Use markitdown without the OCR extra:
    pip install markitdown  # no OCR extra
instead of:
    pip install markitdown[ocr]  # this pulls pymupdf
Then use an alternative OCR library with a permissive license (e.g. pytesseract under Apache 2.0) if you need OCR functionality.
👍 If this helped you, please mark it as the answer. It helps others in the community who run into the same issue find the solution faster!
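If you want to confirm which packages in your environment declare an AGPL-style license (for example after installing the OCR package), a rough audit with the standard library's `importlib.metadata` is one way to do it. This is only a minimal sketch: the helper names (`looks_copyleft`, `declared_license`, `find_copyleft_packages`) and the marker list are assumptions made up for illustration, and it only sees what each package declares in its own metadata, which can be missing or imprecise.

```python
from importlib import metadata

# Substrings treated as AGPL markers. This list is an assumption for the
# sketch, not a legal classification of licenses.
COPYLEFT_MARKERS = ("AGPL", "AFFERO")


def looks_copyleft(license_text: str) -> bool:
    """Return True if a license string mentions an AGPL-style license."""
    upper = license_text.upper()
    return any(marker in upper for marker in COPYLEFT_MARKERS)


def declared_license(dist: metadata.Distribution) -> str:
    """Join the license fields and license classifiers a package declares."""
    meta = dist.metadata
    parts = [
        str(meta.get("License-Expression") or ""),
        str(meta.get("License") or ""),
    ]
    classifiers = meta.get_all("Classifier") or []
    parts += [str(c) for c in classifiers if str(c).startswith("License ::")]
    return " | ".join(p for p in parts if p)


def find_copyleft_packages() -> dict:
    """Map package name -> declared license text for flagged packages."""
    flagged = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            text = declared_license(dist)
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and looks_copyleft(text):
            flagged[str(name)] = text
    return flagged


if __name__ == "__main__":
    for name, text in sorted(find_copyleft_packages().items()):
        # Some packages embed the full license text, so truncate for display.
        print(f"{name}: {text[:80]}")
```

Run it inside the virtual environment you plan to deploy. Anything it prints deserves a closer look at that package's actual license terms, and an empty result does not prove the environment is free of copyleft code.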
Hi @pauldev-hub, once again, I would like to express my gratitude.