Requested feature
Even though docling supports several mutually independent OCR engines, rapidocr is listed as a mandatory dependency, see docling/pyproject.toml (line 58 in f2affd7):

    'rapidocr (>=3.3,<4.0.0)',
On the one hand, this could simply be wasteful, e.g. if someone knows that they want to work with tesseract and wants to trim the size of their deployment container.
On the other hand, this can actually be a complete roadblock for a project. As of now, if I install docling 2.84.0 for osx-arm64 from conda-forge, the following transitive dependencies are installed due to rapidocr's dependence on libopencv:
- ffmpeg under GPL 2.0
- x264 under GPL 2.0
- x265 under GPL 2.0
- dbus under AFL-2.1 or GPL-2.0-or-later
- jasper under JasPer-2.0

Additionally, when on Linux, these are added:
- libglvnd, libegl, libgl, libglx and libopengl under LicenseRef-libglvnd
- libglu under SGI-B-2.0
- libxkbcommon under MIT/X11 Derivative
The licenses of these packages would not be considered sufficiently permissive by many standards. This means that docling can't be used at all on many projects, despite (iiuc) not actually requiring these transitive dependencies.
Therefore I'm wondering: why not just keep the already existing rapidocr flavour of the package and remove rapidocr from the core dependencies? See docling/pyproject.toml (lines 104 to 107 in f2affd7):

    rapidocr = [
      'rapidocr (>=3.3,<4.0.0)',
      'onnxruntime (>=1.7.0,<2.0.0) ; python_version < "3.14"',
    ]
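For anyone who wants to confirm this on their own installation, here is a minimal sketch that separates a distribution's unconditional requirements from those gated behind an extra, using only the standard library's `importlib.metadata`. The helper names `split_requirements` and `core_requirements_of` are made up for illustration, and the sketch assumes that the installed package ships standard `Requires-Dist` metadata. It only looks at Python-level requirements, so it will not list conda-level transitive packages such as ffmpeg.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(req_strings):
    """Split Requires-Dist strings into (core, extra_only) lists.

    Hypothetical helper: a requirement counts as extra-only when its
    environment marker contains an `extra == "..."` clause.
    """
    core, extra_only = [], []
    for req in req_strings:
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = req.partition(";")
        if re.search(r"\bextra\s*==", marker):
            extra_only.append(req)
        else:
            core.append(req)
    return core, extra_only


def core_requirements_of(dist_name):
    """Return the unconditional requirements of an installed distribution,
    or None if the distribution is not installed."""
    try:
        declared = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    core, _ = split_requirements(declared)
    return core


if __name__ == "__main__":
    core = core_requirements_of("docling")
    if core is None:
        print("docling is not installed in this environment")
    else:
        # Prefix match, so this also reports names that merely start with "rapidocr".
        hits = [r for r in core if r.lower().startswith("rapidocr")]
        print("rapidocr in core requirements:", hits or "no")
```

If rapidocr were moved out of the core dependencies as proposed, the final check would be expected to find it only among the extra-gated requirements.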
Alternatives
- Not using docling :'(
- Hackily patching the docling package and overwriting its dependencies on the user's side