
Turn rapidocr into an optional dependency #3227

@kklein


Requested feature

Even though docling allows for working with various, mutually independent OCR engines, rapidocr is listed as a mandatory dependency, see:

'rapidocr (>=3.3,<4.0.0)',
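To check on an installed environment whether a dependency is mandatory or only pulled in by an extra, something like the following could be used. This is a minimal sketch using only the standard library; the helper names `split_requirements` and `is_mandatory` are made up for illustration, and the marker parsing is deliberately simplistic (it only looks for `extra == "..."` in the marker part of each requirement string).

```python
import re
from importlib import metadata

# Matches an `extra == "name"` clause inside an environment marker.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")
# Matches the distribution name at the start of a requirement string.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def split_requirements(requires):
    """Split requirement strings into core names and names gated by an extra."""
    core, extras = set(), {}
    for req in requires or []:
        spec, _, marker = req.partition(";")
        name_match = _NAME_RE.match(spec)
        if name_match is None:
            continue  # skip anything this simple parser does not understand
        name = name_match.group(1).lower()
        extra_match = _EXTRA_RE.search(marker)
        if extra_match:
            extras.setdefault(extra_match.group(1), set()).add(name)
        else:
            core.add(name)
    return core, extras


def is_mandatory(dist_name, dep_name):
    """Return True/False for an installed distribution, None if it is not installed."""
    try:
        requires = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    core, _ = split_requirements(requires)
    return dep_name.lower() in core
```

With docling installed, `is_mandatory("docling", "rapidocr")` would report whether rapidocr appears among the unconditional requirements of that particular install.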

On the one hand, this could simply be wasteful, e.g. if someone knows that they want to work with tesseract and wants to trim the size of their deployment container.

On the other hand, this can actually be a complete roadblock for a project. As of now, if I install docling 2.84.0 for osx-arm64 from conda-forge, the following transitive dependencies are installed due to rapidocr's dependence on libopencv:

  • ffmpeg under GPL 2.0
  • x264 under GPL 2.0
  • x265 under GPL 2.0
  • dbus under AFL-2.1 or GPL-2.0-or-later
  • jasper under JasPer-2.0

Additionally, when on linux, these are added:

  • libglvnd, libegl, libgl, libglx and libopengl under LicenseRef-libglvnd
  • libglu under SGI-B-2.0
  • libxkbcommon under MIT/X11 Derivative

The licenses of these packages would not be considered sufficiently permissive by many standards. This means that docling can't be used at all on many projects despite (iiuc) not actually requiring these transitive dependencies.

Therefore I'm wondering: why not just keep the already existing rapidocr flavour of the package and remove rapidocr from the core dependencies? See

docling/pyproject.toml

Lines 104 to 107 in f2affd7

rapidocr = [
'rapidocr (>=3.3,<4.0.0)',
'onnxruntime (>=1.7.0,<2.0.0) ; python_version < "3.14"',
]
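If rapidocr only lived in that extra, code that needs it would typically have to import it lazily and fail with an actionable message when it is missing. A minimal sketch of that pattern follows; `load_optional` is a hypothetical helper, not something taken from docling's code base, and the `pip install "docling[rapidocr]"` hint assumes the extra keeps its current name.

```python
import importlib
import importlib.util


def load_optional(module_name, extra, package="docling"):
    """Import a top-level module on demand, or raise an error naming the extra to install."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "{package}[{extra}]"'
        )
    return importlib.import_module(module_name)
```

An OCR backend could then call `load_optional("rapidocr", extra="rapidocr")` only when that engine is actually selected, so users of other engines never need the package or its transitive dependencies.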

Alternatives

  • Not using docling :'(
  • Hackily patching the docling package and overwriting its dependencies on the user's side

Labels: enhancement (New feature or request)
