Requested feature
Currently, when using RapidOCR, the initialization of RapidOcrModel considers artifacts_path. That is, it searches for model artifacts under artifacts_path as defined here:
```python
self._engine = RapidOcrModel(
    enabled=self.enabled,
    artifacts_path=artifacts_path,
    options=RapidOcrOptions(
        backend="onnxruntime",
        bitmap_area_threshold=self.options.bitmap_area_threshold,
        force_full_page_ocr=self.options.force_full_page_ocr,
    ),
    accelerator_options=accelerator_options,
)
```
```python
if artifacts_path is not None:
    det_model_path = (
        det_model_path
        or artifacts_path
        / self._model_repo_folder
        / self._default_models[backend_enum.value]["det_model_path"]["path"]
    )
    cls_model_path = (
        cls_model_path
        or artifacts_path
        / self._model_repo_folder
        / self._default_models[backend_enum.value]["cls_model_path"]["path"]
    )
    rec_model_path = (
        rec_model_path
        or artifacts_path
        / self._model_repo_folder
        / self._default_models[backend_enum.value]["rec_model_path"]["path"]
    )
    rec_keys_path = (
        rec_keys_path
        or artifacts_path
        / self._model_repo_folder
        / self._default_models[backend_enum.value]["rec_keys_path"]["path"]
    )
```
The rapidocr package already ships with the base ONNX models under ...\Lib\site-packages\rapidocr\models, so there is usually no need to download these models separately from Modelscope. However, when artifacts_path is set in the pipeline, e.g., to load the layout detection or table structure model, we are currently unable to load the default ONNX models shipped with rapidocr.
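To illustrate where the shipped models live, here is a minimal sketch that locates a `models` folder next to the installed package without importing it. The helper name `bundled_models_dir` is hypothetical and not part of docling or rapidocr. The sketch also assumes the models sit in a `models` subfolder of the package directory, as described above.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def bundled_models_dir(package: str = "rapidocr") -> Optional[Path]:
    """Return <package dir>/models, or None if it cannot be found.

    Hypothetical helper: assumes the package keeps its models in a
    "models" subfolder next to its __init__.py.
    """
    try:
        # find_spec locates a top-level package without importing it.
        spec = find_spec(package)
    except (ImportError, ValueError):
        return None
    # Not installed, or a namespace package without a concrete origin.
    if spec is None or spec.origin is None:
        return None
    models_dir = Path(spec.origin).parent / "models"
    return models_dir if models_dir.is_dir() else None
```

Calling `bundled_models_dir()` returns the folder if rapidocr is installed and contains a `models` directory, and `None` otherwise.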
@geoHeil Would it be possible to either a) deliberately skip a globally defined artifacts_path in order to preload the shipped models and skip downloading from Modelscope, or b) make loading from ...\Lib\site-packages\rapidocr\models the default?
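As a rough sketch of what such a fallback could look like, the function below resolves a single model file in this order: an explicitly configured path, then the file under artifacts_path if it actually exists, then the copy shipped with the package. This is not docling's current behavior. The function name `resolve_model_path` and its parameters are hypothetical, and the flat layout of the bundled folder (files stored directly by file name) is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(
    explicit: Optional[Path],
    artifacts_path: Optional[Path],
    repo_folder: str,
    relative_path: str,
    bundled_dir: Optional[Path],
) -> Optional[Path]:
    """Hypothetical resolution order for one model file."""
    # 1. An explicitly configured path always wins.
    if explicit is not None:
        return explicit
    # 2. Use artifacts_path only if the file is really there.
    if artifacts_path is not None:
        candidate = artifacts_path / repo_folder / relative_path
        if candidate.is_file():
            return candidate
    # 3. Fall back to the models shipped with the package.
    #    Assumption: the bundled folder stores files flat, by file name.
    if bundled_dir is not None:
        candidate = bundled_dir / Path(relative_path).name
        if candidate.is_file():
            return candidate
    # 4. Nothing found locally; the caller decides what to do (e.g. download).
    return None
```

With this order, a globally defined artifacts_path that holds only the layout or table structure models would no longer block the shipped OCR models from being used.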
The two code excerpts above are taken from:
docling/docling/models/auto_ocr_model.py, lines 65 to 74 in 6a04e27
docling/docling/models/rapid_ocr_model.py, lines 125 to 149 in 6a04e27