Commit db9d92a

[Feat] Add CLI support for OCR (#2058)
1 parent 8fc9f33 commit db9d92a

6 files changed

Lines changed: 304 additions & 0 deletions

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ Supported datasets
    using_doctr/using_model_export
    using_doctr/custom_models_training
    using_doctr/running_on_aws
+   using_doctr/using_cli


 .. toctree::
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+Using the CLI for Optical Character Recognition
+===============================================
+
+docTR ships a Command Line Interface (CLI) that runs the full Optical Character Recognition (OCR) pipeline. It processes both images and PDF files without requiring any Python code and exports the OCR results directly to JSON.
+
+Basic Usage
+-----------
+
+To run the OCR engine on a file, use the following command structure:
+
+.. code-block:: bash
+
+    doctr-cli --input_path path/to/your/document.pdf --output results.json
+
+Arguments
+---------
+
+The CLI supports the following arguments to configure the detection and recognition process:
+
+**Mandatory Arguments:**
+
+* ``--input_path``: Path to the input image or PDF file you wish to process.
+
+**Architecture Selection:**
+
+* ``--det_arch``: The detection architecture / model to use (e.g., ``db_resnet50``). *Default: db_resnet50*
+* ``--reco_arch``: The recognition architecture / model to use (e.g., ``crnn_vgg16_bn``). *Default: crnn_vgg16_bn*
+
+**Processing Options:**
+
+* ``--assume_straight_pages``, ``--no-assume_straight_pages``: Whether to assume that pages contain only straight text, without rotated textual elements. *Default: True*
+* ``--straighten_pages``: If set, the tool attempts to straighten skewed pages before analysis. *Default: False*
+* ``--preserve_aspect_ratio``, ``--no-preserve_aspect_ratio``: Whether to preserve the aspect ratio when resizing pages. *Default: True*
+* ``--symmetric_pad``: If set, applies symmetric padding to the input images. *Default: False*
+* ``--det_bs``: Batch size used for the detection model. *Default: 2*
+* ``--reco_bs``: Batch size used for the recognition model. *Default: 128*
+* ``--detect_orientation``: If set, enables automatic detection of page orientation. *Default: False*
+* ``--detect_language``: If set, enables language detection for the extracted text. *Default: False*
+
+**Output Options:**
+
+* ``--output``: The destination path where the JSON results will be saved. *Default: results.json*
+
+Examples
+--------
+
+**Running OCR on an image:**
+
+.. code-block:: bash
+
+    doctr-cli --input_path image.jpg --output ocr_res.json
+
+**Running OCR on a PDF:**
+
+.. code-block:: bash
+
+    doctr-cli --input_path image.pdf --output ocr_res.json
+
+**Using a specific detection architecture and straightening pages:**
+
+.. code-block:: bash
+
+    doctr-cli --input_path doc.pdf --det_arch db_mobilenet_v3_large --straighten_pages
+
+Output Format
+-------------
+
+The results are exported in a structured JSON format containing:
+
+* **Pages**: Dimensions and orientation.
+* **Blocks**: Grouping of lines.
+* **Lines**: Grouping of words.
+* **Words**: The actual text content with confidence scores and bounding box coordinates.
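As a rough illustration of how the nested Pages, Blocks, Lines, Words structure described above might be consumed downstream, the sketch below walks a results dictionary and collects each word with its confidence. The key names (`pages`, `blocks`, `lines`, `words`, `value`, `confidence`) are assumptions about the export schema that this commit does not spell out, and both `iter_words` and the `sample` dictionary are hypothetical, written only for illustration.

```python
def iter_words(export):
    """Yield (page_index, text, confidence) for every word in an export dict.

    Assumes the nesting pages -> blocks -> lines -> words; missing keys are
    treated as empty lists so partially filled pages do not raise.
    """
    for page_index, page in enumerate(export.get("pages", [])):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield page_index, word["value"], word["confidence"]


# Hypothetical, hand-written sample that mimics the assumed schema.
sample = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.99},
                                {"value": "world", "confidence": 0.87},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

words = list(iter_words(sample))
```

With a real run, the dictionary would presumably come from `json.load` on the file passed to `--output` rather than from a literal.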

doctr/cli/__init__.py

Whitespace-only changes.

doctr/cli/main.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+# Copyright (C) 2021-2026, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+
+import argparse
+import json
+import logging
+import sys
+
+from doctr.io import DocumentFile
+from doctr.models import ocr_predictor
+
+logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO)
+
+
+def main(argv=None):
+    """Main function for the docTR CLI tool"""
+    # parse command-line arguments and set up the model
+    args = _parse_args(argv)
+    model = ocr_predictor(
+        det_arch=args.det_arch,
+        reco_arch=args.reco_arch,
+        pretrained=True,
+        assume_straight_pages=args.assume_straight_pages,
+        preserve_aspect_ratio=args.preserve_aspect_ratio,
+        symmetric_pad=args.symmetric_pad,
+        detect_orientation=args.detect_orientation,
+        straighten_pages=args.straighten_pages,
+        detect_language=args.detect_language,
+        det_bs=args.det_bs,
+        reco_bs=args.reco_bs,
+    )
+
+    # load the document
+    try:
+        if args.input_path.lower().endswith(".pdf"):
+            doc = DocumentFile.from_pdf(args.input_path)
+        else:
+            doc = DocumentFile.from_images(args.input_path)
+        logging.info(f"Document loaded successfully from {args.input_path}")
+    except FileNotFoundError:
+        logging.error(f"File not found: {args.input_path}")
+        sys.exit(1)
+    except ValueError:
+        logging.error(f"File could not be read as a valid image or PDF: {args.input_path}")
+        sys.exit(1)
+    except Exception as e:
+        logging.error(f"Error occurred while loading the document: {e}")
+        sys.exit(1)
+
+    # perform OCR
+    logging.info("Performing OCR...")
+    result = model(doc)
+
+    # save results to JSON file
+    try:
+        with open(args.output, "w", encoding="utf-8") as f:
+            json.dump(result.export(), f, indent=4, ensure_ascii=False)
+        logging.info(f"Results saved to {args.output}")
+    except FileNotFoundError:
+        logging.error(f"Could not write output file at given path: {args.output}")
+        sys.exit(1)
+    except Exception as e:
+        logging.error(f"Results could not be saved: {e}")
+        sys.exit(1)
+
+
+def _parse_args(argv=None):
+    parser = argparse.ArgumentParser(
+        description="docTR CLI tool for OCR prediction on images and PDFs",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+
+    # required input path
+    parser.add_argument("--input_path", type=str, required=True, help="path to input image or PDF file")
+
+    # architecture selection
+    parser.add_argument(
+        "--det_arch",
+        type=str,
+        default="db_resnet50",
+        help="name of the detection architecture or the model itself to use",
+    )
+    parser.add_argument(
+        "--reco_arch",
+        type=str,
+        default="crnn_vgg16_bn",
+        help="name of the recognition architecture or the model itself to use",
+    )
+
+    # processing options
+    parser.add_argument(
+        "--assume_straight_pages",
+        action=argparse.BooleanOptionalAction,
+        default=True,
+        help="assume only straight pages without rotated textual elements",
+    )
+    parser.add_argument(
+        "--straighten_pages", action="store_true", help="attempt to straighten skewed pages before analysis"
+    )
+    parser.add_argument(
+        "--preserve_aspect_ratio",
+        action=argparse.BooleanOptionalAction,
+        default=True,
+        help="preserve aspect ratio when resizing pages",
+    )
+    parser.add_argument("--symmetric_pad", action="store_true", help="apply symmetric padding")
+    parser.add_argument("--det_bs", type=int, default=2, help="batch size for detection")
+    parser.add_argument("--reco_bs", type=int, default=128, help="batch size for recognition")
+    parser.add_argument("--detect_orientation", action="store_true", help="automatically detect page orientation")
+    parser.add_argument("--detect_language", action="store_true", help="detect language of the text")
+
+    # output options
+    parser.add_argument("--output", type=str, default="results.json", help="path to output results in JSON format")
+
+    return parser.parse_args(argv)
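The parser above mixes two kinds of boolean flags, and the difference determines their defaults. The minimal sketch below reproduces only two of the arguments on a throwaway parser to show the behaviour: `argparse.BooleanOptionalAction` (available from Python 3.9) generates a paired `--no-...` flag and honours an explicit `default`, while `action="store_true"` defaults to `False` and can only be switched on. The variable names `defaults` and `flipped` are illustrative and not part of the commit.

```python
import argparse

# Throwaway parser mirroring two arguments from _parse_args.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--assume_straight_pages",
    action=argparse.BooleanOptionalAction,  # also registers --no-assume_straight_pages
    default=True,
)
parser.add_argument("--symmetric_pad", action="store_true")  # implicit default: False

# No flags given: the declared and implicit defaults apply.
defaults = parser.parse_args([])

# Negate the optional boolean and switch the store_true flag on.
flipped = parser.parse_args(["--no-assume_straight_pages", "--symmetric_pad"])
```

This is why `--straighten_pages`, `--symmetric_pad`, `--detect_orientation` and `--detect_language` are off unless passed, whereas `--assume_straight_pages` and `--preserve_aspect_ratio` are on unless negated.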

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -54,6 +54,9 @@ dependencies = [
     "tqdm>=4.30.0",
 ]

+[project.scripts]
+doctr-cli = "doctr.cli.main:main"
+
 [project.optional-dependencies]
 html = [
     "weasyprint>=55.0",

tests/common/test_cli.py

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+from pathlib import Path
+
+import pytest
+
+import doctr.cli.main as cli
+
+
+def test_parse_args_defaults():
+    args = cli._parse_args(["--input_path", "sample.pdf"])
+
+    assert args.input_path == "sample.pdf"
+    assert args.det_arch == "db_resnet50"
+    assert args.reco_arch == "crnn_vgg16_bn"
+    assert args.assume_straight_pages is True
+    assert args.preserve_aspect_ratio is True
+    assert args.symmetric_pad is False
+    assert args.det_bs == 2
+    assert args.reco_bs == 128
+    assert args.detect_orientation is False
+    assert args.detect_language is False
+
+
+def test_parse_args_boolean_optional_flags():
+    args = cli._parse_args([
+        "--input_path",
+        "sample.pdf",
+        "--no-assume_straight_pages",
+        "--no-preserve_aspect_ratio",
+    ])
+
+    assert args.assume_straight_pages is False
+    assert args.preserve_aspect_ratio is False
+
+
+def test_parse_args_requires_input_path():
+    with pytest.raises(SystemExit):
+        cli._parse_args([])
+
+
+def test_parse_args_custom_values():
+    args = cli._parse_args([
+        "--input_path",
+        "sample.pdf",
+        "--det_arch",
+        "custom_det",
+        "--reco_arch",
+        "custom_reco",
+        "--symmetric_pad",
+        "--detect_orientation",
+        "--detect_language",
+        "--output",
+        "output.json",
+    ])
+
+    assert args.input_path == "sample.pdf"
+    assert args.det_arch == "custom_det"
+    assert args.reco_arch == "custom_reco"
+    assert args.symmetric_pad is True
+    assert args.detect_orientation is True
+    assert args.detect_language is True
+    assert args.output == "output.json"
+
+
+def test_main_with_image(mock_image_path):
+    output_path = "results.json"
+    cli.main(["--input_path", mock_image_path, "--output", output_path])
+
+    assert Path(output_path).exists()
+
+
+def test_main_with_pdf(mock_pdf):
+    output_path = "results.json"
+    cli.main(["--input_path", mock_pdf, "--output", output_path])
+
+    assert Path(output_path).exists()
+
+
+def test_main_no_input_path():
+    with pytest.raises(SystemExit):
+        cli.main([])
+
+
+def test_main_invalid_input_path():
+    with pytest.raises(SystemExit):
+        cli.main(["--input_path", "non_existent_file.pdf", "--output", "results.json"])
+
+
+def test_main_unsupported_input_file_format(tmp_path):
+    unsupported_file = tmp_path / "unsupported.txt"
+    unsupported_file.write_text("This is not a valid image or PDF file.")
+    with pytest.raises(SystemExit):
+        cli.main(["--input_path", str(unsupported_file), "--output", "results.json"])
+
+
+def test_main_corrupted_input_file(tmp_path):
+    corrupted_pdf = tmp_path / "corrupted.pdf"
+    corrupted_pdf.write_text("not a real pdf")
+
+    with pytest.raises(SystemExit):
+        cli.main(["--input_path", str(corrupted_pdf), "--output", "results.json"])
+
+
+def test_main_output_path_not_a_file(mock_image_path):
+    with pytest.raises(SystemExit):
+        cli.main(["--input_path", mock_image_path, "--output", "."])
+
+
+def test_main_output_path_invalid_directory(mock_image_path):
+    with pytest.raises(SystemExit):
+        cli.main(["--input_path", mock_image_path, "--output", "non_existent_dir/results.json"])
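The last two tests exercise two distinct write failures: an output path that is a directory and an output path whose parent directory does not exist. In CPython these typically surface as `IsADirectoryError` (or `PermissionError` on Windows) and `FileNotFoundError` respectively, all of which are subclasses of `OSError`. The sketch below isolates that behaviour without running OCR; `save_json` is a hypothetical helper written for illustration, not a function from this commit, and it returns a boolean rather than calling `sys.exit` as `main` does.

```python
import json
import tempfile
from pathlib import Path


def save_json(data, output_path):
    """Write data as JSON; return True on success, False on any OS-level failure."""
    try:
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=4, ensure_ascii=False)
    except OSError:
        # Covers a directory given as the path and a missing parent directory.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Valid file path inside an existing directory.
    ok = save_json({"pages": []}, Path(tmp) / "results.json")
    # The path is the directory itself, comparable to passing "--output ."
    is_dir = save_json({"pages": []}, tmp)
    # The parent directory does not exist.
    missing = save_json({"pages": []}, Path(tmp) / "non_existent_dir" / "results.json")
```

Using a temporary directory here also keeps the working directory clean, which may be worth considering for the tests that currently write `results.json` to the current directory.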
