BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text layers to scanned PDFs, extracts text from images, and provides a full-featured PDF editor — all from a modern, native Linux interface.

-## Three Interfaces, One Toolkit
+## Why BigOcrPDF?

-BigOcrPDF offers three independent interfaces that cover every stage of document work:
+- **AI-Powered OCR** — Uses **RapidOCR PP-OCRv5** with OpenVINO hardware acceleration for fast, accurate text recognition across **130+ languages**
+- **Edit, Merge & Organize PDFs** — Reorder pages, rotate, delete, and combine multiple PDFs and images into a single document
+- **Smart Preprocessing** — Automatic perspective correction, deskew, dewarping, and illumination normalization — even photos of documents come out clean
+- **Multiple Export Formats** — Searchable PDF, PDF/A-2b archival, plain text, and ODF/ODT with layout-aware formatting
+- **Screen Capture OCR** — Select any region on screen and instantly extract text
+- **Batch Processing** — Process dozens of files at once with checkpoint/resume support
+- **File Manager Integration** — Right-click any PDF or image to OCR it directly

-### 1. PDF OCR (`bigocrpdf`)
-
-The main interface. Drop your scanned PDFs, choose your settings, and get searchable documents back. Ideal for:
-
-- Turning scanned paperwork, contracts, and books into searchable PDFs
-- Archiving documents as PDF/A-2b for long-term preservation
-- Batch-processing dozens of files with checkpoint/resume
-- Re-OCR'ing documents that already have a poor text layer
-- Exporting extracted text as TXT or ODF/ODT with layout detection
-
-### 2. PDF Editor (`bigocrpdf --edit` or `bigocrpdf -e`)
-
-A standalone page editor that runs independently of the OCR window. Use it to organize your PDFs before or after OCR:
-
-- Reorder, rotate, flip, and delete pages with drag-and-drop
-- Merge multiple PDFs and images into a single document
 | **Black & White (JBIG2)** | Pure black-and-white output using JBIG2 — the most compact format for text-only documents |
 | **Plain Text (.txt)** | Extracted text from all pages |
-| **ODF/ODT** | Formatted text with optional embedded images *(experimental — formatting quality may vary)* |
+| **ODF/ODT** ⚠️ | 4 modes: formatted + images, images + simple text, formatted text only, or plain text *(experimental — formatting quality may vary)* |

 ODF export includes **layout analysis**: automatic paragraph/heading detection, table detection, image embedding, and proper page breaks. Note: ODF/ODT export is experimental and formatting results may not always be accurate.

+### Screen Capture & Image OCR
+
+Extract text from anything on your screen.
+
+- **Region capture** — select an area and get the text instantly