Commit fddec8f

bundolee and claude committed
fix: add --force-ocr to non-English OCR examples
--ocr-lang alone doesn't enable forced OCR; both flags are needed for scanned PDFs with non-English content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent e52a0b2 commit fddec8f

2 files changed

Lines changed: 2 additions & 2 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -285,7 +285,7 @@ opendataloader-pdf --hybrid docling-fast input-scanned.pdf
 For non-English documents, specify the OCR language:

 ```bash
-opendataloader-pdf-hybrid --port 5002 --ocr-lang "ko,en"
+opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"
 ```

 > **Note**: Standard digital PDFs do not need `--force-ocr`. Use it only for scanned or image-based PDFs.

content/docs/hybrid-mode.mdx

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ opendataloader-pdf --hybrid docling-fast input-scanned.pdf
 For non-English documents, specify the OCR language:

 ```bash
-opendataloader-pdf-hybrid --port 5002 --ocr-lang "ko,en"
+opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"
 ```

 Supported language codes include: `en`, `ko`, `ja`, `ch_sim` (Simplified Chinese), `ch_tra` (Traditional Chinese), `de`, `fr`, and more. Multiple languages can be combined with commas.
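
The comma-joining rule for `--ocr-lang` can be sketched with a small shell helper. `build_hybrid_cmd` is a hypothetical name, not part of opendataloader-pdf. It only prints a command line using the flags shown in this diff (`--port`, `--force-ocr`, `--ocr-lang`) and does not start the server.

```shell
# Hypothetical helper (an assumption for illustration, not shipped with
# opendataloader-pdf): prints the hybrid-server command for scanned,
# non-English PDFs, pairing --force-ocr with --ocr-lang as this commit does.
build_hybrid_cmd() {
  local port="$1"
  shift
  local IFS=','        # "$*" joins the remaining arguments with the first IFS character
  local langs="$*"
  printf 'opendataloader-pdf-hybrid --port %s --force-ocr --ocr-lang "%s"\n' "$port" "$langs"
}

build_hybrid_cmd 5002 ko en
# prints: opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"
```

Printing the command rather than running it keeps the sketch safe to try. For standard digital PDFs the `--force-ocr` flag would be dropped, per the note in README.md.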
