
Commit e52a0b2

bundolee and claude committed
fix: add OCR timeout warning for large scanned PDFs
OCR is CPU-intensive; default 30s timeout causes fallback on large documents.
Verified by running --force-ocr against PDFUA-Ref-2-09_Scanned.pdf.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b23feb8 commit e52a0b2

2 files changed

Lines changed: 7 additions & 0 deletions


README.md

Lines changed: 2 additions & 0 deletions
@@ -290,6 +290,8 @@ opendataloader-pdf-hybrid --port 5002 --ocr-lang "ko,en"
 
 > **Note**: Standard digital PDFs do not need `--force-ocr`. Use it only for scanned or image-based PDFs.
 
+> **Timeout**: OCR is CPU-intensive. For large scanned documents, increase the timeout: `opendataloader-pdf --hybrid docling-fast --hybrid-timeout 120000 input-scanned.pdf`
+
 ### Picture / Chart Description (Alt Text)
 
 Generate AI-powered descriptions for images and charts in your PDFs. Useful for accessibility (alt text) and making visual content searchable in RAG pipelines.

content/docs/hybrid-mode.mdx

Lines changed: 5 additions & 0 deletions
@@ -209,6 +209,11 @@ docker run -p 5002:5002 ghcr.io/opendataloader-project/opendataloader-pdf-hybrid
 
 > **Note**: Standard digital PDFs do not need `--force-ocr`. Use it only for scanned or image-based PDFs where text cannot be selected.
 
+> **Timeout**: OCR is CPU-intensive. For large scanned documents, increase the timeout (default is 30s per batch):
+> ```bash
+> opendataloader-pdf --hybrid docling-fast --hybrid-timeout 120000 input-scanned.pdf
+> ```
+
 ## Chart and Image Description
 
 Generate AI-powered natural language descriptions for images and charts in your PDFs. This makes visual content searchable in RAG pipelines and produces alt text for accessibility.
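
The timeout note added in both files can also be scripted. The following is a minimal sketch, not part of the commit, that builds the same command line from a timeout given in seconds. The helper name `build_hybrid_command` is hypothetical. It also assumes that `--hybrid-timeout` takes milliseconds, which the commit does not state outright; the value `120000` next to a stated 30s default only suggests it.

```python
import shlex
from typing import List

# Default stated in the hybrid-mode.mdx note ("default is 30s per batch").
DEFAULT_TIMEOUT_S = 30


def build_hybrid_command(pdf_path: str, timeout_s: int = 120) -> List[str]:
    """Build the CLI invocation from the note with a raised hybrid timeout.

    Assumption: --hybrid-timeout is expressed in milliseconds, so the
    seconds value is multiplied by 1000 before being passed to the CLI.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return [
        "opendataloader-pdf",
        "--hybrid", "docling-fast",
        "--hybrid-timeout", str(timeout_s * 1000),
        pdf_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without the CLI installed.
    print(shlex.join(build_hybrid_command("input-scanned.pdf")))
```

Returning an argument list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.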
