fix: add OCR timeout warning for large scanned PDFs
OCR is CPU-intensive; default 30s timeout causes fallback on large documents.
Verified by running --force-ocr against PDFUA-Ref-2-09_Scanned.pdf.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
> **Note**: Standard digital PDFs do not need `--force-ocr`. Use it only for scanned or image-based PDFs.
> **Timeout**: OCR is CPU-intensive. For large scanned documents, increase the timeout: `opendataloader-pdf --hybrid docling-fast --hybrid-timeout 120000 input-scanned.pdf`
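
For scripted runs, the command in the note above can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper names `build_command` and `run_conversion` are hypothetical, and it assumes `--hybrid-timeout` takes milliseconds (inferred from the 30s default and the `120000` example value).

```python
import shutil
import subprocess


def build_command(pdf_path: str, timeout_seconds: int = 120) -> list[str]:
    """Assemble the CLI call shown in the Timeout note.

    Assumption: --hybrid-timeout is given in milliseconds, so the
    value in seconds is multiplied by 1000.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return [
        "opendataloader-pdf",
        "--hybrid", "docling-fast",
        "--hybrid-timeout", str(timeout_seconds * 1000),
        pdf_path,
    ]


def run_conversion(pdf_path: str, timeout_seconds: int = 120) -> int:
    """Run the CLI and return its exit code (hypothetical wrapper)."""
    cmd = build_command(pdf_path, timeout_seconds)
    # Fail early with a clear message if the CLI is not on PATH.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode
```

Calling `build_command("input-scanned.pdf")` yields the same arguments as the command in the note; pass a larger `timeout_seconds` for very large scans.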
### Picture / Chart Description (Alt Text)
Generate AI-powered descriptions for images and charts in your PDFs. Useful for accessibility (alt text) and making visual content searchable in RAG pipelines.
Generate AI-powered natural language descriptions for images and charts in your PDFs. This makes visual content searchable in RAG pipelines and produces alt text for accessibility.