@@ -7,30 +7,30 @@ description: Options for the Node.js convert function
77{ /* Run `npm run generate-options` to regenerate */ }
88
99
10- | Option | Type | Default | Description |
11- | -------------------------| ----------------------| --------------| ------------------------------------------------------------------------------------------------------------------------------------|
12- | ` outputDir ` | ` string ` | - | Directory where output files are written. Default: input file directory |
13- | ` password ` | ` string ` | - | Password for encrypted PDF files |
14- | ` format ` | ` string \| string[] ` | - | Output formats (comma-separated). Values: json, text, html, pdf, markdown, markdown-with-html, markdown-with-images. Default: json |
15- | ` quiet ` | ` boolean ` | ` false ` | Suppress console logging output |
16- | ` contentSafetyOff ` | ` string \| string[] ` | - | Disable content safety filters. Values: all, hidden-text, off-page, tiny, hidden-ocg |
17- | ` sanitize ` | ` boolean ` | ` false ` | Enable sensitive data sanitization. Replaces emails, phone numbers, IPs, credit cards, and URLs with placeholders |
18- | ` keepLineBreaks ` | ` boolean ` | ` false ` | Preserve original line breaks in extracted text |
19- | ` replaceInvalidChars ` | ` string ` | ` " " ` | Replacement character for invalid/unrecognized characters. Default: space |
20- | ` useStructTree ` | ` boolean ` | ` false ` | Use PDF structure tree (tagged PDF) for reading order and semantic structure |
21- | ` tableMethod ` | ` string ` | ` "default" ` | Table detection method. Values: default (border-based), cluster (border + cluster). Default: default |
22- | ` readingOrder ` | ` string ` | ` "xycut" ` | Reading order algorithm. Values: off, xycut. Default: xycut |
23- | ` markdownPageSeparator ` | ` string ` | - | Separator between pages in Markdown output. Use %page-number% for page numbers. Default: none |
24- | ` textPageSeparator ` | ` string ` | - | Separator between pages in text output. Use %page-number% for page numbers. Default: none |
25- | ` htmlPageSeparator ` | ` string ` | - | Separator between pages in HTML output. Use %page-number% for page numbers. Default: none |
26- | ` imageOutput ` | ` string ` | ` "external" ` | Image output mode. Values: off (no images), embedded (Base64 data URIs), external (file references). Default: external |
27- | ` imageFormat ` | ` string ` | ` "png" ` | Output format for extracted images. Values: png, jpeg. Default: png |
28- | ` imageDir ` | ` string ` | - | Directory for extracted images |
29- | ` pages ` | ` string ` | - | Pages to extract (e.g., "1,3,5-7"). Default: all pages |
30- | ` includeHeaderFooter ` | ` boolean ` | ` false ` | Include page headers and footers in output |
31- | ` detectStrikethrough ` | ` boolean ` | ` false ` | Detect strikethrough text and wrap with ~~ in Markdown output (experimental) |
32- | ` hybrid ` | ` string ` | ` "off" ` | Hybrid backend for AI processing. Values: off (default), docling-fast |
33- | ` hybridMode ` | ` string ` | ` "auto" ` | Hybrid triage mode. Values: auto (default, dynamic triage), full (skip triage, all pages to backend) |
34- | ` hybridUrl ` | ` string ` | - | Hybrid backend server URL (overrides default) |
35- | ` hybridTimeout ` | ` string ` | ` "0" ` | Hybrid backend request timeout in milliseconds (0 = no timeout). Default: 0 |
36- | ` hybridFallback ` | ` boolean ` | ` false ` | Opt in to Java fallback on hybrid backend error (default: disabled) |
10+ | Option | Type | Default | Description |
11+ | -------------------------| ----------------------| --------------| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
12+ | ` outputDir ` | ` string ` | - | Directory where output files are written. Default: input file directory |
13+ | ` password ` | ` string ` | - | Password for encrypted PDF files |
14+ | ` format ` | ` string \| string[] ` | - | Output formats (comma-separated). Values: json, text, html, pdf, markdown, markdown-with-html, markdown-with-images. Default: json |
15+ | ` quiet ` | ` boolean ` | ` false ` | Suppress console logging output |
16+ | ` contentSafetyOff ` | ` string \| string[] ` | - | Disable content safety filters. Values: all, hidden-text, off-page, tiny, hidden-ocg |
17+ | ` sanitize ` | ` boolean ` | ` false ` | Enable sensitive data sanitization. Replaces emails, phone numbers, IPs, credit cards, and URLs with placeholders |
18+ | ` keepLineBreaks ` | ` boolean ` | ` false ` | Preserve original line breaks in extracted text |
19+ | ` replaceInvalidChars ` | ` string ` | ` " " ` | Replacement character for invalid/unrecognized characters. Default: space |
20+ | ` useStructTree ` | ` boolean ` | ` false ` | Use PDF structure tree (tagged PDF) for reading order and semantic structure |
21+ | ` tableMethod ` | ` string ` | ` "default" ` | Table detection method. Values: default (border-based), cluster (border + cluster). Default: default |
22+ | ` readingOrder ` | ` string ` | ` "xycut" ` | Reading order algorithm. Values: off, xycut. Default: xycut |
23+ | ` markdownPageSeparator ` | ` string ` | - | Separator between pages in Markdown output. Use %page-number% for page numbers. Default: none |
24+ | ` textPageSeparator ` | ` string ` | - | Separator between pages in text output. Use %page-number% for page numbers. Default: none |
25+ | ` htmlPageSeparator ` | ` string ` | - | Separator between pages in HTML output. Use %page-number% for page numbers. Default: none |
26+ | ` imageOutput ` | ` string ` | ` "external" ` | Image output mode. Values: off (no images), embedded (Base64 data URIs), external (file references). Default: external |
27+ | ` imageFormat ` | ` string ` | ` "png" ` | Output format for extracted images. Values: png, jpeg. Default: png |
28+ | ` imageDir ` | ` string ` | - | Directory for extracted images |
29+ | ` pages ` | ` string ` | - | Pages to extract (e.g., "1,3,5-7"). Default: all pages |
30+ | ` includeHeaderFooter ` | ` boolean ` | ` false ` | Include page headers and footers in output |
31+ | ` detectStrikethrough ` | ` boolean ` | ` false ` | Detect strikethrough text and wrap with ~~ in Markdown output (experimental) |
32+ | ` hybrid ` | ` string ` | ` "off" ` | Hybrid backend (requires a running server). Quick start: pip install "opendataloader-pdf[hybrid]" && opendataloader-pdf-hybrid --port 5002. For remote servers use --hybrid-url. Values: off (default), docling-fast |
33+ | ` hybridMode ` | ` string ` | ` "auto" ` | Hybrid triage mode. Values: auto (default, dynamic triage), full (skip triage, all pages to backend) |
34+ | ` hybridUrl ` | ` string ` | - | Hybrid backend server URL (overrides default) |
35+ | ` hybridTimeout ` | ` string ` | ` "0" ` | Hybrid backend request timeout in milliseconds (0 = no timeout). Default: 0 |
36+ | ` hybridFallback ` | ` boolean ` | ` false ` | Opt in to Java fallback on hybrid backend error (default: disabled) |
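
For review purposes, the enumerated values in the table above can be checked before an options object is handed to the convert function. The following is a minimal sketch, not part of the library: `ConvertOptionsSketch`, `ALLOWED`, and `validateOptions` are hypothetical names, the type covers only a subset of the documented options, and the `pages` pattern is an assumption inferred from the single example `"1,3,5-7"`.

```typescript
// Hypothetical type mirroring a subset of the documented options.
type ConvertOptionsSketch = {
  format?: string | string[];
  tableMethod?: string;
  readingOrder?: string;
  imageOutput?: string;
  imageFormat?: string;
  pages?: string;
  hybrid?: string;
  hybridMode?: string;
  hybridUrl?: string;
  hybridTimeout?: string;
};

// Allowed values copied from the "Values:" lists in the table.
const ALLOWED: Record<string, readonly string[]> = {
  format: ["json", "text", "html", "pdf", "markdown", "markdown-with-html", "markdown-with-images"],
  tableMethod: ["default", "cluster"],
  readingOrder: ["off", "xycut"],
  imageOutput: ["off", "embedded", "external"],
  imageFormat: ["png", "jpeg"],
  hybrid: ["off", "docling-fast"],
  hybridMode: ["auto", "full"],
};

// Returns a list of human-readable problems; an empty list means no issue was found.
function validateOptions(opts: ConvertOptionsSketch): string[] {
  const errors: string[] = [];
  for (const [key, allowed] of Object.entries(ALLOWED)) {
    const raw = (opts as Record<string, unknown>)[key];
    if (raw === undefined) continue;
    // Only `format` is documented as comma-separated; other options take one value.
    let values: string[];
    if (Array.isArray(raw)) {
      values = raw.map(String);
    } else if (key === "format") {
      values = String(raw).split(",");
    } else {
      values = [String(raw)];
    }
    for (const value of values) {
      const trimmed = value.trim();
      if (!allowed.includes(trimmed)) {
        errors.push(`${key}: unsupported value "${trimmed}"`);
      }
    }
  }
  // Assumed grammar: comma-separated page numbers or ranges, e.g. "1,3,5-7".
  if (opts.pages !== undefined && !/^\d+(-\d+)?(,\d+(-\d+)?)*$/.test(opts.pages)) {
    errors.push(`pages: cannot parse "${opts.pages}"`);
  }
  // The table types hybridTimeout as a string holding milliseconds.
  if (opts.hybridTimeout !== undefined && !/^\d+$/.test(opts.hybridTimeout)) {
    errors.push(`hybridTimeout: expected a non-negative integer string, got "${opts.hybridTimeout}"`);
  }
  return errors;
}

// Example: options for a hybrid run against a locally started backend.
const problems = validateOptions({
  format: "json,markdown",
  pages: "1,3,5-7",
  hybrid: "docling-fast",
  hybridMode: "auto",
  hybridTimeout: "30000",
});
console.log(problems.length === 0 ? "options look valid" : problems.join("\n"));
```

Keeping the allowed values in one lookup table means the check can be regenerated alongside the options table whenever `npm run generate-options` is rerun.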