Commit ece9c1f
docs(use-struct-tree): clarify that output quality depends on tag quality
Objective: A user reported that --use-struct-tree drops markdown
headings (##) and omits some content. The PDF turned out to be a
deck-style document tagged entirely as <P> with zero heading tags,
so the option was behaving as designed but the user expected it to
recover structure. The docs did not set this expectation anywhere.
Approach: Add a one-line clarification to both the README Tagged PDF
Support section and the CLI option description: output quality
depends on tag quality, and for PDFs with sparse or incorrect tags
the default heuristic mode or --hybrid is often a better fit. No
behavior change — keep --use-struct-tree strict and let the docs
align user expectation with actual behavior.
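
To illustrate why tag quality matters, here is a minimal sketch of a pre-check that counts heading tags before choosing a mode. It is an assumption-laden illustration, not code from this repository: `suggest_mode` and its return labels are hypothetical, and the list of tag names is assumed to come from whatever PDF inspection tool is at hand. The only fact relied on is that H and H1 to H6 are the standard heading structure types in tagged PDF.

```python
from collections import Counter

# Standard heading structure types in tagged PDF (H plus H1..H6).
HEADING_TAGS = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}


def suggest_mode(tag_names):
    """Suggest an extraction mode from a flat list of structure tag names.

    Hypothetical helper, not part of opendataloader-pdf. The returned
    strings are labels for this sketch, not CLI flags.
    """
    # Normalize "<P>" / "p" / "P" to a bare upper-case tag name.
    counts = Counter(name.strip("<>").upper() for name in tag_names)
    total = sum(counts.values())
    headings = sum(counts[tag] for tag in HEADING_TAGS)

    if total == 0:
        return "default"  # no tags at all: nothing for a struct tree to follow
    if headings == 0:
        return "default-or-hybrid"  # e.g. a deck tagged entirely as <P>
    return "use-struct-tree"
```

For the PDF in the report (all <P>, zero heading tags), this sketch would point at the default heuristic mode or --hybrid, which matches the advice added to the docs.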
Evidence: Ran `npm run sync` to regenerate bindings and verified
the new description propagates everywhere users see it.
Before: "Use PDF structure tree (tagged PDF) for reading order and
semantic structure"
After: "Use PDF structure tree (tagged PDF) for reading order and
semantic structure. Output quality depends on tag quality"
Verified in:
- java -jar ...cli.jar --help (CLI help text)
- options.json (generator source of truth)
- python/.../cli_options_generated.py (Python SDK)
- node/.../cli-options.generated.ts (Node SDK)
- README.md Tagged PDF Support section (Note block added)
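
The propagation check above was done by hand. A small script along the following lines could repeat it; this is a sketch under assumptions, with `files_missing_text` being a hypothetical helper and the file paths left for the caller to supply, since the paths listed above are abbreviated.

```python
from pathlib import Path

# The sentence appended to the option description in this commit.
NEW_SENTENCE = "Output quality depends on tag quality"


def files_missing_text(paths, text=NEW_SENTENCE):
    """Return the paths (as strings) whose contents do not include `text`."""
    missing = []
    for path in paths:
        content = Path(path).read_text(encoding="utf-8")
        if text not in content:
            missing.append(str(path))
    return missing
```

An empty result means every file passed in contains the new sentence.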
Maven build: SUCCESS, 21 tests passed.
Fixes PDFDLOSP-8
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 105bcea commit ece9c1f
7 files changed
Lines changed: 8 additions & 6 deletions
File tree
- java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/cli
- node/opendataloader-pdf/src
- python/opendataloader-pdf/src/opendataloader_pdf