Commit 27be51c
Squashed commit of the following:
commit 0dd4925
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Jun 12 07:31:46 2025 -0400
Update CITATION.cff
commit c6a24be
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Jun 12 07:23:30 2025 -0400
Bump version to 0.11.7
commit 51f3065
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Jun 12 07:21:29 2025 -0400
Update CHANGELOG.md
commit 738f6f0
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jun 11 23:40:50 2025 -0400
Add test for CLI auto-help
commit b88907f
Author: mara004 <geisserml@gmail.com>
Date: Fri May 2 23:07:05 2025 +0200
Minor cleanup around pypdfium2 integration
commit 7e364e6
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jun 11 22:24:28 2025 -0400
Add Page.trimbox, .bleedbox, .artbox (jsvine#1313)
Thanks to @samuelbradshaw for the suggestion!
commit 4c7e092
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Fri May 16 08:20:30 2025 -0400
Upgrade pdfminer.six from 20250327 to 20250506
... and adjust color handling accordingly.
commit 3e0d4df
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jun 11 23:26:09 2025 -0400
Run make format
commit cd6fd70
Author: nobody <github2@invisiblehand.church>
Date: Mon May 19 08:31:53 2025 -0400
Auto-add --help if CLI run w/o args
(Commit message edited by @jsvine.)
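A minimal sketch of the "auto-add --help" behavior described above, using only the standard library's argparse. This is an illustration, not pdfplumber's actual CLI code: the `demo-cli` program name, the `build_parser` and `parse_args` helpers, and the argument set are all hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real CLI's arguments may differ.
    parser = argparse.ArgumentParser(prog="demo-cli")
    parser.add_argument("infile", help="path to the input file")
    parser.add_argument(
        "--format", choices=["csv", "json", "text"], default="csv"
    )
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    # If the user passed no arguments at all, behave as if they asked
    # for --help instead of failing with a "missing argument" error.
    if not argv:
        argv = ["--help"]
    return build_parser().parse_args(argv)
```

With an empty argument list, argparse prints the help text and exits with status 0; any other argument list is parsed as usual.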
commit 02ff431
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 23:21:17 2025 -0400
Tiny tweaks to CHANGELOG.md
commit 8cd8e48
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 23:15:41 2025 -0400
Bump version to 0.11.6
commit 44b078c
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 23:15:06 2025 -0400
Update CHANGELOG.md
commit e15ed98
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 22:44:25 2025 -0400
Fix bug w/ use_text_flow=True extractions (jsvine#1279)
... related to flows where text bounces between lines.
h/t @samuelbradshaw
commit f2ad942
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 22:00:14 2025 -0400
Add another oss-fuzz test case, already fixed
commit 748ff31
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 21:58:17 2025 -0400
More broadly handle RecursionError, via oss-fuzz
commit 9148810
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 21:57:21 2025 -0400
Fix unhandled None in do_PDFStream, via oss-fuzz
commit 3fcb493
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Thu Mar 27 21:31:06 2025 -0400
Bump pdfminer.six to version 20250327
commit 7e28e76
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Tue Mar 25 23:03:13 2025 -0400
Remove test_issue_1089 (jsvine#1263)
@booxter makes a good point that the test is platform-specific. The
issue has been resolved, and it's not expected to return, so I think
provisionally OK to remove this test.
commit 630f30e
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Tue Mar 25 22:52:47 2025 -0400
pragma:nocover exceptions no longer raised by pdfminer.six
commit 12a73a2
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Tue Mar 25 22:52:16 2025 -0400
Bump pdfminer.six to version 20250324
commit 6349adb
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Mon Feb 10 22:09:28 2025 -0500
Add escapechar for .to_csv(...)
commit 980494a
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Mon Feb 10 21:54:10 2025 -0500
Use csv.QUOTE_MINIMAL for .to_csv(...)
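For context on the two `.to_csv(...)` commits above, here is a small standard-library sketch of what `csv.QUOTE_MINIMAL` combined with an `escapechar` does. The `rows_to_csv` helper and the backslash escape character are assumptions for illustration; the log does not say which escape character pdfplumber uses.

```python
import csv
import io
from typing import Iterable, List


def rows_to_csv(rows: Iterable[List[str]]) -> str:
    # Hypothetical helper, not pdfplumber's implementation.
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        escapechar="\\",            # assumed escape character
    )
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Only the field containing a comma is wrapped in quotes.
    print(rows_to_csv([["plain", "a,b"], ["1", "x"]]))
```

With `QUOTE_MINIMAL`, fields are quoted only when they contain the delimiter, the quote character, or a line break, so simple values stay unquoted.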
commit 47a7ab8
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Mon Feb 10 21:53:17 2025 -0500
Update exception handler
commit 8f5f498
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Feb 9 17:23:37 2025 -0500
Fix wrong exception expectation in test
commit 43ccc5b
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Feb 9 16:23:57 2025 -0500
Catch exceptions from pdfminer and malformed PDFs
... thanks to OSS-Fuzz and @ennamarie19
Cf.: google/oss-fuzz#12949
commit a77808a
Merge: c562774 5d47d5a
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Feb 2 11:16:58 2025 -0500
Merge pull request jsvine#1270 from mara004/patch-1
test_issue_1089: update wording regarding pypdfium2
commit 5d47d5a
Author: mara004 <geisserml@gmail.com>
Date: Sun Feb 2 16:27:53 2025 +0100
test_issue_1089: update wording regarding pypdfium2
See jsvine#1089 (comment) for background
commit c562774
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jan 1 10:21:18 2025 -0500
Bump version to 0.11.5
commit 4af0e1d
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jan 1 10:21:00 2025 -0500
Update CHANGELOG.md
commit 7c63541
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Jan 1 10:26:04 2025 -0500
Add thanks to @stolarczyk in README.md
commit 078df97
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Tue Dec 31 09:11:32 2024 -0500
Fix jsvine#1237 (tf → table_settings) h/t @n-traore
And thanks to @cmdlineluser for the nudge.
commit 6e54799
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sat Dec 28 12:13:32 2024 -0500
Add thanks to @brandonrobertz (jsvine#1235)
commit 69d010a
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Dec 15 23:24:31 2024 -0500
Add initial test/docs for `format --text` (jsvine#1235)
commit e0ee254
Merge: 28d4f50 f3f2b57
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Dec 15 23:07:14 2024 -0500
Merge pull request jsvine#1235 from brandonrobertz/add-text-output-mode
Add a --format text option
commit f3f2b57
Author: Brandon Roberts <brandon@bxroberts.org>
Date: Tue Dec 10 14:21:22 2024 -0800
Add a --format text option
I use this regularly because pdfplumber has among the best layout
preserving methods for PDFs, especially machine generated ones.
Exposing the page output via CLI lets me use pdfplumber as a general
purpose PDF-to-text tool.
Usage:
pdfplumber --format text file.pdf > file.txt
commit 28d4f50
Merge: ea3b3e5 2073164
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Dec 8 23:10:15 2024 -0500
Merge PR jsvine#1195
commit 2073164
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Dec 8 22:55:30 2024 -0500
Appease linter
commit c80c78d
Author: Michal Stolarczyk <stolarczyk.michal93@gmail.com>
Date: Fri Nov 22 16:48:19 2024 +0100
add a test to cover raise_unicode_errors parameter
commit 1e4b48a
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Fri Nov 22 08:18:11 2024 -0500
Run 'make format' and ignore code line-length
commit 138abab
Author: Michal Stolarczyk <stolarczyk.michal93@gmail.com>
Date: Wed Nov 13 18:34:35 2024 +0100
rename warn_unicode_error to raise_unicode_errors for clarity
additionally change the default accordingly
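The raise-versus-warn switch that this rename refers to can be sketched generically as follows. Only the `raise_unicode_errors` name comes from the commit message; the `decode_annotation` helper, the UTF-8 decoding step, and the default value of `True` are assumptions made for illustration.

```python
import warnings
from typing import Optional


def decode_annotation(
    raw: bytes, raise_unicode_errors: bool = True
) -> Optional[str]:
    # Hypothetical helper: decode one annotation value.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        if raise_unicode_errors:
            # Strict mode: let the decoding error propagate.
            raise
        # Lenient mode: warn and skip the undecodable value.
        warnings.warn(f"Could not decode annotation value: {err}")
        return None
```

Callers who want processing to continue past junk bytes would pass `raise_unicode_errors=False` and handle the `None` result.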
commit ea3b3e5
Merge: 6ef62c9 8542adb
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Sun Nov 10 22:47:33 2024 -0500
Merge pull request jsvine#1221 from erghelium/develop
Fix broken link to Anssi Nurminen's master's thesis in the README.md
commit 8542adb
Author: Guilherme <101049490+erghelium@users.noreply.github.com>
Date: Sun Nov 10 18:19:04 2024 -0300
Fix broken link to Anssi Nurminen's master's thesis in README
commit 6ef62c9
Author: Jeremy Singer-Vine <jsvine@gmail.com>
Date: Wed Oct 2 21:11:38 2024 -0400
Add `name` property to `image` objects (jsvine#1201)
h/t @djr2015
commit 396c5e3
Author: Michal Stolarczyk <stolarczyk.michal93@gmail.com>
Date: Fri Aug 30 10:24:39 2024 +0200
warn on unicode decoding errors in PDF annotations
In some cases the annotations may contain junk that prevents annotation processing altogether. This change allows ignoring the error and warning instead, which is configurable via the warn_unicode_error argument in the PDF initializer and/or the open() method.
1 parent 5a65a03 commit 27be51c
42 files changed
Lines changed: 269 additions & 95 deletions
File tree
- pdfplumber
- utils
- tests
- pdfs
- from-oss-fuzz/load