DenseConsulting
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 38 additions & 0 deletions
diff --git a/CITATION.cff b/CITATION.cff
Lines changed: 2 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 6 additions & 2 deletions
diff --git a/pdfplumber/_version.py b/pdfplumber/_version.py
Lines changed: 1 addition & 1 deletion
diff --git a/pdfplumber/cli.py b/pdfplumber/cli.py
Lines changed: 7 additions & 1 deletion
diff --git a/pdfplumber/container.py b/pdfplumber/container.py
Lines changed: 7 additions & 1 deletion
diff --git a/pdfplumber/convert.py b/pdfplumber/convert.py
Lines changed: 2 additions & 2 deletions
diff --git a/pdfplumber/display.py b/pdfplumber/display.py
Lines changed: 16 additions & 8 deletions
diff --git a/pdfplumber/page.py b/pdfplumber/page.py
Lines changed: 34 additions & 42 deletions
@@ -2,6 +2,44 @@
 
 All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/).
 
+## [0.11.7] - 2025-06-12
+
+### Added
+- Add access to `Page.trimbox`, `Page.bleedbox`, and `Page.artbox` (h/t @samuelbradshaw). ([#1313](https://github.com/jsvine/pdfplumber/issues/1313) + [7e364e6](https://github.com/jsvine/pdfplumber/commit/7e364e6193c6e8bafa9b46587c0fdd4a46405399))
+
+### Changed
+- Upgrade `pdfminer.six` from `20250327` to `20250506`. ([4c7e092](https://github.com/jsvine/pdfplumber/commit/4c7e092))
+
+### Removed
+- Remove `stroking_pattern` and `non_stroking_pattern` object attributes, due to changes in `pdfminer.six`. ([4c7e092](https://github.com/jsvine/pdfplumber/commit/4c7e092))
+
+## [0.11.6] - 2025-03-27
+### Changed
+- Upgrade `pdfminer.six` from `20231228` to `20250327` ([3fcb493](https://github.com/jsvine/pdfplumber/commit/3fcb493) + [12a73a2](https://github.com/jsvine/pdfplumber/commit/12a73a2))
+- Use csv.QUOTE_MINIMAL for .to_csv(...) ([980494a](https://github.com/jsvine/pdfplumber/commit/980494a))
+
+
+### Fixed
+- Fix bug with `use_text_flow=True` text extraction (h/t @samuelbradshaw). ([#1279](https://github.com/jsvine/pdfplumber/issues/1279) + [e15ed98](https://github.com/jsvine/pdfplumber/commit/e15ed98))
+- Catch exceptions from pdfminer and malformed PDFs ([43ccc5b](https://github.com/jsvine/pdfplumber/commit/43ccc5b))
+- More broadly handle RecursionError ([748ff31](https://github.com/jsvine/pdfplumber/commit/748ff31))
+
+### Removed
+- Remove test_issue_1089 ([#1263](https://github.com/jsvine/pdfplumber/issues/1263) + [7e28e76](https://github.com/jsvine/pdfplumber/commit/7e28e76))
+
+## [0.11.5] - 2025-01-01
+
+### Added
+
+- Add `--format text` options to CLI (in addition to previously-available `csv` and `json`) (h/t @brandonrobertz). ([#1235](https://github.com/jsvine/pdfplumber/pull/1235))
+- Add `raise_unicode_errors: bool` parameter to `pdfplumber.open()` to allow bypassing `UnicodeDecodeError`s in annotation-parsing and generate warnings instead (h/t @stolarczyk). ([#1195](https://github.com/jsvine/pdfplumber/issues/1195))
+- Add `name` property to `image` objects (h/t @djr2015). ([#1201](https://github.com/jsvine/pdfplumber/discussions/1201))
+
+### Fixed
+
+- Fix `PageImage.debug_tablefinder(...)` so that its main keyword argument is named the same (`table_settings=`) as other related `Page` methods (h/t @n-traore). ([#1237](https://github.com/jsvine/pdfplumber/issues/1237))
+
+
 ## [0.11.4] - 2024-08-18
 
 ### Fixed
 
@@ -1,8 +1,8 @@
 cff-version: 1.2.0
 title: pdfplumber
 type: software
-version: 0.11.4
-date-released: "2024-08-07"
+version: 0.11.7
+date-released: "2025-06-12"
 authors:
   - family-names: "Singer-Vine"
     given-names: "Jeremy"
 
@@ -47,7 +47,7 @@ The output will be a CSV containing info about every character, line, and rectan
 
 | Argument | Description |
 |----------|-------------|
-|`--format [format]`| `csv` or `json`. The `json` format returns more information; it includes PDF-level and page-level metadata, plus dictionary-nested attributes.|
+|`--format [format]`| `csv`, `json`, or `text`. The `csv` and `json` formats return information about each object. Of those two, the `json` format returns more information; it includes PDF-level and page-level metadata, plus dictionary-nested attributes. The `text` option returns a plain-text representation of the PDF, using `Page.extract_text(layout=True)`.|
 |`--pages [list of pages]`| A space-delimited, `1`-indexed list of pages or hyphenated page ranges. E.g., `1, 11-15`, which would return data for pages 1, 11, 12, 13, 14, and 15.|
 |`--types [list of object types to extract]`| Choices are `char`, `rect`, `line`, `curve`, `image`, `annot`, et cetera. Defaults to all available.|
 |`--laparams`| A JSON-formatted string (e.g., `'{"detect_vertical": true}'`) to pass to `pdfplumber.open(..., laparams=...)`.|
@@ -274,6 +274,7 @@ Additionally, both `pdfplumber.PDF` and `pdfplumber.Page` provide access to seve
 |`bits`| The number of bits per color component; e.g., 8 corresponds to 255 possible values for each color component (R, G, and B in an RGB color space).|
 |`stream`| Pixel values of the image, as a `pdfminer.pdftypes.PDFStream` object.|
 |`imagemask`| A nullable boolean; if `True`, "specifies that the image data is to be used as a stencil mask for painting in the current color."|
+|`name`| "The name by which this image XObject is referenced in the XObject subdictionary of the current resource dictionary." [🔗](https://ghostscript.com/~robin/pdf_reference17.pdf#page=340) |
 |`mcid`| The [marked content](https://ghostscript.com/~robin/pdf_reference17.pdf#page=850) section ID for this image if any (otherwise `None`). *Experimental attribute.*|
 |`tag`| The [marked content](https://ghostscript.com/~robin/pdf_reference17.pdf#page=850) section tag for this image if any (otherwise `None`). *Experimental attribute.*|
 |`object_type`| "image"|
@@ -354,7 +355,7 @@ Note: The methods above are built on Pillow's [`ImageDraw` methods](http://pillo
 
 ## Extracting tables
 
-`pdfplumber`'s approach to table detection borrows heavily from [Anssi Nurminen's master's thesis](http://dspace.cc.tut.fi/dpub/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3), and is inspired by [Tabula](https://github.com/tabulapdf/tabula-extractor/issues/16). It works like this:
+`pdfplumber`'s approach to table detection borrows heavily from [Anssi Nurminen's master's thesis](https://trepo.tuni.fi/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3), and is inspired by [Tabula](https://github.com/tabulapdf/tabula-extractor/issues/16). It works like this:
 
 1. For any given PDF page, find the lines that are (a) explicitly defined and/or (b) implied by the alignment of words on the page.
 2. Merge overlapping, or nearly-overlapping, lines.
@@ -567,6 +568,9 @@ Many thanks to the following users who've contributed ideas, features, and fixes
 - [Quentin André](https://github.com/QuentinAndre11)
 - [Léo Roux](https://github.com/leorouxx)
 - [@wodny](https://github.com/wodny)
+- [Michal Stolarczyk](https://github.com/stolarczyk)
+- [Brandon Roberts](https://github.com/brandonrobertz)
+- [@ennamarie19](https://github.com/ennamarie19)
 
 ## Contributing
 
 
@@ -1,2 +1,2 @@
-version_info = (0, 11, 4)
+version_info = (0, 11, 7)
 __version__ = ".".join(map(str, version_info))
@@ -8,6 +8,9 @@
 
 from .pdf import PDF
 
+if len(sys.argv) == 1:
+    sys.argv.append("--help")
+
 
 def parse_page_spec(p_str: str) -> List[int]:
     if "-" in p_str:
@@ -37,7 +40,7 @@ def parse_args(args_raw: List[str]) -> argparse.Namespace:
         action="store_true",
     )
 
-    parser.add_argument("--format", choices=["csv", "json"], default="csv")
+    parser.add_argument("--format", choices=["csv", "json", "text"], default="csv")
 
     parser.add_argument("--types", nargs="+")
 
@@ -109,6 +112,9 @@ def main(args_raw: List[str] = sys.argv[1:]) -> None:
                 include_attrs=args.include_attrs,
                 exclude_attrs=args.exclude_attrs,
             )
+        elif args.format == "text":
+            for page in pdf.pages:
+                print(page.extract_text(layout=True))
         else:
             pdf.to_json(
                 sys.stdout,
 
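Note on the `cli.py` hunks above: a minimal sketch of how the new `text` choice and the "no arguments means `--help`" fallback behave, using only `argparse`. The names `build_parser` and the `pdfplumber-sketch` program name are hypothetical, and the fallback is applied to the argument list rather than to `sys.argv` as the diff does, so treat this as an illustration rather than the project's actual code.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, cut-down parser: only the --format flag from the diff.
    parser = argparse.ArgumentParser(prog="pdfplumber-sketch")
    parser.add_argument("--format", choices=["csv", "json", "text"], default="csv")
    return parser


def parse_args(args_raw: List[str]) -> argparse.Namespace:
    # Assumption: an empty argument list is treated like an explicit --help,
    # which makes argparse print usage and exit with status 0.
    if not args_raw:
        args_raw = ["--help"]
    return build_parser().parse_args(args_raw)
```

Calling `parse_args(["--format", "text"])` yields a namespace whose `format` is `"text"`, while an unknown value such as `--format xml` makes `argparse` exit with an error.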
@@ -170,7 +170,13 @@ def to_csv(
 
         cols = CSV_COLS_REQUIRED + list(filter(serializer.attr_filter, non_req_cols))
 
-        w = csv.DictWriter(stream, fieldnames=cols, extrasaction="ignore")
+        w = csv.DictWriter(
+            stream,
+            fieldnames=cols,
+            extrasaction="ignore",
+            quoting=csv.QUOTE_MINIMAL,
+            escapechar="\\",
+        )
         w.writeheader()
         w.writerows(serialized)
 
 
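Note on the `container.py` hunk above: a small standalone sketch of what the new `csv.DictWriter` arguments do, using only the standard library. The column names and rows are made-up sample data, not pdfplumber output.

```python
import csv
import io

# Hypothetical sample rows; "extra" is deliberately not in the column list.
rows = [
    {"object_type": "char", "text": "a,b", "extra": 1},
    {"object_type": "char", "text": "plain", "extra": 2},
]
cols = ["object_type", "text"]

stream = io.StringIO()
w = csv.DictWriter(
    stream,
    fieldnames=cols,
    extrasaction="ignore",  # silently drop keys that are not in `cols`
    quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
    escapechar="\\",
)
w.writeheader()
w.writerows(rows)
output = stream.getvalue()

# Reading the text back should recover the original values.
parsed = list(csv.DictReader(io.StringIO(output, newline=""), escapechar="\\"))
```

With `QUOTE_MINIMAL`, only the field containing the delimiter is wrapped in quotes; the other fields are written bare.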
@@ -109,8 +109,8 @@ def do_dict(self, obj: Dict[str, Any]) -> Dict[str, Any]:
         else:
             return {k: self.serialize(v) for k, v in obj.items()}
 
-    def do_PDFStream(self, obj: Any) -> Dict[str, str]:
-        return {"rawdata": to_b64(obj.rawdata)}
+    def do_PDFStream(self, obj: Any) -> Dict[str, Optional[str]]:
+        return {"rawdata": to_b64(obj.rawdata) if obj.rawdata else None}
 
     def do_PSLiteral(self, obj: PSLiteral) -> str:
         return decode_text(obj.name)
 
@@ -9,6 +9,7 @@
 from . import utils
 from ._typing import T_bbox, T_num, T_obj, T_obj_list, T_point, T_seq
 from .table import T_table_settings, Table, TableFinder, TableSettings
+from .utils.exceptions import MalformedPDFException
 
 if TYPE_CHECKING:  # pragma: nocover
     import pandas as pd
@@ -52,7 +53,11 @@ def get_page_image(
         stream.seek(0)
         src = stream
 
-    pdfium_doc = pypdfium2.PdfDocument(src, password=password)
+    try:
+        pdfium_doc = pypdfium2.PdfDocument(src, password=password)
+    except pypdfium2.PdfiumError as e:
+        raise MalformedPDFException(e)
+
     pdfium_page = pdfium_doc.get_page(page_ix)
 
     img: PIL.Image.Image = pdfium_page.render(
@@ -64,8 +69,6 @@ def get_page_image(
         # Non-modifiable arguments
         prefer_bgrx=True,
     ).to_pil()
-    # In theory `autoclose` when creating it should make it close...
-    # automatically.  In practice this does not seem to be the case.
     pdfium_doc.close()
 
     return img.convert("RGB")
@@ -334,12 +337,17 @@ def debug_table(
         return self
 
     def debug_tablefinder(
-        self, tf: Optional[Union[TableFinder, TableSettings, T_table_settings]] = None
+        self,
+        table_settings: Optional[
+            Union[TableFinder, TableSettings, T_table_settings]
+        ] = None,
     ) -> "PageImage":
-        if isinstance(tf, TableFinder):
-            finder = tf
-        elif tf is None or isinstance(tf, (TableSettings, dict)):
-            finder = self.page.debug_tablefinder(tf)
+        if isinstance(table_settings, TableFinder):
+            finder = table_settings
+        elif table_settings is None or isinstance(
+            table_settings, (TableSettings, dict)
+        ):
+            finder = self.page.debug_tablefinder(table_settings)
         else:
             raise ValueError(
                 "Argument must be instance of TableFinder"
 
@@ -1,3 +1,4 @@
+import numbers
 import re
 from functools import lru_cache
 from typing import (
@@ -13,6 +14,7 @@
     Union,
 )
 from unicodedata import normalize as normalize_unicode
+from warnings import warn
 
 from pdfminer.converter import PDFPageAggregator
 from pdfminer.layout import (
@@ -34,6 +36,7 @@
 from .structure import PDFStructTree, StructTreeMissing
 from .table import T_table_settings, Table, TableFinder, TableSettings
 from .utils import decode_text, resolve_all, resolve_and_decode
+from .utils.exceptions import MalformedPDFException, PdfminerException
 from .utils.text import TextMap
 
 lt_pat = re.compile(r"^LT")
@@ -64,6 +67,7 @@
         "stroke",
         "stroking_color",
         "stream",
+        "name",
         "mcid",
         "tag",
     ]
@@ -96,29 +100,6 @@ def fix_fontname_bytes(fontname: bytes) -> str:
     return str(prefix)[2:-1] + suffix_new
 
 
-def separate_pattern(
-    color: Tuple[Any, ...]
-) -> Tuple[Optional[Tuple[Union[float, int], ...]], Optional[str]]:
-    if isinstance(color[-1], PSLiteral):
-        return (color[:-1] or None), decode_text(color[-1].name)
-    else:
-        return color, None
-
-
-def normalize_color(
-    color: Any,
-) -> Tuple[Optional[Tuple[Union[float, int], ...]], Optional[str]]:
-    if color is None:
-        return (None, None)
-    elif isinstance(color, tuple):
-        tuplefied = color
-    elif isinstance(color, list):
-        tuplefied = tuple(color)
-    else:
-        tuplefied = (color,)
-    return separate_pattern(tuplefied)
-
-
 def tuplify_list_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
     return {
         key: (tuple(value) if isinstance(value, list) else value)
@@ -182,6 +163,10 @@ def _normalize_box(box_raw: T_bbox, rotation: T_num = 0) -> T_bbox:
     # conventionally specified by their lower-left and upperright
     # corners, it is acceptable to specify any two diagonally opposite
     # corners."
+    if not all(isinstance(x, numbers.Number) for x in box_raw):  # pragma: nocover
+        raise MalformedPDFException(
+            f"Bounding box contains non-number coordinate(s): {box_raw}"
+        )
     x0, x1 = sorted((box_raw[0], box_raw[2]))
     y0, y1 = sorted((box_raw[1], box_raw[3]))
     if rotation in [90, 270]:
@@ -231,11 +216,14 @@ def get_attr(key: str, default: Any = None) -> Any:
 
         self.mediabox = _invert_box(mb_raw, mb_height)
 
-        if "CropBox" in page_obj.attrs:
-            self.cropbox = _invert_box(
-                _normalize_box(get_attr("CropBox"), self.rotation), mb_height
-            )
-        else:
+        for box_name in ["CropBox", "TrimBox", "BleedBox", "ArtBox"]:
+            if box_name in page_obj.attrs:
+                box_normalized = _invert_box(
+                    _normalize_box(get_attr(box_name), self.rotation), mb_height
+                )
+                setattr(self, box_name.lower(), box_normalized)
+
+        if "CropBox" not in page_obj.attrs:
             self.cropbox = self.mediabox
 
         # Page.bbox defaults to self.mediabox, but can be altered by Page.crop(...)
@@ -274,7 +262,10 @@ def layout(self) -> LTPage:
             laparams=self.pdf.laparams,
         )
         interpreter = PDFPageInterpreter(self.pdf.rsrcmgr, device)
-        interpreter.process_page(self.page_obj)
+        try:
+            interpreter.process_page(self.page_obj)
+        except Exception as e:
+            raise PdfminerException(e)
         self._layout: LTPage = device.get_result()
         return self._layout
 
@@ -306,7 +297,15 @@ def parse(annot: T_obj) -> T_obj:
                     try:
                         extras[k] = v.decode("utf-8")
                     except UnicodeDecodeError:
-                        extras[k] = v.decode("utf-16")
+                        try:
+                            extras[k] = v.decode("utf-16")
+                        except UnicodeDecodeError:
+                            if self.pdf.raise_unicode_errors:
+                                raise
+                            warn(
+                                f"Could not decode {k} of annotation."
+                                f" {k} will be missing."
+                            )
 
             parsed = {
                 "page_number": self.page_number,
@@ -376,13 +375,6 @@ def process_attr(item: Tuple[str, Any]) -> Optional[Tuple[str, Any]]:
             if hasattr(obj, cs):
                 attr[cs] = resolve_and_decode(getattr(obj, cs).name)
 
-        for color_attr, pattern_attr in [
-            ("stroking_color", "stroking_pattern"),
-            ("non_stroking_color", "non_stroking_pattern"),
-        ]:
-            if color_attr in attr:
-                attr[color_attr], attr[pattern_attr] = normalize_color(attr[color_attr])
-
         if isinstance(obj, (LTChar, LTTextContainer)):
             text = obj.get_text()
             attr["text"] = (
@@ -396,15 +388,15 @@ def process_attr(item: Tuple[str, Any]) -> Optional[Tuple[str, Any]]:
             # directly expose .stroking_color and .non_stroking_color
             # for LTChar objects (unlike, e.g., LTRect objects).
             gs = obj.graphicstate
-            attr["stroking_color"], attr["stroking_pattern"] = normalize_color(
-                gs.scolor
+            attr["stroking_color"] = (
+                gs.scolor if isinstance(gs.scolor, tuple) else (gs.scolor,)
             )
-            attr["non_stroking_color"], attr["non_stroking_pattern"] = normalize_color(
-                gs.ncolor
+            attr["non_stroking_color"] = (
+                gs.ncolor if isinstance(gs.ncolor, tuple) else (gs.ncolor,)
             )
 
             # Handle (rare) byte-encoded fontnames
-            if isinstance(attr["fontname"], bytes):
+            if isinstance(attr["fontname"], bytes):  # pragma: nocover
                 attr["fontname"] = fix_fontname_bytes(attr["fontname"])
 
         elif isinstance(obj, (LTCurve,)):
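Note on the annotation-parsing hunk in `page.py` above: the nested decoding fallback can be sketched as a standalone function. The name `decode_annotation_value` is hypothetical, and returning `None` stands in for the diff's behavior of leaving the key out of `extras`; this is an illustration under those assumptions, not the library's API.

```python
import warnings
from typing import Optional


def decode_annotation_value(
    key: str, value: bytes, *, raise_unicode_errors: bool
) -> Optional[str]:
    # Try UTF-8 first, then UTF-16, mirroring the order used in the diff.
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        try:
            return value.decode("utf-16")
        except UnicodeDecodeError:
            if raise_unicode_errors:
                raise
            # Assumption: None represents "key will be missing".
            warnings.warn(
                f"Could not decode {key} of annotation. {key} will be missing."
            )
            return None
```

Passing `raise_unicode_errors=False` trades a hard failure for a warning, which is the trade-off the new `pdfplumber.open()` parameter exposes.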