diff --git a/docs/page.rst b/docs/page.rst
index b207b3f5b..32f3e684d 100644
--- a/docs/page.rst
+++ b/docs/page.rst
@@ -2329,56 +2329,171 @@ This is an overview of homologous methods on the :ref:`Document` and on the :ref
 ====================================== =====================================
 **Document Level**                     **Page Level**
 ====================================== =====================================
-*Document.get_page_fonts(pno)*         :meth:`Page.get_fonts`
-*Document.get_page_images(pno)*        :meth:`Page.get_images`
-*Document.get_page_pixmap(pno, ...)*   :meth:`Page.get_pixmap`
-*Document.get_page_text(pno, ...)*     :meth:`Page.get_text`
-*Document.search_page_for(pno, ...)*   :meth:`Page.search_for`
+:meth:`Document.get_page_fonts`        :meth:`Page.get_fonts`
+:meth:`Document.get_page_images`       :meth:`Page.get_images`
+:meth:`Document.get_page_pixmap`       :meth:`Page.get_pixmap`
+:meth:`Document.get_page_text`         :meth:`Page.get_text`
+:meth:`Document.search_page_for`       :meth:`Page.search_for`
 ====================================== =====================================

-The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
+.. note::
+
+   Most document methods (left column) exist for convenience and are just wrappers for *Document[pno].<page method>*, so they **load and discard the page** on each execution.
+
+   However, the first two methods work differently. They only need a page's object definition statement, so the page itself will **not** be loaded. For example, :meth:`Page.get_fonts` is a wrapper the other way round and is defined as follows: `page.get_fonts` == `page.parent.get_page_fonts(page.number)`.
+
+
+When calling the equivalent :ref:`Document` methods, pass the page number as a parameter, e.g.:
+
+`Document.get_page_images(pno)` or `Document.get_page_text(pno)`
+
+.. tip::
+
+   The page number parameter, ``pno``, is a 0-based integer in the range `-∞ < pno < page_count`.
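The range `-∞ < pno < page_count` can be read as "any negative number counts back from the end of the document". The following is a minimal sketch of that reading in plain Python. The helper name ``normalize_pno`` is hypothetical and not part of the library, and the wrap-around behaviour for negative values is an assumption based on the stated range, not a copy of the actual implementation.

```python
def normalize_pno(pno: int, page_count: int) -> int:
    """Map a page number in -inf < pno < page_count to a 0-based index.

    Hypothetical helper for illustration only (not a library function).
    Assumption: negative numbers count back from the last page and wrap
    around, so -1 is the last page and -page_count is the first page.
    """
    if page_count <= 0:
        raise ValueError("document has no pages")
    if pno >= page_count:
        raise ValueError("page number must be smaller than page_count")
    # Python's % always returns a non-negative result for a positive
    # modulus, which gives the wrap-around for negative page numbers.
    return pno % page_count


if __name__ == "__main__":
    # With 10 pages, these all refer to an existing page.
    for candidate in (0, 9, -1, -10, -11):
        print(candidate, "->", normalize_pno(candidate, 10))
```

Under this assumption, a call such as `Document.get_page_text(-1)` would address the last page of the document.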
+
+
+
+
+
+Tables and Related Classes
+------------------------------------
+
+The `TableFinder` class is returned by :meth:`Page.find_tables` and has related classes as follows:

 .. class:: TableFinder

    An object always returned by :meth:`Page.find_tables`. Attributes of interest:

-   ... attribute:: tables
+   .. attribute:: tables

-      A list of :ref:`Table` objects, each of which represents a table found on the page. Empty list if no table found.
+      A list of :class:`Table` objects, each of which represents a table found on the page. An empty list if no tables are found.

-   ... attribute:: page
+   .. attribute:: page

       A reference to the :ref:`Page` object.

+      :type: :ref:`Page`
+
 .. class:: Table

-   An object representing a table found on the page. Attributes of interest:
+   An object representing a table found on the page.

-   .. attribute:: bbox

-      The bounding box of the table given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the table.
+   .. attribute:: page
-
+      A back-reference to the owning page.
+
+      :type: :ref:`Page`

    .. attribute:: cells

+      An array of `Rect` objects, one for each cell in the table.
+
+      :type: list
+
+
+   .. attribute:: header
+
+      A `TableHeader` object.
+
+      :type: `TableHeader`
+
+
+   .. attribute:: bbox
+
+      The bounding box of the table, i.e. the rectangle that contains all of its cells.
+
+
+      :type: :ref:`Rect`
+
+
+
+   .. attribute:: row_count
+
+      Number of rows in the table.
+
+      :type: int
+
+
+   .. attribute:: col_count
+
+      Number of columns in the table.
+
+      :type: int
+
+
+   .. attribute:: rows
+
+      An array of `TableRow` objects, one for each row in the table.
+
+      :type: list
+
+
+   .. method:: extract()
+
+      Extracts table cell text data into a list.
+
+      :type: list
+
+   .. method:: to_markdown(clean=False, fill_empty=True)
+
+      Extracts table data into Markdown text format.
+
+
+      :arg bool clean: If ``True`` then markdown syntax is removed from cell content.
+      :arg bool fill_empty: If ``True`` then cell content `None` is replaced by the value above (columns) or to the left (rows) in an effort to approximate row and column spans.
+
+
+      :type: string
+
+
+   .. method:: to_pandas()
+
+      Return a pandas ``DataFrame`` version of the table.
+
+      :type: pandas DataFrame
+
 .. class:: TableHeader

-.. class:: TableRow
+   Dedicated class for table headers.

+   .. attribute:: bbox

+      The bounding box of the union of cells belonging to the table header, given as a tuple `(x0, y0, x1, y1)`. This rectangle contains all table header cells.

+      :type: :ref:`Rect`

-.. note::
+   .. attribute:: cells

-   Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
+      A list of tuples, one for the bbox of each column header.
+
+      :type: list
+
+   .. attribute:: names
+
+      A list of strings with column header text.
+
+      :type: list
+
+   .. attribute:: external
+
+      A boolean indicating whether the header is outside the table cells.
+
+      :type: `bool`
+
+
+.. class:: TableRow
+
+   Dedicated class for table rows.
+
+
+----

-   However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: *page.get_fonts == page.parent.get_page_fonts(page.number)*.

 .. rubric:: Footnotes