97 changes: 97 additions & 0 deletions docs/converting-files.rst
@@ -0,0 +1,97 @@
.. include:: header.rst

.. _ConvertingFiles:

==============================
Converting Files
==============================



Files to PDF
~~~~~~~~~~~~~~~~~~

:ref:`Document types supported by PyMuPDF<HowToOpenAFile>` can be converted to |PDF| with the :meth:`Document.convert_to_pdf` method. The method returns the converted document as a `bytes` object, which |PyMuPDF| can then open as a new |PDF| document and save.



**Example**

.. code-block:: python

    import pymupdf

    xps = pymupdf.open("input.xps")
    pdfbytes = xps.convert_to_pdf()  # PDF data as bytes
    pdf = pymupdf.open("pdf", pdfbytes)  # open the bytes as a PDF document
    pdf.save("output.pdf")
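
Because the converted data is an ordinary in-memory |PDF|, several converted documents can be combined into one file. The following is a minimal sketch, not an official recipe: the function names `merge_as_pdf` and `tiny_png` are hypothetical helpers introduced for illustration, and the inputs are small images generated in memory so that the example does not depend on any files.

.. code-block:: python

    import pymupdf


    def merge_as_pdf(docs):
        """Convert each open document to PDF and append it to one new PDF."""
        merged = pymupdf.open()  # new, empty PDF
        for doc in docs:
            pdfbytes = doc.convert_to_pdf()  # PDF data as bytes
            with pymupdf.open("pdf", pdfbytes) as part:
                merged.insert_pdf(part)  # copy all pages of the converted part
        return merged


    def tiny_png(width=40, height=20):
        """Hypothetical helper: build a small white PNG image in memory."""
        pix = pymupdf.Pixmap(pymupdf.csRGB, pymupdf.IRect(0, 0, width, height), False)
        pix.clear_with(255)  # fill with white
        return pix.tobytes("png")


    # Open two in-memory images as documents and merge them into one PDF.
    images = [pymupdf.open(stream=tiny_png(), filetype="png") for _ in range(2)]
    merged = merge_as_pdf(images)
    print(merged.page_count)

With real files, the list passed to `merge_as_pdf` would instead hold documents opened from disk, and the result would be written with `merged.save(...)`.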



PDF to SVG
~~~~~~~~~~~~~~~~~~

Because an SVG file cannot contain multiple pages, each page must be exported as a separate SVG file.

To get an SVG representation of a page, use the :meth:`Page.get_svg_image` method.

**Example**

.. code-block:: python

    import pymupdf

    doc = pymupdf.open("input.pdf")
    page = doc[0]  # first page

    # Convert page to SVG
    svg_content = page.get_svg_image()

    # Save to file
    with open("output.svg", "w", encoding="utf-8") as f:
        f.write(svg_content)

    doc.close()
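
To export a whole document, the same call can be repeated for every page. The sketch below is one possible approach under stated assumptions: `export_pages_as_svg` and its file naming scheme (`page-1.svg`, `page-2.svg`, ...) are hypothetical choices for illustration, and the input is a blank three-page |PDF| created in memory.

.. code-block:: python

    import pathlib
    import tempfile

    import pymupdf


    def export_pages_as_svg(doc, out_dir, stem="page"):
        """Write one SVG file per page and return the list of paths."""
        out_dir = pathlib.Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        digits = len(str(doc.page_count))  # zero-pad so files sort in page order
        written = []
        for page in doc:
            target = out_dir / f"{stem}-{page.number + 1:0{digits}d}.svg"
            target.write_text(page.get_svg_image(), encoding="utf-8")
            written.append(target)
        return written


    # Demonstrate with a three-page PDF created in memory.
    doc = pymupdf.open()
    for _ in range(3):
        doc.new_page()

    with tempfile.TemporaryDirectory() as tmp:
        svg_files = export_pages_as_svg(doc, tmp)
        print([path.name for path in svg_files])

    doc.close()

File names use 1-based page numbers, while `page.number` itself is 0-based.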


PDF to Markdown
~~~~~~~~~~~~~~~~~

The :doc:`PyMuPDF4LLM API <pymupdf4llm/api>` converts a |PDF| to a Markdown representation.

**Example**

.. code-block:: python

    import pymupdf4llm
    import pathlib

    md_text = pymupdf4llm.to_markdown("test.pdf")
    print(md_text)

    # Write the Markdown text to a file as UTF-8
    pathlib.Path("4llm-output.md").write_bytes(md_text.encode())


PDF to DOCX
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the pdf2docx_ library, which builds on |PyMuPDF|, to convert a |PDF| document to **DOCX** format.



**Example**

.. code-block:: python

    from pdf2docx import Converter

    pdf_file = 'input.pdf'
    docx_file = 'output.docx'

    # convert pdf to docx
    cv = Converter(pdf_file)
    cv.convert(docx_file)  # all pages by default
    cv.close()


.. include:: footer.rst
4 changes: 2 additions & 2 deletions docs/document.rst
@@ -1448,7 +1448,7 @@ For details on **embedded files** refer to Appendix 3.

PDF only: Insert an empty page.

:arg int pno: page number in front of which the new page should be inserted. Must be in `1 < pno <= page_count`. Special values -1 and *doc.page_count* insert **after** the last page.
:arg int pno: page number index (zero-indexed) at which to insert page. Special values -1 and *doc.page_count* insert **after** the last page.

:arg float width: page width.
:arg float height: page height.
@@ -1468,7 +1468,7 @@ For details on **embedded files** refer to Appendix 3.

PDF only: Insert a new page and insert some text. Convenience function which combines :meth:`Document.new_page` and (parts of) :meth:`Page.insert_text`.

:arg int pno: page number (0-based) **in front of which** to insert. Must be in `range(-1, doc.page_count + 1)`. Special values -1 and `doc.page_count` insert **after** the last page.
:arg int pno: page number index (zero-indexed) at which to insert page. Special values -1 and `doc.page_count` insert **after** the last page.

Changed in v1.14.12
This is now a positional parameter
37 changes: 37 additions & 0 deletions docs/how-to-open-a-file.rst
@@ -11,16 +11,53 @@ Opening Files

.. _Supported_File_Types:


Supported File Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|

PyMuPDF
"""""""""

|PyMuPDF| can open files other than just |PDF|.

The following file types are supported:

.. include:: supported-files-table.rst


----


PyMuPDF Pro
"""""""""""""""

|PyMuPDF Pro| can open Office files.

The following file types are supported:

.. list-table::
   :header-rows: 1

   * - **DOC/DOCX**
     - **XLS/XLSX**
     - **PPT/PPTX**
     - **HWP/HWPX**
   * - .. image:: images/icons/icon-docx.svg
          :width: 40
          :height: 40
     - .. image:: images/icons/icon-xlsx.svg
          :width: 40
          :height: 40
     - .. image:: images/icons/icon-pptx.svg
          :width: 40
          :height: 40
     - .. image:: images/icons/icon-hangul.svg
          :width: 40
          :height: 40



How to Open a File
~~~~~~~~~~~~~~~~~~~~~
8 changes: 5 additions & 3 deletions docs/recipes.rst
@@ -10,6 +10,11 @@

   how-to-open-a-file.rst

----

.. toctree::

   converting-files.rst

----

@@ -18,21 +23,18 @@

   recipes-text.rst


----

.. toctree::

   recipes-images.rst


----

.. toctree::

   recipes-annotations.rst


----

.. toctree::
2 changes: 1 addition & 1 deletion docs/shape.rst
@@ -345,7 +345,7 @@ Several draw methods can be executed in a row and each one of them will contribu

:arg float lineheight: a factor to override the line height calculated from font properties. If not `None`, a line height of `fontsize * lineheight` will be used.

:arg int expandtabs: controls handling of tab characters ``\t`` using the `string.expandtabs()` method **per each line**.
:arg int expandtabs: controls handling of tab characters ``\t`` using the `string.expandtabs()` method **per each line**.

:arg float stroke_opacity: *(new in v1.18.1)* set transparency for stroke colors. Negative values and values > 1 will be ignored. Default is 1 (intransparent).
:arg float fill_opacity: *(new in v1.18.1)* set transparency for fill colors. Default is 1 (intransparent). Use this value to control transparency of the text color. Stroke opacity **only** affects the border line of characters.