|
| 1 | +# PyMuPDF |
| 2 | + |
| 3 | +> # PyMuPDF |
| 4 | +> |
| 5 | +> **PyMuPDF** is a high-performance **Python** library for data extraction, analysis, conversion & manipulation of [PDF (and other) documents](https://pymupdf.readthedocs.io/en/latest/the-basics.html#supported-file-types). |
| 6 | +> |
| 7 | +> # Community |
| 8 | +> Join us on **Discord** here: [#pymupdf](https://discord.gg/TSpYGBW4eq) |
| 9 | +> |
| 10 | +> |
| 11 | +> # Installation |
| 12 | +> |
| 13 | +> **PyMuPDF** requires **Python 3.10 or later**. Install it using **pip**: |
| 14 | +> |
| 15 | +> `pip install PyMuPDF` |
| 16 | +> |
| 17 | +> There are **no mandatory** external dependencies. However, some [optional features](#pymupdf-optional-features) become available only if additional packages are installed. |
| 18 | +> |
| 19 | +> You can also try without installing by visiting [PyMuPDF.io](https://pymupdf.io/#examples). |
| 20 | +> |
| 21 | +> |
| 22 | +> # Usage |
| 23 | +> |
| 24 | +> Basic usage is as follows: |
| 25 | +> |
| 26 | +> ```python |
| 27 | +> import pymupdf # imports the pymupdf library |
| 28 | +> doc = pymupdf.open("example.pdf") # open a document |
| 29 | +> for page in doc: # iterate the document pages |
| 30 | +>     text = page.get_text() # get plain text encoded as UTF-8 |
| 31 | +> |
| 32 | +> ``` |
| 33 | +> |
| 34 | +> |
| 35 | +> # Documentation |
| 36 | +> |
| 37 | +> Full documentation can be found on [pymupdf.readthedocs.io](https://pymupdf.readthedocs.io). |
| 38 | +> |
| 39 | +> |
| 40 | +> |
| 41 | +> # <a id="pymupdf-optional-features"></a>Optional Features |
| 42 | +> |
| 43 | +> * [fontTools](https://pypi.org/project/fonttools/) for creating font subsets. |
| 44 | +> * [pymupdf-fonts](https://pypi.org/project/pymupdf-fonts/) contains some nice fonts for your text output. |
| 45 | +> * [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) for optical character recognition in images and document pages. |
| 46 | +> |
| 47 | +> |
| 48 | +> |
| 49 | +> # About |
| 50 | +> |
| 51 | +> **PyMuPDF** adds **Python** bindings and abstractions to [MuPDF](https://mupdf.com/), a lightweight **PDF**, **XPS**, and **eBook** viewer, renderer, and toolkit. Both **PyMuPDF** and **MuPDF** are maintained and developed by [Artifex Software, Inc](https://artifex.com). |
| 52 | +> |
| 53 | +> **PyMuPDF** was originally written by [Jorj X. McKie](mailto:jorj.x.mckie@outlook.de). |
| 54 | +> |
| 55 | +> |
| 56 | +> # License and Copyright |
| 57 | +> |
| 58 | +> **PyMuPDF** is available under [open-source AGPL](https://www.gnu.org/licenses/agpl-3.0.html) and commercial license agreements. If you determine you cannot meet the requirements of the **AGPL**, please contact [Artifex](https://artifex.com/contact/pymupdf-inquiry.php) for more information regarding a commercial license. |
| 59 | + |
| 60 | + |
| 61 | +2015-2026, Artifex |
| 62 | + |
| 63 | +## Pages |
| 64 | + |
| 65 | +- [Welcome to PyMuPDF](index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 66 | +- [PyMuPDF4LLM](pymupdf4llm/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 67 | +- [PyMuPDF Pro](pymupdf-pro/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 68 | +- [FAQ](faq/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 69 | +- [OCR](ocr/index.html.md): How automatic OCR works in PyMuPDF4LLM, when to force it, and how to swap in a different OCR engine. |
| 70 | +- [404!](404.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 71 | +- [about-feature-matrix](about-feature-matrix.html.md) |
| 72 | +- [about-performance](about-performance.html.md) |
| 73 | +- [Features Comparison](about.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 74 | +- [Operator Algebra for Geometry Objects](algebra.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 75 | +- [Annot](annot.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 76 | +- [The PyMuPDF4LLM API](pymupdf4llm/api.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 77 | +- [Appendix 1: Details on Text Extraction](app1.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 78 | +- [Appendix 2: Considerations on Embedded Files](app2.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 79 | +- [Appendix 3: Assorted Technical Information](app3.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 80 | +- [Appendix 4: Performance Comparison Methodology](app4.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 81 | +- [Archive](archive-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 82 | +- [Change Log](changes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 83 | +- [Classes](classes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 84 | +- [Color Database](colors.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 85 | +- [Colorspace](colorspace.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 86 | +- [Converting Files](converting-files.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 87 | +- [Working together: DisplayList and TextPage](coop_low.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 88 | +- [Device](device.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 89 | +- [DisplayList](displaylist.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 90 | +- [DocumentWriter](document-writer-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 91 | +- [Document](document.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 92 | +- [FAQ](faq.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 93 | +- [Font](font.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 94 | +- [Footer](footer.html.md): This software is provided AS-IS with no warranty, either... |
| 95 | +- [Functions](functions.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 96 | +- [Glossary](glossary.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 97 | +- [Header-404](header-404.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 98 | +- [Header](header.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 99 | +- [Opening Files](how-to-open-a-file.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 100 | +- [Identity](identity.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 101 | +- [PyMuPDF4LLM](pymupdf4llm/index-new.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 102 | +- [Installation](installation.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 103 | +- [IRect](irect.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 104 | +- [Link](link.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 105 | +- [linkDest](linkdest.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 106 | +- [Low Level Functions and Classes](lowlevel.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 107 | +- [Matrix](matrix.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 108 | +- [Command line interface](module.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 109 | +- [OCR support](new-ocr.html.md): new-ocr.rst |
| 110 | +- [OCR Plugins](pymupdf4llm/ocr-plugins.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 111 | +- [Outline](outline.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 112 | +- [Packaging for Linux distributions](packaging.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 113 | +- [Page](page.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 114 | +- [Pixmap](pixmap.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 115 | +- [Point](point.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 116 | +- [Pyodide](pyodide.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 117 | +- [Quad](quad.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 118 | +- [PyMuPDF, LLM & RAG](rag.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 119 | +- [Annotations](recipes-annotations.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 120 | +- [Common Issues and their Solutions](recipes-common-issues-and-their-solutions.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 121 | +- [Drawing and Graphics](recipes-drawing-and-graphics.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 122 | +- [Images](recipes-images.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 123 | +- [Journalling](recipes-journalling.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 124 | +- [Low-Level Interfaces](recipes-low-level-interfaces.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 125 | +- [Multiprocessing](recipes-multiprocessing.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 126 | +- [OCR - Optical Character Recognition](recipes-ocr.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 127 | +- [Optional Content Support](recipes-optional-content.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 128 | +- [Stories](recipes-stories.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 129 | +- [Text](recipes-text.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 130 | +- [Recipes](recipes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 131 | +- [Rect](rect.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 132 | +- [Resources](resources.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 133 | +- [Shape](shape.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 134 | +- [Story](story-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 135 | +- [supported-files-table](supported-files-table.html.md) |
| 136 | +- [Tesseract Language Packs](ocr/tesseract-language-packs.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 137 | +- [TextPage](textpage.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 138 | +- [TextWriter](textwriter.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 139 | +- [The Basics](the-basics.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 140 | +- [Tools](tools.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 141 | +- [Tutorial](tutorial.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 142 | +- [Constants and Enumerations](vars.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 143 | +- [Version](version.html.md): This documentation covers PyMuPDF 1.27.2.3. |
| 144 | +- [Widget](widget.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 145 | +- [Xml](xml-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 146 | +- [Deprecated Names](znames.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +For more comprehensive documentation, see [llms-full.txt](llms-full.txt) |