Commit ae7f8c4

Adds LLMs text for improved AI queries
1 parent dd0216b commit ae7f8c4

4 files changed

Lines changed: 928 additions & 1 deletion

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -201,7 +201,8 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-# html_extra_path = []
+# Using to copy over the LLM specific files
+html_extra_path = ["llms"]
 
 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
 # using the given strftime format.
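
Sphinx resolves `html_extra_path` entries relative to the directory that contains `conf.py` and copies their contents to the root of the HTML output, as the comment in the hunk says. The following is a minimal sketch, not part of this commit, of a pre-build check that every configured directory actually exists. The helper name `missing_extra_paths` is hypothetical.

```python
from pathlib import Path


def missing_extra_paths(conf_dir, extra_paths):
    """Return the html_extra_path entries that do not exist under conf_dir.

    Hypothetical helper: conf_dir is the directory holding conf.py, and
    extra_paths is the list assigned to html_extra_path.
    """
    base = Path(conf_dir)
    return [entry for entry in extra_paths if not (base / entry).exists()]
```

For this change, calling `missing_extra_paths("docs", ["llms"])` from the repository root should return an empty list once `docs/llms/` is committed.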

docs/llms/_x.txt

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
+# PyMuPDF
+
+> # PyMuPDF
+>
+> **PyMuPDF** is a high performance **Python** library for data extraction, analysis, conversion & manipulation of [PDF (and other) documents](https://pymupdf.readthedocs.io/en/latest/the-basics.html#supported-file-types).
+>
+> # Community
+> Join us on **Discord** here: [#pymupdf](https://discord.gg/TSpYGBW4eq)
+>
+>
+> # Installation
+>
+> **PyMuPDF** requires **Python 3.10 or later**, install using **pip** with:
+>
+> `pip install PyMuPDF`
+>
+> There are **no mandatory** external dependencies. However, some [optional features](#pymupdf-optional-features) become available only if additional packages are installed.
+>
+> You can also try without installing by visiting [PyMuPDF.io](https://pymupdf.io/#examples).
+>
+>
+> # Usage
+>
+> Basic usage is as follows:
+>
+> ```python
+> import pymupdf # imports the pymupdf library
+> doc = pymupdf.open("example.pdf") # open a document
+> for page in doc: # iterate the document pages
+>     text = page.get_text() # get plain text encoded as UTF-8
+>
+> ```
+>
+>
+> # Documentation
+>
+> Full documentation can be found on [pymupdf.readthedocs.io](https://pymupdf.readthedocs.io).
+>
+>
+>
+> # <a id="pymupdf-optional-features"></a>Optional Features
+>
+> * [fontTools](https://pypi.org/project/fonttools/) for creating font subsets.
+> * [pymupdf-fonts](https://pypi.org/project/pymupdf-fonts/) contains some nice fonts for your text output.
+> * [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) for optical character recognition in images and document pages.
+>
+>
+>
+> # About
+>
+> **PyMuPDF** adds **Python** bindings and abstractions to [MuPDF](https://mupdf.com/), a lightweight **PDF**, **XPS**, and **eBook** viewer, renderer, and toolkit. Both **PyMuPDF** and **MuPDF** are maintained and developed by [Artifex Software, Inc](https://artifex.com).
+>
+> **PyMuPDF** was originally written by [Jorj X. McKie](mailto:jorj.x.mckie@outlook.de).
+>
+>
+> # License and Copyright
+>
+> **PyMuPDF** is available under [open-source AGPL](https://www.gnu.org/licenses/agpl-3.0.html) and commercial license agreements. If you determine you cannot meet the requirements of the **AGPL**, please contact [Artifex](https://artifex.com/contact/pymupdf-inquiry.php) for more information regarding a commercial license.
+
+
+2015-2026, Artifex
+
+## Pages
+
+- [Welcome to <cite>PyMuPDF</cite>](index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [PyMuPDF4LLM](pymupdf4llm/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [PyMuPDF Pro](pymupdf-pro/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [FAQ](faq/index.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [OCR](ocr/index.html.md): How automatic OCR works in PyMuPDF4LLM, when to force it, and how to swap in a different OCR engine.
+- [404!](404.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [feature-matrix th {](about-feature-matrix.html.md): border-style: hidden;
+- [copying-graph .about-graph-area.a {](about-performance.html.md): -webkit-tap-highlight-color: rgba(0,0,0,0); /\* make transparent link selection, adjust last value o...
+- [Features Comparison](about.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Operator Algebra for Geometry Objects](algebra.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Annot](annot.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [The PyMuPDF4LLM API](pymupdf4llm/api.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Appendix 1: Details on Text Extraction](app1.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Appendix 2: Considerations on Embedded Files](app2.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Appendix 3: Assorted Technical Information](app3.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Appendix 4: Performance Comparison Methodology](app4.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Archive](archive-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Change Log](changes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Classes](classes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Color Database](colors.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Colorspace](colorspace.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Converting Files](converting-files.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Working together: DisplayList and TextPage](coop_low.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Device](device.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [DisplayList](displaylist.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [DocumentWriter](document-writer-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Document](document.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [FAQ](faq.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Font](font.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Footer](footer.html.md): <p style="color:#999" id="footerDisclaimer">This software is provided AS-IS with no warranty, either...
+- [Functions](functions.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Glossary](glossary.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Header-404](header-404.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Header](header.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Opening Files](how-to-open-a-file.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Identity](identity.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [PyMuPDF4LLM](pymupdf4llm/index-new.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Installation](installation.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [IRect](irect.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Link](link.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [linkDest](linkdest.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Low Level Functions and Classes](lowlevel.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Matrix](matrix.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Command line interface](module.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [OCR support](new-ocr.html.md): new-ocr.rst
+- [OCR Plugins](pymupdf4llm/ocr-plugins.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Outline](outline.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Packaging for Linux distributions](packaging.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Page](page.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Pixmap](pixmap.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Point](point.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Pyodide](pyodide.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Quad](quad.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [PyMuPDF, LLM & RAG](rag.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Annotations](recipes-annotations.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Common Issues and their Solutions](recipes-common-issues-and-their-solutions.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Drawing and Graphics](recipes-drawing-and-graphics.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Images](recipes-images.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Journalling](recipes-journalling.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Low-Level Interfaces](recipes-low-level-interfaces.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Multiprocessing](recipes-multiprocessing.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [OCR - Optical Character Recognition](recipes-ocr.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Optional Content Support](recipes-optional-content.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Stories](recipes-stories.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Text](recipes-text.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Recipes](recipes.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Rect](rect.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Resources](resources.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Shape](shape.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Story](story-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [feature-matrix th {](supported-files-table.html.md): border-style: hidden;
+- [Tesseract Language Packs](ocr/tesseract-language-packs.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [TextPage](textpage.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [TextWriter](textwriter.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [The Basics](the-basics.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Tools](tools.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Tutorial](tutorial.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Constants and Enumerations](vars.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Version](version.html.md): This documentation covers PyMuPDF 1.27.2.3.
+- [Widget](widget.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Xml](xml-class.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+- [Deprecated Names](znames.html.md): PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+---
+
+For more comprehensive documentation, see [llms-full.txt](llms-full.txt)
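
The page index added in this file uses one entry shape throughout: `- [Title](path): description`. The following is a rough sketch, not part of the commit, of how a reviewer or downstream tool might parse those entries and flag ones that look auto-generated incorrectly. The function names `parse_pages` and `audit_pages` are hypothetical, and the heuristic that a title ending in `{` is leaked CSS is an assumption drawn from the entries above.

```python
import re
from collections import Counter

# One index line looks like "- [Title](path): description". A leading "+"
# (diff marker) is tolerated so the parser also works on patch text.
ENTRY_RE = re.compile(
    r"^\+?\s*- \[(?P<title>.+?)\]\((?P<path>[^)\s]+)\):\s*(?P<desc>.*)$"
)


def parse_pages(text):
    """Return (title, path, description) tuples for every index line in text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append((match["title"], match["path"], match["desc"]))
    return entries


def audit_pages(entries):
    """Return (css_like_titles, repeated_descriptions) for parsed entries.

    Assumption: a title ending in "{" is a CSS selector that leaked into
    the index; a description used more than once is likely a generic default.
    """
    css_like = [title for title, _, _ in entries if title.rstrip().endswith("{")]
    counts = Counter(desc for _, _, desc in entries)
    repeated = {desc: n for desc, n in counts.items() if n > 1}
    return css_like, repeated
```

Run against the index in this hunk, `audit_pages` would report the `feature-matrix th {` and `copying-graph .about-graph-area.a {` titles, and would show that most pages share the same generic description.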
