Commit 7eba586

Docs: adds PyMuPDF Product Suite matrix and link to Layout demo.
1 parent 0e9e035 commit 7eba586

2 files changed

Lines changed: 117 additions & 20 deletions

docs/about.rst

Lines changed: 110 additions & 19 deletions
@@ -23,38 +23,129 @@ The following table illustrates how |PyMuPDF| compares with other typical soluti
2323

2424
----
2525

26-
.. image:: images/icons/icon-docx.svg
27-
:width: 40
28-
:height: 40
2926

30-
.. image:: images/icons/icon-xlsx.svg
31-
:width: 40
32-
:height: 40
3327

34-
.. image:: images/icons/icon-pptx.svg
35-
:width: 40
36-
:height: 40
3728

3829

39-
.. image:: images/icons/icon-hangul.svg
40-
:width: 40
41-
:height: 40
30+
.. note::
4231

32+
.. image:: images/icons/icon-docx.svg
33+
:width: 40
34+
:height: 40
35+
:alt: DOCX icon
4336

37+
.. image:: images/icons/icon-xlsx.svg
38+
:width: 40
39+
:height: 40
40+
:alt: XLSX icon
4441

45-
.. note::
42+
.. image:: images/icons/icon-pptx.svg
43+
:width: 40
44+
:height: 40
45+
:alt: PPTX icon
46+
47+
.. image:: images/icons/icon-hangul.svg
48+
:width: 40
49+
:height: 40
50+
:alt: HWPX icon
51+
52+
A note about **Office** document types (DOCX, XLSX, PPTX) and **Hangul** documents (HWPX). These documents can be loaded into |PyMuPDF| and you will receive a :ref:`Document <Document>` object.
4653

47-
A note about **Office** document types (DOCX, XLSX, PPTX) and **Hangul** documents (HWPX). These documents can be loaded into |PyMuPDF| and you will receive a :ref:`Document <Document>` object.
54+
There are some caveats:
4855

49-
There are some caveats:
56+
- we convert the input to **HTML** to layout the content.
57+
- because of this the original page separation has gone.
5058

59+
When saving out the result any faithful representation of the original layout cannot be expected.
5160

52-
- we convert the input to **HTML** to layout the content.
53-
- because of this the original page separation has gone.
61+
Therefore input files are mostly in a form that's useful for text extraction.
62+
63+
64+
----
5465

55-
When saving out the result any faithful representation of the original layout cannot be expected.
66+
.. _About_PyMuPDF_Product_Suite:
67+
68+
PyMuPDF Product Suite
69+
-----------------------------------------------
70+
71+
|PyMuPDF| is the standard version of the library; however, there is a family of additional products, each with different features and functionality.
72+
73+
**Additional products** in the |PyMuPDF| product suite are:
74+
75+
- |PyMuPDF Pro| adds support for Office document formats.
76+
- |PyMuPDF4LLM| is optimized for large language model (LLM) applications, providing enhanced text extraction and processing capabilities.
77+
- |PyMuPDF Layout| focuses on layout analysis and semantic understanding, ideal for document conversion and formatting tasks with enhanced results.
78+
79+
.. note::
80+
All of the products above depend on the same core product - |PyMuPDF| and therefore have full access to all of its features.
81+
These additional products can be seen as optional extras that enhance the core |PyMuPDF| library.
82+
83+
84+
.. _About_PyMuPDF_Products_Comparison:
85+
86+
PyMuPDF Products Comparison
87+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
88+
89+
The following table illustrates what features the products offer:
90+
91+
.. list-table:: PyMuPDF Products Comparison
92+
:widths: 8 23 23 23 23
93+
:header-rows: 1
94+
95+
* -
96+
- PyMuPDF
97+
- PyMuPDF Pro
98+
- PyMuPDF4LLM
99+
- PyMuPDF Layout
100+
* - **Input Documents**
101+
- `PDF`, `XPS`, `EPUB`, `CBZ`, `MOBI`, `FB2`, `SVG`, `TXT`, Images (*standard document types*)
102+
- *as PyMuPDF* and:
103+
`DOC`/`DOCX`, `XLS`/`XLSX`, `PPT`/`PPTX`, `HWP`/`HWPX`
104+
- *as PyMuPDF*
105+
- *as PyMuPDF*
106+
* - **Output Documents**
107+
- Can convert any input document to `PDF`, `SVG` or Image
108+
- *as PyMuPDF*
109+
- *as PyMuPDF* and:
110+
Markdown (`MD`)
111+
- *as PyMuPDF4LLM* and:
112+
`JSON` or `TXT`
113+
* - **Page Analysis**
114+
- Basic page analysis to return document structure
115+
- *as PyMuPDF*
116+
- *as PyMuPDF*
117+
- Advanced Page Analysis with trained data for enhanced results
118+
* - **Data extraction**
119+
- Basic data extraction with structured layout information and bounding box data
120+
- *as PyMuPDF*
121+
- Advanced data extraction with structure tags such as headings, lists, tables
122+
- Advanced layout analysis and semantic understanding
123+
* - **Table extraction**
124+
- Basic table extraction as part of text extraction
125+
- *as PyMuPDF*
126+
- Advanced table extraction with cell structure and data types
127+
- Superior table detection
128+
* - **Image extraction**
129+
- Basic image extraction
130+
- *as PyMuPDF*
131+
- Advanced detection and rendering of image areas on the page, saving them to disk or embedding them in MD output
132+
- Superior detection of "picture" areas
133+
* - **Vector extraction**
134+
- Vector extraction and clustering
135+
- *as PyMuPDF*
136+
- *as PyMuPDF*
137+
- Superior detection of "picture" areas
138+
* - **Popular RAG Integrations**
139+
- LangChain, LlamaIndex
140+
- *as PyMuPDF*
141+
- *as PyMuPDF* and with some additional helper methods for RAG workflows
142+
- *as PyMuPDF4LLM*
143+
* - **OCR**
144+
- On-demand invocation of built-in Tesseract for text detection on pages or images.
145+
- *as PyMuPDF*
146+
- *as PyMuPDF*
147+
- Automatic OCR based on page content analysis.
56148

57-
Therefore input files are mostly in a form that's useful for text extraction.
58149

59150

60151
----
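
The **Input Documents** row of the comparison table added above can be read as a simple lookup from file extension to the product that accepts it. The following is a minimal sketch of that reading, not part of any PyMuPDF package: the function name `required_product` and the constant names are hypothetical, and the image extensions are an assumed illustrative subset, since the table only says "Images".

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Input Documents" row of the comparison table.
CORE_FORMATS = {"pdf", "xps", "epub", "cbz", "mobi", "fb2", "svg", "txt"}
PRO_FORMATS = {"doc", "docx", "xls", "xlsx", "ppt", "pptx", "hwp", "hwpx"}

# Assumption: a small illustrative subset; the table only says "Images".
IMAGE_FORMATS = {"png", "jpg", "jpeg"}


def required_product(filename: str) -> Optional[str]:
    """Return the product the table lists for this file type, or None.

    Hypothetical helper for illustration only; it inspects the file
    extension and never opens the file.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in CORE_FORMATS or ext in IMAGE_FORMATS:
        return "PyMuPDF"
    if ext in PRO_FORMATS:
        return "PyMuPDF Pro"
    return None
```

Because the table lists PyMuPDF4LLM and PyMuPDF Layout as accepting the same inputs as PyMuPDF, the sketch only distinguishes the core library from PyMuPDF Pro.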

docs/pymupdf-layout/index.rst

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
</script>
1111

1212

13-
13+
1414
PyMuPDF Layout
1515
===========================================================================
1616

@@ -20,6 +20,12 @@ PyMuPDF Layout
2020
It is an optional, but recommended, addition to the |PyMuPDF| library especially if you are required to more accurately extract structured data with better semantic information.
2121

2222

23+
.. raw:: html
24+
25+
<button id="tryButton" class="cta orange" onclick="window.location='https://demo.pymupdf.io'">Try Demo</button>
26+
<p></p>
27+
28+
2329
Installing
2430
----------------------------------
2531
