@@ -23,38 +23,129 @@ The following table illustrates how |PyMuPDF| compares with other typical soluti
23
23
24
24
----
25
25
26
-
.. image:: images/icons/icon-docx.svg
27
-
:width: 40
28
-
:height: 40
29
26
30
-
.. image:: images/icons/icon-xlsx.svg
31
-
:width: 40
32
-
:height: 40
33
27
34
-
.. image:: images/icons/icon-pptx.svg
35
-
:width: 40
36
-
:height: 40
37
28
38
29
39
-
.. image:: images/icons/icon-hangul.svg
40
-
:width: 40
41
-
:height: 40
30
+
.. note::
42
31
32
+
.. image:: images/icons/icon-docx.svg
33
+
:width: 40
34
+
:height: 40
35
+
:alt: DOCX icon
43
36
37
+
.. image:: images/icons/icon-xlsx.svg
38
+
:width: 40
39
+
:height: 40
40
+
:alt: XLSX icon
44
41
45
-
.. note::
42
+
.. image:: images/icons/icon-pptx.svg
43
+
:width: 40
44
+
:height: 40
45
+
:alt: PPTX icon
46
+
47
+
.. image:: images/icons/icon-hangul.svg
48
+
:width: 40
49
+
:height: 40
50
+
:alt: HWPX icon
51
+
52
+
A note about **Office** document types (DOCX, XLSX, PPTX) and **Hangul** documents (HWPX). These documents can be loaded into |PyMuPDF|, and you will receive a :ref:`Document <Document>` object.
46
53
47
-
A note about **Office** document types (DOCX, XLXS, PPTX) and **Hangul** documents (HWPX). These documents can be loaded into |PyMuPDF| and you will receive a :ref:`Document <Document>` object.
54
+
There are some caveats:
48
55
49
-
There are some caveats:
56
+
- we convert the input to **HTML** to lay out the content.
57
+
- because of this, the original page separation is lost.
50
58
59
+
When saving the result, a faithful representation of the original layout cannot be expected.
51
60
52
-
- we convert the input to **HTML** to layout the content.
53
-
- because of this the original page separation has gone.
61
+
Therefore, once loaded, these input files are mostly in a form that is useful for text extraction.
62
+
63
+
64
+
----
54
65
55
-
When saving out the result any faithful representation of the original layout cannot be expected.
66
+
.. _About_PyMuPDF_Product_Suite:
67
+
68
+
PyMuPDF Product Suite
69
+
-----------------------------------------------
70
+
71
+
|PyMuPDF| is the standard version of the library; however, there is a family of additional products, each with different features and functionality.
72
+
73
+
**Additional products** in the |PyMuPDF| product suite are:
74
+
75
+
- |PyMuPDF Pro| adds support for Office document formats.
76
+
- |PyMuPDF4LLM| is optimized for large language model (LLM) applications, providing enhanced text extraction and processing capabilities.
77
+
- |PyMuPDF Layout| focuses on layout analysis and semantic understanding, ideal for document conversion and formatting tasks with enhanced results.
78
+
79
+
.. note::
80
+
All of the products above depend on the same core product, |PyMuPDF|, and therefore have full access to all of its features.
81
+
These additional products can be seen as optional extras that enhance the core |PyMuPDF| library.
82
+
83
+
84
+
.. _About_PyMuPDF_Products_Comparison:
85
+
86
+
PyMuPDF Products Comparison
87
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
88
+
89
+
The following table illustrates what features the products offer:
It is an optional, but recommended, addition to the |PyMuPDF| library, especially if you need to extract structured data more accurately and with better semantic information.