Commit 1738ff5

Documentation updates for version 1.28.0
1 parent 610a665 commit 1738ff5

12 files changed

Lines changed: 253 additions & 54 deletions

docs/about-feature-matrix.rst

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,10 @@
5555
:width: 0
5656
:height: 0
5757

58+
.. image:: images/icons/icon-md.svg
59+
:width: 0
60+
:height: 0
61+
5862
.. raw:: html
5963

6064

@@ -181,6 +185,11 @@
181185
background-size: 40px 40px;
182186
}
183187
188+
#feature-matrix .icon.md {
189+
background: url("_images/icon-md.svg") 0 0 transparent no-repeat;
190+
background-size: 40px 40px;
191+
}
192+
184193
</style>
185194

186195

@@ -207,6 +216,7 @@
207216
<span class="icon cbz"><cite>CBZ</cite></span>
208217
<span class="icon svg"><cite>SVG</cite></span>
209218
<span class="icon txt"><cite>TXT</cite></span>
219+
<span class="icon md"><cite>MD</cite></span>
210220
<span class="icon image"><cite id="transFM3">Image</cite></span>
211221
<hr/>
212222
<span class="icon docx"><cite>DOCX</cite></span>

docs/about.rst

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ The following table illustrates what features the products offer:
9797
- PyMuPDF Pro
9898
- PyMuPDF4LLM
9999
* - **Input Documents**
100-
- `PDF`, `XPS`, `EPUB`, `CBZ`, `MOBI`, `FB2`, `SVG`, `TXT`, Images (*standard document types*)
100+
- `PDF`, `XPS`, `EPUB`, `CBZ`, `MOBI`, `FB2`, `SVG`, `TXT`, `MD`, Images (*standard document types*)
101101
- *as PyMuPDF* and:
102102
`DOC`/`DOCX`, `XLS`/`XLSX`, `PPT`/`PPTX`, `HWP`/`HWPX`
103103
- *as PyMuPDF*

docs/app3.rst

Lines changed: 54 additions & 0 deletions
@@ -421,6 +421,60 @@ Typical document page sizes are **ISO A4** and **Letter**. A **Letter** page has
421421

422422

423423

424+
425+
.. _CSS_Support:
426+
427+
CSS Support
428+
--------------------------------------------
429+
430+
For now, only a subset of CSS properties is supported.
431+
432+
The underlying C library MuPDF supports a subset of HTML4 and CSS2. The primary goal of the HTML/CSS support is to serve as a popular and convenient way to style text — not to faithfully reproduce websites in PDF.
433+
434+
What Works
435+
~~~~~~~~~~~~~
436+
437+
The following list shows the supported properties and their possible values. The list is not exhaustive, but it gives an idea of what to expect.
438+
439+
Text styling
440+
""""""""""""""
441+
442+
443+
``color``
444+
``font-family``
445+
``font-size``
446+
``font-weight`` (bold)
447+
``font-style`` (italic)
448+
``text-align``
449+
``line-height``
450+
``letter-spacing``
451+
``text-decoration`` (underline etc.)
452+
453+
Box model (basic)
454+
""""""""""""""""""""""""""""
455+
456+
``margin``
457+
``padding``
458+
``border``
459+
``background-color`` (applies to the text's occupied sub-rectangle, not the full box)
460+
461+
Fonts
462+
""""""""""""""
463+
464+
``@font-face`` for loading custom fonts via an Archive
465+
Standard variants (regular, bold, italic, bold-italic) via ``font-weight`` and ``font-style``
466+
467+
Layout
468+
""""""""""""""
469+
470+
Only relative layout is available. No ``position: absolute``, no ``flexbox``, no ``grid``, no ``float``, no ``clear``. The layout is basically a flow layout, where the text is laid out in lines and paragraphs, and the lines are laid out in blocks.
471+
472+
473+
What Doesn't Work
474+
~~~~~~~~~~~~~~~~~~~~~~~~~~
475+
476+
Modern CSS (CSS3+): no ``flexbox``, ``grid``, ``custom properties`` (--vars), ``calc()``, ``transitions``, ``animations``, ``position: absolute`` / ``fixed``, ``float``, ``clear`` and so on.
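
As a rough pre-flight check, a stylesheet can be scanned for declarations that fall outside the lists above. This is only a sketch: the helper name ``unsupported_css`` and the ``SUPPORTED`` set are assumptions transcribed from this section and are not part of the PyMuPDF API. Because the lists above are explicitly not exhaustive, a flagged property (for example a longhand such as ``margin-top``) may still work.

```python
import re

# Properties named as supported in this section (hypothetical, non-exhaustive).
SUPPORTED = {
    "color", "font-family", "font-size", "font-weight", "font-style",
    "text-align", "line-height", "letter-spacing", "text-decoration",
    "margin", "padding", "border", "background-color",
    "src",  # used inside @font-face rules
}


def unsupported_css(css: str) -> list[str]:
    """Return property names in *css* that are not in SUPPORTED, in order of appearance."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop CSS comments
    found = []
    # Look only inside {...} blocks so that selectors such as "a:hover" are ignored.
    for block in re.findall(r"\{([^{}]*)\}", css):
        for declaration in block.split(";"):
            name, sep, _value = declaration.partition(":")
            name = name.strip().lower()
            if sep and name and name not in SUPPORTED and name not in found:
                found.append(name)
    return found


css = """
h1 {color: red; font-weight: bold;}
p  {display: flex; float: left; line-height: 1.4;}
"""
print(unsupported_css(css))  # ['display', 'float']
```

Running such a check before applying a stylesheet makes it easier to spot rules that are likely to be ignored.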
477+
424478
.. rubric:: Footnotes
425479

426480
.. [#f1] MuPDF supports "deep-copying" objects between PDF documents. To avoid duplicate data in the target, it uses so-called "graftmaps", like a form of scratchpad: for each object to be copied, its :data:`xref` number is looked up in the graftmap. If found, copying is skipped. Otherwise, the new :data:`xref` is recorded and the copy takes place. PyMuPDF makes use of this technique in two places so far: :meth:`Document.insert_pdf` and :meth:`Page.show_pdf_page`. This process is fast and very efficient, because it prevents multiple copies of typically large and frequently referenced data, like images and fonts. However, you may still want to consider using garbage collection (option 4) in any of the following cases:

docs/archive-class.rst

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Archive
1010

1111
This class represents a generalization of file folders and container files like ZIP and TAR archives. Archives allow accessing arbitrary collections of file folders, ZIP / TAR files and single binary data elements as if they all were part of one hierarchical tree of folders.
1212

13-
In PyMuPDF, archives are currently only used by :ref:`Story` objects to specify where to look for fonts, images and other resources.
13+
In PyMuPDF, archives are currently only used by :ref:`Story` objects and as an :ref:`option when opening files <Full_Options_for_Opening_a_File>` to specify where to look for fonts, images and other resources.
1414

1515
================================ ===================================================
1616
**Method / Attribute** **Short Description**

docs/converting-files.rst

Lines changed: 102 additions & 18 deletions
@@ -11,7 +11,7 @@ Converting Files
1111
Files to PDF
1212
~~~~~~~~~~~~~~~~~~
1313

14-
:ref:`Document types supported by PyMuPDF<HowToOpenAFile>` can easily be converted to |PDF| by using the :meth:`Document.convert_to_pdf` method. This method returns a buffer of data which can then be utilized by |PyMuPDF| to create a new |PDF|.
14+
:ref:`Document types supported by PyMuPDF <HowToOpenAFile>` can easily be converted to |PDF| by using the :meth:`Document.convert_to_pdf` method. This method returns a buffer of data which can then be utilized by |PyMuPDF| to create a new |PDF|.
1515

1616

1717

@@ -20,38 +20,97 @@ Files to PDF
2020
.. code-block:: python
2121
2222
import pymupdf
23-
23+
24+
# Convert Markdown to PDF
25+
md_doc = pymupdf.open("example.md")
26+
pdfdata = md_doc.convert_to_pdf()
27+
pdf_doc = pymupdf.open(stream=pdfdata)
28+
pdf_doc.save("example.pdf")
29+
30+
# Convert XPS to PDF
2431
xps = pymupdf.open("input.xps")
25-
pdfbytes = xps.convert_to_pdf()
26-
pdf = pymupdf.open("pdf", pdfbytes)
32+
pdfdata = xps.convert_to_pdf()
33+
pdf = pymupdf.open(stream=pdfdata)
2734
pdf.save("output.pdf")
2835
36+
.. _Markdown_to_PDF:
2937

38+
Markdown to PDF
39+
~~~~~~~~~~~~~~~~~
3040

31-
PDF to SVG
32-
~~~~~~~~~~~~~~~~~~
41+
Because Markdown files are a supported input type, they can easily be converted to PDF using the :meth:`Document.convert_to_pdf` method.
3342

34-
Technically, as SVG files cannot be multipage, we must export each page as an SVG.
43+
In the simplest case you can just open the Markdown file and call the method to get a PDF representation of the content.
3544

36-
To get an SVG representation of a page use the :meth:`Page.get_svg_image` method.
3745

38-
**Example**
46+
Defining paper size
47+
"""""""""""""""""""
48+
49+
The default paper size is 400 x 600 (see :doc:`rect`). To use a custom paper size, pass the `rect` parameter when opening the file, for example:
3950

4051
.. code-block:: python
4152
42-
import pymupdf
53+
md_doc = pymupdf.open("example.md", rect=pymupdf.paper_rect("A4")) # A4 size
4354
44-
doc = pymupdf.open("input.pdf")
45-
page = doc[0]
4655
47-
# Convert page to SVG
48-
svg_content = page.get_svg_image()
56+
Defining CSS
57+
""""""""""""
58+
59+
By default, the Markdown content is converted to PDF using a default CSS stylesheet. However, you can specify your own CSS stylesheet to customize the appearance of the resulting PDF. To do this, define your CSS as a string and apply it with :meth:`Document.apply_css`.
60+
61+
For example, to make all ``h1`` headers red (the single ``#`` symbol in Markdown), you could do the following:
62+
63+
.. code-block:: python
64+
65+
md_doc = pymupdf.open( # open the Markdown document in A4 size
66+
"example.md",
67+
rect=pymupdf.paper_rect("A4")
68+
)
69+
70+
css = "h1 {color:red;}"
71+
md_doc.apply_css(css)
72+
73+
pdf_doc = pymupdf.open(stream=md_doc.convert_to_pdf())
74+
pdf_doc.ez_save("red-colored-header.pdf")
75+
76+
.. note::
77+
78+
The :ref:`support for CSS <CSS_Support>` is currently limited.
79+
80+
81+
Defining Fonts
82+
"""""""""""""""""
83+
84+
Fonts can be defined by using the `archive` parameter to provide a custom :ref:`Archive` containing the font files.
85+
86+
The fonts must exist in an archive which is provided to the `archive` parameter when opening the Markdown file. The CSS can then refer to these fonts by their names as defined in the archive.
87+
88+
For example, assuming you have the font files for the "Comic Sans" font and want to use it for all text, you could do the following:
89+
90+
.. code-block:: python
91+
92+
# Global CSS instructions to use the "Comic Sans" font for all text. The font files must be provided in the archive.
93+
css = """
94+
@font-face {font-family: sans-serif; src: url(comic.ttf);}
95+
@font-face {font-family: sans-serif; src: url(comicbd.ttf); font-weight: bold;}
96+
@font-face {font-family: sans-serif; src: url(comicz.ttf); font-weight: bold; font-style: italic;}
97+
@font-face {font-family: sans-serif; src: url(comici.ttf); font-style: italic;}
98+
"""
99+
100+
archive = pymupdf.Archive("C:/Windows/Fonts") # the fonts are here
101+
archive.add(".") # also search the current folder, e.g. for images referenced by the Markdown file
102+
103+
md_file = "sample.md"
104+
md_doc = pymupdf.open( # open the Markdown document
105+
md_file,
106+
archive=archive, # where to look for resources (fonts, images)
107+
rect=pymupdf.paper_rect("A4"), # page dimension ISO A4
108+
)
109+
110+
md_doc.apply_css(css)
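
The four ``@font-face`` rules above follow one pattern, so the stylesheet can also be generated from a small table of font files. The sketch below uses only the standard library. The helper name ``font_face_css`` and its ``(bold, italic)`` key convention are assumptions made for illustration and are not part of PyMuPDF.

```python
def font_face_css(family: str, files: dict[tuple[bool, bool], str]) -> str:
    """Build @font-face rules; keys are (bold, italic) flags, values are font file names."""
    rules = []
    for (bold, italic), filename in files.items():
        parts = [f"font-family: {family};", f"src: url({filename});"]
        if bold:
            parts.append("font-weight: bold;")
        if italic:
            parts.append("font-style: italic;")
        rules.append("@font-face {" + " ".join(parts) + "}")
    return "\n".join(rules)


# Same font files as in the example above.
css = font_face_css(
    "sans-serif",
    {
        (False, False): "comic.ttf",
        (True, False): "comicbd.ttf",
        (True, True): "comicz.ttf",
        (False, True): "comici.ttf",
    },
)
print(css)
```

The resulting string can be passed to ``apply_css`` in the same way as the hand-written stylesheet.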
111+
49112
50-
# Save to file
51-
with open("output.svg", "w", encoding="utf-8") as f:
52-
f.write(svg_content)
53113
54-
doc.close()
55114
56115
57116
PDF to Markdown
@@ -72,6 +131,31 @@ By utilizing the :doc:`PyMuPDF4LLM API <pymupdf4llm/api>` we are able to conver
72131
pathlib.Path("4llm-output.md").write_bytes(md_text.encode())
73132
74133
134+
PDF to SVG
135+
~~~~~~~~~~~~~~~~~~
136+
137+
Because an SVG file cannot contain multiple pages, each page must be exported as a separate SVG.
138+
139+
To get an SVG representation of a page use the :meth:`Page.get_svg_image` method.
140+
141+
**Example**
142+
143+
.. code-block:: python
144+
145+
import pymupdf
146+
147+
doc = pymupdf.open("input.pdf")
148+
page = doc[0]
149+
150+
# Convert page to SVG
151+
svg_content = page.get_svg_image()
152+
153+
# Save to file
154+
with open("output.svg", "w", encoding="utf-8") as f:
155+
f.write(svg_content)
156+
157+
doc.close()
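
Since every page needs its own SVG file, a whole document can be exported with a loop. This is a minimal sketch: the helper name ``export_pages_as_svg`` and the ``page-001.svg`` naming scheme are assumptions chosen for illustration.

```python
from pathlib import Path


def export_pages_as_svg(pdf_path, out_dir):
    """Write one SVG file per page of *pdf_path* into *out_dir*; return the paths written."""
    import pymupdf  # imported here so the helper can be defined without PyMuPDF installed

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:  # iterating a Document yields its pages in order
            target = out_dir / f"page-{page.number + 1:03d}.svg"
            target.write_text(page.get_svg_image(), encoding="utf-8")
            written.append(target)
    return written


# Example call (file names are placeholders):
# export_pages_as_svg("input.pdf", "svg-pages")
```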
158+
75159
PDF to DOCX
76160
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
77161

docs/document.rst

Lines changed: 43 additions & 21 deletions
@@ -10,7 +10,7 @@ Document
1010

1111
This class represents a document. It can be constructed from a file or from memory.
1212

13-
There exists the alias *open* for this class, i.e. `pymupdf.Document(...)` and `pymupdf.open(...)` do exactly the same thing.
13+
There is an alias :meth:`open` for this class, i.e. `pymupdf.Document(...)` and `pymupdf.open(...)` do exactly the same thing.
1414

1515
For details on **embedded files** refer to Appendix 3.
1616

@@ -29,6 +29,7 @@ For details on **embedded files** refer to Appendix 3.
2929
======================================= ==========================================================
3030
:meth:`Document.add_layer` PDF only: make new optional content configuration
3131
:meth:`Document.add_ocg` PDF only: add new optional content group
32+
:meth:`Document.apply_css` Markdown only: apply CSS stylesheet to Markdown content
3233
:meth:`Document.authenticate` gain access to an encrypted document
3334
:meth:`Document.bake` PDF only: make annotations / fields permanent content
3435
:meth:`Document.can_save_incrementally` check if incremental save is possible
@@ -169,7 +170,7 @@ For details on **embedded files** refer to Appendix 3.
169170
pair: rect; Document
170171
pair: fontsize; Document
171172

172-
.. method:: __init__(self, filename=None, stream=None, *, filetype=None, rect=None, width=0, height=0, fontsize=11)
173+
.. method:: __init__(self, filename=None, stream=None, filetype=None, archive=None, rect=None, width=0, height=0, fontsize=11)
173174

174175
Create a ``Document`` object.
175176

@@ -183,11 +184,13 @@ For details on **embedded files** refer to Appendix 3.
183184

184185
:arg str filetype: A string specifying the type of document. This is only ever needed when file content inspection fails. Text types like "txt", "html", "xml" etc. cannot be disambiguated by their content. When such files are provided in memory or with the wrong file extension, this parameter **must** be used.
185186

186-
:arg rect_like rect: a rectangle specifying the desired page size. This parameter is only meaningful for documents with a variable page layout ("reflowable" documents), like e-books or HTML, and ignored otherwise. If specified, it must be a non-empty, finite rectangle with top-left coordinates (0, 0). Together with parameter :data:`fontsize`, each page will be accordingly laid out and hence also determine the number of pages.
187+
:arg Archive archive: An optional :ref:`Archive` object to use as a source for resources like fonts and images.
187188

188-
:arg float width: may used together with ``height`` as an alternative to ``rect`` to specify layout information.
189+
:arg rect_like rect: A rectangle specifying the desired page size. This parameter is only meaningful for documents with a variable page layout ("reflowable" documents), like e-books, MD or HTML, and ignored otherwise. If specified, it must be a non-empty, finite rectangle with top-left coordinates (0, 0). Together with parameter :data:`fontsize`, it determines how each page is laid out and hence also the number of pages.
189190

190-
:arg float height: may used together with ``width`` as an alternative to ``rect`` to specify layout information.
191+
:arg float width: May be used together with ``height`` as an alternative to ``rect`` to specify layout information.
192+
193+
:arg float height: May be used together with ``width`` as an alternative to ``rect`` to specify layout information.
191194

192195
:arg float fontsize: the default :data:`fontsize` for reflowable document types. This parameter is ignored if none of the parameters ``rect`` or ``width`` and ``height`` are specified. Will be used to calculate the page layout.
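
The constraints on ``rect`` (non-empty, finite, top-left corner at (0, 0)) can be checked before a document is opened. The following standard-library sketch works on any 4-item sequence ``(x0, y0, x1, y1)``. The helper name ``is_valid_layout_rect`` is an assumption made for illustration and is not part of PyMuPDF.

```python
import math


def is_valid_layout_rect(rect) -> bool:
    """Check a rect-like (x0, y0, x1, y1) against the constraints described for ``rect``."""
    x0, y0, x1, y1 = (float(v) for v in rect)
    finite = all(math.isfinite(v) for v in (x0, y0, x1, y1))
    anchored = (x0, y0) == (0.0, 0.0)  # top-left corner must be the origin
    non_empty = x1 > x0 and y1 > y0    # positive width and height
    return finite and anchored and non_empty


print(is_valid_layout_rect((0, 0, 595, 842)))    # True: an A4-sized page
print(is_valid_layout_rect((10, 10, 595, 842)))  # False: top-left is not (0, 0)
```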
193196

@@ -201,24 +204,29 @@ For details on **embedded files** refer to Appendix 3.
201204

202205
In case of problems you can see more detail in the internal messages store: `print(pymupdf.TOOLS.mupdf_warnings())` (which will be emptied by this call, but you can also prevent this -- consult :meth:`Tools.mupdf_warnings`).
203206

204-
Overview of possible forms, note: `open` is a synonym of `Document`::
205207

206-
>>> # from a file
207-
>>> doc = pymupdf.open("some.xps")
208-
>>> # handle wrong extension
209-
>>> doc = pymupdf.open("some.file", filetype="xps") # assert expected type
210-
>>> doc = pymupdf.open("some.file", filetype="txt") # treat as plain text
211-
>>>
212-
>>> # from memory
213-
>>> doc = pymupdf.open(stream=mem_area) # works for any supported type
214-
>>> doc = pymupdf.open(stream=unknown-type, filetype="txt") # treat as plain text
215-
>>>
216-
>>> # new empty PDF
217-
>>> doc = pymupdf.open()
218-
>>> doc = pymupdf.open(None)
219-
>>> doc = pymupdf.open("")
208+
Overview of possible forms, note: :meth:`open` is a synonym of :meth:`Document`::
209+
210+
# from a file
211+
doc = pymupdf.open("some.xps")
212+
# handle wrong extension
213+
doc = pymupdf.open("some.file", filetype="xps") # assert expected type
214+
doc = pymupdf.open("some.file", filetype="txt") # treat as plain text
215+
216+
# from memory
217+
doc = pymupdf.open(stream=mem_area) # works for any supported type
218+
doc = pymupdf.open(stream=unknown_type, filetype="txt") # treat as plain text
219+
220+
# new empty PDF
221+
doc = pymupdf.open()
222+
doc = pymupdf.open(None)
223+
doc = pymupdf.open("")
220224

221-
.. note:: Raster images with a wrong (but supported) file extension **are no problem**. MuPDF will determine the correct image type when file **content** is actually accessed and will process it without complaint.
225+
.. note::
226+
227+
Raster images with a wrong (but supported) file extension **are no problem**. MuPDF will determine the correct image type when file **content** is actually accessed and will process it without complaint.
228+
229+
See :ref:`supported file types <Supported_File_Types>` for more information.
222230

223231
The Document class can also be used as a **context manager**. Exiting the context manager will close the document automatically.
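
A minimal sketch of the context-manager form follows. The helper name ``build_blank_pdf`` is an assumption made for illustration. It creates a new empty PDF, adds pages, and relies on the ``with`` block to close the document.

```python
def build_blank_pdf(n_pages: int) -> bytes:
    """Return the bytes of a new PDF with *n_pages* empty pages."""
    import pymupdf  # imported here so the helper can be defined without PyMuPDF installed

    with pymupdf.open() as doc:  # new empty PDF
        for _ in range(n_pages):
            doc.new_page()
        data = doc.tobytes()
    # The with-block has exited, so the document has been closed.
    assert doc.is_closed
    return data
```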
224232

@@ -2030,6 +2038,20 @@ For details on **embedded files** refer to Appendix 3.
20302038
This is a normal PDF document with no usage restrictions whatsoever. If it is not being changed in any way, it can be used together with its journal to undo / redo operations or continue updating.
20312039

20322040

2041+
.. method:: apply_css(css, append=True)
2042+
2043+
* New in v1.28.0
2044+
2045+
Apply CSS styles to the document. This is a global operation, which means that the styles will be applied to all pages and all elements of the document. The CSS syntax is the same as for HTML documents, but only a subset of CSS properties is supported.
2046+
2047+
:arg str css: a string containing the CSS styles to be applied.
2048+
:arg bool append: whether to append the new styles to existing ones (if any) or to replace them.
2049+
2050+
.. note:: This method is primarily intended for use with :ref:`Markdown documents <Markdown_to_PDF>`.
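
The effect of ``append`` can be sketched as follows, assuming a PyMuPDF version that provides ``apply_css`` and can open Markdown files. The helper name ``markdown_to_styled_pdf`` and its parameters are hypothetical, and the comments restate the parameter description above rather than verified behavior.

```python
def markdown_to_styled_pdf(md_path, pdf_path, css, replacement_css=None):
    """Convert a Markdown file to PDF after applying one or two stylesheets."""
    import pymupdf  # needs a version that provides Document.apply_css

    md_doc = pymupdf.open(md_path)
    # append=True is the default: the rules are added to any existing styles.
    md_doc.apply_css(css)
    if replacement_css is not None:
        # append=False: replace the existing styles instead of adding to them.
        md_doc.apply_css(replacement_css, append=False)
    pdf_doc = pymupdf.open(stream=md_doc.convert_to_pdf())
    pdf_doc.save(pdf_path)
    pdf_doc.close()
    md_doc.close()


# Example call (file names are placeholders):
# markdown_to_styled_pdf("example.md", "example.pdf", "h1 {color: red;}")
```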
2051+
2052+
2053+
2054+
20332055
.. attribute:: outline
20342056

20352057
Contains the first :ref:`Outline` entry of the document (or `None`). Can be used as a starting point to walk through all outline items. Accessing this property for encrypted, not authenticated documents will raise an *AttributeError*.
