XML to PDF Converter Library

Rafael JPD edited this page Oct 2, 2025 · 1 revision

Usage as a Library

This document describes how to use the XML-to-PDF generator module provided by the packtools library. The conversion starts from a base input: a DOCX file containing the formatting and layout standards to be applied. This file serves as an intermediate document, which is populated by procedures that extract data from the XML file. Finally, the resulting DOCX is converted into a PDF using LibreOffice.

Download the default SciELO layout file from the repository packtools/tests. The example in this document assumes it is saved as /home/user/layout.docx.

The following code snippet shows a usage example for converting the XML file located at /home/user/my_file.xml:

# Standard library module used to resolve the assets directory
import os

# Load the pipeline that builds a DOCX document from an XML tree
from packtools.sps.formats.pdf.pipeline import docx

# Load utility to transform XML file into a tree structure compatible with the pipeline
from packtools.sps.utils import xml_utils

# Load utility to convert a DOCX file into a PDF file
from packtools.sps.formats.pdf.utils import file_utils

# Define the path of the XML file to be converted
xml_file_path = "/home/user/my_file.xml"

# Transform the XML file into a tree structure (etree)
xml_tree = xml_utils.get_xml_tree(xml_file_path)

# Create a dictionary with the layout file path and the LibreOffice binary
params = {
    'base_layout': '/home/user/layout.docx',
    'libreoffice_binary': 'libreoffice',
}
# If no assets_dir was given, default to the directory that contains the XML file
params.setdefault('assets_dir', os.path.dirname(os.path.abspath(xml_file_path)))

# Create a Document object (from the python-docx library) from the XML file
docx_document = docx.pipeline_docx(xml_tree, data=params)

# Save the DOCX file somewhere on disk
docx_document.save("/home/user/my_file.docx")

# Convert the DOCX file into PDF format
file_utils.convert_docx_to_pdf(
    "/home/user/my_file.docx", 
    libreoffice_binary=params.get('libreoffice_binary', 'libreoffice')
)

# After this routine runs, a PDF file is created at "/home/user/my_file.pdf"
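
The final conversion step depends on a LibreOffice executable being reachable under the name passed as libreoffice_binary. As a minimal sketch, not part of packtools, the hypothetical helpers below (find_libreoffice, build_convert_command, expected_pdf_path) check for the binary up front and show the standard headless LibreOffice command line for converting a DOCX file to PDF. Whether convert_docx_to_pdf issues exactly this command internally is an assumption.

```python
import os
import shutil


def find_libreoffice(candidates=("libreoffice", "soffice")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(binary, docx_path, out_dir=None):
    """Build a headless LibreOffice command that converts a DOCX file to PDF.

    Hypothetical helper: it only assembles the argument list and runs nothing.
    """
    out_dir = out_dir or os.path.dirname(docx_path) or "."
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, docx_path]


def expected_pdf_path(docx_path, out_dir=None):
    """Return where LibreOffice writes the PDF: same base name, in out_dir."""
    out_dir = out_dir or os.path.dirname(docx_path) or "."
    stem = os.path.splitext(os.path.basename(docx_path))[0]
    return os.path.join(out_dir, stem + ".pdf")


if __name__ == "__main__":
    binary = find_libreoffice()
    if binary is None:
        raise SystemExit("LibreOffice was not found on PATH")
    print(build_convert_command(binary, "/home/user/my_file.docx"))
    print(expected_pdf_path("/home/user/my_file.docx"))
```

Running a check like find_libreoffice before building the DOCX lets a missing installation fail early, and the resolved path can be passed as the libreoffice_binary value.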