Commit 2f2b1ba

Matthew Horoszowski authored and committed
docs: add user guide, analysis page, HISTORY entry; bump to 1.2.0
Phase 5 of customXml support per Plans/customxml-implementation-plan.md.

Documentation
-------------
- docs/user/custom-xml.rst — user guide. Shows the full CustomProperties
  Mapping API (with type-dispatch table and explicit set_* methods),
  CustomXmlParts collection API (add/lookup/mutate/remove), the
  add_string_blob helper, the presentation-vs-package scope choice, and
  round-trip-safety notes for PowerPoint, LibreOffice, and OnlyOffice.
- docs/dev/analysis/customxml.rst — OOXML analysis. XML specimens for
  /docProps/custom.xml, /customXml/itemN.xml, and /customXml/itemPropsN.xml;
  the relationship-topology diagram for both scopes; design rationale for
  the application/xml content-type ambiguity (CT.XML stays unmapped;
  CustomXmlPart upgrades on enumerate via class swap) and for the
  _pptx_customxml_name_{guid} naming convention.
- docs/index.rst and docs/dev/analysis/index.rst — toctree entries for the
  two new pages.

HISTORY and version
-------------------
- HISTORY.rst — 1.2.0 (2026-05-05) entry summarizing the feature: the
  Mapping wrapper, the Sequence wrapper with both topologies, the
  string-blob helper, and the third-party round-trip safety promise.
- src/pptx/__init__.py — __version__ bumped from 1.0.2 to 1.2.0 (skipping
  1.1.x since this fork's first release goes out as 1.2.0 per
  Plans/review-the-guide-at-swift-kahn.md PyPI plan).

What still requires principal action
------------------------------------
- pyproject.toml distribution-name change to python-pptx-extended (held for
  the publishing pass; the version bump above is sufficient for now).
- Manual PowerPoint UI matrix per plan section 5.4 (Mac, Windows,
  LibreOffice, OnlyOffice) — author can't drive PowerPoint.
- Real third-party fixtures from SharePoint / Office.js / VSTO output — to
  capture during the manual matrix pass and add to
  tests/test_files/customxml/.
- Tag, build (sdist + wheel), and publish to PyPI Trusted Publishing.

Final state
-----------
- 2986 tests pass (2956 baseline + 30 from Phase 4).
- 96% aggregate coverage across the 6 new modules; 190 dedicated tests.
- 5 commits on feature/customxml: plan + 4 implementation phases.
1 parent 2d8bb8f commit 2f2b1ba

6 files changed

Lines changed: 479 additions & 1 deletion

HISTORY.rst

Lines changed: 27 additions & 0 deletions
@@ -3,6 +3,33 @@
 Release History
 ---------------

+1.2.0 (2026-05-05) — fork release
++++++++++++++++++++++++++++++++++
+
+This is a feature release for the ``python-pptx-extended`` fork. It adds
+first-class support for OOXML customXml — the mechanism Office.js,
+SharePoint, and VSTO add-ins use to embed structured application data in
+``.pptx`` files. See ``docs/user/custom-xml.rst`` for the user guide and
+``docs/dev/analysis/customxml.rst`` for the OOXML analysis.
+
+- feature: ``Presentation.custom_properties`` — Mapping wrapper over
+  ``/docProps/custom.xml`` (Custom Document Properties; visible in
+  PowerPoint's *File → Properties → Advanced* UI). Type-dispatched
+  ``__setitem__`` plus explicit ``set_string`` / ``set_int`` / ``set_float`` /
+  ``set_bool`` / ``set_datetime`` setters for cases where Python type
+  inference picks the wrong type.
+- feature: ``Presentation.custom_xml_parts`` — Sequence wrapper over the
+  package's customXml data parts. ``add(xml, *, name=, datastoreItem_id=,
+  schema_refs=, scope=)`` supports both presentation-scoped (Office.js
+  default) and package-scoped (VSTO / SharePoint) topologies. Lookup via
+  index, partname tail, ``by_guid(...)``, or ``by_name(...)``.
+- feature: ``CustomXmlParts.add_string_blob(name, content, mime_hint=,
+  encoding=)`` — convenience for the common "embed a string verbatim and
+  read it back" case (e.g. round-trip a markdown source document).
+- feature: round-trip safety with files written by other tools — PPTX files
+  containing customXml parts authored by SharePoint, Office.js, or VSTO load
+  and save without losing their content.
+
 1.0.2 (2024-08-07)
 ++++++++++++++++++

docs/dev/analysis/customxml.rst

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
+.. _CustomXml:
+
+CustomXml and Custom Document Properties
+=========================================
+
+Two distinct OOXML mechanisms support embedding application-specific data in
+a ``.pptx`` package:
+
+1. **Custom Document Properties** at ``/docProps/custom.xml`` — visible in
+   the PowerPoint UI under *File → Properties → Advanced*. ECMA-376 Part 1 §15.2.12.
+2. **CustomXml data parts** at ``/customXml/itemN.xml`` paired with
+   ``/customXml/itemPropsN.xml`` — hidden from end users; the mechanism
+   Office.js, SharePoint workflows, and VSTO add-ins use to embed structured
+   data. ECMA-376 Part 1 §15.2.4.
+
+
+Custom Document Properties
+--------------------------
+
+XML specimen
+~~~~~~~~~~~~
+
+.. highlight:: xml
+
+::
+
+  <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
+  <Properties
+      xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
+      xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
+    <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="Source">
+      <vt:lpwstr>deck-builder-cli@1.4.2</vt:lpwstr>
+    </property>
+    <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="BuildNumber">
+      <vt:i4>42</vt:i4>
+    </property>
+    <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="4" name="IsDraft">
+      <vt:bool>true</vt:bool>
+    </property>
+    <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="5" name="GeneratedAt">
+      <vt:filetime>2026-05-05T14:00:00Z</vt:filetime>
+    </property>
+  </Properties>
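
To make the specimen concrete, here is a minimal sketch that reads it back into Python values using only the standard library. It is not the fork's implementation: `read_custom_properties` and `_convert` are hypothetical names introduced for illustration, and the conversion rules are an assumption based on the five `vt:` types the analysis page lists.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"

# The specimen from the analysis page, as bytes so the XML declaration is honoured.
SPECIMEN = b"""<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<Properties
    xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
    xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="Source">
    <vt:lpwstr>deck-builder-cli@1.4.2</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="BuildNumber">
    <vt:i4>42</vt:i4>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="4" name="IsDraft">
    <vt:bool>true</vt:bool>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="5" name="GeneratedAt">
    <vt:filetime>2026-05-05T14:00:00Z</vt:filetime>
  </property>
</Properties>"""


def _convert(vt_local_name, text):
    """Hypothetical helper: map one vt: element to a Python value."""
    text = text or ""
    if vt_local_name == "lpwstr":
        return text
    if vt_local_name == "i4":
        return int(text)
    if vt_local_name == "r8":
        return float(text)
    if vt_local_name == "bool":
        return text in ("true", "1")
    if vt_local_name == "filetime":
        # Assumes the Z-suffixed, whole-second form shown in the specimen.
        parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported vt type: {vt_local_name}")


def read_custom_properties(xml_bytes):
    """Return {name: value} for every <property> in a custom.xml payload."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for prop in root.findall(f"{{{CP_NS}}}property"):
        value_elm = prop[0]  # the single typed vt: child
        local_name = value_elm.tag.split("}", 1)[1]
        props[prop.get("name")] = _convert(local_name, value_elm.text)
    return props


props = read_custom_properties(SPECIMEN)
```

Each property carries exactly one typed child, so the element's local name is enough to pick the conversion.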
+
+Notable details
+~~~~~~~~~~~~~~~
+
+* The ``fmtid`` attribute is the same well-known GUID
+  ``{D5CDD505-2E9C-101B-9397-08002B2CF9AE}`` for every user-defined property.
+  Office uses different FMTIDs for system-defined property sets (e.g. SharePoint
+  fields), but |pp| writes the user-defined FMTID exclusively.
+* ``pid`` values 0 and 1 are reserved by the spec; user properties start at 2.
+  |pp| auto-assigns the next free integer ≥ 2 within the part.
+* The typed value child belongs to the ``vt:`` namespace
+  (``http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes``).
+  Five types are supported: ``lpwstr`` (Unicode string), ``i4`` (32-bit signed
+  int), ``r8`` (IEEE-754 double), ``bool``, and ``filetime``
+  (ISO-8601 UTC, ``Z``-suffixed).
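
The two rules above (type-to-`vt:` mapping and `pid` assignment) can be sketched in a few lines. This is an illustration under stated assumptions, not the fork's code: `vt_element_for` and `next_pid` are hypothetical names, "next free integer" is read here as the lowest unused value, and rejecting naive datetimes is a choice made for this sketch.

```python
from datetime import datetime, timezone


def vt_element_for(value):
    """Hypothetical dispatch: return (vt local name, text) for a Python value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool", "true" if value else "false"
    if isinstance(value, int):
        if not -2**31 <= value < 2**31:
            raise OverflowError("value does not fit in vt:i4 (32-bit signed)")
        return "i4", str(value)
    if isinstance(value, float):
        return "r8", repr(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime; attach a timezone first")
        utc = value.astimezone(timezone.utc)
        return "filetime", utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, str):
        return "lpwstr", value
    raise TypeError(f"no vt: mapping for {type(value).__name__}")


def next_pid(existing_pids):
    """Return the lowest unused pid >= 2 (0 and 1 are reserved)."""
    used = set(existing_pids)
    pid = 2
    while pid in used:
        pid += 1
    return pid
```

The `bool`-before-`int` ordering is the kind of inference pitfall that makes explicit setters useful: a caller who wants the integer 1 stored as `vt:i4` must not pass `True`.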
+
+
+CustomXml data parts
+--------------------
+
+Each customXml entry is a **pair** of parts: one for the user's arbitrary XML
+payload and one for the metadata about it.
+
+XML specimen — data part
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The data part at ``/customXml/item1.xml`` carries arbitrary XML the application
+chose to embed. The root element name and namespace are caller-defined::
+
+  <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
+  <provenance xmlns="urn:my-app:provenance">
+    <source>deck-builder-cli</source>
+    <built-at>2026-05-05T14:00:00Z</built-at>
+  </provenance>
+
+The content type is ``application/xml`` — the OPC default for the ``xml``
+extension, so no per-part Override entry is written into ``[Content_Types].xml``.
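
The Default-versus-Override lookup this paragraph relies on can be sketched as follows. The sample `[Content_Types].xml` payload is made up for illustration, and `content_type_for` is a hypothetical helper rather than anything in |pp|; only the element and attribute names come from the OPC content-types schema.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Illustrative [Content_Types].xml: one Override, two extension Defaults.
CONTENT_TYPES = b"""<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/customXml/itemProps1.xml"
            ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
</Types>"""


def content_type_for(partname, content_types_xml):
    """Resolve a part's content type: Override first, then extension Default."""
    root = ET.fromstring(content_types_xml)
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName").lower() == partname.lower():
            return override.get("ContentType")
    extension = partname.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    raise KeyError(f"no content type for {partname}")


data_part_type = content_type_for("/customXml/item1.xml", CONTENT_TYPES)
props_part_type = content_type_for("/customXml/itemProps1.xml", CONTENT_TYPES)
```

The data part falls through to the `xml` Default, which is why its content type alone says nothing about it being a customXml part.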
+
+XML specimen — itemProps part
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The sibling at ``/customXml/itemProps1.xml`` carries the ``datastoreItem`` GUID
+that uniquely identifies the data part across edits, plus an optional
+``schemaRefs`` list declaring the namespaces the data part claims to conform
+to::
+
+  <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
+  <ds:datastoreItem
+      xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"
+      ds:itemID="{1A2B3C4D-5E6F-7890-ABCD-EF1234567890}">
+    <ds:schemaRefs>
+      <ds:schemaRef ds:uri="urn:my-app:provenance"/>
+    </ds:schemaRefs>
+  </ds:datastoreItem>
+
+Content type ``application/vnd.openxmlformats-officedocument.customXmlProperties+xml``
+is written as an Override entry for this partname.
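
As a sketch of how the itemProps specimen could be read, the following pulls out the `itemID` GUID and the declared schema URIs with the standard library. `read_item_props` is a hypothetical name; the point it illustrates is that both attributes are namespace-qualified, so they must be looked up with the `ds` namespace URI.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/customXml"

# The itemProps specimen from the analysis page.
ITEM_PROPS = b"""<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<ds:datastoreItem
    xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"
    ds:itemID="{1A2B3C4D-5E6F-7890-ABCD-EF1234567890}">
  <ds:schemaRefs>
    <ds:schemaRef ds:uri="urn:my-app:provenance"/>
  </ds:schemaRefs>
</ds:datastoreItem>"""


def read_item_props(xml_bytes):
    """Return (item_id, [schema URIs]) from an itemProps payload."""
    root = ET.fromstring(xml_bytes)
    # ds:itemID and ds:uri are prefixed attributes, hence the {namespace}name keys.
    item_id = root.get(f"{{{DS_NS}}}itemID")
    schema_uris = [
        ref.get(f"{{{DS_NS}}}uri") for ref in root.iter(f"{{{DS_NS}}}schemaRef")
    ]
    return item_id, schema_uris


item_id, schema_uris = read_item_props(ITEM_PROPS)
```

When `schemaRefs` is absent, the list is simply empty.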
+
+Relationship topology
+~~~~~~~~~~~~~~~~~~~~~
+
+The data part's relationship can be rooted in either the package or the
+presentation::
+
+  PRESENTATION-SCOPED (default; what Office.js writes)
+  ────────────────────────────────────────────────────
+  /ppt/_rels/presentation.xml.rels
+    └─ Type=customXml ─▶ /customXml/item1.xml
+                           └─ /customXml/_rels/item1.xml.rels
+                                └─ Type=customXmlProps ─▶ /customXml/itemProps1.xml
+
+
+  PACKAGE-SCOPED (VSTO / SharePoint topology)
+  ───────────────────────────────────────────
+  /_rels/.rels
+    └─ Type=customXml ─▶ /customXml/item1.xml
+                           └─ /customXml/_rels/item1.xml.rels
+                                └─ Type=customXmlProps ─▶ /customXml/itemProps1.xml
+
+The two scopes are not interchangeable — Office.js's ``customXmlParts``
+collection only enumerates presentation-scoped parts (see this
+`Microsoft Q&A response
+<https://learn.microsoft.com/en-us/answers/questions/5586825/how-to-add-a-proper-customxml-to-a-powerpoint-pres>`_).
+
+|pp| defaults to presentation-scoped to match Office.js. The
+``scope="package"`` parameter on
+:meth:`pptx.custom_xml.CustomXmlParts.add` is the escape hatch for VSTO /
+SharePoint compatibility.
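
One way to see which scope a given data part uses is to inspect the two `.rels` files directly. The sketch below does that with `zipfile` and the standard XML parser; it does not use |pp|. `custom_xml_scopes` and `build_demo_package` are hypothetical helpers, the demo archive contains only the two `.rels` files (it is not a valid `.pptx`), and the code assumes the presentation part lives at `/ppt/presentation.xml`.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
RT_CUSTOM_XML = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
)

# scope -> (rels member inside the zip, base URI the Targets are relative to)
SCOPE_SOURCES = {
    "package": ("_rels/.rels", "/"),
    "presentation": ("ppt/_rels/presentation.xml.rels", "/ppt/"),
}


def custom_xml_scopes(pptx_file):
    """Return {partname: scope} for every customXml relationship found."""
    scopes = {}
    with zipfile.ZipFile(pptx_file) as zf:
        members = set(zf.namelist())
        for scope, (rels_member, base_uri) in SCOPE_SOURCES.items():
            if rels_member not in members:
                continue
            root = ET.fromstring(zf.read(rels_member))
            for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
                if rel.get("Type") != RT_CUSTOM_XML:
                    continue
                # Resolve a relative Target such as ../customXml/item1.xml.
                target = posixpath.join(base_uri, rel.get("Target"))
                scopes[posixpath.normpath(target)] = scope
    return scopes


def build_demo_package():
    """Build an in-memory zip holding one relationship of each scope."""
    def rels(target):
        return (
            f'<Relationships xmlns="{RELS_NS}">'
            f'<Relationship Id="rId1" Type="{RT_CUSTOM_XML}" Target="{target}"/>'
            "</Relationships>"
        )

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ppt/_rels/presentation.xml.rels", rels("../customXml/item1.xml"))
        zf.writestr("_rels/.rels", rels("customXml/item2.xml"))
    buf.seek(0)
    return buf


scopes = custom_xml_scopes(build_demo_package())
```

Pointing `custom_xml_scopes` at a real file path instead of the demo buffer works the same way, since `zipfile.ZipFile` accepts either.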
+
+
+Design decisions
+----------------
+
+The ``application/xml`` content-type ambiguity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``PartFactory.part_type_for`` keys on content type alone, but ``application/xml``
+is the catch-all default for the ``xml`` extension — every customXml data part
+shares it with potentially-unrelated XML parts in third-party packages.
+
+|pp| chooses to **not** register :class:`CustomXmlPart` against ``application/xml``.
+Loaded data parts arrive as base ``Part`` instances; the
+:class:`CustomXmlParts` collection upgrades them to :class:`CustomXmlPart`
+in-place via ``__class__`` swap on first enumeration. This avoids accidentally
+promoting unrelated ``application/xml`` parts in third-party packages.
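
The in-place upgrade can be illustrated with stand-in classes. Nothing below is |pp|'s actual code: `Part` and `CustomXmlPart` here are toy classes that only borrow the names, `upgrade_custom_xml_parts` is hypothetical, and selecting parts by a set of partnames reached through customXml relationships is an assumption of this sketch.

```python
class Part:
    """Toy stand-in for a generic package part."""

    def __init__(self, partname, content_type, blob):
        self.partname = partname
        self.content_type = content_type
        self.blob = blob


class CustomXmlPart(Part):
    """Toy subclass that adds behaviour but no new instance state."""

    @property
    def xml_text(self):
        return self.blob.decode("utf-8")


def upgrade_custom_xml_parts(parts, custom_xml_partnames):
    """Swap __class__ on matching base parts; return the upgraded parts.

    `custom_xml_partnames` is the set of partnames reached through a
    customXml relationship, so unrelated application/xml parts are skipped.
    """
    upgraded = []
    for part in parts:
        if part.partname in custom_xml_partnames and type(part) is Part:
            # Same object, new class: existing references stay valid.
            part.__class__ = CustomXmlPart
        if isinstance(part, CustomXmlPart):
            upgraded.append(part)
    return upgraded


data_part = Part("/customXml/item1.xml", "application/xml", b"<provenance/>")
unrelated = Part("/vendor/settings.xml", "application/xml", b"<settings/>")
upgraded = upgrade_custom_xml_parts(
    [data_part, unrelated], {"/customXml/item1.xml"}
)
```

Because the swap mutates the existing object, any relationship that already points at the part keeps pointing at the same instance. The technique only works cleanly when the subclass adds no instance state of its own, as is the case for the toy subclass here.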
+
+The custom-name storage convention
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+OOXML does not define a "name" attribute on customXml parts. To support
+``custom_xml_parts.by_name("provenance")``, |pp| stores user-assigned names
+as reserved entries in the custom document properties part keyed by the
+data part's ``datastoreItem`` GUID:
+
+::
+
+  <op:property name="_pptx_customxml_name_{1A2B...}" pid="...">
+    <vt:lpwstr>provenance</vt:lpwstr>
+  </op:property>
+
+This is lossless, round-trips through PowerPoint, and requires no schema
+invention. The reserved entries are visible in PowerPoint's
+*File → Properties → Advanced* UI by design — what the user sees in the app
+matches what the Python API exposes.
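
The naming convention amounts to a prefix plus the GUID, which makes name lookup a scan over the custom properties. A minimal sketch follows, using a plain `dict` in place of the real mapping. `name_key` and `guid_for_name` are hypothetical helpers, and appending the braced GUID verbatim is an assumption read off the specimen above.

```python
NAME_PREFIX = "_pptx_customxml_name_"


def name_key(guid):
    """Build the reserved property name for a datastoreItem GUID."""
    return f"{NAME_PREFIX}{guid}"


def guid_for_name(custom_properties, name):
    """Return the GUID whose reserved entry stores `name`, or None."""
    for key, value in custom_properties.items():
        if key.startswith(NAME_PREFIX) and value == name:
            return key[len(NAME_PREFIX):]
    return None


guid = "{1A2B3C4D-5E6F-7890-ABCD-EF1234567890}"
custom_properties = {
    "Source": "deck-builder-cli@1.4.2",  # ordinary user property
    name_key(guid): "provenance",        # reserved name entry
}
found = guid_for_name(custom_properties, "provenance")
```

A `by_name` lookup would then hand the recovered GUID to the GUID-based lookup, which is why the name never needs to live on the data part itself.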
+
+Round-trip safety with third-party tools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PowerPoint 365 (Mac and Windows) preserves both topologies across edits.
+LibreOffice historically preserves package-scoped parts but is less
+consistent with presentation-scoped data parts. OnlyOffice / DocumentServer
+strips customXml on save in some versions
+(`OnlyOffice issue #1564 <https://github.com/ONLYOFFICE/DocumentServer/issues/1564>`_).
+
+|pp| preserves any customXml part it loads, including those it did not
+author — files saved by SharePoint, Office.js, or VSTO add-ins load and save
+without losing their customXml content.
+
+
+References
+----------
+
+* `ECMA-376 Part 1, §15.2.4 — Custom XML Data Storage Part <https://ecma-international.org/publications-and-standards/standards/ecma-376/>`_
+* `ECMA-376 Part 1, §15.2.12 — Custom File Properties Part <https://ecma-international.org/publications-and-standards/standards/ecma-376/>`_
+* `MS Q&A on presentation- vs. package-scoped customXml topology <https://learn.microsoft.com/en-us/answers/questions/5586825/how-to-add-a-proper-customxml-to-a-powerpoint-pres>`_
+* `Office.js CustomXmlPart API <https://learn.microsoft.com/en-us/javascript/api/office/office.customxmlpart>`_
+* `python-docx-oss custom-xml docs <https://python-docx-oss.readthedocs.io/en/latest/user/custom-xml.html>`_ (the docx-equivalent pattern, which |pp|'s API mirrors)

docs/dev/analysis/index.rst

Lines changed: 1 addition & 0 deletions
@@ -143,6 +143,7 @@ Package
    :maxdepth: 1

    pkg-coreprops
+   customxml
    enumerations


docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ User Guide
    user/charts
    user/table
    user/notes
+   user/custom-xml
    user/use-cases
    user/concepts
