feat(text): Field authoring — _Paragraph.add_field, _Field class, CT_TextField.text setter (Phase 3)

MHoroszowski · MHoroszowski · commit 7420bb5b146a · 2026-05-13T19:08:26.000-04:00
Public Python API for the headers/footers/slide-numbers/dates epic (#20).
Phase 3 adds the field-authoring surface that lets users create
auto-updating slide numbers, dates, and other PowerPoint-resolved fields
inside any paragraph. Builds on the Phase 1 (PR #48) OOXML primitives and
the Phase 2 (PR #49) slide/master public API.

Design source: scanny#797 ("Added a:fld type to paragraphs for page
numbers and datetimes"). Manually ported: per CLAUDE.md §2, this fork's
master had a repo-wide ruff format pass (PR #10) while upstream did not,
so a cherry-pick conflicts on whitespace in every touched file. The
semantic diff was re-derived against the current ruff-formatted,
post-Phase-1 source.

Changes:

- pptx.oxml.simpletypes.ST_FieldType (NEW): XsdString subclass for the
  `a:fld@type` attribute value, replacing the plain XsdString declaration
  Phase 1 used as a placeholder.
- pptx.oxml.text.CT_TextField.text: the read-only property from Phase 1
  now has a setter. It writes through get_or_add_t() and routes the value
  through CT_TextField._escape_ctrl_chars (NEW static method), which
  replaces chars in `[\x00-\x08\x0B-\x1F]` with the `_xNNNN_`
  uppercase-hex form per OOXML §22.9.2.19, leaving `\t` (0x09) and `\n`
  (0x0A) alone.
- pptx.oxml.text.CT_TextParagraph.fld: ZeroOrMore("a:fld",
  successors=("a:endParaRPr",)) accessor; the `a:pPr` successor tuple
  already named `a:fld` per Phase 1 (forward declaration). xmlchemy
  auto-generates `_add_fld()` from the ZeroOrMore.
- pptx.text.text._Field (NEW): class wrapping `<a:fld>`, public only as
  the return value of add_field(). The leading-underscore private name
  matches `_Run` and `_Paragraph`. Properties: `font` (Font wrapping
  rPr), `text` (read/write, routes through the escaping setter), `type`
  (read/write, str | None).
- pptx.text.text._Paragraph.add_field() (NEW): appends a fresh `<a:fld>`
  with a uuid4 GUID id wrapped in braces, uppercase hex, matching what
  PowerPoint's "Insert → Slide Number" writes. Returns a `_Field`; the
  caller sets `type` and optionally `text`. The Run-style symmetry is
  deliberate: users who know `add_run()` should not have to learn a new
  pattern.

Out of scope for Phase 3 (deliberate):

- Field discovery during paragraph iteration: `p.runs` continues to
  yield only `_Run` objects. Phase 4 will surface `_Field` instances
  alongside, with a stable ordering rule.
- HandoutMaster Python class and watermark helper: Phase 5.
- MSO_FIELD_TYPE enum: `type` stays plain `str` for now to mirror
  scanny#797. An enum can land in a later cleanup once the canonical
  field-type list is settled.

Verification (local, CPython 3.14.4):

- python3 -m pytest tests/ -q → 3626 passed in 5.32s (+28 vs Phase 2
  baseline)
  - 14 new tests in tests/oxml/test_text.py (CT_TextField setter +
    _escape_ctrl_chars + CT_TextParagraph._add_fld)
  - 14 new tests in tests/text/test_text.py (Describe_Field ×10 +
    Describe_Paragraph_add_field ×4)
- python3 -m ruff check src tests → All checks passed!
- python3 -m ruff format --check src tests → 216 files already formatted
- python3 -m behave features/ --no-color → 1048 scenarios, 0 failed
- python3 uat/uat_headers_footers_phase3.py → PASS (full <a:fld> with
  id, type, and text round-tripped through save+reopen; GUID preserved
  byte-for-byte at {2ED44585-07B5-4BC8-93B2-49122D50BCC2})

Refs #20.
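
For reviewers who want to check the two string transforms without building the fork, here is a minimal standalone sketch. It assumes nothing beyond the stdlib. `escape_ctrl_chars`, `new_field_id`, and `FIELD_ID_PATTERN` are hypothetical names introduced for illustration only; they mirror the expressions in `CT_TextField._escape_ctrl_chars` and `_Paragraph.add_field()` in the diff below but are not part of the library.

```python
import re
import uuid

# Same character class as the diff: C0 controls except tab (0x09) and LF (0x0A).
_CTRL_CHARS = re.compile(r"[\x00-\x08\x0B-\x1F]")

# Assumed shape of a braced, uppercase RFC-4122 v4 GUID (illustration only).
FIELD_ID_PATTERN = re.compile(
    r"\{[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}\}"
)


def escape_ctrl_chars(s: str) -> str:
    """Hypothetical standalone copy of the escaping rule in the diff.

    Each matched control character becomes `_xNNNN_`, with NNNN the
    zero-padded uppercase hex code point.
    """
    return _CTRL_CHARS.sub(lambda m: "_x%04X_" % ord(m.group(0)), s)


def new_field_id() -> str:
    """Hypothetical helper: the same id expression add_field() uses."""
    return "{%s}" % str(uuid.uuid4()).upper()
```

Note that carriage return (0x0D) falls inside the escaped range, so `escape_ctrl_chars("\r")` yields `_x000D_`; only tab and line-feed pass through unchanged.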
diff --git a/src/pptx/oxml/simpletypes.py b/src/pptx/oxml/simpletypes.py
@@ -368,6 +368,18 @@ class ST_Extension(XsdString):
     pass
 
 
+class ST_FieldType(XsdString):
+    """Field-type token on `<a:fld type="...">` per ECMA-376 §A.4.1.
+
+    Values are PowerPoint-defined strings such as `slidenum`, `datetime1` ..
+    `datetime13`, and `title`. Type is intentionally permissive (a plain
+    string) — the schema itself does not enumerate the values, and
+    PowerPoint accepts any token.
+    """
+
+    pass
+
+
 class ST_GapAmount(BaseIntType):
     """
     String value is an integer in range 0-500, representing a percent,
diff --git a/src/pptx/oxml/text.py b/src/pptx/oxml/text.py
@@ -18,6 +18,7 @@
 from pptx.oxml.ns import nsdecls
 from pptx.oxml.simpletypes import (
     ST_Coordinate32,
+    ST_FieldType,
     ST_TextFontScalePercentOrPercentString,
     ST_TextFontSize,
     ST_TextIndentLevelType,
@@ -339,6 +340,7 @@ class CT_TextField(BaseOxmlElement):
     """
 
     get_or_add_rPr: Callable[[], CT_TextCharacterProperties]
+    get_or_add_t: Callable[[], BaseOxmlElement]
 
     rPr: CT_TextCharacterProperties | None = ZeroOrOne(  # pyright: ignore[reportAssignmentType]
         "a:rPr", successors=("a:pPr", "a:t")
@@ -348,7 +350,7 @@ class CT_TextField(BaseOxmlElement):
     )
     id: str = RequiredAttribute("id", XsdString)  # pyright: ignore[reportAssignmentType]
     type: str | None = OptionalAttribute(  # pyright: ignore[reportAssignmentType]
-        "type", XsdString
+        "type", ST_FieldType
     )
 
     @property
@@ -359,6 +361,28 @@ def text(self) -> str:  # pyright: ignore[reportIncompatibleMethodOverride]
             return ""
         return t.text or ""
 
+    @text.setter
+    def text(self, value: str):  # pyright: ignore[reportIncompatibleMethodOverride]
+        """Replace the text of the `a:t` child, escaping control chars.
+
+        Adds an `a:t` child element if not already present. Characters in the
+        ASCII control range 0x00-0x08 and 0x0B-0x1F (everything except `\\t`
+        and `\\n`) are replaced with their `_xNNNN_` plain-text escape per
+        OOXML §22.9.2.19, matching the behavior of `CT_RegularTextRun.text`.
+        """
+        t = self.get_or_add_t()
+        t.text = self._escape_ctrl_chars(value)
+
+    @staticmethod
+    def _escape_ctrl_chars(s: str) -> str:
+        """Return str after replacing each control character with a plain-text escape.
+
+        For example, a BEL character (x07) would appear as "_x0007_". Horizontal-tab
+        (x09) and line-feed (x0A) are not escaped. All other characters in the range
+        x00-x1F are escaped.
+        """
+        return re.sub(r"([\x00-\x08\x0B-\x1F])", lambda match: "_x%04X_" % ord(match.group(1)), s)
+
 
 class CT_TextFont(BaseOxmlElement):
     """Custom element class for `a:latin`, `a:ea`, `a:cs`, and `a:sym`.
@@ -403,13 +427,15 @@ class CT_TextParagraph(BaseOxmlElement):
     get_or_add_pPr: Callable[[], CT_TextParagraphProperties]
     r_lst: list[CT_RegularTextRun]
     _add_br: Callable[[], CT_TextLineBreak]
+    _add_fld: Callable[[], CT_TextField]
     _add_r: Callable[[], CT_RegularTextRun]
 
     pPr: CT_TextParagraphProperties | None = ZeroOrOne(  # pyright: ignore[reportAssignmentType]
         "a:pPr", successors=("a:r", "a:br", "a:fld", "a:endParaRPr")
     )
     r = ZeroOrMore("a:r", successors=("a:endParaRPr",))
     br = ZeroOrMore("a:br", successors=("a:endParaRPr",))
+    fld = ZeroOrMore("a:fld", successors=("a:endParaRPr",))
     endParaRPr: CT_TextCharacterProperties | None = ZeroOrOne("a:endParaRPr", successors=())  # pyright: ignore[reportAssignmentType]
 
     def add_br(self) -> CT_TextLineBreak:
diff --git a/src/pptx/text/text.py b/src/pptx/text/text.py
@@ -2,6 +2,7 @@
 
 from __future__ import annotations
 
+import uuid
 from typing import TYPE_CHECKING, Iterator, cast
 
 from pptx.dml.fill import FillFormat
@@ -33,6 +34,7 @@
         CT_RegularTextRun,
         CT_TextBody,
         CT_TextCharacterProperties,
+        CT_TextField,
         CT_TextParagraph,
         CT_TextParagraphProperties,
     )
@@ -582,6 +584,21 @@ def add_run(self) -> _Run:
         r = self._p.add_r()
         return _Run(r, self)
 
+    def add_field(self) -> _Field:
+        """Return a new |_Field| appended after the paragraph's existing content.
+
+        The new ``<a:fld>`` element is given a fresh RFC-4122 v4 GUID `id`
+        wrapped in braces, with uppercase hex — matching the authoring format
+        PowerPoint emits when the user runs *Insert → Slide Number* or
+        *Insert → Date and Time*. The caller is expected to set `type` (e.g.
+        `"slidenum"`, `"datetime1"`) and optionally `text` (the placeholder
+        glyph PowerPoint displays for the field before it resolves the live
+        value) on the returned `_Field`.
+        """
+        f = self._p._add_fld()
+        f.id = "{%s}" % str(uuid.uuid4()).upper()
+        return _Field(f, self)
+
     @property
     def alignment(self) -> PP_PARAGRAPH_ALIGNMENT | None:
         """Horizontal alignment of this paragraph.
@@ -888,3 +905,62 @@ def text(self):
     @text.setter
     def text(self, text: str):
         self._r.text = text
+
+
+class _Field(Subshape):
+    """Field object. Corresponds to ``<a:fld>`` child element in a paragraph.
+
+    A field renders text whose value PowerPoint resolves at slide-show or open
+    time — slide numbers, the current date, the slide title, etc. The literal
+    text written to the ``<a:t>`` child is the placeholder PowerPoint shows
+    before it resolves the live value; users typically pass a glyph like
+    ``"‹#›"`` for slide numbers or the current date as a static fallback.
+
+    Not intended to be constructed directly — obtain instances from
+    :meth:`_Paragraph.add_field`.
+    """
+
+    def __init__(self, f: CT_TextField, parent: ProvidesPart):
+        super(_Field, self).__init__(parent)
+        self._f = f
+
+    @property
+    def font(self) -> Font:
+        """|Font| instance for the run-level character properties of this field.
+
+        Character properties can be and perhaps most often are inherited from
+        parent objects such as the paragraph and slide layout the field is
+        contained in. Only those specifically overridden at the field level
+        are contained in the font object.
+        """
+        rPr = self._f.get_or_add_rPr()
+        return Font(rPr)
+
+    @property
+    def text(self) -> str:
+        """Read/write. A unicode string containing the field's placeholder text.
+
+        Assignment replaces all text in the field. Control characters other
+        than tab or newline are escaped as a hex representation. For example,
+        ESC (ASCII 27) is escaped as ``"_x001B_"``.
+        """
+        return self._f.text
+
+    @text.setter
+    def text(self, text: str):
+        self._f.text = text
+
+    @property
+    def type(self) -> str | None:
+        """Read/write. The field's ``type`` attribute, e.g. ``"slidenum"``.
+
+        ECMA-376 §A.4.1 names the well-known types: ``slidenum``,
+        ``datetime1`` .. ``datetime13``, and ``title``. The OOXML schema
+        itself treats the value as a permissive string. Returns |None| when
+        no ``type`` attribute is present.
+        """
+        return self._f.type
+
+    @type.setter
+    def type(self, value: str | None):
+        self._f.type = value
diff --git a/tests/oxml/test_text.py b/tests/oxml/test_text.py
@@ -7,9 +7,9 @@
 import pytest
 
 from pptx.exc import InvalidXmlError
-from pptx.oxml.text import CT_TextField
+from pptx.oxml.text import CT_TextField, CT_TextParagraph
 
-from ..unitutil.cxml import element
+from ..unitutil.cxml import element, xml
 
 
 class DescribeCT_TextField(object):
@@ -51,3 +51,106 @@ def it_returns_empty_string_for_text_when_a_t_is_absent(self):
     def it_reads_the_text_of_its_a_t_child(self):
         fld = cast(CT_TextField, element('a:fld{id=foo,type=slidenum}/a:t"42"'))
         assert fld.text == "42"
+
+    def it_adds_an_a_t_child_on_text_assignment_when_absent(self):
+        fld = cast(CT_TextField, element("a:fld{id=foo,type=slidenum}"))
+        assert fld.t is None
+
+        fld.text = "‹#›"
+
+        assert fld.t is not None
+        assert fld.text == "‹#›"
+        assert fld.xml == xml('a:fld{id=foo,type=slidenum}/a:t"‹#›"')
+
+    def it_replaces_existing_a_t_content_on_text_assignment(self):
+        fld = cast(CT_TextField, element('a:fld{id=foo,type=slidenum}/a:t"old"'))
+
+        fld.text = "new"
+
+        assert fld.text == "new"
+        # ---only one a:t child is present; assignment replaces, not appends---
+        assert len(fld.findall("{http://schemas.openxmlformats.org/drawingml/2006/main}t")) == 1
+
+    @pytest.mark.parametrize(
+        ("input_value", "expected_a_t_text"),
+        [
+            ("hello", "hello"),
+            ("a\x07b", "a_x0007_b"),  # BEL escapes
+            ("tab\there", "tab\there"),  # tab pass-through
+            ("line1\nline2", "line1\nline2"),  # newline pass-through
+            ("esc\x1bhere", "esc_x001B_here"),  # ESC escapes, uppercase hex
+            ("", ""),
+        ],
+    )
+    def it_escapes_control_chars_when_assigning_text(
+        self, input_value: str, expected_a_t_text: str
+    ):
+        fld = cast(CT_TextField, element("a:fld{id=foo,type=slidenum}"))
+
+        fld.text = input_value
+
+        # ---round-trip: reading back returns the escaped form because the
+        # ---escape is the permanent storage form, not a presentation layer---
+        assert fld.text == expected_a_t_text
+
+    def it_escapes_BEL_to_uppercase_hex_via__escape_ctrl_chars(self):
+        # ---BEL is x07; expected escape is "_x0007_" with uppercase hex---
+        assert CT_TextField._escape_ctrl_chars("ring\x07bell") == "ring_x0007_bell"
+
+    def it_passes_tab_and_newline_through__escape_ctrl_chars(self):
+        # ---x09 (HT) and x0A (LF) are explicitly excluded from the escape
+        # ---range per OOXML §22.9.2.19; all other x00..x1F characters escape.
+        assert CT_TextField._escape_ctrl_chars("a\tb\nc") == "a\tb\nc"
+
+        # ---verify x0B (VT) and x1F (US, the highest in-range value) DO escape
+        assert CT_TextField._escape_ctrl_chars("\x0b") == "_x000B_"
+        assert CT_TextField._escape_ctrl_chars("\x1f") == "_x001F_"
+
+
+class DescribeCT_TextParagraph(object):
+    """Unit-test suite for `pptx.oxml.text.CT_TextParagraph` field accessor."""
+
+    def it_can_add_an_a_fld_via__add_fld(self):
+        p = cast(CT_TextParagraph, element("a:p"))
+
+        fld = p._add_fld()
+
+        assert isinstance(fld, CT_TextField)
+        assert len(p.fld_lst) == 1
+        assert p.fld_lst[0] is fld
+
+    def it_appends_a_fld_after_existing_runs_in_document_order(self):
+        # ---fld successors=("a:endParaRPr",) places it after a:r and a:br,
+        # ---and before a:endParaRPr. Verifying the post-r position confirms
+        # ---xmlchemy honored the successors tuple correctly.
+        p = cast(CT_TextParagraph, element('a:p/(a:r/a:t"x",a:endParaRPr)'))
+
+        fld = p._add_fld()
+        fld.id = "fld-1"
+
+        # ---walk children of <a:p>; ignoring pPr (none here), expect:
+        # ---a:r, a:fld, a:endParaRPr in that order
+        tags = [child.tag.split("}")[-1] for child in p]
+        assert tags == ["r", "fld", "endParaRPr"]
+
+    def it_returns_all_fld_children_via_fld_lst(self):
+        p = cast(CT_TextParagraph, element("a:p"))
+
+        fld_a = p._add_fld()
+        fld_b = p._add_fld()
+        fld_c = p._add_fld()
+
+        assert p.fld_lst == [fld_a, fld_b, fld_c]
+
+    def it_includes_a_fld_in_content_children(self):
+        # ---content_children must surface a:r, a:br, and a:fld in document order
+        # ---so _Paragraph.text concatenates field text alongside run text.
+        p = cast(CT_TextParagraph, element("a:p"))
+        p.add_r("before")
+        fld = p._add_fld()
+        fld.id = "fld-1"
+        fld.text = "[N]"
+        p.add_r("after")
+
+        texts = [child.text for child in p.content_children]
+        assert texts == ["before", "[N]", "after"]
diff --git a/tests/text/test_text.py b/tests/text/test_text.py