Skip to content

Commit fcf7121

Browse files
Matthew HoroszowskiMatthew Horoszowski
authored andcommitted
feat: Font.color non-mutation + UTC-aware datetimes + Shapes.by_name (Modernization Phase 2)
Issue: #29 (Phase 2) Phase 1 (PR #39) shipped PathLike + PERCENT_40 typo + Slide.background fix. This PR closes three nagging upstream tickets in a single bundle, all covered by issue #29's "bug fixes that pollute every other epic if left unfixed" group plus the most-cited shape-lookup ergonomic from the API ergonomics group. Bug fixes - `Font.color` getter is now non-mutating. Prior implementation called `self.fill.solid()` on every read, inserting `<a:solidFill/>` into the run's `<a:rPr>` element — meaning *reading* a font's color silently modified the document. New implementation returns a `_LazyFontColorFormat` proxy that mirrors `ColorFormat`'s public surface; reads (`type`, `rgb`, `theme_color`, `brightness`, `transparency`) return |None| / inherit values without writing; the setter path materializes `<a:solidFill/>` lazily on first write and delegates to the real `ColorFormat`. Closes scanny#1111 and scanny#1074. - W3CDTF datetime parser returns tz-aware datetimes when the source string carries timezone information. `Z` suffix → UTC; numeric offsets like `-08:00` or `+05:30` → fixed-offset `datetime.timezone`. Strings with no offset (year-only, year-month, year-month-day, or bare timestamp) continue to return naive datetimes — we don't assume a timezone where none was given. The setter accepts both naive and tz-aware inputs; tz-aware inputs are converted to UTC via `astimezone(timezone.utc)` before serialization so the on-disk W3CDTF form always uses the canonical `YYYY-MM-DDTHH:MM:SSZ` shape. Closes scanny#957. Sign-convention fix: the prior `_offset_dt` had `sign_factor = -1 if sign == "+" else 1` (inverted vs. POSIX/W3CDTF convention) and *adjusted the clock values*; the new `_tzinfo_from_offset_str` uses `sign_factor = 1 if sign == "+" else -1` and returns a `tzinfo` instance, leaving the clock values intact. End-to-end behavior is preserved when callers convert via `astimezone(utc)` — `'2024-01-15 T10:30:00-08:00'` still represents the same instant (18:30 UTC). API ergonomics - `_BaseShapes.by_name(name)` lookup helper. Returns the first shape in document order whose `.name` matches `name` (case-sensitive, matching PowerPoint's behavior). Raises `KeyError` with a clear message on miss. Defined on `_BaseShapes` so it inherits to `SlideShapes`, `SlideLayoutShapes`, `SlideMasterShapes`, etc. Closes scanny#798, scanny#309, and scanny#532. Behavior changes (intentional, documented in the PR body) - Reading `font.color` no longer inserts `<a:solidFill/>`. Code that relied on the mutation as a side-effect needs to call `font.fill.solid()` explicitly. - Reading `core_properties.created` / `last_printed` / `modified` on a document whose XML carries `Z` or numeric offset markers now returns a tz-aware datetime. Naive-input expectations must update. Tests - 27 new pytest cases in `tests/test_modernization_phase2.py` covering byte-stable XML on Font.color reads, lazy materialization on first set, datetime parser tz-awareness across all five W3CDTF input forms, round-trip through save/reload, and the three by_name paths (hit / miss / case-sensitivity / multi-match / inheritance to layouts and masters). Two existing test fixtures updated to expect tz-aware values where the input strings carry `Z`. Full pytest: `3453 passed`. - 5 new behave scenarios in `features/modernization-phase2.feature` (Font.color non-mutation byte-diff, lazy materialization, tz-aware round-trip, by_name match, by_name miss). Existing `features/steps/coreprops.py` updated to use tz-aware datetimes for the now-tz-aware reload values. Full behave: `1041 scenarios passed, 0 failed`. - Ruff: `ruff check src tests` → All checks passed; `ruff format --check` → no diff. Skipped from issue #29 Phase 2 (deferred or no-op) - `collections.abc` import path migration (would close scanny#771): already complete in this fork; verified via grep across `src/`. - `iter_leaf_shapes`, `Mapping` ABC, `find_by_xpath`, selection-pane order listing: deferred to a future Phase 4 to keep Phase 2 focused on bug fixes. Refs #29
1 parent edf9d56 commit fcf7121

9 files changed

Lines changed: 634 additions & 37 deletions

File tree

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
Feature: Modernization & Ergonomics Phase 2 — bug fixes + by_name
2+
In order to inspect fonts without polluting the document, round-trip datetimes faithfully, and look up shapes by name
3+
As a developer using python-pptx
4+
I need a non-mutating Font.color getter, tz-aware core-property datetimes, and Shapes.by_name(name)
5+
6+
7+
Scenario: Reading font.color does not insert a:solidFill
8+
Given a fresh slide with a title placeholder
9+
When I read run.font.color.rgb on an unstyled run
10+
Then the underlying rPr XML is unchanged from before the access
11+
12+
13+
Scenario: Setting font.color.rgb materializes a:solidFill lazily
14+
Given a fresh slide with a title placeholder
15+
When I set run.font.color.rgb to RGBColor(0xFF, 0x00, 0x00)
16+
Then the underlying rPr now contains an a:solidFill child
17+
And run.font.color.rgb reads back as FF0000
18+
19+
20+
Scenario: Tz-aware core_properties.created round-trips faithfully
21+
Given a fresh presentation for core-property datetimes
22+
When I set core_properties.created to a tz-aware UTC datetime
23+
Then the reloaded core_properties.created is tz-aware
24+
25+
26+
Scenario: shapes.by_name returns the matching shape
27+
Given a fresh slide with a title placeholder
28+
Then shapes.by_name("Title 1") returns the title shape
29+
30+
31+
Scenario: shapes.by_name raises KeyError on miss
32+
Given a fresh slide with a title placeholder
33+
Then shapes.by_name("Bogus") raises KeyError

features/steps/coreprops.py

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
from __future__ import annotations
44

5-
from datetime import datetime, timedelta
5+
from datetime import datetime, timedelta, timezone
66

77
from behave import given, then, when
88
from helpers import no_core_props_pptx_path, saved_pptx_path
@@ -28,18 +28,21 @@ def step_when_open_presentation_with_no_core_props_part(context):
2828

2929
@when("I set the core properties to valid values")
3030
def step_when_set_core_doc_props_to_valid_values(context):
31+
# ---issue #29 Phase 2: datetimes round-trip as tz-aware UTC values, so
32+
# ---the input fixture is already tz-aware UTC for an apples-to-apples
33+
# ---comparison with the reloaded value.
3134
context.propvals = (
3235
("author", "Creator"),
3336
("category", "Category"),
3437
("comments", "Description"),
3538
("content_status", "Content Status"),
36-
("created", datetime(2013, 6, 15, 12, 34, 56)),
39+
("created", datetime(2013, 6, 15, 12, 34, 56, tzinfo=timezone.utc)),
3740
("identifier", "Identifier"),
3841
("keywords", "key; word; keyword"),
3942
("language", "Language"),
4043
("last_modified_by", "Last Modified By"),
41-
("last_printed", datetime(2013, 6, 15, 12, 34, 56)),
42-
("modified", datetime(2013, 6, 15, 12, 34, 56)),
44+
("last_printed", datetime(2013, 6, 15, 12, 34, 56, tzinfo=timezone.utc)),
45+
("modified", datetime(2013, 6, 15, 12, 34, 56, tzinfo=timezone.utc)),
4346
("revision", 9),
4447
("subject", "Subject"),
4548
# --- exercise unicode-text case for Python 2.7 ---
@@ -60,8 +63,10 @@ def step_then_a_core_props_part_with_def_vals_is_added(context):
6063
assert core_props.last_modified_by == "python-pptx"
6164
assert core_props.revision == 1
6265
# core_props.modified only stores time with seconds resolution, so
63-
# comparison needs to be a little loose (within two seconds)
64-
modified_timedelta = datetime.utcnow() - core_props.modified
66+
# comparison needs to be a little loose (within two seconds). Issue #29
67+
# Phase 2 makes the parser return a tz-aware datetime; use a tz-aware
68+
# `now()` here so the subtraction is well-typed.
69+
modified_timedelta = datetime.now(timezone.utc) - core_props.modified
6570
max_expected_timedelta = timedelta(seconds=2)
6671
assert modified_timedelta < max_expected_timedelta
6772

Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
"""Gherkin steps for Modernization Phase 2 (issue #29)."""
2+
3+
from __future__ import annotations
4+
5+
import datetime as dt
6+
import io
7+
8+
import pytest
9+
from behave import given, then, when
10+
from lxml import etree
11+
12+
from pptx import Presentation
13+
from pptx.dml.color import RGBColor
14+
15+
16+
# given ===================================================
17+
18+
19+
@given("a fresh slide with a title placeholder")
20+
def given_a_fresh_slide_with_title(context):
21+
prs = Presentation()
22+
slide = prs.slides.add_slide(prs.slide_layouts[1])
23+
context.prs = prs
24+
context.slide = slide
25+
26+
27+
@given("a fresh presentation for core-property datetimes")
28+
def given_a_fresh_presentation_for_core_props(context):
29+
context.prs = Presentation()
30+
31+
32+
# when ====================================================
33+
34+
35+
@when("I read run.font.color.rgb on an unstyled run")
36+
def when_read_font_color_unstyled(context):
37+
tf = context.slide.shapes.title.text_frame
38+
run = tf.paragraphs[0].add_run()
39+
run.text = "x"
40+
rPr = run._r.get_or_add_rPr()
41+
context.run = run
42+
context.rPr = rPr
43+
context.before_xml = etree.tostring(rPr)
44+
_ = run.font.color.rgb
45+
46+
47+
@when("I set run.font.color.rgb to RGBColor(0xFF, 0x00, 0x00)")
48+
def when_set_font_color_red(context):
49+
tf = context.slide.shapes.title.text_frame
50+
run = tf.paragraphs[0].add_run()
51+
run.text = "x"
52+
context.run = run
53+
context.rPr = run._r.get_or_add_rPr()
54+
run.font.color.rgb = RGBColor(0xFF, 0x00, 0x00)
55+
56+
57+
@when("I set core_properties.created to a tz-aware UTC datetime")
58+
def when_set_created_utc(context):
59+
context.target_dt = dt.datetime(2024, 7, 4, 17, 0, 0, tzinfo=dt.timezone.utc)
60+
context.prs.core_properties.created = context.target_dt
61+
62+
63+
# the "save and reload via stream" step is shared with tbl_styles.py
64+
65+
66+
# then ====================================================
67+
68+
69+
@then("the underlying rPr XML is unchanged from before the access")
70+
def then_rPr_unchanged(context):
71+
after = etree.tostring(context.rPr)
72+
assert context.before_xml == after, (context.before_xml, after)
73+
74+
75+
@then("the underlying rPr now contains an a:solidFill child")
76+
def then_rPr_has_solidFill(context):
77+
ns = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
78+
assert context.rPr.find("%ssolidFill" % ns) is not None
79+
80+
81+
@then("run.font.color.rgb reads back as FF0000")
82+
def then_font_color_rgb_reads_FF0000(context):
83+
assert context.run.font.color.rgb == RGBColor(0xFF, 0x00, 0x00)
84+
85+
86+
@then("the reloaded core_properties.created is tz-aware")
87+
def then_reloaded_created_is_tzaware(context):
88+
buf = io.BytesIO()
89+
context.prs.save(buf)
90+
buf.seek(0)
91+
prs2 = Presentation(buf)
92+
reloaded = prs2.core_properties.created
93+
assert reloaded.tzinfo is not None
94+
assert reloaded == context.target_dt
95+
96+
97+
@then('shapes.by_name("Title 1") returns the title shape')
98+
def then_by_name_returns_title(context):
99+
sh = context.slide.shapes.by_name("Title 1")
100+
assert sh.name == "Title 1"
101+
102+
103+
@then('shapes.by_name("Bogus") raises KeyError')
104+
def then_by_name_raises_keyerror(context):
105+
with pytest.raises(KeyError):
106+
context.slide.shapes.by_name("Bogus")

src/pptx/oxml/coreprops.py

Lines changed: 44 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -210,34 +210,44 @@ def _get_or_add(self, prop_name: str):
210210
return element
211211

212212
@classmethod
213-
def _offset_dt(cls, datetime: dt.datetime, offset_str: str):
214-
"""Return |datetime| instance offset from `datetime` by offset specified in `offset_str`.
215-
216-
`offset_str` is a string like `'-07:00'`.
217-
"""
213+
def _tzinfo_from_offset_str(cls, offset_str: str) -> dt.timezone:
214+
"""Return a :class:`datetime.timezone` parsed from a W3CDTF offset like '-08:00'."""
218215
match = cls._offset_pattern.match(offset_str)
219216
if match is None:
220217
raise ValueError(f"{repr(offset_str)} is not a valid offset string")
221218
sign, hours_str, minutes_str = match.groups()
222-
sign_factor = -1 if sign == "+" else 1
219+
sign_factor = 1 if sign == "+" else -1
223220
hours = int(hours_str) * sign_factor
224221
minutes = int(minutes_str) * sign_factor
225-
td = dt.timedelta(hours=hours, minutes=minutes)
226-
return datetime + td
222+
return dt.timezone(dt.timedelta(hours=hours, minutes=minutes))
227223

228224
_offset_pattern = re.compile(r"([+-])(\d\d):(\d\d)")
229225

230226
@classmethod
231227
def _parse_W3CDTF_to_datetime(cls, w3cdtf_str: str) -> dt.datetime:
232-
# valid W3CDTF date cases:
233-
# yyyy e.g. '2003'
234-
# yyyy-mm e.g. '2003-12'
235-
# yyyy-mm-dd e.g. '2003-12-31'
236-
# UTC timezone e.g. '2003-12-31T10:14:55Z'
237-
# numeric timezone e.g. '2003-12-31T10:14:55-08:00'
228+
"""Parse a W3CDTF string into a :class:`datetime.datetime`.
229+
230+
Returns a tz-aware datetime when the input string carries timezone
231+
information — ``Z`` suffix maps to UTC; numeric offsets like
232+
``-08:00`` map to a fixed-offset :class:`datetime.timezone`. When the
233+
input has no timezone marker (year-only, year-month, year-month-day,
234+
or a bare timestamp), the returned datetime is naive — callers
235+
should not assume any specific timezone for naive inputs.
236+
237+
Closes scanny/python-pptx#957 — the prior implementation always
238+
returned a naive datetime, even for strings that explicitly carried
239+
timezone information.
240+
241+
valid W3CDTF date cases:
242+
- yyyy e.g. '2003'
243+
- yyyy-mm e.g. '2003-12'
244+
- yyyy-mm-dd e.g. '2003-12-31'
245+
- UTC timezone e.g. '2003-12-31T10:14:55Z'
246+
- numeric timezone e.g. '2003-12-31T10:14:55-08:00'
247+
"""
238248
templates = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")
239-
# strptime isn't smart enough to parse literal timezone offsets like
240-
# '-07:30', so we have to do it ourselves
249+
# ---strptime can't parse literal timezone offsets like '-07:30', so
250+
# ---we strip the offset and add tzinfo ourselves below.
241251
parseable_part = w3cdtf_str[:19]
242252
offset_str = w3cdtf_str[19:]
243253
timestamp = None
@@ -249,15 +259,31 @@ def _parse_W3CDTF_to_datetime(cls, w3cdtf_str: str) -> dt.datetime:
249259
if timestamp is None:
250260
tmpl = "could not parse W3CDTF datetime string '%s'"
251261
raise ValueError(tmpl % w3cdtf_str)
262+
# ---'Z' means UTC---
263+
if offset_str == "Z":
264+
return timestamp.replace(tzinfo=dt.timezone.utc)
265+
# ---numeric offset like '-08:00'---
252266
if len(offset_str) == 6:
253-
return cls._offset_dt(timestamp, offset_str)
267+
return timestamp.replace(tzinfo=cls._tzinfo_from_offset_str(offset_str))
268+
# ---no timezone marker; return naive (don't assume UTC)---
254269
return timestamp
255270

256271
def _set_element_datetime(self, prop_name: str, value: dt.datetime) -> None:
257-
"""Set date/time value of child element having `prop_name` to `value`."""
272+
"""Set date/time value of child element having `prop_name` to `value`.
273+
274+
Accepts both naive and tz-aware datetimes. Tz-aware inputs are
275+
converted to UTC before serialization so the on-disk W3CDTF form
276+
always uses the canonical ``YYYY-MM-DDTHH:MM:SSZ`` shape. Naive
277+
inputs are written as-is (with the trailing ``Z`` suffix indicating
278+
UTC) — callers passing naive datetimes are responsible for the
279+
timezone interpretation.
280+
"""
258281
if not isinstance(value, dt.datetime): # pyright: ignore[reportUnnecessaryIsInstance]
259282
tmpl = "property requires <type 'datetime.datetime'> object, got %s"
260283
raise ValueError(tmpl % type(value))
284+
# ---tz-aware -> normalize to UTC before serializing---
285+
if value.tzinfo is not None:
286+
value = value.astimezone(dt.timezone.utc).replace(tzinfo=None)
261287
element = self._get_or_add(prop_name)
262288
dt_str = value.strftime("%Y-%m-%dT%H:%M:%SZ")
263289
element.text = dt_str

src/pptx/shapes/shapetree.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,23 @@ def __len__(self) -> int:
108108
shape_elms = list(self._iter_member_elms())
109109
return len(shape_elms)
110110

111+
def by_name(self, name: str) -> BaseShape:
112+
"""Return the first shape in this collection whose `.name` equals `name`.
113+
114+
Lookup is case-sensitive, matching PowerPoint's own behavior. When
115+
multiple shapes share the same name (uncommon but possible —
116+
PowerPoint does not enforce uniqueness), the first match in
117+
document order is returned. Raises |KeyError| with a clear message
118+
if no match is found.
119+
120+
Closes scanny/python-pptx#798, scanny/python-pptx#309, and
121+
scanny/python-pptx#532.
122+
"""
123+
for shape in self:
124+
if shape.name == name:
125+
return shape
126+
raise KeyError("no shape named %r in this collection" % name)
127+
111128
def clone_placeholder(self, placeholder: LayoutPlaceholder) -> None:
112129
"""Add a new placeholder shape based on `placeholder`."""
113130
sp = placeholder.element

0 commit comments

Comments
 (0)